Nitin Kanwar

ORCID Identifier(s)


Graduation Semester and Year




Document Type


Degree Name

Master of Science in Computer Science


Computer Science and Engineering

First Advisor

Jia Rao


Machine Learning is at the forefront of many fields today. Two of its subfields, Reinforcement Learning and Deep Learning, have been combined into advanced algorithms that reach or surpass human-level performance on tasks ranging from playing Atari games to defeating a multiple-time champion at Go. These successes have attracted the interest of the financial community and raised the question of whether the same techniques can be applied to detecting patterns in financial markets. Until recently, the success of Financial Engineering was attributed to mathematical formulations of dynamical systems in the context of Signal Processing and Control Theory. Reinforcement Learning has since improved sequential decision making, leading to the development of multistage stochastic optimization, a key component of sequential portfolio optimization (asset allocation) strategies. In this thesis, we explore how to optimally allocate capital among a fixed set of stocks in a portfolio so as to maximize the long-term wealth of a Deep Learning trading agent trained with Reinforcement Learning. We treat the problem as context-independent, meaning the learning agent interacts directly with the environment, which allows us to apply model-free Reinforcement Learning algorithms. In particular, we focus on Policy Gradient and Actor-Critic methods, a class of state-of-the-art techniques that construct an estimate of the optimal policy for the control problem by iteratively improving a parametric policy. We compare the Reinforcement Learning based portfolio optimization strategy with the more traditional "Follow the Winner", "Follow the Loser", and "Uniformly Balanced" strategies, and find that Reinforcement Learning based agents either far outperform all the other strategies or perform as well as the best of them.
The analysis provides conclusive support for the ability of model-free Policy Gradient based Reinforcement Learning methods to act as universal trading agents.
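
As a rough illustration of the three baseline strategies named in the abstract, the sketch below compounds the wealth of a portfolio that is rebalanced every period. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the all-in versions of "Follow the Winner" and "Follow the Loser" (all wealth goes into the previous period's best or worst performer), and the toy price relatives are all hypothetical, and transaction costs are ignored.

```python
def final_wealth(price_relatives, choose_weights):
    """Compound the wealth of a portfolio rebalanced every period.

    price_relatives[t][i] is price_t / price_{t-1} for asset i.
    choose_weights(history, n_assets) sees only past periods and returns
    non-negative weights that sum to 1.
    """
    wealth = 1.0
    for t, relatives in enumerate(price_relatives):
        weights = choose_weights(price_relatives[:t], len(relatives))
        # Portfolio return for the period is the weighted sum of relatives.
        wealth *= sum(w * x for w, x in zip(weights, relatives))
    return wealth


def uniformly_balanced(history, n_assets):
    # Equal weight on every asset, regardless of history.
    return [1.0 / n_assets] * n_assets


def _all_in(index, n_assets):
    # Hypothetical helper: put the whole portfolio into one asset.
    return [1.0 if i == index else 0.0 for i in range(n_assets)]


def follow_the_winner(history, n_assets):
    # Simplified assumption: back the previous period's best performer.
    if not history:
        return uniformly_balanced(history, n_assets)
    last = history[-1]
    return _all_in(last.index(max(last)), n_assets)


def follow_the_loser(history, n_assets):
    # Simplified assumption: back the previous period's worst performer.
    if not history:
        return uniformly_balanced(history, n_assets)
    last = history[-1]
    return _all_in(last.index(min(last)), n_assets)


# Made-up price relatives for two assets over three periods.
TOY_RELATIVES = [[1.1, 0.9], [1.2, 0.8], [0.9, 1.1]]

if __name__ == "__main__":
    for strategy in (uniformly_balanced, follow_the_winner, follow_the_loser):
        print(strategy.__name__, round(final_wealth(TOY_RELATIVES, strategy), 4))
```

A learned policy would plug into the same interface as another `choose_weights` function, which is one way such baselines and a Reinforcement Learning agent could be compared on equal terms.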


Reinforcement learning, Machine learning, Deep learning, Finance, Quantitative finance, Deep reinforcement learning, Portfolio management, Portfolio optimization


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington