Reinforcement learning (RL) is about finding the best strategy for solving a given problem. That strategy is the policy the agent uses to interact with the environment, and every RL algorithm is, directly or indirectly, a search for the optimal policy.
Policy gradient (PG) methods search for the policy directly: they parameterize it and adjust those parameters along the gradient of the expected return.
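
To make "finding the policy directly" concrete, here is a minimal sketch of a policy gradient update on a toy multi-armed bandit, written in plain Python with no dependencies. It is an illustration only, not the code in this repository's `Solver.py` files: the function names (`softmax`, `train_bandit_policy`), the bandit setup, the running-mean baseline, and the hyperparameters are all assumptions chosen for the example. The policy is a softmax over one preference per arm, and each step nudges the preferences along `advantage * grad(log pi(action))`.

```python
import math
import random


def softmax(prefs):
    """Turn a list of preferences into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit_policy(arm_means, steps=2000, lr=0.1, noise_std=0.1, seed=0):
    """Learn a softmax policy over bandit arms with a policy gradient update.

    arm_means: hypothetical expected reward of each arm (the toy environment).
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # policy parameters: one preference per arm
    baseline = 0.0                  # running mean reward, used to reduce variance

    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # The environment returns a noisy reward for that action.
        reward = rng.gauss(arm_means[action], noise_std)

        # Update the baseline, then measure how much better than average we did.
        baseline += (reward - baseline) / t
        advantage = reward - baseline

        # For a softmax policy, d log pi(action) / d theta[a] = 1[a == action] - probs[a].
        for a in range(len(theta)):
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * advantage * grad_log_pi  # gradient ascent on return

    return softmax(theta)


if __name__ == "__main__":
    # Two arms; the second pays more on average, so its probability should grow.
    print(train_bandit_policy([0.0, 1.0]))
```

Note that no value function is learned here: the parameters of the policy itself are the only thing being optimized. The algorithms listed below build on this same idea with neural network policies, full episodes instead of single steps, and better estimates of the advantage.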
I will be implementing the following PG algorithms:
- Vanilla Policy Gradients
- REINFORCE
- Actor-Critic algorithms
- Deterministic Policy Gradients
- TRPO
- PPO

The repository is structured as:
|Readme.md
|---VPG
|---REINFORCE
|---ACTOR CRITIC
| |---A2C
| |---A3C
| |---SAC
|---DETERMINISTIC POLICY GRADIENTS
| |---DPG
| |---DDPG
| |---D4PG
|---TRPO
|---PPO
Each subfolder is structured as:
|Readme.md
|---Main.py
|---Solver.py
|---UTILS.py
|---Running Trained Model.py
|---Trained Model.pt