This project is an implementation of a reinforcement-learning-based online PID tuner. The tuner is based on A2C (Advantage Actor-Critic). I trained the RL tuner and tested it on LunarLander, one of the OpenAI Gym environments.
Init (P, I, D) of the environment
Init the policy π
for episode = 0, M do
    Reset the environment and init state
    Set done = False
    while not done do
        action = π(state)
        next_state, reward, done = step(action)
        Train π
        state = next_state
    end while
end for
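The loop above can be sketched in Python as follows. This is only an illustration of the control flow, not the code in `a2c_main.py`: `ToyEnv`, `RandomPolicy`, and `run` are hypothetical names, the environment is a trivial stand-in, and the `train` method is a placeholder where an actual A2C update would go.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial state

    def step(self, action):
        self.t += 1
        next_state = float(self.t)
        reward = 1.0 if action > 0 else -1.0
        done = self.t >= self.horizon
        return next_state, reward, done


class RandomPolicy:
    """Placeholder for the A2C policy (assumption: one update per transition)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.updates = 0

    def __call__(self, state):
        # A trained actor would map the state to an action here.
        return self.rng.uniform(-1.0, 1.0)

    def train(self, state, action, reward, next_state, done):
        # A real A2C step would update the actor and the critic here.
        self.updates += 1


def run(env, policy, num_episodes):
    """Run the online training loop and return the total reward per episode."""
    returns = []
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        total = 0.0
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            policy.train(state, action, reward, next_state, done)
            state = next_state
            total += reward
        returns.append(total)
    return returns
```

Calling `run(ToyEnv(), RandomPolicy(), 3)` runs three episodes and trains the policy once per environment step, which is what makes the tuner "online".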
The PID environment is built on a simple PID control example.
- MDP
  - state (5,): set point, feedback, error, I-term, P
  - action (1,): P
  - reward (1,): +1 if abs(error) is within a certain range, otherwise -1
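A minimal sketch of an environment with this MDP is shown below. It is not the project's environment: the class name `SimplePIDEnv`, the first-order plant model, the fixed I and D gains, the tolerance, and the episode length are all assumptions chosen for illustration. Only the state layout, the action (the P gain), and the +1 / -1 reward follow the description above.

```python
class SimplePIDEnv:
    """Hypothetical PID environment: the agent picks the P gain at every step."""

    def __init__(self, set_point=1.0, ki=0.1, kd=0.01, dt=0.1,
                 tolerance=0.05, max_steps=200):
        self.set_point = set_point
        self.ki = ki                # fixed integral gain (assumption)
        self.kd = kd                # fixed derivative gain (assumption)
        self.dt = dt
        self.tolerance = tolerance  # the "certain range" for the reward (assumption)
        self.max_steps = max_steps
        self.reset()

    def _state(self):
        error = self.set_point - self.feedback
        # state (5,): set point, feedback, error, I-term, P
        return (self.set_point, self.feedback, error, self.i_term, self.kp)

    def reset(self):
        self.feedback = 0.0
        self.i_term = 0.0
        self.kp = 0.0
        self.prev_error = self.set_point - self.feedback
        self.steps = 0
        return self._state()

    def step(self, action):
        self.kp = float(action)  # action (1,): the P gain
        error = self.set_point - self.feedback
        self.i_term += self.ki * error * self.dt
        d_term = self.kd * (error - self.prev_error) / self.dt
        output = self.kp * error + self.i_term + d_term
        # Assumed plant: a first-order lag that moves toward the controller output.
        self.feedback += (output - self.feedback) * self.dt
        self.prev_error = error
        self.steps += 1
        new_error = self.set_point - self.feedback
        reward = 1.0 if abs(new_error) <= self.tolerance else -1.0
        done = self.steps >= self.max_steps
        return self._state(), reward, done
```

For example, after `env = SimplePIDEnv()` and `env.reset()`, a call such as `env.step(2.0)` applies a P gain of 2.0 for one time step and returns the new 5-element state, the reward, and the done flag.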
For details, see the Experiment Report (in Korean).
- Before training
- After training
- Training plot
Test of PID control with the auto tuner in LunarLander-v2
It does not need any tuning process.
- Render
- Error Plot
The orange line represents the set points, and the blue line represents the feedback. (Left) Angular controller. (Right) Vertical controller.
cd ./A2C/
python a2c_main.py
cd ./envs/
python ./LunarLanderContinuous_keyboard_agent_tuner_applied.py
tensorflow==2.5.0
scikit-learn==0.23.2
matplotlib==3.8.3
gym