To combine the robustness of classical control strategies with the adaptive ability of reinforcement learning, this repository proposes a hierarchical control framework. A neural network sits at the high level (HL): given the observations, it decides which classical control method to adopt. At the strategy level (SL), the strategy selected by the high-level control signal is executed by the corresponding classical controller (a minimal sketch of this wrapper appears after the list below).
Compared with an end-to-end design, this hierarchical design has the following advantages:
- Easy to train. For the high-level network, the action space is reduced to a discrete set, which greatly shrinks the exploration space of the agent.
- Interpretability. The lower layer of the framework is not a neural network but a classical control algorithm, so the agent's behaviour can be analysed.
- Multiple timescales. The lower controller runs at the base frequency (e.g. the simulated world clock), while the upper layer updates the strategy at a slower rate.
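
As a concrete illustration of the hierarchy, the sketch below wraps a low-level environment so that a discrete high-level action selects one of the classical controllers, which then runs for several base-frequency steps per high-level decision. It is only a sketch under assumed interfaces; the names `HierarchicalNavEnv`, `controllers`, and `steps_per_decision` are hypothetical and not taken from this repository.

```python
# Minimal sketch of the hierarchical wrapper idea (names are hypothetical).
import gym
from gym import spaces


class HierarchicalNavEnv(gym.Env):
    """High-level agent picks a classical controller; the chosen controller
    runs at the base simulation frequency for several steps per decision."""

    def __init__(self, base_env, controllers, steps_per_decision=10):
        super().__init__()
        self.base_env = base_env                  # low-level simulated world
        self.controllers = controllers            # e.g. [dwa_step, potential_field_step]
        self.steps_per_decision = steps_per_decision
        # Discrete HL action space: one action per classical controller.
        self.action_space = spaces.Discrete(len(controllers))
        self.observation_space = base_env.observation_space

    def reset(self):
        return self.base_env.reset()

    def step(self, action):
        controller = self.controllers[action]
        total_reward, done, obs, info = 0.0, False, None, {}
        # Lower layer runs at the base frequency; HL updates more slowly.
        for _ in range(self.steps_per_decision):
            low_level_cmd = controller(self.base_env)   # classical control law
            obs, reward, done, info = self.base_env.step(low_level_cmd)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info
```
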
For the specific task considered here, i.e. robot pursuit-evasion, two classical navigation algorithms are adopted: the dynamic window approach (DWA) and the artificial potential field approach.
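
For reference, the snippet below is a generic, textbook-style sketch of the potential field idea (attraction towards the goal plus repulsion from nearby obstacles). It is not the code used in this repository; the function name, gains, and influence radius are assumptions.

```python
# Generic artificial potential field step (hypothetical helper, not the
# repository's implementation).
import numpy as np


def potential_field_step(robot_pos, goal_pos, obstacles,
                         k_att=1.0, k_rep=100.0, influence_radius=2.0):
    """Return a desired velocity vector: attraction to the goal plus
    repulsion from obstacles within the influence radius."""
    robot_pos = np.asarray(robot_pos, dtype=float)
    goal_pos = np.asarray(goal_pos, dtype=float)
    force = k_att * (goal_pos - robot_pos)            # attractive term
    for obstacle in obstacles:
        diff = robot_pos - np.asarray(obstacle, dtype=float)
        dist = np.linalg.norm(diff)
        if 0.0 < dist < influence_radius:
            # Repulsive gradient: grows as the robot approaches the obstacle,
            # vanishes outside the influence radius.
            force += k_rep * (1.0 / dist - 1.0 / influence_radius) * diff / dist**3
    return force
```
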
The demo is based on gym and stable_baselines. Prerequisites:

```
pip install gym==0.14.0
pip install tensorflow==1.14
pip install stable-baselines
```

Then run:

```
python training_robot.py
```
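
For orientation, a training script of this kind typically looks roughly like the sketch below, using the stable-baselines 2.x API (which matches the tensorflow 1.14 dependency). The actual `training_robot.py` may differ; `RobotPursuitEnv` and the module `robot_env` are hypothetical names used only for illustration.

```python
# Rough outline of a training script for the high-level policy
# (RobotPursuitEnv / robot_env are hypothetical, not from this repo).
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv

from robot_env import RobotPursuitEnv   # hypothetical hierarchical env

env = DummyVecEnv([lambda: RobotPursuitEnv()])   # stable-baselines expects a VecEnv
model = PPO2(MlpPolicy, env, verbose=1)          # PPO over the discrete HL action space
model.learn(total_timesteps=100000)              # train the high-level policy
model.save("hl_policy")
```
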