This repository contains PyTorch code that applies Attainable Utility Preservation (AUP), as developed in the papers "Avoiding Side Effects in Complex Environments" and "Conservative Agency via Attainable Utility Preservation" by Turner et al., to the OpenAI Procgen Benchmark.
To obtain a learned auxiliary Q-function, run:
python pretrain.py --model ppo --env_name coinrun --q_aux_path q_aux_dir/coinrun/0.pt
To train a PPO agent, run:
python main.py --model ppo --env_name coinrun --test True
To train a PPO agent with AUP, point --q_aux_dir at the directory that holds the learned auxiliary Q-function and run:
python main.py --model ppo_aup --env_name coinrun --q_aux_dir q_aux_dir/coinrun/ --test True
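For intuition, here is a minimal sketch of the kind of reward shaping AUP performs. It assumes the single-auxiliary-function form, in which the task reward is traded off against the absolute change in auxiliary Q-value relative to a no-op action. The function name, its arguments, and the default `lam` are illustrative assumptions, not this repository's API, and the actual implementation may scale the penalty differently.

```python
def aup_reward(reward, q_aux_action, q_aux_noop, lam=0.01):
    """Hypothetical helper: AUP-shaped reward for a single transition.

    reward        -- task reward R(s, a)
    q_aux_action  -- auxiliary Q-value of the chosen action, Q_aux(s, a)
    q_aux_noop    -- auxiliary Q-value of the no-op action, Q_aux(s, noop)
    lam           -- penalty weight in [0, 1] (assumed default, not from the repo)
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Penalize actions that change attainable auxiliary value relative to doing nothing.
    penalty = abs(q_aux_action - q_aux_noop)
    return (1.0 - lam) * reward - lam * penalty
```

With `lam=0` the shaped reward equals the task reward, and larger values of `lam` make the agent more conservative.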
Requirements:
- Python 3
- PyTorch
- OpenAI baselines
To install the requirements, run:
# Baselines
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
# Other requirements
pip install -r requirements.txt
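After installation, a quick sanity check along these lines can confirm that the main dependencies are visible to the interpreter. This is only a sketch: the helper name is hypothetical, and it checks just the `torch` and `baselines` import names, not everything in requirements.txt.

```python
import importlib.util


def missing_modules(names=("torch", "baselines")):
    """Return the module names from `names` that cannot be located."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print(f"Missing modules: {missing}")
    else:
        print("All checked modules are importable.")
```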