Reinforcement learning plays Metroid II. 'Nuff said.
PyBoy provides "game wrappers" for various games to make AI work easier. I am working on implementing one for Metroid II and will eventually get my code pulled into the project.
Currently, a pixel-based observation approach is going to be used. A tile-based approach would be much more efficient and train much faster, but it is not currently implemented.
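For reference, here is a minimal sketch of how both kinds of observations could be pulled out of the emulator. This assumes the PyBoy 2.x API (`pyboy.screen.ndarray` for raw pixels, `pyboy.game_area()` for the simplified tile view); the ROM filename is a placeholder.

```python
# Sketch only: assumes the PyBoy 2.x API; the ROM filename is a placeholder.
from pyboy import PyBoy

pyboy = PyBoy("metroid2.gb", window="null")  # headless emulator instance

for _ in range(60):          # let the game run for roughly one second of frames
    pyboy.tick()

# Pixel-based observation: the raw 144x160 RGBA framebuffer.
pixels = pyboy.screen.ndarray

# Tile-based observation: a much smaller grid of tile identifiers.
# A proper Metroid II game wrapper would define a sensible "game area" here.
tiles = pyboy.game_area()

print(pixels.shape, tiles.shape)
pyboy.stop()
```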
Some incredibly simple training has been done, but unsurprisingly the results are terrible. This was mostly done to prove that the code environment works and that some learning could happen.
- `main.py` is going to mostly be used for testing purposes.
- `train.py` is the script used to start a training run.
- `view.py` is the script used to view a training result.
Through testing and some math, roughly 216,000 iterations correspond to an hour of "real life" game time. This was calculated by timing 1,000 iterations; the script's output was as follows:
```
Human Time: 16.650566339492798
Machine time: 0.2655789852142334
Machine is 62.69534589139785 times faster
time per step HUMAN: 0.016650566339492797
time per step FAST: 0.0002655789852142334
One hour of human gameplay = 216208.8620049702
```
The math for this was simply `time / (measured_time / iterations)`, i.e. `3600 / (16.65 / 1000) ≈ 216,000`.
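A rough sketch of how that measurement could be reproduced is below. It assumes each iteration advances the emulator by one frame and that the Game Boy runs at about 60 FPS (which is where the "human time" figure comes from); the ROM filename is again a placeholder.

```python
# Sketch only: reproduces the back-of-the-envelope math above.
import time
from pyboy import PyBoy

ITERATIONS = 1000
FPS = 60.0  # assumption: one iteration = one frame at ~60 FPS of real play

pyboy = PyBoy("metroid2.gb", window="null")  # headless
pyboy.set_emulation_speed(0)                 # 0 = no speed limit

start = time.time()
for _ in range(ITERATIONS):
    pyboy.tick()
machine_time = time.time() - start
pyboy.stop()

human_time = ITERATIONS / FPS                      # same frames at real-time speed
steps_per_hour = 3600 / (human_time / ITERATIONS)  # -> roughly 216,000 iterations

print(f"Machine time: {machine_time}")
print(f"Machine is {human_time / machine_time} times faster")
print(f"One hour of human gameplay = {steps_per_hour}")
```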
At this point, a pull request has been made for PyBoy, and many changes still need to be made to that code, but before doing that I want to focus more on the training and AI portion of things. I'll be modifying this code to use Pufferlib and CleanRL soon. I'll also be heavily modifying the environment to give better observations like missiles, health, etc.
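A rough sketch of what those richer observations could look like as a Gymnasium `Dict` space is below. The RAM addresses are placeholders, not the real Metroid II memory map, and the `pyboy.memory` indexing is the PyBoy 2.x API.

```python
# Sketch only: a possible richer observation space. The RAM addresses below are
# placeholders, not the actual Metroid II memory mappings.
import numpy as np
from gymnasium import spaces

HEALTH_ADDR = 0xD000    # placeholder address
MISSILES_ADDR = 0xD001  # placeholder address

observation_space = spaces.Dict({
    # Screen pixels (or a tile grid once that approach is implemented)
    "screen": spaces.Box(low=0, high=255, shape=(144, 160), dtype=np.uint8),
    "health": spaces.Box(low=0, high=255, shape=(1,), dtype=np.uint8),
    "missiles": spaces.Box(low=0, high=255, shape=(1,), dtype=np.uint8),
})

def build_observation(pyboy):
    """Assemble one observation from a running PyBoy instance (PyBoy 2.x API)."""
    return {
        "screen": np.asarray(pyboy.screen.ndarray)[:, :, 0],  # single channel as a cheap grayscale stand-in
        "health": np.array([pyboy.memory[HEALTH_ADDR]], dtype=np.uint8),
        "missiles": np.array([pyboy.memory[MISSILES_ADDR]], dtype=np.uint8),
    }
```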
- Explore starting area
- Stop shooting randomly and "spazzing out"
- Get out of starting area relatively quickly
- Avoid/kill enemies in starting area
- Drop down through first major shaft (requires downward jump shooting)
- Find first Metroid
- Kill first Metroid
- Add frame stacking to observation space (Added LSTM instead)
- Improve observations of the environment to be tiles, rather than pixels
- Change action space so the agent can hold buttons (the SML code has something similar)
- Change to Pufferlib native environment, rather than gym wrapper (?)
- Define a baseline test reward function
- Make the environment's observations the raw pixels of the screen
- Write a simple exploration function using game coordinate hashing (see the sketch after this list)
- Do some kind of extremely bare-bones training to just explore
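A minimal sketch of that coordinate-hashing exploration bonus is below. The position RAM addresses are placeholders (not the real mappings), and the `pyboy.memory` indexing assumes the PyBoy 2.x API.

```python
# Sketch only: a bare-bones exploration bonus using hashed game coordinates.
# The RAM addresses for Samus's position are placeholders, not the real mapping.
X_ADDR, Y_ADDR, AREA_ADDR = 0xD002, 0xD003, 0xD004  # placeholder addresses

class ExplorationReward:
    """Pays a small bonus the first time each (area, x, y) cell is visited."""

    def __init__(self, bonus: float = 1.0):
        self.visited: set[int] = set()
        self.bonus = bonus

    def __call__(self, pyboy) -> float:
        # Hash the coarse position into a single key (PyBoy 2.x memory API).
        key = hash((pyboy.memory[AREA_ADDR],
                    pyboy.memory[X_ADDR] // 16,   # bucket into 16-pixel cells
                    pyboy.memory[Y_ADDR] // 16))
        if key in self.visited:
            return 0.0
        self.visited.add(key)
        return self.bonus
```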
The environment is pretty much set. It's plenty good enough to start training.
- Verify ROM integrity (the ROM works)
- Take random actions in the environment (verify the PyBoy interface is working)
- Implement a `game_wrapper` for Metroid II (makes AI stuff easier)
- Potentially use RAM mappings to calculate more useful info (particularly health)
- Improve `game_over` to check health; may be slightly faster than using the "GAME OVER" screen
- Make a pull request for PyBoy to merge my code in
- Fix code to comply with PyBoy coding standards
- Implement the `start_game` function to skip through the menu
- Implement the `game_over` function to check if the agent is dead
- Integrate and verify the RAM mappings
- Determine and implement all possible button combos for "actions" (may need minor improvements; see the sketch below)
- Containerize the program to make running on other machines easy (either with conda or Docker)
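As a rough illustration of the "button combos" action space, here is one possible discrete action set and a helper that holds a combo for a few frames. It uses the PyBoy 2.x string-based `button_press`/`button_release` API; the specific combos and hold duration are illustrative, not the project's final design.

```python
# Sketch only: one possible discrete action set built from button combinations.
ACTIONS = [
    [],                                        # no-op
    ["left"], ["right"],                       # walk
    ["a"], ["right", "a"], ["left", "a"],      # jump / jump while moving
    ["b"], ["right", "b"], ["left", "b"],      # shoot / run-and-gun
    ["up"], ["down"],                          # aim up / morph ball & aim down
    ["down", "b"],                             # downward shots while falling (first shaft)
]

def apply_action(pyboy, action_id: int, hold_frames: int = 8):
    """Hold the chosen button combo for a few frames, then release it (PyBoy 2.x API)."""
    buttons = ACTIONS[action_id]
    for b in buttons:
        pyboy.button_press(b)
    for _ in range(hold_frames):
        pyboy.tick()
    for b in buttons:
        pyboy.button_release(b)
```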