MIKASA-Base is a unified benchmark for memory-intensive tasks in reinforcement learning. It standardizes various memory-demanding environments into a single platform to systematically evaluate agent memory.
Diverse Memory Testing: Covers four fundamental memory types:
- Object Memory
- Spatial Memory
- Sequential Memory
- Memory Capacity
Built on the Gymnasium API, providing:
- Consistent and standardized environment interfaces
- Ease of integration with a variety of RL algorithms
- Flexibility for future extensions and customizations
For a detailed description of each task, see the task description table below.
git clone git@github.com:CognitiveAISystems/MIKASA-Base.git
cd MIKASA-Base
pip install .
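# Quick start: run a single environment with random actions.
import mikasa_base  # imported so the MIKASA-Base environment IDs are available to gym.make
import gymnasium as gym

# Option 1: custom task configuration
# env_id = 'MemoryLength-v0'
# env_kwargs = {'memory_length': 10, 'num_bits': 1}
# env = gym.make(env_id, **env_kwargs)

# Option 2: predefined task
env_id = 'MemoryLengthHard-v0'
seed = 123

env = gym.make(env_id)
obs, _ = env.reset(seed=seed)  # seed is keyword-only in the Gymnasium API

for i in range(101):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the current one ends.
    if terminated or truncated:
        obs, _ = env.reset()

env.close()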
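# Vectorized environments: run several copies in parallel.
# On platforms that spawn worker processes (Windows, macOS), run this
# under an `if __name__ == "__main__":` guard.
import mikasa_base  # imported so the MIKASA-Base environment IDs are available to gym.make
import gymnasium as gym


def make_env(env_id, idx, capture_video, run_name, env_kwargs):
    """Return a zero-argument factory (thunk) that builds one environment."""
    def thunk():
        if capture_video and idx == 0:
            # Record video only for the first environment.
            env = gym.make(env_id, render_mode="rgb_array", **env_kwargs)
            env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
        else:
            env = gym.make(env_id, **env_kwargs)
        env = gym.wrappers.RecordEpisodeStatistics(env)
        return env
    return thunk


num_envs = 8
env_id = 'MemoryLengthHard-v0'
seed = 123
env_kwargs = {}

envs = gym.vector.AsyncVectorEnv(
    [make_env(env_id, i, False, 'test', env_kwargs) for i in range(num_envs)],
)
obs, _ = envs.reset(seed=seed)  # seed is keyword-only in the Gymnasium API

for i in range(101):
    # Finished sub-environments are reset automatically by the vector environment.
    actions = envs.action_space.sample()
    obs, reward, terminated, truncated, info = envs.step(actions)

envs.close()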
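The `make_env` function above returns a zero-argument factory rather than an environment, because the vector environment calls each factory itself to construct its copies. The sketch below uses a hypothetical `FakeEnv` stand-in instead of a real Gymnasium environment to illustrate why wrapping construction in a closure gives each copy its own `idx`, whereas a bare `lambda` created in a loop would see only the last index.

```python
class FakeEnv:
    """Hypothetical stand-in for an environment; it only remembers its index."""

    def __init__(self, idx):
        self.idx = idx


def make_env(idx):
    # idx is bound when make_env is called, so each thunk keeps its own value.
    def thunk():
        return FakeEnv(idx)
    return thunk


num_envs = 4

# Factory pattern: one thunk per index; nothing is constructed until it is called.
env_fns = [make_env(i) for i in range(num_envs)]
factory_indices = [fn().idx for fn in env_fns]

# Late-binding pitfall: every lambda looks up `i` when it is called,
# which happens after the loop has finished.
late_fns = []
for i in range(num_envs):
    late_fns.append(lambda: FakeEnv(i))
late_indices = [fn().idx for fn in late_fns]

print(factory_indices)  # [0, 1, 2, 3]
print(late_indices)     # [3, 3, 3, 3]
```

The same closure structure is what lets `capture_video and idx == 0` apply to exactly one of the environments.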
The PPO training code is adapted from CleanRL.
python3 baselines/ppo/ppo_mlp.py \
--env_id='MemoryLength-v0' \
--env_kwargs memory_length 20 num_bits 1 \
--num_envs 128 --total_timesteps 10_000_000 \
--num_steps 21 \
--num_eval_steps 21
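The `--env_kwargs` flag in the command above is passed as a flat list of alternating keys and values. How the baseline script parses it is not shown here; the sketch below is one plausible way to turn such a list into the keyword dictionary used with `gym.make(env_id, **env_kwargs)`. The helper name `parse_env_kwargs` and the use of `ast.literal_eval` for type conversion are assumptions made for illustration.

```python
import ast


def parse_env_kwargs(tokens):
    """Turn ['key1', 'value1', 'key2', 'value2', ...] into a dict.

    Hypothetical helper: the actual baseline scripts may parse differently.
    """
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating keys and values")
    kwargs = {}
    for key, raw in zip(tokens[0::2], tokens[1::2]):
        try:
            # Interpret numbers, booleans, and similar values as Python literals.
            kwargs[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Fall back to the raw string for values that are not literals.
            kwargs[key] = raw
    return kwargs


env_kwargs = parse_env_kwargs(["memory_length", "20", "num_bits", "1"])
print(env_kwargs)  # {'memory_length': 20, 'num_bits': 1}
```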
python3 baselines/ppo/ppo_lstm.py \
--env_id='MemoryLength-v0' \
--env_kwargs memory_length 20 num_bits 1 \
--num_envs 128 --total_timesteps 10_000_000 \
--num_steps 21 \
--num_eval_steps 21
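When sizing these runs, it helps to see how the flags interact. The sketch below assumes the scripts follow the usual CleanRL convention, in which each update collects `num_envs * num_steps` transitions and the number of updates is `total_timesteps // batch_size`. That convention is an assumption here; check the scripts for the exact bookkeeping.

```python
# Values taken from the commands above.
num_envs = 128
num_steps = 21
total_timesteps = 10_000_000

# Transitions collected per policy update (assumed CleanRL-style rollout).
batch_size = num_envs * num_steps

# Number of full updates that fit into the step budget.
num_updates = total_timesteps // batch_size

# Steps left over that do not fill a complete batch.
unused_steps = total_timesteps - num_updates * batch_size

print(batch_size)    # 2688
print(num_updates)   # 3720
print(unused_steps)  # 640
```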
If you find our work useful, please cite our paper:
@misc{cherepanov2025mikasa,
title={Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning},
author={Egor Cherepanov and Nikita Kachaev and Alexey K. Kovalev and Aleksandr I. Panov},
year={2025},
eprint={2502.10550},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2502.10550},
}
The code in this repository builds on and is inspired by the following projects:
- DeepMind Research
- bsuite
- DTQN
- Endless Memory Gym
- Memory Maze
- MiniGrid
- Numpad Gym
- Memory-RL
- PopGym
- Memup
We would like to express our gratitude to the developers of these projects for providing valuable resources and inspiration.
Environment | Brief description | Memory Task | Observation Space | Action Space |
---|---|---|---|---|
MemoryCards-v0 | Memorize the positions of revealed cards and correctly match pairs while minimizing incorrect guesses. | Capacity | vector | discrete |
Numpad-v0 | Memorize the sequence of movements and navigate the rolling ball on a 3×3 grid by following the correct order while avoiding mistakes. | Sequential | image, vector | discrete, continuous |
MemoryLength-v0 | Memorize the initial context signal and recall it after a given number of steps to take the correct action. | Object | vector | discrete |
Minigrid-Memory-v0 | Memorize the object in the starting room and use this information to select the correct path at the junction. | Object | image | discrete |
Ballet-v0 | Memorize the sequence of movements performed by each uniquely colored and shaped dancer, then identify and approach the dancer who executed the given pattern. | Sequential, Object | image | discrete |
Passive-VisualMatch-v0 | Memorize the target color displayed on the wall during the initial phase. After a brief distractor phase, identify and select the target color among the distractors by stepping on the corresponding ground pad. | Object | image | discrete |
Passive-T-Maze-v0 | Memorize the goal’s location upon initial observation, navigate through the maze with limited sensory input, and select the correct path at the junction. | Object | vector | discrete |
ViZDoom-two-colors-v0 | Memorize the color of the briefly appearing pillar (green or red) and collect items of the same color to survive in the acid-filled room. | Object | image | discrete |
MemoryMaze-v0 | Memorize the locations of objects and the maze structure using visual clues, then navigate efficiently to find objects of a specific color and score points. | Spatial | image | discrete |
MortarMayhem-v0 | Memorize a sequence of movement commands and execute them in the correct order. | Capacity, Sequential | image | discrete |
MysteryPath-v0 | Memorize the invisible path and navigate it without stepping off. | Capacity, Spatial | image | discrete |
RepeatFirst-v0 | Memorize the initial value presented at the first step and recall it correctly after receiving a sequence of random values. | Object | vector | discrete |
RepeatPrevious-v0 | Memorize the value observed at each step and recall the value from k steps earlier when required. | Sequential, Object | vector | discrete |
Autoencode-v0 | Memorize the sequence of cards presented at the beginning and reproduce them in the same order when required. | Sequential | vector | discrete |
CountRecall-v0 | Memorize unique values encountered and count how many times a specific value has appeared. | Object, Capacity | vector | discrete |
VelocityOnlyCartPole-v0 | Memorize velocity data over time and integrate it to infer the position of the pole for balance control. | Sequential | vector | continuous |
MultiarmedBandit-v0 | Memorize the reward probabilities of different slot machines by exploring them and identify the one with the highest expected reward. | Object, Capacity | vector | discrete |
Concentration-v0 | Memorize the positions of revealed cards and match them with previously seen cards to find all matching pairs. | Capacity | vector | discrete |
Battleship-v0 | Memorize the coordinates of previous shots and their HIT or MISS feedback to build an internal representation of the board, avoid repeat shots, and strategically target ships for maximum rewards. | Spatial | vector | discrete |
MineSweeper-v0 | Memorize revealed grid information and use numerical clues to infer safe tiles while avoiding mines. | Spatial | vector | discrete |
LabyrinthExplore-v0 | Memorize previously visited cells and navigate the maze efficiently to discover new, unexplored areas and maximize rewards. | Spatial | vector | discrete |
LabyrinthEscape-v0 | Memorize the maze layout while exploring and navigate efficiently to find the exit and receive a reward. | Spatial | vector | discrete |
HigherLower-v0 | Memorize previously revealed card ranks and predict whether the next card will be higher or lower, updating the reference card after each prediction to maximize rewards. | Object, Sequential | vector | discrete |
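
To make the `MemoryLength-v0` entry concrete, here is a toy, dependency-free sketch of that kind of task: a cue is visible only at the first step, and the reward depends on recalling it at the last step. The class `ToyMemoryLength`, its observation encoding, and its reward rule are illustrative assumptions and do not reproduce the MIKASA-Base implementation.

```python
import random


class ToyMemoryLength:
    """Toy cue-recall task (hypothetical; not the MIKASA-Base environment)."""

    def __init__(self, memory_length=10, seed=None):
        self.memory_length = memory_length
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.cue = self.rng.randint(0, 1)
        # Observation encoding: 1 or 2 carries the cue, 0 means "blank".
        return self.cue + 1

    def step(self, action):
        self.t += 1
        done = self.t >= self.memory_length
        # Only the final action is scored.
        reward = 1.0 if done and action == self.cue else 0.0
        return 0, reward, done  # the cue is never shown again


def run_episode(env, policy):
    obs = env.reset()
    memory, done, total = None, False, 0.0
    while not done:
        action, memory = policy(obs, memory)
        obs, reward, done = env.step(action)
        total += reward
    return total


def recall_policy(obs, memory):
    # Store the cue while it is visible, then keep answering with it.
    if obs != 0:
        memory = obs - 1
    return memory, memory


def memoryless_policy(obs, memory):
    # Acts on the current observation only; blank observations carry no information.
    return (obs - 1 if obs != 0 else 0), None


env = ToyMemoryLength(memory_length=10, seed=0)
recall_returns = [run_episode(env, recall_policy) for _ in range(100)]
memoryless_returns = [run_episode(env, memoryless_policy) for _ in range(100)]

print(sum(recall_returns) / 100)      # 1.0
print(sum(memoryless_returns) / 100)  # around 0.5; the exact value depends on the seed
```

In this toy setting the policy that carries the cue forward always earns the reward, while the memoryless policy can only guess at the final step, which is the gap such tasks are meant to expose.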