MIKASA-Base

Unified Benchmark for Memory-Intensive Tasks

Overview

MIKASA-Base is a unified benchmark for memory-intensive tasks in reinforcement learning. It standardizes various memory-demanding environments into a single platform to systematically evaluate agent memory.

Key Features

Diverse Memory Testing: Covers four fundamental memory types:
- Object Memory
- Spatial Memory
- Sequential Memory
- Memory Capacity
Built on the Gymnasium API, providing:
- Consistent and standardized environment interfaces
- Ease of integration with a variety of RL algorithms
- Flexibility for future extensions and customizations

List of Tasks

For a detailed description of each task, see the "MIKASA-Base Tasks description" section.

Quick Start

Installation

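# SSH clone (requires an SSH key registered with GitHub)
git clone git@github.com:CognitiveAISystems/MIKASA-Base.git
# or, over HTTPS:
# git clone https://github.com/CognitiveAISystems/MIKASA-Base.git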
cd MIKASA-Base
pip install .

Basic Usage

import mikasa_base
import gymnasium as gym

# Option 1: custom task configuration (pass the kwargs to gym.make)
# env_id = 'MemoryLength-v0'
# env_kwargs = {'memory_length': 10, 'num_bits': 1}
# env = gym.make(env_id, **env_kwargs)

# Option 2: use a predefined task
env_id = 'MemoryLengthHard-v0'
seed = 123

env = gym.make(env_id)

obs, _ = env.reset(seed=seed)  # Gymnasium accepts the seed by keyword only

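for i in range(101):
    action = env.action_space.sample()  # random action
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        # the episode ended; start a new one before stepping again
        obs, _ = env.reset()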

env.close()

Vectorized environments

import mikasa_base
import gymnasium as gym

def make_env(env_id, idx, capture_video, run_name, env_kwargs):
    def thunk():
        if capture_video and idx == 0:
            env = gym.make(env_id, render_mode="rgb_array", **env_kwargs)
            env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
        else:
            env = gym.make(env_id, **env_kwargs)
        env = gym.wrappers.RecordEpisodeStatistics(env)
        return env
    return thunk

num_envs = 8
env_id = 'MemoryLengthHard-v0'
seed = 123
env_kwargs = {}

envs = gym.vector.AsyncVectorEnv(
    [make_env(env_id, i, False, 'test', env_kwargs) for i in range(num_envs)],
)

obs, _ = envs.reset(seed=seed)  # Gymnasium accepts the seed by keyword only

for i in range(101):
    actions = envs.action_space.sample()
    obs, reward, terminated, truncated, info = envs.step(actions)

envs.close()
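
When stepping several environments at once, rewards and termination flags arrive as one value per sub-environment, so episode returns have to be tracked per index. The sketch below shows that bookkeeping in plain Python. `update_episode_returns` is a hypothetical helper, not part of MIKASA-Base or Gymnasium, and it assumes that finished sub-environments are reset automatically by the vector environment.

```python
def update_episode_returns(running, rewards, terminated, truncated):
    """Add this step's rewards to the per-environment running totals.

    Returns the returns of episodes that ended on this step and zeroes the
    matching entries of `running` in place. Hypothetical helper.
    """
    finished = []
    for i, (r, term, trunc) in enumerate(zip(rewards, terminated, truncated)):
        running[i] += float(r)
        if term or trunc:
            finished.append(running[i])
            running[i] = 0.0  # the next step belongs to a new episode
    return finished


# Two hand-written steps standing in for the outputs of envs.step(actions).
running = [0.0, 0.0, 0.0]
print(update_episode_returns(running, [1.0, 0.0, 0.5],
                             [False, False, False], [False, False, False]))
# []
print(update_episode_returns(running, [1.0, 1.0, 0.5],
                             [True, False, False], [False, False, True]))
# [2.0, 1.0]
```

The `RecordEpisodeStatistics` wrapper used in `make_env` already records episode statistics; the helper only makes the per-environment accounting explicit.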

Example of Training

The PPO training code is adapted from CleanRL.

PPO with MLP

python3 baselines/ppo/ppo_mlp.py \
    --env_id='MemoryLength-v0' \
    --env_kwargs memory_length 20 num_bits 1 \
    --num_envs 128 --total_timesteps 10_000_000 \
    --num_steps 21 \
    --num_eval_steps 21

PPO with LSTM

python3 baselines/ppo/ppo_lstm.py \
    --env_id='MemoryLength-v0' \
    --env_kwargs memory_length 20 num_bits 1 \
    --num_envs 128 --total_timesteps 10_000_000 \
    --num_steps 21 \
    --num_eval_steps 21
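
Both commands pass the task configuration as alternating key/value tokens after `--env_kwargs`. How the baseline scripts parse these tokens is not shown here. The sketch below is one plausible approach, given purely as an assumption, for turning such tokens into the kind of `env_kwargs` dictionary used in Basic Usage. `parse_env_kwargs` is a hypothetical name.

```python
import ast


def parse_env_kwargs(tokens):
    """Turn ['memory_length', '20', 'num_bits', '1'] into a keyword dict.

    Hypothetical helper; the actual baseline scripts may parse differently.
    """
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating key/value tokens")
    kwargs = {}
    for key, raw in zip(tokens[0::2], tokens[1::2]):
        try:
            # Interpret numbers, booleans, and other Python literals.
            kwargs[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Keep anything that is not a literal as a plain string.
            kwargs[key] = raw
    return kwargs


print(parse_env_kwargs(["memory_length", "20", "num_bits", "1"]))
# {'memory_length': 20, 'num_bits': 1}
```

The resulting dictionary could then be forwarded as `gym.make(env_id, **env_kwargs)`, matching the pattern in the vectorized example.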

Citation

If you find our work useful, please cite our paper:

@misc{cherepanov2025mikasa,
      title={Memory, Benchmark \& Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning},
      author={Egor Cherepanov and Nikita Kachaev and Alexey K. Kovalev and Aleksandr I. Panov},
      year={2025},
      eprint={2502.10550},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.10550}, 
}

References

This repository's code is based on and inspired by the work available in the following projects:

We would like to express our gratitude to the developers of these projects for providing valuable resources and inspiration.

MIKASA-Base Tasks description

| Environment | Brief description | Memory Task | Observation Space | Action Space |
| --- | --- | --- | --- | --- |
| `MemoryCards-v0` | Memorize the positions of revealed cards and correctly match pairs while minimizing incorrect guesses. | Capacity | vector | discrete |
| `Numpad-v0` | Memorize the sequence of movements and navigate the rolling ball on a 3×3 grid by following the correct order while avoiding mistakes. | Sequential | image, vector | discrete, continuous |
| `MemoryLength-v0` | Memorize the initial context signal and recall it after a given number of steps to take the correct action. | Object | vector | discrete |
| `Minigrid-Memory-v0` | Memorize the object in the starting room and use this information to select the correct path at the junction. | Object | image | discrete |
| `Ballet-v0` | Memorize the sequence of movements performed by each uniquely colored and shaped dancer, then identify and approach the dancer who executed the given pattern. | Sequential, Object | image | discrete |
| `Passive-VisualMatch-v0` | Memorize the target color displayed on the wall during the initial phase. After a brief distractor phase, identify and select the target color among the distractors by stepping on the corresponding ground pad. | Object | image | discrete |
| `Passive-T-Maze-v0` | Memorize the goal’s location upon initial observation, navigate through the maze with limited sensory input, and select the correct path at the junction. | Object | vector | discrete |
| `ViZDoom-two-colors-v0` | Memorize the color of the briefly appearing pillar (green or red) and collect items of the same color to survive in the acid-filled room. | Object | image | discrete |
| `MemoryMaze-v0` | Memorize the locations of objects and the maze structure using visual clues, then navigate efficiently to find objects of a specific color and score points. | Spatial | image | discrete |
| `MortarMayhem-v0` | Memorize a sequence of movement commands and execute them in the correct order. | Capacity, Sequential | image | discrete |
| `MysteryPath-v0` | Memorize the invisible path and navigate it without stepping off. | Capacity, Spatial | image | discrete |
| `RepeatFirst-v0` | Memorize the initial value presented at the first step and recall it correctly after receiving a sequence of random values. | Object | vector | discrete |
| `RepeatPrevious-v0` | Memorize the value observed at each step and recall the value from k steps earlier when required. | Sequential, Object | vector | discrete |
| `Autoencode-v0` | Memorize the sequence of cards presented at the beginning and reproduce them in the same order when required. | Sequential | vector | discrete |
| `CountRecall-v0` | Memorize unique values encountered and count how many times a specific value has appeared. | Object, Capacity | vector | discrete |
| `VelocityOnlyCartPole-v0` | Memorize velocity data over time and integrate it to infer the position of the pole for balance control. | Sequential | vector | continuous |
| `MultiarmedBandit-v0` | Memorize the reward probabilities of different slot machines by exploring them and identify the one with the highest expected reward. | Object, Capacity | vector | discrete |
| `Concentration-v0` | Memorize the positions of revealed cards and match them with previously seen cards to find all matching pairs. | Capacity | vector | discrete |
| `Battleship-v0` | Memorize the coordinates of previous shots and their HIT or MISS feedback to build an internal representation of the board, avoid repeat shots, and strategically target ships for maximum rewards. | Spatial | vector | discrete |
| `MineSweeper-v0` | Memorize revealed grid information and use numerical clues to infer safe tiles while avoiding mines. | Spatial | vector | discrete |
| `LabyrinthExplore-v0` | Memorize previously visited cells and navigate the maze efficiently to discover new, unexplored areas and maximize rewards. | Spatial | vector | discrete |
| `LabyrinthEscape-v0` | Memorize the maze layout while exploring and navigate efficiently to find the exit and receive a reward. | Spatial | vector | discrete |
| `HigherLower-v0` | Memorize previously revealed card ranks and predict whether the next card will be higher or lower, updating the reference card after each prediction to maximize rewards. | Object, Sequential | vector | discrete |
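
To make the "memorize, then recall" pattern in these descriptions concrete, the sketch below builds a toy task in the spirit of the `MemoryLength-v0` row: a cue is shown once and must be reproduced on the final step. `ToyMemoryLength`, `run_episode`, and both policies are hypothetical illustrations written with the standard library only. They are not the MIKASA-Base implementation and only mimic the Gymnasium `reset`/`step` call shapes.

```python
import random


class ToyMemoryLength:
    """Hypothetical stand-in for a MemoryLength-style task.

    The cue (0 or 1) appears only in the first observation. Later
    observations are a blank marker (-1). The action taken on the last
    step earns reward 1.0 if it equals the cue.
    """

    def __init__(self, memory_length=10):
        self.memory_length = memory_length
        self._rng = random.Random()

    def reset(self, *, seed=None):
        if seed is not None:
            self._rng.seed(seed)
        self._cue = self._rng.randint(0, 1)
        self._t = 0
        return self._cue, {}

    def step(self, action):
        self._t += 1
        terminated = self._t >= self.memory_length
        reward = float(terminated and action == self._cue)
        return -1, reward, terminated, False, {}  # the cue stays hidden


def run_episode(env, policy, seed):
    """Roll out one episode; the policy threads its own memory through."""
    obs, _ = env.reset(seed=seed)
    memory, total, done = None, 0.0, False
    while not done:
        action, memory = policy(obs, memory)
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        done = terminated or truncated
    return total


def remembering_policy(obs, memory):
    # Store the cue the first time it is seen, then keep repeating it.
    if memory is None:
        memory = obs
    return memory, memory


def memoryless_policy(obs, memory):
    # Reacts to the current observation only; the blank marker maps to 0.
    return (obs if obs in (0, 1) else 0), None


env = ToyMemoryLength(memory_length=10)
for name, policy in [("remembering", remembering_policy),
                     ("memoryless", memoryless_policy)]:
    mean = sum(run_episode(env, policy, s) for s in range(100)) / 100
    print(f"{name}: mean return {mean:.2f}")
```

The remembering policy earns 1.0 in every episode, while the memoryless policy succeeds only when the hidden cue happens to be 0. That gap is the kind of difference a memory benchmark is meant to expose.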
