MIKASA-Base

Unified Benchmark for Memory-Intensive Tasks


Overview

MIKASA-Base is a unified benchmark for memory-intensive tasks in reinforcement learning. It standardizes various memory-demanding environments into a single platform to systematically evaluate agent memory.

Key Features

  • Diverse Memory Testing: Covers four fundamental memory types:

    • Object Memory
    • Spatial Memory
    • Sequential Memory
    • Memory Capacity
  • Built on the Gymnasium API, providing:

    • Consistent and standardized environment interfaces
    • Ease of integration with a variety of RL algorithms (see the wrapper sketch below)
    • Flexibility for future extensions and customizations
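
Because every task exposes the standard Gymnasium interface, off-the-shelf Gymnasium wrappers compose directly with MIKASA-Base environments. A minimal sketch (the wrapper choices here are illustrative, not part of MIKASA-Base itself):

import mikasa_base
import gymnasium as gym

# Standard Gymnasium wrappers stack on any MIKASA-Base task:
# TimeLimit caps episode length, RecordEpisodeStatistics tracks returns
env = gym.make('MemoryLengthHard-v0')
env = gym.wrappers.TimeLimit(env, max_episode_steps=100)
env = gym.wrappers.RecordEpisodeStatistics(env)

obs, _ = env.reset(seed=0)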

List of Tasks

For a detailed description of each task, see the Tasks Description section below.

Quick Start

Installation

git clone git@github.com:CognitiveAISystems/MIKASA-Base.git
cd MIKASA-Base
pip install .
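
If you plan to modify the benchmark, an editable install keeps the installed package in sync with your checkout (standard pip behavior, not specific to MIKASA-Base):

pip install -e .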

Basic Usage

import mikasa_base
import gymnasium as gym

# custom task configuration:
# env_id = 'MemoryLength-v0'
# env_kwargs = {'memory_length': 10, 'num_bits': 1}
# env = gym.make(env_id, **env_kwargs)

# use a predefined task
env_id = 'MemoryLengthHard-v0'
seed = 123

env = gym.make(env_id)

# Gymnasium's reset takes seed as a keyword-only argument
obs, _ = env.reset(seed=seed)

for i in range(101):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    # start a new episode once the current one ends
    if terminated or truncated:
        obs, _ = env.reset()

env.close()
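
Note that seeding the environment does not seed the action sampler, so the random rollout above is not fully reproducible on its own. Gymnasium spaces can be seeded separately:

env.action_space.seed(seed)  # makes env.action_space.sample() deterministic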

Vectorized Environments

import mikasa_base
import gymnasium as gym

def make_env(env_id, idx, capture_video, run_name, env_kwargs):
    def thunk():
        if capture_video and idx == 0:
            env = gym.make(env_id, render_mode="rgb_array", **env_kwargs)
            env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
        else:
            env = gym.make(env_id, **env_kwargs)
        env = gym.wrappers.RecordEpisodeStatistics(env)
        return env
    return thunk

num_envs = 8
env_id = 'MemoryLengthHard-v0'
seed = 123
env_kwargs = {}

envs = gym.vector.AsyncVectorEnv(
    [make_env(env_id, i, False, 'test', env_kwargs) for i in range(num_envs)],
)

obs, _ = envs.reset(seed=seed)

for i in range(101):
    actions = envs.action_space.sample()
    obs, reward, terminated, truncated, info = envs.step(actions)

envs.close()
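
Gymnasium vector environments reset finished sub-environments automatically, and the RecordEpisodeStatistics wrapper reports completed-episode returns through info. Exactly where they appear depends on your Gymnasium version; a sketch for the pre-1.0 "final_info" convention:

for i in range(101):
    actions = envs.action_space.sample()
    obs, reward, terminated, truncated, info = envs.step(actions)
    # sub-environments that just finished expose their stats under "final_info"
    for item in info.get("final_info", []):
        if item is not None and "episode" in item:
            print(f"episodic return: {item['episode']['r']}")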

Training Examples

The PPO training code is adapted from CleanRL.

PPO with MLP

python3 baselines/ppo/ppo_mlp.py \
    --env_id='MemoryLength-v0' \
    --env_kwargs memory_length 20 num_bits 1 \
    --num_envs 128 --total_timesteps 10_000_000 \
    --num_steps 21 \
    --num_eval_steps 21

PPO with LSTM

python3 baselines/ppo/ppo_lstm.py \
    --env_id='MemoryLength-v0' \
    --env_kwargs memory_length 20 num_bits 1 \
    --num_envs 128 --total_timesteps 10_000_000 \
    --num_steps 21 \
    --num_eval_steps 21
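
The --env_kwargs pairs above map onto the keyword arguments shown in Basic Usage, so the scripts build environments roughly equivalent to the following (a sketch; the exact wrapper stack lives in the baseline code):

env = gym.make('MemoryLength-v0', memory_length=20, num_bits=1)

With memory_length 20, --num_steps 21 plausibly spans one full episode per rollout (one context step plus the 20-step delay), which is presumably why --num_eval_steps uses the same value.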

Citation

If you find our work useful, please cite our paper:

@misc{cherepanov2025mikasa,
      title={Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning}, 
      author={Egor Cherepanov and Nikita Kachaev and Alexey K. Kovalev and Aleksandr I. Panov},
      year={2025},
      eprint={2502.10550},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.10550}, 
}

References

This repository's code is based on and inspired by several open-source projects. We are grateful to their developers for providing valuable resources and inspiration.

MIKASA-Base Tasks Description

| Environment | Brief description | Memory Task | Observation Space | Action Space |
| --- | --- | --- | --- | --- |
| MemoryCards-v0 | Memorize the positions of revealed cards and correctly match pairs while minimizing incorrect guesses. | Capacity | vector | discrete |
| Numpad-v0 | Memorize the sequence of movements and navigate the rolling ball on a 3×3 grid by following the correct order while avoiding mistakes. | Sequential | image, vector | discrete, continuous |
| MemoryLength-v0 | Memorize the initial context signal and recall it after a given number of steps to take the correct action. | Object | vector | discrete |
| Minigrid-Memory-v0 | Memorize the object in the starting room and use this information to select the correct path at the junction. | Object | image | discrete |
| Ballet-v0 | Memorize the sequence of movements performed by each uniquely colored and shaped dancer, then identify and approach the dancer who executed the given pattern. | Sequential, Object | image | discrete |
| Passive-VisualMatch-v0 | Memorize the target color displayed on the wall during the initial phase. After a brief distractor phase, identify and select the target color among the distractors by stepping on the corresponding ground pad. | Object | image | discrete |
| Passive-T-Maze-v0 | Memorize the goal’s location upon initial observation, navigate through the maze with limited sensory input, and select the correct path at the junction. | Object | vector | discrete |
| ViZDoom-two-colors-v0 | Memorize the color of the briefly appearing pillar (green or red) and collect items of the same color to survive in the acid-filled room. | Object | image | discrete |
| MemoryMaze-v0 | Memorize the locations of objects and the maze structure using visual clues, then navigate efficiently to find objects of a specific color and score points. | Spatial | image | discrete |
| MortarMayhem-v0 | Memorize a sequence of movement commands and execute them in the correct order. | Capacity, Sequential | image | discrete |
| MysteryPath-v0 | Memorize the invisible path and navigate it without stepping off. | Capacity, Spatial | image | discrete |
| RepeatFirst-v0 | Memorize the initial value presented at the first step and recall it correctly after receiving a sequence of random values. | Object | vector | discrete |
| RepeatPrevious-v0 | Memorize the value observed at each step and recall the value from k steps earlier when required. | Sequential, Object | vector | discrete |
| Autoencode-v0 | Memorize the sequence of cards presented at the beginning and reproduce them in the same order when required. | Sequential | vector | discrete |
| CountRecall-v0 | Memorize unique values encountered and count how many times a specific value has appeared. | Object, Capacity | vector | discrete |
| VelocityOnlyCartPole-v0 | Memorize velocity data over time and integrate it to infer the position of the pole for balance control. | Sequential | vector | continuous |
| MultiarmedBandit-v0 | Memorize the reward probabilities of different slot machines by exploring them and identify the one with the highest expected reward. | Object, Capacity | vector | discrete |
| Concentration-v0 | Memorize the positions of revealed cards and match them with previously seen cards to find all matching pairs. | Capacity | vector | discrete |
| Battleship-v0 | Memorize the coordinates of previous shots and their HIT or MISS feedback to build an internal representation of the board, avoid repeat shots, and strategically target ships for maximum rewards. | Spatial | vector | discrete |
| MineSweeper-v0 | Memorize revealed grid information and use numerical clues to infer safe tiles while avoiding mines. | Spatial | vector | discrete |
| LabyrinthExplore-v0 | Memorize previously visited cells and navigate the maze efficiently to discover new, unexplored areas and maximize rewards. | Spatial | vector | discrete |
| LabyrinthEscape-v0 | Memorize the maze layout while exploring and navigate efficiently to find the exit and receive a reward. | Spatial | vector | discrete |
| HigherLower-v0 | Memorize previously revealed card ranks and predict whether the next card will be higher or lower, updating the reference card after each prediction to maximize rewards. | Object, Sequential | vector | discrete |