eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels
This repository is the official implementation of "eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels", published at the Thirteenth International Conference on Learning Representations (ICLR 2025).
The codebase is provided as an installable Python package called `eqmarl`. To install the package via `pip`, run:
# Navigate to `eqmarl` source folder.
$ cd path/to/eqmarl/
# Install `eqmarl` package.
$ python -m pip install .
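If you plan to modify the source, pip's editable mode can also be used (a standard pip option, not specific to this repository):

# Install `eqmarl` in editable/development mode so local changes take effect without reinstalling.
$ python -m pip install -e .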
You can verify the package was successfully installed by running:
$ python -c "import importlib.metadata; version=importlib.metadata.version('eqmarl'); print(version)"
1.0.0
If instead you just want to install the requirements without the package, you can run:
$ python -m pip install -r requirements.txt -r requirements-dev.txt
Installation of this repo can be a little finicky because of the requirements for `tensorflow-quantum` on various systems.
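As a quick sanity check, you can confirm that the `tensorflow-quantum` dependency resolved on your system in the same way as the package check above:

# Print the installed tensorflow-quantum version (this fails if the wheel could not be installed).
$ python -c "import importlib.metadata; print(importlib.metadata.version('tensorflow-quantum'))"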
If you are using Anaconda to manage Python on macOS, be aware that the version of Python may have been built using an outdated version of macOS. To check this, you can run:
$ python -c "from distutils import util; print(util.get_platform())"
macosx-10.9-x86_64
Notice that in the above example the Python installation was built against `macosx-10.9-x86_64`, whereas the wheel for `tensorflow-quantum` requires `macosx-12.1-x86_64` or later.
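To see which wheel platform tags your local pip will actually accept, you can use pip's built-in diagnostic command (the relevant part of the output is the list of compatible tags):

# List the platform/ABI tags this pip installation considers compatible.
$ python -m pip debug --verbose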
To circumvent this, you can download the wheel for `tensorflow-quantum==0.7.2` from https://pypi.org/project/tensorflow-quantum/0.7.2/#files and rename the file from `tensorflow_quantum-0.7.2-cp39-cp39-macosx_12_1_x86_64.whl` to `tensorflow_quantum-0.7.2-cp39-cp39-macosx_10_9_x86_64.whl`.
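A minimal sketch of that rename step, assuming the wheel was downloaded into the current directory:

# Rename the wheel so its platform tag matches the locally built Python (the binary itself is unchanged).
$ mv tensorflow_quantum-0.7.2-cp39-cp39-macosx_12_1_x86_64.whl tensorflow_quantum-0.7.2-cp39-cp39-macosx_10_9_x86_64.whl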
Once you've done that, you can install the wheel via:
# Activate your environment.
$ conda activate myenv
# Install wheel file manually.
$ python -m pip install tensorflow_quantum-0.7.2-cp39-cp39-macosx_10_9_x86_64.whl
To train using the frameworks in the paper, run this command:
$ python ./scripts/experiment_runner.py ./experiments/<experiment_name>.yml
This invokes the `experiment_runner.py` script, which runs experiments based on YAML configurations.
Note that the `-r`/`--n-train-rounds` option can be used to train over multiple seed rounds (defaults to 1 round).
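For example, to train the Ψ+ entangled eQMARL configuration for the CoinGame MDP setting over 10 seed rounds (the number of seeds used for the reported results), the invocation would look like:

# Train the eQMARL Ψ+ CoinGame MDP experiment over 10 seed rounds.
$ python ./scripts/experiment_runner.py ./experiments/coingame_maa2c_mdp_eqmarl_psi+.yml -r 10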
The experiment configuration for each of the frameworks discussed in the paper is described as a YAML file in the experiments folder.
The full list of experiments is as follows:
Experiment YAML File | Environment | Description |
---|---|---|
`coingame_maa2c_mdp_eqmarl_noentanglement.yml` | CoinGame | MDP experiment using eQMARL without entanglement. |
`coingame_maa2c_mdp_eqmarl_phi+.yml` | CoinGame | MDP experiment using eQMARL with Φ+ entanglement. |
`coingame_maa2c_mdp_eqmarl_phi-.yml` | CoinGame | MDP experiment using eQMARL with Φ- entanglement. |
`coingame_maa2c_mdp_eqmarl_psi+.yml` | CoinGame | MDP experiment using eQMARL with Ψ+ entanglement. |
`coingame_maa2c_mdp_eqmarl_psi-.yml` | CoinGame | MDP experiment using eQMARL with Ψ- entanglement. |
`coingame_maa2c_mdp_fctde.yml` | CoinGame | MDP experiment using fCTDE. |
`coingame_maa2c_mdp_qfctde.yml` | CoinGame | MDP experiment using qfCTDE. |
`coingame_maa2c_mdp_sctde.yml` | CoinGame | MDP experiment using sCTDE. |
`coingame_maa2c_pomdp_eqmarl_noentanglement.yml` | CoinGame | POMDP experiment using eQMARL without entanglement. |
`coingame_maa2c_pomdp_eqmarl_phi+.yml` | CoinGame | POMDP experiment using eQMARL with Φ+ entanglement. |
`coingame_maa2c_pomdp_eqmarl_phi-.yml` | CoinGame | POMDP experiment using eQMARL with Φ- entanglement. |
`coingame_maa2c_pomdp_eqmarl_psi+.yml` | CoinGame | POMDP experiment using eQMARL with Ψ+ entanglement. |
`coingame_maa2c_pomdp_eqmarl_psi-.yml` | CoinGame | POMDP experiment using eQMARL with Ψ- entanglement. |
`coingame_maa2c_pomdp_fctde.yml` | CoinGame | POMDP experiment using fCTDE. |
`coingame_maa2c_pomdp_qfctde.yml` | CoinGame | POMDP experiment using qfCTDE. |
`coingame_maa2c_pomdp_sctde.yml` | CoinGame | POMDP experiment using sCTDE. |
`coingame_maa2c_mdp_eqmarl_psi+_L2.yml` | CoinGame | MDP experiment using eQMARL with Ψ+ entanglement (L=2 variant). |
`coingame_maa2c_mdp_eqmarl_psi+_L10.yml` | CoinGame | MDP experiment using eQMARL with Ψ+ entanglement (L=10 variant). |
`coingame_maa2c_mdp_qfctde_L2.yml` | CoinGame | MDP experiment using qfCTDE (L=2 variant). |
`coingame_maa2c_mdp_qfctde_L10.yml` | CoinGame | MDP experiment using qfCTDE (L=10 variant). |
`coingame_maa2c_mdp_fctde_size3.yml` | CoinGame | MDP experiment using fCTDE (size 3 variant). |
`coingame_maa2c_mdp_fctde_size6.yml` | CoinGame | MDP experiment using fCTDE (size 6 variant). |
`coingame_maa2c_mdp_fctde_size24.yml` | CoinGame | MDP experiment using fCTDE (size 24 variant). |
`coingame_maa2c_mdp_sctde_size3.yml` | CoinGame | MDP experiment using sCTDE (size 3 variant). |
`coingame_maa2c_mdp_sctde_size6.yml` | CoinGame | MDP experiment using sCTDE (size 6 variant). |
`coingame_maa2c_mdp_sctde_size24.yml` | CoinGame | MDP experiment using sCTDE (size 24 variant). |
`coingame_maa2c_pomdp_eqmarl_psi+_L2.yml` | CoinGame | POMDP experiment using eQMARL with Ψ+ entanglement (L=2 variant). |
`coingame_maa2c_pomdp_eqmarl_psi+_L10.yml` | CoinGame | POMDP experiment using eQMARL with Ψ+ entanglement (L=10 variant). |
`coingame_maa2c_pomdp_qfctde_L2.yml` | CoinGame | POMDP experiment using qfCTDE (L=2 variant). |
`coingame_maa2c_pomdp_qfctde_L10.yml` | CoinGame | POMDP experiment using qfCTDE (L=10 variant). |
`coingame_maa2c_pomdp_fctde_size3.yml` | CoinGame | POMDP experiment using fCTDE (size 3 variant). |
`coingame_maa2c_pomdp_fctde_size6.yml` | CoinGame | POMDP experiment using fCTDE (size 6 variant). |
`coingame_maa2c_pomdp_fctde_size24.yml` | CoinGame | POMDP experiment using fCTDE (size 24 variant). |
`coingame_maa2c_pomdp_sctde_size3.yml` | CoinGame | POMDP experiment using sCTDE (size 3 variant). |
`coingame_maa2c_pomdp_sctde_size6.yml` | CoinGame | POMDP experiment using sCTDE (size 6 variant). |
`coingame_maa2c_pomdp_sctde_size24.yml` | CoinGame | POMDP experiment using sCTDE (size 24 variant). |
`cartpole_maa2c_mdp_eqmarl_psi+.yml` | CartPole | MDP experiment using eQMARL with Ψ+ entanglement. |
`cartpole_maa2c_mdp_fctde.yml` | CartPole | MDP experiment using fCTDE. |
`cartpole_maa2c_mdp_qfctde.yml` | CartPole | MDP experiment using qfCTDE. |
`cartpole_maa2c_mdp_sctde.yml` | CartPole | MDP experiment using sCTDE. |
`cartpole_maa2c_pomdp_eqmarl_psi+.yml` | CartPole | POMDP experiment using eQMARL with Ψ+ entanglement. |
`cartpole_maa2c_pomdp_fctde.yml` | CartPole | POMDP experiment using fCTDE. |
`cartpole_maa2c_pomdp_qfctde.yml` | CartPole | POMDP experiment using qfCTDE. |
`cartpole_maa2c_pomdp_sctde.yml` | CartPole | POMDP experiment using sCTDE. |
The actor-critic models trained using the frameworks described in the paper achieved the performance outlined in the sections below.
Pre-trained models can be found in the supplementary materials that accompany this repository, within a folder called `pre_trained_models/`.
The training result metrics for all models reported in the paper are listed under the `experiment_output` folder.
Each experiment was conducted over 10 seeds (using the `-r 10` option as discussed in the Training section).
All figures reported in the paper can be generated using the Jupyter notebook `figure_generator.ipynb`, which references the figure configurations outlined in the `figures` folder.
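Assuming Jupyter is installed in your environment, the notebook can be opened from the command line, e.g.:

# Launch the figure-generation notebook.
$ jupyter notebook figure_generator.ipynb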
The training results for the comparison of entanglement styles outlined in the paper are given in the table below:
Dynamics | Entanglement | Score: 20 | Score: 25 | Score: Max (value) |
---|---|---|---|---|
MDP | | 568 | 2332 | 2942 (25.67) |
MDP | | 595 | 1987 | 2849 (25.45) |
MDP | | 612 | 1883 | 2851 (25.51) |
MDP | | 691 | 2378 | 2984 (25.23) |
MDP | | 839 | 2337 | 2495 (25.12) |
POMDP | | 1049 | 1745 | 2950 (26.28) |
POMDP | | 1206 | 2114 | 2999 (25.95) |
POMDP | | 1269 | - | 2992 (24.1) |
POMDP | | 1838 | - | 2727 (22.8) |
POMDP | | 1069 | 1955 | 2841 (26.39) |
The figures that aggregate the metric performance for each of the experiments are given in the table below:
Figure | Dynamics | Metric |
---|---|---|
fig_maa2c_mdp_entanglement_compare-undiscounted_reward.pdf | MDP | Score |
fig_maa2c_mdp_entanglement_compare-coins_collected.pdf | MDP | Total coins collected |
fig_maa2c_mdp_entanglement_compare-own_coin_rate.pdf | MDP | Own coin rate |
fig_maa2c_mdp_entanglement_compare-own_coins_collected.pdf | MDP | Own coins collected |
fig_maa2c_pomdp_entanglement_compare-undiscounted_reward.pdf | POMDP | Score |
fig_maa2c_pomdp_entanglement_compare-coins_collected.pdf | POMDP | Total coins collected |
fig_maa2c_pomdp_entanglement_compare-own_coin_rate.pdf | POMDP | Own coin rate |
fig_maa2c_pomdp_entanglement_compare-own_coins_collected.pdf | POMDP | Own coins collected |
The training results for the comparison of the frameworks in the CoinGame environment are given in the table below:
Dynamics | Framework | Score: 20 | Score: 25 | Score: Max (value) | Own coin rate: 0.95 | Own coin rate: 1.0 | Own coin rate: Max (value) |
---|---|---|---|---|---|---|---|
MDP | | 568 | 2332 | 2942 (25.67) | 376 | 2136 | 2136 (1.0) |
MDP | | 678 | - | 2378 (23.38) | 397 | - | 2832 (0.9972) |
MDP | | 1640 | 2615 | 2631 (25.3) | 1511 | - | 2637 (0.9864) |
MDP | | 1917 | - | 2925 (23.67) | 1700 | - | 2909 (0.9857) |
POMDP | | 1049 | 1745 | 2950 (26.28) | 773 | - | 2533 (0.9997) |
POMDP | | 1382 | 2124 | 2871 (26.09) | 1038 | 2887 | 2887 (1.0) |
POMDP | | 1738 | 2750 | 2999 (25.33) | 1588 | - | 2956 (0.9894) |
POMDP | | 1798 | 2658 | 2824 (25.49) | 1574 | - | 2963 (0.9894) |
The figures that aggregate the metric performance for each of the experiments are given in the table below:
Figure | Dynamics | Metric |
---|---|---|
fig_maa2c_mdp-undiscounted_reward.pdf | MDP | Score |
fig_maa2c_mdp-coins_collected.pdf | MDP | Total coins collected |
fig_maa2c_mdp-own_coin_rate.pdf | MDP | Own coin rate |
fig_maa2c_mdp-own_coins_collected.pdf | MDP | Own coins collected |
fig_maa2c_pomdp-undiscounted_reward.pdf | POMDP | Score |
fig_maa2c_pomdp-coins_collected.pdf | POMDP | Total coins collected |
fig_maa2c_pomdp-own_coin_rate.pdf | POMDP | Own coin rate |
fig_maa2c_pomdp-own_coins_collected.pdf | POMDP | Own coins collected |
The training results for the comparison of the frameworks in the CartPole environment are given in the tables below:
Dynamics | Framework | Reward: Mean | Reward: Std. Dev. | Reward: 95% CI |
---|---|---|---|---|
MDP | | 79.11 | 50.62 | (77.40, 81.16) |
MDP | | 121.35 | 110.13 | (118.29, 125.12) |
MDP | | 16.38 | 35.97 | (16.29, 16.48) |
MDP | | 15.15 | 24.17 | (15.09, 15.22) |
POMDP | | 82.28 | 44.24 | (80.60, 83.89) |
POMDP | | 79.03 | 44.06 | (76.80, 80.98) |
POMDP | | 40.56 | 37.36 | (38.17, 43.70) |
POMDP | | 13.93 | 29.84 | (13.62, 14.19) |
Dynamics | Framework | Reward: Mean (value) | Reward: Max (value) |
---|---|---|---|
MDP | | 166 (79.11) | 555 (134.16) |
MDP | | 189 (121.35) | 810 (262.43) |
MDP | | 9 (16.38) | 931 (23.59) |
MDP | | 9 (15.15) | 38 (18.55) |
POMDP | | 251 (82.28) | 770 (127.6) |
POMDP | | 276 (79.03) | 648 (137.66) |
POMDP | | 680 (40.56) | 999 (167.32) |
POMDP | | 9 (13.93) | 999 (28.66) |
The figures that aggregate the metric performance for each of the experiments are given in the table below:
Figure | Dynamics | Metric |
---|---|---|
fig_cartpole_maa2c_mdp-reward_mean.pdf | MDP | Average reward |
fig_cartpole_maa2c_pomdp-reward_mean.pdf | POMDP | Average reward |
The training results for the comparison of the frameworks in the MiniGrid environment are given in the table below:
Dynamics | Framework | Reward: Mean (value) | Reward: 95% CI | Number of Trainable Critic Parameters |
---|---|---|---|---|
POMDP | | -63.04 | (-65.16, -61.06) | 29,601 |
POMDP | | -85.86 | (-87.03, -84.72) | 3,697 |
POMDP | | -88.02 | (-88.69, -87.10) | 29,801 |
POMDP | | -13.32 | (-14.68, -11.91) | 3,697 |
The figures that aggregate the metric performance for each of the experiments are given in the table below:
Figure | Dynamics | Metric |
---|---|---|
fig_minigrid-reward_mean.pdf | POMDP | Average reward |
The training results for the ablation experiments in the CoinGame environment are given in the table below:
Dynamics | Framework | Parameters | Score: Mean | Score: Std. Dev. | Score: 95% CI | Own coin rate: Mean | Own coin rate: Std. Dev. | Own coin rate: 95% CI |
---|---|---|---|---|---|---|---|---|
MDP | | 223 | 2.42 | 2.35 | (2.35, 2.49) | 0.6720 | 0.2024 | (0.6685, 0.6769) |
MDP | | 445 | 7.41 | 3.46 | (7.19, 7.65) | 0.7658 | 0.1414 | (0.7610, 0.7712) |
MDP | | 889 | 12.36 | 4.41 | (12.09, 12.67) | 0.8202 | 0.1379 | (0.8139, 0.8262) |
MDP | | 1777 | 17.63 | 2.58 | (17.25, 17.91) | 0.8823 | 0.0751 | (0.8770, 0.8875) |
MDP | | 229 | 3.24 | 3.09 | (3.16, 3.33) | 0.6852 | 0.1991 | (0.6821, 0.6897) |
MDP | | 457 | 8.54 | 3.67 | (8.29, 8.78) | 0.7857 | 0.1327 | (0.7804, 0.7924) |
MDP | | 913 | 14.18 | 2.69 | (13.90, 14.60) | 0.8504 | 0.0928 | (0.8454, 0.8553) |
MDP | | 1825 | 18.18 | 2.41 | (17.84, 18.53) | 0.8936 | 0.0673 | (0.8896, 0.8979) |
MDP | | 121 | 6.58 | 3.92 | (6.47, 6.66) | 0.8482 | 0.1921 | (0.8435, 0.8518) |
MDP | | 265 | 19.41 | 6.23 | (19.23, 19.59) | 0.9398 | 0.1020 | (0.9366, 0.9426) |
MDP | | 505 | 22.08 | 2.22 | (21.91, 22.26) | 0.9691 | 0.0247 | (0.9665, 0.9723) |
MDP | | 121 | 5.38 | 3.74 | (5.30, 5.46) | 0.8271 | 0.2213 | (0.8234, 0.8300) |
MDP | | 265 | 21.11 | 2.65 | (20.92, 21.35) | 0.9640 | 0.0347 | (0.9601, 0.9667) |
MDP | | 505 | 22.45 | 2.23 | (22.28, 22.62) | 0.9719 | 0.0219 | (0.9685, 0.9745) |
POMDP | | 169 | 2.98 | 2.47 | (2.91, 3.05) | 0.7082 | 0.1890 | (0.7039, 0.7123) |
POMDP | | 337 | 7.15 | 3.06 | (6.95, 7.37) | 0.7711 | 0.1388 | (0.7658, 0.7781) |
POMDP | | 673 | 13.46 | 3.24 | (13.09, 13.76) | 0.8443 | 0.1026 | (0.8396, 0.8506) |
POMDP | | 1345 | 17.38 | 2.65 | (17.06, 17.73) | 0.8889 | 0.0752 | (0.8840, 0.8945) |
POMDP | | 175 | 2.68 | 2.60 | (2.61, 2.74) | 0.6834 | 0.1942 | (0.6792, 0.6866) |
POMDP | | 349 | 6.35 | 3.53 | (6.18, 6.54) | 0.7677 | 0.1488 | (0.7633, 0.7725) |
POMDP | | 697 | 13.70 | 2.79 | (13.44, 13.99) | 0.8466 | 0.0985 | (0.8411, 0.8515) |
POMDP | | 1393 | 17.97 | 2.60 | (17.67, 18.25) | 0.8948 | 0.0723 | (0.8898, 0.9004) |
POMDP | | 745 | 12.34 | 7.56 | (12.09, 12.60) | 0.8335 | 0.2058 | (0.8277, 0.8386) |
POMDP | | 817 | 16.79 | 4.66 | (16.45, 17.04) | 0.9040 | 0.1135 | (0.8994, 0.9091) |
POMDP | | 937 | 18.14 | 4.28 | (17.83, 18.31) | 0.9476 | 0.0660 | (0.9443, 0.9508) |
POMDP | | 745 | 17.14 | 3.98 | (16.77, 17.47) | 0.8834 | 0.1106 | (0.8769, 0.8896) |
POMDP | | 817 | 18.49 | 3.91 | (18.23, 18.80) | 0.9226 | 0.0831 | (0.9172, 0.9272) |
POMDP | | 937 | 19.09 | 3.44 | (18.86, 19.46) | 0.9485 | 0.0603 | (0.9458, 0.9523) |
The trainable actor and critic parameter counts for the ablation experiments are given in the table below:

Framework | Ablation Selection | Model | MDP dynamics | POMDP dynamics |
---|---|---|---|---|
 | | Actor | 136 | 412 |
 | | Critic | 265 (132 per agent, 1 central) | 817 (408 per agent, 1 central) |
 | | Actor | 136 | 412 |
 | | Critic | 265 | 817 |
 | | Actor | 496 | 388 |
 | | Critic | 889 | 673 |
 | | Actor | 496 | 388 |
 | | Critic | 913 (444 per agent, 25 central) | 697 (336 per agent, 25 central) |
If you use the code in this repository for your research or publication, please cite our paper published in ICLR 2025 using the following BibTeX entry (also available in CITATION.bib):
@inproceedings{derieux2025eqmarl,
title={e{QMARL}: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels},
author={Alexander DeRieux and Walid Saad},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=cR5GTis5II},
doi={10.48550/arXiv.2405.17486}
}