# eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels

arXiv · OpenReview · License: CC-BY-4.0 · Python · TensorFlow

This repository is the official implementation of "eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels", published at the Thirteenth International Conference on Learning Representations (ICLR 2025).

## Installation

The codebase is provided as an installable Python package called eqmarl. To install the package via pip, you can run:

```sh
# Navigate to `eqmarl` source folder.
$ cd path/to/eqmarl/

# Install `eqmarl` package.
$ python -m pip install .
```
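
If you intend to modify the source, an editable install also works; this is standard pip behavior rather than anything specific to `eqmarl`:

```sh
# Install `eqmarl` in editable (development) mode.
$ python -m pip install -e .
```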

You can verify the package was successfully installed by running:

$ python -c "import importlib.metadata; version=importlib.metadata.version('eqmarl'); print(version)"
1.0.0

## Requirements

If instead you just want to install the requirements without the package, you can run:

```sh
$ python -m pip install -r requirements.txt -r requirements-dev.txt
```

### Notes on TensorFlow Quantum installation with Anaconda

Installation of this repo can be a little finicky because of the requirements for tensorflow-quantum on various systems.

If you are using Anaconda to manage Python on macOS, be aware that your Python interpreter may have been built against an outdated version of macOS. To check this, you can run:

$ python -c "from distutils import util; print(util.get_platform())"
macosx-10.9-x86_64

In the example above, Python was built against macosx-10.9-x86_64, whereas the wheel for tensorflow-quantum requires macosx-12.1-x86_64 or later.

To circumvent this, you can download the wheel for tensorflow-quantum==0.7.2 from https://pypi.org/project/tensorflow-quantum/0.7.2/#files and rename it from tensorflow_quantum-0.7.2-cp39-cp39-macosx_12_1_x86_64.whl to tensorflow_quantum-0.7.2-cp39-cp39-macosx_10_9_x86_64.whl.
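
For example, the rename can be done from the shell (the wheel filenames below are those listed on PyPI):

```sh
# Rename the downloaded wheel so pip accepts it on the older macOS platform tag.
$ mv tensorflow_quantum-0.7.2-cp39-cp39-macosx_12_1_x86_64.whl \
     tensorflow_quantum-0.7.2-cp39-cp39-macosx_10_9_x86_64.whl
```

Once you've done that, you can install the wheel via: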

```sh
# Activate your environment.
$ conda activate myenv

# Install wheel file manually.
$ python -m pip install tensorflow_quantum-0.7.2-cp39-cp39-macosx_10_9_x86_64.whl
```

## Training

To train using the frameworks in the paper, run this command:

```sh
$ python ./scripts/experiment_runner.py ./experiments/<experiment_name>.yml
```

This invokes the experiment_runner.py script, which runs experiments based on YAML configurations. The -r/--n-train-rounds option can be used to train over multiple seed rounds (default: 1 round). The experiment configuration for each framework discussed in the paper is provided as a YAML file in the experiments folder.
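
For example, to train the $\Psi^{+}$-entangled $\texttt{eQMARL}$ framework on the MDP variant of $\texttt{CoinGame-2}$ over 10 seed rounds (this configuration appears in the table below):

```sh
$ python ./scripts/experiment_runner.py ./experiments/coingame_maa2c_mdp_eqmarl_psi+.yml -r 10
```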

The full list of experiments is as follows:

| Experiment YAML File | Environment | Description |
|---|---|---|
| `coingame_maa2c_mdp_eqmarl_noentanglement.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\texttt{None}$ entanglement and $L=5$ VQC layers. |
| `coingame_maa2c_mdp_eqmarl_phi+.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Phi^{+}$ entanglement and $L=5$ VQC layers. |
| `coingame_maa2c_mdp_eqmarl_phi-.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Phi^{-}$ entanglement and $L=5$ VQC layers. |
| `coingame_maa2c_mdp_eqmarl_psi+.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=5$ VQC layers. |
| `coingame_maa2c_mdp_eqmarl_psi-.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{-}$ entanglement and $L=5$ VQC layers. |
| `coingame_maa2c_mdp_fctde.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{fCTDE}$ with $h=12$ hidden units. |
| `coingame_maa2c_mdp_qfctde.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{qfCTDE}$ with $L=5$ VQC layers. |
| `coingame_maa2c_mdp_sctde.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{sCTDE}$ with $h=12$ hidden units. |
| `coingame_maa2c_pomdp_eqmarl_noentanglement.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\texttt{None}$ entanglement and $L=5$ VQC layers. |
| `coingame_maa2c_pomdp_eqmarl_phi+.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Phi^{+}$ entanglement and $L=5$ VQC layers. |
| `coingame_maa2c_pomdp_eqmarl_phi-.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Phi^{-}$ entanglement and $L=5$ VQC layers. |
| `coingame_maa2c_pomdp_eqmarl_psi+.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=5$ VQC layers. |
| `coingame_maa2c_pomdp_eqmarl_psi-.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{-}$ entanglement and $L=5$ VQC layers. |
| `coingame_maa2c_pomdp_fctde.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=12$ hidden units. |
| `coingame_maa2c_pomdp_qfctde.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{qfCTDE}$ with $L=5$ VQC layers. |
| `coingame_maa2c_pomdp_sctde.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=12$ hidden units. |
| `coingame_maa2c_mdp_eqmarl_psi+_L2.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=2$ VQC layers. |
| `coingame_maa2c_mdp_eqmarl_psi+_L10.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=10$ VQC layers. |
| `coingame_maa2c_mdp_qfctde_L2.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{qfCTDE}$ with $L=2$ VQC layers. |
| `coingame_maa2c_mdp_qfctde_L10.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{qfCTDE}$ with $L=10$ VQC layers. |
| `coingame_maa2c_mdp_fctde_size3.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{fCTDE}$ with $h=3$ hidden units. |
| `coingame_maa2c_mdp_fctde_size6.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{fCTDE}$ with $h=6$ hidden units. |
| `coingame_maa2c_mdp_fctde_size24.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{fCTDE}$ with $h=24$ hidden units. |
| `coingame_maa2c_mdp_sctde_size3.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{sCTDE}$ with $h=3$ hidden units. |
| `coingame_maa2c_mdp_sctde_size6.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{sCTDE}$ with $h=6$ hidden units. |
| `coingame_maa2c_mdp_sctde_size24.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{sCTDE}$ with $h=24$ hidden units. |
| `coingame_maa2c_pomdp_eqmarl_psi+_L2.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=2$ VQC layers. |
| `coingame_maa2c_pomdp_eqmarl_psi+_L10.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=10$ VQC layers. |
| `coingame_maa2c_pomdp_qfctde_L2.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{qfCTDE}$ with $L=2$ VQC layers. |
| `coingame_maa2c_pomdp_qfctde_L10.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{qfCTDE}$ with $L=10$ VQC layers. |
| `coingame_maa2c_pomdp_fctde_size3.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=3$ hidden units. |
| `coingame_maa2c_pomdp_fctde_size6.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=6$ hidden units. |
| `coingame_maa2c_pomdp_fctde_size24.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=24$ hidden units. |
| `coingame_maa2c_pomdp_sctde_size3.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=3$ hidden units. |
| `coingame_maa2c_pomdp_sctde_size6.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=6$ hidden units. |
| `coingame_maa2c_pomdp_sctde_size24.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=24$ hidden units. |
| `cartpole_maa2c_mdp_eqmarl_psi+.yml` | $\texttt{CartPole}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=5$ VQC layers. |
| `cartpole_maa2c_mdp_fctde.yml` | $\texttt{CartPole}$ | MDP experiment using $\texttt{fCTDE}$ with $h=12$ hidden units. |
| `cartpole_maa2c_mdp_qfctde.yml` | $\texttt{CartPole}$ | MDP experiment using $\texttt{qfCTDE}$ with $L=5$ VQC layers. |
| `cartpole_maa2c_mdp_sctde.yml` | $\texttt{CartPole}$ | MDP experiment using $\texttt{sCTDE}$ with $h=12$ hidden units. |
| `cartpole_maa2c_pomdp_eqmarl_psi+.yml` | $\texttt{CartPole}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=5$ VQC layers. |
| `cartpole_maa2c_pomdp_fctde.yml` | $\texttt{CartPole}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=12$ hidden units. |
| `cartpole_maa2c_pomdp_qfctde.yml` | $\texttt{CartPole}$ | POMDP experiment using $\texttt{qfCTDE}$ with $L=5$ VQC layers. |
| `cartpole_maa2c_pomdp_sctde.yml` | $\texttt{CartPole}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=12$ hidden units. |
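
Since each experiment is a standalone YAML file, the whole suite can also be swept in one pass; a minimal shell sketch (the loop itself is not part of the repository, and `-r 10` matches the 10-seed setup used for the reported results):

```sh
# Run every experiment configuration over 10 seed rounds.
for cfg in ./experiments/*.yml; do
    python ./scripts/experiment_runner.py "$cfg" -r 10
done
```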

## Results

The actor-critic models trained using the frameworks described in the paper achieved the performance outlined in the sections below. Pre-trained models can be found in the supplementary materials accompanying this repository, in a folder called pre_trained_models/.

The training result metrics for all models reported in the paper are listed under the experiment_output folder. Each experiment was conducted over 10 seeds (using the -r 10 option as discussed in the Training section). All figures reported in the paper can be generated using the Jupyter notebook figure_generator.ipynb, which references the figure configurations outlined in the figures folder.
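
For example, assuming Jupyter is available in your environment (it is not guaranteed to be in the listed requirements), the notebook can be launched with:

```sh
$ jupyter notebook figure_generator.ipynb
```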

### Entanglement Style Comparison

The training results for the comparison of entanglement styles outlined in the paper are given in the table below:

| Dynamics | Entanglement | Score: 20 | Score: 25 | Score: Max (value) |
|---|---|---|---|---|
| MDP | $\Psi^{+}$ | 568 | 2332 | 2942 (25.67) |
| MDP | $\Psi^{-}$ | 595 | 1987 | 2849 (25.45) |
| MDP | $\Phi^{+}$ | 612 | 1883 | 2851 (25.51) |
| MDP | $\Phi^{-}$ | 691 | 2378 | 2984 (25.23) |
| MDP | $\mathtt{None}$ | 839 | 2337 | 2495 (25.12) |
| POMDP | $\Psi^{+}$ | 1049 | 1745 | 2950 (26.28) |
| POMDP | $\Psi^{-}$ | 1206 | 2114 | 2999 (25.95) |
| POMDP | $\Phi^{+}$ | 1269 | - | 2992 (24.1) |
| POMDP | $\Phi^{-}$ | 1838 | - | 2727 (22.8) |
| POMDP | $\mathtt{None}$ | 1069 | 1955 | 2841 (26.39) |

The figures that aggregate the metric performance for each of the experiments are given in the table below:

| Figure | Dynamics | Metric |
|---|---|---|
| `fig_maa2c_mdp_entanglement_compare-undiscounted_reward.pdf` | MDP | Score |
| `fig_maa2c_mdp_entanglement_compare-coins_collected.pdf` | MDP | Total coins collected |
| `fig_maa2c_mdp_entanglement_compare-own_coin_rate.pdf` | MDP | Own coin rate |
| `fig_maa2c_mdp_entanglement_compare-own_coins_collected.pdf` | MDP | Own coins collected |
| `fig_maa2c_pomdp_entanglement_compare-undiscounted_reward.pdf` | POMDP | Score |
| `fig_maa2c_pomdp_entanglement_compare-coins_collected.pdf` | POMDP | Total coins collected |
| `fig_maa2c_pomdp_entanglement_compare-own_coin_rate.pdf` | POMDP | Own coin rate |
| `fig_maa2c_pomdp_entanglement_compare-own_coins_collected.pdf` | POMDP | Own coins collected |

### CoinGame experiments

The training results for the comparison of the frameworks in the $\texttt{CoinGame-2}$ environment outlined in the paper are given in the table below:

| Dynamics | Framework | Score: 20 | Score: 25 | Score: Max (value) | Own coin rate: 0.95 | Own coin rate: 1.0 | Own coin rate: Max (value) |
|---|---|---|---|---|---|---|---|
| MDP | $\texttt{eQMARL-}\Psi^{+}$ | 568 | 2332 | 2942 (25.67) | 376 | 2136 | 2136 (1.0) |
| MDP | $\texttt{qfCTDE}$ | 678 | - | 2378 (23.38) | 397 | - | 2832 (0.9972) |
| MDP | $\texttt{sCTDE}$ | 1640 | 2615 | 2631 (25.3) | 1511 | - | 2637 (0.9864) |
| MDP | $\texttt{fCTDE}$ | 1917 | - | 2925 (23.67) | 1700 | - | 2909 (0.9857) |
| POMDP | $\texttt{eQMARL-}\Psi^{+}$ | 1049 | 1745 | 2950 (26.28) | 773 | - | 2533 (0.9997) |
| POMDP | $\texttt{qfCTDE}$ | 1382 | 2124 | 2871 (26.09) | 1038 | 2887 | 2887 (1.0) |
| POMDP | $\texttt{sCTDE}$ | 1738 | 2750 | 2999 (25.33) | 1588 | - | 2956 (0.9894) |
| POMDP | $\texttt{fCTDE}$ | 1798 | 2658 | 2824 (25.49) | 1574 | - | 2963 (0.9894) |

The figures that aggregate the metric performance for each of the experiments are given in the table below:

| Figure | Dynamics | Metric |
|---|---|---|
| `fig_maa2c_mdp-undiscounted_reward.pdf` | MDP | Score |
| `fig_maa2c_mdp-coins_collected.pdf` | MDP | Total coins collected |
| `fig_maa2c_mdp-own_coin_rate.pdf` | MDP | Own coin rate |
| `fig_maa2c_mdp-own_coins_collected.pdf` | MDP | Own coins collected |
| `fig_maa2c_pomdp-undiscounted_reward.pdf` | POMDP | Score |
| `fig_maa2c_pomdp-coins_collected.pdf` | POMDP | Total coins collected |
| `fig_maa2c_pomdp-own_coin_rate.pdf` | POMDP | Own coin rate |
| `fig_maa2c_pomdp-own_coins_collected.pdf` | POMDP | Own coins collected |

### CartPole experiments

The training results for the comparison of the frameworks in the $\texttt{CartPole}$ environment outlined in the paper are given in the tables below:

| Dynamics | Framework | Reward: Mean | Reward: Std. Dev. | Reward: 95% CI |
|---|---|---|---|---|
| MDP | $\texttt{eQMARL-}\Psi^{+}$ | 79.11 | 50.62 | (77.40, 81.16) |
| MDP | $\texttt{qfCTDE}$ | 121.35 | 110.13 | (118.29, 125.12) |
| MDP | $\texttt{sCTDE}$ | 16.38 | 35.97 | (16.29, 16.48) |
| MDP | $\texttt{fCTDE}$ | 15.15 | 24.17 | (15.09, 15.22) |
| POMDP | $\texttt{eQMARL-}\Psi^{+}$ | 82.28 | 44.24 | (80.60, 83.89) |
| POMDP | $\texttt{qfCTDE}$ | 79.03 | 44.06 | (76.80, 80.98) |
| POMDP | $\texttt{sCTDE}$ | 40.56 | 37.36 | (38.17, 43.70) |
| POMDP | $\texttt{fCTDE}$ | 13.93 | 29.84 | (13.62, 14.19) |

| Dynamics | Framework | Reward: Mean (value) | Reward: Max (value) |
|---|---|---|---|
| MDP | $\texttt{eQMARL-}\Psi^{+}$ | 166 (79.11) | 555 (134.16) |
| MDP | $\texttt{qfCTDE}$ | 189 (121.35) | 810 (262.43) |
| MDP | $\texttt{sCTDE}$ | 9 (16.38) | 931 (23.59) |
| MDP | $\texttt{fCTDE}$ | 9 (15.15) | 38 (18.55) |
| POMDP | $\texttt{eQMARL-}\Psi^{+}$ | 251 (82.28) | 770 (127.6) |
| POMDP | $\texttt{qfCTDE}$ | 276 (79.03) | 648 (137.66) |
| POMDP | $\texttt{sCTDE}$ | 680 (40.56) | 999 (167.32) |
| POMDP | $\texttt{fCTDE}$ | 9 (13.93) | 999 (28.66) |

The figures that aggregate the metric performance for each of the experiments are given in the table below:

| Figure | Dynamics | Metric |
|---|---|---|
| `fig_cartpole_maa2c_mdp-reward_mean.pdf` | MDP | Average reward |
| `fig_cartpole_maa2c_pomdp-reward_mean.pdf` | POMDP | Average reward |

### MiniGrid experiments

The training results for the comparison of the frameworks in the $\texttt{MiniGrid}$ environment outlined in the paper are given in the table below:

| Dynamics | Framework | Reward: Mean | Reward: 95% CI | Number of Trainable Critic Parameters |
|---|---|---|---|---|
| POMDP | $\texttt{fCTDE}$ | -63.04 | (-65.16, -61.06) | 29,601 |
| POMDP | $\texttt{qfCTDE}$ | -85.86 | (-87.03, -84.72) | 3,697 |
| POMDP | $\texttt{sCTDE}$ | -88.02 | (-88.69, -87.10) | 29,801 |
| POMDP | $\texttt{eQMARL-}\Psi^{+}$ | -13.32 | (-14.68, -11.91) | 3,697 |

The figures that aggregate the metric performance for each of the experiments are given in the table below:

| Figure | Dynamics | Metric |
|---|---|---|
| `fig_minigrid-reward_mean.pdf` | POMDP | Average reward |

### Ablation experiments

The training results for the ablation experiments conducted in the $\texttt{CoinGame-2}$ environment, as outlined in the paper, are given in the tables below:

| Dynamics | Framework | Parameters | Score: Mean | Score: Std. Dev. | Score: 95% CI | Own coin rate: Mean | Own coin rate: Std. Dev. | Own coin rate: 95% CI |
|---|---|---|---|---|---|---|---|---|
| MDP | $\texttt{fCTDE-3}$ | 223 | 2.42 | 2.35 | (2.35, 2.49) | 0.6720 | 0.2024 | (0.6685, 0.6769) |
| MDP | $\texttt{fCTDE-6}$ | 445 | 7.41 | 3.46 | (7.19, 7.65) | 0.7658 | 0.1414 | (0.7610, 0.7712) |
| MDP | $\texttt{fCTDE-12}$ | 889 | 12.36 | 4.41 | (12.09, 12.67) | 0.8202 | 0.1379 | (0.8139, 0.8262) |
| MDP | $\texttt{fCTDE-24}$ | 1777 | 17.63 | 2.58 | (17.25, 17.91) | 0.8823 | 0.0751 | (0.8770, 0.8875) |
| MDP | $\texttt{sCTDE-3}$ | 229 | 3.24 | 3.09 | (3.16, 3.33) | 0.6852 | 0.1991 | (0.6821, 0.6897) |
| MDP | $\texttt{sCTDE-6}$ | 457 | 8.54 | 3.67 | (8.29, 8.78) | 0.7857 | 0.1327 | (0.7804, 0.7924) |
| MDP | $\texttt{sCTDE-12}$ | 913 | 14.18 | 2.69 | (13.90, 14.60) | 0.8504 | 0.0928 | (0.8454, 0.8553) |
| MDP | $\texttt{sCTDE-24}$ | 1825 | 18.18 | 2.41 | (17.84, 18.53) | 0.8936 | 0.0673 | (0.8896, 0.8979) |
| MDP | $\texttt{qfCTDE-L2}$ | 121 | 6.58 | 3.92 | (6.47, 6.66) | 0.8482 | 0.1921 | (0.8435, 0.8518) |
| MDP | $\texttt{qfCTDE-L5}$ | 265 | 19.41 | 6.23 | (19.23, 19.59) | 0.9398 | 0.1020 | (0.9366, 0.9426) |
| MDP | $\texttt{qfCTDE-L10}$ | 505 | 22.08 | 2.22 | (21.91, 22.26) | 0.9691 | 0.0247 | (0.9665, 0.9723) |
| MDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L2}$ | 121 | 5.38 | 3.74 | (5.30, 5.46) | 0.8271 | 0.2213 | (0.8234, 0.8300) |
| MDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L5}$ | 265 | 21.11 | 2.65 | (20.92, 21.35) | 0.9640 | 0.0347 | (0.9601, 0.9667) |
| MDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L10}$ | 505 | 22.45 | 2.23 | (22.28, 22.62) | 0.9719 | 0.0219 | (0.9685, 0.9745) |
| POMDP | $\texttt{fCTDE-3}$ | 169 | 2.98 | 2.47 | (2.91, 3.05) | 0.7082 | 0.1890 | (0.7039, 0.7123) |
| POMDP | $\texttt{fCTDE-6}$ | 337 | 7.15 | 3.06 | (6.95, 7.37) | 0.7711 | 0.1388 | (0.7658, 0.7781) |
| POMDP | $\texttt{fCTDE-12}$ | 673 | 13.46 | 3.24 | (13.09, 13.76) | 0.8443 | 0.1026 | (0.8396, 0.8506) |
| POMDP | $\texttt{fCTDE-24}$ | 1345 | 17.38 | 2.65 | (17.06, 17.73) | 0.8889 | 0.0752 | (0.8840, 0.8945) |
| POMDP | $\texttt{sCTDE-3}$ | 175 | 2.68 | 2.60 | (2.61, 2.74) | 0.6834 | 0.1942 | (0.6792, 0.6866) |
| POMDP | $\texttt{sCTDE-6}$ | 349 | 6.35 | 3.53 | (6.18, 6.54) | 0.7677 | 0.1488 | (0.7633, 0.7725) |
| POMDP | $\texttt{sCTDE-12}$ | 697 | 13.70 | 2.79 | (13.44, 13.99) | 0.8466 | 0.0985 | (0.8411, 0.8515) |
| POMDP | $\texttt{sCTDE-24}$ | 1393 | 17.97 | 2.60 | (17.67, 18.25) | 0.8948 | 0.0723 | (0.8898, 0.9004) |
| POMDP | $\texttt{qfCTDE-L2}$ | 745 | 12.34 | 7.56 | (12.09, 12.60) | 0.8335 | 0.2058 | (0.8277, 0.8386) |
| POMDP | $\texttt{qfCTDE-L5}$ | 817 | 16.79 | 4.66 | (16.45, 17.04) | 0.9040 | 0.1135 | (0.8994, 0.9091) |
| POMDP | $\texttt{qfCTDE-L10}$ | 937 | 18.14 | 4.28 | (17.83, 18.31) | 0.9476 | 0.0660 | (0.9443, 0.9508) |
| POMDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L2}$ | 745 | 17.14 | 3.98 | (16.77, 17.47) | 0.8834 | 0.1106 | (0.8769, 0.8896) |
| POMDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L5}$ | 817 | 18.49 | 3.91 | (18.23, 18.80) | 0.9226 | 0.0831 | (0.9172, 0.9272) |
| POMDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L10}$ | 937 | 19.09 | 3.44 | (18.86, 19.46) | 0.9485 | 0.0603 | (0.9458, 0.9523) |
The trainable parameter counts for the actor and critic models used in the ablation are summarized below:

| Framework | Ablation Selection | Model | MDP dynamics | POMDP dynamics |
|---|---|---|---|---|
| $\texttt{eQMARL}$ | $L=5$ | Actor | 136 | 412 |
| $\texttt{eQMARL}$ | $L=5$ | Critic | 265 (132 per agent, 1 central) | 817 (408 per agent, 1 central) |
| $\texttt{qfCTDE}$ | $L=5$ | Actor | 136 | 412 |
| $\texttt{qfCTDE}$ | $L=5$ | Critic | 265 | 817 |
| $\texttt{fCTDE}$ | $h=12$ | Actor | 496 | 388 |
| $\texttt{fCTDE}$ | $h=12$ | Critic | 889 | 673 |
| $\texttt{sCTDE}$ | $h=12$ | Actor | 496 | 388 |
| $\texttt{sCTDE}$ | $h=12$ | Critic | 913 (444 per agent, 25 central) | 697 (336 per agent, 25 central) |

The figures that aggregate the metric performance for each of the experiments are given in the table below:

| Figure | Dynamics | Metric |
|---|---|---|
| `fig_coingame2_maa2c_mdp_ablation_eqmarl_psi+-undiscounted_reward.pdf` | MDP | Score |
| `fig_coingame2_maa2c_mdp_ablation_eqmarl_psi+-coins_collected.pdf` | MDP | Total coins collected |
| `fig_coingame2_maa2c_mdp_ablation_eqmarl_psi+-own_coin_rate.pdf` | MDP | Own coin rate |
| `fig_coingame2_maa2c_mdp_ablation_eqmarl_psi+-own_coins_collected.pdf` | MDP | Own coins collected |
| `fig_coingame2_maa2c_mdp_ablation_qfctde-undiscounted_reward.pdf` | MDP | Score |
| `fig_coingame2_maa2c_mdp_ablation_qfctde-coins_collected.pdf` | MDP | Total coins collected |
| `fig_coingame2_maa2c_mdp_ablation_qfctde-own_coin_rate.pdf` | MDP | Own coin rate |
| `fig_coingame2_maa2c_mdp_ablation_qfctde-own_coins_collected.pdf` | MDP | Own coins collected |
| `fig_coingame2_maa2c_mdp_ablation_fctde-undiscounted_reward.pdf` | MDP | Score |
| `fig_coingame2_maa2c_mdp_ablation_fctde-coins_collected.pdf` | MDP | Total coins collected |
| `fig_coingame2_maa2c_mdp_ablation_fctde-own_coin_rate.pdf` | MDP | Own coin rate |
| `fig_coingame2_maa2c_mdp_ablation_fctde-own_coins_collected.pdf` | MDP | Own coins collected |
| `fig_coingame2_maa2c_mdp_ablation_sctde-undiscounted_reward.pdf` | MDP | Score |
| `fig_coingame2_maa2c_mdp_ablation_sctde-coins_collected.pdf` | MDP | Total coins collected |
| `fig_coingame2_maa2c_mdp_ablation_sctde-own_coin_rate.pdf` | MDP | Own coin rate |
| `fig_coingame2_maa2c_mdp_ablation_sctde-own_coins_collected.pdf` | MDP | Own coins collected |

## Authors

Alexander DeRieux and Walid Saad

## Citation

If you use the code in this repository for your research or publication, please cite our paper published at ICLR 2025 using the following BibTeX entry (also available in CITATION.bib):

```bibtex
@inproceedings{derieux2025eqmarl,
    title={e{QMARL}: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels},
    author={Alexander DeRieux and Walid Saad},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=cR5GTis5II},
    doi={10.48550/arXiv.2405.17486}
}
```
