We provide efficient and streamlined implementations of the TOFU and MUSE unlearning benchmarks, supporting 5 unlearning methods, 3+ datasets, 6+ evaluation metrics, and 7+ LLMs. Each of these can be easily extended to incorporate more variants.
We invite the LLM unlearning community to collaborate by adding new benchmarks, unlearning methods, datasets, and evaluation metrics to expand OpenUnlearning's features, gain feedback from wider usage, and drive progress in the field.
⚠️ Notice (Updated: February 27, 2025)
This repository replaces the original TOFU codebase, which can be found at github.com/locuslab/tofu and is no longer maintained.
We provide several variants for each of the components in the unlearning pipeline.
Component | Available Options |
---|---|
Benchmarks | TOFU, MUSE |
Unlearning Methods | GradAscent, GradDiff, NPO, SimNPO, DPO |
Evaluation Metrics | Verbatim Probability, Verbatim ROUGE, QA-ROUGE, MIA Attacks, TruthRatio, Model Utility |
Datasets | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits) |
Model Families | TOFU: LLaMA-3.2, LLaMA-3.1, LLaMA-2; MUSE: LLaMA-2, ICLM; Additional: Phi-3.5, Phi-1.5, Gemma |
- Overview
- Available Components
- Quickstart
- Environment Setup
- Data Setup
- Updated TOFU benchmark
- Running Experiments
- Perform Unlearning
- Perform an Evaluation
- Running Baseline Experiments
- How to Add New Components
- Further Documentation
- Support & Contributors
- Citing this work
- Acknowledgements
- License
```bash
conda create -n unlearning python=3.11
conda activate unlearning
pip install .
pip install --no-build-isolation flash-attn==2.6.3
```
Download the log files containing metric results for the models used in the supported benchmarks (including the logs of the retain models against which unlearned models are compared):
```bash
python setup_data.py # populates saves/eval with evaluation results of the uploaded models
```
We've updated OpenUnlearning's TOFU benchmark target models to use a wider variety of newer architectures, with sizes ranging from 1B to 8B parameters: LLaMA 3.2 1B, LLaMA 3.2 3B, LLaMA 3.1 8B, and the original LLaMA-2 7B from the old version of TOFU.
For each architecture, we have finetuned models on four different splits of the TOFU dataset: `full`, `retain90`, `retain95`, and `retain99`, for a total of 16 finetuned models. The `full` model serves as the target (the base model for unlearning), and the retain models serve as the reference against which unlearned models are measured for the corresponding forget split. These models are hosted on Hugging Face, and their paths can be set in the experiment configs or via command-line overrides.
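The pairing between forget splits and retain models can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: it assumes the TOFU forget splits are named `forget01`, `forget05`, and `forget10`, generalizes the `forget10`/`retain90` pairing used in the unlearning example, and the helper `retain_split_for` is hypothetical.

```python
# Hypothetical helper (not part of OpenUnlearning): pairs each TOFU forget
# split with the retain split that holds the complementary share of the data.
ARCHITECTURES = ["LLaMA 3.2 1B", "LLaMA 3.2 3B", "LLaMA 3.1 8B", "LLaMA-2 7B"]
FINETUNE_SPLITS = ["full", "retain90", "retain95", "retain99"]
FORGET_SPLITS = ["forget01", "forget05", "forget10"]


def retain_split_for(forget_split: str) -> str:
    """Return the retain split matching a forget split, e.g. forget10 -> retain90."""
    if not forget_split.startswith("forget"):
        raise ValueError(f"not a forget split: {forget_split!r}")
    percent = int(forget_split[len("forget"):])  # "05" -> 5
    return f"retain{100 - percent:02d}"


if __name__ == "__main__":
    # One finetuned model per (architecture, split) pair.
    print(len(ARCHITECTURES) * len(FINETUNE_SPLITS))  # 16
    for split in FORGET_SPLITS:
        print(split, "->", retain_split_for(split))
```

Every non-`full` finetuning split is the retain reference for exactly one forget split, which is why four splits per architecture suffice.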
We provide an easily configurable interface for running evaluations by leveraging Hydra configs. For more detailed documentation on running experiments, commonly overridden arguments, interfacing with configurations, distributed training, and simple finetuning of models, refer to `docs/experiments.md`.
An example command for launching an unlearning process with `GradAscent` on the TOFU `forget10` split:
```bash
python src/train.py --config-name=unlearn.yaml experiment=unlearn/tofu/default \
  forget_split=forget10 retain_split=retain90 trainer=GradAscent task_name=SAMPLE_UNLEARN
```
- `experiment`: Path to the Hydra config file `configs/experiment/unlearn/tofu/default.yaml` with default experimental settings for TOFU unlearning, e.g. train dataset, eval benchmark details, model paths, etc.
- `forget_split`/`retain_split`: Sets the forget and retain dataset splits.
- `trainer`: Loads `configs/trainer/GradAscent.yaml` and overrides the unlearning method with the handler (see config) implemented in `src/trainer/unlearn/grad_ascent.py`.
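To run the same unlearning command over several forget splits, one option is to assemble the documented overrides programmatically. The sketch below is a hypothetical convenience wrapper: `build_unlearn_cmd`, `SPLIT_PAIRS`, and the `task_name` scheme are illustrative and not part of OpenUnlearning, and it reuses only the flags shown in the example above. By default it prints the commands rather than launching them.

```python
import shlex
import subprocess
import sys

# forget10 -> retain90 matches the example above; the other two pairs assume
# the same complementary-percentage naming (hypothetical table).
SPLIT_PAIRS = {"forget01": "retain99", "forget05": "retain95", "forget10": "retain90"}


def build_unlearn_cmd(trainer: str, forget_split: str, task_name: str) -> list[str]:
    """Return the documented src/train.py invocation as an argv list."""
    return [
        "python", "src/train.py",
        "--config-name=unlearn.yaml",
        "experiment=unlearn/tofu/default",
        f"forget_split={forget_split}",
        f"retain_split={SPLIT_PAIRS[forget_split]}",
        f"trainer={trainer}",
        f"task_name={task_name}",
    ]


if __name__ == "__main__":
    launch = "--run" in sys.argv[1:]
    for forget_split in SPLIT_PAIRS:
        # Hypothetical naming scheme so each run gets a distinct task_name.
        cmd = build_unlearn_cmd("GradAscent", forget_split, f"GradAscent_{forget_split}")
        print(shlex.join(cmd))  # the exact shell command that would be run
        if launch:
            subprocess.run(cmd, check=True)  # assumes cwd is the repository root
```

Passing `--run` launches each job in sequence. Without it, the script is a cheap way to check the overrides before spending GPU time.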
An example command for launching a TOFU evaluation process on the `forget10` split:
```bash
python src/eval.py --config-name=eval.yaml experiment=eval/tofu/default \
  model=Llama-3.2-1B-Instruct \
  model.model_args.pretrained_model_name_or_path=open-unlearning/tofu_Llama-3.2-1B-Instruct_full \
  task_name=SAMPLE_EVAL
```
- `experiment`: Path to the evaluation configuration `configs/experiment/eval/tofu/default.yaml`.
- `model`: Sets up the model and tokenizer configs for the `Llama-3.2-1B-Instruct` model.
- `model.model_args.pretrained_model_name_or_path`: Overrides the default experiment config to evaluate a model from a Hugging Face ID (a local model checkpoint path can be used as well).
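Conceptually, a dotted override such as `model.model_args.pretrained_model_name_or_path=...` replaces one leaf of the nested experiment config. The following is a deliberately simplified illustration of that idea on a plain Python dict, not Hydra's actual implementation: real Hydra also converts value types, supports override prefixes such as `+key` and `~key`, and validates keys against the composed config. The function `apply_override` and the nested layout of `config` are assumptions chosen to mirror the override key.

```python
from typing import Any


def apply_override(config: dict[str, Any], override: str) -> None:
    """Set one `dotted.key=value` entry in a nested dict, in place.

    Simplified illustration: values stay strings and no schema is checked.
    """
    key, sep, value = override.partition("=")  # split at the first "="
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node[part]  # a missing parent raises KeyError
    if leaf not in node:
        # Loosely mirrors Hydra, which requires a `+` prefix to add new keys.
        raise KeyError(f"cannot override missing key {key!r}")
    node[leaf] = value


if __name__ == "__main__":
    # Hypothetical stand-in for the composed experiment config.
    config = {"model": {"model_args": {"pretrained_model_name_or_path": "<default>"}}}
    apply_override(
        config,
        "model.model_args.pretrained_model_name_or_path="
        "open-unlearning/tofu_Llama-3.2-1B-Instruct_full",
    )
    print(config["model"]["model_args"]["pretrained_model_name_or_path"])
```

Because only the targeted leaf changes, the rest of the selected `model` config (tokenizer settings and so on) stays as loaded. That is why a local checkpoint path can be swapped in without editing any YAML.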
For more details about creating and running evaluations, refer to `docs/evaluation.md`.
The scripts below execute standard baseline unlearning experiments on the TOFU and MUSE datasets, evaluated using their corresponding benchmarks. The expected results are in `docs/results.md`.
```bash
bash scripts/tofu_unlearn.sh
bash scripts/muse_unlearn.sh
```
Adding a new component (trainer, evaluation metric, benchmark, model, or dataset) requires defining a new class, registering it, and creating a configuration file. Learn more about adding new components in `docs/components.md`.
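The "define a class, register it, reference it from a config" workflow typically follows a registry pattern. The sketch below shows that pattern in isolation. `TRAINER_REGISTRY`, `register_trainer`, `load_trainer`, `MyMethod`, and the `handler`/`args` config fields are hypothetical names chosen for illustration; the repository's actual registration mechanism is described in `docs/components.md` and may differ.

```python
from typing import Any, Callable, Dict, Type

# Hypothetical registry mapping config-visible names to trainer classes.
TRAINER_REGISTRY: Dict[str, Type] = {}


def register_trainer(name: str) -> Callable[[Type], Type]:
    """Class decorator that records a trainer class under `name`."""
    def decorator(cls: Type) -> Type:
        if name in TRAINER_REGISTRY:
            raise KeyError(f"trainer {name!r} is already registered")
        TRAINER_REGISTRY[name] = cls
        return cls
    return decorator


@register_trainer("MyMethod")
class MyMethod:
    """Placeholder for a new unlearning method (hypothetical)."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


def load_trainer(config: Dict[str, Any]) -> Any:
    """Instantiate the class named by the config's (hypothetical) `handler` field."""
    cls = TRAINER_REGISTRY[config["handler"]]
    return cls(**config.get("args", {}))


if __name__ == "__main__":
    # Stand-in for what a YAML file under configs/trainer/ might contain.
    trainer = load_trainer({"handler": "MyMethod", "args": {"learning_rate": 1e-5}})
    print(type(trainer).__name__, trainer.kwargs)
```

Keeping the name-to-class lookup in one registry is what lets a config file select a method by a plain string, so adding a method never requires editing the training entry point.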
Please feel free to raise a pull request for any new features after setting up the environment in development mode.
```bash
pip install .[dev]
```
For more in-depth information on specific aspects of the framework, refer to the following documents:
Documentation | Contains |
---|---|
docs/components.md | Instructions on how to add new components such as trainers, benchmarks, metrics, models, datasets, etc. |
docs/evaluation.md | Detailed instructions on creating and running evaluation metrics and benchmarks. |
docs/experiments.md | Guide on running experiments in various configurations and settings, including distributed training, fine-tuning, and overriding arguments. |
docs/hydra.md | Explanation of the Hydra features used in configuration management for experiments. |
docs/results.md | Reference results from various unlearning methods run using this framework on TOFU and MUSE benchmarks. |
Developed and maintained by Vineeth Dorna (@Dornavineeth) and Anmol Mekala (@molereddy).
If you encounter any issues or have questions, feel free to raise an issue in the repository 🛠️.
If you use OpenUnlearning in your research, please cite:
```bibtex
@misc{openunlearning2025,
  title={OpenUnlearning: A Unified Framework for LLM Unlearning Benchmarks},
  author={Dorna, Vineeth and Mekala, Anmol and Zhao, Wenlong and McCallum, Andrew and Kolter, J Zico and Maini, Pratyush},
  year={2025},
  howpublished={\url{https://github.com/locuslab/open-unlearning}},
  note={Accessed: February 27, 2025}
}
@inproceedings{maini2024tofu,
  title={TOFU: A Task of Fictitious Unlearning for LLMs},
  author={Maini, Pratyush and Feng, Zhili and Schwarzschild, Avi and Lipton, Zachary Chase and Kolter, J Zico},
  booktitle={First Conference on Language Modeling},
  year={2024}
}
```
BibTeX for other benchmarks used in OpenUnlearning:
```bibtex
@article{shi2024muse,
  title={Muse: Machine unlearning six-way evaluation for language models},
  author={Shi, Weijia and Lee, Jaechan and Huang, Yangsibo and Malladi, Sadhika and Zhao, Jieyu and Holtzman, Ari and Liu, Daogao and Zettlemoyer, Luke and Smith, Noah A and Zhang, Chiyuan},
  journal={arXiv preprint arXiv:2407.06460},
  year={2024}
}
```
- This repo is inspired by LLaMA-Factory.
- The TOFU and MUSE benchmarks served as the foundation for our re-implementation.
This project is licensed under the MIT License. See the `LICENSE` file for details.