This repository is a "starter package" for the Health Privacy Challenge, which runs as part of the CAMDA 2025 conference. The Health Privacy Challenge is organized in the context of the European Lighthouse on Safe and Secure AI (ELSA, https://elsa-ai.eu).
The Health Privacy Challenge consists of two tracks:
Track I runs in a “Blue Team (🫐) vs Red Team (🍅)” scheme.
- The blue teams develop novel privacy-preserving generative methods that can mitigate privacy risks while preserving biological insights for bulk gene expression datasets,
- The red teams launch trustworthy and realistic membership inference attacks (MIA) against blue teams’ solutions to assess whether these generative methods can withstand privacy attacks.
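To make the red-team idea concrete, here is a minimal sketch of one simple membership inference heuristic: score each candidate record by how close it lies to the nearest synthetic sample. This is an illustration only, not the baseline shipped in this repository. The function names (`nearest_distance`, `membership_scores`), the toy Gaussian data, and the deliberately leaky "generator" are all assumptions made for the example.

```python
import math
import random


def nearest_distance(record, synthetic):
    """Euclidean distance from `record` to its closest synthetic sample."""
    return min(math.dist(record, s) for s in synthetic)


def membership_scores(candidates, synthetic):
    """Higher score = candidate looks more like a training-set member."""
    return [-nearest_distance(c, synthetic) for c in candidates]


# Toy data (made up; NOT the challenge datasets).
rng = random.Random(0)
n_genes = 5
members = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(20)]
non_members = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(20)]

# A deliberately leaky "generator": training records plus tiny noise.
synthetic = [[x + rng.gauss(0, 0.01) for x in m] for m in members]

member_scores = membership_scores(members, synthetic)
non_member_scores = membership_scores(non_members, synthetic)

print("lowest member score:      ", min(member_scores))
print("highest non-member score: ", max(non_member_scores))
```

Because the toy generator almost copies its training data, members sit much closer to the synthetic samples than non-members do, so the two score groups separate cleanly. A privacy-preserving generator from a blue team should make that separation much harder to achieve.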
Track II invites participants to explore the privacy and utility of synthetic single-cell gene expression (scRNA-seq) data. Participants are encouraged to:
- investigate and reveal potential privacy risks linked to generating synthetic scRNA-seq datasets.
- develop privacy-preserving generative methods that balance data privacy and utility.
- propose novel evaluation metrics and strategies to assess both utility and privacy preservation in a multi-sample donor setting.
We are looking forward to engaging with you and working together to deepen our understanding of privacy in healthcare. 🤗
This repository contains:
- 👩‍💻 Baseline code for generative methods (Blue Teams) and Membership Inference Attack algorithms (Red Teams).
- 📝 Documentation that details setup and submission instructions for the competition.
- 📎 Submission templates to base your submissions on.
Other resources:
- 💬 CAMDA Health Privacy Challenge Google Groups: Join us for questions, discussions and further announcements.
- 🌐 CAMDA Challenge website: Follow CAMDA 2025 for conference announcements.
- 🌐 ELSA Benchmark method submission platform: The platform to register, to download datasets, and to submit your benchmark methods.
- 📚 Relevant papers: https://arxiv.org/abs/2402.04912
Both teams, please check out Getting Started to set up and use the starter package!
Datasets are available for download on the ELSA Benchmarks Competition platform after registering and signing the data download agreement.
We re-distribute pre-processed versions of two open-access TCGA RNA-seq datasets, available through the GDC portal:
- TCGA-BRCA RNASeq
  Dimensions: <1089 x 978>. Details: suitable for cancer subtype prediction (5 subtypes).
- TCGA COMBINED RNASeq (with 10 different cancer tissues)
  Dimensions: <4323 x 978>. Details: suitable for cancer tissue of origin prediction (10 tissues).
Navigate here for details about the pre-processing steps.
We re-distribute raw counts of the OneK1K single-cell RNA-seq dataset (https://onek1k.org/), a cohort containing 1.26 million peripheral blood mononuclear cells (PBMCs) from 981 donors, generously provided by Joseph Powell and the authors (Yazar et al., 2022) at the Garvan Institute of Medical Research.
- Train dataset: <633711 cells from 490 donors x 25834 genes >
- Test dataset: <634022 cells from 491 donors x 25834 genes >
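As a quick sanity check, the split sizes listed above can be added up to confirm that they match the cohort totals (981 donors, roughly 1.26 million cells). The variable names below are chosen for illustration only; the numbers are copied from the list.

```python
# Figures copied from the train/test description above.
train_cells, train_donors = 633_711, 490
test_cells, test_donors = 634_022, 491
n_genes = 25_834  # same gene set in both splits

total_cells = train_cells + test_cells
total_donors = train_donors + test_donors

print(f"total cells:  {total_cells:,}")   # 1,267,733
print(f"total donors: {total_donors}")    # 981
print(f"train share of cells: {train_cells / total_cells:.3f}")
```

The split is close to 50/50 in both donors and cells.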
Navigate to the Track II homepage for details about the pre-processing steps.
This competition is designed as a collaborative effort between the European Molecular Biology Laboratory (EMBL), the CISPA Helmholtz Center for Information Security, and the University of Helsinki, with the support of the Barcelona Computer Vision Center (CVC), within the context of the ELSA project.
- EMBL: Hakime Öztürk, Julio Saez-Rodriguez and Oliver Stegle
- CISPA: Tejumade Afonja, Ruta Binkyte and Mario Fritz
- University of Helsinki: Joonas Jälkö and Antti Honkela
and in collaboration with the Saez-Rodriguez group in Track II and the review process:
- University of Heidelberg: Sebastian Lobentanzer, Pablo R. Mier, Attila Gabor.
We also thank Katharina Mikulik (DKFZ), Kevin Domanegg (DKFZ), and Danai Vaigaki (EMBL) for helpful feedback.