FedBlockParadox

This repository contains a custom-built simulator for decentralized federated learning systems, developed as part of our master's degree thesis between February 2024 and October 2024. The simulator is implemented in Python and is designed to replicate the behavior of a blockchain-assisted federated learning system in a fully decentralized environment. It supports the simulation of various configurations, allowing experimentation with different consensus mechanisms, validation techniques, and aggregation methods.

The simulator aims to facilitate research and analysis in decentralized federated learning, providing a powerful tool to study vulnerabilities, test defensive mechanisms, and evaluate system performance under diverse configurations.


🚀 Key Features

  • Flexible Configuration: Use a JSON configuration file to customize datasets, node behaviors, consensus algorithms (e.g., PoW, PoS, committee-based), and validation mechanisms.
  • Malicious Node Behavior: Simulate common malicious trainer behaviors such as label flipping, additive noise, and targeted data poisoning.
  • Dataset Management: Partition datasets into IID or N-IID subsets, allowing for diverse training scenarios (a partitioning sketch follows this list).
  • Consensus Algorithms: Explore the impact of Proof-of-Work, Proof-of-Stake, and committee-based consensus algorithms on federated learning.
  • Validation and Aggregation: Test multiple validation and aggregation mechanisms to evaluate their effectiveness in improving system robustness.
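
To make the dataset-management feature above concrete, below is a minimal sketch of Dirichlet-based N-IID partitioning, a common way such splits are produced in federated-learning experiments. The function name and the `alpha` parameter are illustrative assumptions, not the simulator's actual API; in the simulator itself, partitioning is driven by the JSON configuration.

```python
import numpy as np

def partition_non_iid(labels: np.ndarray, num_nodes: int,
                      alpha: float = 0.5, seed: int = 0) -> list[np.ndarray]:
    """Split sample indices across nodes using a Dirichlet prior.

    Small alpha -> heavily skewed (N-IID) class mixes per node;
    large alpha -> approximately IID. Hypothetical helper, not the
    simulator's API.
    """
    rng = np.random.default_rng(seed)
    per_node: list[list[int]] = [[] for _ in range(num_nodes)]
    for cls in np.unique(labels):
        cls_idx = rng.permutation(np.flatnonzero(labels == cls))
        # Fraction of this class that each node receives.
        proportions = rng.dirichlet(alpha * np.ones(num_nodes))
        cuts = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for node, shard in enumerate(np.split(cls_idx, cuts)):
            per_node[node].extend(shard.tolist())
    return [np.asarray(idx) for idx in per_node]

# Example: a skewed 10-node split over 60,000 synthetic labels.
labels = np.random.default_rng(1).integers(0, 10, size=60_000)
shards = partition_non_iid(labels, num_nodes=10, alpha=0.3)
print([len(s) for s in shards])
```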

📂 Repository Structure

|-- datasets/                   # Examples of publicly available datasets pre-processed for use with the simulator
|-- docs/                       # Docs
|-- models/                     # Examples of neural network architectures and initial weights
|-- examples/                   # Examples of simulations (e.g., JSON configurations, output files, and analyses performed by means of logger_to_graph.py)
|-- src/                        # Source code
    |-- shared/                 # Baseline modules; incomplete as they are not specialized for any specific consensus algorithm
    |-- pos/                    # Extensions of the shared modules, specialized for the Proof-of-Stake consensus algorithm
    |-- pow/                    # Extensions of the shared modules, specialized for the Proof-of-Work consensus algorithm
    |-- committee/              # Extensions of the shared modules, specialized for the 'Committee-based' consensus algorithm
    |-- __init__.py
    |-- main.py
|-- dataset_creator.ipynb       # Notebook for manipulating datasets to prepare them for use with the simulator
|-- datasets_models_attacks_visualizer.ipynb   # Notebook illustrating the core ideas behind the simulator: the manipulations needed to use certain datasets, the creation of neural networks, and the core behavior of some malicious attacks
|-- model_creator.ipynb         # Notebook for creating neural network architectures and initial weights required for simulations conducted for our thesis
|-- label_flipping_score.py     # Script to evaluate the effectiveness of label-flipping attacks on the global model trained during a simulation
|-- targeted_poisoning_score.py # Script to evaluate the effectiveness of targeted data poisoning (e.g., backdoor attacks) on the global model trained during a simulation
|-- logger_to_graph.py          # Script to generate visual insights from simulation log files
|-- LICENSE                     # License file for the repository
|-- README.md                   # Documentation for the repository
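
As the tree above suggests, the source code follows an extension pattern: src/shared/ provides consensus-agnostic base modules that src/pos/, src/pow/, and src/committee/ specialize. The sketch below illustrates that pattern only; the class and method names are hypothetical, not taken from the repository.

```python
import random
from abc import ABC, abstractmethod

class ConsensusAlgorithm(ABC):
    """Hypothetical consensus-agnostic base, in the spirit of src/shared/."""

    @abstractmethod
    def select_block_proposer(self, round_id: int) -> str:
        """Return the id of the node allowed to append the next block."""

class ProofOfStakeConsensus(ConsensusAlgorithm):
    """Hypothetical specialization, in the spirit of src/pos/."""

    def __init__(self, stakes: dict[str, float]):
        self.stakes = stakes

    def select_block_proposer(self, round_id: int) -> str:
        # Stake-weighted choice, seeded by round for reproducibility.
        rng = random.Random(round_id)
        nodes, weights = zip(*self.stakes.items())
        return rng.choices(nodes, weights=weights, k=1)[0]

consensus = ProofOfStakeConsensus({"node_a": 10.0, "node_b": 30.0})
print(consensus.select_block_proposer(round_id=1))
```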

🧑‍💻 How to Use

1. Clone the Repository

git clone /~https://github.com/federicocaroli/FedBlockParadox.git
cd FedBlockParadox

2. Install Dependencies

Install the required Python packages, including specialized NVIDIA libraries:

python -m pip install --extra-index-url https://pypi.nvidia.com \
    numpy==1.25.2 scipy==1.11.4 matplotlib==3.9.1 tabulate==0.9.0 \
    psutil==5.9.5 datasets==2.19.2 flwr_datasets==0.2.0 flwr==1.9.0 \
    pympler==1.1 tensorrt-bindings==8.6.1 tensorrt-libs==8.6.1 \
    tensorflow[and-cuda]==2.15.0 setproctitle==1.3.3

3. Prepare Configuration

Ensure you have a valid JSON configuration file for the simulation. Example configuration files are available in the examples/ directory.
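
If you prefer to generate a configuration programmatically, the snippet below shows the general idea. Every key in it is a hypothetical placeholder; the real schema is defined by the simulator, so treat the files in examples/ as the authoritative reference.

```python
import json

# Every key below is a hypothetical placeholder; consult examples/
# for the actual configuration schema.
config = {
    "consensus": "pos",          # e.g., "pow", "pos", or "committee"
    "num_nodes": 20,
    "malicious_nodes": {"count": 4, "behavior": "label_flipping"},
    "dataset": {"name": "mnist", "partitioning": "non_iid"},
    "aggregation": "fedavg",
    "log_file": "./logs/simulation.log",
}

with open("my_config.json", "w") as f:
    json.dump(config, f, indent=4)
```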

4. Run the Simulator

Execute the simulator using the main script and specify your configuration file:

python -m src.main "config_path" > ./tmp.txt 2>&1
  • Replace config_path with the path to your configuration file.
  • Note: tmp.txt captures the general stdout/stderr output of the various Python modules; the simulator's own log file is written to the path specified in the JSON configuration file.

5. Analyze Results

  • Review the log file to analyze the simulation's progress and outcomes.
  • Use visualization scripts like logger_to_graph.py to gain insights.
  • If a simulation involves malicious nodes performing label-flipping or targeted data poisoning attacks, evaluate their impact using label_flipping_score.py or targeted_poisoning_score.py (the metric behind the former is sketched below).
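
For context on what such an evaluation measures, a common label-flipping metric is the fraction of source-class test samples that the global model predicts as the attacker's target class. The sketch below illustrates that idea only; it is not the implementation of label_flipping_score.py.

```python
import numpy as np

def label_flipping_success(y_true: np.ndarray, y_pred: np.ndarray,
                           source_class: int, target_class: int) -> float:
    """Fraction of source-class samples predicted as the target class.

    Illustrative metric only; not the implementation of
    label_flipping_score.py.
    """
    mask = y_true == source_class
    if not mask.any():
        return 0.0
    return float(np.mean(y_pred[mask] == target_class))

# Example: attacker flipped class 1 -> 7 during training.
y_true = np.array([1, 1, 1, 0, 7])
y_pred = np.array([7, 1, 7, 0, 7])
print(label_flipping_success(y_true, y_pred, source_class=1, target_class=7))  # ~0.67
```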

🔧 Configuration

  • Adjust the JSON configuration file to match your experimental setup (datasets, node behaviors, consensus, validation, and aggregation parameters).
  • Update dependency versions and file paths as needed.
  • Customize datasets and models to fit your use case, e.g., with dataset_creator.ipynb and model_creator.ipynb.

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.


🤝 Contributions

Contributions are welcome! Feel free to fork this repository, submit issues, or create pull requests.


Happy researching and exploring new possibilities! 😊
