This repository contains a custom-built simulator for decentralized federated learning systems, developed as part of our master's degree thesis between February 2024 and October 2024. The simulator is implemented in Python and is designed to replicate the behavior of a blockchain-assisted federated learning system in a fully decentralized environment. It supports the simulation of various configurations, allowing experimentation with different consensus mechanisms, validation techniques, and aggregation methods.
The simulator is intended to support research on decentralized federated learning: it can be used to study vulnerabilities, test defensive mechanisms, and evaluate system performance under different configurations.
- Flexible Configuration: Use a JSON configuration file to customize datasets, node behaviors, consensus algorithms (e.g., PoW, PoS, committee-based), and validation mechanisms.
- Malicious Node Behavior: Simulate common malicious trainer behaviors such as label flipping, additive noise, and targeted data poisoning.
- Dataset Management: Partition datasets into IID or non-IID subsets, allowing for diverse training scenarios.
- Consensus Algorithms: Explore the impact of Proof-of-Work, Proof-of-Stake, and committee-based consensus algorithms on federated learning.
- Validation and Aggregation: Test multiple validation and aggregation mechanisms to evaluate their effectiveness in improving system robustness.
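To make the IID versus non-IID distinction and the label-flipping behavior listed above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the simulator's own code: the function names `partition_iid`, `partition_non_iid`, and `flip_labels` are hypothetical, the non-IID split uses a simple sort-by-label scheme, and the attack is modeled as a fixed source-to-target class swap.

```python
import random
from collections import Counter


def partition_iid(labels, num_nodes, seed=0):
    """Shuffle sample indices and deal them round-robin, so every node
    receives roughly the same class mix (hypothetical helper)."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    return [indices[i::num_nodes] for i in range(num_nodes)]


def partition_non_iid(labels, num_nodes):
    """Sort indices by label and cut them into contiguous shards, so each
    node sees only a few classes (one simple non-IID scheme among many)."""
    indices = sorted(range(len(labels)), key=lambda i: labels[i])
    shard = len(indices) // num_nodes
    parts = [indices[n * shard:(n + 1) * shard] for n in range(num_nodes)]
    parts[-1].extend(indices[num_nodes * shard:])  # leftovers go to the last node
    return parts


def flip_labels(labels, source, target):
    """Label flipping: relabel every `source` sample as `target`."""
    return [target if y == source else y for y in labels]


if __name__ == "__main__":
    labels = [i % 4 for i in range(40)]  # toy dataset: 4 classes, 10 samples each
    splits = (("IID", partition_iid(labels, 4)),
              ("non-IID", partition_non_iid(labels, 4)))
    for name, parts in splits:
        for node, idx in enumerate(parts):
            # Show the per-node class histogram for each partitioning scheme.
            print(name, "node", node, dict(Counter(labels[i] for i in idx)))
```

In the toy run, each IID node holds ten samples drawn from a shuffled pool, while each non-IID node holds a single class. A malicious trainer would apply something like `flip_labels` to its local shard before training.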
|-- datasets/ # Examples of publicly available datasets pre-processed for use with the simulator
|-- docs/ # Docs
|-- models/ # Examples of neural network architectures and initial weights
|-- examples/ # Examples of simulations (e.g., JSON configurations, output files, and analyses performed by means of logger_to_graph.py)
|-- src/ # Source code
|   |-- shared/ # Baseline modules; incomplete as they are not specialized for any specific consensus algorithm
|   |-- pos/ # Extensions of the shared modules, specialized for the Proof-of-Stake consensus algorithm
|   |-- pow/ # Extensions of the shared modules, specialized for the Proof-of-Work consensus algorithm
|   |-- committee/ # Extensions of the shared modules, specialized for the 'Committee-based' consensus algorithm
|   |-- __init__.py
|   |-- main.py
|-- dataset_creator.ipynb # Notebook for manipulating datasets to prepare them for use with the simulator
|-- datasets_models_attacks_visualizer.ipynb # Notebook that shows the core ideas behind the simulator. It shows the manipulations needed to use certain datasets, the creation of neural networks and the core behavior of some malicious attacks
|-- model_creator.ipynb # Notebook for creating neural network architectures and initial weights required for simulations conducted for our thesis
|-- label_flipping_score.py # Script to evaluate the effectiveness of label-flipping attacks on the global model trained during a simulation
|-- targeted_poisoning_score.py # Script to evaluate the effectiveness of targeted data poisoning (e.g., backdoor attacks) on the global model trained during a simulation
|-- logger_to_graph.py # Script to generate visual insights from simulation log files
|-- LICENSE # License file for the repository
|-- README.md # Documentation for the repository
git clone https://github.com/federicocaroli/FedBlockParadox.git
cd FedBlockParadox
Install the required Python packages, including specialized NVIDIA libraries:
python -m pip install --extra-index-url https://pypi.nvidia.com \
numpy==1.25.2 scipy==1.11.4 matplotlib==3.9.1 tabulate==0.9.0 \
psutil==5.9.5 datasets==2.19.2 flwr_datasets==0.2.0 flwr==1.9.0 \
pympler==1.1 tensorrt-bindings==8.6.1 tensorrt-libs==8.6.1 \
tensorflow[and-cuda]==2.15.0 setproctitle==1.3.3
Ensure you have a valid JSON configuration file for the simulation. Example configuration files are available in the `examples/` directory.
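A quick way to confirm that a configuration file is at least well-formed JSON before starting a long run is sketched below. The helper name `load_config` is hypothetical, and the check that the top level is a JSON object is an assumption; the sketch does not validate any of the simulator's actual configuration keys.

```python
import json
from pathlib import Path


def load_config(path):
    """Parse a JSON configuration file and return it as a dict.

    Raises FileNotFoundError if the file is missing and ValueError if the
    content is not valid JSON or (assumption) not a top-level JSON object.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"{path}: invalid JSON at line {err.lineno}, "
            f"column {err.colno}: {err.msg}"
        ) from err
    if not isinstance(config, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return config
```

Calling `sorted(load_config("config_path"))` lists the top-level keys, which can then be compared by eye against one of the example configurations.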
Execute the simulator using the main script and specify your configuration file:
python -m src.main "config_path" > ./tmp.txt 2>&1
- Replace `config_path` with the path to your configuration file.
- Note: `tmp.txt` will contain general logs generated by various Python modules. The actual log file path is specified in the JSON configuration file.
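When simulations are launched from a script rather than a shell, the same invocation can be reproduced with Python's `subprocess` module. The sketch below is a hypothetical wrapper (`build_command` and `run_simulation` are not part of the repository) and assumes it is run from the repository root; it sends both standard output and standard error to a single general log file.

```python
import subprocess
import sys


def build_command(config_path):
    """Return the argument list equivalent to `python -m src.main "config_path"`."""
    return [sys.executable, "-m", "src.main", config_path]


def run_simulation(config_path, general_log="tmp.txt"):
    """Run the simulator and return its exit code.

    stdout and stderr share one file handle, so their lines are appended
    in order instead of overwriting each other.
    """
    with open(general_log, "w", encoding="utf-8") as log:
        result = subprocess.run(
            build_command(config_path),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return result.returncode
```

A non-zero return value from `run_simulation` indicates that the process exited with an error, in which case the general log file is the first place to look.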
- Review the log file to analyze the simulation's progress and outcomes.
- Use visualization scripts like `logger_to_graph.py` to gain insights.
- If the simulations involve malicious nodes performing label flipping or targeted data poisoning attacks, evaluate their impact using `label_flipping_score.py` or `targeted_poisoning_score.py`.
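For orientation, one common way to quantify a label-flipping or targeted attack is the attack success rate: among test samples of the attacked class, the fraction that the global model assigns to the attacker's chosen class. The sketch below illustrates that metric only; it is an assumption about what such a score measures, not a description of how `label_flipping_score.py` or `targeted_poisoning_score.py` are implemented, and the function name is hypothetical.

```python
def attack_success_rate(true_labels, predictions, source, target):
    """Fraction of samples whose true class is `source` that the model
    predicts as `target`. Returns 0.0 if no `source` samples are present."""
    hits = 0
    total = 0
    for y, p in zip(true_labels, predictions):
        if y == source:
            total += 1
            if p == target:
                hits += 1
    return hits / total if total else 0.0
```

A value close to 0 suggests the attack had little effect on the global model, while a value close to 1 suggests most samples of the attacked class are misclassified as intended by the attacker.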
- Update dependencies and paths as needed.
- Modify scripts to reflect your specific experimental setup.
- Customize datasets and models to fit your use case.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Feel free to fork this repository, submit issues, or create pull requests.
Happy researching and exploring new possibilities! 😊