Federated Learning in Chemical Engineering: A Tutorial on a Framework for Privacy-Preserving Collaboration Across Distributed Data Sources
Welcome to the Review on Federated Learning (FL) for Chemical Engineering repository. Federated Learning is where multiple decentralized devices collaboratively train a model without sharing their local data. Each device trains the model on its own data and only shares the model updates.
A Review paper is coming soon for the Industrial & Engineering Chemistry Research Journal! - Link - {WIP}
The comprehensive results section can be seen:
Experiment | Dataset | Train Accuracy | Test Accuracy | Train Loss | Test Loss | Time (sec) |
---|---|---|---|---|---|---|
Centralized Learning Experiments (Num Epochs: 25) | ||||||
PILL - Standard_CentralNN_PILL.ipynb | PILL: Pharmaceutical Dataset | 93.75% | 82.39% | 0.2600 | 0.5700 | 226.75 |
DNA - Standard_CentralNN_DNA.ipynb | Human DNA Sequence | 100.00% | 94.50% | 0.0000 | 0.5080 | 97.49 |
DNA (MMoE) - Standard_CentralNN_DNA+MRI.ipynb | Brain MRI Scan + Human DNA Sequence | 99.24% | 84.64% | 0.0274 | 0.7513 | 350.32 |
MRI (MMoE) - Standard_CentralNN_DNA+MRI.ipynb | Brain MRI Scan + Human DNA Sequence | 99.72% | 90.63% | 0.0115 | 0.8555 | |
HIV - Standard_CentralNN_HIV.ipynb | Human DNA Sequence | 96.77% | 95.45% | 0.1640 | 0.1770 | 100.79 |
Federated Learning Experiments (Num Rounds: 20, Num Clients: 10, Num Epochs: 10) | ||||||
PILL - Standard_FedNN_PILL.ipynb | PILL: Pharmaceutical Dataset | 93.54% | 94.79% | 0.2800 | 0.2070 | 2215.40 |
DNA - Standard_FedNN_DNA.ipynb | Human DNA Sequence | 100.00% | 94.09% | 0.0000 | 1.2030 | 3921.43 |
DNA (MMoE) - Standard_FedNN_DNA+MRI.ipynb | Brain MRI Scan + Human DNA Sequence | 99.00% | 94.75% | 0.0776 | 0.4167 | 5543.29 |
MRI (MMoE) - Standard_FedNN_DNA+MRI.ipynb | Brain MRI Scan + Human DNA Sequence | 99.38% | 85.56% | 0.1997 | 1.0720 | |
HIV - Standard_FedNN_HIV.ipynb | Human DNA Sequence | 96.31% | 95.34% | 0.1790 | 0.1870 | 1042.82 |
Results of Central and FL Experiments. In FL, both train loss and accuracy refer to a single client.
.
├── LICENSE
├── README.md
├── run-cpu.sh
├── run-gpu.sh
├── run-tff-cpu.sh
├── run-tff-gpu.sh
└── src
├── README.md
├── Standard_CentralNN_DNA.ipynb
├── Standard_CentralNN_DNA+MRI.ipynb
├── Standard_CentralNN_HIV.ipynb
├── Standard_CentralNN_PILL.ipynb
├── Standard_CentralNN_Wafer.ipynb
├── Standard_FedNN_DNA.ipynb
├── Standard_FedNN_DNA+MRI.ipynb
├── Standard_FedNN_HIV.ipynb
├── Standard_FedNN_PILL.ipynb
├── Standard_FedNN_Wafer.ipynb
├── Standard_FedQNN_DNA.ipynb
├── Standard_FedQNN_DNA+MRI.ipynb
├── Standard_FedQNN_HIV.ipynb
├── Standard_FedQNN_PILL.ipynb
├── Standard_FedQNN_Wafer.ipynb
├── TFF_FedNN_PILL.ipynb
└── utils
├── common.py
├── data_setup.py
├── engine.py
└── __init__.py
git clone https://github.com/elucidator8918/Fed-ML-Chem.git
cd Fed-ML-Chem
conda create -n fed python=3.10.12 anaconda
conda init
conda activate fed
bash run-cpu.sh
conda create -n fed python=3.10.12 anaconda
conda init
conda activate fed
bash run-gpu.sh
Choose the appropriate notebook based on your dataset and encryption preference:
-
DNA Sequence Dataset:
- Notebook:
src/Standard_FedNN_DNA.ipynb
- Description: This dataset includes DNA sequences used for various biological and genetic studies, focusing on sequence classification and pattern recognition.
- Notebook:
-
DNA+MRI Multimodal Dataset:
- Notebook:
src/Standard_FedNN_DNA+MRI.ipynb
- Description: It is used as a MoE with Multimodaility leveraging both DNA Sequence and MRI scans data to develop and evaluate models for detecting and interpreting tumors and dna classes.
- Notebook:
-
PILL Dataset:
- Notebook:
src/Standard_FedNN_PILL.ipynb
- Description: This dataset includes images of pharmaceutical pills from the Pill Dataset used for various pharmaceutical studies, focusing on classification and pattern recognition.
- Notebook:
-
DNA Sequence Dataset:
- Notebook:
src/Standard_FedQNN_DNA.ipynb
- Description: This dataset includes DNA sequences used for various biological and genetic studies, focusing on sequence classification and pattern recognition.
- Notebook:
-
DNA+MRI Multimodal Dataset:
- Notebook:
src/Standard_FedQNN_DNA+MRI.ipynb
- Description: It is used as a MoE with Multimodaility leveraging both DNA Sequence and MRI scans data to develop and evaluate models for detecting and interpreting tumors and dna classes.
- Notebook:
-
PILL Dataset:
- Notebook:
src/Standard_FedQNN_PILL.ipynb
- Description: This dataset includes images of pharmaceutical pills from the Pill Dataset used for various pharmaceutical studies, focusing on classification and pattern recognition.
- Notebook:
This project is licensed under the MIT License. See the LICENSE file for details.