This is the code for the research project Representer Sketch in Federated Learning. The code is based on the paper *Federated Learning on Non-IID Data Silos: An Experimental Study*.

Here is an example of how to run the code:
```shell
python experiment_new.py --model=mlp \
    --dataset=mnist \
    --alg=fedavg \
    --lr=0.01 \
    --batch-size=64 \
    --epochs=10 \
    --n_parties=10 \
    --mu=0.01 \
    --rho=0.9 \
    --comm_round=50 \
    --partition=noniid-labeldir \
    --beta=0.5 \
    --device='cuda:0' \
    --datadir='./data/' \
    --logdir='./logs/' \
    --noise=0 \
    --sample=1 \
    --init_seed=0
```
For the MNIST dataset, use `experiment_new.py`. For the CIFAR10 dataset, use `exp_new.py`.
Parameters for `experiment_new.py`:

| Parameter | Description |
| --- | --- |
| `model` | The model architecture. Options: `mlp`, `RS`. Default = `mlp`. |
| `lr` | Learning rate for the local models. Default = `0.01`. |
| `batch-size` | Batch size. Default = `64`. |
| `epochs` | Number of local training epochs. Default = `5`. |
| `n_parties` | Number of parties. Default = `2`. |
| `mu` | The proximal term parameter for FedProx. Default = `1`. |
| `rho` | The parameter controlling momentum SGD. Default = `0`. |
| `comm_round` | Number of communication rounds. Default = `50`. |
| `partition` | The data partitioning strategy. Options: `homo`, `noniid-labeldir`, `noniid-#label1` (or 2, 3, ..., meaning the fixed number of labels each party owns), `real`, `iid-diff-quantity`. Default = `homo`. |
| `beta` | The concentration parameter of the Dirichlet distribution for heterogeneous partition. Default = `0.5`. |
| `device` | The device to run the program on. Default = `cuda:0`. |
| `datadir` | The path of the dataset. Default = `./data/`. |
| `logdir` | The path to store the logs. Default = `./logs/`. |
| `noise` | Maximum variance of the Gaussian noise added to each local party. Default = `0`. |
| `sample` | Ratio of parties that participate in each communication round. Default = `1`. |
| `init_seed` | The initial seed. Default = `0`. |
| `optimizer` | The optimizer used during training. Options: `sgd`, `adam`, `amsgrad`. Default = `sgd`. |
| `reg` | Regularization term. Default = `1e-5`. |
Parameters for `exp_new.py`:

| Parameter | Description |
| --- | --- |
| `model` | The model architecture. Options: `mlp`, `RS`. Default = `mlp`. |
| `dataset` | Dataset to use. Options: `mnist`, `cifar10_pre`, `cifar100_pre`. Default = `mnist`. |
| `lr` | Learning rate for the local models. Default = `0.01`. |
| `batch-size` | Batch size. Default = `64`. |
| `epochs` | Number of local training epochs. Default = `5`. |
| `n_parties` | Number of parties. Default = `2`. |
| `mu` | The proximal term parameter for FedProx. Default = `1`. |
| `rho` | The parameter controlling momentum SGD. Default = `0`. |
| `comm_round` | Number of communication rounds. Default = `50`. |
| `partition` | The data partitioning strategy. Options: `homo`, `noniid-labeldir`, `noniid-#label1` (or 2, 3, ..., meaning the fixed number of labels each party owns), `real`, `iid-diff-quantity`. Default = `homo`. |
| `beta` | The concentration parameter of the Dirichlet distribution for heterogeneous partition. Default = `0.5`. |
| `device` | The device to run the program on. Default = `cuda:0`. |
| `datadir` | The path of the dataset. Default = `./data/`. |
| `logdir` | The path to store the logs. Default = `./logs/`. |
| `noise` | Maximum variance of the Gaussian noise added to each local party. Default = `0`. |
| `sample` | Ratio of parties that participate in each communication round. Default = `1`. |
| `init_seed` | The initial seed. Default = `0`. |
| `optimizer` | The optimizer used during training. Options: `sgd`, `adam`, `amsgrad`. Default = `sgd`. |
| `reg` | Regularization term. Default = `1e-5`. |
| `pretrain` | Whether to use a pretrained model. Options: `pre`, `no`. Default = `no`. |
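The `noise` parameter caps the variance of the Gaussian noise applied to a party's local data for noise-based feature imbalance. One plausible scheme, sketched below under the assumption that the noise level grows linearly with the party index (the repo's actual rule may differ), is:

```python
import numpy as np

def add_party_noise(x, party_id, n_parties, max_sigma=0.1, seed=0):
    """Hypothetical sketch: party i receives Gaussian noise with std
    max_sigma * i / (n_parties - 1), so party 0 stays clean and the
    last party gets the maximum noise level."""
    rng = np.random.default_rng(seed)
    sigma = max_sigma * party_id / (n_parties - 1)
    return x + rng.normal(0.0, sigma, size=x.shape)

x = np.zeros((4, 4))
clean = add_party_noise(x, party_id=0, n_parties=10)  # std 0: unchanged
noisy = add_party_noise(x, party_id=9, n_parties=10)  # std max_sigma
```

The function name and the linear scaling rule are assumptions for illustration only.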
You can call the function `get_partition_dict()` in `experiments.py` to access `net_dataidx_map`. `net_dataidx_map` is a dictionary: its keys are party IDs, and the value of each key is a list containing the indices of the data assigned to that party. For our experiments, we usually set `init_seed=0`; when we repeat experiments for some setting, we change `init_seed` to 1 or 2. The default value of `noise` is 0 unless stated otherwise. We list the ways to obtain our data partitions as follows.
- Quantity-based label imbalance: `partition=noniid-#label1`, `noniid-#label2`, or `noniid-#label3`
- Distribution-based label imbalance: `partition=noniid-labeldir`, `beta=0.5` or `0.1`
- Noise-based feature imbalance: `partition=homo`, `noise=0.1` (noise does not actually affect `net_dataidx_map`)
- Synthetic feature imbalance & real-world feature imbalance: `partition=real`
- Quantity skew: `partition=iid-diff-quantity`, `beta=0.5` or `0.1`
- IID setting: `partition=homo`
- Mixed skew: `partition=mixed` for the mixture of distribution-based label imbalance and quantity skew; `partition=noniid-labeldir` with `noise=0.1` for the mixture of distribution-based label imbalance and noise-based feature imbalance
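The distribution-based label imbalance setting can be sketched as follows: for each class, the sample indices are split across parties with proportions drawn from a Dirichlet distribution, so a smaller `beta` yields more skewed per-party label distributions. This is an illustrative standalone sketch, not the repo's exact partition code:

```python
import numpy as np

def dirichlet_label_partition(labels, n_parties, beta, seed=0):
    """Illustrative noniid-labeldir partition: split each class's
    indices across parties with proportions drawn from Dir(beta)."""
    rng = np.random.default_rng(seed)
    net_dataidx_map = {i: [] for i in range(n_parties)}
    for c in np.unique(labels):
        idx_c = np.where(labels == c)[0]
        rng.shuffle(idx_c)
        # per-party proportions for this class; splits are cumulative cut points
        proportions = rng.dirichlet([beta] * n_parties)
        splits = (np.cumsum(proportions)[:-1] * len(idx_c)).astype(int)
        for party, part in enumerate(np.split(idx_c, splits)):
            net_dataidx_map[party].extend(part.tolist())
    return net_dataidx_map

labels = np.repeat(np.arange(10), 100)  # 10 classes, 100 samples each
mapping = dirichlet_label_partition(labels, n_parties=5, beta=0.5)
```

Every index is assigned to exactly one party, matching the `net_dataidx_map` structure described above.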
Here is an explanation of the parameters of the function `get_partition_dict()`.
| Parameter | Description |
| --- | --- |
| `dataset` | Dataset to use. Options: `mnist`, `cifar10`, `fmnist`, `svhn`, `generated`, `femnist`, `a9a`, `rcv1`, `covtype`. |
| `partition` | The data partitioning strategy. Options: `homo`, `noniid-labeldir`, `noniid-#label1` (or 2, 3, ..., meaning the fixed number of labels each party owns), `real`, `iid-diff-quantity`. |
| `n_parties` | Number of parties. |
| `init_seed` | The initial seed. |
| `datadir` | The path of the dataset. |
| `logdir` | The path to store the logs. |
| `beta` | The concentration parameter of the Dirichlet distribution for heterogeneous partition. |
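For the `iid-diff-quantity` option, only the number of samples per party varies while the data within each party stays IID. A minimal sketch of this quantity-skew idea (an illustration with assumed names, not the repo's implementation):

```python
import numpy as np

def quantity_skew_partition(n_samples, n_parties, beta, seed=0):
    """Illustrative iid-diff-quantity partition: shuffle all indices,
    then give each party a share drawn from Dir(beta); smaller beta
    makes the party sizes more unequal."""
    rng = np.random.default_rng(seed)
    idxs = rng.permutation(n_samples)
    proportions = rng.dirichlet([beta] * n_parties)
    splits = (np.cumsum(proportions)[:-1] * n_samples).astype(int)
    return {i: part.tolist() for i, part in enumerate(np.split(idxs, splits))}

part = quantity_skew_partition(n_samples=200, n_parties=4, beta=0.5)
```

As with `net_dataidx_map`, the keys are party IDs and the values are disjoint lists of sample indices.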