jiamubai/RS-in-Federated-Learning

NIID-Bench

This repository contains the code for the research project Representer Sketch in Federated Learning. The code is based on the paper Federated Learning on Non-IID Data Silos: An Experimental Study (NIID-Bench).

Usage

Here is an example of how to run the code:

python experiment_new.py --model=mlp \
    --dataset=mnist \
    --alg=fedavg \
    --lr=0.01 \
    --batch-size=64 \
    --epochs=10 \
    --n_parties=10 \
    --mu=0.01 \
    --rho=0.9 \
    --comm_round=50 \
    --partition=noniid-labeldir \
    --beta=0.5 \
    --device='cuda:0' \
    --datadir='./data/' \
    --logdir='./logs/' \
    --noise=0 \
    --sample=1 \
    --init_seed=0

For the MNIST dataset, use experiment_new.py; for the CIFAR10 dataset, use exp_new.py.

MNIST

| Parameter | Description |
| --------- | ----------- |
| model | The model architecture. Options: `mlp`, `RS`. Default = `mlp`. |
| lr | Learning rate for the local models, default = 0.01. |
| batch-size | Batch size, default = 64. |
| epochs | Number of local training epochs, default = 5. |
| n_parties | Number of parties, default = 2. |
| mu | The proximal term parameter for FedProx, default = 1. |
| rho | The parameter controlling momentum SGD, default = 0. |
| comm_round | Number of communication rounds, default = 50. |
| partition | The data partitioning strategy. Options: `homo`, `noniid-labeldir`, `noniid-#label1` (or 2, 3, ..., i.e., the fixed number of labels each party owns), `real`, `iid-diff-quantity`. Default = `homo`. |
| beta | The concentration parameter of the Dirichlet distribution for heterogeneous partitioning, default = 0.5. |
| device | The device to run the program on, default = `cuda:0`. |
| datadir | The path of the dataset, default = `./data/`. |
| logdir | The path to store the logs, default = `./logs/`. |
| noise | Maximum variance of the Gaussian noise added to each local party, default = 0. |
| sample | Ratio of parties that participate in each communication round, default = 1. |
| init_seed | The initial random seed, default = 0. |
| optimizer | The optimizer used during training. Options: `sgd`, `adam`, `amsgrad`. Default = `sgd`. |
| reg | Regularization strength, default = 1e-5. |

CIFAR

| Parameter | Description |
| --------- | ----------- |
| model | The model architecture. Options: `mlp`, `RS`. Default = `mlp`. |
| dataset | Dataset to use. Options: `mnist`, `cifar10_pre`, `cifar100_pre`. Default = `mnist`. |
| lr | Learning rate for the local models, default = 0.01. |
| batch-size | Batch size, default = 64. |
| epochs | Number of local training epochs, default = 5. |
| n_parties | Number of parties, default = 2. |
| mu | The proximal term parameter for FedProx, default = 1. |
| rho | The parameter controlling momentum SGD, default = 0. |
| comm_round | Number of communication rounds, default = 50. |
| partition | The data partitioning strategy. Options: `homo`, `noniid-labeldir`, `noniid-#label1` (or 2, 3, ..., i.e., the fixed number of labels each party owns), `real`, `iid-diff-quantity`. Default = `homo`. |
| beta | The concentration parameter of the Dirichlet distribution for heterogeneous partitioning, default = 0.5. |
| device | The device to run the program on, default = `cuda:0`. |
| datadir | The path of the dataset, default = `./data/`. |
| logdir | The path to store the logs, default = `./logs/`. |
| noise | Maximum variance of the Gaussian noise added to each local party, default = 0. |
| sample | Ratio of parties that participate in each communication round, default = 1. |
| init_seed | The initial random seed, default = 0. |
| optimizer | The optimizer used during training. Options: `sgd`, `adam`, `amsgrad`. Default = `sgd`. |
| reg | Regularization strength, default = 1e-5. |
| pretrain | Whether to use a pretrained model. Options: `pre`, `no`. Default = `no`. |
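Putting the table above together, a CIFAR10 run might look like the following. This is a sketch only: the flags are taken from the parameter table, and it assumes exp_new.py accepts the same command-line interface as experiment_new.py.

```shell
python exp_new.py --model=mlp \
    --dataset=cifar10_pre \
    --alg=fedavg \
    --lr=0.01 \
    --batch-size=64 \
    --epochs=10 \
    --n_parties=10 \
    --comm_round=50 \
    --partition=noniid-labeldir \
    --beta=0.5 \
    --pretrain=no \
    --device='cuda:0' \
    --datadir='./data/' \
    --logdir='./logs/'
```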

Data Partition Map

You can call the function get_partition_dict() in experiments.py to obtain net_dataidx_map. net_dataidx_map is a dictionary: its keys are party IDs, and the value of each key is a list of indices of the data assigned to that party. For our experiments, we usually set init_seed=0; when repeating an experiment for some setting, we change init_seed to 1 or 2. The default value of noise is 0 unless stated otherwise. The data partitions used in our experiments can be obtained as follows.

  • Quantity-based label imbalance: partition=noniid-#label1, noniid-#label2 or noniid-#label3
  • Distribution-based label imbalance: partition=noniid-labeldir, beta=0.5 or 0.1
  • Noise-based feature imbalance: partition=homo, noise=0.1 (actually noise does not affect net_dataidx_map)
  • Synthetic feature imbalance & Real-world feature imbalance: partition=real
  • Quantity Skew: partition=iid-diff-quantity, beta=0.5 or 0.1
  • IID Setting: partition=homo
  • Mixed skew: partition = mixed for mixture of distribution-based label imbalance and quantity skew; partition = noniid-labeldir and noise = 0.1 for mixture of distribution-based label imbalance and noise-based feature imbalance.
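For intuition, the distribution-based label imbalance (partition=noniid-labeldir) can be sketched as below. This is an illustrative reimplementation, not the repository's exact code; the function name `dirichlet_partition` is ours. For each class, proportions are drawn from a Dirichlet(beta) distribution and that class's samples are split across parties accordingly, so a smaller beta yields a more skewed label distribution per party.

```python
import numpy as np

def dirichlet_partition(labels, n_parties, beta, seed=0):
    """Split sample indices across parties with Dirichlet label skew."""
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    party_idx = [[] for _ in range(n_parties)]
    for c in range(n_classes):
        idx_c = np.where(labels == c)[0]
        rng.shuffle(idx_c)
        # Sample per-party proportions for this class from Dir(beta).
        props = rng.dirichlet(np.repeat(beta, n_parties))
        # Convert proportions into cumulative split points.
        splits = (np.cumsum(props)[:-1] * len(idx_c)).astype(int)
        for p, part in enumerate(np.split(idx_c, splits)):
            party_idx[p].extend(part.tolist())
    # Same shape as net_dataidx_map: party ID -> list of indices.
    return {p: idx for p, idx in enumerate(party_idx)}

# Toy example: 12 samples over 3 classes, split across 2 parties.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
net_dataidx_map = dirichlet_partition(labels, n_parties=2, beta=0.5)
```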

Here is an explanation of the parameters of get_partition_dict().

| Parameter | Description |
| --------- | ----------- |
| dataset | Dataset to use. Options: `mnist`, `cifar10`, `fmnist`, `svhn`, `generated`, `femnist`, `a9a`, `rcv1`, `covtype`. |
| partition | The data partitioning strategy. Options: `homo`, `noniid-labeldir`, `noniid-#label1` (or 2, 3, ..., i.e., the fixed number of labels each party owns), `real`, `iid-diff-quantity`. |
| n_parties | Number of parties. |
| init_seed | The initial random seed. |
| datadir | The path of the dataset. |
| logdir | The path to store the logs. |
| beta | The concentration parameter of the Dirichlet distribution for heterogeneous partitioning. |
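Once you have net_dataidx_map, consuming it is straightforward: index the full dataset with each party's list. The map below is a hypothetical example of the returned structure, not output from the actual function.

```python
import numpy as np

# Hypothetical net_dataidx_map, with the structure described above:
# keys are party IDs, values are lists of dataset indices.
net_dataidx_map = {0: [0, 2, 4, 6], 1: [1, 3, 5], 2: [7, 8, 9]}

# Per-party sample counts -- a common sanity check before training.
counts = {party: len(idxs) for party, idxs in net_dataidx_map.items()}

# Slice one party's local data out of the full dataset.
full_data = np.arange(10) * 10          # stand-in for the real dataset
local_data = full_data[net_dataidx_map[0]]
```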
