Easy datasets: MNIST, FashionMNIST. Harder datasets: KMNIST, CIFAR100.
Goal: figure out whether we should learn from labels or not, and whether to align labels...
MonoGrad would probably work for Easy-Data (more zero-shot generalization than Mono) but not for Hard-Data, because using one model to learn everything is too difficult (it basically increases the number of tasks a single model has to learn).
ModularGrad can probably maintain the same performance as Modular because dynamic modules are allowed. We can again obtain zero-shot improvement on short-sequence Easy-Data. For longer-sequence Hard-Data, the local flexibility of ModularGrad should still allow us to learn.
In longer-sequence Hard-Data, task distributions can still overlap, which creates opportunities for sharing modules.
- Only FedAvg on modules (and not task-specific structures); see the sketch after this list.
- [ ] Implement FedAvg with momentum (see the Flower module).
- [ ] Check ray.put to see if we need deepcopy.
- [ ] Use fixed projections across all agents but still random across tasks?
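A minimal sketch of what "FedAvg on modules only" could look like, assuming each agent's state dict stores the shared modules under a `components.` prefix (the prefix and function name are assumptions, not the actual code):

```python
import torch

def fedavg_modules(agent_states):
    """Average only parameters whose names start with 'components.'
    (the shared modules); task-specific structures are left untouched."""
    module_keys = [k for k in agent_states[0] if k.startswith("components.")]
    return {
        k: torch.stack([s[k].float() for s in agent_states]).mean(dim=0)
        for k in module_keys
    }
```

Each agent would then call `load_state_dict(avg, strict=False)` so only module weights get replaced and task-specific structures stay local; FedAvgM-style server momentum could be layered on top by keeping a velocity over successive averaged updates.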
Use the contrastive loss as the primary objective and do not allow cross-entropy gradients to flow back through the encoding.
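A minimal sketch of that stop-gradient setup, assuming an encoder/projector/classifier split and a SupCon-style `contrastive_loss` (all names are placeholders, not the actual code):

```python
import torch.nn.functional as F

def compute_loss(encoder, projector, classifier, contrastive_loss, x, y):
    z = encoder(x)                                # shared encoding
    loss_con = contrastive_loss(projector(z), y)  # only this term shapes the encoder
    logits = classifier(z.detach())               # detach: CE gradients stop here
    return loss_con + F.cross_entropy(logits, y)
```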
For remap_labels=True, modular will not work if we train with many init_tasks, because during assimilation we don't change the components, and those components have already learned to map to some previous labels. Encountering new labels without changing the modules (only re-structuring, as in modular, as opposed to monolithic) means the model will simply not learn. Solutions:
- Either use remap_labels=False as in the original paper (though this might limit transfer across tasks).
- Reduce the number of init_tasks and original modules to limit overfitting. This doesn't really address the root of the problem, though. Summary of the run (epoch 201, training task 9; results saved in less_overfit_results/): per-task accuracy 0.957 / 0.991 / 0.939 / 0.927 / 0.937 / 0.889 / 0.977 / 0.825 / 0.814 / 0.989 for tasks 0-9, average accuracy 0.925 (avg loss 0.214); the new module was not kept (W/update 0.99 vs WO/update 0.99), so the final number of components is 8. Also, for remap_labels=True we typically have to train much longer, e.g. ~200 epochs.
- Train the representation with SupContrast (completely label-agnostic) and use a prototype classifier downstream (see the sketch after this list), or prevent the cross-entropy gradient from affecting the representation. NOTE: this is currently NOT working for some reason, maybe we need the projector?? Still running into the pairwise contrastive bias problem...
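A sketch of the prototype-classifier option from the list above, assuming a frozen, contrastively trained encoder (function names are mine, not the actual code):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def build_prototypes(encoder, loader, num_classes, device):
    """Mean (normalized) embedding per class from a frozen contrastive encoder."""
    sums, counts = None, torch.zeros(num_classes, device=device)
    for x, y in loader:
        z = F.normalize(encoder(x.to(device)), dim=1)
        if sums is None:
            sums = torch.zeros(num_classes, z.size(1), device=device)
        sums.index_add_(0, y.to(device), z)
        counts += torch.bincount(y.to(device), minlength=num_classes).float()
    return F.normalize(sums / counts.unsqueeze(1), dim=1)

@torch.no_grad()
def prototype_predict(encoder, prototypes, x):
    z = F.normalize(encoder(x), dim=1)
    return (z @ prototypes.T).argmax(dim=1)  # nearest prototype by cosine similarity
```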
The random structures across tasks might be problematic for aligning these representations.
- TODO: pick all ones and keep them fixed.
How are we not losing information by just using some random linear transformation like this? MNIST is mostly black and white, so it's a hail-mary judgement?
Comparing modular vs. monolithic vs. contrastive on full and reduced datasets.
Recommendation stuff: contrastive objectives might be a bit terrible.
We should not use a random projection per task: it defeats the purpose of forward transfer. At the very least, we'd need a task-specific adapter in front for it to work. Note that the soft layer ordering paper is about multi-task learning, so it worked out for them.
See random_projection.py. (Mono): with the same projection across tasks, the 4th task takes 1 epoch to get back to 90%; with a random projection per task, it takes 4 epochs. (Mod): the same projection takes 1 epoch to get back to 92% and the model didn't create any new components (components stay at 3); with a per-task random projection, Mod took 5 epochs to reach 90% and had to add a new module. Generalization across tasks is non-existent!
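A minimal sketch of the two setups compared above (not the actual random_projection.py; the 784-dim sizes assume MNIST-shaped inputs). The projection is a non-trainable buffer, either shared across tasks or drawn fresh per task:

```python
import torch
import torch.nn as nn

class RandomProjection(nn.Module):
    """Fixed (non-trainable) random linear map applied to flattened inputs."""
    def __init__(self, in_dim: int, out_dim: int, seed: int):
        super().__init__()
        g = torch.Generator().manual_seed(seed)
        self.register_buffer("P", torch.randn(in_dim, out_dim, generator=g) / in_dim ** 0.5)

    def forward(self, x):
        return x.flatten(1) @ self.P

# shared across tasks: one seed for everything
shared_proj = RandomProjection(784, 784, seed=0)

# per-task: a different seed (and hence a different projection) for each task
per_task_proj = {t: RandomProjection(784, 784, seed=t) for t in range(10)}
```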
TODO: multi-agent coordination is all over the place... We should not parallelize the sending part because the receiver must be present at the end of the pipe when the caller makes the call.
FashionMNIST data augmentation
```python
transforms.RandomResizedCrop(size=self.net.i_size[0], scale=(0.2, 1.0), antialias=True),
transforms.RandomHorizontalFlip(),
```
Maybe also add salt-and-pepper (Gaussian) noise like in https://github.com/1202kbs/EBCLR/blob/main/configs_ebclr/fmnist_b64.yaml?
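If we do try salt-and-pepper noise, one simple version (my own sketch, not taken from the linked EBCLR config) is a transform applied after ToTensor:

```python
import torch

class SaltPepperNoise:
    """Flip a small fraction of pixels to 0 (pepper) or 1 (salt)."""
    def __init__(self, prob: float = 0.05):
        self.prob = prob

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        u = torch.rand_like(x)
        x = x.clone()
        x[u < self.prob / 2] = 0.0
        x[u > 1.0 - self.prob / 2] = 1.0
        return x
```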
| epochs | algo | dataset | use_contrastive | acc |
|---|---|---|---|---|
| 100 | modular | fashionmnist | False | 0.930831 |
| 100 | modular | fashionmnist | True | 0.925538 |
| 150 | modular | fashionmnist | False | 0.940206 |
| 150 | modular | fashionmnist | True | 0.935081 |
Use
```python
transforms.RandomHorizontalFlip(),
transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)),
transforms.Lambda(lambda x: x + torch.randn(x.size()).to(self.net.device) * 0.05),
```
-> modular / fashionmnist / use_contrastive=True: 0.936975
Changing contrast_mode to "all" out of curiosity.
CIFAR100. Why is the result for cifar100 monolithic with use_contrastive=False so high?? (Diff: replacement=True, one encoder for everything.)

| algo | dataset | use_contrastive | acc |
|---|---|---|---|
| modular | cifar100 | False | 0.78525 |
| modular | cifar100 | True | 0.73130 |
| monolithic | cifar100 | False | 0.73740 |
| monolithic | cifar100 | True | 0.74390 |
Jorge's paper: no comp. should be around 51.6%; ours shows 73%.
Rerunning with replacement=False. Please no, we might have to do the stupid different-encoders trick again.
Stuff to try: maybe even freeze the projector.
Invasive modification: do a lot of backward updates on the projector ONLY (and NOT the structure). One hacky way to do this is to concat the megadataset onto the trainset during normal dynamic training.
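A sketch of the projector-only update, assuming the model exposes the projector as a `projector` submodule (names and hyperparameters are assumptions):

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader

def projector_only_setup(model, trainset, megadataset, lr=1e-3, batch_size=64):
    """Optimizer over the projector parameters only, so the structure and the
    components stay untouched, plus the hacky ConcatDataset trick so the
    projector also sees the mega-dataset during normal dynamic training."""
    proj_optim = torch.optim.Adam(model.projector.parameters(), lr=lr)
    loader = DataLoader(ConcatDataset([trainset, megadataset]),
                        batch_size=batch_size, shuffle=True)
    return proj_optim, loader
```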
Debug why contrastive is using so few components.
Visualize the accuracy over time.
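A throwaway plotting sketch for the visualization TODO, assuming a long-format CSV with `epoch`/`task`/`acc` columns (the column names are a guess, not the actual results schema):

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_accuracy_over_time(csv_path: str):
    df = pd.read_csv(csv_path)
    for task, grp in df.groupby("task"):
        plt.plot(grp["epoch"], grp["acc"], label=f"task {task}")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()
```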
| dataset | algo | use_contrastive | seed | agent_id | avg_acc | final_acc | forward | backward | catastrophic |
|---|---|---|---|---|---|---|---|---|---|
| cifar100 | modular | True | seed_0 | agent_1 | 0.786419 | 0.7828 | 0.587529 | 0.008353 | 1.09 |
| cifar100 | modular | True | seed_0 | agent_2 | 0.778686 | 0.7729 | 0.658471 | 0.007807 | 1.01 |
| cifar100 | modular | True | seed_0 | agent_6 | 0.809990 | 0.7875 | 0.645294 | 0.008492 | 1.78 |
| cifar100 | modular | True | seed_0 | agent_7 | 0.800894 | 0.7753 | 0.648235 | 0.008674 | 1.30 |
| cifar100 | modular | True | seed_0 | agent_4 | 0.794289 | 0.7610 | 0.621412 | 0.008332 | 2.00 |
| cifar100 | modular | True | seed_0 | agent_3 | 0.781079 | 0.7869 | 0.622353 | 0.009155 | 0.35 |
| cifar100 | modular | True | seed_0 | agent_0 | 0.787815 | 0.7741 | 0.617059 | 0.007626 | 1.69 |
| cifar100 | modular | True | seed_0 | agent_5 | 0.794106 | 0.7884 | 0.629176 | 0.009594 | 0.28 |
| cifar100 | modular | False | seed_0 | agent_1 | 0.774651 | 0.7837 | 0.715176 | 0.017884 | 0.05 |
| cifar100 | modular | False | seed_0 | agent_2 | 0.768384 | 0.7783 | 0.715529 | 0.017011 | 0.13 |
| cifar100 | modular | False | seed_0 | agent_6 | 0.808626 | 0.7995 | 0.715412 | 0.020621 | -0.84 |
| cifar100 | modular | False | seed_0 | agent_7 | 0.793669 | 0.7871 | 0.713176 | 0.018579 | -0.12 |
| cifar100 | modular | False | seed_0 | agent_4 | 0.789613 | 0.7736 | 0.691765 | 0.020095 | 0.70 |
| cifar100 | modular | False | seed_0 | agent_3 | 0.764018 | 0.7798 | 0.696941 | 0.018021 | -0.44 |
| cifar100 | modular | False | seed_0 | agent_0 | 0.780143 | 0.7917 | 0.711059 | 0.017463 | -0.12 |
| cifar100 | modular | False | seed_0 | agent_5 | 0.798304 | 0.8009 | 0.699765 | 0.018853 | -0.38 |
There's a bug somewhere in our code!! No dropout for CIFAR is significantly WORSE than with dropout.
One thing that happens with modular is that previous structures are never re-optimized, so they end up broken.
MNIST accuracy by training mode:
- mode='cl' (train the contrastive part alone, random decoder): 0.9713408393039918 -- we still get ~97% with a random decoder.
- mode='both' (contrastive + cross-entropy): 0.9790174002047083
- mode='ce_finetune' (train only the last layer, keeping the random encoder fixed): 0.5015353121801432