This repository has been archived by the owner on Mar 19, 2024. It is now read-only.
Expected behavior:
If there are no obvious errors in "what you observed" provided above, please tell us the expected behavior.
Problem statement:
We are using pretrained ImageNet model weights to perform supervised learning on our own dataset, which has ~60,000 training images and ~14,000 test images across 1139 classes. I changed the MLP head in the YAML file to output 1139 classes.
Expected: Stable training
What’s happening? Training accuracy rises very quickly, reaching almost 90% within ~170 epochs, but test accuracy does not improve at all and stays close to 0 for most of the run (log attached: log (3).txt). With supervised training in plain PyTorch, we are able to reach 70% accuracy.
Any insights into why this might be happening? Suggestions on how to use the VISSL pipelines effectively would be appreciated.
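One thing worth ruling out (a hypothesis, not something the log confirms): assuming the `disk_folder` label source assigns label indices from sorted class-folder names the way torchvision's `ImageFolder` does, a class folder that exists in `train` but not in `test` (or vice versa) shifts every later test label by one index, which could keep test accuracy near 0 even while training accuracy climbs. A minimal stdlib sketch of that check; `class_folders` and `compare_class_lists` are hypothetical helpers, not part of VISSL:

```python
import os


def class_folders(root):
    """Sorted names of the immediate subdirectories of `root` (one per class)."""
    with os.scandir(root) as entries:
        return sorted(e.name for e in entries if e.is_dir())


def compare_class_lists(train_classes, test_classes, expected_num_classes=None):
    """Check whether train and test would get the same label index per class.

    Assumption: label i is the i-th class name in sorted order, as in
    torchvision's ImageFolder. This is a sanity check, not VISSL code.
    """
    train = sorted(train_classes)
    test = sorted(test_classes)
    report = {
        "num_train_classes": len(train),
        "num_test_classes": len(test),
        "missing_in_test": sorted(set(train) - set(test)),
        "missing_in_train": sorted(set(test) - set(train)),
        # Identical sorted lists mean index i names the same class in both splits.
        "indices_aligned": train == test,
        # Positions where the same label index would refer to different classes.
        "shifted_indices": [
            (i, a, b) for i, (a, b) in enumerate(zip(train, test)) if a != b
        ],
    }
    if expected_num_classes is not None:
        # The head was resized to this many outputs.
        report["matches_head_size"] = len(train) == expected_num_classes
    return report
```

For example, `compare_class_lists(class_folders("/home/images/train"), class_folders("/home/images/test"), expected_num_classes=1139)` would report any misaligned indices for the paths used in the command.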
Command:
python3 tools/run_distributed_engines.py \
  hydra.verbose=true \
  config=benchmark/fulltune/imagenet1k/train.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.LABEL_SOURCES=[disk_folder] \
  config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=16 \
  config.DATA.TRAIN.DATA_PATHS=["/home/images/train"] \
  config.DATA.TEST.DATA_SOURCES=[disk_folder] \
  config.DATA.TEST.LABEL_SOURCES=[disk_folder] \
  config.DATA.TEST.DATASET_NAMES=[dummy_data_folder] \
  config.DATA.TEST.BATCHSIZE_PER_REPLICA=16 \
  config.DATA.TEST.DATA_PATHS=["/home/images/test"] \
  config.OPTIMIZER.num_epochs=250 \
  config.OPTIMIZER.param_schedulers.lr.values=[0.01,0.001] \
  config.OPTIMIZER.param_schedulers.lr.milestones=[1] \
  config.DISTRIBUTED.NUM_NODES=1 \
  config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=true \
  config.HOOKS.MEMORY_SUMMARY.PRINT_MEMORY_SUMMARY=false \
  config.CHECKPOINT.DIR="/home/new_exp/checkpoint_supervised_2" \
  config.MODEL.WEIGHTS_INIT.PARAMS_FILE="/home/resnet50-19c8e357.pth" \
  config.MODEL.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
  config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=""
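The overrides `param_schedulers.lr.values=[0.01,0.001]` and `param_schedulers.lr.milestones=[1]` describe a multistep schedule. Assuming the usual multistep semantics (value `i` applies until milestone `i`, then the next value takes over; this mirrors, but is not, the Classy Vision scheduler code), the base LR is 0.01 for only the first of the 250 epochs and 0.001 for the rest. A small sketch that expands the schedule; `multistep_lr` is a hypothetical helper, and any batch-size scaling the config may apply to the LR is not modeled:

```python
from bisect import bisect_right


def multistep_lr(epoch, values, milestones):
    """Learning rate for a 0-based epoch under an assumed multistep schedule.

    values[i] is used while epoch < milestones[i]; after the last milestone
    the final value is used, so len(values) must be len(milestones) + 1.
    """
    if len(values) != len(milestones) + 1:
        raise ValueError("need exactly one more value than milestones")
    return values[bisect_right(milestones, epoch)]


# Values taken from the command-line overrides above.
values, milestones, num_epochs = [0.01, 0.001], [1], 250
schedule = [multistep_lr(e, values, milestones) for e in range(num_epochs)]
epochs_at_high_lr = schedule.count(0.01)  # how many epochs run at 0.01
```

Under these assumptions `epochs_at_high_lr` is 1, so almost the entire run uses the reduced rate.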
Environment:
Provide your environment information using the following command:
sys.platform linux
Python 3.6.9 (default, Jun 29 2022, 11:45:57) [GCC 8.4.0]
numpy 1.19.5
Pillow 8.4.0
vissl 0.1.6 @/home/vissl/vissl
GPU available True
GPU 0 Quadro GV100
CUDA_HOME /usr
torchvision 0.9.0+cu101 @/home/.local/lib/python3.6/site-packages/torchvision
hydra 1.0.7 @/home/.local/lib/python3.6/site-packages/hydra
classy_vision 0.7.0.dev @/home/.local/lib/python3.6/site-packages/classy_vision
tensorboard 2.9.1
apex 0.1 @/home/.local/lib/python3.6/site-packages/apex
cv2 4.6.0
PyTorch 1.8.0+cu101 @/home/.local/lib/python3.6/site-packages/torch
PyTorch debug build False
PyTorch built with:
CPU info:
Architecture x86_64
CPU op-mode(s) 32-bit, 64-bit
Byte Order Little Endian
CPU(s) 12
On-line CPU(s) list 0-11
Thread(s) per core 2
Core(s) per socket 6
Socket(s) 1
NUMA node(s) 1
Vendor ID GenuineIntel
CPU family 6
Model 85
Model name Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
Stepping 4
CPU MHz 3999.959
CPU max MHz 4000.0000
CPU min MHz 1200.0000
BogoMIPS 6999.82
Virtualization VT-x
L1d cache 32K
L1i cache 32K
L2 cache 1024K
L3 cache 8448K
NUMA node0 CPU(s) 0-11