swin_transformer_v2.py RuntimeError Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument max in method wrapper_CUDA_clamp_Tensor) #376
When training or evaluating with Swin Transformer V2, the following error occurs: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument max in method wrapper_CUDA_clamp_Tensor).
The cause is that self.logit_scale is a CUDA parameter while torch.tensor(1. / 0.01) is created on the CPU, so torch.clamp receives tensors on two different devices. I fixed the code in models/swin_transformer_v2.py, line 159.
Before:
logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01))).exp()
After:
logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01).cuda())).exp()
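Hard-coding .cuda() works on a single-GPU setup, but a device-agnostic variant also keeps CPU runs working. Below is a minimal sketch of such an alternative; clamp_logit_scale is a hypothetical helper used only to illustrate the one-line change at line 159 of models/swin_transformer_v2.py:

```python
import math

import torch


def clamp_logit_scale(logit_scale: torch.Tensor) -> torch.Tensor:
    # Build the clamp ceiling on the same device as logit_scale instead of
    # hard-coding .cuda(); this works on CPU and GPU alike.
    max_value = torch.log(torch.tensor(1. / 0.01, device=logit_scale.device))
    return torch.clamp(logit_scale, max=max_value).exp()
    # Equivalent scalar form (torch.clamp also accepts a plain Python number):
    # return torch.clamp(logit_scale, max=math.log(1. / 0.01)).exp()


if __name__ == "__main__":
    # Same initialization as the logit_scale parameter in WindowAttention.
    scale = torch.log(10 * torch.ones(3, 1, 1))
    if torch.cuda.is_available():
        scale = scale.cuda()
    print(clamp_logit_scale(scale))
```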
How to reproduce the error:
python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval --cfg ./configs/swinv2/swinv2_tiny_patch4_window8_256.yaml --resume ././Swin-model-1k/swinv2/swinv2_tiny_patch4_window8_256.pth --data-path imagenet
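The device mismatch can also be reproduced in isolation, without the data pipeline; a minimal sketch (assuming a CUDA build of PyTorch that behaves like the version in the log below):

```python
import torch

# Mirrors models/swin_transformer_v2.py line 159: logit_scale lives on the GPU,
# while torch.tensor(1. / 0.01) is created on the CPU, so torch.clamp fails.
logit_scale = torch.log(10 * torch.ones(3, 1, 1)).cuda()
try:
    torch.clamp(logit_scale, max=torch.log(torch.tensor(1. / 0.01))).exp()
except RuntimeError as e:
    print(e)  # Expected all tensors to be on the same device ...
```

Full output of the reproduce command: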
WARNING: CPU IP/backtrace sampling not supported, disabling.
Try the 'nsys status --environment' command to learn more.
WARNING: CPU context switch tracing not supported, disabling.
Try the 'nsys status --environment' command to learn more.
WARNING: CUDA backtraces will not be collected because CPU sampling is disabled.
/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/distributed/launch.py:208: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects --local-rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions
  main()
Tutel has not been installed. To use Swin-MoE, please install Tutel; otherwise, just ignore this.
To use FusedLAMB or FusedAdam, please install apex.
=> merge config from ./configs/swinv2/swinv2_tiny_patch4_window8_256.yaml
RANK and WORLD_SIZE in environ: 0/1
[rank0]:[W115 16:39:15.575750744 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
[2025-01-15 16:39:15 swinv2_tiny_patch4_window8_256](main.py 434): INFO Full config saved to output/swinv2_tiny_patch4_window8_256/default/config.json
[2025-01-15 16:39:15 swinv2_tiny_patch4_window8_256](main.py 437): INFO AMP_ENABLE: true
AMP_OPT_LEVEL: ''
AUG:
AUTO_AUGMENT: rand-m9-mstd0.5-inc1
COLOR_JITTER: 0.4
CUTMIX: 1.0
CUTMIX_MINMAX: null
MIXUP: 0.8
MIXUP_MODE: batch
MIXUP_PROB: 1.0
MIXUP_SWITCH_PROB: 0.5
RECOUNT: 1
REMODE: pixel
REPROB: 0.25
BASE:
DATA:
BATCH_SIZE: 128
CACHE_MODE: part
DATASET: imagenet
DATA_PATH: imagenet
IMG_SIZE: 256
INTERPOLATION: bicubic
MASK_PATCH_SIZE: 32
MASK_RATIO: 0.6
NUM_WORKERS: 8
PIN_MEMORY: true
ZIP_MODE: false
ENABLE_AMP: false
EVAL_MODE: true
FUSED_LAYERNORM: false
FUSED_WINDOW_PROCESS: false
LOCAL_RANK: 0
MODEL:
DROP_PATH_RATE: 0.2
DROP_RATE: 0.0
LABEL_SMOOTHING: 0.1
NAME: swinv2_tiny_patch4_window8_256
NUM_CLASSES: 1000
PRETRAINED: ''
RESUME: ././Swin-model-1k/swinv2/swinv2_tiny_patch4_window8_256.pth
SIMMIM:
NORM_TARGET:
ENABLE: false
PATCH_SIZE: 47
SWIN:
APE: false
DEPTHS:
EMBED_DIM: 96
IN_CHANS: 3
MLP_RATIO: 4.0
NUM_HEADS:
PATCH_NORM: true
PATCH_SIZE: 4
QKV_BIAS: true
QK_SCALE: null
WINDOW_SIZE: 7
SWINV2:
APE: false
DEPTHS:
EMBED_DIM: 96
IN_CHANS: 3
MLP_RATIO: 4.0
NUM_HEADS:
PATCH_NORM: true
PATCH_SIZE: 4
PRETRAINED_WINDOW_SIZES:
QKV_BIAS: true
WINDOW_SIZE: 8
SWIN_MLP:
APE: false
DEPTHS:
EMBED_DIM: 96
IN_CHANS: 3
MLP_RATIO: 4.0
NUM_HEADS:
PATCH_NORM: true
PATCH_SIZE: 4
WINDOW_SIZE: 7
SWIN_MOE:
APE: false
AUX_LOSS_WEIGHT: 0.01
CAPACITY_FACTOR: 1.25
COSINE_ROUTER: false
COSINE_ROUTER_DIM: 256
COSINE_ROUTER_INIT_T: 0.5
DEPTHS:
EMBED_DIM: 96
GATE_NOISE: 1.0
INIT_STD: 0.02
IN_CHANS: 3
IS_GSHARD_LOSS: false
MLP_FC2_BIAS: true
MLP_RATIO: 4.0
MOE_BLOCKS:
MOE_DROP: 0.0
NORMALIZE_GATE: false
NUM_HEADS:
NUM_LOCAL_EXPERTS: 1
PATCH_NORM: true
PATCH_SIZE: 4
PRETRAINED_WINDOW_SIZES:
QKV_BIAS: true
QK_SCALE: null
TOP_VALUE: 1
USE_BPR: true
WINDOW_SIZE: 7
TYPE: swinv2
OUTPUT: output/swinv2_tiny_patch4_window8_256/default
PRINT_FREQ: 10
SAVE_FREQ: 1
SEED: 0
TAG: default
TEST:
CROP: true
SEQUENTIAL: false
SHUFFLE: false
THROUGHPUT_MODE: false
TRAIN:
ACCUMULATION_STEPS: 1
AUTO_RESUME: true
BASE_LR: 0.000125
CLIP_GRAD: 5.0
EPOCHS: 300
LAYER_DECAY: 1.0
LR_SCHEDULER:
DECAY_EPOCHS: 30
DECAY_RATE: 0.1
GAMMA: 0.1
MULTISTEPS: []
NAME: cosine
WARMUP_PREFIX: true
MIN_LR: 1.25e-06
MOE:
SAVE_MASTER: false
OPTIMIZER:
BETAS:
EPS: 1.0e-08
MOMENTUM: 0.9
NAME: adamw
START_EPOCH: 0
USE_CHECKPOINT: false
WARMUP_EPOCHS: 20
WARMUP_LR: 1.25e-07
WEIGHT_DECAY: 0.05
[2025-01-15 16:39:15 swinv2_tiny_patch4_window8_256](main.py 438): INFO {"cfg": "./configs/swinv2/swinv2_tiny_patch4_window8_256.yaml", "opts": null, "batch_size": null, "data_path": "imagenet", "zip": false, "cache_mode": "part", "pretrained": null, "resume": "././Swin-model-1k/swinv2/swinv2_tiny_patch4_window8_256.pth", "accumulation_steps": null, "use_checkpoint": false, "disable_amp": false, "amp_opt_level": null, "output": "output", "tag": null, "eval": true, "throughput": false, "fused_window_process": false, "fused_layernorm": false, "optim": null}
local rank 0 / global rank 0 successfully build train dataset
local rank 0 / global rank 0 successfully build val dataset
[2025-01-15 16:39:17 swinv2_tiny_patch4_window8_256](main.py 93): INFO Creating model:swinv2/swinv2_tiny_patch4_window8_256
/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/functional.py:534: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3595.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
[2025-01-15 16:39:17 swinv2_tiny_patch4_window8_256](main.py 95): INFO SwinTransformerV2(
(patch_embed): PatchEmbed(
(proj): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
(norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
)
(pos_drop): Dropout(p=0.0, inplace=False)
(layers): ModuleList(
(0): BasicLayer(
dim=96, input_resolution=(64, 64), depth=2
(blocks): ModuleList(
(0): SwinTransformerBlock(
dim=96, input_resolution=(64, 64), num_heads=3, window_size=8, shift_size=0, mlp_ratio=4.0
(norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
dim=96, window_size=(8, 8), pretrained_window_size=(0, 0), num_heads=3
(cpb_mlp): Sequential(
(0): Linear(in_features=2, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=3, bias=False)
)
(qkv): Linear(in_features=96, out_features=288, bias=False)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=96, out_features=96, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): Identity()
(norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=96, out_features=384, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=384, out_features=96, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
dim=96, input_resolution=(64, 64), num_heads=3, window_size=8, shift_size=4, mlp_ratio=4.0
(norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
dim=96, window_size=(8, 8), pretrained_window_size=(0, 0), num_heads=3
(cpb_mlp): Sequential(
(0): Linear(in_features=2, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=3, bias=False)
)
(qkv): Linear(in_features=96, out_features=288, bias=False)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=96, out_features=96, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=96, out_features=384, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=384, out_features=96, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
input_resolution=(64, 64), dim=96
(reduction): Linear(in_features=384, out_features=192, bias=False)
(norm): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
)
)
(1): BasicLayer(
dim=192, input_resolution=(32, 32), depth=2
(blocks): ModuleList(
(0): SwinTransformerBlock(
dim=192, input_resolution=(32, 32), num_heads=6, window_size=8, shift_size=0, mlp_ratio=4.0
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
dim=192, window_size=(8, 8), pretrained_window_size=(0, 0), num_heads=6
(cpb_mlp): Sequential(
(0): Linear(in_features=2, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=6, bias=False)
)
(qkv): Linear(in_features=192, out_features=576, bias=False)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=192, out_features=192, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=192, out_features=768, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=768, out_features=192, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
dim=192, input_resolution=(32, 32), num_heads=6, window_size=8, shift_size=4, mlp_ratio=4.0
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
dim=192, window_size=(8, 8), pretrained_window_size=(0, 0), num_heads=6
(cpb_mlp): Sequential(
(0): Linear(in_features=2, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=6, bias=False)
)
(qkv): Linear(in_features=192, out_features=576, bias=False)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=192, out_features=192, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=192, out_features=768, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=768, out_features=192, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
input_resolution=(32, 32), dim=192
(reduction): Linear(in_features=768, out_features=384, bias=False)
(norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
)
)
(2): BasicLayer(
dim=384, input_resolution=(16, 16), depth=6
(blocks): ModuleList(
(0): SwinTransformerBlock(
dim=384, input_resolution=(16, 16), num_heads=12, window_size=8, shift_size=0, mlp_ratio=4.0
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
dim=384, window_size=(8, 8), pretrained_window_size=(0, 0), num_heads=12
(cpb_mlp): Sequential(
(0): Linear(in_features=2, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=12, bias=False)
)
(qkv): Linear(in_features=384, out_features=1152, bias=False)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
dim=384, input_resolution=(16, 16), num_heads=12, window_size=8, shift_size=4, mlp_ratio=4.0
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
dim=384, window_size=(8, 8), pretrained_window_size=(0, 0), num_heads=12
(cpb_mlp): Sequential(
(0): Linear(in_features=2, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=12, bias=False)
)
(qkv): Linear(in_features=384, out_features=1152, bias=False)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(2): SwinTransformerBlock(
dim=384, input_resolution=(16, 16), num_heads=12, window_size=8, shift_size=0, mlp_ratio=4.0
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
dim=384, window_size=(8, 8), pretrained_window_size=(0, 0), num_heads=12
(cpb_mlp): Sequential(
(0): Linear(in_features=2, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=12, bias=False)
)
(qkv): Linear(in_features=384, out_features=1152, bias=False)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(3): SwinTransformerBlock(
dim=384, input_resolution=(16, 16), num_heads=12, window_size=8, shift_size=4, mlp_ratio=4.0
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
dim=384, window_size=(8, 8), pretrained_window_size=(0, 0), num_heads=12
(cpb_mlp): Sequential(
(0): Linear(in_features=2, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=12, bias=False)
)
(qkv): Linear(in_features=384, out_features=1152, bias=False)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(4): SwinTransformerBlock(
dim=384, input_resolution=(16, 16), num_heads=12, window_size=8, shift_size=0, mlp_ratio=4.0
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
dim=384, window_size=(8, 8), pretrained_window_size=(0, 0), num_heads=12
(cpb_mlp): Sequential(
(0): Linear(in_features=2, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=12, bias=False)
)
(qkv): Linear(in_features=384, out_features=1152, bias=False)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(5): SwinTransformerBlock(
dim=384, input_resolution=(16, 16), num_heads=12, window_size=8, shift_size=4, mlp_ratio=4.0
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
dim=384, window_size=(8, 8), pretrained_window_size=(0, 0), num_heads=12
(cpb_mlp): Sequential(
(0): Linear(in_features=2, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=12, bias=False)
)
(qkv): Linear(in_features=384, out_features=1152, bias=False)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
input_resolution=(16, 16), dim=384
(reduction): Linear(in_features=1536, out_features=768, bias=False)
(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
)
(3): BasicLayer(
dim=768, input_resolution=(8, 8), depth=2
(blocks): ModuleList(
(0-1): 2 x SwinTransformerBlock(
dim=768, input_resolution=(8, 8), num_heads=24, window_size=8, shift_size=0, mlp_ratio=4.0
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
dim=768, window_size=(8, 8), pretrained_window_size=(0, 0), num_heads=24
(cpb_mlp): Sequential(
(0): Linear(in_features=2, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=24, bias=False)
)
(qkv): Linear(in_features=768, out_features=2304, bias=False)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=768, out_features=768, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath()
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(act): GELU(approximate='none')
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
)
)
(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(avgpool): AdaptiveAvgPool1d(output_size=1)
(head): Linear(in_features=768, out_features=1000, bias=True)
)
[2025-01-15 16:39:17 swinv2_tiny_patch4_window8_256](main.py 98): INFO number of params: 28347154
[2025-01-15 16:39:17 swinv2_tiny_patch4_window8_256](main.py 101): INFO number of GFLOPs: 5.925697536
/home/oongjoon/Desktop/Github/flashattn_test/Swin-Transformer/utils.py:203: FutureWarning: torch.cuda.amp.GradScaler(args...) is deprecated. Please use torch.amp.GradScaler('cuda', args...) instead.
  self._scaler = torch.cuda.amp.GradScaler()
All checkpoints founded in output/swinv2_tiny_patch4_window8_256/default: []
[2025-01-15 16:39:17 swinv2_tiny_patch4_window8_256](main.py 151): INFO no checkpoint found in output/swinv2_tiny_patch4_window8_256/default, ignoring auto resume
[2025-01-15 16:39:17 swinv2_tiny_patch4_window8_256](utils.py 19): INFO ==============> Resuming form ././Swin-model-1k/swinv2/swinv2_tiny_patch4_window8_256.pth....................
/home/oongjoon/Desktop/Github/flashattn_test/Swin-Transformer/utils.py:24: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(config.MODEL.RESUME, map_location='cpu')
[2025-01-15 16:39:17 swinv2_tiny_patch4_window8_256](utils.py 26): INFO
/home/oongjoon/Desktop/Github/flashattn_test/Swin-Transformer/main.py:308: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  with torch.cuda.amp.autocast(enabled=config.AMP_ENABLE):
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn_test/Swin-Transformer/main.py", line 440, in
[rank0]: main(config)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn_test/Swin-Transformer/main.py", line 155, in main
[rank0]: acc1, acc5, loss = validate(config, data_loader_val, model)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn_test/Swin-Transformer/main.py", line 314, in validate
[rank0]: output = model(images)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1643, in forward
[rank0]: else self._run_ddp_forward(*inputs, **kwargs)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1459, in _run_ddp_forward
[rank0]: return self.module(*inputs, **kwargs) # type: ignore[index]
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]: return inner()
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn_test/Swin-Transformer/models/swin_transformer_v2.py", line 627, in forward
[rank0]: x = self.forward_features(x)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn_test/Swin-Transformer/models/swin_transformer_v2.py", line 619, in forward_features
[rank0]: x = layer(x)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]: return inner()
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn_test/Swin-Transformer/models/swin_transformer_v2.py", line 434, in forward
[rank0]: x = blk(x)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]: return inner()
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn_test/Swin-Transformer/models/swin_transformer_v2.py", line 292, in forward
[rank0]: attn_windows = self.attn(x_windows, mask=self.attn_mask) # nW*B, window_size*window_size, C
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]: return inner()
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/home/oongjoon/Desktop/Github/flashattn_test/Swin-Transformer/models/swin_transformer_v2.py", line 159, in forward
[rank0]: logit_scale = torch.clamp(self.logit_scale, max=torch.log( torch.tensor(1. / 0.01) ) ).exp()
[rank0]: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument max in method wrapper_CUDA_clamp_Tensor)
[rank0]:[W115 16:39:20.040987096 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
E0115 16:39:21.135000 14576 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 14603) of binary: /home/oongjoon/Desktop/Github/flashattn/bin/python
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/distributed/launch.py", line 208, in
main()
File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/typing_extensions.py", line 2853, in wrapper
return arg(*args, **kwargs)
File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/distributed/launch.py", line 204, in main
launch(args)
File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/distributed/launch.py", line 189, in launch
run(args)
File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/oongjoon/Desktop/Github/flashattn/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
main.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2025-01-15_16:39:21
host : oongjoon-System-Product-Name
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 14603)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html