[Bug] Cannot use GPU when conducting evaluation #1759

Open · 2 tasks done
takagi97 opened this issue Dec 12, 2024 · 0 comments

takagi97 commented Dec 12, 2024

Prerequisite

Type

I have modified the code (config is not considered code), or I'm working on my own tasks/models/datasets.

Environment

{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5)',
'GPU 0,1,2,3,4,5,6,7': 'NVIDIA A100-SXM4-40GB',
'MMEngine': '0.10.5',
'MUSA available': False,
'NVCC': 'Cuda compilation tools, release 12.1, V12.1.66',
'OpenCV': '4.10.0',
'PyTorch': '2.4.0+cu121',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2022.2-Product Build 20220804 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.4.2 (Git Hash '
'1137e04ec0b5251ca2b4400a4fd3c667ce843d67)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - CUDA Runtime 12.1\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 90.1 (built against CUDA 12.4)\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
'CUDNN_VERSION=9.1.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
'-DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK '
'-DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC '
'-Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-unused-function -Wno-unused-result '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wsuggest-override '
'-Wno-psabi -Wno-error=pedantic '
'-Wno-error=old-style-cast -Wno-missing-braces '
'-fdiagnostics-color=always -faligned-new '
'-Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'PERF_WITH_AVX512=1, TORCH_VERSION=2.4.0, '
'USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, '
'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, '
'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, '
'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, '
'USE_ROCM_KERNEL_ASSERT=OFF, \n',
'Python': '3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]',
'TorchVision': '0.19.0+cu121',
'lmdeploy': "not installed:No module named 'lmdeploy'",
'numpy_random_seed': 2147483648,
'opencompass': '0.3.7+f333be1',
'sys.platform': 'linux',
'transformers': '4.46.3'}

Reproduces the problem - code/configuration sample

I have integrated the WMT2024 benchmark (https://www2.statmt.org/wmt24/translation-task.html) into the current project. It uses COMET, a pre-trained language model that should run on a GPU, to conduct evaluation. COMET can use the GPU to calculate scores when "--debug" is set. However, when "--debug" is not set, COMET cannot access the GPU.
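For context, a minimal sketch of the COMET scoring step, assuming the unbabel-comet package and the wmt22-comet-da checkpoint that appears in the results table below (the sample data is illustrative):

import torch
from comet import download_model, load_from_checkpoint

# Illustrative inputs; in the benchmark these come from the model
# predictions and the WMT2024 references.
data = [{"src": "Dobrý den.", "mt": "Добрий день.", "ref": "Добрий день."}]

model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
# If the eval worker cannot see a GPU, scoring falls back to CPU,
# which is much slower for a model of this size.
gpus = 1 if torch.cuda.is_available() else 0
output = model.predict(data, batch_size=8, gpus=gpus)
print(output.system_score)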

Reproduces the problem - command or script

can access GPUs when calculating COMET scores:

CUDA_VISIBLE_DEVICES=0 python run.py \
    --datasets wmt_news_gen \
    --hf-type chat \
    --debug \
    --work-dir outputs/test_project \
    --hf-path Llama-3.2-1B-Instruct

cannot access GPUs when calculating COMET scores:

CUDA_VISIBLE_DEVICES=0 python run.py \
    --datasets wmt_news_gen \
    --hf-type chat \
    --work-dir outputs/test_project \
    --hf-path Llama-3.2-1B-Instruct
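
A small diagnostic that can be dropped into the evaluation code to confirm what the eval worker process actually sees (a hypothetical snippet, not part of OpenCompass):

import os
import torch

# Print the GPU visibility of the current (eval) process.
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("torch.cuda.is_available():", torch.cuda.is_available())
print("torch.cuda.device_count():", torch.cuda.device_count())

Under --debug, OpenCompass appears to run tasks in the current process, which inherits CUDA_VISIBLE_DEVICES=0; without --debug, the eval task is dispatched "on CPU", as the log below shows.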

Reproduces the problem - error message

can access GPUs when calculating COMET scores:
Starting inference process...
0%| | 0/1 [00:00<?, ?it/s]
/miniconda3/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: do_sample is set to False. However, temperature is set to 0.6 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature.
warnings.warn(
/miniconda3/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: do_sample is set to False. However, top_p is set to 0.9 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p.
warnings.warn(
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:10<00:00, 10.32s/it]
12/12 11:08:01 - OpenCompass - INFO - Partitioned into 1 tasks.
Encoder model frozen.
Lightning automatically upgraded your loaded checkpoint from v1.8.3.post1 to v2.4.0. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../../original_weights/wmt22-comet-da/checkpoints/model.ckpt
Encoder model frozen.
/miniconda3/lib/python3.10/site-packages/pytorch_lightning/core/saving.py:195: Found keys that are not in the model state dict but in the checkpoint: ['encoder.model.embeddings.position_ids']
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

You are using a CUDA device ('NVIDIA A100-SXM4-40GB') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Predicting DataLoader 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 8.55it/s]

cannot access GPUs when calculating COMET scores:
launch OpenICLInfer[Llama-3.2-1B-Instruct_hf/wmt_2024_news_cs-uk] on GPU 0
12/12 11:51:26 - OpenCompass - INFO - Partitioned into 1 tasks.

0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [05:10<00:00, 310.83s/it]
launch OpenICLEval[Llama-3.2-1B-Instruct_hf/wmt_2024_news_cs-uk] on CPU
dataset              version    metric                 mode    Llama-3.2-1B-Instruct_hf
-------------------  ---------  ---------------------  ------  ------------------------
wmt_2024_news_cs-uk  15dad7     BLEU                   gen     0.57
wmt_2024_news_cs-uk  15dad7     CHRF                   gen     11.89
wmt_2024_news_cs-uk  15dad7     TER                    gen     350.76
wmt_2024_news_cs-uk  15dad7     wmt22_comet_da         gen     33.30
wmt_2024_news_cs-uk  15dad7     wmt23_cometkiwi_da_xl  gen     23.42

Other information

No response
