[Bug] The evaluation of configurations requiring a TopKRetriever, such as for flores_datasets, failed with a KeyError: 'metadata'. #1686

DespairL · 2024-11-13T07:56:59Z

Prerequisite

I have searched Issues and Discussions but cannot get the expected help.
The bug has not been fixed in the latest version.

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True,
'CUDA_HOME': '/home/nfs03/cuda_tools/cuda-12.1',
'GCC': 'gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0',
'GPU 0,1,2,3,4,5,6': 'NVIDIA GeForce RTX 3090',
'MMEngine': '0.10.5',
'MUSA available': False,
'NVCC': 'Cuda compilation tools, release 12.1, V12.1.105',
'OpenCV': '4.10.0',
'PyTorch': '2.4.0+cu121',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2022.2-Product Build 20220804 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.4.2 (Git Hash '
'1137e04ec0b5251ca2b4400a4fd3c667ce843d67)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - CUDA Runtime 12.1\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 90.1 (built against CUDA 12.4)\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
'CUDNN_VERSION=9.1.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
'-DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK '
'-DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC '
'-Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-unused-function -Wno-unused-result '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wsuggest-override '
'-Wno-psabi -Wno-error=pedantic '
'-Wno-error=old-style-cast -Wno-missing-braces '
'-fdiagnostics-color=always -faligned-new '
'-Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'PERF_WITH_AVX512=1, TORCH_VERSION=2.4.0, '
'USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, '
'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, '
'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, '
'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, '
'USE_ROCM_KERNEL_ASSERT=OFF, \n',
'Python': '3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0]',
'TorchVision': '0.19.0+cu121',
'lmdeploy': "not installed:No module named 'lmdeploy'",
'numpy_random_seed': 2147483648,
'opencompass': '0.3.5+',
'sys.platform': 'linux',
'transformers': '4.46.2'}

Reproduces the problem - code/configuration sample

from mmengine.config import read_base
from opencompass.models import VLLM

model_abbr = "Llama3_8B_Base"
num_gpus = 2
lora_path = None
seed = 0
max_seq_len = 4096
max_out_len = 100
batch_size = 32
temperature = 0.0
top_p = 0.8
max_tokens = 1024

with read_base():
    from opencompass.configs.datasets.flores.flores_gen_806ede import flores_datasets

datasets = []
datasets += flores_datasets

models = [
    dict(
        type=VLLM,
        abbr=model_abbr,
        path="/home/nfs02/model/llama-3-8b-instruct",
        model_kwargs=dict(tensor_parallel_size=num_gpus, dtype='bfloat16',
                          seed=seed, max_model_len=max_seq_len, enable_lora=True,),
        max_out_len=max_out_len,
        max_seq_len=max_seq_len,
        batch_size=batch_size,
        generation_kwargs=dict(temperature=temperature, top_p=top_p, max_tokens=max_tokens,),
        stop_words=['<|end_of_text|>', '<|eot_id|>'],
        lora_path=lora_path,
        run_cfg=dict(num_gpus=num_gpus),
    ),
]
work_dir = './general_ability/Llama3/'

Reproduces the problem - command or script

opencompass x.py --debug

Reproduces the problem - error message

[2024-11-13 15:05:19,372] [opencompass.openicl.icl_retriever.icl_topk_retriever] [INFO] Creating index for index set...
0%| | 0/997 [00:00<?, ?it/s]
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/bin/opencompass", line 8, in <module>
[rank0]: sys.exit(main())
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/lib/python3.10/site-packages/opencompass/cli/main.py", line 308, in main
[rank0]: runner(tasks)
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/lib/python3.10/site-packages/opencompass/runners/base.py", line 38, in __call__
[rank0]: status = self.launch(tasks)
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/lib/python3.10/site-packages/opencompass/runners/local.py", line 128, in launch
[rank0]: task.run(cur_model=getattr(self, 'cur_model',
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/lib/python3.10/site-packages/opencompass/tasks/openicl_infer.py", line 88, in run
[rank0]: self._inference()
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/lib/python3.10/site-packages/opencompass/tasks/openicl_infer.py", line 106, in _inference
[rank0]: retriever = ICL_RETRIEVERS.build(retriever_cfg)
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
[rank0]: return self.build_func(cfg, *args, **kwargs, registry=self)
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
[rank0]: obj = obj_cls(**args) # type: ignore
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/lib/python3.10/site-packages/opencompass/openicl/icl_retriever/icl_topk_retriever.py", line 83, in __init__
[rank0]: self.index = self.create_index()
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/lib/python3.10/site-packages/opencompass/openicl/icl_retriever/icl_topk_retriever.py", line 99, in create_index
[rank0]: res_list = self.forward(dataloader,
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/lib/python3.10/site-packages/opencompass/openicl/icl_retriever/icl_topk_retriever.py", line 131, in forward
[rank0]: metadata = entry.pop('metadata')
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/lib/python3.10/_collections_abc.py", line 954, in pop
[rank0]: value = self[key]
[rank0]: File "/home/nfs03/anaconda3/envs/LLMs_Eval_Opencompass/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 270, in __getitem__
[rank0]: return self.data[item]
[rank0]: KeyError: 'metadata'

Other information

'''
with torch.no_grad():
    metadata = entry.pop('metadata')
    raw_text = self.tokenizer.batch_decode(
        entry['input_ids'],
        skip_special_tokens=True,
        verbose=False)
    res = self.model.encode(raw_text, show_progress_bar=False)
'''
In the source code, TopkRetriever pops the 'metadata' key from every batch entry. However, with the flores_gen_806ede configuration, the officially provided datasets (such as flores) contain only the keys input_ids and attention_mask after processing, so the pop raises KeyError: 'metadata'.
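
The failure mode can be reproduced without OpenCompass. The sketch below is only an illustration under stated assumptions: `FakeBatch` is a hypothetical stand-in (a plain `collections.UserDict`) for the batch object seen in the traceback, and `pop_metadata_tolerant` is a hypothetical workaround, not the project's fix. Whether placeholder ids are acceptable to the rest of the retriever is not confirmed by anything in this report.

```python
from collections import UserDict


class FakeBatch(UserDict):
    """Hypothetical stand-in for the tokenizer batch object in the traceback."""


def pop_metadata_strict(entry):
    # Mirrors the failing line: raises KeyError when 'metadata' is absent.
    return entry.pop('metadata')


def pop_metadata_tolerant(entry, batch_size):
    # Hypothetical workaround (an assumption, not the upstream fix):
    # fall back to one placeholder record per item in the batch.
    default = [{'id': i} for i in range(batch_size)]
    return entry.pop('metadata', default)


# A processed batch that carries only the two keys named in the report.
entry = FakeBatch({
    'input_ids': [[1, 2], [3, 4]],
    'attention_mask': [[1, 1], [1, 1]],
})

try:
    pop_metadata_strict(entry)
except KeyError as err:
    print(f'KeyError: {err}')

metadata = pop_metadata_tolerant(entry, batch_size=len(entry['input_ids']))
print(metadata)
```

Passing a default to `pop` only avoids the exception. It does not restore the per-example metadata that the retriever appears to expect, so the underlying question of why the key is missing after processing remains open.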
