System Info

Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
HF_ENDPOINT=https://hf-mirror.com \
QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_lm_eval.py \
    -o acc_yi34b_bs1_measure.txt \
    --model_name_or_path /mnt/disk1/Yi-34B \
    --attn_softmax_bf16 \
    --use_hpu_graphs \
    --trim_logits \
    --use_kv_cache \
    --bucket_size=128 \
    --bucket_internal \
    --use_flash_attention \
    --flash_attention_recompute \
    --bf16 \
    --batch_size 1 \
    --trust_remote_code
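
The failure below happens inside datasets.load_dataset before any HPU or quantization work starts, so it can also be reproduced standalone. A minimal sketch (hypothetical script, not from the repo; it assumes the same hf-mirror.com endpoint, and hellaswag is simply the first entry in --tasks):

    import os
    os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # must be set before importing datasets
    import datasets

    # Same call that fails in lm_eval/base.py:542 via get_task_dict
    ds = datasets.load_dataset("hellaswag")
    print(ds)

Running the above with the mirror endpoint set triggers the identical UnicodeDecodeError, which points at the dataset download path rather than at the FP8 measurement code.
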
/usr/lib/python3.10/inspect.py:288: FutureWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
  return isinstance(object, types.FunctionType)
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
10/18/2024 03:08:59 - WARNING - __main__ - trust_remote_code is set, there is no guarantee this model works properly and it may fail
10/18/2024 03:08:59 - INFO - __main__ - Single-device run.
2024-10-18 03:09:02 [WARNING][auto_accelerator.py:422] Auto detect accelerator: HPU_Accelerator.
2024-10-18 03:09:02 [INFO][utils.py:201] Preparation started.
2024-10-18 03:09:02 [INFO][quantize.py:160] Start to prepare model with fp8_quant.
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
PT_HPU_EAGER_PIPELINE_ENABLE = 1
PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1
---------------------------: System Configuration :---------------------------
Num CPU Cores : 112
CPU RAM : 1056428680 KB
2024-10-18 03:09:10 [INFO][utils.py:201] Preparation end.
Initializing inference mode
10/18/2024 03:09:11 - INFO - __main__ - Args: Namespace(buckets=[16, 32, 64, 128, 189, 284], output_file='acc_bloomz7b_bs1_measure.txt', tasks=['hellaswag', 'lambada_openai', 'piqa', 'winogrande'], limit_iters=None, device='hpu', model_name_or_path='/DISK0/bloomz-7b1', bf16=True, max_new_tokens=100, max_input_tokens=0, batch_size=1, warmup=3, n_iterations=5, local_rank=0, use_kv_cache=True, use_hpu_graphs=True, dataset_name=None, column_name=None, do_sample=False, num_beams=1, top_k=None, penalty_alpha=None, trim_logits=True, seed=27, profiling_warmup_steps=0, profiling_steps=0, profiling_record_shapes=False, prompt=None, bad_words=None, force_words=None, assistant_model=None, peft_model=None, num_return_sequences=1, token=None, model_revision='main', attn_softmax_bf16=True, output_dir=None, bucket_size=128, bucket_internal=True, dataset_max_samples=-1, limit_hpu_graphs=False, show_graphs_count=False, reuse_cache=False, verbose_workers=False, simulate_dyn_prompt=None, reduce_recompile=False, use_flash_attention=True, flash_attention_recompute=True, flash_attention_causal_mask=False, flash_attention_fast_softmax=True, book_source=False, torch_compile=False, ignore_eos=True, temperature=1.0, top_p=1.0, const_serialization_path=None, trust_remote_code=True, parallel_strategy='none', input_embeds=False, run_partial_dataset=False, load_quantized_model_with_autogptq=False, disk_offload=False, load_quantized_model_with_inc=False, local_quantized_inc_model_path=None, quant_config='./quantization_config/maxabs_measure.json', world_size=0, global_rank=0)
10/18/2024 03:09:11 - INFO - __main__ - device: hpu, n_hpu: 0, bf16: True
10/18/2024 03:09:11 - INFO - __main__ - Model initialization took 13.380s
Traceback (most recent call last):
  File "/home/intel/optimum-habana/examples/text-generation/run_lm_eval.py", line 231, in <module>
    main()
  File "/home/intel/optimum-habana/examples/text-generation/run_lm_eval.py", line 198, in main
    lm_tasks = lm_eval.tasks.get_task_dict(args.tasks)
  File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/tasks/__init__.py", line 415, in get_task_dict
    task_name_dict = {
  File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/tasks/__init__.py", line 416, in <dictcomp>
    task_name: get_task(task_name)()
  File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/base.py", line 513, in __init__
    self.download(data_dir, cache_dir, download_mode)
  File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/base.py", line 542, in download
    self.dataset = datasets.load_dataset(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2606, in load_dataset
    builder_instance = load_dataset_builder(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2277, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1923, in dataset_module_factory
    raise e1 from None
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1875, in dataset_module_factory
    can_load_config_from_parquet_export = "DEFAULT_CONFIG_NAME" not in f.read()
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
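
Byte 0x8b at position 1 matches the gzip magic sequence 0x1f 0x8b, so the file datasets opened at load.py:1875 appears to be a gzip-compressed response body saved to disk without being decompressed, which would explain the decode error when it is read as UTF-8. A quick sanity check (the path is illustrative, not taken from the log):

    import gzip

    path = "downloaded_file"  # hypothetical: whatever file load.py:1875 opened
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == b"\x1f\x8b":
        # gzip payload stored verbatim; decompress to inspect the real text
        with gzip.open(path, "rt", encoding="utf-8") as g:
            print(g.read(200))

If the check matches, the problem is in how the hf-mirror.com response was fetched or cached, not in the quantization example itself.
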
Expected behavior
The quantization measurement run should complete successfully and write the accuracy results to the output file.