GRPO example "'PEFTHelper' object has no attribute 'validate_legal'" #1687

Closed
lai-serena opened this issue Feb 13, 2025 · 2 comments

Comments

@lai-serena

I am following along with the Colab notebook at: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(3B)-GRPO.ipynb#scrollTo=vzOuSVCL_GA9
and training fails after the first step with the error shown in the console output below.
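
Roughly, the setup I'm running looks like this (a hedged paraphrase of the notebook, not my exact script; the model path comes from the log below, while the LoRA rank, lengths, dataset, and reward function here are illustrative placeholders):

```python
# Hedged sketch of the GRPO setup from the notebook; rank, lengths, dataset,
# and the reward function are placeholders for illustration only.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="/models/Qwen2.5-1.5B-Instruct",
    max_seq_length=1024,
    load_in_4bit=False,
    fast_inference=True,           # vLLM backend used for generation during GRPO
    max_lora_rank=32,
    gpu_memory_utilization=0.5,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

def dummy_reward(completions, **kwargs):
    # Stand-in for the notebook's format/correctness reward functions.
    return [0.0 for _ in completions]

dataset = Dataset.from_list([{"prompt": "What is 2 + 2?"}])  # notebook uses GSM8K

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[dummy_reward],
    args=GRPOConfig(
        output_dir="outputs",
        use_vllm=True,
        per_device_train_batch_size=1,
        max_steps=250,
        max_prompt_length=256,
        max_completion_length=768,
    ),
    train_dataset=dataset,
)
trainer.train()  # fails with the AttributeError shown in the traceback below
```

Full console output from the run: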
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))== Unsloth 2025.2.5: Fast Qwen2 patching. Transformers: 4.48.3.
\ /| GPU: Tesla V100S-PCIE-32GB. Max memory: 31.733 GB. Platform: Linux.
O^O/ _/ \ Torch: 2.5.1+cu121. CUDA: 7.0. CUDA Toolkit: 12.1. Triton: 3.1.0
\ / Bfloat16 = FALSE. FA [Xformers = 0.0.28.post3. FA2 = False]
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: vLLM loading /models/Qwen2.5-1.5B-Instruct with actual GPU utilization = 48.42%
Unsloth: Your GPU has CUDA compute capability 7.0 with VRAM = 31.73 GB.
Unsloth: Using conservativeness = 1.0. Chunked prefill tokens = 1024. Num Sequences = 256.
Unsloth: vLLM's KV Cache can use up to 12.47 GB. Also swap space = 6 GB.
WARNING 02-13 08:47:39 config.py:2276] Casting torch.bfloat16 to torch.float16.
INFO 02-13 08:47:45 config.py:510] This model supports multiple tasks: {'generate', 'classify', 'reward', 'score', 'embed'}. Defaulting to 'generate'.
INFO 02-13 08:47:45 llm_engine.py:234] Initializing an LLM engine (v0.6.6) with config: model='/models/Qwen2.5-1.5B-Instruct', speculative_config=None, tokenizer='/models/Qwen2.5-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/models/Qwen2.5-1.5B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":0,"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"candidate_compile_sizes":[],"compile_sizes":[],"capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
INFO 02-13 08:47:46 selector.py:217] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 02-13 08:47:46 selector.py:129] Using XFormers backend.
[W213 08:47:46.344152759 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
INFO 02-13 08:47:46 model_runner.py:1094] Starting to load model /models/Qwen2.5-1.5B-Instruct...
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 1.90it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 1.90it/s]

INFO 02-13 08:47:47 model_runner.py:1099] Loading model weights took 2.8860 GB
INFO 02-13 08:47:47 punica_selector.py:11] Using PunicaWrapperGPU.
INFO 02-13 08:47:48 worker.py:241] Memory profiling takes 1.07 seconds
INFO 02-13 08:47:48 worker.py:241] the current vLLM instance can use total_gpu_memory (31.73GiB) x gpu_memory_utilization (0.48) = 15.36GiB
INFO 02-13 08:47:48 worker.py:241] model weights take 2.89GiB; non_torch_memory takes 0.12GiB; PyTorch activation peak memory takes 1.40GiB; the rest of the memory reserved for KV Cache is 10.95GiB.
INFO 02-13 08:47:48 gpu_executor.py:76] # GPU blocks: 25631, # CPU blocks: 14043
INFO 02-13 08:47:48 gpu_executor.py:80] Maximum concurrency for 1024 tokens per request: 400.48x
INFO 02-13 08:47:51 model_runner.py:1415] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:19<00:00, 1.81it/s]
INFO 02-13 08:48:10 model_runner.py:1535] Graph capturing finished in 19 secs, took 0.38 GiB
INFO 02-13 08:48:10 llm_engine.py:431] init engine (profile, create kv cache, warmup model) took 23.31 seconds
Unsloth 2025.2.5 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
torch.distributed process group is initialized, but parallel_mode != ParallelMode.DISTRIBUTED. In order to use Torch DDP, launch your script with `python -m torch.distributed.launch
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
\ /| Num examples = 7,473 | Num Epochs = 1
O^O/ _/ \ Batch size per device = 1 | Gradient Accumulation steps = 1
\ / Total batch size = 1 | Total steps = 250
"-____-" Number of trainable parameters = 9,232,384
0%| | 0/250 [00:00<?, ?it/s]-------------------- Question:
Ahmed and Emily are having a contest to see who can get the best grade in the class. There have been 9 assignments and Ahmed has a 91 in the class. Emily has a 92. The final assignment is worth the same amount as all the other assignments. Emily got a 90 on the final assignment. What is the minimum grade Ahmed needs to get to beat Emily if all grades are whole numbers?
Answer:
100
Response:
To determine the minimum grade Ahmed needs to beat Emily, we first need to calculate the total possible grade Ahmed can get in the class.

Let's assume the maximum grade a student can get is 100. If there have been 9 assignments and each is worth the same, let's denote the total possible grade Ahmed can get as ( A ).

[ A = 100 \times 9 = 900 ]

Ahmed has already scored 91 out of 900. Let ( x ) represent the minimum grade Ahmed needs to beat Emily.

Emily has scored 92 on the first 8 assignments and 90 on the final assignment. The total possible grade for Emily is 900 as well, so we can write her overall grade:

[ 91 + 92 + 92 + 92 + 92 + 92 + 92 + 92 + 90 =
Extracted:
To determine the minimum grade Ahmed needs to beat Emily, we first need to calculate the total possible grade Ahmed can get in the class.

Let's assume the maximum grade a student can get is 100. If there have been 9 assignments and each is worth the same, let's denote the total possible grade Ahmed can get as ( A ).

[ A = 100 \times 9 = 900 ]

Ahmed has already scored 91 out of 900. Let ( x ) represent the minimum grade Ahmed needs to beat Emily.

Emily has scored 92 on the first 8 assignments and 90 on the final assignment. The total possible grade for Emily is 900 as well, so we can write her overall grade:

[ 91 + 92 + 92 + 92 + 92 + 92 + 92 + 92 + 90 =
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000002e-07, 'rewards/xmlcount_reward_func': 0.0, 'rewards/soft_format_reward_func': 0.0, 'rewards/strict_format_reward_func': 0.0, 'rewards/int_reward_func': 0.0, 'rewards/correctness_reward_func': 0.0, 'reward': 0.0, 'reward_std': 0.0, 'completion_length': 182.25, 'kl': 0.0, 'epoch': 0.0}
0%|▊ | 1/250 [00:03<12:58, 3.13s/it]INFO 02-13 08:48:24 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20250213-084824.pkl...
INFO 02-13 08:48:24 model_runner_base.py:149] Completed writing input of failed execution to /tmp/err_execute_model_input_20250213-084824.pkl.
[rank0]: Traceback (most recent call last):
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1632, in execute_model
[rank0]: self.set_active_loras(model_input.lora_requests,
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1344, in set_active_loras
[rank0]: self.lora_manager.set_active_adapters(lora_requests, lora_mapping)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/unsloth_zoo/vllm_lora_worker_manager.py", line 183, in set_active_adapters
[rank0]: set_active_adapters_worker(requests, mapping, self._apply_adapters,
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/adapter_commons/utils.py", line 52, in set_active_adapters_worker
[rank0]: apply_adapters_func(requests)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/unsloth_zoo/vllm_lora_worker_manager.py", line 243, in _apply_adapters
[rank0]: self.add_adapter(lora)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/unsloth_zoo/vllm_lora_worker_manager.py", line 251, in add_adapter
[rank0]: lora = self._load_adapter(lora_request)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/unsloth_zoo/vllm_lora_worker_manager.py", line 157, in _load_adapter
[rank0]: raise e
[rank0]: File "/opt/conda/lib/python3.10/site-packages/unsloth_zoo/vllm_lora_worker_manager.py", line 110, in _load_adapter
[rank0]: peft_helper.validate_legal(self.lora_config)
[rank0]: AttributeError: 'PEFTHelper' object has no attribute 'validate_legal'

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]: File "/workspace/distill_model.py", line 74, in
[rank0]: trainer.train()
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2171, in train
[rank0]: return inner_training_loop(
[rank0]: File "", line 382, in _fast_inner_training_loop
[rank0]: File "", line 25, in _unsloth_training_step
[rank0]: File "/workspace/unsloth_compiled_cache/GRPOTrainer.py", line 323, in _prepare_inputs
[rank0]: outputs = self.llm.generate(all_prompts_text, sampling_params=self.sampling_params, use_tqdm=False, lora_request = self.model.load_lora('grpo_trainer_lora_model', load_tensors = True))
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 1021, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 462, in generate
[rank0]: outputs = self._run_engine(use_tqdm=use_tqdm)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 1242, in _run_engine
[rank0]: step_outputs = self.llm_engine.step()
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1390, in step
[rank0]: outputs = self.model_executor.execute_model(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 88, in execute_model
[rank0]: output = self.driver_worker.execute_model(execute_model_req)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 343, in execute_model
[rank0]: output = self.model_runner.execute_model(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner_base.py", line 152, in _wrapper
[rank0]: raise type(err)(
[rank0]: AttributeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20250213-084824.pkl): 'PEFTHelper' object has no attribute 'validate_legal'
[rank0]:[W213 08:48:25.171847756 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
0%| | 1/250 [00:04<16:41, 4.02s/it]

Environment:
Ubuntu, Tesla V100S-PCIE-32GB
triton 3.1.0
trl 0.15.0.dev0
truststore 0.8.0
typeguard 4.4.1
types-dataclasses 0.6.6
typing_extensions 4.12.2
tyro 0.9.14
tzdata 2025.1
unsloth 2025.2.5
unsloth_zoo 2025.2.3
urllib3 1.26.18
uvicorn 0.34.0
uvloop 0.21.0
virtualenv 20.29.2
vllm 0.6.6
peft 0.14.0
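
The traceback shows unsloth_zoo's LoRA worker manager calling `PEFTHelper.validate_legal`, which the vllm 0.6.6 listed above does not seem to define. A quick way to check (assuming `PEFTHelper` is importable from `vllm.lora.peft_helper`, which is where unsloth_zoo looks it up):

```python
# Check whether the installed vLLM's PEFTHelper has the method unsloth_zoo calls.
# Assumes the class is importable from vllm.lora.peft_helper.
import vllm
from vllm.lora.peft_helper import PEFTHelper

print("vllm:", vllm.__version__)                                  # 0.6.6 on this install
print("validate_legal:", hasattr(PEFTHelper, "validate_legal"))   # False here, matching the error
```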


oneseer commented Feb 14, 2025

I had the same issue. Please try vllm>=0.7.0 instead.
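For example: `pip install --upgrade "vllm>=0.7.0"`, then restart the Python process so the new version is picked up.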

@lai-serena
Author

> I had the same issue. Please try vllm>=0.7.0 instead.

It works! Thank you.
