We recommend deploying with vLLM for inference; see the README for details.
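For example (assuming a recent vLLM build with Qwen2.5-VL support), serving the model with tensor parallelism splits every layer across both 4090s instead of pipelining them, which keeps both GPUs busy:

```shell
# Tensor parallelism across the two 4090s; exposes an OpenAI-compatible API
vllm serve Qwen/Qwen2.5-VL-7B-Instruct --tensor-parallel-size 2
```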
Model Series
Qwen2.5-VL-7B-Instruct
What are the models used?
Qwen2.5-VL-7B-Instruct
What is the scenario where the problem happened?
During generation I use 2× RTX 4090 GPUs, but their utilization is low. How can I raise GPU utilization and speed up generation?
Information about environment
2× RTX 4090
Python 3.10
Description
Steps to reproduce
I tried running the Qwen2.5-VL-7B-Instruct model on 2× RTX 4090 and it produces output correctly. The only problem is that GPU utilization stays low: one card sits at 1% while the other hovers around 48–49%.
Code
The following example input & output can be used:
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

def segment_caption(messages, image_path=None):
    # Manual split: vision encoder and embeddings on GPU 0,
    # all 28 decoder layers plus norm, rotary embeddings, and head on GPU 1.
    device_map = {'visual': 0, 'model.embed_tokens': 0}
    device_map.update({f'model.layers.{i}': 1 for i in range(28)})
    device_map.update({'model.norm': 1, 'model.rotary_emb': 1, 'lm_head': 1})

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map=device_map
    )
    min_pixels = 256 * 28 * 28
    max_pixels = 1280 * 28 * 28
    processor = AutoProcessor.from_pretrained(
        "Qwen/Qwen2.5-VL-7B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels
    )

    # Preparation for inference
    if image_path:
        text = processor.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        image_inputs, video_inputs = process_vision_info(messages)
        inputs = processor(
            text=[text], images=image_inputs, videos=video_inputs,
            padding=True, return_tensors="pt",
        )
        inputs = inputs.to(model.device)
    else:
        input_prompt = processor.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = processor(text=[input_prompt], padding=False, return_tensors="pt")
        inputs = inputs.to("cuda")

    # Inference: generation of the output
    generated_ids = model.generate(**inputs, max_new_tokens=2048)
    generated_ids_trimmed = [
        out_ids[len(in_ids):]
        for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    output_text = processor.batch_decode(
        generated_ids_trimmed, skip_special_tokens=True,
        clean_up_tokenization_spaces=False
    )
    print(output_text)
    return output_text
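A likely cause of the skewed utilization is that this device_map is a pure pipeline split: GPU 0 holds only the vision tower and the embeddings, while GPU 1 holds all 28 decoder layers, so GPU 0 sits nearly idle during decoding. A minimal sketch of a more even split (the helper name and even-split policy are illustrative, not from the original code):

```python
def make_balanced_device_map(num_layers=28, num_gpus=2):
    """Spread decoder layers evenly across GPUs; keep the vision tower
    and embeddings on GPU 0 so image features start there."""
    device_map = {"visual": 0, "model.embed_tokens": 0}
    per_gpu = (num_layers + num_gpus - 1) // num_gpus  # ceiling division
    for i in range(num_layers):
        device_map[f"model.layers.{i}"] = min(i // per_gpu, num_gpus - 1)
    # Final norm, rotary embeddings, and LM head live on the last GPU.
    last = num_gpus - 1
    device_map.update({"model.norm": last, "model.rotary_emb": last,
                       "lm_head": last})
    return device_map

balanced = make_balanced_device_map()
```

Passing this map as `device_map=balanced` to `from_pretrained` (or simply using `device_map="auto"`, which lets accelerate compute a similar split) should even out memory and load across both cards. Note that layer-wise splits are still pipeline parallelism, processing one stage at a time, so vLLM with tensor parallelism remains the faster option for throughput.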
Expected results
Expected result: utilization on both cards goes up and generation becomes faster.