A Generation Error #474

tomato996 · 2025-02-13T08:27:04Z

My model is showing a strange problem. After loading the model, the first few inferences produce normal output. After a few more inferences, however, the inference time suddenly increases and the model outputs meaningless tokens.

I don't think this is a data problem, because the same query sometimes gets a normal response and sometimes doesn't.

Here is my code for loading the model and running inference:

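import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model weights in bfloat16 and move the model to the GPU
ckpt_path = "/ceph_data/szy/internlm-xcomposer2d5-7B"
self.model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda()
self.tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)

# Run inference under float16 autocast (the weights above were loaded as bfloat16)
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = self.model.chat(self.tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)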
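
Since the failure only appears after several inferences, it may help to record the latency of each call and whether its output looks degenerate, so the exact call where things go wrong can be pinned down. The following is a minimal sketch using only the standard library. The names `looks_degenerate`, `timed_call`, and `run_and_log` are hypothetical helpers, not part of the model's API, and the repetition threshold is an assumed heuristic rather than a definition of "meaningless tokens".

```python
import time
from collections import Counter


def looks_degenerate(text, max_repeat_ratio=0.5, min_tokens=8):
    """Heuristic (assumption): flag output where a single
    whitespace-separated token makes up most of the text."""
    tokens = text.split()
    if len(tokens) < min_tokens:
        # Too short to judge; treat as normal output.
        return False
    most_common_count = Counter(tokens).most_common(1)[0][1]
    return most_common_count / len(tokens) > max_repeat_ratio


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def run_and_log(chat_fn, queries):
    """Run chat_fn on each query, recording latency and a degeneracy flag."""
    records = []
    for index, query in enumerate(queries):
        response, seconds = timed_call(chat_fn, query)
        records.append({
            "index": index,
            "seconds": seconds,
            "degenerate": looks_degenerate(response),
        })
    return records
```

With the code above, `chat_fn` could be a small wrapper such as `lambda q: self.model.chat(self.tokenizer, q, image, do_sample=False, num_beams=3, use_meta=True)[0]`. The resulting records show at which call the latency jumps and whether that coincides with the first flagged output.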

mm-assistant bot assigned LightDXY Feb 13, 2025
