Update on the development branch #1726
kaiyux announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) this June 4, 2024.
This update includes:
- examples/grok/README.md
- examples/llama/README.md
- Fixed a qkv_bias shape issue for Qwen1.5-32B ("when converting the Qwen 110B GPTQ checkpoint, the shape of qkv_bias is not divisible by 3", #1589), thanks to the contribution from @Tlntin in "fix up qkv.bias error when use qwen1.5-32b-gptq-int4" #1637.
- Fixed the error of Ada traits for fpA_intB, thanks to the contribution from @JamesTheZ in "Fix the error of Ada traits for fpA_intB." #1583.
- Updated examples/qwenvl/requirements.txt, thanks to the contribution from @ngoanpv in "Update requirements.txt" #1248.
- Fixed rsLoRA scaling in lora_manager, thanks to the contribution from @TheCodeWrangler in "Fixed rslora scaling in lora_manager" #1669.
- Updated benchmarks/cpp/README.md; fixed "gptManagerBenchmark seems to go into a dead loop with GPU usage 0%" #1562 and "Cannot process new request: [TensorRT-LLM][ERROR] Assertion failed: LoRA task 0 not found in cache. Please send LoRA weights with request (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/llm/cpp/tensorrt_llm/batch_manager/peftCacheManager.cpp:182)" #1552.

Thanks,
The TensorRT-LLM Engineering Team
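
For readers curious about the qkv_bias fix: the underlying failure mode is that a fused qkv bias can only be split into three equal parts when the model uses plain multi-head attention. For grouped-query attention (GQA) models such as Qwen1.5-32B, the k/v projections are smaller than q, so the fused bias length is not divisible by 3 and the split must use the actual projection sizes. The sketch below is illustrative only (the function and head counts are hypothetical, not TensorRT-LLM's actual conversion code):

```python
import numpy as np

def split_qkv_bias(qkv_bias, num_heads, num_kv_heads, head_dim):
    """Split a fused qkv bias into its q, k, v parts.

    For MHA (num_heads == num_kv_heads) an even three-way split works;
    for GQA the k and v parts are smaller, so we slice by the actual
    projection sizes instead of assuming divisibility by 3.
    """
    q_size = num_heads * head_dim        # query projection width
    kv_size = num_kv_heads * head_dim    # key/value projection width
    assert qkv_bias.shape[0] == q_size + 2 * kv_size
    q = qkv_bias[:q_size]
    k = qkv_bias[q_size:q_size + kv_size]
    v = qkv_bias[q_size + kv_size:]
    return q, k, v

# Hypothetical GQA configuration: 40 query heads, 8 kv heads, head_dim 128.
# Fused length is 5120 + 2 * 1024 = 7168, which is not divisible by 3.
fused = np.zeros(40 * 128 + 2 * 8 * 128)
q, k, v = split_qkv_bias(fused, num_heads=40, num_kv_heads=8, head_dim=128)
```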
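
On the rsLoRA fix: rank-stabilized LoRA (rsLoRA) differs from standard LoRA only in the scaling factor applied to the low-rank update, using alpha / sqrt(rank) instead of alpha / rank so that the update magnitude stays stable as the rank grows. A minimal sketch of that distinction, with illustrative names rather than TensorRT-LLM's actual lora_manager API:

```python
import math

def lora_scale(alpha, rank, use_rslora=False):
    """Scaling applied to the low-rank delta (B @ A).

    Standard LoRA: alpha / rank.
    rsLoRA:        alpha / sqrt(rank).
    """
    return alpha / math.sqrt(rank) if use_rslora else alpha / rank

# With alpha=16, rank=64: standard LoRA gives 0.25, rsLoRA gives 2.0.
standard = lora_scale(16, 64)
rslora = lora_scale(16, 64, use_rslora=True)
```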