[Feature] support pooling model dummy_run #4345
base: develop
…into pooling_emb_3
Thanks for your contribution!
from fastdeploy.engine.pooling_params import PoolingParams
from fastdeploy.engine.tasks import PoolingTask
Is it reasonable for a lower-level module to import from engine?
This follows vLLM's approach, where the equivalent module is vllm/tasks; I placed it under engine.
class FdModel(Protocol[T_co]):
    """The interface required for all models in FastDeploy."""
Which classes will inherit from FdModel, and how does it relate to ModelForCasualLM?
Only FdModelForPooling inherits from it. It is unrelated to ModelForCasualLM: ModelForCasualLM has compute_logits, which pooling models do not compute.
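A minimal sketch of the relationship described in this reply, using `typing.Protocol`. Only the names `FdModel`, `FdModelForPooling`, `T_co`, `pooler`, and `compute_logits` come from the PR; the `forward` method, the `runtime_checkable` decorator, and the two `Tiny*` classes are assumptions made for illustration and may not match the real FastDeploy interfaces.

```python
from typing import Protocol, TypeVar, runtime_checkable

T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class FdModel(Protocol[T_co]):
    """Base interface (sketch): every model exposes forward()."""

    def forward(self, input_ids: list) -> T_co: ...


@runtime_checkable
class FdModelForPooling(FdModel[T_co], Protocol[T_co]):
    """Pooling interface (sketch): adds a pooler, has no compute_logits()."""

    pooler: object


class TinyEmbedder:
    """Hypothetical pooling model: satisfies FdModelForPooling structurally."""

    pooler = "mean"  # placeholder standing in for a real pooler object

    def forward(self, input_ids: list) -> list:
        return [float(len(input_ids))]


class TinyCausalLM:
    """Hypothetical generative model: has compute_logits() but no pooler."""

    def forward(self, input_ids: list) -> list:
        return [0.0]

    def compute_logits(self, hidden: list) -> list:
        return hidden
```

In this sketch both classes satisfy `FdModel`, but only `TinyEmbedder` satisfies `FdModelForPooling`, because the generative model has no `pooler` attribute.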
[num_reqs, req_num_tokens],
dtype="int32",
)
model = cast(FdModelForPooling, self.get_model())
Same question as above: what is the relationship between FdModelForPooling and ModelForCasualLM, and is the cast really necessary?
This sets some default pooling_type values (when the user does not set them), so the cast is needed.
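For context on what the cast does: `typing.cast` performs no conversion or check at runtime. It returns its argument unchanged and only tells a static type checker to treat the value as the given type, so that attributes such as `pooler` are accepted. The sketch below illustrates this with hypothetical stand-ins (`HasPooler`, `DummyModel`, `get_model`), none of which are FastDeploy classes.

```python
from typing import Protocol, cast


class HasPooler(Protocol):
    """Hypothetical stand-in for a pooling-model interface."""

    pooler: str


class DummyModel:
    """Hypothetical model object that happens to carry a pooler."""

    pooler = "last"


def get_model() -> object:
    # Declared as returning a broad type, like a generic model getter might.
    return DummyModel()


raw = get_model()
# cast() returns the same object; it only narrows the static type.
model = cast(HasPooler, raw)
default_pooling_type = model.pooler  # accepted by a type checker after the cast
```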
to_update = model.pooler.get_pooling_updates(task)
to_update.apply(dummy_pooling_params)
Is the name to_update semantically accurate?
It follows vLLM's conventions.
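To make the naming question concrete, here is a rough sketch of the pattern in the two lines under review: the pooler returns an object describing what should change for a task, and that object is then applied to the params. The classes below and the `requires_token_ids` field are illustrative assumptions, not the actual FastDeploy or vLLM definitions, and the rule for which task sets the flag is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class DummyPoolingParams:
    """Hypothetical params object that receives the updates."""

    requires_token_ids: bool = False


@dataclass(frozen=True)
class DummyParamsUpdate:
    """Hypothetical description of the changes a pooler wants for one task."""

    requires_token_ids: bool = False

    def apply(self, params: DummyPoolingParams) -> None:
        # Copy the requested settings onto the params object in place.
        params.requires_token_ids = self.requires_token_ids


class DummyPooler:
    """Hypothetical pooler; the per-task rule below is made up."""

    def get_pooling_updates(self, task: str) -> DummyParamsUpdate:
        return DummyParamsUpdate(requires_token_ids=(task == "encode"))


dummy_pooling_params = DummyPoolingParams()
to_update = DummyPooler().get_pooling_updates("encode")
to_update.apply(dummy_pooling_params)
```

Under this reading, `to_update` holds the set of updates to apply rather than the thing being updated.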
cumsum = paddle.zeros([n_seq + 1], dtype="int64")
if cumsum.place.is_gpu_place():
    cumsum = cumsum.cpu()
Why not create the zeros tensor on the CPU directly?
done
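The `[n_seq + 1]` shape in the snippet under review suggests a prefix-sum buffer with a leading zero, although the excerpt does not show how it is filled. Assuming that purpose, the sketch below shows the layout with plain Python lists instead of Paddle tensors; `cumulative_offsets` is a hypothetical helper, not a function from the PR.

```python
from itertools import accumulate


def cumulative_offsets(seq_lens: list) -> list:
    """Return n_seq + 1 offsets: a leading 0, then running totals of seq_lens."""
    return [0, *accumulate(seq_lens)]


seq_lens = [3, 1, 4]
offsets = cumulative_offsets(seq_lens)
# Sequence i occupies the half-open token range [offsets[i], offsets[i + 1]).
first_range = (offsets[0], offsets[1])
```

Bookkeeping this small is typically read on the host, which is consistent with the reviewer's suggestion to keep the buffer on the CPU from the start.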
self.attn_backends.append(attn_backend)

def _dummy_pooler_run_task(
Why not implement this directly in _dummy_pooler_run instead of extracting a separate _dummy_pooler_run_task?
It was written following vLLM's conventions.
self.speculative_decoding = self.speculative_method is not None
self.enable_logprob = fd_config.model_config.enable_logprob
self.enable_early_stop = self.fd_config.early_stop_config.enable_early_stop
self.is_pooling_model = self.fd_config.model_config.runner_type == "pooling"
Can one of self.is_pooling_model and is_pooling_model be removed? Is it necessary to keep both?
Removed is_pooling_model and kept self.is_pooling_model.
Adds dummy_pooler_run support for pooling models, and refactors the previous warm-up phase for generative models into dummy_sampler_run.
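A simplified sketch of the resulting control flow, assuming the warm-up entry point branches on the `is_pooling_model` flag discussed above. `DummyRunner`, its `dummy_run` method, and the string return values are placeholders for illustration; the real runner methods take arguments and run the model.

```python
class DummyRunner:
    """Hypothetical stand-in for the model runner in this PR."""

    def __init__(self, runner_type: str) -> None:
        # Mirrors the flag from the PR: derived once from the runner type.
        self.is_pooling_model = runner_type == "pooling"

    def _dummy_pooler_run(self) -> str:
        # Placeholder for warming up the pooling path.
        return "pooler"

    def _dummy_sampler_run(self) -> str:
        # Placeholder for the warm-up previously used by generative models.
        return "sampler"

    def dummy_run(self) -> str:
        # Choose the warm-up path that matches the model kind.
        if self.is_pooling_model:
            return self._dummy_pooler_run()
        return self._dummy_sampler_run()
```

Keeping a single `self.is_pooling_model` attribute, as agreed in the review, means the branch reads from one source of truth.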