
[Feature]: Add classification Task with AutoModelForSequenceClassification and BertForSequenceClassification #10939

yamamotolotation opened this issue Dec 6, 2024 · 10 comments


@yamamotolotation

🚀 The feature, motivation and pitch

Latest Version: v0.6.4.post1

Summary:
I would like to request support for using BERT and DistilBERT models for classification tasks. As of v0.6.4.post1, BertForSequenceClassification is only available for Sentence Pair Scoring tasks via the Score API.

Detail:
With a recent pull request (#9704), Qwen2ForSequenceClassification became available for text classification tasks. This is a fantastic improvement!

However, in text multi-classification tasks, models other than Qwen2 are often preferred due to the following reasons:

  • Limited multilingual support in Qwen2.
  • A larger number of parameters in Qwen2, leading to longer inference times.

BERT and DistilBERT, which have been extensively researched and fine-tuned for a wide range of languages, often achieve better classification accuracy and faster inference times. These models are frequently integrated into applications as local LLMs, making them an excellent fit for vLLM's role as a high-speed inference accelerator.

Currently, if we try to execute code for text classification using BERT or DistilBERT models in the same manner as the sample code for Qwen2, the following error is raised:

from vllm import LLM, SamplingParams

tensor_parallel_size = 1            # not defined in the original snippet
texts = ["text to classify"]        # not defined in the original snippet
sampling_params = SamplingParams()  # not defined in the original snippet

model = LLM(
    model="cl-tohoku/bert-base-japanese-v3.pt",
    tensor_parallel_size=tensor_parallel_size,
)
outputs = model.generate(
    prompts=list(texts),
    sampling_params=sampling_params,
)
ValueError: Model architectures ['BertForSequenceClassification'] are not supported for now. Supported architectures: dict_keys(['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'InternLM2VEForCausalLM', 'JAISLMHeadModel', 'JambaForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MambaForCausalLM', 'FalconMambaForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'Phi3SmallForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'XverseForCausalLM', 'BartModel', 'BartForConditionalGeneration', 'Florence2ForConditionalGeneration', 'BertModel', 'RobertaModel', 'XLMRobertaModel', 'Gemma2Model', 'LlamaModel', 'MistralModel', 'Qwen2Model', 'Qwen2ForRewardModel', 'Qwen2ForSequenceClassification', 'LlavaNextForConditionalGeneration', 'Phi3VForCausalLM', 'Qwen2VLForConditionalGeneration', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FuyuForCausalLM', 'H2OVLChatModel', 'InternVLChatModel', 'Idefics3ForConditionalGeneration', 'LlavaForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MiniCPMV', 
'MolmoForCausalLM', 'NVLM_D', 'PaliGemmaForConditionalGeneration', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'Qwen2AudioForConditionalGeneration', 'UltravoxModel', 'MllamaForConditionalGeneration', 'EAGLEModel', 'MedusaModel', 'MLPSpeculatorPreTrainedModel'])

Adding support for AutoModelForSequenceClassification, BertForSequenceClassification, and similar models would greatly enhance the utility of vLLM for a broader range of text classification tasks.

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@DarkLight1337
Member

As of v0.6.4.post1, BertForSequenceClassification is only available for Sentence Pair Scoring tasks via the Score API.

From my understanding, this isn't even available in v0.6.4.post1, since #10400 was merged after that release. Are you using the latest main branch?

@yamamotolotation
Author

Thank you for pointing this out.

I haven’t tested whether the BERT model is available via the Score API on my local v0.6.4.post1 version, so you may be correct that it is not implemented yet. My earlier statement was based on the official vLLM documentation as of 2024-12-06, where it is already listed as "supported."
(Reference: vLLM Supported Models Documentation #sentence-pair-scoring)

That said, the core of my Feature Request is not about the current implementation status of the Score API but rather about adding BERT models to the Classification task. I’d appreciate hearing opinions on this topic as well.

@DarkLight1337
Member

We don't distinguish between sequence classification and cross-encoding in terms of model architecture. As long as the model architecture is compatible, you can use the same model for regular sequence classification.

@yamamotolotation
Author

yamamotolotation commented Dec 6, 2024

Thank you for your guidance. I realize my understanding was incomplete: I had assumed the Score API could only be used for tasks where two sentences are fed into the model simultaneously to evaluate their similarity or relevance, and that it couldn't be used for my task, which involves processing a single input sequence to predict a class label.
If the API can be used for multi-class classification with some additional coding, that would be incredibly helpful! If you have any sample code or reference information, I would love to learn more.

On the other hand, for text classification it might be more convenient to support direct inference from the main script, e.g. via something like llm.generate(), rather than going through HTTP requests. This could be particularly useful for use cases such as mobile or embedded applications where direct inference support is critical.
Does vLLM only support BERT models in the manner shown in the sample code below?
https://github.com/maxdebayser/vllm/blob/e1e2f40336f64e9ffcc07d5e7f6f2317f1268c51/examples/openai_cross_encoder_score.py

I apologize if my question is based on a misunderstanding.

@DarkLight1337
Member

You are correct that the Score API is only for cross-encoder models. On the other hand, you can use LLM.encode to directly get the class probabilities.
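[Editor's note] For readers arriving here: once per-class scores are obtained for a sequence-classification model, converting them into probabilities is an ordinary softmax. Below is a minimal sketch; the commented vLLM calls are an untested assumption (they presume a vLLM build where the *ForSequenceClassification architectures are registered, and the model path is a placeholder), while the softmax helper itself is plain Python.

```python
import math

def softmax(logits):
    """Convert raw per-class logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical usage with vLLM's LLM.encode (assumption: a build where
# sequence-classification architectures are supported; model name is a
# placeholder, not a tested example):
#
#   from vllm import LLM
#   llm = LLM(model="path/to/sequence-classification-model")
#   (output,) = llm.encode(["text to classify"])
#   probs = softmax(output.outputs.embedding)
#   predicted_class = max(range(len(probs)), key=probs.__getitem__)
```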

@yamamotolotation
Author

I wasn't aware of the llm.encode() method. Thank you for pointing it out!

I'll try using this feature to perform text classification with the BERT model and the DistilBERT model. If it works well, I'll share some sample code and close this issue. I'm really excited about this!

@yamamotolotation
Author

I tried it, but unfortunately, it didn't work.
I had slightly misunderstood: to call LLM.encode(), I first need to load the model when constructing the LLM instance. However, since BertForSequenceClassification is not supported, an error is raised at that stage, so I couldn't even get as far as testing LLM.encode().

This occurred on v0.6.4.post1, and the same error was raised for both kinds of model= argument I tried: the path to a Hugging Face-format directory containing a locally fine-tuned model, and the name of a BERT model published on Hugging Face, passed as a string.

llm = LLM(model="path/to/model/dir/my_fine_tuned_tohokuBERT", tensor_parallel_size=1)

ValueError: Model architectures ['BertForSequenceClassification'] are not supported for now. Supported architectures: ....... (same list as in the error above)
llm = LLM(model="cl-tohoku/bert-base-japanese-v3", tensor_parallel_size=1)

ValueError: Model architectures ['BertForPreTraining'] are not supported for now. Supported architectures: .......

@DarkLight1337
Member

DarkLight1337 commented Dec 6, 2024

Even if you use LLM.encode, you need the latest vLLM code (not the latest released version), because those sequence classification models were added to vLLM after the release.

@yamamotolotation
Author

yamamotolotation commented Dec 6, 2024

I finally understand the advice you gave me at the start.

Thank you so much for your thorough support. Once I confirm that the changes in the main branch are included in the next release version, I'll give it a try!

@DarkLight1337
Member

Sorry that I wasn't being clear before!
