
[Feature]: Add classification Task with AutoModelForSequenceClassification and BertForSequenceClassification #10939

yamamotolotation opened this issue Dec 6, 2024 · 10 comments


@yamamotolotation

🚀 The feature, motivation and pitch

Latest Version: v0.6.4.post1

Summary:
I would like to request support for using BERT and DistilBERT models for classification tasks. As of v0.6.4.post1, BertForSequenceClassification is only available for Sentence Pair Scoring tasks via the Score API.

Detail:
With a recent pull request (#9704), Qwen2ForSequenceClassification became available for text classification tasks. This is a fantastic improvement!

However, in text multi-classification tasks, models other than Qwen2 are often preferred due to the following reasons:

  • Limited multilingual support in Qwen2.
  • A larger number of parameters in Qwen2, leading to longer inference times.

BERT and DistilBERT, which have been extensively researched and fine-tuned for a wide range of languages, often achieve better classification accuracy and faster inference times. These models are frequently integrated into applications as local LLMs, making them an excellent fit for vLLM's role as a high-speed inference accelerator.

Currently, if we try to execute code for text classification using BERT or DistilBERT models in the same manner as the sample code for Qwen2, the following error is raised:

from vllm import LLM, SamplingParams

tensor_parallel_size = 1            # not defined in the original snippet
texts = ["text to classify"]        # not defined in the original snippet
sampling_params = SamplingParams()  # not defined in the original snippet

model = LLM(
    model="cl-tohoku/bert-base-japanese-v3.pt",
    tensor_parallel_size=tensor_parallel_size,
)
outputs = model.generate(
    prompts=list(texts),
    sampling_params=sampling_params,
)
ValueError: Model architectures ['BertForSequenceClassification'] are not supported for now. Supported architectures: dict_keys(['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'InternLM2VEForCausalLM', 'JAISLMHeadModel', 'JambaForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MambaForCausalLM', 'FalconMambaForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'Phi3SmallForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'XverseForCausalLM', 'BartModel', 'BartForConditionalGeneration', 'Florence2ForConditionalGeneration', 'BertModel', 'RobertaModel', 'XLMRobertaModel', 'Gemma2Model', 'LlamaModel', 'MistralModel', 'Qwen2Model', 'Qwen2ForRewardModel', 'Qwen2ForSequenceClassification', 'LlavaNextForConditionalGeneration', 'Phi3VForCausalLM', 'Qwen2VLForConditionalGeneration', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FuyuForCausalLM', 'H2OVLChatModel', 'InternVLChatModel', 'Idefics3ForConditionalGeneration', 'LlavaForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MiniCPMV', 
'MolmoForCausalLM', 'NVLM_D', 'PaliGemmaForConditionalGeneration', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'Qwen2AudioForConditionalGeneration', 'UltravoxModel', 'MllamaForConditionalGeneration', 'EAGLEModel', 'MedusaModel', 'MLPSpeculatorPreTrainedModel'])

Adding support for AutoModelForSequenceClassification, BertForSequenceClassification, and similar models would greatly enhance the utility of vLLM for a broader range of text classification tasks.

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@DarkLight1337
Member

As of v0.6.4.post1, BertForSequenceClassification is only available for Sentence Pair Scoring tasks via the Score API.

From my understanding, this isn't even available in v0.6.4.post1, since #10400 was merged after that release. Are you using the latest main branch?

@yamamotolotation
Author

Thank you for pointing this out.

I haven’t tested whether the BERT model is available via the Score API on my local v0.6.4.post1 version, so you may be correct that it is not implemented yet. My earlier statement was based on the official vLLM documentation as of 2024-12-06, where it is already listed as "supported."
(Reference: vLLM Supported Models Documentation #sentence-pair-scoring)

That said, the core of my Feature Request is not about the current implementation status of the Score API but rather about adding BERT models to the Classification task. I’d appreciate hearing opinions on this topic as well.

@DarkLight1337
Member

We don't distinguish between sequence classification and cross-encoding in terms of model architecture. As long as the model architecture is compatible, you can use the same model for regular sequence classification.

@yamamotolotation
Author

yamamotolotation commented Dec 6, 2024

Thank you for your guidance. I realize my understanding was incomplete: I had assumed the Score API could only be used for tasks where two sentences are fed into the model simultaneously to evaluate their similarity or relevance, and that it couldn't be used for my task, which involves processing a single input sequence to predict a class label.
If the API can be used for multi-class classification with some additional coding, that would be incredibly helpful! If you have any sample code or reference information, I would love to learn more.

On the other hand, for text classification it might be more convenient to support direct inference from the main script, e.g. via something like llm.generate(), rather than going through HTTP requests. This could be particularly useful for use cases such as mobile or embedded applications where direct inference support is critical.
Does vLLM only support BERT models in the manner shown in the sample code below?
https://github.com/maxdebayser/vllm/blob/e1e2f40336f64e9ffcc07d5e7f6f2317f1268c51/examples/openai_cross_encoder_score.py

I apologize if my question is based on a misunderstanding.

@DarkLight1337
Member

You are correct that the Score API is only for cross-encoder models. On the other hand, you can use LLM.encode to directly get the class probabilities.
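[Editor's note] For readers arriving here: once per-class scores are obtained for a sequence-classification model, converting them into probabilities is an ordinary softmax. Below is a minimal sketch; the commented vLLM calls are an untested assumption (they presume a vLLM build where the *ForSequenceClassification architectures are registered, and the model path is a placeholder), while the softmax helper itself is plain Python.

```python
import math

def softmax(logits):
    """Convert raw per-class logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical usage with vLLM's LLM.encode (assumption: a build where
# sequence-classification architectures are supported; model name is a
# placeholder, not a tested example):
#
#   from vllm import LLM
#   llm = LLM(model="path/to/sequence-classification-model")
#   (output,) = llm.encode(["text to classify"])
#   probs = softmax(output.outputs.embedding)
#   predicted_class = max(range(len(probs)), key=probs.__getitem__)
```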

@yamamotolotation
Author

I wasn't aware of the llm.encode() method. Thank you for pointing it out!

I'll try using this feature to perform text classification with the BERT model and the DistilBERT model. If it works well, I'll share some sample code and close this issue. I'm really excited about this!

@yamamotolotation
Author

I tried it, but unfortunately, it didn't work.
I had slightly misunderstood: to call LLM.encode(), I first need to load the model when constructing the LLM instance. However, since BertForSequenceClassification is not supported, an error is raised at that stage, so I couldn't even get as far as testing LLM.encode().

This occurred on v0.6.4.post1, and the same error was raised for both kinds of model= argument I tried: the path to a Hugging Face-format directory containing a locally fine-tuned model, and the name of a BERT model published on Hugging Face, passed as a string.

llm = LLM(model="path/to/model/dir/my_fine_tuned_tohokuBERT", tensor_parallel_size=1)

ValueError: Model architectures ['BertForSequenceClassification'] are not supported for now. Supported architectures: ....... (same list as in the error above)
llm = LLM(model="cl-tohoku/bert-base-japanese-v3", tensor_parallel_size=1)

ValueError: Model architectures ['BertForPreTraining'] are not supported for now. Supported architectures: .......

@DarkLight1337
Member

DarkLight1337 commented Dec 6, 2024

Even if you use LLM.encode, you need the latest vLLM code (not the latest released version), because those sequence classification models were added to vLLM after the release.

@yamamotolotation
Author

yamamotolotation commented Dec 6, 2024

I finally understand the advice you gave me at the start.

Thank you so much for your thorough support. Once I confirm that the changes in the main branch are included in the next release version, I'll give it a try!

@DarkLight1337
Member

Sorry that I wasn't being clear before!
