[FIX] Avoid initializing the process group when using a single GPU #2496
Conversation
This is great! Please add a test and let me know when it is ready for review!
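(As a rough sketch of what such a test might cover, assuming pytest and vLLM's public LLM/SamplingParams API rather than any specific file in vLLM's test suite, the key behaviour is that two single-GPU engines can be created and used in the same process, which is exactly what the example in the next comment exercises by hand.)

# Hypothetical regression test sketch: two single-GPU engines in one process
# should both be able to generate. Without the fix discussed in this PR,
# (re)initializing a default process group per engine would break this.
from vllm import LLM, SamplingParams

def test_two_single_gpu_llms_in_one_process():
    prompts = ["Hello, my name is"]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    llm0 = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.3)
    llm1 = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.5)

    for llm in (llm0, llm1):
        outputs = llm.generate(prompts, sampling_params)
        assert len(outputs) == 1
        assert outputs[0].outputs[0].text  # some text was generated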
@simon-mo, I have completed the coding and tested it using an example similar to the one below. Could you please take a little time to review this PR?

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm0 = LLM(model="facebook/opt-125m", trust_remote_code=True, gpu_memory_utilization=0.3)

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm0.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

# Create a second LLM in the same process.
llm1 = LLM(model="facebook/opt-125m", trust_remote_code=True, gpu_memory_utilization=0.5)

# Generate texts from the prompts with the second LLM.
outputs = llm1.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

PS: on my local host, the LLM is chatglm3.
@simon-mo Sorry to disturb you; I would like to know what I should do next.
Thanks for the ping. Hmm, but the final approach is not what I had in mind. Is monkey patching necessary here? It looks like we can just change the functions inside vLLM to achieve what you wanted.
@simon-mo Thanks for your review. In my approach, monkey patching is necessary. The reason for using it instead of changing the functions inside vLLM is to avoid modifying too much of the original code (such as …). Of course, this is just my personal understanding; I believe you may have better methods and deeper considerations. Please let me know.
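(To make the trade-off in this exchange concrete, here is a rough, generic sketch of the two options; it is not the actual diff in this PR, and it uses the real torch.distributed.init_process_group only as an illustrative patch target. The world_size keyword check is a simplifying assumption.)

import torch.distributed as dist

# Option A: monkey patching. The library code stays untouched; wrapper code
# swaps out the function the library calls before any engine is created, so
# init_process_group silently becomes a no-op for the single-GPU case.
_original_init_process_group = dist.init_process_group

def _patched_init_process_group(*args, **kwargs):
    if kwargs.get("world_size", 1) == 1:
        return  # single GPU: skip creating a process group
    return _original_init_process_group(*args, **kwargs)

dist.init_process_group = _patched_init_process_group

# Option B: instead of patching, guard the call site inside the library with
# something like `if world_size > 1:` so the shortcut lives where the group
# is created and no external patch has to be applied first.

Option B is what the maintainer asks for in the next comment: the behaviour stays in one obvious place in the vLLM source rather than depending on a patch being applied at import time.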
Please modify the vLLM code directly, so that the end result is simple and maintainable.
@simon-mo Thanks for your review. Since you don't agree with my approach, I will close this PR for now and think about how to modify the vLLM code directly.
Thank you for understanding! |
Refer to:
#117
#244
#565
#654