
[Model] Initial support for LLaVA-NeXT #4199

Merged
merged 178 commits into from
Jun 10, 2024

Conversation

DarkLight1337
Member

@DarkLight1337 DarkLight1337 commented Apr 19, 2024

I have added experimental support for LLaVA-NeXT, with one big caveat: the size of the input image is fixed by the configuration; otherwise, the feature size (i.e., the number of image tokens to duplicate) would vary with the runtime input. This prevents us from taking full advantage of the extra resolution. Still, it gives us access to a 34B model, which should improve over the 7B and 13B LLaVA-1.5 models.

Related Contributions

This PR completes part of #3978.

Since this PR depends on the functionality proposed in #4197 to pass image_sizes to the LLaVA-NeXT model, it is set to Draft status until that PR is merged. Afterwards, you should be able to see the diffs that are exclusive to this PR.

To avoid unnecessary resource usage, this branch is frozen (except for critical fixes) until its dependencies have all been merged.

Features

  1. Experimental support for LLaVA-NeXT (also known as LLaVA-1.6 or LLaVA-1.5-HD)
    • Added LlavaNextForConditionalGeneration to the list of supported architectures. (Tested with llava-hf/llava-v1.6-34b-hf)

    • Limitation: The input image is resized to a static image_input_shape (NCHW format, specified in the configuration) before passing it to the model; otherwise, the number of <image> input tokens required in the text prompt (equal to image_feature_size) would vary at runtime depending on the original size of the input image. The following table shows the image_feature_size which you need to specify in the configuration for each image_input_shape:

      | Height (↓) \ Width (→) | 336 | 448 | 560 | 672 | 784 | 896 | 1008 | 1120 | 1232 | 1344 |
      |---|---|---|---|---|---|---|---|---|---|---|
      | 336 | 1176 | 1368 | 1560 | 1752 | 1944 | 2136 | 2328 | 2182 | 2036 | 1890 |
      | 448 | 1376 | 2928 | 2438 | 2144 | 1948 | 1752 | 1896 | 2040 | 2184 | 2328 |
      | 560 | 1576 | 2448 | 2928 | 2536 | 2242 | 2046 | 1850 | 1752 | 1848 | 1992 |
      | 672 | 1776 | 2160 | 2544 | 2928 | 2634 | 2340 | 2144 | 1948 | 1850 | 1752 |
      | 784 | 1976 | 1968 | 2256 | 2640 | 2928 | 2634 | 2438 | 2242 | 2046 | 1948 |
      | 896 | 2176 | 1776 | 2064 | 2352 | 2640 | 2928 | 2634 | 2438 | 2242 | 2144 |
      | 1008 | 2376 | 1926 | 1872 | 2160 | 2448 | 2640 | 2928 | 2732 | 2536 | 2340 |
      | 1120 | 2232 | 2076 | 1776 | 1968 | 2256 | 2448 | 2736 | 2928 | 2732 | 2536 |
      | 1232 | 2088 | 2226 | 1876 | 1872 | 2064 | 2256 | 2544 | 2736 | 2928 | 2732 |
      | 1344 | 1944 | 2376 | 2026 | 1776 | 1968 | 2160 | 2352 | 2544 | 2736 | 2928 |
      • For other image sizes, you can attempt to run the model first; the resulting error should tell you the expected image_feature_size.
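
The table values follow from LLaVA-NeXT's "anyres" scheme: the model picks the best-fitting grid resolution for the input, tiles it into 336-px patches, removes the letterbox padding from the feature map, and appends one newline token per feature row on top of the 576 base-image tokens. The sketch below reproduces that computation so you can derive `image_feature_size` for an arbitrary image size without triggering the runtime error. It mirrors the resolution-selection and unpadding logic of the Hugging Face LLaVA-NeXT implementation; the grid pinpoints and the 24-per-side patch grid are assumptions taken from the llava-v1.6 defaults, so verify them against your model's `config.json`.

```python
import math

# Assumed llava-v1.6 defaults; check image_grid_pinpoints in the model config.
GRID_PINPOINTS = [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]
PATCH_GRID = 24                          # 336-px tower / 14-px patches = 24 features per side
BASE_FEATURES = PATCH_GRID * PATCH_GRID  # 576 tokens for the base resized image

def select_best_resolution(orig_w: int, orig_h: int) -> tuple[int, int]:
    """Pick the (height, width) pinpoint that keeps the most detail with the least padding."""
    best, best_eff, best_waste = None, -1, math.inf
    for h, w in GRID_PINPOINTS:
        scale = min(w / orig_w, h / orig_h)
        down_w, down_h = int(orig_w * scale), int(orig_h * scale)
        effective = min(down_w * down_h, orig_w * orig_h)
        waste = w * h - effective
        if effective > best_eff or (effective == best_eff and waste < best_waste):
            best, best_eff, best_waste = (h, w), effective, waste
    return best

def image_feature_size(orig_w: int, orig_h: int) -> int:
    """Number of <image> tokens LLaVA-NeXT expects for an input of this size."""
    h, w = select_best_resolution(orig_w, orig_h)
    cur_h, cur_w = (h // 336) * PATCH_GRID, (w // 336) * PATCH_GRID
    # Undo the letterbox padding added when fitting the image into the grid.
    if orig_w / orig_h > cur_w / cur_h:           # padded vertically
        new_h = int(orig_h * cur_w / orig_w)
        cur_h -= 2 * ((cur_h - new_h) // 2)
    else:                                         # padded horizontally
        new_w = int(orig_w * cur_h / orig_h)
        cur_w -= 2 * ((cur_w - new_w) // 2)
    # One extra "newline" token terminates each feature row.
    return BASE_FEATURES + cur_h * (cur_w + 1)
```

Under these assumed defaults, `image_feature_size(336, 336)` yields 1176 and `image_feature_size(672, 672)` yields 2928, matching the table above; the runtime error message remains the authoritative source if your model's configuration differs.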

Future Work

We can overcome the current limitations of static input shape after #5215 has been merged. Once that is addressed, we can openly support this model by adding it to the docs and README. See:

- Also add docs for basic VLM usage
- Note that LLaVA-1.5 has been refactored to facilitate this
@jeejeelee
Collaborator

Nice work, any plan to port CLIPVisionModel's code?

@DarkLight1337
Member Author

> Nice work, any plan to port CLIPVisionModel's code?

@jeejeelee That is outside the scope of this PR; however, you are welcome to voice your thoughts in #4194.

@DarkLight1337
Member Author

Thanks for the review! I have addressed your comments.

Member

@ywang96 ywang96 left a comment


LGTM! Thanks for adding this model; I'll merge this after pushing a final addition to the docs.

@ywang96 ywang96 enabled auto-merge (squash) June 10, 2024 04:25
@ywang96 ywang96 merged commit 6b29d6f into vllm-project:main Jun 10, 2024
103 checks passed
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jun 10, 2024
@DarkLight1337 DarkLight1337 deleted the llava-next branch June 10, 2024 13:01
@DarkLight1337
Member Author

Oops, turns out that I forgot to copy some functions from LLaVA into LLaVA-NeXT. Drafting a quick fix now.

robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 11, 2024
tjohnson31415 added a commit to tjohnson31415/vllm that referenced this pull request Jun 11, 2024
* upstream/main: (126 commits)
  [Bugfix][Frontend] Cleanup "fix chat logprobs" (vllm-project#5026)
  [Bugfix] OpenAI entrypoint limits logprobs while ignoring server defined --max-logprobs (vllm-project#5312)
  [Misc] Various simplifications and typing fixes (vllm-project#5368)
  [ci] Fix Buildkite agent path (vllm-project#5392)
  [Doc] Add documentation for FP8 W8A8 (vllm-project#5388)
  Bump version to v0.5.0 (vllm-project#5384)
  [Docs] Alphabetically sort sponsors (vllm-project#5386)
  [Docs] Add Docs on Limitations of VLM Support (vllm-project#5383)
  [ci] Mount buildkite agent on Docker container to upload benchmark results (vllm-project#5330)
  [ci] Use small_cpu_queue for doc build (vllm-project#5331)
  [Bugfix] Fix LLaVA-NeXT (vllm-project#5380)
  [Feature][Frontend]:  Continued `stream_options` implementation also in CompletionRequest (vllm-project#5319)
  [Model] Initial support for LLaVA-NeXT (vllm-project#4199)
  [Misc] Improve error message when LoRA parsing fails (vllm-project#5194)
  [misc][typo] fix typo (vllm-project#5372)
  [Frontend][Misc] Enforce Pixel Values as Input Type for VLMs in API Server (vllm-project#5374)
  [Misc] Update to comply with the new `compressed-tensors` config (vllm-project#5350)
  [Bugfix] Fix KeyError: 1 When Using LoRA adapters (vllm-project#5164)
  [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (vllm-project#5047)
  [mis][ci/test] fix flaky test in test_sharded_state_loader.py (vllm-project#5361)
  ...
joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 27, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 8, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024