[WIP] Llava #2639

Closed · wants to merge 41 commits

Conversation

@BabyChouSr (Collaborator) commented Nov 4, 2023

Why are these changes needed?

Provide the ability to interact with multimodal models.

TODO:

  • adapter.conversation_template is changed
  • adapter.load_model works
  • test text generation on CLI
  • test image prompting on CLI
  • test multimodal worker endpoint
  • add gradio demo

CLI commands:

python3 -m fastchat.serve.cli --model-path llava-hf/llava-1.5-7b-hf --debug

Testing multimodal worker:

python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-path llava-hf/llava-1.5-7b-hf --multimodal
python3 -m fastchat.serve.test_message --model-name llava-hf/llava-1.5-7b-hf --max-new-tokens 256

Example output (with --max-new-tokens 256): (screenshot)

Gradio interface: (screenshots)

Unit test outputs: (screenshot)

Things to think about:

  • I created a separate file for the vision models. Is there a smarter way to do this? Only some functions change, but gradio_web_server_vision and gradio_web_server share a lot of functionality.
  • How do I get the model list for vision models?
  • Should we keep the ability to process images (pad, resize, crop)? (Deleted for now; a sketch of this kind of preprocessing follows this list.)
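
For context, a minimal PIL sketch of the kind of pad/resize/crop preprocessing the last question refers to. This is not code from this PR; the function name and target size are illustrative only.

from PIL import Image, ImageOps

def preprocess_image(image: Image.Image, size: int = 336) -> Image.Image:
    # Pad the shorter side so the image becomes square without distorting it.
    side = max(image.size)
    padded = ImageOps.pad(image, (side, side), color=(0, 0, 0))
    # Resize to the model's expected input resolution.
    resized = padded.resize((size, size))
    # Center-crop to the final size (a no-op here, since the image is already square).
    return ImageOps.fit(resized, (size, size))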

Checks

  • I've run format.sh to lint the changes in this PR.
  • I've included any doc changes needed.
  • I've made sure the relevant tests are passing (if applicable).

@BabyChouSr BabyChouSr marked this pull request as ready for review November 7, 2023 01:25
@BabyChouSr BabyChouSr requested a review from merrymercy November 7, 2023 01:29
@merrymercy (Member) left a comment:

The overall design/change looks very good! I only made some minor style suggestions.

Make sure your changes do not break existing functions by running some unit tests here (https://github.com/lm-sys/FastChat/tree/main/tests#unit-tests-for-fastchat)

Review threads (resolved): fastchat/constants.py, fastchat/conversation.py, fastchat/model/model_adapter.py, fastchat/serve/gradio_web_server_vision.py, fastchat/serve/inference.py (two threads)
@merrymercy (Member) left a comment:

  1. Rename fastchat/serve/examples/dog.jpeg -> fastchat/serve/example_images/dog.jpeg
  2. Delete playground/images/python.png, playground/images/sunset.jpg

Review threads (resolved): fastchat/conversation.py (two threads), fastchat/serve/gradio_web_server_vision.py (two threads), fastchat/utils.py, fastchat/serve/multimodal_model_worker.py (two threads)
@infwinston (Member):

Do we need to add an additional dependency? I got the error below when running python3 -m fastchat.serve.cli --model-path liuhaotian/llava-v1.5-7b --multimodal --debug

Traceback (most recent call last):
  File "/opt/conda/envs/chatbot/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/chatbot/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/gcpuser/sky_workdir/FastChat/fastchat/serve/cli.py", line 310, in <module>
    main(args)
  File "/home/gcpuser/sky_workdir/FastChat/fastchat/serve/cli.py", line 226, in main
    chat_loop(
  File "/home/gcpuser/sky_workdir/FastChat/fastchat/serve/inference.py", line 371, in chat_loop
    model, tokenizer, image_processor = load_model(
  File "/home/gcpuser/sky_workdir/FastChat/fastchat/model/model_adapter.py", line 321, in load_model
    model, tokenizer, image_processor = adapter.load_model(model_path, kwargs)
  File "/home/gcpuser/sky_workdir/FastChat/fastchat/model/model_adapter.py", line 1870, in load_model
    vision_tower.load_model()
  File "/home/gcpuser/sky_workdir/FastChat/fastchat/model/llava/multimodal_encoder/clip_encoder.py", line 23, in load_model
    self.image_processor = CLIPImageProcessor.from_pretrained(
  File "/opt/conda/envs/chatbot/lib/python3.9/site-packages/transformers/utils/import_utils.py", line 1259, in __getattribute__
    requires_backends(cls, cls._backends)
  File "/opt/conda/envs/chatbot/lib/python3.9/site-packages/transformers/utils/import_utils.py", line 1247, in requires_backends
    raise ImportError("".join(failed))
ImportError:
CLIPImageProcessor requires the PIL library but it was not found in your environment. You can install it with pip:
`pip install pillow`. Please note that you may need to restart your runtime after installation.

@merrymercy (Member):

You can add new dependencies here:

model_worker = ["accelerate>=0.21", "peft", "sentencepiece", "torch", "transformers>=4.31.0", "protobuf"]

either by adding them to model_worker or by creating a new vision = tag.
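
For illustration, one possible shape for such a tag, mirroring the line above; only pillow is known to be needed from the traceback above, so any other entries would be additions:

vision = ["pillow"]

which could then be installed with pip3 install -e ".[vision]".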

@infwinston (Member):

Sorry, could you resolve the conflicts?

if image_file.startswith("http://") or image_file.startswith("https://"):
    response = requests.get(image_file)
    image = Image.open(BytesIO(response.content)).convert("RGB")
elif base64.b64encode(base64.b64decode(image_file)) == image_file.encode():
Review comment (Member):

If the input is not in the base64 format, this line will raise an error. Maybe swap the last two branches, or catch the error?
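
For illustration, a minimal sketch (not the code in this PR; the helper name is made up) that catches the decode error and falls back to treating the string as a local file path:

import base64
import binascii
from io import BytesIO

import requests
from PIL import Image

def load_image(image_file: str) -> Image.Image:
    if image_file.startswith("http://") or image_file.startswith("https://"):
        response = requests.get(image_file)
        return Image.open(BytesIO(response.content)).convert("RGB")
    try:
        # validate=True makes b64decode raise on non-base64 input
        # instead of silently discarding invalid characters.
        decoded = base64.b64decode(image_file, validate=True)
        if base64.b64encode(decoded) == image_file.encode():
            return Image.open(BytesIO(decoded)).convert("RGB")
    except (binascii.Error, ValueError):
        pass
    # Otherwise, treat the string as a local file path.
    return Image.open(image_file).convert("RGB")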

@shaunaa126:

@BabyChouSr this is an awesome change, thank you for contributing it. I was trying to test this on CPU, but I get the following error when running the test message.

python -m fastchat.serve.model_worker --model-path llava-1.5-7b-hf --multimodal --device cpu

2023-12-29 11:48:35 | INFO | model_worker | args: Namespace(host='localhost', port=21002, worker_address='http://localhost:21002', controller_address='http://localhost:21001', model_path='llava-1.5-7b-hf', revision='main', device='cpu', gpus=None, num_gpus=1, max_gpu_memory=None, dtype=None, load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template=None, embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False, multimodal=True)
2023-12-29 11:48:35 | INFO | model_worker | Loading the model ['llava-1.5-7b-hf'] on worker 2082b194 ...
Loading checkpoint shards:   0%|                                                   | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████| 3/3 [00:00<00:00, 22.60it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████| 3/3 [00:00<00:00, 22.55it/s]
2023-12-29 11:48:35 | ERROR | stderr | 
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2023-12-29 11:48:35 | INFO | model_worker | Register to controller
2023-12-29 11:48:35 | ERROR | stderr | INFO:     Started server process [174724]
2023-12-29 11:48:35 | ERROR | stderr | INFO:     Waiting for application startup.
2023-12-29 11:48:35 | ERROR | stderr | INFO:     Application startup complete.
2023-12-29 11:48:35 | ERROR | stderr | INFO:     Uvicorn running on http://localhost:21002 (Press CTRL+C to quit)

python -m fastchat.serve.test_message --model-name llava-1.5-7b-hf --max-new-tokens 256

Models: ['llava-1.5-7b-hf']
worker_addr: http://localhost:21002
USER: Tell me a story with more than 1000 words.
ASSISTANT: **NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**

("addmm_impl_cpu_" not implemented for 'Half')

@BabyChouSr (Collaborator, Author):

Closing this PR for now. It covers several concerns at once: 1. multimodal support, 2. a Gradio web server for multimodal models, 3. support for Hugging Face multimodal models, and 4. GPT-4-V support. I will use it as a reference and decompose the work into separate PRs.

@BabyChouSr BabyChouSr closed this Jan 19, 2024