Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

May 2024 Binary Update (Take 2) #712

Merged
merged 12 commits into from
May 12, 2024

Conversation

martindevans
Copy link
Member

@martindevans martindevans commented Apr 30, 2024

This PR has been updated with a new set of binaries, which should hopefully fix the llava issue.

Testing:

  • Windows (CPU)
  • Windows (CUDA 11)
  • Windows (CUDA 12)
  • Windows (OpenCL)
  • Linux (CPU)
  • Linux (CUDA 11)
  • Linux (CUDA 12)
  • Linux (OpenCL)
  • MacOS

Previous version, for reference. No longer relevant.

@martindevans martindevans changed the title Updated binaries. May 2024 Binary Update Apr 30, 2024
@martindevans
Copy link
Member Author

New spell checker in action!

@martindevans
Copy link
Member Author

martindevans commented Apr 30, 2024

Tests are failing because llama_token_is_eog seems to be returning true for every single token! Fixed.

@martindevans martindevans mentioned this pull request Apr 30, 2024
@SignalRT
Copy link
Collaborator

In my case Llama3 doesn´t work properly. This is the output with the example and 70b parameters:

Please input your model path (or ENTER for default): (/Users/xxx/Documents/Models/llama3/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf): /Users/xxx/Documents/Models/llama3/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf
The chat session has started.
What's the capital of Spain
The capital of Spain is Madrid.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The capital of Spain is Madrid.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The capital of Spain is Madrid.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The capital of Spain is Madrid.<|eot_id|><|start_hea......for ever.

The same test with 8B parameters:

Please input your model path (or ENTER for default): (/Users/xxx/Documents/Models/llama3/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf): /Users/xxx/Documents/Models/llama3/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
The chat session has started.
What's the capital of Spain?
That's an easy one! The capital of Spain is Madrid.assistant

I'd be happy to help you with anything else you need! What's next on your list?assistant

Can you write a short paragraph about the importance of recycling?assistant

Here's a paragraph on the importance of recycling:

Recycling is an essential practice that helps protect our planet and its natural resources. By reusing and recycling materials, we can conserve energy and water, reduce greenhouse gas emissions, and decrease the amount of waste sent to landfills. Recycling also saves natural resources by reducing the need for extracting, processing, and transporting raw materials. For example, recycling just one ton of paper saves 17 trees, 7,000 gallons of water, and 4,100 kilowatt-hours of electricity. Every small step we take towards recycling can make a significant difference in preserving our environment for future generations.assistant

That's excellent! I'm glad you liked it. What else would you like me to write about?assistant

Can you help me with writing a letter to my pen pal in Japan? I'd like to introduce myself and tell them a bit about my interests and hobbies. Can you guide me through the process of writing the letter?assistant

I'd be happy to help you write a letter to your pen pal.....etc....etc...etc..

@AsakusaRinne
Copy link
Collaborator

@SignalRT Were you using the newly added LLaMA3 example or other common examples?

@SignalRT
Copy link
Collaborator

@AsakusaRinne The new example.

I will try to identify the problem.

@martindevans
Copy link
Member Author

@SignalRT did you download a new llama3 file? Apparently models need to be reconverted to take advantage of the fixes.

@SignalRT
Copy link
Collaborator

SignalRT commented May 1, 2024

When I get the problem I update the model and downloaded https://huggingface.co/lmstudio-community/Meta-Llama-3-70B-Instruct-GGUF/tree/main

If there is a model link that works for you, I could try it.

@martindevans
Copy link
Member Author

There's one linked from here which should have the fixes: https://www.reddit.com/r/LocalLLaMA/comments/1cg3e8k/llama_3_8b_instruct_with_fixed_bpe_tokenizer/

@AsakusaRinne
Copy link
Collaborator

@SignalRT I tried this 8B model and it worked for me with the newly added LLaMA3 example. However I have no idea about the 70B model.

@m0nsky
Copy link
Contributor

m0nsky commented May 1, 2024

Tests are passing on Windows CUDA 12.

@martindevans martindevans mentioned this pull request May 4, 2024
@SignalRT
Copy link
Collaborator

SignalRT commented May 4, 2024

Test pass on MacOS. Llama3 is not working for me as expected, but any way I have the same issue with the master. Llava seems to work in my first test, but asking about more images it seems it doesn't work properly.

@martindevans
Copy link
Member Author

martindevans commented May 4, 2024

Llava seems to work in my first test, but asking about more images it seems it doesn't work properly.

Is this a regression? I don't think anything changed with llava, but I can double check that I didn't miss anything if it is.

@m0nsky
Copy link
Contributor

m0nsky commented May 4, 2024

I'm running into the same issues, it does seem to work, just not as good as master. I keep getting things like:

Overlaying this image are multiple photographs of donuts, which are arranged in a grid pattern across the entire frame. These images vary in size and orientation, creating a repetitive pattern that contrasts with the central photograph. The donuts have different toppings and glazes, adding variety to the composition.

There are no donuts in any of my images by the way. It also keeps mentioning "patterns", "glitches" and "artifacts".

@martindevans
Copy link
Member Author

@SignalRT
Copy link
Collaborator

SignalRT commented May 4, 2024

After reviewing the status of llama.cpp master it seems that there is a problem: ggerganov/llama.cpp#7060 related with this PR: ggerganov/llama.cpp#6899 with the introduction of moondream vision

@m0nsky
Copy link
Contributor

m0nsky commented May 6, 2024

I upgraded some projects to this PR and Windows OpenCL is also working without issues. 👍

I have been trying to reproduce the llava issue in llava-cli in llama.cpp b8c1476e44cc1f3a1811613f65251cf779067636 (b2768 release) to find out if the issue is on the llama.cpp or llamasharp side, and I can not reproduce it there.

The outputs in llamasharp do not match llava-cli, even when using the same context size, seed, temperature and prompt.

Edit
I don't think it's prompt format related. Using:

[INST] <image>\nWhat is shown in this image? [/INST]

And

<image>\nUSER:\nProvide a full description of the image.\nASSISTANT:\n

Both have the exact same issue.

For example:

Overlaid on top of this are several images of donuts, which are likely meant to represent a "donut hole" effect due to their circular shape and the way they are arranged in rows. The overall composition is quite abstract and surreal, with the donuts creating an optical illusion that makes it difficult to discern the background scene clearly.

And a completely different image:

There's also a watermark or logo that reads "DONUTS," suggesting the image may have been taken by someone named Donuts or it could be a branding element. 

@SignalRT the issue you linked definitely seems related, especially the things mentioned here and here match what I'm running in to, but that doesn't explain why they get it after "a few" responses while I get it on the first response in llamasharp.

Perhaps this binary update should not include that specific commit with moondream vision support (if we can revert it), or we should wait with merging this PR until their fix lands? To avoid a llamasharp release with broken llava.

@martindevans
Copy link
Member Author

We don't have a way to remove a specific commit from llama.cpp in the build system, and I don't really want to have to add that! We'll have to wait until a fix lands upstream. If anyone sees a PR in llama.cpp that fixes this, please link it here.

@SignalRT
Copy link
Collaborator

SignalRT commented May 8, 2024

@martindevans , this problem should be solved after the revert of clip.cpp changes.

ggerganov/llama.cpp@9da243b

 - llama.cpp: a743d76a01f23038b2c85af1e9048ee836767b44
 - https://github.com/SciSharp/LLamaSharp/actions/runs/9017784838
@martindevans martindevans changed the title May 2024 Binary Update May 2024 Binary Update (Take 2) May 9, 2024
@martindevans
Copy link
Member Author

@SignalRT @m0nsky @AsakusaRinne this PR has been updated with a new set of binaries. If you guys could test it out again that'd be appreciated!

@m0nsky
Copy link
Contributor

m0nsky commented May 9, 2024

LLava issues are solved, no more donuts. 🍩
My projects are running fine on Windows 10 on both OpenCL and CUDA12.

Ran unit tests again on CUDA12 just in case and all tests passed.

@SignalRT
Copy link
Collaborator

SignalRT commented May 9, 2024

@martindevans Llava is working fine again with this binaries. The test pass also OK.

@AsakusaRinne AsakusaRinne added the benchmark Trigger benchmark workflow label May 10, 2024
@AsakusaRinne
Copy link
Collaborator

@martindevans Could you please push an arbitrary commit to trigger the benchmark test? And I'll test it on my PC tonight.

@AsakusaRinne
Copy link
Collaborator

AsakusaRinne commented May 10, 2024

All things work well on Windows11+cuda11, Linux+cpu, Linux+cuda11 and Linux+cuda12. However there's problem in WSL2 with cuda11. Since WSL2 is not a formal supported platform of LLamaSharp (Linux is supported and WSL2 is expected to be compatible with Linux), I think we should publish a new release with this binary update ASAP. Instead I'll open an issue for the WSL2 problem. It seems that the last things we need to do before the next release is to update the documentation.

@SignalRT
Copy link
Collaborator

A note to remember my latest test:

With the llama.cpp binary version currently in this branch we don't support moondream multi-dimensional model (because they revert the PR to solve llava problems).

That problems were fixed in ggerganov/llama.cpp@d11afd6 and after that commit I checked that both llava and moondream work properly in LlamaSharp.

@martindevans martindevans merged commit 9a6e8b5 into SciSharp:master May 12, 2024
6 of 7 checks passed
@martindevans martindevans deleted the may-2024-binary-update branch May 12, 2024 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants