GitHub Action Pipeline Improvements #245
Conversation
- Added `COMMON_DEFINE` env var which will contain all common defines for all platforms (experimental)
@martindevans, I have checked out and built the martindevans:fix/more_march_native branch, then referenced
Or do I also need to add
If I do, it works, but I am not sure whether I am using CUDA12, as Task Manager does not show any GPU load.
You need to download the binaries from this run. To install them in the project you need to overwrite the various files in
That should get you a completely up to date set of binaries. There is one major caveat: there's not really any stability in the llama.cpp API from one version to the next. These binaries have been built with the latest version of llama.cpp and there's no guarantee they'll be compatible with LLamaSharp. If you encounter errors due to that it'll take a bit longer to update LLamaSharp to the new version.
Note this should also help with #220 since it will add AVX2 to the Linux binaries as well. Hopefully faster CI will be less flaky!
I am not sure if I am following correctly. Since I am using the Cuda12 backend, I don't need anything from
Then I take
Then I reference
But the result is the same:
If you're using the LLamaSharp project then it should already have everything set up to reference the DLLs where necessary. Since you're overwriting the existing DLLs you shouldn't need to change anything else. You're right that you don't need anything from the deps.zip folder, but I'd rather not mix up different backend versions even while testing. It's a recipe for confusion!
At the moment there's only one CUDA build and it uses AVX2 (as of this PR). In the future we might want to consider building all the variants for CUDA as well as all the variants for CPU, but the CUDA build is extremely slow, so the pipeline isn't doing that at the moment.
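For illustration only, a build matrix along those lines might look roughly like the sketch below. This is not the actual workflow in this PR: the job name and variant list are hypothetical, and the `LLAMA_*`/`BUILD_SHARED_LIBS` defines are the cmake option names llama.cpp used around that time.

```yaml
jobs:
  compile:                    # hypothetical job name
    strategy:
      fail-fast: false
      matrix:
        include:
          # CPU feature-level variants (illustrative set)
          - { name: noavx,  defines: "-DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF" }
          - { name: avx,    defines: "-DLLAMA_AVX2=OFF" }
          - { name: avx2,   defines: "" }
          - { name: avx512, defines: "-DLLAMA_AVX512=ON" }
          # In principle CUDA could be crossed with the same CPU variants,
          # but CUDA builds are very slow, hence only one variant today.
          # (CUDA toolkit installation omitted from this sketch.)
          - { name: cuda12-avx2, defines: "-DLLAMA_CUBLAS=ON" }
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          repository: ggerganov/llama.cpp
      - name: Build
        run: |
          mkdir build && cd build
          cmake .. -DBUILD_SHARED_LIBS=ON ${{ matrix.defines }}
          cmake --build . --config Release
```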
I'm not quite sure what you mean here? If you've pulled my fork then you should just be able to run one of the LLamaSharp examples with no further changes (except setting
This will either be caused by the version compatibility issue I mentioned, or the binaries aren't set up correctly so it can't find them.
I was using those two dependencies in my own project, where I know what works and what does not. OK, let's try your way: I have updated the files as specified, then launched "TestRunner.cs" and chose option 4. I specified the same model as in my own code, and I am getting total gibberish:
Plus I think I am using
Unfortunately gibberish probably means there's some incompatible change in the llama.cpp API that I'll need to fix before you can test this. Hopefully I'll get time to do that this weekend.
I just tried the libllama.dll from the avx folder of the current binaries. The session starts, but the chat bot gives strange responses. When I ask "What is an apple?", it responds: "kwiet gegenüber then". I suspect it's the same problem that @lexxsoft was talking about :) However, the response speed does not seem to have changed compared to the avx binaries I used 4 days ago.

Can someone explain whether the AI calculates the answer to a question first and then returns it word by word, or whether the answer is calculated word by word? Since there are such unusually long pauses between each word, one might think it is calculated word by word. That is also how I explain the gibberish that is sometimes returned; I think the AI loses its train of thought partway through its answer :)

Can you actually configure it so that answers are only returned after a complete sentence? Maybe that would be faster than getting an update at such short intervals?
This PR has added AVX2 support to the Linux and CUDA binaries. So unless you're using one of those platforms with a CPU that supports AVX2 (not just AVX) you won't see any difference!
Language models never "calculate the answer" or really do any kind of thinking. They are always just picking the most probable next token in the sequence. They fundamentally work token by token, since you can't generate token N+2 until you know what was picked for token N+1.
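Put differently, generation is autoregressive: the probability of a whole sequence factorises token by token (this is the general formulation, not anything specific to llama.cpp or LLamaSharp):

$$P(x_1,\dots,x_T) = \prod_{t=1}^{T} P(x_t \mid x_1,\dots,x_{t-1})$$

Sampling $x_{t+1}$ requires the already-chosen $x_1,\dots,x_t$, so output necessarily arrives one token at a time; batching the display into whole sentences would only change when text is shown, not how fast it is produced.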
It seems that every time we compile in the pipeline, we clone the latest master branch. Shall we add a file containing the commit id, to pin the version used by the pipeline? Then, when we compile llama.cpp, we read the commit id from the file first and check out the repository at that commit.
Yeah, that is something I've thought about fixing. At the moment the pipeline is usually only run manually when I specifically want updated binaries, but it would probably be handy to have some kind of override to specify the version. I'd probably do it with a new input here which has a default value of
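One possible shape for that override, sketched with a hypothetical input name and default rather than the actual values from the workflow:

```yaml
on:
  workflow_dispatch:
    inputs:
      llama_cpp_ref:
        description: "llama.cpp commit/tag to build (hypothetical input name)"
        required: false
        default: "master"   # hypothetical default; a pinned SHA would also work

jobs:
  compile:
    runs-on: ubuntu-latest
    steps:
      # Check out llama.cpp at the requested ref instead of whatever
      # master happens to be when the pipeline runs.
      - uses: actions/checkout@v3
        with:
          repository: ggerganov/llama.cpp
          ref: ${{ github.event.inputs.llama_cpp_ref }}
```

Manual runs would then default to the latest code, while passing a specific commit id would reproduce an earlier set of binaries.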
I'm planning to integrate the new binaries into this PR tomorrow. As part of that I'll fix whatever has broken due to the updated llama.cpp. Once that's done, testing should be as simple as pulling this branch and running the examples :)
@lexxsoft any testing you can do over on that other PR would be much appreciated :)
Moved "common" defines (i.e. things that are the same on all platforms) into a single env var. These common defines include
`-DLLAMA_NATIVE=OFF`
which should fix the issue with AVX2 missing in builds where that isn't defined (e.g. CUDA).
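Conceptually the change amounts to something like the fragment below. Only `-DLLAMA_NATIVE=OFF` is taken from the description above; the rest of the env var contents, the job name and the platform flag are illustrative assumptions, not the exact workflow.

```yaml
env:
  # Shared by every platform. -DLLAMA_NATIVE=OFF disables -march=native so the
  # instruction-set flags come from the explicit defines rather than the host CPU.
  COMMON_DEFINE: "-DLLAMA_NATIVE=OFF -DBUILD_SHARED_LIBS=ON"   # contents illustrative

jobs:
  compile-cuda:                 # illustrative job name; checkout steps omitted
    runs-on: windows-latest
    steps:
      - name: Build
        run: |
          # The common defines are appended next to the platform-specific ones,
          # so the CUDA build no longer silently drops them.
          cmake .. ${{ env.COMMON_DEFINE }} -DLLAMA_CUBLAS=ON
          cmake --build . --config Release
```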