Merge pull request #1049 from quarkiverse/llama3-java-docs
Add basic docs for Llama3.java
= Llama3.java

include::./includes/attributes.adoc[]

https://github.com/mukel/llama3.java[Llama3.java] provides a way to run large language models (LLMs) locally, in pure Java, embedded in your Quarkus application.
You can run various https://huggingface.co/mukel[models], such as Llama3 and Mistral, on your machine.

[#_prerequisites]
== Prerequisites

Llama3.java requires Java 21 or later, because it uses the new https://openjdk.org/jeps/448[Vector API] for faster inference.
Since the Vector API is still a preview feature in Java 21, and up to the latest Java 23, it must be enabled explicitly by launching the JVM with the following flags:

[source]
----
--enable-preview --enable-native-access=ALL-UNNAMED --add-modules jdk.incubator.vector
----
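
For example, a packaged application could be launched with these flags as follows. This is an illustrative sketch: the jar path assumes the default Quarkus fast-jar layout under `target/quarkus-app`, which may differ in your build.

[source,shell]
----
# Hypothetical launch command; the jar path assumes the default
# Quarkus fast-jar packaging.
java --enable-preview \
     --enable-native-access=ALL-UNNAMED \
     --add-modules jdk.incubator.vector \
     -jar target/quarkus-app/quarkus-run.jar
----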

Equivalently, you can configure the quarkus-maven-plugin in your project's pom.xml as follows:

[source,xml,subs=attributes+]
----
<configuration>
  <jvmArgs>--enable-preview --enable-native-access=ALL-UNNAMED</jvmArgs>
  <modules>
    <module>jdk.incubator.vector</module>
  </modules>
</configuration>
----

=== Dev Mode

Quarkus LangChain4j automatically handles pulling the models configured by the application, so there is no need for users to do so manually.

WARNING: When running Quarkus in dev mode, C2 compilation is not enabled, which can make Llama3.java excessively slow. This limitation will be fixed with Quarkus 3.17, when `<forceC2>true</forceC2>` is set.

WARNING: Models are huge, so make sure you have enough disk space.

NOTE: Due to their large size, pulling models can take time.

== Using Llama3.java

To let Llama3.java run inference on your models, add the following dependency to your project:

[source,xml,subs=attributes+]
----
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-llama3-java</artifactId>
    <version>{project-version}</version>
</dependency>
----

If no other LLM extension is installed, link:../ai-services.adoc[AI Services] will automatically use the configured Llama3.java model.

By default, the extension uses the https://huggingface.co/mukel/Llama-3.2-1B-Instruct-GGUF[`mukel/Llama-3.2-1B-Instruct-GGUF`] model.
You can change it by setting the `quarkus.langchain4j.llama3.chat-model.model-name` property in the `application.properties` file:

[source,properties,subs=attributes+]
----
quarkus.langchain4j.llama3.chat-model.model-name=mukel/Llama-3.2-3B-Instruct-GGUF
----

=== Configuration

Several configuration properties are available:

include::includes/quarkus-langchain4j-llama3-java.adoc[leveloffset=+1,opts=optional]