JetInfero is a nimble, high-performance library that enables developers to integrate local Large Language Models (LLMs) effortlessly into their applications. Powered by llama.cpp, JetInfero prioritizes speed, flexibility, and ease of use. It's compatible with any language supporting Win64, Unicode, and dynamic-link libraries (DLLs).

- Optimized for Speed: Built on llama.cpp, JetInfero offers lightning-fast inference with minimal overhead.
- Cross-Language Support: Seamlessly integrates with Delphi, C++, C#, Java, and other Win64-compatible environments.
- Intuitive API: A clean procedural API simplifies model management, inference execution, and callback handling.
- Customizable Templates: Tailor input prompts to different use cases with ease.
- Scalable Performance: Leverage GPU acceleration, token streaming, and multi-threaded execution for demanding workloads.
JetInfero expands your toolkit with capabilities such as:

- Dynamic chatbot creation.
- Automated text generation and summarization.
- Context-aware content creation.
- Real-time token streaming for adaptive applications.
- Fully offline operation, ensuring sensitive data remains secure.
- GPU acceleration via Vulkan for enhanced performance.
- Configure GPU utilization with `AGPULayers`.
- Allocate threads dynamically using `AMaxThreads`.
- Access performance metrics to monitor throughput and efficiency.
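For instance, a CPU-only configuration can be requested by passing `0` for the GPU-layer parameter. A minimal sketch, reusing the `jiDefineModel` parameters documented in this README (the model path and a thread count of 8 are illustrative choices, not requirements):

```pascal
// Sketch: CPU-only model definition.
// AGPULayers = 0 disables GPU offload entirely; AMaxThreads = 8 assumes
// an 8-core CPU (the maximum is the physical CPU core count).
jiDefineModel(
  'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf', // model filename (illustrative)
  'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',             // model refname
  '<|im_start|>{role}\n{content}<|im_end|>',        // model template
  '<|im_start|>assistant',                          // template end
  False,                                            // capitalize role
  8192,                                             // max context
  -1,                                               // main GPU (-1 = best)
  0,                                                // GPU layers: 0 = CPU only
  8                                                 // max threads
);
```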
JetInfero's template system simplifies input customization. Templates include placeholders such as:

- `{role}`: denotes the sender's role (e.g., `user`, `assistant`).
- `{content}`: represents the message content.
For example:

```pascal
jiDefineModel(
  // Model Filename
  'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
  // Model Refname
  'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
  // Model Template
  '<|im_start|>{role}\n{content}<|im_end|>',
  // Model Template End
  '<|im_start|>assistant',
  // Capitalize Role
  False,
  // Max Context
  8192,
  // Main GPU: -1 for best, 0..N for a specific GPU
  -1,
  // GPU Layers: -1 for max, 0 for CPU only, 1..N for a specific layer count
  -1,
  // Max Threads: default 4; the maximum is the physical CPU core count
  4
);
```
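With the template above, a single `user` message such as "What is AI?" would expand to roughly the following prompt text, with the template-end string cueing the model to respond as the assistant (an illustrative expansion, not verbatim library output):

```
<|im_start|>user
What is AI?<|im_end|>
<|im_start|>assistant
```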
- Adaptability: Customize prompts for various LLMs and use cases.
- Consistency: Ensure predictable inputs for reliable results.
- Flexibility: Modify prompt formats for tasks like JSON or Markdown generation.
- Define models with `jiDefineModel`.
- Load/unload models dynamically using `jiLoadModel` and `jiUnloadModel`.
- Save/load model configurations with `jiSaveModelDefines` and `jiLoadModelDefines`.
- Clear all model definitions using `jiClearModelDefines`.
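A typical definition lifecycle might look like the following sketch. The filename argument and Boolean return of `jiSaveModelDefines` and `jiLoadModelDefines` are assumptions made for illustration; check the unit's declarations for the exact signatures:

```pascal
// Sketch: persist model definitions, then restore them in a later session.
// ASSUMPTION: both save/load routines take a filename and return a Boolean.
jiDefineModel(
  'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
  'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
  '<|im_start|>{role}\n{content}<|im_end|>',
  '<|im_start|>assistant', False, 8192, -1, -1, 4);

if jiSaveModelDefines('models.json') then
  WriteLn('Model definitions saved.');

jiClearModelDefines();  // start from a clean slate

if jiLoadModelDefines('models.json') then
  jiLoadModel('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf');
```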
- Perform inference tasks with `jiRunInference`.
- Stream real-time tokens via `InferenceTokenCallback`.
- Retrieve responses using `jiGetInferenceResponse`.
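After a successful `jiRunInference` call, the complete response can be fetched in one go. A minimal sketch, assuming `jiGetInferenceResponse` returns the generated text as a string and that a model has already been defined and loaded under the refname shown:

```pascal
// Sketch: queue a message, run inference, then print the full response.
// ASSUMPTION: jiGetInferenceResponse returns the generated text as a string.
jiAddMessage('user', 'Summarize llama.cpp in one sentence.');

if jiRunInference('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf') then
  WriteLn(jiGetInferenceResponse())
else
  WriteLn('Error: ', jiGetLastError());
```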
- Retrieve detailed metrics such as tokens per second, input/output token counts, and execution time via `jiGetPerformanceResult`.
1. Download the Repository
   - Download here and extract the files to your preferred directory.
   - Ensure `JetInfero.dll` is accessible in your project directory.

2. Acquire a GGUF Model
   - Obtain a model from Hugging Face, such as Dolphin3.0-Llama3.1-8B-Q4_K_M-GGUF, a good general-purpose model. You can download it directly from our Hugging Face account; see the model card for more information.
   - Save it to a directory accessible to your application (e.g., `C:/LLM/GGUF`).

3. Add JetInfero to Your Project
   - Include the `JetInfero` unit in your Delphi project.

4. Ensure GPU Compatibility
   - Verify Vulkan compatibility for enhanced performance. Adjust `AGPULayers` as needed to accommodate VRAM limitations.

5. Building the JetInfero DLL
   - Open and compile the `JetInfero.dproj` project. This generates the 64-bit `JetInfero.dll` in the `lib` folder.
   - The project was created and tested using Delphi 12.2 on Windows 11 24H2.

6. Using JetInfero
   - JetInfero can be used with any programming language that supports Win64 and Unicode bindings.
   - Ensure `JetInfero.dll` is included in your distribution and accessible at runtime.
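Because the DLL exposes a flat procedural API, languages other than Delphi can bind to it with ordinary DLL import declarations. The sketch below uses Delphi syntax to show the shape of such bindings (in Delphi itself you would simply use the `JetInfero` unit); the parameter types and the `cdecl` calling convention are assumptions that should be confirmed against the unit's actual declarations:

```pascal
// Sketch of manual DLL bindings for illustration only.
// ASSUMPTIONS: export names match the public API; calling convention is cdecl;
// string parameters are wide-character pointers.
function  jiInit(): Boolean; cdecl; external 'JetInfero.dll';
procedure jiQuit(); cdecl; external 'JetInfero.dll';
function  jiLoadModel(const AModelRef: PWideChar): Boolean; cdecl; external 'JetInfero.dll';
```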
Note: JetInfero requires direct access to the GPU/CPU and is not recommended for use inside a virtual machine.
Integrate JetInfero into your Delphi project:
```pascal
uses
  JetInfero;

var
  LTokensPerSec: Double;
  LTotalInputTokens: Int32;
  LTotalOutputTokens: Int32;
begin
  if jiInit() then
  begin
    jiDefineModel(
      'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      '<|im_start|>{role}\n{content}<|im_end|>',
      '<|im_start|>assistant', False, 8192, -1, -1, 4);

    jiLoadModel('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf');

    jiAddMessage('user', 'What is AI?');

    if jiRunInference('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf') then
    begin
      jiGetPerformanceResult(@LTokensPerSec, @LTotalInputTokens, @LTotalOutputTokens);
      WriteLn('Input Tokens : ', LTotalInputTokens);
      WriteLn('Output Tokens: ', LTotalOutputTokens);
      WriteLn('Speed        : ', LTokensPerSec:3:2, ' t/s');
    end
    else
      WriteLn('Error: ', jiGetLastError());

    jiUnloadModel();
    jiQuit();
  end;
end.
```
Define a custom callback to handle token streaming:

```pascal
procedure InferenceCallback(const Token: string; const UserData: Pointer);
begin
  Write(Token);
end;
```

Register it before running inference:

```pascal
jiSetInferenceTokenCallback(@InferenceCallback, nil);
```
Access performance results to monitor efficiency:

```pascal
var
  LTokensPerSec: Double;
  LTotalInputTokens: Int32;
  LTotalOutputTokens: Int32;
begin
  jiGetPerformanceResult(@LTokensPerSec, @LTotalInputTokens, @LTotalOutputTokens);
  WriteLn('Tokens/Sec   : ', LTokensPerSec:3:2);
  WriteLn('Input Tokens : ', LTotalInputTokens);
  WriteLn('Output Tokens: ', LTotalOutputTokens);
end;
```
- Report issues via the Issue Tracker.
- Engage in discussions on the Forum and Discord.
- Learn more at Learn Delphi.
Contributions to JetInfero are highly encouraged!

- Report Issues: Submit issues if you encounter bugs or need help.
- Suggest Features: Share your ideas to make JetInfero even better.
- Create Pull Requests: Help expand the capabilities and robustness of the library.

Your contributions make a difference!
JetInfero is distributed under the BSD-3-Clause License, allowing redistribution and use in both source and binary forms, with or without modification, under specific conditions. See the LICENSE file for more details.

Elevate your Delphi projects with JetInfero, your bridge to seamless local generative AI integration.