
JetInfero

🌟 Fast, Flexible Local LLM Inference for Developers 🚀

JetInfero is a nimble and high-performance library that enables developers to integrate local Large Language Models (LLMs) effortlessly into their applications. Powered by llama.cpp 🕊️, JetInfero prioritizes speed, flexibility, and ease of use 🌍. It's compatible with any language supporting Win64, Unicode, and dynamic-link libraries (DLLs).

💡 Why Choose JetInfero?

  • Optimized for Speed ⚡️: Built on llama.cpp, JetInfero offers lightning-fast inference capabilities with minimal overhead.
  • Cross-Language Support 🌍: Seamlessly integrates with Delphi, C++, C#, Java, and other Win64-compatible environments.
  • Intuitive API 🔬: A clean procedural API simplifies model management, inference execution, and callback handling.
  • Customizable Templates 🖋️: Tailor input prompts to suit different use cases with ease.
  • Scalable Performance 🚀: Leverage GPU acceleration, token streaming, and multi-threaded execution for demanding workloads.

๐Ÿ› ๏ธ Key Features

🤖 Advanced AI Integration

JetInfero expands your toolkit with capabilities such as:

  • Dynamic chatbot creation 🗣️.
  • Automated text generation 🔄 and summarization 📝.
  • Context-aware content creation 🌍.
  • Real-time token streaming for adaptive applications ⌚.

🔒 Privacy-Centric Local Execution

  • Operates entirely offline 🔐, ensuring sensitive data remains secure.
  • GPU acceleration supported via Vulkan for enhanced performance 🚒.

โš™๏ธ Performance Optimization

  • Configure GPU utilization with AGPULayers ๐Ÿ”„.
  • Allocate threads dynamically using AMaxThreads ๐ŸŒ.
  • Access performance metrics to monitor throughput and efficiency ๐Ÿ“Š.
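
For instance, a CPU-only configuration on a machine without a Vulkan-capable GPU comes down to the last three jiDefineModel arguments. A minimal sketch — the filename and refname are placeholders, and the parameter meanings are taken from the template example further below:

jiDefineModel(
  'C:/LLM/GGUF/MyModel.gguf',                  // placeholder filename
  'MyModel.gguf',                              // placeholder refname
  '<|im_start|>{role}\n{content}<|im_end|>',
  '<|im_start|>assistant',
  False,                                       // capitalize role
  8192,                                        // max context
  -1,                                          // main GPU: -1 picks the best device
  0,                                           // GPU layers: 0 = CPU only
  8);                                          // threads: up to your physical core count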

🔀 Flexible Prompt Templates

JetInfero's template system simplifies input customization. Templates include placeholders such as:

  • {role}: Denotes the senderโ€™s role (e.g., user, assistant).
  • {content}: Represents the message content.

For example:

  jiDefineModel(
    // Model Filename
    'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',

    // Model Refname
    'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',

    // Model Template
    '<|im_start|>{role}\n{content}<|im_end|>',

    // Model Template End
    '<|im_start|>assistant',

    // Capitalize Role
    False,

    // Max Context
    8192,

    // Main GPU, -1 for best, 0..N for a specific GPU
    -1,

    // GPU Layers, -1 for max, 0 for CPU only, 1..N for a layer count
    -1,

    // Max threads, default 4; max is the physical CPU count
    4
  );

Template Benefits

  • Adaptability 🌍: Customize prompts for various LLMs and use cases.
  • Consistency 🔄: Ensure predictable inputs for reliable results.
  • Flexibility 🌈: Modify prompt formats for tasks like JSON or markdown generation (see the sketch below).
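
Because a template is just a string with {role} and {content} placeholders, pointing JetInfero at a model with a different chat format is a one-line change. A hedged sketch — the filename is a placeholder and the Llama-3-style tags are illustrative, not taken from this repository:

jiDefineModel(
  'C:/LLM/GGUF/Llama-3.1-8B-Instruct-Q4_K_M.gguf',                    // placeholder filename
  'Llama-3.1-8B-Instruct-Q4_K_M.gguf',                                // placeholder refname
  '<|start_header_id|>{role}<|end_header_id|>\n{content}<|eot_id|>',  // Llama-3-style template
  '<|start_header_id|>assistant<|end_header_id|>',                    // template end
  False, 8192, -1, -1, 4);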

๐Ÿ‚ Streamlined Model Management

  • Define models with jiDefineModel ๐Ÿ”จ.
  • Load/unload models dynamically using jiLoadModel and jiUnloadModel ๐Ÿ”€.
  • Save/load model configurations with jiSaveModelDefines and jiLoadModelDefines ๐Ÿ—ƒ๏ธ.
  • Clear all model definitions using jiClearModelDefines ๐Ÿงน.
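
A typical round trip through these calls might look like the following minimal sketch — it assumes jiSaveModelDefines and jiLoadModelDefines take a filename, which this README does not spell out:

// Persist the current definitions so a later run can skip the jiDefineModel calls
if jiSaveModelDefines('models.json') then     // filename argument is an assumption
  WriteLn('Model definitions saved');

// In a later run: start clean, restore definitions, then load by refname
jiClearModelDefines();
if jiLoadModelDefines('models.json') then
  jiLoadModel('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf');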

๐Ÿ” Inference Execution

  • Perform inference tasks with jiRunInference โš™๏ธ.
  • Stream real-time tokens via InferenceTokenCallback โŒš.
  • Retrieve responses using jiGetInferenceResponse ๐Ÿ–Š๏ธ.
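
Together, these calls form the whole inference loop. A minimal sketch — it assumes jiGetInferenceResponse() takes no arguments and returns the accumulated response text, and that LModelRef holds the refname registered with jiDefineModel:

jiAddMessage('user', 'Summarize llama.cpp in one sentence.');

if jiRunInference(PWideChar(LModelRef)) then
  WriteLn(jiGetInferenceResponse())   // full text; tokens also stream via the callback
else
  WriteLn('Error: ', jiGetLastError());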

📊 Performance Monitoring

  • Retrieve detailed metrics like tokens/second, input/output token counts, and execution time via jiGetPerformanceResult 📊.

๐Ÿ› ๏ธ Installation

  1. Download the Repository 📦

    • Download here and extract the files to your preferred directory 📂.

    Ensure JetInfero.dll is accessible in your project directory.

  2. Acquire a GGUF Model 🧠

    • Obtain a model from Hugging Face, such as Dolphin3.0-Llama3.1-8B-Q4_K_M-GGUF, a good general-purpose model. You can download it directly from our Hugging Face account. See the model card for more information.
    • Save it to a directory accessible to your application (e.g., C:/LLM/GGUF) 💾.
  3. Add JetInfero to Your Project 🔨

    • Include the JetInfero unit in your Delphi project.
  4. Ensure GPU Compatibility 🎮

    • Verify Vulkan compatibility for enhanced performance ⚡. Adjust AGPULayers as needed to accommodate VRAM limitations 📉.
  5. Build the JetInfero DLL 🛠️

    • Open and compile the JetInfero.dproj project 📂. This generates the 64-bit JetInfero.dll in the lib folder 🗂️.
    • The project was created and tested using Delphi 12.2 on Windows 11 24H2 🖥️.
  6. Use JetInfero 🚀

    • JetInfero works with any programming language that supports Win64 and Unicode bindings 💻.
    • Ensure JetInfero.dll is included in your distribution and accessible at runtime 📦.

Note: JetInfero requires direct access to the GPU/CPU and is not recommended for use inside a virtual machine.

📈 Quick Start

โš™๏ธ Basic Setup

Integrate JetInfero into your Delphi project:

uses
  JetInfero;

var
  LModelRef: string;
  LTokensPerSec: Double;
  LTotalInputTokens: Int32;
  LTotalOutputTokens: Int32;
begin
  if jiInit() then
  begin
    jiDefineModel(
      'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      '<|im_start|>{role}\n{content}<|im_end|>',
      '<|im_start|>assistant', False, 8192, -1, -1, 4);

    jiLoadModel('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf');

    jiAddMessage('user', 'What is AI?');

    // Refname of the model defined above
    LModelRef := 'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf';

    if jiRunInference(PWideChar(LModelRef)) then
      begin
        jiGetPerformanceResult(@LTokensPerSec, @LTotalInputTokens, @LTotalOutputTokens);
        WriteLn('Input Tokens : ', LTotalInputTokens);
        WriteLn('Output Tokens: ', LTotalOutputTokens);
        WriteLn('Speed        : ', LTokensPerSec:3:2, ' t/s');
      end
    else
      begin
        WriteLn('Error: ', jiGetLastError());
      end;

    jiUnloadModel();
    jiQuit();
  end;

end.

๐Ÿ” Using Callbacks

Define a custom callback to handle token streaming:

procedure InferenceCallback(const Token: string; const UserData: Pointer);
begin
  Write(Token);
end;

jiSetInferenceTokenCallback(@InferenceCallback, nil);
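
Once registered, the callback fires for each token produced during jiRunInference, so output appears incrementally rather than only after the full response completes. The UserData pointer (nil here) is handed back to your callback, which is useful if you need to route tokens somewhere other than the console.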

📊 Retrieve Performance Metrics

Access performance results to monitor efficiency:

var
  LTokensPerSec: Double;
  LTotalInputTokens: Int32;
  LTotalOutputTokens: Int32;
begin
  // Same out-parameter form as in the Quick Start example above
  jiGetPerformanceResult(@LTokensPerSec, @LTotalInputTokens, @LTotalOutputTokens);
  WriteLn('Tokens/Sec   : ', LTokensPerSec:3:2);
  WriteLn('Input Tokens : ', LTotalInputTokens);
  WriteLn('Output Tokens: ', LTotalOutputTokens);
end;

๐Ÿ› ๏ธ Support and Resources

๐Ÿค Contributing

Contributions to ✨ JetInfero are highly encouraged! 🌟

  • ๐Ÿ› Report Issues: Submit issues if you encounter bugs or need help.
  • ๐Ÿ’ก Suggest Features: Share your ideas to make Lumina even better.
  • ๐Ÿ”ง Create Pull Requests: Help expand the capabilities and robustness of the library.

Your contributions make a difference! 🙌✨

Contributors 👥🤝


📜 Licensing

JetInfero is distributed under the 🆓 BSD-3-Clause License, allowing for redistribution and use in both source and binary forms, with or without modification, under specific conditions. See the LICENSE file for more details.


Elevate your Delphi projects with JetInfero 🚀 – your bridge to seamless local generative AI integration 🤖.


Made with ❤️ in Delphi