
JetInfero

🌟 Fast, Flexible Local LLM Inference for Developers 🚀

JetInfero is a nimble and high-performance library that enables developers to integrate local Large Language Models (LLMs) effortlessly into their applications. Powered by llama.cpp 🕊️, JetInfero prioritizes speed, flexibility, and ease of use 🌍. It's compatible with any language supporting Win64, Unicode, and dynamic-link libraries (DLLs).

💡 Why Choose JetInfero?

  • Optimized for Speed ⚡️: Built on llama.cpp, JetInfero offers lightning-fast inference capabilities with minimal overhead.
  • Cross-Language Support 🌍: Seamlessly integrates with Delphi, C++, C#, Java, and other Win64-compatible environments.
  • Intuitive API 🔬: A clean procedural API simplifies model management, inference execution, and callback handling.
  • Customizable Templates 🖋️: Tailor input prompts to suit different use cases with ease.
  • Scalable Performance 🚀: Leverage GPU acceleration, token streaming, and multi-threaded execution for demanding workloads.

๐Ÿ› ๏ธ Key Features

🤖 Advanced AI Integration

JetInfero expands your toolkit with capabilities such as:

  • Dynamic chatbot creation 🗣️.
  • Automated text generation 🔄 and summarization 📝.
  • Context-aware content creation 🌍.
  • Real-time token streaming for adaptive applications ⌚.

🔒 Privacy-Centric Local Execution

  • Operates entirely offline 🔐, ensuring sensitive data remains secure.
  • GPU acceleration supported via Vulkan for enhanced performance 🚒.

โš™๏ธ Performance Optimization

  • Configure GPU utilization with AGPULayers ๐Ÿ”„.
  • Allocate threads dynamically using AMaxThreads ๐ŸŒ.
  • Access performance metrics to monitor throughput and efficiency ๐Ÿ“Š.
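
For instance, a CPU-only configuration on a machine without a Vulkan-capable GPU comes down to the last three jiDefineModel arguments. A minimal sketch — the filename and refname are placeholders, and the parameter meanings are taken from the template example further below:

jiDefineModel(
  'C:/LLM/GGUF/MyModel.gguf',                  // placeholder filename
  'MyModel.gguf',                              // placeholder refname
  '<|im_start|>{role}\n{content}<|im_end|>',
  '<|im_start|>assistant',
  False,                                       // capitalize role
  8192,                                        // max context
  -1,                                          // main GPU: -1 picks the best device
  0,                                           // GPU layers: 0 = CPU only
  8);                                          // threads: up to your physical core count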

🔀 Flexible Prompt Templates

JetInfero's template system simplifies input customization. Templates include placeholders such as:

  • {role}: Denotes the senderโ€™s role (e.g., user, assistant).
  • {content}: Represents the message content.

For example:

  jiDefineModel(
    // Model Filename
    'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',

    // Model Refname
    'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',

    // Model Template
    '<|im_start|>{role}\n{content}<|im_end|>',

    // Model Template End
    '<|im_start|>assistant',

    // Capitalize Role
    False,

    // Max Context
    8192,

    // Main GPU, -1 for best, 0..N for a specific GPU
    -1,

    // GPU Layers, -1 for max, 0 for CPU only, 1..N for a layer count
    -1,

    // Max threads, default 4; max is the physical CPU count
    4
  );

Template Benefits

  • Adaptability 🌍: Customize prompts for various LLMs and use cases.
  • Consistency 🔄: Ensure predictable inputs for reliable results.
  • Flexibility 🌈: Modify prompt formats for tasks like JSON or markdown generation (see the sketch below).
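
Because a template is just a string with {role} and {content} placeholders, pointing JetInfero at a model with a different chat format is a one-line change. A hedged sketch — the filename is a placeholder and the Llama-3-style tags are illustrative, not taken from this repository:

jiDefineModel(
  'C:/LLM/GGUF/Llama-3.1-8B-Instruct-Q4_K_M.gguf',                    // placeholder filename
  'Llama-3.1-8B-Instruct-Q4_K_M.gguf',                                // placeholder refname
  '<|start_header_id|>{role}<|end_header_id|>\n{content}<|eot_id|>',  // Llama-3-style template
  '<|start_header_id|>assistant<|end_header_id|>',                    // template end
  False, 8192, -1, -1, 4);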

๐Ÿ‚ Streamlined Model Management

  • Define models with jiDefineModel ๐Ÿ”จ.
  • Load/unload models dynamically using jiLoadModel and jiUnloadModel ๐Ÿ”€.
  • Save/load model configurations with jiSaveModelDefines and jiLoadModelDefines ๐Ÿ—ƒ๏ธ.
  • Clear all model definitions using jiClearModelDefines ๐Ÿงน.
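
A typical round trip through these calls might look like the following minimal sketch — it assumes jiSaveModelDefines and jiLoadModelDefines take a filename, which this README does not spell out:

// Persist the current definitions so a later run can skip the jiDefineModel calls
if jiSaveModelDefines('models.json') then     // filename argument is an assumption
  WriteLn('Model definitions saved');

// In a later run: start clean, restore definitions, then load by refname
jiClearModelDefines();
if jiLoadModelDefines('models.json') then
  jiLoadModel('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf');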

๐Ÿ” Inference Execution

  • Perform inference tasks with jiRunInference โš™๏ธ.
  • Stream real-time tokens via InferenceTokenCallback โŒš.
  • Retrieve responses using jiGetInferenceResponse ๐Ÿ–Š๏ธ.
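
Together, these calls form the whole inference loop. A minimal sketch — it assumes jiGetInferenceResponse() takes no arguments and returns the accumulated response text, and that LModelRef holds the refname registered with jiDefineModel:

jiAddMessage('user', 'Summarize llama.cpp in one sentence.');

if jiRunInference(PWideChar(LModelRef)) then
  WriteLn(jiGetInferenceResponse())   // full text; tokens also stream via the callback
else
  WriteLn('Error: ', jiGetLastError());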

📊 Performance Monitoring

  • Retrieve detailed metrics like tokens/second, input/output token counts, and execution time via jiGetPerformanceResult 📊.

๐Ÿ› ๏ธ Installation

  1. Download the Repository 📦

    • Download here and extract the files to your preferred directory 📂.

    Ensure JetInfero.dll is accessible in your project directory.

  2. Acquire a GGUF Model 🧠

    • Obtain a model from Hugging Face, such as Dolphin3.0-Llama3.1-8B-Q4_K_M-GGUF, a good general-purpose model. You can download it directly from our Hugging Face account. See the model card for more information.
    • Save it to a directory accessible to your application (e.g., C:/LLM/GGUF) 💾.
  3. Add JetInfero to Your Project 🔨

    • Include the JetInfero unit in your Delphi project.
  4. Ensure GPU Compatibility 🎮

    • Verify Vulkan compatibility for enhanced performance ⚡. Adjust AGPULayers as needed to accommodate VRAM limitations 📉.
  5. Build the JetInfero DLL 🛠️

    • Open and compile the JetInfero.dproj project 📂. This generates the 64-bit JetInfero.dll in the lib folder 🗂️.
    • The project was created and tested using Delphi 12.2 on Windows 11 24H2 🖥️.
  6. Use JetInfero 🚀

    • JetInfero works with any programming language that supports Win64 and Unicode bindings 💻.
    • Ensure JetInfero.dll is included in your distribution and accessible at runtime 📦.

Note: JetInfero requires direct access to the GPU/CPU and is not recommended for use inside a virtual machine.

📈 Quick Start

โš™๏ธ Basic Setup

Integrate JetInfero into your Delphi project:

uses
  JetInfero;

var
  LModelRef: string;
  LTokensPerSec: Double;
  LTotalInputTokens: Int32;
  LTotalOutputTokens: Int32;
begin
  if jiInit() then
  begin
    jiDefineModel(
      'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      '<|im_start|>{role}\n{content}<|im_end|>',
      '<|im_start|>assistant', False, 8192, -1, -1, 4);

    jiLoadModel('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf');

    jiAddMessage('user', 'What is AI?');

    // Refname of the model defined above
    LModelRef := 'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf';

    if jiRunInference(PWideChar(LModelRef)) then
      begin
        jiGetPerformanceResult(@LTokensPerSec, @LTotalInputTokens, @LTotalOutputTokens);
        WriteLn('Input Tokens : ', LTotalInputTokens);
        WriteLn('Output Tokens: ', LTotalOutputTokens);
        WriteLn('Speed        : ', LTokensPerSec:3:2, ' t/s');
      end
    else
      begin
        WriteLn('Error: ', jiGetLastError());
      end;

    jiUnloadModel();
    jiQuit();
  end;

end.

๐Ÿ” Using Callbacks

Define a custom callback to handle token streaming:

procedure InferenceCallback(const Token: string; const UserData: Pointer);
begin
  Write(Token);
end;

jiSetInferenceTokenCallback(@InferenceCallback, nil);
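
Once registered, the callback fires for each token produced during jiRunInference, so output appears incrementally rather than only after the full response completes. The UserData pointer (nil here) is handed back to your callback, which is useful if you need to route tokens somewhere other than the console.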

📊 Retrieve Performance Metrics

Access performance results to monitor efficiency:

var
  LTokensPerSec: Double;
  LTotalInputTokens: Int32;
  LTotalOutputTokens: Int32;
begin
  // Same out-parameter form as in the Quick Start example above
  jiGetPerformanceResult(@LTokensPerSec, @LTotalInputTokens, @LTotalOutputTokens);
  WriteLn('Tokens/Sec   : ', LTokensPerSec:3:2);
  WriteLn('Input Tokens : ', LTotalInputTokens);
  WriteLn('Output Tokens: ', LTotalOutputTokens);
end;

๐Ÿ› ๏ธ Support and Resources

๐Ÿค Contributing

Contributions to ✨ JetInfero are highly encouraged! 🌟

  • ๐Ÿ› Report Issues: Submit issues if you encounter bugs or need help.
  • ๐Ÿ’ก Suggest Features: Share your ideas to make Lumina even better.
  • ๐Ÿ”ง Create Pull Requests: Help expand the capabilities and robustness of the library.

Your contributions make a difference! 🙌✨

Contributors 👥🤝


📜 Licensing

JetInfero is distributed under the 🆓 BSD-3-Clause License, allowing for redistribution and use in both source and binary forms, with or without modification, under specific conditions. See the LICENSE file for more details.


Elevate your Delphi projects with JetInfero 🚀 – your bridge to seamless local generative AI integration 🤖.


Made with ❤️ in Delphi