local-ai-code-completion

Local code completion example, using ONNX models and the CodeGen model ( or whatever other model ), with a C# webserver and VSCode and Visual Studio extensions, to get Copilot-like code completion behavior at the symbol, line, or full-method level, running on your own machine!

This repo is set up to use either BlingFire or a questionably made Rust interop to the huggingface tokenizers library for tokenizing and converting back to text. The Rust interop was put together by me ( I don't know Rust ), and I basically just mashed buttons until it worked. No guarantees on safety or memory leaks or anything; that will improve in the future...
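
For anyone poking at the interop, the C# side of calling into a Rust cdylib generally boils down to a handful of P/Invoke declarations. A minimal sketch follows; the library name and exported function names here are hypothetical, not the actual exports of this repo's interop:

    using System;
    using System.Runtime.InteropServices;

    internal static class RustTokenizer
    {
        // Hypothetical exports from a Rust cdylib wrapping huggingface/tokenizers;
        // the real interop in this repo may use different names and signatures.
        // Note: Rust expects UTF-8 C strings, so real code needs explicit
        // marshaling rather than the default string handling shown here.
        [DllImport("hf_tokenizer_interop")]
        internal static extern IntPtr tokenizer_from_file(string tokenizerJsonPath);

        [DllImport("hf_tokenizer_interop")]
        internal static extern int tokenizer_encode(IntPtr tokenizer, string text,
                                                    long[] idsOut, int maxTokens);

        [DllImport("hf_tokenizer_interop")]
        internal static extern void tokenizer_free(IntPtr tokenizer);
    }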


Currently a work in progress!

This is mostly a least-code-required, simple example that connects everything together to make your own code completion / code generator, meant to run locally with just one user ( but it does support multiple users ). Extending it to act as a service for multiple users will work, but there are inefficiencies and whatnot that should be addressed for that to work well.

This should act as a good jumping-off point to customize to your needs, and I will improve stuff as I have time.

roadmap / TODO

  • working on making an easily runnable VSCode extension ( currently I just launch it via VSCode extension debugging )
  • more plugin options for generating: allow symbol, line, or full-function generation via options
  • Visual Studio suggestion extension
  • better behavior of plugins ( caching, etc )
  • better management of generation
  • better how to guide
  • adding beam search, and multiple samples, to the generation code
  • making post-processing steps ( searches, softmax, top_k, top_p ) operate on the GPU to avoid costly CPU <-> GPU copies of tensors ( a CPU-side sketch of these steps is below )
  • extending the rust interop to huggingface tokenizers into something less hacky and with more features exposed. Make it thread safe, ensure no memory leaks, add cleanup of rust-allocated memory, all that stuff
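
For context on that post-processing item, a minimal CPU-side version of softmax -> top_k -> top_p -> sample looks roughly like this. A sketch only; SampleToken is an illustrative name, not the repo's actual generation code:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static int SampleToken(float[] logits, int topK, float topP, Random rng)
    {
        // softmax, with the usual max-subtraction for numerical stability
        float max = logits.Max();
        double[] probs = logits.Select(l => Math.Exp(l - max)).ToArray();
        double sum = probs.Sum();
        for (int i = 0; i < probs.Length; i++) probs[i] /= sum;

        // keep the top_k most likely tokens, then cut the tail at cumulative top_p
        var kept = new List<int>();
        double cum = 0;
        foreach (int i in Enumerable.Range(0, probs.Length)
                                    .OrderByDescending(j => probs[j])
                                    .Take(topK))
        {
            kept.Add(i);
            cum += probs[i];
            if (cum >= topP) break;
        }

        // renormalize over the kept tokens and draw one
        double r = rng.NextDouble() * kept.Sum(k => probs[k]);
        foreach (int i in kept)
        {
            r -= probs[i];
            if (r <= 0) return i;
        }
        return kept[kept.Count - 1];
    }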

bad code ( should be changed! )

  • webserver only allows one generation from the model at a time, enforced by a semaphore ( see the sketch below )
  • hard-coded paths to model locations and whatnot in C#, not json files
  • the rust interop was a rapid hack job, it will need improvements
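
That one-at-a-time gate amounts to something like this ( a sketch; RunModel is a placeholder for the actual ONNX generation call ):

    using System.Threading;
    using System.Threading.Tasks;

    // One loaded model, one generation at a time: concurrent requests queue here.
    static readonly SemaphoreSlim s_generateLock = new SemaphoreSlim(1, 1);

    static async Task<string> GenerateAsync(string prompt)
    {
        await s_generateLock.WaitAsync();
        try
        {
            return RunModel(prompt); // placeholder for the actual ONNX generation
        }
        finally
        {
            s_generateLock.Release();
        }
    }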

premade ONNX model

I put an ONNX model and the required tokenizer.json file up on huggingface. If the model isn't there yet, I'm probably still uploading... :

rough directions

  • The webserver solution is for Visual Studio 2022. Open the solution, then build and run the webserver.

  • edit the paths in the webserver to point to your model and tokenizer.json file: \Webserver\webserver\Generators\CodeGenSingleton.cs
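
The hard-coded values to edit look roughly like this ( illustrative names and paths only; check CodeGenSingleton.cs for the real fields ):

    // Hypothetical example -- field names and paths in CodeGenSingleton.cs may differ.
    public static class ModelPaths
    {
        public const string OnnxModel = @"D:\models\codegen\model.onnx";
        public const string TokenizerJson = @"D:\models\codegen\tokenizer.json";
    }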

  • I run this via the CUDA execution provider of ONNX, which requires about 8 GB of free VRAM. If you want to change this to CPU, see \Webserver\genLib\Generators\CodeGenOnnx.cs and comment out:

    so.AppendExecutionProvider_CUDA(gpuDeviceId);
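
For reference, session setup with the ONNX Runtime C# API ( Microsoft.ML.OnnxRuntime ) looks like this; a sketch rather than the repo's exact code:

    using Microsoft.ML.OnnxRuntime;

    var so = new SessionOptions();
    // GPU path: needs the Microsoft.ML.OnnxRuntime.Gpu package and ~8 GB free VRAM
    so.AppendExecutionProvider_CUDA(0);
    // CPU path: just skip the CUDA line above; ONNX Runtime defaults to CPU
    var session = new InferenceSession(@"path\to\model.onnx", so);
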
  • run the web server

  • run the web server test program, which will call the web server, generate text, and print it to the screen
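
If you want to call the webserver from your own code instead, it amounts to a plain HTTP request. The endpoint path and JSON payload below are assumptions for illustration; match them to what the test program actually sends:

    using System.Net.Http;
    using System.Text;
    using System.Threading.Tasks;

    // Hypothetical endpoint and payload shape -- see the test program for the
    // real ones. ( prompt should be JSON-escaped in real code. )
    static async Task<string> CompleteAsync(string prompt)
    {
        using var client = new HttpClient();
        var body = new StringContent("{\"prompt\":\"" + prompt + "\"}",
                                     Encoding.UTF8, "application/json");
        var response = await client.PostAsync("http://localhost:5000/generate", body);
        return await response.Content.ReadAsStringAsync();
    }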

  • run the VSCode extension project, and test it out in VSCode!

  • ( more to come as I get things farther along )

to make your own ONNX model

  • export your model with the transformers onnx exporter ( --model=. points at the model in the current directory ):

    python -m transformers.onnx --model=. --feature=causal-lm onnx/
  • use netron to view your model and observe the inputs and outputs; the input names of the model should match those in CodeGenOnnx.cs: https://netron.app/
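
You can also check the input/output names programmatically instead of ( or alongside ) netron; InputMetadata and OutputMetadata are standard ONNX Runtime C# properties:

    using System;
    using Microsoft.ML.OnnxRuntime;

    using var session = new InferenceSession(@"onnx\model.onnx");
    foreach (var name in session.InputMetadata.Keys)
        Console.WriteLine("input: " + name);    // e.g. input_ids, attention_mask
    foreach (var name in session.OutputMetadata.Keys)
        Console.WriteLine("output: " + name);   // e.g. logits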
