Native llama.cpp bindings for the Bare runtime, enabling efficient large language model inference from JavaScript.
This is nowhere near ready to use.
If you want to use bare-llama, be prepared to join the development team 😏
This is the beginning of an addon for the Bare JavaScript runtime, which means it can be used with Pear, a runtime for building peer-to-peer applications.
Bare is the core runtime of Pear.
Node.js support is possible; Deno and Bun probably not. But maybe! Who knows?! If you know, let me know.
import { LlamaModel } from 'bare-llama'
// Create and initialize a text generation model
const model = await LlamaModel.create({
  modelFilepath: './models/model.gguf'
})
// Generate text
const result = await model.generate('The quick brown fox')
console.log(result)
// Clean up
await model.destroy()
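Model instances hold native llama.cpp resources, so it can be worth making sure destroy() runs even if generation throws. A minimal sketch of that pattern, using only the API shown above:

const model = await LlamaModel.create({
  modelFilepath: './models/model.gguf'
})

try {
  console.log(await model.generate('The quick brown fox'))
} finally {
  // Release the native resources even if generate() throws
  await model.destroy()
}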
Generate text:
// Create a new model instance
const model = await LlamaModel.create({
  modelFilepath: './path/to/model.gguf',
  embedding: false // false by default; set to true for embedding models
})
// Generate text
const generated = await model.generate('Once upon a time', {
  temperature: 0.8,
  maxTokens: 100
})
await model.destroy()
Create embeddings:
const model = await LlamaModel.create({
  modelFilepath: './path/to/embeddings-model.gguf',
  embedding: true
})
// Get embeddings (requires embedding: true)
const embeddings = await model.encode('Hello world')
await model.destroy()
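A common use for embeddings is measuring semantic similarity. Here's a minimal sketch; it assumes encode() resolves to a flat numeric vector (e.g. a Float32Array), which is an assumption, so check the actual return shape before relying on it:

// Cosine similarity: closer to 1 means more semantically similar
function cosineSimilarity (a, b) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

const model = await LlamaModel.create({
  modelFilepath: './path/to/embeddings-model.gguf',
  embedding: true
})

const a = await model.encode('Hello world')
const b = await model.encode('Hi there')
console.log(cosineSimilarity(a, b))

await model.destroy()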
Additional methods:
const model = await LlamaModel.create({
  modelFilepath: './path/to/embeddings-model.gguf',
  embedding: true
})
// Get model metadata
const metadata = await model.getMetadata()
// Tokenize text & detokenize tokens
const tokens = await model.tokenize('Hello world')
const text = await model.detokenize(tokens)
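// One practical use: count a prompt's tokens before generating, e.g. to stay
// within the model's context window (assumes tokenize() resolves to an
// array-like of token ids)
console.log(`'Hello world' is ${tokens.length} tokens`)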
await model.destroy()
You'll have to download models yourself!
The tests are currently set up to use a SmolLM GGUF model: https://huggingface.co/mradermacher/SmolLM-135M-Instruct-GGUF
Apache-2.0