Releases: meta-llama/llama-stack
v0.0.61
What's Changed
- add NVIDIA NIM inference adapter by @mattf in #355
- Tgi fixture by @dineshyv in #519
- fixes tests & move braintrust api_keys to request headers by @yanxi0830 in #535
- allow env NVIDIA_BASE_URL to set NVIDIAConfig.url by @mattf in #531
- move playground ui to llama-stack repo by @yanxi0830 in #536
- fix[documentation]: Update links to point to correct pages by @sablair in #549
- Fix URLs to Llama Stack Read the Docs Webpages by @JeffreyLind3 in #547
- Fix Zero to Hero README.md Formatting by @JeffreyLind3 in #546
- Guide readme fix by @raghotham in #552
- Fix broken Ollama link by @aidando73 in #554
- update client cli docs by @dineshyv in #560
- reduce the accuracy requirements to pass the chat completion structured output test by @mattf in #522
- removed assertion in ollama.py and fixed typo in the readme by @wukaixingxp in #563
- Cerebras Inference Integration by @henrytwo in #265
- unregister API for dataset by @sixianyi0721 in #507
- [llama stack ui] add native eval & inspect distro & playground pages by @yanxi0830 in #541
- Telemetry API redesign by @dineshyv in #525
- Introduce GitHub Actions Workflow for Llama Stack Tests by @ConnorHack in #523
- specify the client version that works for current together server by @jeffxtang in #566
- remove unused telemetry related code by @dineshyv in #570
- Fix up safety client for versioned API by @stevegrubb in #573
- Add eval/scoring/datasetio API providers to distribution templates & UI developer guide by @yanxi0830 in #564
- Add ability to query and export spans to dataset by @dineshyv in #574
- Renames otel config from jaeger to otel by @codefromthecrypt in #569
- add telemetry docs by @dineshyv in #572
- Console span processor improvements by @dineshyv in #577
- doc: quickstart guide errors by @aidando73 in #575
- Add kotlin docs by @Riandy in #568
- Update android_sdk.md by @Riandy in #578
- Bump kotlin docs to 0.0.54.1 by @Riandy in #579
- Make LlamaStackLibraryClient work correctly by @ashwinb in #581
- Update integration type for Cerebras to hosted by @henrytwo in #583
- Use customtool's get_tool_definition to remove duplication by @jeffxtang in #584
- [#391] Add support for json structured output for vLLM by @aidando73 in #528
- Fix Jaeger instructions by @yurishkuro in #580
- fix telemetry import by @yanxi0830 in #585
- update template run.yaml to include openai api key for braintrust by @yanxi0830 in #590
- add tracing to library client by @dineshyv in #591
- Fixes for library client by @ashwinb in #587
- Fix issue 586 by @yanxi0830 in #594
New Contributors
- @sablair made their first contribution in #549
- @JeffreyLind3 made their first contribution in #547
- @aidando73 made their first contribution in #554
- @henrytwo made their first contribution in #265
- @sixianyi0721 made their first contribution in #507
- @ConnorHack made their first contribution in #523
- @yurishkuro made their first contribution in #580
Full Changelog: v0.0.55...v0.0.61
v0.0.55 release
Llama Stack 0.0.54 Release
What's Changed
- Bugfixes release on top of 0.0.53
- Don't depend on templates.py when print llama stack build messages by @ashwinb in #496
- Restructure docs by @dineshyv in #494
- Since we are pushing for HF repos, we should accept them in inference configs by @ashwinb in #497
- Fix fp8 quantization script. by @liyunlu0618 in #500
- use logging instead of prints by @dineshyv in #499
New Contributors
- @liyunlu0618 made their first contribution in #500
Full Changelog: v0.0.53...v0.0.54
Llama Stack 0.0.53 Release
🚀 Initial Release Notes for Llama Stack!
Added
- Resource-oriented design for models, shields, memory banks, datasets and eval tasks
- Persistence for registered objects with distribution
- Ability to persist memory banks created for FAISS
- PostgreSQL KVStore implementation
- Environment variable placeholder support in run.yaml files
- Comprehensive Zero-to-Hero notebooks and quickstart guides
- Support for quantized models in Ollama
- Vision models support for Together, Fireworks, Meta-Reference, Ollama, and vLLM
- Bedrock distribution with safety shields support
- Evals API with task registration and scoring functions
- MMLU and SimpleQA benchmark scoring functions
- Huggingface dataset provider integration for benchmarks
- Support for custom dataset registration from local paths
- Benchmark evaluation CLI tools with visualization tables
- RAG evaluation scoring functions and metrics
- Local persistence for datasets and eval tasks
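The environment variable placeholder support listed above can be pictured with a minimal sketch. The notes do not spell out the placeholder syntax, so the `${env.NAME}` / `${env.NAME:default}` form, the `expand_placeholders` helper, and the `EXAMPLE_PORT` variable below are assumptions made for illustration only, not the project's actual implementation.

```python
import os
import re

# Assumed (hypothetical) placeholder form: ${env.NAME} or ${env.NAME:default}
_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def expand_placeholders(text, environ=None):
    """Replace each ${env.NAME[:default]} in text with the variable's value."""
    env = os.environ if environ is None else environ

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            # Fall back to the inline default when the variable is unset.
            return default
        raise KeyError(f"environment variable {name!r} is not set and has no default")

    return _PLACEHOLDER.sub(substitute, text)


# Hypothetical config line; EXAMPLE_PORT is not a real setting.
print(expand_placeholders("port: ${env.EXAMPLE_PORT:5001}", {"EXAMPLE_PORT": "8080"}))
```

Resolving placeholders when the file is loaded, as sketched here, lets one run.yaml serve several environments without edits; a missing variable with no default fails loudly rather than silently producing an empty value.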
Changed
- Split safety into distinct providers (llama-guard, prompt-guard, code-scanner)
- Changed provider naming convention (`impls` → `inline`, `adapters` → `remote`)
- Updated API signatures for dataset and eval task registration
- Restructured folder organization for providers
- Enhanced Docker build configuration
- Added version prefixing for REST API routes
- Enhanced evaluation task registration workflow
- Improved benchmark evaluation output formatting
- Restructured evals folder organization for better modularity
Removed
- `llama stack configure` command
What's Changed
- Update download command by @Wauplin in #9
- Update fbgemm version by @jianyuh in #12
- Add CLI reference docs by @dltn in #14
- Added Ollama as an inference impl by @hardikjshah in #20
- Hide older models by @dltn in #23
- Introduce Llama stack distributions by @ashwinb in #22
- Rename inline -> local by @dltn in #24
- Avoid using nearly double the memory needed by @ashwinb in #30
- Updates to prompt for tool calls by @hardikjshah in #29
- RFC-0001-The-Llama-Stack by @raghotham in #8
- Add API keys to AgenticSystemConfig instead of relying on dotenv by @ashwinb in #33
- update cli ref doc by @jeffxtang in #34
- fixed bug in download not enough disk space condition by @sisminnmaw in #35
- Updated cli instructions with additonal details for each subcommands by @varunfb in #36
- Updated URLs and addressed feedback by @varunfb in #37
- Fireworks basic integration by @benjibc in #39
- Together AI basic integration by @Nutlope in #43
- Update LICENSE by @raghotham in #47
- Add patch for SSE event endpoint responses by @dltn in #50
- API Updates: fleshing out RAG APIs, introduce "llama stack" CLI command by @ashwinb in #51
- [inference] Add a TGI adapter by @ashwinb in #52
- upgrade llama_models by @benjibc in #55
- Query generators for RAG query by @hardikjshah in #54
- Add Chroma and PGVector adapters by @ashwinb in #56
- API spec update, client demo with Stainless SDK by @yanxi0830 in #58
- Enable Bing search by @hardikjshah in #59
- add safety to openapi spec by @yanxi0830 in #62
- Add config file based CLI by @yanxi0830 in #60
- Simplified Telemetry API and tying it to logger by @ashwinb in #57
- [Inference] Use huggingface_hub inference client for TGI adapter by @hanouticelina in #53
- Support `data:` in URL for memory. Add ootb support for pdfs by @hardikjshah in #67
- Remove request wrapper migration by @yanxi0830 in #64
- CLI Update: build -> configure -> run by @yanxi0830 in #69
- API Updates by @ashwinb in #73
- Unwrap ChatCompletionRequest for context_retriever by @yanxi0830 in #75
- CLI - add back build wizard, configure with name instead of build.yaml by @yanxi0830 in #74
- CLI: add build templates support, move imports by @yanxi0830 in #77
- fix prompt with name args by @yanxi0830 in #80
- Fix memory URL parsing by @yanxi0830 in #81
- Allow TGI adaptor to have non-standard llama model names by @hardikjshah in #84
- [API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers by @ashwinb in #92
- Bedrock Guardrails comiting after rebasing the fork by @rsgrewal-aws in #96
- Bedrock Inference Integration by @poegej in #94
- Support for Llama3.2 models and Swift SDK by @ashwinb in #98
- fix safety using inference by @yanxi0830 in #99
- Fixes typo for setup instruction for starting Llama Stack Server section by @abhishekmishragithub in #103
- Make TGI adapter compatible with HF Inference API by @Wauplin in #97
- Fix links & format by @machina-source in #104
- docs: fix typo by @dijonkitchen in #107
- LG safety fix by @kplawiak in #108
- Minor typos, HuggingFace -> Hugging Face by @marklysze in #113
- Reordered pip install and llama model download by @KarthiDreamr in #112
- Update getting_started.ipynb by @delvingdeep in #117
- fix: 404 link to agentic system repository by @moldhouse in #118
- Fix broken links in RFC-0001-llama-stack.md by @bhimrazy in #134
- Validate `name` in `llama stack build` by @russellb in #128
- inference: Fix download command in error msg by @russellb in #133
- configure: Fix a error msg typo by @russellb in #131
- docs: Note how to use podman by @russellb in #130
- add env for LLAMA_STACK_CONFIG_DIR by @yanxi0830 in #137
- [bugfix] fix duplicate api endpoints by @yanxi0830 in #139
- Use inference APIs for executing Llama Guard by @ashwinb in #121
- fixing safety inference and safety adapter for new API spec. Pinned t… by @yogishbaliga in #105
- [CLI] remove dependency on CONDA_PREFIX in CLI by @yanxi0830 in #144
- [bugfix] fix #146 by @yanxi0830 in #147
- Extract provider data properly (attempt 2) by @ashwinb in #148
- `is_multimodal` accepts `core_model_id` not model itself. by @wizardbc in #153
- fix broken bedrock inference provider by @moritalous in #151
- Fix podman+selinux compatibility by @russellb in #132
- docker: Install in editable mode for dev purposes by @russellb in #160
- [CLI] simplify docker run by @yanxi0830 in #159
- Add a RoutableProvider protocol, support for multiple routing keys by @ashwinb in #163
- docker: Check for selinux before using `--security-opt` by @russellb in #167
- Adds markdown-link-check and fixes a broken link by @codefromthecrypt in #165
- [bugfix] conda path lookup by @yanxi0830 in #179
- fix prompt guard by @ashwinb in #177
- inference: Add model option to client by @russellb in #17...