Showing 2 changed files with 76 additions and 73 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,106 +1,109 @@ | ||
# llms-txt | ||
--- | ||
title: "The /llms.txt file" | ||
author: "Jeremy Howard" | ||
description: "A proposal to standardise on using an `/llms.txt` file to provide information to help LLMs use a website." | ||
image: "/logo.png" | ||
--- | ||
|
||
## Background | ||
|
||
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! --> | ||
Today websites are not just used to provide information to people, but they are also used to provide information to large language models. For instance, language models are often used to enhance development environments used by coders, with many systems including an option to ingest information about programming libraries and APIs from website documentation. | ||
|
||
This file will become your README and also the index of your | ||
documentation. | ||
Providing information for language models is a little different to providing information for humans, although there is plenty of overlap. Language models generally like to have information in a more concise form. This can be more similar to what a human expert would want to read. Language models can ingest a lot of information quickly, so it can be helpful to have a single place where all of the key information can be collated. | ||
|
||
## Developer Guide | ||
## Proposal | ||
|
||
If you are new to using `nbdev` here are some useful pointers to get you | ||
started. | ||
Therefore, we propose that those interested in providing LLM-friendly content add a `/llms.txt` file to their site. This is a markdown file that provides brief background information and guidance, along with links to markdown files (which can also link to external sites) providing more detailed information. This can be used, for instance, in order to provide information necessary for coders to use a library, or as part of research to learn about a person or organization and so forth. | ||
|
||
### Setup | ||
llms.txt markdown is human and LLM readable, but is also in a precise format allowing fixed processing methods. For instance, there is an [llms-txt](https://answerdotai.github.io/llms-txt/intro.html) project providing a CLI and Python module for parsing llms.txt files and generating LLM context from them. | ||
|
||
It can be helpful to have a dedicated environment for development. Here | ||
we are assuming that you have a conda environment file called `env.yml` | ||
named after `llms_txt` i.e.: | ||
We furthermore propose that pages on websites that have information that might be useful for LLMs to read provide a clean markdown version of those pages at the same URL as the original page, but with `.md` appended. (URLs without file names should append `index.html.md` instead.) | ||
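The URL rule above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the proposal: the function name `markdown_url` is hypothetical, and it assumes that a URL "without a file name" means one whose path is empty or ends in `/`.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Return the URL where a clean markdown version of a page would live.

    Assumption: a path that is empty or ends in "/" has no file name,
    so "index.html.md" is appended; otherwise ".md" is appended.
    """
    parts = urlsplit(page_url)
    path = parts.path
    if path == "" or path.endswith("/"):
        path = (path or "/") + "index.html.md"
    else:
        path += ".md"
    # Query string and fragment are carried over unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


fasthtml_md = markdown_url("https://docs.fastht.ml/tutorials/by_example.html")
directory_md = markdown_url("https://example.com/docs/")
root_md = markdown_url("https://example.com")
```

How to treat extensionless paths such as `/docs` is not settled by the text above; this sketch simply appends `.md` to them.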
|
||
``` yaml | ||
# env.yml | ||
name: llms_txt | ||
The [FastHTML project](https://fastht.ml) follows these two proposals for its documentation. For instance, here is the [FastHTML docs llms.txt](https://docs.fastht.ml/llms.txt). And here is an example of a [regular HTML docs page](https://docs.fastht.ml/tutorials/by_example.html), along with the exact same URL but with [a .md extension](https://docs.fastht.ml/tutorials/by_example.html.md). Note that all [nbdev](https://nbdev.fast.ai/) projects now create .md versions of all pages by default, and all Answer.AI and fast.ai software projects using nbdev have had their docs regenerated with this feature. For instance, see the [markdown version](https://fastcore.fast.ai/docments.html.md) of [fastcore's docments module](https://fastcore.fast.ai/docments.html). | ||
|
||
channels: | ||
- fastai | ||
This proposal does not include any particular recommendation for how to process the file, since it will depend on the application. For example, FastHTML automatically builds a new version of two markdown files including the contents of the linked URLs, using an XML-based structure suitable for use in LLMs such as Claude. The two files are: [llms-ctx.txt](https://docs.fastht.ml/llms-ctx.txt), which does not include the optional URLs, and [llms-ctx-full.txt](https://docs.fastht.ml/llms-ctx-full.txt), which does include them. They are created using the [`llms_txt2ctx`](https://llmstxt.org/intro.html#cli) command line application. | ||
|
||
dependencies: | ||
- fastai::nbdev>=2.3.12 | ||
# - python>=3.11 # specify python version if required | ||
# - dependency 1 | ||
# - dependency 2 | ||
# - pip | ||
# pip: | ||
# - pip dependency 1 | ||
# - pip dependency 2 | ||
``` | ||
llms.txt files can be used in various scenarios. For software libraries, they can provide a structured overview of documentation, making it easier for LLMs to locate specific features or usage examples. In corporate websites, they can outline organizational structure and key information sources. Information about new legislation and necessary background and context could be curated in an llms.txt file to help stakeholders understand it. | ||
|
||
You can then use `conda` or `mamba` (faster at resolving) to create and | ||
update your environment file should your needs change as you work on | ||
`llms_txt` | ||
llms.txt files can be adapted for various domains. Personal portfolio or CV websites could use them to help answer questions about an individual. In e-commerce, they could outline product categories and policies. Educational institutions might use them to summarize course offerings and resources. | ||
|
||
``` sh | ||
# create a conda environment for working on llms-txt | ||
$ mamba env create -f env.yml | ||
## Format | ||
|
||
# update conda environment | ||
$ mamba env update -n llms_txt --file env.yml | ||
``` | ||
At the moment the most widely and easily understood format for language models is Markdown. Simply showing where key Markdown files can be found is a great first step. Providing some basic structure helps a language model to find where the information it needs can come from. | ||
|
||
### Install llms_txt in Development mode | ||
The llms.txt file is unusual in that it uses Markdown to structure the information rather than a classic structured format such as XML. The reason for this is that we expect many of these files to be read by language models and agents. Having said that, the information in llms.txt follows a specific format and can be read using standard programmatic tools. | ||
|
||
``` sh | ||
# activate conda environment | ||
$ conda activate llms_txt | ||
The llms.txt file spec is for files located in the root path `/llms.txt` of a website (or, optionally, in a subpath). A file following the spec contains the following sections as markdown, in this specific order: | ||
|
||
# make sure llms_txt package is installed in development mode | ||
$ pip install -e . | ||
- An H1 with the name of the project or site. This is the only required section | ||
- A blockquote with a short summary of the project, containing key information necessary for understanding the rest of the file | ||
- Zero or more markdown sections (e.g. paragraphs, lists, etc) of any type except headings, containing more detailed information about the project and how to interpret the provided files | ||
- Zero or more markdown sections delimited by H2 headers, containing "file lists" of URLs where further detail is available | ||
- Each "file list" is a markdown list, containing a required markdown hyperlink `[name](url)`, then optionally a `:` and notes about the file. | ||
|
||
# make changes under nbs/ directory | ||
# ... | ||
Here is a mock example: | ||
|
||
# compile to have changes apply to llms_txt | ||
$ nbdev_prepare | ||
``` | ||
```markdown | ||
# Title | ||
|
||
## Usage | ||
> Optional description goes here | ||
|
||
### Installation | ||
Optional details go here | ||
|
||
Install latest from the GitHub | ||
[repository](https://github.com/AnswerDotAI/llms-txt): | ||
## Section name | ||
|
||
``` sh | ||
$ pip install git+https://github.com/AnswerDotAI/llms-txt.git | ||
``` | ||
- [Link title](https://link_url): Optional link details | ||
|
||
or from [conda](https://anaconda.org/AnswerDotAI/llms-txt) | ||
## Optional | ||
|
||
``` sh | ||
$ conda install -c AnswerDotAI llms_txt | ||
- [Link title](https://link_url) | ||
``` | ||
|
||
or from [pypi](https://pypi.org/project/llms-txt/) | ||
Note that the "Optional" section has a special meaning---if it's included, the URLs provided there can be skipped if a shorter context is needed. Use it for secondary information which can often be skipped. | ||
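To illustrate how the format and the special "Optional" section can be processed with ordinary tools, here is a minimal sketch using only the Python standard library. It is not the parser from the llms-txt project: the names `parse_llms_txt`, `without_optional`, and `LINK_RE` are hypothetical, and it assumes single-line blockquotes and one link per list item.

```python
import re

# One file-list entry: "- [name](url)" optionally followed by ": notes".
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<notes>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt document into title, summary, details and file lists."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("# "):
        raise ValueError("llms.txt must start with an H1 title")
    result = {"title": lines[0][2:].strip(), "summary": None,
              "details": [], "sections": {}}
    current = None  # name of the H2 section being read, if any
    for line in lines[1:]:
        stripped = line.strip()
        if stripped.startswith("## "):
            current = stripped[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(stripped)
            if match:
                result["sections"][current].append(match.groupdict())
        elif stripped.startswith(">") and result["summary"] is None:
            result["summary"] = stripped[1:].strip()
        elif stripped:
            result["details"].append(stripped)
    return result


def without_optional(parsed: dict) -> dict:
    """Drop the "Optional" section, for when a shorter context is needed."""
    sections = {k: v for k, v in parsed["sections"].items() if k != "Optional"}
    return {**parsed, "sections": sections}


SAMPLE = """
# Title

> Optional description goes here

Optional details go here

## Section name

- [Link title](https://link_url): Optional link details

## Optional

- [Link title](https://link_url)
"""

parsed = parse_llms_txt(SAMPLE)
short = without_optional(parsed)
```

Here `parsed["sections"]` holds both the "Section name" and "Optional" file lists, while `short["sections"]` keeps only "Section name".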
|
||
``` sh | ||
$ pip install llms_txt | ||
``` | ||
## Existing standards | ||
|
||
llms.txt is designed to coexist with current web standards. While sitemaps list all pages for search engines, llms.txt offers a curated overview for LLMs. It can complement robots.txt by providing context for allowed content. The file can also reference structured data markup used on the site, helping LLMs understand how to interpret this information in context. | ||
|
||
The approach of standardising on a path for the file follows the approach of `/robots.txt` and `/sitemap.xml`. robots.txt and llms.txt have different purposes: llms.txt information would generally be explicitly requested by a human for a particular task, to have a language model help them use the information on a website. On the other hand, robots.txt is generally used to let automated tools know what access to a site is considered acceptable. | ||
|
||
sitemap.xml is a list of all the indexable human-readable information available on a site. This isn’t a substitute for llms.txt since it: | ||
|
||
- Often won’t have the LLM-readable versions of pages listed | ||
- Doesn’t include URLs to external sites, even though they might be helpful to understand the information | ||
- Will generally cover documents that in aggregate will be too large to fit in an LLM context window, and will include a lot of information that isn’t necessary to understand the site. | ||
|
||
## Example | ||
|
||
### Documentation | ||
Here’s an example of llms.txt, in this case a cut down version of the file used for the FastHTML project: | ||
|
||
Documentation can be found hosted on this GitHub | ||
[repository](https://github.com/AnswerDotAI/llms-txt)’s | ||
[pages](https://AnswerDotAI.github.io/llms-txt/). Additionally you can | ||
find package manager specific guidelines on | ||
[conda](https://anaconda.org/AnswerDotAI/llms-txt) and | ||
[pypi](https://pypi.org/project/llms-txt/) respectively. | ||
```markdown | ||
# FastHTML | ||
|
||
## How to use | ||
> FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore's `FT` "FastTags" into a library for creating server-rendered hypermedia applications. | ||
|
||
Fill me in please! Don’t forget code examples: | ||
Important notes: | ||
|
||
``` python | ||
1+1 | ||
- Although parts of its API are inspired by FastAPI, it is *not* compatible with FastAPI syntax and is not targeted at creating API services | ||
- FastHTML is compatible with JS-native web components and any vanilla JS library, but not with React, Vue, or Svelte. | ||
|
||
## Docs | ||
|
||
- [FastHTML quick start](https://docs.fastht.ml/tutorials/quickstart_for_web_devs.html.md): A brief overview of many FastHTML features | ||
- [HTMX reference](https://raw.githubusercontent.com/bigskysoftware/reference.md): Brief description of all HTMX attributes, CSS classes, headers, events, extensions, js lib methods, and config options | ||
|
||
## Examples | ||
|
||
- [Todo list application](https://raw.githubusercontent.com/AnswerDotAI/fasthtml/adv_app.py): Detailed walk-thru of a complete CRUD app in FastHTML showing idiomatic use of FastHTML and HTMX patterns. | ||
|
||
## Optional | ||
|
||
- [Starlette full documentation](https://gist.githubusercontent.com/jph00/starlette-sml.md): A subset of the Starlette documentation useful for FastHTML development. | ||
``` | ||
|
||
2 | ||
To create effective llms.txt files, consider these guidelines: Use concise, clear language. When linking to resources, include brief, informative descriptions. Avoid ambiguous terms or unexplained jargon. Run a tool that expands your llms.txt file into an LLM context file and test a number of language models to see if they can answer questions about your content. | ||
|
||
## Next steps | ||
|
||
The llms.txt specification is open for community input. A [GitHub repository](https://github.com/AnswerDotAI/llms-txt) hosts [this informal overview](https://github.com/AnswerDotAI/llms-txt/blob/main/nbs/index.md), allowing for version control and public discussion. A community Discord channel is available for sharing implementation experiences and discussing best practices. | ||
|