Commit
jph00 committed Sep 2, 2024
1 parent bc5c9f6 commit 5739741
Showing 5 changed files with 19 additions and 4 deletions.
2 changes: 1 addition & 1 deletion llms_txt/_modidx.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# Autogenerated by nbdev

d = { 'settings': { 'branch': 'main',
'doc_baseurl': '/llms-txt',
'doc_baseurl': '/',
'doc_host': 'https://llmstxt.org',
'git_url': 'https://github.com/AnswerDotAI/llms-txt',
'lib_path': 'llms_txt'},
Binary file modified nbs/favicon.ico
Binary file not shown.
8 changes: 5 additions & 3 deletions nbs/index.md
@@ -13,9 +13,11 @@ Providing information for language models is a little different to providing inf

## Proposal

Therefore, we propose that those interested in providing LLM-friendly content add a `/llms.txt` file to their site. This is a markdown file that provides brief background information and guidance, along with links to markdown files (which can also link to external sites) providing more detailed information. This can be used, for instance, in order to provide information necessary for coders to use a library, or as part of research to learn about a person or organization and so forth.
![llms.txt logo](logo.png){.lightbox width=150px .floatr}

llms.txt markdown is human and LLM readable, but is also in a precise format allowing fixed processing methods. For instance, there is an [llms-txt](https://answerdotai.github.io/llms-txt/intro.html) project providing a CLI and Python module for parsing llms.txt files and generating LLM context from them.
We propose that those interested in providing LLM-friendly content add a `/llms.txt` file to their site. This is a markdown file that provides brief background information and guidance, along with links to markdown files (which can also link to external sites) providing more detailed information. This can be used, for instance, in order to provide information necessary for coders to use a library, or as part of research to learn about a person or organization and so forth. You are free to use the llms.txt logo on your site to indicate your support if you wish.

llms.txt markdown is human and LLM readable, but is also in a precise format allowing fixed processing methods (i.e. classical programming techniques such as parsers and regex). For instance, there is an [llms-txt](https://answerdotai.github.io/llms-txt/intro.html) project providing a CLI and Python module for parsing llms.txt files and generating LLM context from them.
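Because the format is fixed, a small amount of classical parsing code can recover its structure. The sketch below is a hypothetical minimal parser written for illustration, not the llms-txt project's actual API; the function name and the shape of the returned dict are assumptions:

```python
import re

def parse_llms_txt(text):
    """Parse llms.txt text into a title, optional summary, and link sections.

    Illustrative only: assumes the structure of one H1 title, an optional
    blockquote summary, then H2 sections containing markdown link lists.
    """
    # Title: the single H1 at the top of the file
    title = re.search(r'^#\s+(.+)$', text, re.M)
    # Summary: an optional blockquote line after the title
    summary = re.search(r'^>\s+(.+)$', text, re.M)
    # Sections: each H2 heading followed by its list of markdown links,
    # up to the next H2 or the end of the file
    sections = {}
    for name, body in re.findall(r'^##\s+(.+?)$\n(.*?)(?=^##\s|\Z)',
                                 text, re.M | re.S):
        links = re.findall(r'-\s*\[(.+?)\]\((.+?)\)', body)
        sections[name] = links
    return {
        'title': title.group(1) if title else None,
        'summary': summary.group(1) if summary else None,
        'sections': sections,
    }
```

A parser like this is all fixed regex work, with no language model in the loop, which is the point of keeping the format precise.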

We furthermore propose that pages on websites that have information that might be useful for LLMs to read provide a clean markdown version of those pages at the same URL as the original page, but with `.md` appended. (URLs without file names should append `index.html.md` instead.)
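The URL rule above can be sketched in a few lines of Python; `markdown_url` is a hypothetical helper name, not part of any official tooling:

```python
from urllib.parse import urlsplit, urlunsplit

def markdown_url(url):
    """Return the proposed markdown companion URL for a page.

    Appends `.md` to the path, or `index.html.md` when the path has
    no file name (i.e. is empty or ends in a slash).
    """
    parts = urlsplit(url)
    path = parts.path or '/'
    if path.endswith('/'):
        path += 'index.html.md'
    else:
        path += '.md'
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))
```

So `https://example.com/docs/intro.html` would map to `https://example.com/docs/intro.html.md`, while `https://example.com/docs/` would map to `https://example.com/docs/index.html.md`.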

@@ -65,7 +67,7 @@ Note that the "Optional" section has a special meaning---if it's included, the U

llms.txt is designed to coexist with current web standards. While sitemaps list all pages for search engines, llms.txt offers a curated overview for LLMs. It can complement robots.txt by providing context for allowed content. The file can also reference structured data markup used on the site, helping LLMs understand how to interpret this information in context.

The approach of standardising on a path for the file follows the approach of `/robots.txt` and `/sitemap.xml`. robots.txt and llms.txt have different purposes---llms.txt information would generally be explicitly requested by a human for a particular task, to have a language model help them use the information on a website. On the other hand, robots.txt is generally used to let automated tools know what access to a site is considered acceptable.
The approach of standardising on a path for the file follows the approach of `/robots.txt` and `/sitemap.xml`. robots.txt and llms.txt have different purposes---robots.txt is generally used to let automated tools know what access to a site is considered acceptable, such as for search indexing bots. On the other hand, llms.txt information will often be used on demand when a user explicitly requests information about a topic, such as when including a coding library's documentation in a project, or when asking a chat bot with search functionality for information. Our expectation is that llms.txt will mainly be useful for *inference*, i.e. at the time a user is seeking assistance, as opposed to for *training*. However, if llms.txt usage becomes widespread, future training runs could perhaps take advantage of the information in llms.txt files too.

sitemap.xml is a list of all the indexable human-readable information available on a site. This isn’t a substitute for llms.txt since it:

Binary file modified nbs/logo.png
13 changes: 13 additions & 0 deletions nbs/styles.css
@@ -35,3 +35,16 @@ div.description {
font-size: 135%;
opacity: 70%;
}

.quarto-figure:has(img.floatr) {
float:right;
margin-left: 1rem;
margin-bottom: 0.5rem;
margin-top: 0.5rem;
}

.quarto-figure:has(img.floatr) figcaption {
text-align: center;
width: 100%;
}

0 comments on commit 5739741
