Fix minor errors at the documentations (#747)
* fix README.md errors and grammar

* delete beta rst api docs

* fix docs index (toctree)

* langchain unstructured version change

---------

Co-authored-by: jeffrey <[email protected]>
vkehfdl1 and jeffrey authored Sep 26, 2024
1 parent 72aae01 commit 99912c1
Showing 11 changed files with 39 additions and 207 deletions.
42 changes: 21 additions & 21 deletions README.md
@@ -1,6 +1,6 @@
# AutoRAG

-RAG AutoML tool for automatically finds an optimal RAG pipeline for your data.
+RAG AutoML tool for automatically finding an optimal RAG pipeline for your data.

![Thumbnail](https://github.com/user-attachments/assets/6bab243d-a4b3-431a-8ac0-fe17336ab4de)

@@ -9,8 +9,8 @@ but you don’t know what pipeline is great for “your own data” and "your ow
Making and evaluating all RAG modules is very time-consuming and hard to do.
But without it, you will never know which RAG pipeline is the best for your own use-case.

-AutoRAG is a tool for finding optimal RAG pipeline for “your data.”
-You can evaluate various RAG modules automatically with your own evaluation data,
+AutoRAG is a tool for finding the optimal RAG pipeline for “your data.”
+You can evaluate various RAG modules automatically with your own evaluation data
and find the best RAG pipeline for your own use-case.

AutoRAG supports a simple way to evaluate many RAG module combinations.
@@ -111,11 +111,11 @@ modules:
chunk_method: Token
chunk_size: 1024
chunk_overlap: 24
-  add_file_name: english
+  add_file_name: en
```
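To make the `chunk_size` and `chunk_overlap` settings above concrete, here is an illustrative sketch of how token-window chunking with overlap behaves. This is not AutoRAG's internal implementation, just the general technique the config describes:

```python
# Illustrative sketch (not AutoRAG's internals): token-based chunking
# where consecutive chunks share `chunk_overlap` tokens.
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of `chunk_size` tokens that
    advance by (chunk_size - chunk_overlap) each step."""
    step = chunk_size - chunk_overlap
    return [
        tokens[i:i + chunk_size]
        for i in range(0, max(len(tokens) - chunk_overlap, 1), step)
    ]

# With chunk_size=4 and chunk_overlap=1, windows start at 0, 3, 6, ...
chunks = chunk_tokens(list(range(10)), chunk_size=4, chunk_overlap=1)
```

A larger overlap keeps more shared context between neighboring chunks at the cost of a bigger corpus.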
You can also use multiple Chunk modules at once.
-In this case, you need to use one corpus to create QA, and then map the rest of the corpus to QA Data.
+In this case, you need to use one corpus to create QA and then map the rest of the corpus to QA Data.
If the chunk method is different, the retrieval_gt will be different, so we need to remap it to the QA dataset.
#### Start Chunking
@@ -137,14 +137,14 @@ You can create QA dataset with just a few lines of code.
import pandas as pd
from llama_index.llms.openai import OpenAI

-from autorag.data.beta.filter.dontknow import dontknow_filter_rule_based
-from autorag.data.beta.generation_gt.llama_index_gen_gt import (
+from autorag.data.qa.filter.dontknow import dontknow_filter_rule_based
+from autorag.data.qa.generation_gt.llama_index_gen_gt import (
make_basic_gen_gt,
make_concise_gen_gt,
)
-from autorag.data.beta.schema import Raw, Corpus
-from autorag.data.beta.query.llama_gen_query import factoid_query_gen
-from autorag.data.beta.sample import random_single_hop
+from autorag.data.qa.schema import Raw, Corpus
+from autorag.data.qa.query.llama_gen_query import factoid_query_gen
+from autorag.data.qa.sample import random_single_hop

llm = OpenAI()
raw_df = pd.read_parquet("your/path/to/corpus.parquet")
@@ -191,15 +191,15 @@ initial_qa.to_parquet('./qa.parquet', './corpus.parquet')

### 1. Set YAML File

-First, you need to set the config yaml file for your RAG optimization.
+First, you need to set the config YAML file for your RAG optimization.

-You can get various config yaml files at [here](./sample_config).
-We highly recommend using pre-made config yaml files for starter.
+You can get various config YAML files at [here](./sample_config).
+We highly recommend using pre-made config YAML files for starter.

-If you want to make your own config yaml files, check out the [Config yaml file](#-create-your-own-config-yaml-file)
+If you want to make your own config YAML files, check out the [Config YAML file](#-create-your-own-config-yaml-file)
section.

-Here is an example of the config yaml file to use `retrieval`, `prompt_maker`, and `generator` nodes.
+Here is an example of the config YAML file to use `retrieval`, `prompt_maker`, and `generator` nodes.

```yaml
node_lines:
@@ -251,13 +251,13 @@ evaluator = Evaluator(qa_data_path='your/path/to/qa.parquet', corpus_data_path='
evaluator.start_trial('your/path/to/config.yaml')
```

-or you can use command line interface
+or you can use the command line interface

```bash
autorag evaluate --config your/path/to/default_config.yaml --qa_data_path your/path/to/qa.parquet --corpus_data_path your/path/to/corpus.parquet
```

-Once it is done, you can see several files and folders created at your current directory.
+Once it is done, you can see several files and folders created in your current directory.
In the trial folder, which is named with a number (like 0),
you can check the `summary.csv` file, which summarizes the evaluation results and the best RAG pipeline for your data.
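Since `summary.csv` is a plain CSV file, pandas is a convenient way to inspect it. The sketch below builds a tiny stand-in DataFrame because the exact column names can vary; treat `node_type`, `best_module_name`, and `best_execution_time` as illustrative assumptions and check your own file's header:

```python
# Minimal sketch for inspecting a trial summary with pandas.
# Column names here are assumptions for illustration only.
import pandas as pd

summary = pd.DataFrame({
    "node_type": ["retrieval", "retrieval"],
    "best_module_name": ["bm25", "vectordb"],
    "best_execution_time": [0.8, 1.2],
})
# For a real trial folder you would instead do:
# summary = pd.read_csv("0/summary.csv")
fastest = summary.sort_values("best_execution_time").iloc[0]["best_module_name"]
print(fastest)
```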

@@ -266,7 +266,7 @@ at [here](https://docs.auto-rag.com/optimization/folder_structure.html).

### 3. Run Dashboard

-You can run dashboard to easily see the result.
+You can run a dashboard to easily see the result.

```bash
autorag dashboard --trial_dir /your/path/to/trial_dir
@@ -280,7 +280,7 @@ autorag dashboard --trial_dir /your/path/to/trial_dir

### 4-1. Run as a CLI

-You can use a found optimal RAG pipeline right away with extracted yaml file.
+You can use a found optimal RAG pipeline right away with the extracted YAML file.

```python
from autorag.deploy import Runner
@@ -293,7 +293,7 @@ runner.run('your question')

You can run this pipeline as an API server.

-Check out API endpoint at [here](deploy/api_endpoint.md).
+Check out the API endpoint at [here](deploy/api_endpoint.md).

```python
from autorag.deploy import Runner
@@ -310,7 +310,7 @@ autorag run_api --config_path your/path/to/pipeline.yaml --host 0.0.0.0 --port 8

You can run this pipeline as a web interface.

-Check out web interface at [here](deploy/web.md).
+Check out the web interface at [here](deploy/web.md).

```bash
autorag run_web --trial_path your/path/to/trial_path
29 changes: 0 additions & 29 deletions docs/source/api_spec/autorag.data.beta.filter.rst

This file was deleted.

45 changes: 0 additions & 45 deletions docs/source/api_spec/autorag.data.beta.generation_gt.rst

This file was deleted.

37 changes: 0 additions & 37 deletions docs/source/api_spec/autorag.data.beta.query.rst

This file was deleted.

47 changes: 0 additions & 47 deletions docs/source/api_spec/autorag.data.beta.rst

This file was deleted.

21 changes: 0 additions & 21 deletions docs/source/api_spec/autorag.data.beta.schema.rst

This file was deleted.

6 changes: 2 additions & 4 deletions docs/source/data_creation/data_creation.md
@@ -26,12 +26,10 @@ To see the tutorial of the data creation, check [here](tutorial.md).

```{toctree}
---
-maxdepth: 2
+maxdepth: 1
---
tutorial.md
qa_creation/qa_creation.md
chunk/chunk.md
parse/parse.md
-legacy/tutorial.md
-legacy/ragas.md
+legacy/legacy.md
```
13 changes: 13 additions & 0 deletions docs/source/data_creation/legacy/legacy.md
@@ -0,0 +1,13 @@
# Legacy

This is the legacy documentation.
Deprecated since the v0.3.0 release.


```{toctree}
---
maxdepth: 1
---
ragas.md
tutorial.md
```
2 changes: 1 addition & 1 deletion docs/source/data_creation/qa_creation/filter.md
@@ -9,7 +9,7 @@ The supported filtering methods are below.
1. [Rule-based Don't know Filter](#rule-based-dont-know-filter)
2. [LLM-based Don't know Filter](#llm-based-dont-know-filter)

-# 1. Unanswerable question filtering
+## 1. Unanswerable question filtering

Sometimes LLM generates unanswerable questions from the given passage.
If unintended unanswerable questions are generated, the retrieval optimization performance will be lower.
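The rule-based variant of this filter can be pictured as a simple phrase check. The sketch below is an illustration of the idea, not AutoRAG's actual `dontknow_filter_rule_based` implementation, and the phrase list is an assumption:

```python
# Illustrative rule-based "don't know" filter (not AutoRAG's exact code):
# drop QA rows whose ground-truth answer admits it cannot answer.
DONT_KNOW_PHRASES = ["don't know", "do not know", "cannot answer"]

def is_dont_know(answer: str) -> bool:
    answer_lower = answer.lower()
    return any(phrase in answer_lower for phrase in DONT_KNOW_PHRASES)

qa_rows = [
    {"query": "Who wrote the passage?", "generation_gt": "Jane Doe wrote it."},
    {"query": "When was it signed?", "generation_gt": "I don't know the answer."},
]
# Keep only rows with an answerable ground truth.
filtered = [row for row in qa_rows if not is_dont_know(row["generation_gt"])]
```

Filtering these rows out keeps unanswerable questions from dragging down retrieval optimization.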
2 changes: 1 addition & 1 deletion docs/source/data_creation/qa_creation/query_gen.md
@@ -13,7 +13,7 @@ In OpenAI version of data creation, you can use only 'gpt-4o-2024-08-06' and 'gp
If you want to use another model, use llama_index version instead.
```

-# Question types
+## Question types

1. [Factoid](#1-factoid)
2. [Concept Completion](#2-concept-completion)
2 changes: 1 addition & 1 deletion requirements.txt
@@ -50,7 +50,7 @@ gradio

### Langchain ###
langchain-core>=0.3.0
-langchain_unstructured
+langchain-unstructured>=0.1.5
langchain-upstage

# autorag dashboard
