Curiosity007 changed the title from "Unable to come out with insights on Overall Data" to "Unable to Provide with insights on Overall Data - Only Taking top 5 or 7 Documents" on May 19, 2023
Curiosity007 changed the title from "Unable to Provide with insights on Overall Data - Only Taking top 5 or 7 Documents" to "Unable to Provide insights on Overall Data - Only Taking top 5 or 7 Documents" on May 19, 2023
Curiosity007 changed the title from "Unable to Provide insights on Overall Data - Only Taking top 5 or 7 Documents" to "Unable to Provide insights on Overall Data - Only Taking top 5 or 7 chunks" on May 19, 2023
.env

# Generic
TEXT_EMBEDDINGS_MODEL=sentence-transformers/all-MiniLM-L6-v2
TEXT_EMBEDDINGS_MODEL_TYPE=HF # LlamaCpp or HF
USE_MLOCK=false

# Ingestion
PERSIST_DIRECTORY=db
DOCUMENTS_DIRECTORY=source_documents
INGEST_CHUNK_SIZE=500
INGEST_CHUNK_OVERLAP=50
INGEST_N_THREADS=1
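For scale, here is a minimal sketch of character-based chunking with these settings (this is not necessarily the exact splitter the ingester uses). It illustrates why each chunk ends up holding roughly one row of a wide CSV:

```python
# Sketch of character chunking with INGEST_CHUNK_SIZE=500 and
# INGEST_CHUNK_OVERLAP=50. Illustrative only; the real ingester
# may split on separators rather than fixed offsets.

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of `size` characters, each sharing
    `overlap` characters with the previous chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

# A synthetic 30-column CSV row easily runs a few hundred characters,
# so a 500-character chunk holds roughly one row.
row = ",".join(f"col{i}_value" for i in range(30))
chunks = chunk_text(row)
print(len(row), len(chunks))  # → 349 1
```

So with these settings every "document" in the vector store is on the order of a single CSV row, which matches the behavior described below.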
# Generation
MODEL_TYPE=LlamaCpp # GPT4All or LlamaCpp
MODEL_PATH=eachadea/ggml-vicuna-7b-1.1/ggml-vic7b-q5_1.bin
MODEL_TEMP=0.8
MODEL_N_CTX=2048 # Max total size of prompt+answer
MODEL_MAX_TOKENS=1024 # Max size of answer
MODEL_STOP=[STOP]
CHAIN_TYPE=betterstuff
N_RETRIEVE_DOCUMENTS=100 # How many documents to retrieve from the db
N_FORWARD_DOCUMENTS=100 # How many documents to forward to the LLM, chosen among those retrieved
N_GPU_LAYERS=32
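One likely explanation for the "top 5 or 7" behavior is the context budget rather than the retrieval settings. This back-of-the-envelope sketch (the ~4 characters per token ratio and the 100-token template overhead are rough assumptions) shows how few 500-character chunks fit in a 2048-token window once 1024 tokens are reserved for the answer:

```python
# Back-of-the-envelope context budget for the settings above.
# Assumes ~4 characters per token, a rough average for English text.
n_ctx = 2048           # MODEL_N_CTX: prompt + answer combined
max_tokens = 1024      # MODEL_MAX_TOKENS: reserved for the answer
prompt_overhead = 100  # assumed tokens for prompt template + question

chunk_chars = 500                # INGEST_CHUNK_SIZE
chunk_tokens = chunk_chars // 4  # ~125 tokens per chunk

budget = n_ctx - max_tokens - prompt_overhead
max_chunks = budget // chunk_tokens
print(max_chunks)  # → 7
```

Under these assumptions only about 7 chunks fit, no matter how high N_FORWARD_DOCUMENTS is set, which is consistent with the 5-7 chunks observed in the terminal.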
Python version
Python 3.10.10
System
Ubuntu 22.04.2 LTS (jammy)
CASALIOY version
Latest Commit - ee9a4e5
Information
- The official example scripts
- My own modified scripts
Related Components
- Document ingestion
- GUI
- Prompt answering
Reproduction
I fed the system a 5000-line CSV file with 30 columns, then asked for overall insights from the data.
In the terminal I can see that only the top 5 or 7 documents are retrieved, and each of those is just a single row. The answer is therefore based on only 5 or 7 rows, so no real insight emerges.
Note: I kept only one document in the source_documents folder to avoid overlapping information.
Expected behavior
The system should be able to recognize patterns across the whole dataset and suggest insights based on them.
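For contrast, here is a hypothetical sketch (synthetic data standing in for the actual CSV) of the kind of whole-dataset aggregation that "overall insight" implies. A top-k retriever never performs this, because it only ever forwards a handful of chunks to the model:

```python
# Sketch: dataset-wide statistics require touching every row, which a
# top-k retriever never does. Synthetic 5000-row, 30-column table.
import csv
import io
from collections import Counter

header = [f"col{i}" for i in range(30)]
lines = [",".join(header)]
for n in range(5000):
    lines.append(",".join(["A" if n % 2 else "B"] + [str(n)] * 29))

rows = list(csv.DictReader(io.StringIO("\n".join(lines))))

# An aggregate like this sees all 5000 rows; forwarding 5-7 chunks cannot.
counts = Counter(r["col0"] for r in rows)
print(len(rows), counts.most_common(2))  # → 5000 [('B', 2500), ('A', 2500)]
```

This is why per-chunk retrieval over row-sized chunks cannot summarize the whole table: the aggregation has to happen over the full dataset before (or instead of) the retrieval step.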