
[DocQA] add: new feature using DocQA to evaluate given file #35

Open · wants to merge 3 commits into dev

Conversation

JLin-dev

This pull request introduces a new evaluate.py file along with updates to the existing DocQA module. The primary goal of this update is to enhance the evaluation capabilities of the DocQA system by providing two distinct evaluation methods: Retriever Evaluation and End-to-End Evaluation.

Key Features

  1. Evaluator Integration in Executor DocQA:

    • The updated evaluate.py can be uploaded to the Executor DocQA, enabling it to perform two different types of evaluation (a sketch of both modes follows this list):
      • Retriever Evaluation: passes the retrieved chunks and the question to the LLM to assess the relevance of each chunk, verifying that the retriever has identified related content.
      • End-to-End Evaluation: checks whether the LLM-generated answer correctly addresses the given question, thus evaluating the overall performance of the LLM.
  2. Basic Evaluation Functions:

    • The updated files provide the essential functions that support both types of evaluation. In addition, the in-app results feature lets users review the evaluation output directly within the chat interface.
  3. Future Improvement Paths:

    • Path 1: Mistral LLM Integration:
      • Future work can involve integrating the open-source Mistral LLM, along with a tuned critic bot, to evaluate the performance of the retriever and LLM more effectively for end-to-end scenarios.
    • Path 2: RAGAS Framework Integration:
      • The evaluation process can be further enhanced by passing data in a structured format such as DRCD into the RAGAS framework. The evaluate.py file already includes functionality to handle DRCD-formatted data for this purpose (a DRCD-to-RAGAS conversion sketch is included below).
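
For illustration, here is a minimal sketch of how the two evaluation modes could be driven by an LLM judge. The `llm` callable, the prompt wording, and the yes/no parsing are assumptions made for this sketch, not the actual Executor DocQA interface in evaluate.py.

```python
# Minimal sketch of the two evaluation modes described above.
# The `llm` callable (prompt in, text out), the prompt wording, and the
# yes/no parsing are illustrative assumptions, not the project's API.
from typing import Callable, Sequence


def evaluate_retriever(llm: Callable[[str], str],
                       question: str,
                       chunks: Sequence[str]) -> list[bool]:
    """Ask the LLM whether each retrieved chunk is relevant to the question."""
    results = []
    for chunk in chunks:
        prompt = (
            "Question:\n{q}\n\nRetrieved chunk:\n{c}\n\n"
            "Is this chunk relevant to answering the question? Reply yes or no."
        ).format(q=question, c=chunk)
        results.append(llm(prompt).strip().lower().startswith("yes"))
    return results


def evaluate_end_to_end(llm: Callable[[str], str],
                        question: str,
                        answer: str) -> bool:
    """Ask the LLM whether the generated answer actually addresses the question."""
    prompt = (
        "Question:\n{q}\n\nGenerated answer:\n{a}\n\n"
        "Does the answer correctly address the question? Reply yes or no."
    ).format(q=question, a=answer)
    return llm(prompt).strip().lower().startswith("yes")
```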

These updates are a significant step forward in improving the evaluation and performance measurement of the DocQA system. I look forward to any feedback and suggestions for further enhancements.
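
As a rough sketch of Path 2, the snippet below flattens DRCD-style (SQuAD-like) records into column lists that could be handed to RAGAS. The DRCD field names follow the public SQuAD-style JSON layout; the RAGAS column names ("question", "contexts", "ground_truth") are assumptions based on common RAGAS usage rather than this PR's code.

```python
# Sketch: reshape a DRCD (SQuAD-like) JSON file into column lists for RAGAS.
# Field names follow the SQuAD-style layout; the output column names are
# assumed from typical RAGAS usage, not taken from this PR.
import json


def drcd_to_ragas_records(path: str) -> dict[str, list]:
    """Flatten a DRCD JSON file into column lists a RAGAS dataset could use."""
    with open(path, encoding="utf-8") as f:
        drcd = json.load(f)

    questions, contexts, ground_truths = [], [], []
    for article in drcd["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                questions.append(qa["question"])
                contexts.append([paragraph["context"]])       # one context per question here
                ground_truths.append(qa["answers"][0]["text"])  # first reference answer

    return {
        "question": questions,
        "contexts": contexts,
        "ground_truth": ground_truths,
    }
```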

@ifTNT added the good first issue label on Aug 14, 2024