
[DocQA] add: new feature using DocQA to evaluate given file #35

Open · wants to merge 3 commits into dev

Conversation

JLin-dev

This pull request introduces a new evaluate.py file along with updates to the existing DocQA module. The primary goal of this update is to enhance the evaluation capabilities of the DocQA system by providing two distinct evaluation methods: Retriever Evaluation and End-to-End Evaluation.

Key Features

  1. Evaluator Integration in Executor DocQA:

    • The updated evaluate.py can be uploaded to the Executor DocQA, enabling it to perform two different types of evaluation (a sketch of both modes follows this list):
      • Retriever Evaluation: passes the retrieved chunks and the question to the LLM to assess the relevance of each chunk, verifying that the retriever has identified related content.
      • End-to-End Evaluation: checks whether the LLM-generated answer correctly addresses the given question, thus evaluating the overall performance of the LLM.
  2. Basic Evaluation Functions:

    • The updated files provide the essential functions that support both types of evaluation. In addition, the in-app results feature lets users review the evaluation output directly within the chat interface.
  3. Future Improvement Paths:

    • Path 1: Mistral LLM Integration:
      • Future work can involve integrating the open-source Mistral LLM, along with a tuned critic bot, to evaluate the performance of the retriever and LLM more effectively for end-to-end scenarios.
    • Path 2: RAGAS Framework Integration:
      • The evaluation process can be further enhanced by passing data in a structured format such as DRCD into the RAGAS framework. The evaluate.py file already includes functionality to handle DRCD-formatted data for this purpose (a DRCD-to-RAGAS conversion sketch is included below).
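
For illustration, here is a minimal sketch of how the two evaluation modes could be driven by an LLM judge. The `llm` callable, the prompt wording, and the yes/no parsing are assumptions made for this sketch, not the actual Executor DocQA interface in evaluate.py.

```python
# Minimal sketch of the two evaluation modes described above.
# The `llm` callable (prompt in, text out), the prompt wording, and the
# yes/no parsing are illustrative assumptions, not the project's API.
from typing import Callable, Sequence


def evaluate_retriever(llm: Callable[[str], str],
                       question: str,
                       chunks: Sequence[str]) -> list[bool]:
    """Ask the LLM whether each retrieved chunk is relevant to the question."""
    results = []
    for chunk in chunks:
        prompt = (
            "Question:\n{q}\n\nRetrieved chunk:\n{c}\n\n"
            "Is this chunk relevant to answering the question? Reply yes or no."
        ).format(q=question, c=chunk)
        results.append(llm(prompt).strip().lower().startswith("yes"))
    return results


def evaluate_end_to_end(llm: Callable[[str], str],
                        question: str,
                        answer: str) -> bool:
    """Ask the LLM whether the generated answer actually addresses the question."""
    prompt = (
        "Question:\n{q}\n\nGenerated answer:\n{a}\n\n"
        "Does the answer correctly address the question? Reply yes or no."
    ).format(q=question, a=answer)
    return llm(prompt).strip().lower().startswith("yes")
```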

These updates are a significant step forward in improving the evaluation and performance measurement of the DocQA system. I look forward to any feedback and suggestions for further enhancements.
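
As a rough sketch of Path 2, the snippet below flattens DRCD-style (SQuAD-like) records into column lists that could be handed to RAGAS. The DRCD field names follow the public SQuAD-style JSON layout; the RAGAS column names ("question", "contexts", "ground_truth") are assumptions based on common RAGAS usage rather than this PR's code.

```python
# Sketch: reshape a DRCD (SQuAD-like) JSON file into column lists for RAGAS.
# Field names follow the SQuAD-style layout; the output column names are
# assumed from typical RAGAS usage, not taken from this PR.
import json


def drcd_to_ragas_records(path: str) -> dict[str, list]:
    """Flatten a DRCD JSON file into column lists a RAGAS dataset could use."""
    with open(path, encoding="utf-8") as f:
        drcd = json.load(f)

    questions, contexts, ground_truths = [], [], []
    for article in drcd["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                questions.append(qa["question"])
                contexts.append([paragraph["context"]])       # one context per question here
                ground_truths.append(qa["answers"][0]["text"])  # first reference answer

    return {
        "question": questions,
        "contexts": contexts,
        "ground_truth": ground_truths,
    }
```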

@ifTNT added the good first issue label on Aug 14, 2024