QnA Documents Diverge from Azure Index Document Source #72

Open
3 tasks
ibrahim-kabir opened this issue Feb 14, 2024 · 3 comments

@ibrahim-kabir

Summary:

The current QnA generation process relies on different documents than the Azure indexes. QnAs are generated from the 'crawl' table, but the finesse-public-guidance index that Finesse uses is generated from the 'public-guidance' blob storage. This mismatch produces scores of 0 in the Finesse Benchmark tool, located in the finesse-backend repository. The script needs to be modified to draw its documents from the blob storage instead of the 'louis_crawl' PostgreSQL database.

Tasks:

  • Modify the script to generate QnAs from the Index blob storage
  • Test the new QnAs with the Finesse Benchmark tool
  • Update finesse-data with the new QnAs

Acceptance Criteria:

  • The script successfully connects to the Azure Blob Storage account.
  • Documents are retrieved from the 'public-guidance' container in the Azure Blob Storage.
  • QnAs are generated using documents from the blob storage.
  • Running the Finesse Benchmark tool with the modified script does not result in scores of 0.
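
The first three acceptance criteria could be sketched roughly as below. This is only an illustration under stated assumptions, not the project's actual script: it assumes the `azure-storage-blob` package, assumes each blob in the 'public-guidance' container is a JSON document, and uses a hypothetical environment variable name (`AZURE_STORAGE_CONNECTION_STRING`) for the credentials. The function names `open_public_guidance_container` and `iter_documents` are made up for this example.

```python
import json
import os
from typing import Any, Dict, Iterator


def open_public_guidance_container() -> Any:
    """Connect to the storage account and return a client for 'public-guidance'."""
    # Imported lazily so iter_documents can be tested without the Azure SDK.
    from azure.storage.blob import BlobServiceClient

    # Assumption: the connection string lives in this environment variable.
    conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    service = BlobServiceClient.from_connection_string(conn_str)
    return service.get_container_client("public-guidance")


def iter_documents(container_client: Any) -> Iterator[Dict[str, Any]]:
    """Yield every blob in the container, parsed as JSON.

    Assumption: each blob holds one JSON document. Adjust the parsing
    if the container stores another format.
    """
    for blob in container_client.list_blobs():
        raw = container_client.download_blob(blob.name).readall()
        yield json.loads(raw)
```

Each document yielded by `iter_documents(open_public_guidance_container())` would then be passed to the existing QnA generation step in place of a row from the crawl table.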
@rngadam
Contributor

rngadam commented Feb 16, 2024

AFAIK @k-allagbe imported guidance documents from the same crawl table into the blob storage?

@ibrahim-kabir
Author

AFAIK @k-allagbe imported guidance documents from the same crawl table into the blob storage?

@rngadam After testing, I noticed that the 2 questions with null scores (0%) are supposed to be in the bad_questions repository. This means that all the remaining questions returned a score. Moreover, the guidance documents in the blob storage were indeed imported from the same crawl table. This leads me to think that the questions are generated correctly, and that we only need more of them and a better classification (into the good or bad questions repository). So before trying to fix this issue, I propose generating more questions and testing them on Finesse to verify that all documents return a score.
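
The check proposed above, confirming that every question returns a score and flagging the ones that do not, could look something like the sketch below. The output format of the Finesse Benchmark tool is not shown in this thread, so the mapping of question identifiers to scores and the function name `split_by_score` are hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_score(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split question ids into (scored, unscored).

    Assumption: `scores` maps a question id to its benchmark score,
    and a score of 0 means no expected document was retrieved.
    """
    scored = [qid for qid, score in scores.items() if score > 0]
    unscored = [qid for qid, score in scores.items() if score <= 0]
    return scored, unscored
```

Questions that end up in the second list are the candidates to review for the bad questions repository.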

@ibrahim-kabir
Author

ibrahim-kabir commented Mar 5, 2024

AFAIK @k-allagbe imported guidance documents from the same crawl table into the blob storage?

@rngadam, @k-allagbe confirmed to me that they were imported from the crawl table. I will close this issue if you agree.
