First input file is the only one processed #61
A check of the metadata field in the qa_tuples_filtered folder shows that only the first file led to question/answer pairs: grep -r metadata qatuples_filtered
Hmm that's strange. I'm not able to reproduce this using the toy example the repo starts with, so that leaves a few possibilities:
Would you be willing to share your input files, and maybe your config, so I can try to reproduce it on my end, or is that material confidential?
Here is the config file:
FINAL_ASSISTANT_PROMPT_RAG: 'You are a helpful AI assistant.
MODE: api
Unfortunately, I cannot share our input files. I found the config file that was used to process army training manuals. I modified that config to use our input files and model, and it appears that most, if not all, input files are now being processed.
Our input folder contains 11 files. All appear to be read in:
Successfully read file: ./input/innovationqkb.glossary.WordPress.2024-07-27.xml.md
JSON file saved successfully.
Successfully read file: ./input/innovationqkb.WordPress.2024-07-26.xml.md
JSON file saved successfully.
Successfully read file: ./input/ipcomkb.WordPress.2024-07-26.xml.md
JSON file saved successfully.
Successfully read file: ./input/iqideaskb.WordPress.2024-07-26.xml.md
JSON file saved successfully.
Successfully read file: ./input/ipcomkb.faq.WordPress.2024-07-27.xml.md
JSON file saved successfully.
Successfully read file: ./input/priorartdatabasekb.WordPress.2024-07-26.xml.md
JSON file saved successfully.
Successfully read file: ./input/iqideaskb.faq.WordPress.2024-07-27.xml.md
JSON file saved successfully.
Successfully read file: ./input/innovationqkb.faq.WordPress.2024-07-27.xml.md
JSON file saved successfully.
Successfully read file: ./input/priorartdatabasekb.glossary.WordPress.2024-07-27.xml.md
JSON file saved successfully.
Successfully read file: ./input/iqideaskb.glossary.WordPress.2024-07-27.xml.md
JSON file saved successfully.
Successfully read file: ./input/priorartdatabasekb.faq.WordPress.2024-07-27.xml.md
Pretraining set created.
However, question/answer pairs are produced only for the first file, ./input/innovationqkb.glossary.WordPress.2024-07-27.xml.md.
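A rough way to check this more systematically than the grep above is to count, for every input file, how many output files mention its name. This is only a sketch: the helper name files_mentioned_in_output is hypothetical, and it assumes the output files are text that contains the source file name somewhere (for example in a metadata field), without assuming any particular JSON layout.

```python
from pathlib import Path


def files_mentioned_in_output(input_dir, output_dir):
    """Map each input file name to the number of output files that mention it.

    Hypothetical helper: it does a plain substring search, so it does not
    depend on how the output files are structured internally.
    """
    names = sorted(p.name for p in Path(input_dir).iterdir() if p.is_file())
    counts = {name: 0 for name in names}

    # Walk the output folder recursively and search every regular file.
    for out_path in Path(output_dir).rglob("*"):
        if not out_path.is_file():
            continue
        text = out_path.read_text(encoding="utf-8", errors="ignore")
        for name in names:
            if name in text:
                counts[name] += 1
    return counts
```

Calling it with the input folder and the filtered output folder (paths depend on the local setup) and printing the result gives one count per input file; a count of 0 means no output file mentions that input. Because the match is a substring search, an input name that is contained in another input name would be counted for both.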
The augmentoolkit output messages do not appear to indicate whether there is an issue:
COMPLETED PHASE 0
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/a0db9260-500e>
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/a7bc5e7d-950c>
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/bd42e735-6bba>
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/c6ddcde3-8678>
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/15a96815-222f>
FAILED TO GENERATE QUESTIONS!
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/91488b07-8c85>
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/f8913bd2-afb5>
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/8ef1f903-2906>
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/ee46ce5b-8461>
FAILED TO GENERATE QUESTIONS!
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/86b05937-c2a0>
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/1b6b185c-b3fd>
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/0adb9229-3210>
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/d51cf6a9-1745>
FAILED TO GENERATE QUESTIONS!
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/921636fe-4076>
FAILED TO GENERATE QUESTIONS!
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/456d0805-b8c7>
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/f09a3fd9-3fcf>
FAILED TO GENERATE QUESTIONS!
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/bf45b67b-81da>
FAILED TO GENERATE QUESTIONS!
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/01b56723-abdf>
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/b89ac365-1286>
FAILED TO GENERATE QUESTIONS!
Output written to /tmp/augmentoolkit/original/output/question_generation_generations/question_generation_generations/1f0d3583-f9e7>
COMPLETED PHASE 1
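To make the failures in a log like the one above easier to spot, the two message types can be tallied. This is a minimal sketch that relies only on the two strings visible in the log ("Output written to" and "FAILED TO GENERATE QUESTIONS!"); the function name tally_generation_log is hypothetical, and how each message maps to a paragraph or input file is not something the log itself states.

```python
from collections import Counter

# Marker strings copied from the log output shown in this thread.
FAILED_MARKER = "FAILED TO GENERATE QUESTIONS!"
WRITTEN_MARKER = "Output written to"


def tally_generation_log(log_text):
    """Count 'written' and 'failed' messages in a captured log."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(WRITTEN_MARKER):
            counts["written"] += 1
        elif line == FAILED_MARKER:
            counts["failed"] += 1
    return counts
```

Applied to the excerpt between COMPLETED PHASE 0 and COMPLETED PHASE 1 above, this would report 20 written outputs and 7 failures.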
Each file written in phase 1 appears to correspond to questions/answers related to a paragraph in the document: ./input/innovationqkb.glossary.WordPress.2024-07-27.xml.md
What are some possible reasons that files in the input folder are skipped?