minor grammar changes
colinosullivan-ie committed Dec 17, 2024
1 parent 4897c0c commit e2e2872
Showing 1 changed file with 4 additions and 4 deletions.
docs/taxonomy/sdg_guidelines.md (8 changes: 4 additions & 4 deletions)
@@ -7,13 +7,13 @@ The OIC initiative promoted discussion spaces for sharing and recording best pra
| Things to Avoid | Historically, LLMs have been weak at math. | Do not include complex math calculations in Q&A seed examples. |
| Context | What if the knowledge comes from documents that the base model has not seen? | In the qna.yaml file, you can supply context as a chunk of information (text from the document that the Q&A pairs are based on). Adding context to a skill qna.yaml file may produce higher-quality data. |
| Formatting (front-end specific; may change) | How should data in the Q&A file be formatted, especially tables? | Currently, only Markdown files are supported, so documents in any other format must be converted to Markdown. When using an automatic converter, we recommend experimenting with different conversion targets, such as ‘markdown_strict’, ‘asciidoc’, and ‘gfm’. |
-| Intervene in Training | Can I use generated json files to prompt-tuning (watsonx.ai) or using HuggingFace directly? | The output of SDG is in json format and can also be used for traditional fine-tuning. |
+| Intervene in Training | Can I use the generated JSON files for prompt-tuning (watsonx.ai) or with Hugging Face directly? | The output of SDG is in JSON format and can also be used for traditional fine-tuning. |
| Quantities | How many seeds should I provide? | Seed and Q&A counts: <ol><li>The InstructLab product team recommends generating ~300 Q&A pairs from ~5 seed examples.</li><li>Knowledge requires 5 pieces of context from the document, each with 3 Q&A pairs specific to that context, for a total of 15 Q&A pairs.</li><li>We tried generating fewer than 300 Q&A pairs but found the quality only satisfactory.</li></ol> |
-| Task description | | The task description should be grounded in the domain/document. - Due to this recommendation we should keep in mind that much complex cases can be splitted into smaller chunks of information |
+| Task description | | The task description should be grounded in the domain/document. With this in mind, complex cases can be split into smaller chunks of information. |
| Context size limitation | What is the size limit for the context in the Q&A file (qna.yaml)? | There is a context size limit of ~2300 in the qna.yaml file. Keep ground-truth answers concise to stay within this limit. |
| After Training | How can I check the quality of a large synthetic data set generated from the qna.yaml file? | You don’t have to review all of the synthetic data generated by the SDG process. After generating synthetic data internally, the IBM Research team checks quality by sampling; there is no need to check every example, especially in a large set. |
-| Quality | How to measure quality of obtained data? | To evaluate SDG, you can use following rating range (1-5): <ol><li>Irrelevant Answer</li><li>Relevant but not close to ground truth, model might be hallucinating.</li><li>Relevant, model not hallucinating, partly matching the ground truth.</li><li>Relevant, model not hallucinating, model is adding irrelevant/unnecessary information</li><li>Excellent Answer, Matches closely with Ground Truth</li></ol> |
-| Manual validation | During the manual validation, it understood the entity and intent of the question and searched for the same entity and intent in the corresponding document. The document information was provided in the generated JSON file. | At the next step, manual search validated it the steps or definitions contained in the answer were indeed in the corresponding document. |
+| Quality | How can I measure the quality of the generated data? | To evaluate SDG output, you can use the following rating scale (1-5): <ol><li>Irrelevant answer.</li><li>Relevant but not close to the ground truth; the model might be hallucinating.</li><li>Relevant, the model is not hallucinating, and the answer partly matches the ground truth.</li><li>Relevant, the model is not hallucinating, but it adds irrelevant or unnecessary information.</li><li>Excellent answer that closely matches the ground truth.</li></ol> |
+| Manual validation | During manual validation, the reviewer identified the entity and intent of each question and searched for the same entity and intent in the corresponding document. The document information is provided in the generated JSON file. | Next, the reviewer verified that the steps or definitions in the answer actually appear in the corresponding document. |
| Quality of data generated | How can I improve the quality of data generated by SDG? | <ul><li>Task description: Add a task description relevant to the knowledge documents. We added a custom task description to improve SDG output.</li><li>Prompt template: Add guidelines so that instructions and outputs stick to document-related keywords and so that instructions are generated from tables. We added these guidelines to the prompt template specifically.</li><li>Chunk word count: Increase the chunk word count so that SDG takes larger chunks from the documents, which helps with Q&A pairs that have long answers.</li><li>ROUGE threshold: To enforce data quality more strictly, you can increase the ROUGE threshold in the ilab generate command.</li><li>Questions and answers should be complete, well-formed sentences with proper grammar. Longer answers are better than a short yes or no.</li><li>Each question must be answerable from its associated context.</li></ul> |
| Formatting | How many leaf nodes are kept in the taxonomy after adding a Q&A file? | The documents are kept in a single leaf node, which contains one qna.yaml file and one attribution.txt file. |
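
The Quantities and Context size limitation rows give concrete numbers: 5 pieces of context, 3 Q&A pairs each, and a limit of ~2300. You can check these before running SDG. The sketch below is a hypothetical pre-flight check and is not part of InstructLab. It assumes the knowledge qna.yaml has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`), with a `seed_examples` list whose items hold a `context` string and a `questions_and_answers` list of `question`/`answer` entries. The unit of the ~2300 limit is not stated here, so the sketch uses whitespace-separated words as a rough proxy.

```python
from typing import Any

# Numbers from the guidelines table. Measuring context size in words is an
# assumption made for this sketch; the table does not state the unit.
EXPECTED_CONTEXTS = 5
QNA_PER_CONTEXT = 3
MAX_CONTEXT_SIZE = 2300


def check_knowledge_qna(qna: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed knowledge qna.yaml.

    Assumed (hypothetical) layout: qna["seed_examples"] is a list of dicts,
    each with a "context" string and a "questions_and_answers" list of
    {"question": ..., "answer": ...} dicts.
    """
    problems: list[str] = []
    seeds = qna.get("seed_examples", [])
    if len(seeds) != EXPECTED_CONTEXTS:
        problems.append(f"expected {EXPECTED_CONTEXTS} contexts, found {len(seeds)}")
    for i, seed in enumerate(seeds, start=1):
        size = len(seed.get("context", "").split())  # word count as a proxy
        if size > MAX_CONTEXT_SIZE:
            problems.append(f"context {i} is {size} words (limit ~{MAX_CONTEXT_SIZE})")
        pairs = seed.get("questions_and_answers", [])
        if len(pairs) != QNA_PER_CONTEXT:
            problems.append(
                f"context {i} has {len(pairs)} Q&A pairs, expected {QNA_PER_CONTEXT}"
            )
        for j, pair in enumerate(pairs, start=1):
            if not pair.get("question", "").strip() or not pair.get("answer", "").strip():
                problems.append(f"context {i}, pair {j} has an empty question or answer")
    return problems


if __name__ == "__main__":
    pair = {"question": "What is SDG?", "answer": "Synthetic data generation."}
    sample = {
        "seed_examples": [
            {"context": "SDG turns seed examples into training data.",
             "questions_and_answers": [pair] * 3}
            for _ in range(5)
        ]
    }
    print(check_knowledge_qna(sample))  # → []
```

The check returns a list of messages instead of raising on the first failure, so one run reports every problem in the file.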

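You can apply the 1-5 rating scale from the Quality row to a sample of generated pairs and then summarize the results. This is a minimal sketch under our own assumptions. The `Rating` member names and the `summarize` helper are hypothetical. Treating levels 3 and above as "not hallucinating" follows the wording of levels 3 and 4 in the table.

```python
from enum import IntEnum
from statistics import mean


class Rating(IntEnum):
    """The 1-5 scale from the Quality row (member names are illustrative)."""
    IRRELEVANT = 1
    RELEVANT_NOT_CLOSE = 2   # model might be hallucinating
    PARTIAL_MATCH = 3        # not hallucinating, partly matches ground truth
    EXTRA_INFORMATION = 4    # not hallucinating, adds unnecessary information
    EXCELLENT = 5            # closely matches ground truth


def summarize(ratings: list[int]) -> dict[str, float]:
    """Summarize manual 1-5 ratings given to a sample of generated Q&A pairs."""
    if not ratings:
        raise ValueError("no ratings to summarize")
    # Rating(r) raises ValueError for any value outside 1-5.
    levels = [Rating(r) for r in ratings]
    n = len(levels)
    return {
        "mean": mean(int(level) for level in levels),
        # Levels 3-5 are the ones described as "not hallucinating".
        "share_not_hallucinating": sum(level >= Rating.PARTIAL_MATCH for level in levels) / n,
        "share_excellent": sum(level == Rating.EXCELLENT for level in levels) / n,
    }


if __name__ == "__main__":
    print(summarize([5, 4, 3, 2, 5]))
    # → {'mean': 3.8, 'share_not_hallucinating': 0.8, 'share_excellent': 0.4}
```

Validating each score through the enum means a typo in a ratings sheet, such as a 6, fails loudly instead of skewing the averages.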

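The Chunk word count bullet refers to splitting source documents into chunks bounded by a word count before SDG. The sketch below is an illustration only. `chunk_by_words` is a hypothetical helper, not the chunking routine InstructLab uses. It groups Markdown paragraphs (separated by blank lines) into chunks of at most `max_words` words and cuts an oversized paragraph on word boundaries.

```python
def chunk_by_words(text: str, max_words: int) -> list[str]:
    """Split text into chunks of at most max_words words.

    Paragraphs (separated by blank lines) stay together when they fit in the
    current chunk. A paragraph longer than max_words is cut on word
    boundaries, and its internal line breaks are not preserved.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    chunks: list[str] = []
    current: list[str] = []  # paragraphs in the chunk being built
    current_words = 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        if len(words) > max_words:
            # Flush what we have, then emit the oversized paragraph in pieces.
            if current:
                chunks.append("\n\n".join(current))
                current, current_words = [], 0
            for start in range(0, len(words), max_words):
                chunks.append(" ".join(words[start:start + max_words]))
            continue
        if current_words + len(words) > max_words:
            # Adding this paragraph would overflow: start a new chunk.
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(paragraph.strip())
        current_words += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    doc = "one two three\n\nfour five\n\nsix seven eight nine ten"
    print(chunk_by_words(doc, 5))
    # → ['one two three\n\nfour five', 'six seven eight nine ten']
```

Raising `max_words` produces fewer, larger chunks. That is the effect the guideline describes for Q&A pairs with long answers.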