Added best practice guide for QNA #3
base: main
Best practices guide from the OCI and Lisa Signed-off-by: JJ Asghar <[email protected]>
e4edff5 to d7bfe8a
It would be good if this was formatted better. I think someone volunteered to format it better, but I don't remember who. :)
- Things to Avoid
- Historically, LLM is bad in math
- Do not provide complex math calculation in Q&A seeds.
I'm wondering if this would look better as a single line, because both bullets reference the same subject.
- Context
- What if knowledge is based on documents not existing in the base model?
- In the qna.yaml file, you can pass context within a chunk of information (text from the document that Q&A are based on). Adding context to the skill QnA file might generate better-quality data.
For the formatting, I was thinking that instead of a second bullet point, we could just leave the answer as a paragraph.
For example:
- What if knowledge is based on documents not existing in the base model?
In the qna.yaml file, you can pass context within a chunk of information (text from the document
that Q&A are based on). Adding context to the skill QnA file might generate better-quality data.
And follow that pattern for the others as well. WDYT?
- How to check the quality of the data in a large data set of the qna.yaml file?
- You don’t have to check out synthetic data generated by the SDG process. After generating synthetic data internally, the IBM Research team is sampling to check quality (no need to check them all, especially for extensive set).
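The sampling idea in the answer above could be sketched roughly as follows. This is only an illustration, not the process the IBM Research team uses: the function name `sample_for_review`, the sample size of 20, and the list-of-dicts shape of the generated data are all assumptions made for the example.

```python
import random


def sample_for_review(pairs, sample_size=20, seed=0):
    """Return a reproducible random subset of Q&A pairs for manual review."""
    # A fixed seed means two reviewers draw the same subset (assumed requirement).
    rng = random.Random(seed)
    if sample_size >= len(pairs):
        # Small data sets are simply reviewed in full.
        return list(pairs)
    return rng.sample(pairs, sample_size)


# Hypothetical stand-in for synthetic data; real SDG output will look different.
generated = [{"question": f"Q{i}", "answer": f"A{i}"} for i in range(300)]

review_batch = sample_for_review(generated, sample_size=20)
print(len(review_batch))
```

Reviewing a fixed-size random subset keeps the manual effort constant no matter how large the generated set grows.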
- Quality
I really like how this section is formatted!
- What if knowledge is based on documents not existing in the base model?
- In the qna.yaml file, you can pass context within a chunk of information (text from the document that Q&A are based on). Adding context to the skill QnA file might generate better-quality data.
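To make the idea of passing context concrete, here is a minimal sketch of a seed example that carries the passage it is based on. The key names `question`, `answer`, and `context` are assumptions about the qna.yaml layout and should be checked against the current taxonomy schema; the helper `make_seed_example` and the passage text are invented for illustration, and writing the result out as YAML is left aside.

```python
def make_seed_example(question, answer, context=None):
    """Build one seed example; attach the source passage only when one is given."""
    seed = {"question": question, "answer": answer}
    if context:
        # Keep the chunk verbatim apart from surrounding whitespace.
        seed["context"] = context.strip()
    return seed


# Invented placeholder passage, standing in for text copied from the source document.
document_chunk = """
The Example Widget ships with a two-year warranty.
"""

seed = make_seed_example(
    question="How long is the Example Widget warranty?",
    answer="Two years.",
    context=document_chunk,
)
print(seed)
```

Keeping the context next to the question and answer lets a reviewer confirm that the answer is actually supported by the quoted passage.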
- Formatting & Front-End specific and may change
Same thing here, something like:
- How to format data in the Q&A file especially how to format tables?
Currently, only files in Markdown format are supported.
- If the files are in any other format, they must be converted to Markdown format
- For automatic converters, we recommend experimenting with other Markdown conversions like ‘markdown_strict’, ‘asciidoc’ and ‘gfm’
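The format names quoted above match output format names accepted by pandoc, although the guide does not say which converter is meant, so treating pandoc as the converter is an assumption here. Under that assumption, a small sketch for building one conversion command per candidate format might look like this; `build_pandoc_command` and the file name `manual.docx` are hypothetical.

```python
from pathlib import Path

# Format names taken from the guide; using them with pandoc is an assumption.
CANDIDATE_FORMATS = ("gfm", "markdown_strict", "asciidoc")


def build_pandoc_command(source, target_format):
    """Return the argument list for converting `source` with pandoc."""
    if target_format not in CANDIDATE_FORMATS:
        raise ValueError(f"unexpected format: {target_format}")
    src = Path(source)
    # AsciiDoc output is not Markdown, so give it its own file extension.
    suffix = ".adoc" if target_format == "asciidoc" else ".md"
    out = src.with_name(f"{src.stem}.{target_format}{suffix}")
    return ["pandoc", str(src), "-t", target_format, "-o", str(out)]


for fmt in CANDIDATE_FORMATS:
    # Printing instead of running keeps the sketch usable without pandoc installed.
    print(" ".join(build_pandoc_command("manual.docx", fmt)))
```

Each printed command can be run by hand, or passed to `subprocess.run` as a list, so the converted files can be compared side by side, tables in particular.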
- The number of seed examples
- How many seeds I should provide?
- The number of seeds:
- Generating ~300 QnA pairs from ~5 seed examples is recommended by InstructLab product team.
I think QnA, QNA and qna should all be switched to Q&A to be more consistent. Right now I'm torn between using Q&A or QnA, but I do think we should be consistent. What do folks think?
@jjasghar I had some ideas for the possible formats, but I'd def like to know your thoughts as well!