Added best practice guide for QNA #3

Open

jjasghar wants to merge 1 commit into main from jjasghar/qna_bestpractices

Member

jjasghar commented Sep 9, 2024

Best practices guide from the OCI and Lisa


          Added best practice guide for QNA

d7bfe8a

Best practices guide from the OCI and Lisa

Signed-off-by: JJ Asghar <[email protected]>

jjasghar force-pushed the jjasghar/qna_bestpractices branch from e4edff5 to d7bfe8a

September 9, 2024 21:04

Member

joesepi commented Sep 23, 2024

It would be good if this was formatted better. I think someone volunteered to format it, but I don't remember who. :)

kelbrown20 reviewed

docs/taxonomy/qna_yaml_best_practices.md

+              - Things to Avoid
+                  - Historically, LLMs have been bad at math.
+                  - Do not provide complex math calculations in Q&A seeds.

Contributor

kelbrown20 Nov 18, 2024

I'm wondering if this would look better as a single line, since both bullets cover the same subject.

kelbrown20 reviewed

docs/taxonomy/qna_yaml_best_practices.md

+              - Context
+                  - What if knowledge is based on documents not existing in the base model?
+                  - In the qna.yaml file, you can pass context within a chunk of information (text from the document that Q&A are based on). Adding context to the skill QnA file might generate better-quality data.

Contributor

kelbrown20 Nov 18, 2024 (edited)

For the formatting, I was thinking that instead of a second bullet point, we could just leave it as a paragraph.
For example:

- What if knowledge is based on documents not existing in the base model?

  In the qna.yaml file, you can pass context within a chunk of information (text from the document 
  that Q&A are based on). Adding context to the skill QnA file might generate better-quality data.

And follow that pattern for the others as well. WDYT?
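The suggestion above is about passing a chunk of source text as `context` in a skill qna.yaml. Below is a minimal Python sketch of what one such seed example might look like once written out. It assumes the InstructLab compositional-skill layout (`version: 2`, `task_description`, `created_by`, and `seed_examples` entries with `context`, `question`, `answer`); those field names should be checked against the taxonomy docs, and `render_qna` and its helpers are hypothetical names used only for illustration. It emits YAML by hand with literal block scalars, so it needs only the standard library.

```python
"""Sketch: write a skill seed example that carries a `context` chunk.

ASSUMPTION: the field names below follow the InstructLab compositional-skill
qna.yaml layout (version 2); verify them against the taxonomy docs.
All helper names are hypothetical.
"""


def literal_block(text: str, indent: int) -> str:
    """Return a YAML literal block scalar ('|') with each line indented."""
    pad = " " * indent
    lines = text.strip("\n").splitlines()
    return "|\n" + "\n".join(pad + line if line else "" for line in lines)


def render_seed(context: str, question: str, answer: str) -> str:
    """Render one `seed_examples` entry; block content sits deeper than its key."""
    return (
        f"  - context: {literal_block(context, 6)}\n"
        f"    question: {literal_block(question, 6)}\n"
        f"    answer: {literal_block(answer, 6)}\n"
    )


def render_qna(created_by: str, task_description: str,
               seeds: list[tuple[str, str, str]]) -> str:
    """Render a whole (hypothetical) skill qna.yaml document as text."""
    header = (
        "version: 2\n"
        f"task_description: {literal_block(task_description, 2)}\n"
        f"created_by: {created_by}\n"
        "seed_examples:\n"
    )
    return header + "".join(render_seed(*seed) for seed in seeds)


if __name__ == "__main__":
    print(render_qna(
        created_by="your-github-username",  # placeholder, not a real account
        task_description="Answer questions using only the supplied passage.",
        seeds=[(
            "Text copied from the source document the Q&A is based on.",
            "According to the passage, what is X?",
            "The passage says X is ...",
        )],
    ), end="")
```

Writing the YAML by hand keeps the sketch dependency-free; in practice a YAML library would be the safer way to serialize arbitrary text.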

kelbrown20 reviewed

docs/taxonomy/qna_yaml_best_practices.md

+                  - How to check the quality of the data in a large data set of the qna.yaml file?
+                  - You don’t have to check all of the synthetic data generated by the SDG process. After generating synthetic data internally, the IBM Research team samples it to check quality (there is no need to check every item, especially for an extensive set).
+              - Quality

Contributor

kelbrown20 Nov 18, 2024

I really like how this section is formatted!
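The sampling approach in the hunk above (spot-checking a subset instead of every generated pair) can be sketched in Python. The 5% fraction, the floor of 20 items, and the `sample_for_review` name are hypothetical choices for illustration, not the values the IBM Research team uses; the generated pairs are assumed to be already loaded as a list of dicts.

```python
import random


def sample_for_review(pairs: list[dict], fraction: float = 0.05,
                      minimum: int = 20, seed: int = 0) -> list[dict]:
    """Pick a reproducible random subset of generated Q&A pairs to hand-check.

    HYPOTHETICAL defaults: review about `fraction` of the set, but never fewer
    than `minimum` items (or the whole set, if it is smaller than that).
    """
    if not pairs:
        return []
    k = min(len(pairs), max(minimum, round(len(pairs) * fraction)))
    rng = random.Random(seed)  # fixed seed so every reviewer sees the same sample
    return rng.sample(pairs, k)


if __name__ == "__main__":
    # Stand-in data; real SDG output would be loaded from disk instead.
    generated = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(1000)]
    for pair in sample_for_review(generated)[:3]:
        print(pair["question"], "->", pair["answer"])
```

Sampling without replacement with a fixed seed keeps the review set stable across runs, which makes it easier for several people to discuss the same examples.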

kelbrown20 reviewed

docs/taxonomy/qna_yaml_best_practices.md

+                  - What if knowledge is based on documents not existing in the base model?
+                  - In the qna.yaml file, you can pass context within a chunk of information (text from the document that Q&A are based on). Adding context to the skill QnA file might generate better-quality data.
+              - Formatting & Front-End specific and may change

Contributor

kelbrown20 Nov 18, 2024 (edited)

Same thing here, something like:

- How to format data in the Q&A file especially how to format tables?

  Currently, only files in Markdown format are supported.
     - If the files are in any other format, they must be converted to Markdown format
     - For automatic converters, we recommend experimenting with other Markdown conversions like ‘markdown_strict’, ‘asciidoc’ and ‘gfm’
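For the conversion step, the names in the suggestion match pandoc output formats. Below is a minimal Python sketch, assuming pandoc is installed and on PATH, that builds and runs a pandoc command for the two Markdown flavours in that list (`gfm`, `markdown_strict`). `pandoc_command` and `convert` are hypothetical helpers, and pandoc is left to infer the input format from the file extension.

```python
import shutil
import subprocess
from pathlib import Path

# The Markdown flavours named in the suggestion; both are pandoc output formats.
MARKDOWN_TARGETS = ("gfm", "markdown_strict")


def pandoc_command(src: str, target: str = "gfm") -> list[str]:
    """Build (but do not run) a pandoc command that writes `src` as Markdown."""
    if target not in MARKDOWN_TARGETS:
        raise ValueError(f"unsupported target {target!r}; use one of {MARKDOWN_TARGETS}")
    if Path(src).suffix == ".md":
        raise ValueError("source is already Markdown; refusing to overwrite it")
    dst = Path(src).with_suffix(".md")  # report.docx -> report.md
    return ["pandoc", src, "--to", target, "--output", str(dst)]


def convert(src: str, target: str = "gfm") -> Path:
    """Run pandoc (assumed installed) and return the path of the Markdown file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    cmd = pandoc_command(src, target)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if pandoc fails
    return Path(cmd[-1])
```

Converting the same document with both targets and diffing the results is one way to do the experimenting the suggestion describes, especially for tables.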

kelbrown20 reviewed

docs/taxonomy/qna_yaml_best_practices.md

+                  - The number of seed examples
+                  - How many seeds should I provide?
+                  - The number of seeds:
+                      - Generating ~300 QnA pairs from ~5 seed examples is recommended by the InstructLab product team.

Contributor

kelbrown20 Nov 18, 2024

I think QnA, QNA, and qna should all be switched to Q&A to be more consistent. Right now I'm torn between Q&A and QnA, but I do think we should be consistent. What do folks think?
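The seed guidance in the hunk above works out to roughly 300 / 5 = 60 generated pairs per seed example. Below is a small Python sketch that computes that ratio and flags a qna.yaml with fewer seeds than recommended. The two constants come from the hunk; the function names, and the assumption that the file has already been parsed into a dict with a `seed_examples` list, are hypothetical.

```python
# Figures from the guide: "~300 QnA pairs from ~5 seed examples".
RECOMMENDED_SEEDS = 5
TARGET_GENERATED_PAIRS = 300


def pairs_per_seed(num_seeds: int, target: int = TARGET_GENERATED_PAIRS) -> float:
    """Rough number of synthetic pairs each seed example has to stand in for."""
    if num_seeds <= 0:
        raise ValueError("need at least one seed example")
    return target / num_seeds


def check_seed_count(qna: dict) -> list[str]:
    """Return warnings for a parsed qna.yaml (assumed to be a plain dict)."""
    seeds = qna.get("seed_examples") or []
    warnings = []
    if len(seeds) < RECOMMENDED_SEEDS:
        warnings.append(
            f"{len(seeds)} seed example(s); the guide recommends about {RECOMMENDED_SEEDS}"
        )
    return warnings


if __name__ == "__main__":
    print(pairs_per_seed(RECOMMENDED_SEEDS))  # 60.0
    print(check_seed_count({"seed_examples": [{}, {}, {}]}))
```

The ratio is only a rough way to see how much each seed has to carry: with fewer seeds, each one stands in for more of the generated set.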

kelbrown20 reviewed

Contributor

kelbrown20 left a comment

@jjasghar I had some ideas for the possible formats, but I'd def like to know your thoughts as well!
