docs/taxonomy/knowledge/file_structure.md
Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ Key | Type | Required | Constraints | Value | Notes
`created_by` | string | Y | no spaces | Your GitHub username (for the upstream taxonomy) or your name with no spaces (for general InstructLab use) | -
`domain` | string | Y | - | Knowledge sub-category | The knowledge domain, which is used in prompts to the teacher model during synthetic data generation. The domain should be brief, such as the title of a textbook chapter or section.
`seed_examples` | array | Y | at least 5 sets | null | A collection of questions and answers, each set with context from the knowledge document, that InstructLab uses to generate synthetic data.
-`context` | string | Y | < 500 tokens | A chunk of the knowledge document showing off the different **unique** content to help guide the teacher model. If the knowledge documents have only text, all context would be text. If the knowledge documnets have tables or other content formats, ensure samples of those formats are all used. | This should be a copy-paste from the Markdown version of your document
+`context` | string | Y | < 500 tokens | A chunk of the knowledge document showing off the different **unique** content to help guide the teacher model. If the knowledge documents have only text, all context would be text. If the knowledge documents have tables or other content formats, ensure samples of those formats are all used. | This should be a copy-paste from the Markdown version of your document
`questions_and_answers` | array | Y | at least 3 pairs per context | null | A collection of questions and answers for the context.
`question` | string | Y | < 250 tokens | A question related to and grounded in the relevant context | Questions are what you would expect someone to ask the model based on the given context. They are used for synthetic data generation.
`answer` | string | Y | < 250 tokens | An answer to the question that is longer than a one-word or one-number answer | Answers are what you would like the model to respond with. The model will not always give this exact answer.
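As a rough illustration of the structural constraints in the table, the following Python sketch checks a knowledge `qna.yaml` that has already been parsed into a dictionary (for example with a YAML library). The function `validate_knowledge_qna`, the helper `approx_tokens`, and the use of a whitespace word count in place of real tokenizer counts are all hypothetical; this is not InstructLab's own validation logic.

```python
from typing import Any

# Thresholds taken from the table; "tokens" are approximated by a whitespace
# word count, which is only a rough stand-in for a real tokenizer (assumption).
MIN_SEED_EXAMPLES = 5
MIN_QA_PAIRS_PER_CONTEXT = 3
MAX_CONTEXT_TOKENS = 500


def approx_tokens(text: str) -> int:
    """Very rough token estimate: number of whitespace-separated words."""
    return len(text.split())


def validate_knowledge_qna(data: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed knowledge qna.yaml mapping."""
    errors: list[str] = []

    created_by = data.get("created_by")
    if not isinstance(created_by, str) or not created_by:
        errors.append("created_by: required string")
    elif " " in created_by:
        errors.append("created_by: must not contain spaces")

    domain = data.get("domain")
    if not isinstance(domain, str) or not domain:
        errors.append("domain: required string")

    seeds = data.get("seed_examples")
    if not isinstance(seeds, list):
        errors.append("seed_examples: required array")
        return errors
    if len(seeds) < MIN_SEED_EXAMPLES:
        errors.append(f"seed_examples: need at least {MIN_SEED_EXAMPLES}, got {len(seeds)}")

    for i, seed in enumerate(seeds):
        if not isinstance(seed, dict):
            errors.append(f"seed_examples[{i}]: must be a mapping")
            continue
        context = seed.get("context")
        if not isinstance(context, str) or not context:
            errors.append(f"seed_examples[{i}].context: required string")
        elif approx_tokens(context) >= MAX_CONTEXT_TOKENS:
            errors.append(f"seed_examples[{i}].context: should be under {MAX_CONTEXT_TOKENS} tokens")

        qas = seed.get("questions_and_answers")
        if not isinstance(qas, list) or len(qas) < MIN_QA_PAIRS_PER_CONTEXT:
            errors.append(
                f"seed_examples[{i}].questions_and_answers: need at least {MIN_QA_PAIRS_PER_CONTEXT} pairs"
            )
            continue
        for j, qa in enumerate(qas):
            for key in ("question", "answer"):
                value = qa.get(key) if isinstance(qa, dict) else None
                if not isinstance(value, str) or not value.strip():
                    errors.append(f"seed_examples[{i}].questions_and_answers[{j}].{key}: required string")
    return errors


# Usage: five identical seed examples, each with three Q&A pairs.
pair = {"question": "Which planet is fourth from the Sun?", "answer": "Mars is the fourth planet from the Sun."}
example = {
    "created_by": "jane-doe",
    "domain": "astronomy",
    "seed_examples": [
        {"context": "Mars is the fourth planet from the Sun.", "questions_and_answers": [pair] * 3}
    ] * 5,
}
print(validate_knowledge_qna(example))  # → []
```

The sketch only checks presence, types, and counts; the per-question and per-answer length limits are left out because a word count is too crude a proxy for the model's tokenizer.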