Update knowledge-submissions-past-wikipedia.md
Added new approved Knowledge submission data sources, updated status for several.

Updated documentation with the new process for requesting knowledge sources: open a PR against this dev doc.

Related to issue #59 which should be closed once this PR is reviewed and merged.

Co-Authored-by: JJ Asghar <[email protected]>
Signed-off-by: Leslie Hawthorn <[email protected]>
lhawthorn and jjasghar committed Jun 26, 2024
1 parent b4e8df2 commit 68b81b2
Showing 1 changed file with 31 additions and 17 deletions.
48 changes: 31 additions & 17 deletions docs/knowledge-submissions-past-wikipedia.md
@@ -24,27 +24,41 @@ Status:
- `denied`: Denied by the legal team, and posted on the [avoided list][avoided].
- `submitted`: Sent to the legal team for review
- `proposed`: The community would like to propose this as a possible place to take knowledge submissions from.
- `reviewed - manually verify`: The legal team has reviewed this domain, and while much of its source material meets our open licensing criteria, not all of it does. Each submission from this source must be manually verified to be under an appropriate content license or definitively in the public domain.

For the purposes of Knowledge submissions to the InstructLab project, data sourced from items in the `approved` category requires no further vetting from the Triage and/or other Maintainer teams. Items in the `reviewed - manually verify` category will require vetting before the submission can be accepted.
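
The status rules above can be sketched as a small lookup. This is only an illustration, not part of any InstructLab tooling: the `Status` enum and the `triage_action` helper are hypothetical names, and the actions returned for `denied`, `submitted`, and `proposed` are assumptions inferred from the status descriptions rather than stated policy.

```python
from enum import Enum


class Status(Enum):
    """Status values as listed in this document."""
    APPROVED = "approved"
    DENIED = "denied"
    SUBMITTED = "submitted"
    PROPOSED = "proposed"
    MANUALLY_VERIFY = "reviewed - manually verify"


def triage_action(status: Status) -> str:
    """Return what a triager might do for a source with the given status (hypothetical helper)."""
    if status is Status.APPROVED:
        # Stated in the document: no further vetting is required.
        return "accept without further vetting"
    if status is Status.MANUALLY_VERIFY:
        # Stated in the document: each submission must be vetted first.
        return "manually verify the license before accepting"
    if status is Status.DENIED:
        # Assumption: denied sources are on the avoided list, so they are not accepted.
        return "do not accept"
    # Assumption: `submitted` and `proposed` sources have no legal decision yet.
    return "wait for legal review"


print(triage_action(Status("reviewed - manually verify")))
```

Looking up a status by its table text, as in `Status("approved")`, keeps the sketch aligned with the values used in the Status column.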

To ensure that the data you would like to include in your knowledge submission meets the project licensing criteria, please make sure to talk to the Taxonomy maintainer team *before* you begin work on your submission. We would hate for you to do a great deal of work only to be told that the data source you selected would not work for the project. Please make sure you review the [Getting Started with Knowledge Submissions](https://github.com/instructlab/taxonomy?tab=readme-ov-file#getting-started-with-knowledge-contributions) documentation prior to submitting your request.

| Domain name | Status | Notes |
| :-- | :-- | :-- |
| <https://en.wikipedia.org/wiki/Main_Page> | approved | |
| Wikipedia: <https://en.wikipedia.org/wiki/Main_Page> | approved | |
| Project Gutenberg: <https://www.gutenberg.org/> | approved | Pre-1927 works; public domain under US copyright law |
| <https://www.congress.gov/> | proposed | |
| <https://www.whitehouse.gov/> | proposed | |
| <https://www.senate.gov/> | proposed | |
| <https://www.irs.gov/> | proposed | |
| NASA: <https://www.nasa.gov/> | proposed | See guidelines: <https://www.nasa.gov/nasa-brand-center/images-and-media/> |
| Smithsonian Libraries: <https://library.si.edu/> | proposed | For any material marked "No Copyright - United States" or "CC0" as described here: <https://library.si.edu/copyright> |
| European Union (EU): <https://european-union.europa.eu/> | proposed | Specifically documents submitted under "public registrars": <https://european-union.europa.eu/principles-countries-history/principles-and-values/access-information_en> |
| Internet Archive: <https://archive.org/> | proposed | Pre-1927 works; public domain under US copyright law |
| Wikisource (library): <https://en.wikisource.org/> | proposed | "free library that anyone can improve" |

### Next steps

1. We need to identify the right person on the legal team to serve as the point of contact for this project.
1. Collect suggested places from the community and add them to the above table
1. Work with our legal team to get approvals and denials.
1. Inform the triage team and triagers of the new locations we can or cannot accept.
| Wikisource (library): <https://en.wikisource.org/> | approved | "free library that anyone can improve" |
| OpenStax textbooks family of publications <https://openstax.org/subjects> | approved | |
| The Open Organization publications <https://theopenorganization.org/> | approved | |
| The Scrum Guide <https://scrumguides.org/index.html> | approved | |
| <https://www.congress.gov/> | reviewed - manually verify | |
| <https://www.whitehouse.gov/> | reviewed - manually verify | |
| <https://www.senate.gov/> | reviewed - manually verify | |
| <https://www.irs.gov/> | reviewed - manually verify | |
| NASA: <https://www.nasa.gov/> | reviewed - manually verify | See guidelines: <https://www.nasa.gov/nasa-brand-center/images-and-media/> |
| Smithsonian Libraries: <https://library.si.edu/> | reviewed - manually verify | For any material marked "No Copyright - United States" or "CC0" as described here: <https://library.si.edu/copyright> |
| European Union (EU): <https://european-union.europa.eu/> | reviewed - manually verify | Specifically documents submitted under "public registrars": <https://european-union.europa.eu/principles-countries-history/principles-and-values/access-information_en> |
| Internet Archive: <https://archive.org/> | reviewed - manually verify | Pre-1927 works; public domain under US copyright law |
| PLOS family of open access journals: <https://plos.org/publish/> | reviewed - manually verify | |
| Open Practice Library: <https://openpracticelibrary.com/> | reviewed - manually verify | |
| Cynefin.io wiki: <https://cynefin.io/wiki/Main_Page> | reviewed - manually verify | |
| The Open Education Project: <https://research.redhat.com/blog/research_project/foundations-in-open-source-education/> | reviewed - manually verify | |

### Process steps

1. Collect suggested places from the community by requesting they submit a pull request against this dev doc.
1. Work with our legal team to adjudicate. [@lhawthorn](https://github.com/lhawthorn) is currently the owner of this step, but is happy to educate & empower other folks to do this work.
1. Inform the triage team and triagers of the new locations we can or cannot accept. This is currently done via an announcement in the [daily Triager Standup](https://github.com/instructlab/community/blob/main/Collaboration.md#triager-standup) and via a pull request to update the Knowledge Guide in one of the two locations listed below.

- Approved sources: <https://github.com/instructlab/taxonomy/blob/main/docs/KNOWLEDGE_GUIDE.md#accepted-knowledge>
- Rejected sources: <https://github.com/instructlab/taxonomy/blob/main/docs/KNOWLEDGE_GUIDE.md#avoid-these-topics>

[approved]: https://github.com/instructlab/taxonomy/blob/main/docs/KNOWLEDGE_GUIDE.md#accepted-knowledge
[avoided]: https://github.com/instructlab/taxonomy/blob/main/docs/KNOWLEDGE_GUIDE.md#avoid-these-topics
