diff --git a/docs/guides/README.md b/docs/guides/README.md
index 4753f7e22..7101a5d81 100644
--- a/docs/guides/README.md
+++ b/docs/guides/README.md
@@ -2,7 +2,8 @@
 
 This folder contains various task-specific curation guides.
 
-- [Curating new providers](/curation/providers)
+- [Curating new providers](curation/providers)
+- [Curating new publications and references](curation/publications)
 
 ## How to add new guides
 
@@ -11,7 +12,7 @@ This folder contains various task-specific curation guides.
    (see
    [here](https://github.com/biopragmatics/bioregistry/blob/fe2a685503ae2c9ff863908bf885c71fd240c21d/docs/guides/providers.md?plain=1#L1-L5)
    for an example)
-3. Add it to the list above
+3. Add it to the list above. Don't include a forward slash `/` at the beginning of the link!
 
 ## What makes a good guide
 
diff --git a/docs/guides/publications.md b/docs/guides/publications.md
new file mode 100644
index 000000000..6f3001b38
--- /dev/null
+++ b/docs/guides/publications.md
@@ -0,0 +1,76 @@
+---
+layout: page
+title: Curating Publications and References
+permalink: /curation/publications
+---
+
+The example below shows a subset of the record for
+[3D Metabolites (3dmet)](https://bioregistry.io/3dmet) that highlights the `publications` list.
+Note that each entry is a dictionary with several parts:
+
+1. `title` (required) - the title of the paper
+2. `year` (highly recommended) - the year of publication of the paper
+3. `pubmed`, `doi`, and `pmc` (one or more required) - identifiers for the paper
+
+```json
+"3dmet": {
+  "name": "3D Metabolites",
+  "publications": [
+    {
+      "doi": "10.2142/biophysico.15.0_87",
+      "pmc": "PMC5992871",
+      "pubmed": "29892514",
+      "title": "Chemical curation to improve data accuracy: recent development of the 3DMET database",
+      "year": 2018
+    },
+    {
+      "doi": "10.1021/ci300309k",
+      "pubmed": "23293959",
+      "title": "Three-dimensional structure database of natural metabolites (3DMET): a novel database of curated 3D structures",
+      "year": 2013
+    }
+  ]
+},
+```
+
+Similarly, there are URL references that are not _publications_ that are worth curating. These can be
+stored in the `references` list. For example, the
+[Registry of Toxic Effects of Chemical Substances (rtecs)](https://bioregistry.io/rtecs) entry appears in the
+Bioregistry because of its usage, but it is hard to find information on the internet about it. Therefore, the
+references list is perfect for storing references to PDFs and webpages that describe the resource.
+
+```json
+"rtecs": {
+  "name": "Registry of Toxic Effects of Chemical Substances",
+  "publications": [
+    {
+      "doi": "10.1016/s1074-9098%2899%2900058-1",
+      "title": "An overview of the Registry of Toxic Effects of Chemical Substances (RTECS): Critical information on chemical hazards",
+      "year": 1999
+    }
+  ],
+  "references": [
+    "https://www.cdc.gov/niosh/docs/97-119/pdfs/97-119.pdf",
+    "https://www.cdc.gov/niosh/npg/npgdrtec.html"
+  ]
+}
+```
+
+What else is good to keep track of in the references list:
+
+1. Bioregistry issues or pull requests about the resource
+2. Links to webpages describing the identifier resource
+3. Links to discussions on Slack or other platforms (keeping in mind links might not last forever)
+4. Any other context that's useful for a Bioregistry reader
+
+## Why Should I Curate Publications and References?
+
+1. They give additional context for Bioregistry readers who want to know more about the paper
+2. They make it easier to attribute usage of identifiers from a given resource to its authors
+3. They enable global landscape analysis of when and where identifier resources are being made. The following image is
+   automatically regenerated with each Bioregistry update:
+
+   ![](https://raw.githubusercontent.com/biopragmatics/bioregistry/refs/heads/main/docs/img/bibliography_years.svg)
+4. They support the training of a machine learning model for semi-automated curation of additional literature. See
+   this [talk](https://docs.google.com/presentation/d/1h2IajyGkUxUPHubEi8_WE6xW6TOuOihn5zsmi4kYrrc/edit?usp=sharing)
+   from the 2022 Workshop on Prefixes, CURIEs, and IRIs.
diff --git a/tests/test_data.py b/tests/test_data.py
index ebee8d48c..3496a674a 100644
--- a/tests/test_data.py
+++ b/tests/test_data.py
@@ -854,13 +854,18 @@ def test_request_issue(self):
 
     def test_publications(self):
         """Test references and publications are sorted right."""
+        msg_fmt = (
+            "Rather than writing a {} link in the `references` list, "
+            "you should encode it in the `publications` instead. "
+            "See https://biopragmatics.github.io/bioregistry/curation/publications for help."
+        )
         for prefix, resource in self.registry.items():
             with self.subTest(prefix=prefix):
                 if resource.references:
                     for reference in resource.references:
-                        self.assertNotIn("doi", reference)
-                        self.assertNotIn("pubmed", reference)
-                        self.assertNotIn("pmc", reference)
+                        self.assertNotIn("doi", reference, msg=msg_fmt.format("DOI"))
+                        self.assertNotIn("pubmed", reference, msg=msg_fmt.format("PubMed"))
+                        self.assertNotIn("pmc", reference, msg=msg_fmt.format("PMC"))
                         self.assertNotIn("arxiv", reference)
                 if resource.publications:
                     for publication in resource.publications:
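
For curators who want to check a record before opening a pull request, the rules in this change can be approximated outside the test suite. The sketch below is not part of the Bioregistry code base: `check_record`, `ID_KEYS`, and `PUBLICATION_HINTS` are hypothetical names, and the record is assumed to be a plain dictionary shaped like the JSON examples in the new guide. The `references` check mirrors the substring test shown in `test_publications`; the `publications` check follows the requirements stated in the guide (a title plus at least one identifier), since the corresponding part of the test is not shown in this diff.

```python
# Hypothetical helper, not part of Bioregistry: a stand-alone approximation
# of the curation rules described in the guide and test above.

ID_KEYS = ("pubmed", "doi", "pmc")
# Substrings that the updated test rejects inside `references` entries.
PUBLICATION_HINTS = ("doi", "pubmed", "pmc", "arxiv")


def check_record(record: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for i, publication in enumerate(record.get("publications", [])):
        # The guide marks `title` as required.
        if not publication.get("title"):
            problems.append(f"publication {i}: missing title")
        # The guide requires one or more of pubmed, doi, and pmc.
        if not any(publication.get(key) for key in ID_KEYS):
            problems.append(f"publication {i}: needs one of {', '.join(ID_KEYS)}")
    for reference in record.get("references", []):
        # Same case-sensitive substring check as assertNotIn in the test.
        for hint in PUBLICATION_HINTS:
            if hint in reference:
                problems.append(
                    f"reference {reference!r}: looks like a {hint} link, "
                    "encode it in `publications` instead"
                )
    return problems


rtecs = {
    "name": "Registry of Toxic Effects of Chemical Substances",
    "publications": [
        {
            "doi": "10.1016/s1074-9098%2899%2900058-1",
            "title": "An overview of the Registry of Toxic Effects of Chemical Substances (RTECS): Critical information on chemical hazards",
            "year": 1999,
        }
    ],
    "references": [
        "https://www.cdc.gov/niosh/docs/97-119/pdfs/97-119.pdf",
        "https://www.cdc.gov/niosh/npg/npgdrtec.html",
    ],
}

# A deliberately broken record: no title, no identifier, DOI link in references.
broken = {
    "publications": [{"year": 2018}],
    "references": ["https://doi.org/10.1021/ci300309k"],
}

print(check_record(rtecs))
for problem in check_record(broken):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a curator see every issue with a record in a single pass, which is the same motivation behind the more descriptive `msg` arguments added to the test.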