DCN July 2020 Hackathon updates to Jupyter Notebook Primer #19

kekoziar · 2020-07-17T16:11:00Z

Separate commits detail proposed changes.
To summarize the proposed changes:

- Expanded and clarified sections that refer to computer science terms:
  - Clarified kernels and how cells run
  - Clarified and added examples of dependencies and citation files
  - Expanded first key curatorial question
- Added to and clarified examples of notebooks which are archived in repositories
- Minor corrections: version number, how a resource was referenced, broken links, renumbered endnotes due to additions/minor changes, added title/alt text to images

PR made on behalf of our team:
@kozlowwe
@gjanee
@srerickson
@cincyamyK
@kekoziar
@gdntmoon


dbouquin · 2020-08-12T15:18:12Z

Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md

@@ -100,10 +100,10 @@ The following elements outline recommendations for repositories accepting Jupyte
 - Additional files to request:
  - PDF of the Jupyter Notebook (export from Jupyter web application or [nbviewer](https://nbviewer.jupyter.org/))
  - reST export of the Jupyter Notebook (export from Jupyter web application)
-  - CodeMeta.json
-  - CITATION.cff
+  - CodeMeta.json, requirements.txt, or environment.yml (dependencies)


I would recommend at least listing CodeMeta.json as the preferred option, since it lets depositors define more extensive structured metadata using a controlled vocabulary.
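To make the "structured metadata" point concrete, here is a minimal sketch of what generating a CodeMeta record for a notebook might look like. The helper `build_codemeta`, the title, the author name, and the requirement strings are all hypothetical placeholders, and only a handful of CodeMeta 2.0 terms are shown; a real record would likely carry more fields.

```python
import json


def build_codemeta(name, description, author_given, author_family, requirements):
    """Return a minimal CodeMeta 2.0 record as a dict (illustrative subset of terms)."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "author": [
            {
                "@type": "Person",
                "givenName": author_given,
                "familyName": author_family,
            }
        ],
        "programmingLanguage": "Python",
        # Dependencies expressed as plain strings; values here are placeholders.
        "softwareRequirements": list(requirements),
    }


# Hypothetical values, for illustration only.
record = build_codemeta(
    name="Example analysis notebook",
    description="Notebook that reproduces the figures in an example study.",
    author_given="Ada",
    author_family="Example",
    requirements=["numpy", "pandas"],
)

codemeta_text = json.dumps(record, indent=2)
print(codemeta_text)  # this text would be saved as codemeta.json
```

Unlike requirements.txt or environment.yml, which only list packages, a record like this can describe authorship and purpose alongside the dependencies.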

dbouquin · 2020-08-12T15:19:06Z

Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md

-  - CodeMeta.json
-  - CITATION.cff
+  - CodeMeta.json, requirements.txt, or environment.yml (dependencies)
+  - CITATION.cff (a software citation file appropriate if not depositing in a repository)


Adding a citation file is always appropriate; many repositories do not have the fields necessary to automatically generate a proper software citation.
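For curators who have not seen one, the following is a rough sketch of producing a small CITATION.cff file. The function `build_citation_cff` and every value passed to it are hypothetical; the keys shown are a minimal subset of the Citation File Format, and the sketch assumes the values contain no double quotes that would need escaping.

```python
def build_citation_cff(title, authors, version, date_released):
    """Return CITATION.cff text; authors is a list of (family, given) name pairs."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this notebook, please cite it as below."',
        f'title: "{title}"',
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    lines.append(f'version: "{version}"')
    lines.append(f"date-released: {date_released}")  # expected as YYYY-MM-DD
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only.
cff_text = build_citation_cff(
    title="Example analysis notebook",
    authors=[("Example", "Ada")],
    version="1.0.0",
    date_released="2020-07-17",
)
print(cff_text)  # this text would be saved as CITATION.cff next to the notebook
```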

dbouquin · 2020-08-12T15:23:03Z

Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md

-    - Documents what the Jupyter Notebook is for
-    - Request that this file include citation(s) to third-party algorithms and analyses
-    - Recommend code comments within the Notebook file itself in addition to the README file
+    - Documents what the Jupyter Notebook is for (but recommendation is that the Notebook utilize code comments)


Code comments should not be seen as a replacement or alternative to providing a README file. The code comments are used to describe what specific sets of cells do, but the notebook itself can have a much broader description and context.

dbouquin · 2020-08-12T15:23:57Z

Jupyter Notebook Data Curation Primer/Jupyter Notebooks Data Curation Primer.md

  - CITATION.cff for the Notebook
    - Preferred citation; should enable native software citation
+    - Relevant if the Notebook is not being submitted to a repository


Always relevant

kekoziar · 2020-09-09T22:55:16Z

@dbouquin IIRC, we're not saying that dependencies or citation information shouldn't be listed; the concern was about recommending very specific file types (CITATION.cff and CodeMeta.json) without adequate explanation of them or help creating them.

I think it would be helpful to new curators who aren't familiar with Python notebooks and these files to include a link to an example dataset that includes these files. Can you link one?

dbouquin · 2020-09-16T15:16:20Z

Do you think something like this would work? Not sure what you mean by dataset here. https://doi.org/10.5281/zenodo.3953146 (This is code that generates CodeMeta files for R packages; there's a codemeta.json file included.)
Here's another random example from Zenodo: https://doi.org/10.5281/zenodo.2610844

kekoziar · 2020-09-16T15:52:52Z

While dataset may be used broadly, I mean dataset specific to this primer. That would be a Python notebook that is an example of the recommended curation level.

dbouquin · 2020-09-16T18:05:25Z

Got it. I know these are "data curation primers"; it's just that I would never refer to a Jupyter notebook as data, so it confused me for a sec. After a very quick search, how about this: https://doi.org/10.5281/zenodo.3569768 And here's a random example on GitHub: https://github.com/donomii/throff-jupyter There's also a nice example in this project from Jupyter: https://github.com/jupyter/nbgrader


kekoziar added 9 commits July 16, 2020 14:10

Update Jupyter Notebook version number

6d59309

Update Jupyter Notebook version number in the format overview table

Minor correction to background section

81738de

The guidance is provided by the Software Sustainability Institute (1), and funded by Jisc (2).

Expand/clarify Format Description section

3ada1c3

Clarified for curators unfamiliar with computer science terminology the relation between a kernel and programming language. Elaborated on the cell order and expectations of users (those who download a notebook)

Expand/annotate Minimally Required Files section

7d8c207

Expand dependencies section to include other types of dependency files. Annotate CITATION.cff. Clarify that a container metafile is appropriate to request if used.

Fix style in File Requirements section

476fcf8

Update metadata requirements section

305fcc8

Added annotations and clarifications.

Key Curatorial Questions

a358973

Add clarifying question to help curators unfamiliar with code. Add examples of ipynb files archived in data repositories. Add/renumber associated endnotes.

Fix broken link in recommended reading

555466d

Add alt text for images

2ffe26c

Add title and alt text for decision tree images.
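
Commit 3ada1c3 above deals with how a kernel relates to a programming language and why cell order matters to someone who downloads a notebook. As a rough illustration (not taken from the primer itself), the sketch below inspects a parsed .ipynb file, which is JSON, and reports the kernel recorded in its metadata and whether the code cells were last executed from top to bottom. The function name `summarize_notebook` and the in-memory example notebook are hypothetical, and the sketch assumes the nbformat 4 layout; the `language` key may be absent from some notebooks' `kernelspec`.

```python
import json


def summarize_notebook(nb):
    """Summarize the kernel and execution order of a parsed .ipynb (nbformat 4) dict."""
    kernelspec = nb.get("metadata", {}).get("kernelspec", {})
    # execution_count is None for code cells that were never run.
    counts = [
        cell.get("execution_count")
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    executed = [c for c in counts if c is not None]
    return {
        "kernel_name": kernelspec.get("name"),
        "language": kernelspec.get("language"),
        "unexecuted_code_cells": counts.count(None),
        # True only if the recorded counts increase down the page.
        "ran_top_to_bottom": executed == sorted(executed),
    }


# Tiny hypothetical notebook whose two code cells were run in reverse order.
example = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3",
            "language": "python",
        }
    },
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "execution_count": 2, "metadata": {},
         "outputs": [], "source": ["x = 1"]},
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "outputs": [], "source": ["print(x)"]},
    ],
}

summary = summarize_notebook(example)
print(json.dumps(summary, indent=2))
```

For a deposited file, the dict could be obtained with `json.load` on the opened .ipynb. A `ran_top_to_bottom` value of `False` suggests the saved outputs may not match what a user would get by running the cells in order.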

dbouquin reviewed Aug 12, 2020
