Add JOSS paper. #43

vnmabus · 2024-08-31T22:17:05Z

Describe the proposed changes

This adds a paper to the repo, so that we can submit it to JOSS.

@trossi I had to rewrite part of the text I had, as it was written for an older version (prior to the pyOpenSci submission). It can almost surely be improved, so any suggestions are welcome.
Other things to take into account:

Please review your author info. I tried to add your ORCID and affiliation, but it may be incorrect.
Please add your acknowledgements (funding or whatever you need).
Please feel free to add the work in which you used the library at the end of the "Statement of need" section.
I left the section "Ongoing work" (name can be changed) for you to explain the changes you did.
You can compile locally using the Docker image they provide (https://joss.readthedocs.io/en/latest/paper.html#docker). I also set up the GitHub action to check the paper after you push (e.g. https://github.com/vnmabus/rdata/actions/runs/10648562090).

You can add your changes either as suggestions to this PR or as PRs against this branch.

trossi

@vnmabus Thank you! I have added a description of the writing functionality and some general suggestions.


vnmabus

I agree with all your suggestions. I have committed them and will re-read the article to see if more improvements are needed. I suggest you do the same if/when you have time.


Co-authored-by: Tuomas Rossi <[email protected]>

trossi · 2024-09-11T12:19:06Z

paper/paper.bib

+  author = {Fajardo, Otto},
+  year = {2024},
+  month = jul,
+  doi = {10.5281/zenodo.13132498},


Suggested change

-  doi = {10.5281/zenodo.13132498},
+  publisher = {Zenodo},
+  doi = {10.5281/zenodo.7110169},

This is the DOI for "all versions"; I also added Zenodo as the publisher, as in the pandas entry.

trossi · 2024-09-11T12:21:25Z

paper/paper.bib

+@misc{diaz-vico+ramos-carreno_2022_scikitdatasets,
+  title = {{{scikit-datasets}}: {{Scikit-learn-compatible}} Datasets},
+  author = {{D{\'i}az-Vico}, David and {Ramos-Carre{\~n}o}, Carlos},
+  year = {2022},
+  month = mar,


Suggested change

-@misc{diaz-vico+ramos-carreno_2022_scikitdatasets,
-  title = {{{scikit-datasets}}: {{Scikit-learn-compatible}} Datasets},
-  author = {{D{\'i}az-Vico}, David and {Ramos-Carre{\~n}o}, Carlos},
-  year = {2022},
-  month = mar,
+@misc{diaz-vico+ramos-carreno_2023_scikitdatasets,
+  title = {{{scikit-datasets}}: {{Scikit-learn-compatible}} Datasets},
+  author = {{D{\'i}az-Vico}, David and {Ramos-Carre{\~n}o}, Carlos},
+  year = {2023},
+  month = aug,

Updating the timestamp. (The DOI already points to "all versions".)

trossi · 2024-09-11T12:22:51Z

paper/paper.bib

+@software{pandasdevelopmentteam_2020_pandasdev,
+  title = {{{pandas-dev/pandas}}: {{pandas}}},
+  author = {Pandas Development Team},
+  year = {2020},
+  month = feb,


Suggested change

-@software{pandasdevelopmentteam_2020_pandasdev,
-  title = {{{pandas-dev/pandas}}: {{pandas}}},
-  author = {Pandas Development Team},
-  year = {2020},
-  month = feb,
+@software{pandasdevelopmentteam_2024_pandasdev,
+  title = {{{pandas-dev/pandas}}: {{pandas}}},
+  author = {{The Pandas Development Team}},
+  year = {2024},
+  month = apr,

Updating the date and fixing how the author is rendered. The DOI already points to "all versions".

trossi · 2024-09-11T12:23:10Z

paper/paper.md

+It has a permissive license and can be extended to support additional conversions from custom R classes.
+
+The package `rdata` has been designed as a pure Python package with minimal dependencies, so that it can be easily integrated inside other libraries and applications.
+It currently powers the functionality offered in the `scikit-datasets` package [@diaz-vico+ramos-carreno_2022_scikitdatasets] for loading datasets from the CRAN repository of R packages.


Suggested change

-It currently powers the functionality offered in the `scikit-datasets` package [@diaz-vico+ramos-carreno_2022_scikitdatasets] for loading datasets from the CRAN repository of R packages.
+It currently powers the functionality offered in the `scikit-datasets` package [@diaz-vico+ramos-carreno_2023_scikitdatasets] for loading datasets from the CRAN repository of R packages.

trossi · 2024-09-11T12:23:48Z

paper/paper.md

+
+Advanced users will probably require loading datasets which contain non standard S3 or S4 classes, translating each of them to a custom Python class.
+This is easy to achieve using `rdata` by simply creating a constructor function that receives the converted object representation and its attributes, and returns a Python object of the desired type.
+As an example, consider the following simple code that constructs a `Pandas` [@pandasdevelopmentteam_2020_pandasdev] `Categorical` object from the internal representation of an R `factor`.


Suggested change

-As an example, consider the following simple code that constructs a `Pandas` [@pandasdevelopmentteam_2020_pandasdev] `Categorical` object from the internal representation of an R `factor`.
+As an example, consider the following simple code that constructs a `Pandas` [@pandasdevelopmentteam_2024_pandasdev] `Categorical` object from the internal representation of an R `factor`.
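
As a rough illustration of the constructor mechanism described in the quoted paragraph, the sketch below maps the integer codes of an R `factor` to its level labels. It is not the example from the paper: it returns a plain Python list instead of a pandas `Categorical` so that it runs without extra dependencies, and the `(obj, attrs)` signature, the `"levels"` attribute key, and the sample data are assumptions made for illustration.

```python
from typing import Any, Mapping, Optional, Sequence


def factor_constructor(
    obj: Sequence[int],
    attrs: Mapping[str, Any],
) -> list[Optional[str]]:
    """Map the 1-based integer codes of an R factor to its level labels."""
    levels = attrs["levels"]
    # R codes start at 1; in this sketch any code below 1 is treated as a
    # missing value and mapped to None.
    return [levels[code - 1] if code >= 1 else None for code in obj]


# Hypothetical stand-in for the converted representation of
# factor(c("b", "a", "b")) and its attributes.
codes = [2, 1, 2]
attributes = {"levels": ["a", "b"]}
labels = factor_constructor(codes, attributes)
```

In a real setup, a function like this would be registered with the conversion step of `rdata` for the `factor` class; the registration call is left out here because it is not shown in this thread.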


trossi · 2024-09-11T12:34:41Z

I agree with all your suggestions. I have committed them and will re-read the article to see if more improvements are needed. I suggest you do the same if/when you have time.

Thank you! I added the changes related to the discussion on paper.bib. I re-read the article and I think it's in a good state. I can have another look later this week.

trossi

I added a small suggestion including references to NumPy and Pandas publications.

trossi · 2024-09-16T09:06:43Z

paper/paper.bib

+  publisher = {GitHub},
+  url = {https://github.com/rpy2/rpy2}
+}
+


Suggested change

+@article{harris+_2020_numpy,
+  title = {Array programming with {NumPy}},
+  author = {Charles R. Harris and K. Jarrod Millman and St{\'{e}}fan J.
+            van der Walt and Ralf Gommers and Pauli Virtanen and David
+            Cournapeau and Eric Wieser and Julian Taylor and Sebastian
+            Berg and Nathaniel J. Smith and Robert Kern and Matti Picus
+            and Stephan Hoyer and Marten H. van Kerkwijk and Matthew
+            Brett and Allan Haldane and Jaime Fern{\'{a}}ndez del
+            R{\'{i}}o and Mark Wiebe and Pearu Peterson and Pierre
+            G{\'{e}}rard-Marchant and Kevin Sheppard and Tyler Reddy and
+            Warren Weckesser and Hameer Abbasi and Christoph Gohlke and
+            Travis E. Oliphant},
+  year = {2020},
+  month = sep,
+  journal = {Nature},
+  volume = {585},
+  number = {7825},
+  pages = {357--362},
+  doi = {10.1038/s41586-020-2649-2},
+}
+@inproceedings{mckinney_2010_pandas,
+  author = {{W}es {M}c{K}inney},
+  title = {{D}ata {S}tructures for {S}tatistical {C}omputing in {P}ython},
+  booktitle = {{P}roceedings of the 9th {P}ython in {S}cience {C}onference},
+  pages = {56 - 61},
+  year = {2010},
+  editor = {{S}t\'efan van der {W}alt and {J}arrod {M}illman},
+  doi = {10.25080/Majora-92bf1922-00a},
+}

I'd suggest also citing the NumPy and Pandas papers, as recommended in https://numpy.org/citing-numpy/ and https://pandas.pydata.org/about/citing.html.

trossi · 2024-09-16T09:09:33Z

paper/paper.md

+The license can also be a problem, as it is part of the GPL family and does not allow commercial use.
+
+As existing solutions were unsuitable for our needs, the package `rdata` was developed to parse data in the RData format.
+This is a small, extensible, efficient, and very complete implementation in pure Python of a RData parser, that is able to read and convert most datasets in the CRAN repository to equivalent Python objects.


Suggested change

-This is a small, extensible, efficient, and very complete implementation in pure Python of a RData parser, that is able to read and convert most datasets in the CRAN repository to equivalent Python objects.
+This is a small, extensible, efficient, and very complete implementation in pure Python of a RData parser, that is able to read and convert most datasets in the CRAN repository to equivalent Python objects, such as the built-in types of The Python Standard Library, NumPy arrays [@harris+_2020_numpy], or Pandas dataframes [@mckinney_2010_pandas; @pandasdevelopmentteam_2024_pandasdev].

Referring to the NumPy and Pandas papers.

First version of the paper.

85d5672

trossi reviewed Sep 4, 2024


vnmabus commented Sep 7, 2024


Apply suggestions from code review

e074b3f

Co-authored-by: Tuomas Rossi <[email protected]>

trossi reviewed Sep 11, 2024


trossi reviewed Sep 16, 2024



