Add JOSS paper. #43
base: develop
@vnmabus Thank you! I have added a description of the writing functionality and some general suggestions.
I agree with all your suggestions. I will commit them and re-read the article to see whether more improvements are needed. I suggest you do the same if and when you have time.
Co-authored-by: Tuomas Rossi <[email protected]>
author = {Fajardo, Otto},
year = {2024},
month = jul,
doi = {10.5281/zenodo.13132498},
- doi = {10.5281/zenodo.13132498},
+ publisher = {Zenodo},
+ doi = {10.5281/zenodo.7110169},
This uses the "all versions" DOI and adds Zenodo as the publisher, as is done for pandas.
@misc{diaz-vico+ramos-carreno_2022_scikitdatasets,
title = {{{scikit-datasets}}: {{Scikit-learn-compatible}} Datasets},
author = {{D{\'i}az-Vico}, David and {Ramos-Carre{\~n}o}, Carlos},
year = {2022},
month = mar,
- @misc{diaz-vico+ramos-carreno_2022_scikitdatasets,
- title = {{{scikit-datasets}}: {{Scikit-learn-compatible}} Datasets},
- author = {{D{\'i}az-Vico}, David and {Ramos-Carre{\~n}o}, Carlos},
- year = {2022},
- month = mar,
+ @misc{diaz-vico+ramos-carreno_2023_scikitdatasets,
+ title = {{{scikit-datasets}}: {{Scikit-learn-compatible}} Datasets},
+ author = {{D{\'i}az-Vico}, David and {Ramos-Carre{\~n}o}, Carlos},
+ year = {2023},
+ month = aug,
This updates the date. (The DOI already points to "all versions".)
@software{pandasdevelopmentteam_2020_pandasdev,
title = {{{pandas-dev/pandas}}: {{pandas}}},
author = {Pandas Development Team},
year = {2020},
month = feb,
- @software{pandasdevelopmentteam_2020_pandasdev,
- title = {{{pandas-dev/pandas}}: {{pandas}}},
- author = {Pandas Development Team},
- year = {2020},
- month = feb,
+ @software{pandasdevelopmentteam_2024_pandasdev,
+ title = {{{pandas-dev/pandas}}: {{pandas}}},
+ author = {{The Pandas Development Team}},
+ year = {2024},
+ month = apr,
This updates the date and fixes the rendering of the author name. The DOI already points to "all versions".
It has a permissive license and can be extended to support additional conversions from custom R classes.

The package `rdata` has been designed as a pure Python package with minimal dependencies, so that it can be easily integrated inside other libraries and applications.
It currently powers the functionality offered in the `scikit-datasets` package [@diaz-vico+ramos-carreno_2022_scikitdatasets] for loading datasets from the CRAN repository of R packages.
- It currently powers the functionality offered in the `scikit-datasets` package [@diaz-vico+ramos-carreno_2022_scikitdatasets] for loading datasets from the CRAN repository of R packages.
+ It currently powers the functionality offered in the `scikit-datasets` package [@diaz-vico+ramos-carreno_2023_scikitdatasets] for loading datasets from the CRAN repository of R packages.
Advanced users will probably require loading datasets which contain non standard S3 or S4 classes, translating each of them to a custom Python class.
This is easy to achieve using `rdata` by simply creating a constructor function that receives the converted object representation and its attributes, and returns a Python object of the desired type.
As an example, consider the following simple code that constructs a `Pandas` [@pandasdevelopmentteam_2020_pandasdev] `Categorical` object from the internal representation of an R `factor`.
- As an example, consider the following simple code that constructs a `Pandas` [@pandasdevelopmentteam_2020_pandasdev] `Categorical` object from the internal representation of an R `factor`.
+ As an example, consider the following simple code that constructs a `Pandas` [@pandasdevelopmentteam_2024_pandasdev] `Categorical` object from the internal representation of an R `factor`.
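
For readers of this thread, the constructor-function idea in the paragraph above can be sketched without any dependencies. This is only an illustration, not the `rdata` API: `Factor`, `factor_constructor`, `class_map`, and `convert` are hypothetical names, and `Factor` is a stand-in for a categorical type such as the pandas `Categorical` mentioned in the paper. It assumes that a converted R `factor` arrives as a sequence of 1-based integer codes plus an attribute mapping holding `levels` and `class`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Factor:
    """Hypothetical stand-in for a categorical type."""

    values: Tuple[Optional[str], ...]
    levels: Tuple[str, ...]


def factor_constructor(obj: Sequence[int], attrs: Mapping[str, Any]) -> Factor:
    """Build a Factor from integer codes and the R attributes (assumed layout)."""
    levels = tuple(attrs["levels"])
    # R factor codes are 1-based; codes below 1 (such as R's integer NA)
    # are treated here as missing values.
    values = tuple(levels[code - 1] if code >= 1 else None for code in obj)
    return Factor(values=values, levels=levels)


# Hypothetical registry mapping an R class name to its constructor function.
class_map: Mapping[str, Callable[[Any, Mapping[str, Any]], Any]] = {
    "factor": factor_constructor,
}


def convert(obj: Any, attrs: Mapping[str, Any],
            constructors: Mapping[str, Callable[[Any, Mapping[str, Any]], Any]]) -> Any:
    """Apply the first constructor registered for any of the object's R classes."""
    for class_name in attrs.get("class", ()):
        constructor = constructors.get(class_name)
        if constructor is not None:
            return constructor(obj, attrs)
    # No constructor is registered: return the plain representation unchanged.
    return obj
```

Supporting another R class in this sketch would only require adding one more entry to `class_map`, which is the extensibility argument the paragraph makes.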
Thank you! I added the changes related to discussion on |
I added a small suggestion including references to NumPy and Pandas publications.
publisher = {GitHub},
url = {https://github.com/rpy2/rpy2}
}
+ @article{harris+_2020_numpy,
+ title = {Array programming with {NumPy}},
+ author = {Charles R. Harris and K. Jarrod Millman and St{\'{e}}fan J.
+ van der Walt and Ralf Gommers and Pauli Virtanen and David
+ Cournapeau and Eric Wieser and Julian Taylor and Sebastian
+ Berg and Nathaniel J. Smith and Robert Kern and Matti Picus
+ and Stephan Hoyer and Marten H. van Kerkwijk and Matthew
+ Brett and Allan Haldane and Jaime Fern{\'{a}}ndez del
+ R{\'{i}}o and Mark Wiebe and Pearu Peterson and Pierre
+ G{\'{e}}rard-Marchant and Kevin Sheppard and Tyler Reddy and
+ Warren Weckesser and Hameer Abbasi and Christoph Gohlke and
+ Travis E. Oliphant},
+ year = {2020},
+ month = sep,
+ journal = {Nature},
+ volume = {585},
+ number = {7825},
+ pages = {357--362},
+ doi = {10.1038/s41586-020-2649-2},
+ }
+ @inproceedings{mckinney_2010_pandas,
+ author = {{W}es {M}c{K}inney},
+ title = {{D}ata {S}tructures for {S}tatistical {C}omputing in {P}ython},
+ booktitle = {{P}roceedings of the 9th {P}ython in {S}cience {C}onference},
+ pages = {56 - 61},
+ year = {2010},
+ editor = {{S}t\'efan van der {W}alt and {J}arrod {M}illman},
+ doi = {10.25080/Majora-92bf1922-00a},
+ }
I'd also suggest citing the NumPy and Pandas papers, as recommended at https://numpy.org/citing-numpy/ and https://pandas.pydata.org/about/citing.html .
The license can also be a problem, as it is part of the GPL family and does not allow commercial use.

As existing solutions were unsuitable for our needs, the package `rdata` was developed to parse data in the RData format.
This is a small, extensible, efficient, and very complete implementation in pure Python of a RData parser, that is able to read and convert most datasets in the CRAN repository to equivalent Python objects.
- This is a small, extensible, efficient, and very complete implementation in pure Python of a RData parser, that is able to read and convert most datasets in the CRAN repository to equivalent Python objects.
+ This is a small, extensible, efficient, and very complete implementation in pure Python of a RData parser, that is able to read and convert most datasets in the CRAN repository to equivalent Python objects, such as the built-in types of The Python Standard Library, NumPy arrays [@harris+_2020_numpy], or Pandas dataframes [@mckinney_2010_pandas; @pandasdevelopmentteam_2024_pandasdev].
This adds references to the NumPy and Pandas papers.
Describe the proposed changes
This adds a paper to the repo, so that we can submit it to JOSS.
@trossi I had to rewrite part of the text that I had, as it was for an older version (prior to pyOpenSci submission). It can probably (almost surely, I would say) be improved, in case you have suggestions to do so.
Other things to take into account:
You can add your changes either as suggestions to this PR or as PRs against this branch.