Add JOSS paper. #43

Open
wants to merge 2 commits into
base: develop

vnmabus
Owner

@vnmabus vnmabus commented Aug 31, 2024

Describe the proposed changes

This adds a paper to the repo, so that we can submit it to JOSS.

@trossi I had to rewrite part of my earlier text, as it was written for an older version (prior to the pyOpenSci submission). It can almost surely be improved, so please share any suggestions you have.
Other things to take into account:

  • Please review your author info. I tried to add your ORCID and affiliation, but they may be incorrect.
  • Please add your acknowledgements (funding or whatever you need).
  • Please feel free to add the work in which you used the library at the end of the "Statement of need" section.
  • I left the section "Ongoing work" (the name can be changed) for you to explain the changes you made.
  • You can compile locally using the Docker image they provide (https://joss.readthedocs.io/en/latest/paper.html#docker). I also set up the GitHub Action to check the paper after you push (e.g. https://github.com/vnmabus/rdata/actions/runs/10648562090).

You can add your changes either as suggestions to this PR or as PRs against this branch.

Contributor

@trossi trossi left a comment

@vnmabus Thank you! I have added a description of the writing functionality and some general suggestions.

Owner Author

@vnmabus vnmabus left a comment

I agree with all your suggestions. I will commit them and re-read the article to see if more improvements are needed. I suggest you do the same if/when you have time.

author = {Fajardo, Otto},
year = {2024},
month = jul,
doi = {10.5281/zenodo.13132498},
Contributor

Suggested change
- doi = {10.5281/zenodo.13132498},
+ publisher = {Zenodo},
+ doi = {10.5281/zenodo.7110169},

This uses the "all versions" DOI and adds Zenodo as the publisher, as in the pandas entry.

Comment on lines +1 to +5
@misc{diaz-vico+ramos-carreno_2022_scikitdatasets,
title = {{{scikit-datasets}}: {{Scikit-learn-compatible}} Datasets},
author = {{D{\'i}az-Vico}, David and {Ramos-Carre{\~n}o}, Carlos},
year = {2022},
month = mar,
Contributor

Suggested change
- @misc{diaz-vico+ramos-carreno_2022_scikitdatasets,
- title = {{{scikit-datasets}}: {{Scikit-learn-compatible}} Datasets},
- author = {{D{\'i}az-Vico}, David and {Ramos-Carre{\~n}o}, Carlos},
- year = {2022},
- month = mar,
+ @misc{diaz-vico+ramos-carreno_2023_scikitdatasets,
+ title = {{{scikit-datasets}}: {{Scikit-learn-compatible}} Datasets},
+ author = {{D{\'i}az-Vico}, David and {Ramos-Carre{\~n}o}, Carlos},
+ year = {2023},
+ month = aug,

Updating the date. (The DOI already points to "all versions".)

Comment on lines +28 to +32
@software{pandasdevelopmentteam_2020_pandasdev,
title = {{{pandas-dev/pandas}}: {{pandas}}},
author = {Pandas Development Team},
year = {2020},
month = feb,
Contributor

Suggested change
- @software{pandasdevelopmentteam_2020_pandasdev,
- title = {{{pandas-dev/pandas}}: {{pandas}}},
- author = {Pandas Development Team},
- year = {2020},
- month = feb,
+ @software{pandasdevelopmentteam_2024_pandasdev,
+ title = {{{pandas-dev/pandas}}: {{pandas}}},
+ author = {{The Pandas Development Team}},
+ year = {2024},
+ month = apr,

Updating the date and fixing the rendering of the author. The DOI already points to "all versions".

It has a permissive license and can be extended to support additional conversions from custom R classes.

The package `rdata` has been designed as a pure Python package with minimal dependencies, so that it can be easily integrated inside other libraries and applications.
It currently powers the functionality offered in the `scikit-datasets` package [@diaz-vico+ramos-carreno_2022_scikitdatasets] for loading datasets from the CRAN repository of R packages.
Contributor

Suggested change
- It currently powers the functionality offered in the `scikit-datasets` package [@diaz-vico+ramos-carreno_2022_scikitdatasets] for loading datasets from the CRAN repository of R packages.
+ It currently powers the functionality offered in the `scikit-datasets` package [@diaz-vico+ramos-carreno_2023_scikitdatasets] for loading datasets from the CRAN repository of R packages.


Advanced users will probably require loading datasets which contain non standard S3 or S4 classes, translating each of them to a custom Python class.
This is easy to achieve using `rdata` by simply creating a constructor function that receives the converted object representation and its attributes, and returns a Python object of the desired type.
As an example, consider the following simple code that constructs a `Pandas` [@pandasdevelopmentteam_2020_pandasdev] `Categorical` object from the internal representation of an R `factor`.
Contributor

Suggested change
- As an example, consider the following simple code that constructs a `Pandas` [@pandasdevelopmentteam_2020_pandasdev] `Categorical` object from the internal representation of an R `factor`.
+ As an example, consider the following simple code that constructs a `Pandas` [@pandasdevelopmentteam_2024_pandasdev] `Categorical` object from the internal representation of an R `factor`.
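
For readers of this thread, the constructor mechanism described in the passage under review can be sketched in plain Python. This is a minimal illustration and not the code from the paper: the function name `factor_constructor`, the sample data, and the treatment of missing values (as `None` or a non-positive code) are assumptions, and the sketch returns a plain list of labels where the paper's example builds a `pandas.Categorical`.

```python
from typing import Any, Mapping, Optional, Sequence


def factor_constructor(
    obj: Sequence[Optional[int]],
    attrs: Mapping[str, Any],
) -> list[Optional[str]]:
    """Map the 1-based integer codes of an R factor to their level labels.

    `obj` is the converted object representation (the integer codes) and
    `attrs` holds the R attributes, of which only "levels" is used here.
    """
    levels = list(attrs["levels"])
    # Assumption: a missing value arrives as None or as a non-positive code.
    return [
        levels[code - 1] if code is not None and code >= 1 else None
        for code in obj
    ]


# Hypothetical converted representation of factor(c("b", "a", NA, "b")).
codes = [2, 1, None, 2]
attributes = {"levels": ["a", "b"], "class": ["factor"]}
labels = factor_constructor(codes, attributes)
```

With pandas available, the returned labels could be wrapped as `pandas.Categorical(labels, categories=attributes["levels"])`. How such a constructor is registered with `rdata` is left to the library's documentation.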

Contributor

trossi commented Sep 11, 2024

> I agree with all your suggestions. I will commit them and re-read the article to see if more improvements are needed. I suggest you do the same if/when you have time.

Thank you! I added the changes related to the discussion on paper.bib. I re-read the article and I think it's in a good state. I can have another look later this week.

Contributor

@trossi trossi left a comment

I added a small suggestion that includes references to the NumPy and Pandas publications.

publisher = {GitHub},
url = {https://github.com/rpy2/rpy2}
}

Contributor

Suggested change
@article{harris+_2020_numpy,
title = {Array programming with {NumPy}},
author = {Charles R. Harris and K. Jarrod Millman and St{\'{e}}fan J.
van der Walt and Ralf Gommers and Pauli Virtanen and David
Cournapeau and Eric Wieser and Julian Taylor and Sebastian
Berg and Nathaniel J. Smith and Robert Kern and Matti Picus
and Stephan Hoyer and Marten H. van Kerkwijk and Matthew
Brett and Allan Haldane and Jaime Fern{\'{a}}ndez del
R{\'{i}}o and Mark Wiebe and Pearu Peterson and Pierre
G{\'{e}}rard-Marchant and Kevin Sheppard and Tyler Reddy and
Warren Weckesser and Hameer Abbasi and Christoph Gohlke and
Travis E. Oliphant},
year = {2020},
month = sep,
journal = {Nature},
volume = {585},
number = {7825},
pages = {357--362},
doi = {10.1038/s41586-020-2649-2},
}
@inproceedings{mckinney_2010_pandas,
author = {{W}es {M}c{K}inney},
title = {{D}ata {S}tructures for {S}tatistical {C}omputing in {P}ython},
booktitle = {{P}roceedings of the 9th {P}ython in {S}cience {C}onference},
pages = {56 - 61},
year = {2010},
editor = {{S}t\'efan van der {W}alt and {J}arrod {M}illman},
doi = {10.25080/Majora-92bf1922-00a},
}

I'd suggest also citing the NumPy and Pandas papers, as recommended in https://numpy.org/citing-numpy/ and https://pandas.pydata.org/about/citing.html.

The license can also be a problem, as it is part of the GPL family and does not allow commercial use.

As existing solutions were unsuitable for our needs, the package `rdata` was developed to parse data in the RData format.
This is a small, extensible, efficient, and very complete implementation in pure Python of a RData parser, that is able to read and convert most datasets in the CRAN repository to equivalent Python objects.
Contributor

Suggested change
- This is a small, extensible, efficient, and very complete implementation in pure Python of a RData parser, that is able to read and convert most datasets in the CRAN repository to equivalent Python objects.
+ This is a small, extensible, efficient, and very complete implementation in pure Python of a RData parser, that is able to read and convert most datasets in the CRAN repository to equivalent Python objects, such as the built-in types of The Python Standard Library, NumPy arrays [@harris+_2020_numpy], or Pandas dataframes [@mckinney_2010_pandas; @pandasdevelopmentteam_2024_pandasdev].

Referring to the NumPy and Pandas papers.
