
QuadratiK Submission #180

Open
14 of 32 tasks
rmj3197 opened this issue May 13, 2024 · 11 comments

Comments

@rmj3197

rmj3197 commented May 13, 2024

Submitting Author: Raktim Mukhopadhyay (@rmj3197)
All current maintainers: @giovsaraceno
Package Name: QuadratiK
One-Line Description of Package: QuadratiK includes tests for multivariate normality, tests for uniformity on the sphere, non-parametric two- and k-sample tests, random generation of points from the Poisson kernel-based density, and a clustering algorithm for spherical data.
Repository Link: https://github.com/rmj3197/QuadratiK
Version submitted: 1.1.0
EIC: @Batalex
Editor: @isabelizimm
Reviewer 1: @acolum
Reviewer 2: @ab93
Archive: TBD
JOSS DOI: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD


Code of Conduct & Commitment to Maintain Package

Description

We introduce the QuadratiK package, which incorporates innovative data analysis methodologies. The presented software, implemented in both R and Python, offers a comprehensive set of novel goodness-of-fit tests and clustering techniques using kernel-based quadratic distances. Our software implements one-, two-, and k-sample tests for goodness of fit, providing an efficient and mathematically sound way to assess the fit of probability distributions. Its expanded capabilities include tests for uniformity on the $d$-dimensional sphere based on Poisson kernel densities, and algorithms for generating random samples from Poisson kernel densities. Particularly noteworthy is a unique clustering algorithm tailored to spherical data that leverages a mixture of Poisson kernel-based densities on the sphere. Alongside this, our software includes graphical functions that help users validate, visualize, and represent clustering results, which enhances the interpretability and usability of the analysis. In summary, our R and Python packages serve as a powerful suite of tools, offering researchers and practitioners the means to delve deeper into their data, draw robust inference, and conduct potentially impactful analyses across a wide array of disciplines.
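To make "kernel-based quadratic distance" concrete, here is a minimal pure-Python sketch of a two-sample statistic built on a Gaussian kernel with tuning parameter h. It is a simplification, not QuadratiK's implementation or API: QuadratiK's tests use centered kernels and a principled choice of the tuning parameter, whereas this sketch uses the uncentered kernel, which makes it the familiar unbiased MMD² estimator. The function names are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, h):
    """Gaussian kernel exp(-||x - y||^2 / (2 h^2)) with tuning parameter h."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * h * h))


def two_sample_quadratic_distance(xs, ys, h=1.0):
    """U-statistic estimate of the kernel quadratic distance
    E K(X, X') + E K(Y, Y') - 2 E K(X, Y), using an *uncentered* kernel.

    Pairs (i, i) are excluded from the within-sample averages, which makes
    the estimate unbiased; it can therefore be slightly negative.
    """
    n, m = len(xs), len(ys)
    if n < 2 or m < 2:
        raise ValueError("each sample needs at least two observations")
    k_xx = sum(gaussian_kernel(xs[i], xs[j], h)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], h)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_xy = sum(gaussian_kernel(x, y, h) for x in xs for y in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


rng = random.Random(0)
a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
b = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
shifted = [(rng.gauss(2, 1), rng.gauss(0, 1)) for _ in range(50)]
print(two_sample_quadratic_distance(a, b))        # same distribution: near 0
print(two_sample_quadratic_distance(a, shifted))  # shifted mean: clearly positive
```

In an actual test, the observed statistic would still have to be compared against its null distribution (for example by permutation) to obtain a p-value.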

Scope

  • Please indicate which category or categories.
    Check out our package scope page to learn more about our
    scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization (see footnote 1)
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific

  • Geospatial
  • Education

Community Partnerships

If your package is associated with an existing community please check below:

  • For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

    • Who is the target audience and what are scientific applications of this package?

      • The QuadratiK package offers robust tools for goodness-of-fit testing, a fundamental aspect of statistical analysis, where accurately assessing the fit of probability distributions is essential. This is especially critical in research domains where model accuracy has direct implications for conclusions and further research directions.
      • Spherical data structures are common in fields such as biology, geosciences and astronomy, where data points are naturally mapped to a sphere. QuadratiK provides a tailored approach to effectively handle and interpret these data.
      • This package is also of particular interest to professionals in health and biological sciences, where understanding and interpreting spherical data can be crucial in studies ranging from molecular biology to epidemiology and public health.
    • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

      • SciPy and hyppo also have collections of goodness-of-fit test functionalities. Our interest focuses on tests that are based on the family of kernel-based quadratic distances. The kernels we use are diffusion kernels, that is, probability distributions that depend on a tuning parameter and satisfy the convolution property. We also implement the Poisson kernel-based tests for uniformity on the d-dimensional sphere.

      • We are aware of only a limited number of Python libraries that offer spherical clustering capabilities, such as spherecluster (last updated in November 2018) and soyclustering (last updated in May 2020). spherecluster implements spherical k-means and clustering using von Mises-Fisher distributions, as proposed in Banerjee, Arindam, et al. "Clustering on the Unit Hypersphere using von Mises-Fisher Distributions." Journal of Machine Learning Research 6.9 (2005). soyclustering implements spherical k-means for document clustering, as proposed in Kim, Hyunjoong, Han Kyul Kim, and Sungzoon Cho. "Improving spherical k-means for document clustering: Fast initialization, sparse centroid projection, and efficient cluster labeling." Expert Systems with Applications 150 (2020): 113288.

      • In summary, QuadratiK differs fundamentally from existing packages in the following ways:

        • The GOF tests are U-statistics based on centered kernels. The concept and methodology of centering is unique to our methods and is not part of the methods appearing in existing packages.
        • An algorithm for connecting the tuning parameter with the statistical properties of the test, namely power and degrees of freedom (DOF), is provided. This feature differentiates our novel methods from methods in other packages.
        • A new clustering algorithm for data that reside on the sphere using the Poisson kernel-based densities is offered. This aspect is not a feature of the existing packages.
        • We also offer algorithms for generating random samples from Poisson kernel-based densities. This capability is also unique to our package.
      • We also implement a GUI, built with the Streamlit library, that enables interaction with the software in a non-programmatic manner. We have not found any Python package that implements a GUI for the tasks described above.

    • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:
      Please see our pre-submission inquiry for this submission:
      Pre-submission Inquiry for QuadratiK #168
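The Poisson kernel-based density that underlies the uniformity tests, random generation, and clustering mentioned above can be illustrated with a short stdlib-only sketch. On the unit sphere $S^{d-1}$ in $\mathbb{R}^d$, with mean direction $\mu$ and concentration $0 \le \rho < 1$, the density relative to the uniform measure is $f(x \mid \mu, \rho) = (1-\rho^2)/(1+\rho^2-2\rho\,\mu^\top x)^{d/2}$. The sampler below is a deliberately naive rejection sampler (uniform proposals, accepted with probability $f(x)/f(\mu)$); it is not necessarily the algorithm QuadratiK uses, and all function names are hypothetical.

```python
import math
import random


def poisson_kernel_density(x, mu, rho):
    """Poisson kernel density on S^{d-1} relative to the uniform measure:
    (1 - rho^2) / (1 + rho^2 - 2 rho <mu, x>)^(d/2), for unit x, mu and 0 <= rho < 1."""
    d = len(x)
    dot = sum(a * b for a, b in zip(mu, x))
    return (1.0 - rho ** 2) / (1.0 + rho ** 2 - 2.0 * rho * dot) ** (d / 2.0)


def uniform_on_sphere(d, rng):
    """Uniform point on S^{d-1}: normalize a standard Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def sample_poisson_kernel(n, mu, rho, rng=None):
    """Naive rejection sampler: propose uniformly, accept w.p. f(x) / f(mu).

    The density peaks at x = mu, where it equals (1 + rho) / (1 - rho)^(d - 1).
    """
    rng = rng or random.Random()
    d = len(mu)
    bound = (1.0 + rho) / (1.0 - rho) ** (d - 1)
    samples = []
    while len(samples) < n:
        x = uniform_on_sphere(d, rng)
        if rng.random() * bound <= poisson_kernel_density(x, mu, rho):
            samples.append(x)
    return samples


mu = [0.0, 0.0, 1.0]
draws = sample_poisson_kernel(500, mu, rho=0.5, rng=random.Random(1))
mean_z = sum(p[2] for p in draws) / len(draws)
print(mean_z)  # should be close to rho = 0.5
```

Because the acceptance rate equals $(1-\rho)^{d-1}/(1+\rho)$, this approach quickly becomes impractical as $\rho \to 1$ or as $d$ grows. A convenient sanity check is that the mean of $X$ under this density is $\rho\mu$, so the sample mean of the draws should be close to it.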

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • uses an OSI approved license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a tutorial with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration setup, such as GitHub Actions, CircleCI, and/or others.

Publication Options

JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

  • I have read the author guide.
  • I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

Footnotes

  1. Please fill out a pre-submission inquiry before submitting a data visualization package.

@Batalex
Contributor

Batalex commented May 25, 2024

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci
review. Below are the basic checks that your package needs to pass
to begin our review. If some of these are missing, we will ask you
to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements
below.

  • Installation The package can be installed from a community repository such as PyPI (preferred), and/or a community channel on conda (e.g. conda-forge, bioconda).
    • The package imports properly into a standard Python environment (import package).
      The package installation does not install the dependencies
  • Fit The package meets criteria for fit and overlap.
  • Documentation The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
    • User-facing documentation that overviews how to install and start using the package.
    • Short tutorials that help a user understand how to use the package and what it can do for them.
    • API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format.
  • Core GitHub repository Files
    • README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
    • Contributing File The package has a CONTRIBUTING.md file that details how to install and contribute to the package.
    • Code of Conduct The package has a CODE_OF_CONDUCT.md file.
    • License The package has an OSI approved license.
      NOTE: We prefer that you have development instructions in your documentation too.
  • Issue Submission Documentation All of the information is filled out in the YAML header of the issue (located at the top of the issue template).
  • Automated tests Package has a testing suite and is tested via a Continuous Integration service.
  • Repository The repository link resolves correctly.
  • Package overlap The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly.
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

  • Initial onboarding survey was filled out
    We appreciate each maintainer of the package filling out this survey individually. 🙌
    Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌


Editor comments

Nice submission, I'll get started on finding the perfect editor for QuadratiK!

@Batalex
Contributor

Batalex commented May 31, 2024

Hey @rmj3197,
I am super excited to introduce @isabelizimm as the editor for this submission! Isabel will be your privileged point of contact from now on, though you are welcome to ask me anything during the process.
Please note that she will not get started until the week after June 7th.

Happy reviewing!

@isabelizimm
Contributor

Hello there! Happy to be ushering this package through 👋 I'm going to go ahead and start looking for reviewers; I'll plan to touch base when I have reviewers lined up OR in 2 weeks (say, June 24), whichever comes first.

@rmj3197
Author

rmj3197 commented Jun 13, 2024

Hello @isabelizimm,

Thank you so much for the update and for taking the time to review our package. I look forward to hearing from you soon.

@isabelizimm
Contributor

Checking in! I have one reviewer ready (yay!) and have reached out to some possibilities for a second. I'll keep you updated when I know more 👍

@rmj3197
Author

rmj3197 commented Jun 26, 2024

Hello @isabelizimm , thank you very much for the update!

@isabelizimm
Contributor

isabelizimm commented Jul 1, 2024

Welcome welcome to our fearless reviewers: @acolum and @ab93 👋 Thank you SO MUCH for volunteering to review for pyOpenSci! You are two people with awesome math-y, stats-y, ML-y, Python-y backgrounds, which is perfect for this package, and I am looking forward to learning from you through this review process 🌻

Please fill out our pre-review survey

Before beginning your review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit
    here as a comment, once your review is complete. You can look at other issues in this repository for examples of what this might look like.

Please get in touch with any questions or concerns! Your review is due 3 weeks from now, which is July 19. New review date: Aug 2. Please let me know if this date does not work for you.

tldr;

Reviewers: @acolum and @ab93
Due date [NOTE: deadline extended]: August 2

@acolum

acolum commented Jul 14, 2024

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests:
    • All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • Tests cover essential functions of the package and a reasonable range of inputs and conditions.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 2.5


Review Comments

Overall, this submission was well done and followed most Python package development and documentation best practices. I found no major issues with the package's documentation, usability, and functionality, but I've outlined a few minor issues below.

Potential issues that could be fixed:

  • The README file in the root directory of the package's repository has a .rst file extension instead of a .md file extension. Since I'm a first-time reviewer, I'm not sure how important this difference is to pyOpenSci.
  • This wouldn't be required for final approval, but I think it would be helpful for potential users of your software to list similar packages in the R and Python scientific ecosystems in the README file. For example, what other packages implement similar methods or have similar overall functionality?

Minor issues that need fixing:

  • The README file is missing a repo status badge. With the addition of this badge (and potentially others, like the pyOpenSci peer-review badge, in the future), I'd recommend organizing the badges in your README the way they are organized in this rOpenSci package's README.
  • Although the README file links to the primary vignette with examples of how to use the package, the other vignettes in the documentation should be linked to here as well.
  • Currently, there are a few citations in the bibliography in the README file, but it's not clear which should be cited when citing the software package. More clarity and/or a BibTeX citation is needed. This guidance from GitHub may help.

@ab93

ab93 commented Aug 2, 2024

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests:
    • All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • Tests cover essential functions of the package and a reasonable range of inputs and conditions.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3 hours


Review Comments

Great submission overall. Documentation is good, and I like the user guide as well.
There are a few tweaks, suggestions and adjustments that I can add here.

Packaging and CI

  • The package uses the Poetry package manager, which is great. It would therefore be nice to include a short guide to installing the package for development with Poetry,
    e.g. poetry install
  • In the package dependencies the Python version requirement is given as a broad python = "^3.9, !=3.9.7".
    This can cause some downstream applications using this package to break if any of its dependencies do not support a new Python version.
    My recommendation would be to allow only the Python versions that the package supports, i.e. something like
    python = ">=3.9, <3.13"
  • It would be great to have some linting tools in the CI pipeline. Ruff is a great option and can catch many anti-patterns in coding practices.
  • I see the Black badge, but the CI does not run a Black formatting check. Adding that would be great.
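To illustrate the point about the Python constraint: Poetry expands a caret requirement such as ^3.9 to >=3.9,<4.0, so it admits every future 3.x release, including ones the dependencies may not support yet. The sketch below compares the two constraints by hand using plain version tuples. The helper names are hypothetical, and a real project would rely on Poetry itself (or the packaging library) rather than this simplified parser, which only handles purely numeric versions.

```python
from typing import Iterable, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """'3.12.4' -> (3, 12, 4). Simplified: numeric release segments only."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, lower: str, upper: str,
              excluded: Iterable[str] = ()) -> bool:
    """True if lower <= version < upper and version is not explicitly excluded.

    Tuples compare lexicographically, and (3, 13) < (3, 13, 0), so "3.13.0"
    falls outside an upper bound of "3.13".
    """
    v = version_tuple(version)
    in_range = version_tuple(lower) <= v < version_tuple(upper)
    return in_range and all(v != version_tuple(e) for e in excluded)


# "^3.9, !=3.9.7" is equivalent to ">=3.9,<4.0,!=3.9.7".
CARET = {"lower": "3.9", "upper": "4.0", "excluded": ("3.9.7",)}
# The reviewer's suggested explicit range ">=3.9, <3.13".
BOUNDED = {"lower": "3.9", "upper": "3.13"}

for v in ("3.9.7", "3.12.4", "3.13.0"):
    # 3.13.0 passes the caret range but not the bounded one.
    print(v, satisfies(v, **CARET), satisfies(v, **BOUNDED))
```

The reviewer's example drops the existing !=3.9.7 exclusion, which is why 3.9.7 passes the bounded check here; the exclusion can be kept alongside the new upper bound if it is still needed.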

Code Practices, which again can be identified using a linter like Ruff

  • Python typing is missing in the code, and would be very nice to have, e.g. _pkbc.py
  • Raising the base Exception class is not best practice; raising more specific exception types would be better.
  • A good practice is also to make sure all instance variables are first defined in the __init__() function, e.g. in the PKBC class,
    self.dat only gets initialized in fit()
  • Adding __slots__ would be nice, as it reduces the per-instance memory footprint.
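A minimal sketch combining the code-practice suggestions above: type hints, a specific exception type instead of the base Exception, all instance attributes declared in __init__(), and __slots__. The class and exception names are hypothetical and the "fit" step is a placeholder; this is not QuadratiK's PKBC class.

```python
from __future__ import annotations

import math
from typing import Optional, Sequence


class NotFittedError(RuntimeError):
    """Raised when a method that needs fitted state is called before fit()."""


class SphericalModel:
    """Hypothetical estimator illustrating the reviewer's suggestions."""

    # No per-instance __dict__: only these attributes can ever be set.
    __slots__ = ("n_clusters", "data_", "centers_")

    def __init__(self, n_clusters: int) -> None:
        if not isinstance(n_clusters, int) or n_clusters < 1:
            raise ValueError(f"n_clusters must be a positive integer, got {n_clusters!r}")
        self.n_clusters: int = n_clusters
        # Every attribute is declared here, even those only filled in by fit().
        self.data_: Optional[list[list[float]]] = None
        self.centers_: Optional[list[list[float]]] = None

    def fit(self, data: Sequence[Sequence[float]]) -> SphericalModel:
        if len(data) < self.n_clusters:
            raise ValueError("need at least as many observations as clusters")
        # Project each observation onto the unit sphere.
        rows = []
        for row in data:
            norm = math.sqrt(sum(c * c for c in row))
            if norm == 0.0:
                raise ValueError("a zero vector cannot be projected onto the sphere")
            rows.append([c / norm for c in row])
        self.data_ = rows
        # Placeholder "fit": use the first n_clusters points as centers.
        self.centers_ = rows[: self.n_clusters]
        return self

    def predict_one(self, x: Sequence[float]) -> int:
        if self.centers_ is None:
            raise NotFittedError("call fit() before predict_one()")
        # Largest dot product with the unit-norm centers; scaling x by a
        # positive constant does not change the argmax (cosine similarity).
        sims = [sum(a * b for a, b in zip(c, x)) for c in self.centers_]
        return max(range(len(sims)), key=sims.__getitem__)
```

Because the class defines __slots__, assigning an undeclared attribute (for example model.dat = ...) raises AttributeError, which also surfaces typos early.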

@isabelizimm
Contributor

Thank you so much to our reviewers @acolum and @ab93 for your thoughts on QuadratiK!🌷 The next step here is for the author to implement the changes suggested by reviewers. This piece can involve a bit of back and forth, @rmj3197, please let us know in this thread if you have questions about the review. Otherwise, post here when the reviews have been addressed and the reviewers will look over the updates and give their final approval!

.rst file extension instead of a .md file extension.

This is okay! As long as there is a README file there, we are good to go 😄

@rmj3197
Author

rmj3197 commented Aug 7, 2024

Thank you @acolum and @ab93 for your valuable suggestions and comments. Thank you @isabelizimm for your help and communication. I will address the changes and update you once they are completed. Thank you all for your time.

Projects
Status: under-review
Development

No branches or pull requests

7 participants