Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding scPRINT and its suite of packages #195

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jkobject
Copy link

@jkobject jkobject commented Aug 12, 2024

Sorry to add 4 tools at once but they are quite related, and it seems simpler to review them all at once instead of making 4 distinct but related PRs

Name of the tool:

  • scPRINT
  • scDataLoader
  • benGRN
  • GRnnData

Short description:

  • A single-cell foundation model for Gene Network inference and more.
  • A dataloader to work with large single-cell databases.
  • A benchmarking tool for gene network inference from scRNAseq.
  • Gene Network-enhanced AnnData with utils and conversions.

How does the package use scverse data structures (please describe in a few sentences):

  • scPRINT can be called directly in command line or in python on an anndata object. it also outputs updated anndatas.
  • scDataLoader can work on one to many h5ad/zarr saved anndatas and dataload them in parallel. It serves as an alternative and might be more efficient than anndata's dataloader.
  • GRnnData is AnnData with some more utils and loading mechanism when interfacing with tools producing gene networks. We made it when stumbling upon the many ways people were using gene networks in single cell omics
  • benGRN is the "benchmarking side" of GRnnData. It is like scIB but for gene network inference instead of embeddings.

mandatory

  • The code is publicly available under an OSI-approved license
  • The package provides versioned releases
  • The package can be installed from a standard registry (e.g. PyPI, conda-forge, bioconda)
  • Automated tests cover essential functions of the package and a reasonable range of inputs and conditions 1
  • Continuous integration (CI) automatically executes these tests on each push or pull request 2
  • The package provides API documentation via a website or README3
  • The package uses scverse datastructures where appropriate (i.e. AnnData, MuData or SpatialData and their modality-specific extensions)
  • I am an author or maintainer of the tool and agree on listing the package on the scverse website

Recommended

  • Please announce this package on scverse communication channels (zulip, discourse, twitter)
  • Please tag the author(s) these announcements. Handles (e.g. @scverse_team) to include are:
    • Twitter: @jkobject
    • Zulip:
    • Discourse:
    • Mastodon:
  • The package provides tutorials (or "vignettes") that help getting users started quickly
  • The package uses the scverse cookiecutter template.
  • It is using Rochard's project template

Footnotes

  1. We recommend thtat tests cover at least all user facing (public) functions. Minimal tests ensure that the function does not fail on an example data set. Ideally, tests also ensure the correctness of the results, e.g. by comparing against a snapshot.

  2. Continuous integration means that software tests are automatically executed on every push to the git repository. This guarantees they are always run and that they are run in a clean environment. Scverse ecosystem packages most commonly use GitHub Actions for CI. For an example, check out our cookiecutter template.

  3. By API documentation, we mean an overview of all public functions provided a package, with documentation of their parameters. For an example, see the Scanpy documentation. In simple cases, this can be done manually in a README file. For anything more complex, we recommend the Sphinx Autodoc plugin

@jkobject jkobject changed the title adding scPRINT and simi adding scPRINT and its suite of packages Aug 12, 2024
@jkobject
Copy link
Author

jkobject commented Sep 5, 2024

Hello, do you have any feedback on this?

@grst
Copy link
Contributor

grst commented Sep 9, 2024

Sorry, I'm a bit behind on this due to vacation time.
I'll review it beginning of October.

@grst
Copy link
Contributor

grst commented Oct 2, 2024

Hi @jkobject,

thanks for submitting the packages. Overall, this looks good, my main concern is test coverage. According to our guidelines we require that

Automated tests cover essential functions of the package and a reasonable range of inputs and conditions

which is not the case for any of the packages. Really most/all functions of your public API should be covered and tests should ideally also test for correct outputs and not just that no Exception occurs.

Other feedback

  • The documentation links in the yaml file (e.g. https://jkobject.com/GRnnData) don't work for me
  • Consider renaming the header "Documentation" to "API documentation" or "API", it took me a while to find the API docs
    image

@jkobject
Copy link
Author

jkobject commented Oct 2, 2024

Hi @grst,

Thanks for the feedback! I do understand that coverage should be maximal and will update the codes to test more files. Many files are actually barely used anymore and might be better removed I believe.

For scPRINT, while I test the denoising pipeline and make sure that it is performing well. I might not be able to test the quality of the results as easily for the network inference, embedding, and classification part. But I will add tests rather than exceptions as much as I can. I will come back to you in a few weeks with the updates!

Best,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants