feat: Custom column names and suffixes for overlap and nearest operations (#43)

* doc: Installation instructions
* chore: Readme refactor
* feat: Add support for custom column names and suffixes
* Fixing needless borrow
* Removing assertion and adding test case for non-default suffixes
* Creating release 0.3.0
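To make the headline change concrete, here is a minimal sketch of what "custom column names and suffixes" means for an overlap join. It is not the polars-bio implementation or API: the function `overlap`, its parameters `cols_left`, `cols_right` and `suffixes`, the default suffixes, and the closed-interval convention are all assumptions chosen for illustration, and the join is a naive nested loop over plain dictionaries.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def overlap(
    left: List[Row],
    right: List[Row],
    cols_left: Sequence[str] = ("chrom", "start", "end"),   # hypothetical parameter
    cols_right: Sequence[str] = ("chrom", "start", "end"),  # hypothetical parameter
    suffixes: Tuple[str, str] = ("_1", "_2"),               # hypothetical default
) -> List[Row]:
    """Naive O(n*m) overlap join over lists of dict rows (illustration only)."""
    l_chrom, l_start, l_end = cols_left
    r_chrom, r_start, r_end = cols_right
    result: List[Row] = []
    for l_row in left:
        for r_row in right:
            if l_row[l_chrom] != r_row[r_chrom]:
                continue
            # Assumption: intervals are closed, so touching endpoints overlap.
            if l_row[l_start] <= r_row[r_end] and r_row[r_start] <= l_row[l_end]:
                joined: Row = {}
                # Every output column is renamed with the suffix of its side.
                for name, value in l_row.items():
                    joined[f"{name}{suffixes[0]}"] = value
                for name, value in r_row.items():
                    joined[f"{name}{suffixes[1]}"] = value
                result.append(joined)
    return result


left = [{"contig": "chr1", "pos_start": 100, "pos_end": 200}]
right = [
    {"chrom": "chr1", "start": 150, "end": 250},
    {"chrom": "chr2", "start": 150, "end": 250},
    {"chrom": "chr1", "start": 300, "end": 400},
]
rows = overlap(
    left,
    right,
    cols_left=("contig", "pos_start", "pos_end"),
    suffixes=("_a", "_b"),
)
```

In this sketch the left table uses non-default column names, so the caller passes them explicitly, and the suffixes keep the two sides distinguishable in the joined rows.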
Showing 16 changed files with 421 additions and 181 deletions.

@@ -11,6 +11,7 @@ on:
- 'docs/**'
- 'benchmark/**'
- 'mkdocs.yml'
- 'README.md'
pull_request:
workflow_dispatch:


@@ -1,6 +1,6 @@
[package]
name = "polars_bio"
-version = "0.2.11"
+version = "0.3.0"
edition = "2021"

[lib]


@@ -1,59 +1,10 @@
# polars_bio
# polars-bio - Next-gen Python DataFrame operations for genomics!
![CI](https://github.com/biodatageeks/polars-bio/actions/workflows/publish_to_pypi.yml/badge.svg?branch=master)
![Docs](https://github.com/biodatageeks/polars-bio/actions/workflows/publish_documentation.yml/badge.svg?branch=master)
![logo](docs/assets/logo-large.png)

## Features

[polars-bio](https://pypi.org/project/polars-bio/) is a Python library for genomics built on top of [polars](https://pola.rs/), [Apache Arrow](https://arrow.apache.org/) and [Apache DataFusion](https://datafusion.apache.org/).
It provides a DataFrame API for genomics data and is designed to be blazing fast, memory efficient and easy to use.

## Genomic ranges operations

| Features | Bioframe | polars-bio | PyRanges | Pybedtools | PyGenomics | GenomicRanges |
|--------------|--------------------|---------------------|--------------------|--------------------|--------------------|--------------------|
| overlap | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| nearest | :white_check_mark: | :white_check_mark: | :white_check_mark: | | | |
| cluster | :white_check_mark: | | | | | |
| merge | :white_check_mark: | | | | | |
| complement | :white_check_mark: | | | | | |
| select/slice | :white_check_mark: | | | | | |
| | | | | | | |
| coverage | :white_check_mark: | | | | | |
| expand | :white_check_mark: | | | | | |
| sort | :white_check_mark: | | | | | |

## Input/Output
| I/O | Bioframe | polars-bio | PyRanges | Pybedtools | PyGenomics | GenomicRanges |
|------------------|--------------------|------------------------|--------------------|------------|------------|---------------|
| Pandas DataFrame | :white_check_mark: | :white_check_mark: | :white_check_mark: | | | |
| Polars DataFrame | | :white_check_mark: | | | | |
| Polars LazyFrame | | :white_check_mark: | | | | |
| Native readers | | :white_check_mark: | | | | |

## Genomic file format
| I/O | Bioframe | polars-bio | PyRanges | Pybedtools | PyGenomics | GenomicRanges |
|----------------|--------------------|------------|--------------------|------------|------------|---------------|
| BED | :white_check_mark: | | :white_check_mark: | | | |
| BAM | | | | | | |
| VCF | | | | | | |

## Performance
![img.png](benchmark/results-overlap-0.1.1.png)

![img.png](benchmark/results-overlap-df-0.1.1.png)

![img.png](benchmark/results-nearest-0.1.1.png)

## Remarks

Pyranges is multithreaded, but :

* Requires Ray backend plus
```bash
nb_cpu: int, default 1

How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple.
Will only lead to speedups on large datasets.
```

* for nearest returns no empty rows if there is no overlap (we follow Bioframe where nulls are returned)
#
Read the [documentation](https://biodatageeks.github.io/polars-bio/)