feat: Custom column names and suffixes for overlap and nearest operations (#43)

* doc: Installation instructions
* chore: Readme refactor
* feat: Add support for custom column names and suffixes
* Fixing needless borrow
* Removing assertion and adding test case for non-default suffixes
* Creating release 0.3.0
biodatageeks · Dec 21, 2024 · 0f25a4d
1 parent 4bf723c
commit 0f25a4d

Showing 16 changed files with 421 additions and 181 deletions.
diff --git a/.github/workflows/publish_to_pypi.yml b/.github/workflows/publish_to_pypi.yml
@@ -11,6 +11,7 @@ on:
       - 'docs/**'
       - 'benchmark/**'
       - 'mkdocs.yml'
+      - 'README.md'
   pull_request:
   workflow_dispatch:
 

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -29,3 +29,14 @@ repos:
     hooks:
       - id: fmt
       - id: cargo-check
+
+### FIXME
+#  - repo: https://github.com/ddkasa/check-mkdocs.git
+#    rev: 65e819a4c62ee22c38f244b51b63f2f9b89a66d0
+#    hooks:
+#      - id: check-mkdocs
+#        name: check-mkdocs
+#        args: ["--config", "mkdocs.yml"]  # Optional, mkdocs.yml is the default
+#        # If you have additional plugins or libraries that are not included in
+#        # check-mkdocs, add them here
+#        additional_dependencies: ['mkdocs-material', 'mkdocs-jupyter', 'mkdocstrings-python']
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/Cargo.toml b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "polars_bio"
-version = "0.2.11"
+version = "0.3.0"
 edition = "2021"
 
 [lib]

diff --git a/README.md b/README.md
@@ -1,59 +1,10 @@
-# polars_bio
+# polars-bio - Next-gen Python DataFrame operations for genomics!
+![CI](https://github.com/biodatageeks/polars-bio/actions/workflows/publish_to_pypi.yml/badge.svg?branch=master)
+![Docs](https://github.com/biodatageeks/polars-bio/actions/workflows/publish_documentation.yml/badge.svg?branch=master)
+![logo](docs/assets/logo-large.png)
 
-## Features
 
+[polars-bio](https://pypi.org/project/polars-bio/) is a Python library for genomics built on top of [polars](https://pola.rs/), [Apache Arrow](https://arrow.apache.org/) and [Apache DataFusion](https://datafusion.apache.org/).
+It provides a DataFrame API for genomics data and is designed to be blazing fast, memory efficient and easy to use.
 
-## Genomic ranges operations
-
-| Features     | Bioframe           | polars-bio          | PyRanges           | Pybedtools         | PyGenomics         | GenomicRanges      |
-|--------------|--------------------|---------------------|--------------------|--------------------|--------------------|--------------------|
-| overlap      | :white_check_mark: | :white_check_mark:  | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
-| nearest      | :white_check_mark: | :white_check_mark:  | :white_check_mark: |                    |                    |                    |
-| cluster      | :white_check_mark: |                     |                    |                    |                    |                    |
-| merge        | :white_check_mark: |                     |                    |                    |                    |                    |
-| complement   | :white_check_mark: |                     |                    |                    |                    |                    |
-| select/slice | :white_check_mark: |                     |                    |                    |                    |                    |
-|              |                    |                     |                    |                    |                    |                    |
-| coverage     | :white_check_mark: |                     |                    |                    |                    |                    |
-| expand       | :white_check_mark: |                     |                    |                    |                    |                    |
-| sort         | :white_check_mark: |                     |                    |                    |                    |                    |
-
-
-## Input/Output
-| I/O              | Bioframe           | polars-bio             | PyRanges           | Pybedtools | PyGenomics | GenomicRanges |
-|------------------|--------------------|------------------------|--------------------|------------|------------|---------------|
-| Pandas DataFrame | :white_check_mark: | :white_check_mark:     | :white_check_mark: |            |            |               |
-| Polars DataFrame |                    | :white_check_mark:     |                    |            |            |               |
-| Polars LazyFrame |                    | :white_check_mark:     |                    |            |            |               |
-| Native readers   |                    | :white_check_mark:     |                    |            |            |               |
-
-
-## Genomic file format
-| I/O            | Bioframe           | polars-bio | PyRanges           | Pybedtools | PyGenomics | GenomicRanges |
-|----------------|--------------------|------------|--------------------|------------|------------|---------------|
-| BED            | :white_check_mark: |            | :white_check_mark: |            |            |               |
-| BAM            |                    |            |                    |            |            |               |
-| VCF            |                    |            |                    |            |            |               |
-
-
-## Performance
-![img.png](benchmark/results-overlap-0.1.1.png)
-
-![img.png](benchmark/results-overlap-df-0.1.1.png)
-
-![img.png](benchmark/results-nearest-0.1.1.png)
-
-## Remarks
-
-Pyranges is multithreaded, but :
-
-* Requires Ray backend plus
-```bash
-  nb_cpu: int, default 1
-
-            How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple.
-            Will only lead to speedups on large datasets.
-```
-
-* for nearest returns no empty rows if there is no overlap (we follow Bioframe where nulls are returned)
-#
+Read the [documentation](https://biodatageeks.github.io/polars-bio/)