Base sequence quality implementation #133

Draft
wants to merge 6 commits into
base: master
from

@bscisel bscisel commented May 30, 2025

No description provided.

bscisel added 6 commits May 20, 2025 19:22
- Updated `quality_stats.py` to support the FASTQ file format in the base sequence quality function.
- Refactored the scanning and frame functions to use the new UDAF for base sequence quality calculations.
- Implemented a new UDAF in `udaf.rs` to compute quality-score statistics, including average, median, and quartiles.
- Modified `context.rs` to register the new UDAF for use in SQL queries.
- Adjusted `operation.rs` to execute the new SQL query for base sequence quality analysis.
- Added deregistration functionality for tables in `scan.rs`.
- Ensured compatibility with FASTQ format in input handling.
- Updated the `base_sequence_quality` function to accept a quality scores column and an output type.
- Introduced `BaseSequenceQualityProvider` and `BaseSequenceQualityExec` in Rust for efficient execution plans.
- Removed the custom UDAF for quality scores and replaced it with a DataFusion table provider.
- Simplified data handling by directly using DataFrames from DataFusion.
- Cleaned up unnecessary code and files related to UDAF implementation.
- Enhanced error handling and type checking for input data.
- Added `SequenceQualityHistogramProvider` and `SequenceQualityHistogramExec` to compute quality histograms from sequence data.
- Introduced `QuantileStatsTableProvider` and `QuantileStatsExec` for calculating quantile statistics based on histogram data.
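
For readers unfamiliar with the histogram-then-quantiles approach named in the last two bullets, here is a minimal Python sketch of the idea. It is not the Rust code from this PR: the function names, the Phred+33 offset, and the "smallest score whose cumulative count reaches the threshold" quantile rule are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: quality strings use the Phred+33 (Sanger / Illumina 1.8+) encoding.
PHRED_OFFSET = 33


def position_histograms(quality_strings, offset=PHRED_OFFSET):
    """Build one histogram (score -> count) per read position."""
    histograms = []
    for qual in quality_strings:
        # Grow the list so it covers the longest read seen so far.
        while len(histograms) < len(qual):
            histograms.append(Counter())
        for pos, char in enumerate(qual):
            histograms[pos][ord(char) - offset] += 1
    return histograms


def quantile_from_histogram(hist, q):
    """Return the smallest score whose cumulative count reaches q * total.

    Hypothetical quantile rule; the PR may interpolate differently.
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    threshold = q * total
    cumulative = 0
    for score in sorted(hist):
        cumulative += hist[score]
        if cumulative >= threshold:
            return score
    return max(hist)


def mean_from_histogram(hist):
    """Average score, weighted by how often each score occurs."""
    total = sum(hist.values())
    return sum(score * count for score, count in hist.items()) / total


def position_stats(quality_strings):
    """One row of summary statistics per read position."""
    rows = []
    for pos, hist in enumerate(position_histograms(quality_strings)):
        rows.append({
            "pos": pos,
            "avg": mean_from_histogram(hist),
            "q1": quantile_from_histogram(hist, 0.25),
            "median": quantile_from_histogram(hist, 0.5),
            "q3": quantile_from_histogram(hist, 0.75),
        })
    return rows


if __name__ == "__main__":
    # 'I' decodes to Phred 40 and '5' to Phred 20 under the +33 offset.
    for row in position_stats(["II5", "I5", "5"]):
        print(row)
```

Splitting the work this way means the quantile step only ever sees a small histogram per position (a few dozen distinct Phred scores at most), rather than every individual quality value.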