Add a function to read mate alignments #329

Draft
wants to merge 4 commits into
base: master

@alumi alumi commented Dec 23, 2024

Summary

This PR adds a function, cljam.io.sam/read-mate-alignments, for BAM files. Given a collection of alignments, it reads their R1/R2 counterparts (mates).

Added functions

This function can be used to collect the alignments around a region while keeping pairs together, like samtools view --fetch-pairs.

(let [xs (sam/read-alignments reader {:chr "chr1", :start 1, :end 100000})]
  (concat xs (sam/read-mate-alignments reader xs)))

In the example above, the caller still has to work out which alignment pairs with which. To make this easier, I added another function, cljam.io.sam/make-pairs, that returns a sequence in which paired alignments are grouped together.

(let [xs (sam/read-alignments reader {:chr "chr1", :start 1, :end 100000})]
  (sam/make-pairs reader xs))
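
To illustrate what "grouped together" can mean, here is a minimal in-memory sketch of pairing by read name. It is not the implementation in this PR: `first-in-pair?` and `group-pairs` are hypothetical helpers, and the sketch assumes each alignment is a map with `:qname` and `:flag` keys.

```clojure
;; Hypothetical sketch, not the PR's implementation: group alignments by
;; read name, assuming each alignment is a map with :qname and :flag.
(defn first-in-pair?
  "True when the 0x40 (first segment, i.e. R1) bit of the SAM flag is set."
  [aln]
  (pos? (bit-and (long (:flag aln)) 0x40)))

(defn group-pairs
  "Returns a vector of groups, one per read name, in order of first
  appearance. Within a group, R1 is placed before R2."
  [alns]
  (let [order   (distinct (map :qname alns))
        by-name (group-by :qname alns)]
    (mapv (fn [qname]
            (vec (sort-by #(if (first-in-pair? %) 0 1) (by-name qname))))
          order)))
```

A read whose mate is absent ends up in a group of one. Secondary and supplementary alignments share the read name of the primary alignment, so a real implementation would have to decide how to treat them; this sketch simply keeps them in the same group.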

To support these functions, I implemented cljam.io.bam-index/get-spans-for-regions, which queries the BAI index for the chunks corresponding to multiple regions at once.
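
The idea of querying several regions at once can be sketched at the level of genomic coordinates. The helpers below are hypothetical (`mate-region`, `merge-regions`), and the alignment keys `:flag`, `:rname`, `:rnext`, and `:pnext` are assumed to follow the SAM field names. The actual function works on BAI chunks rather than on regions, so treat this only as an illustration of why batching helps.

```clojure
;; Hypothetical sketch: derive the regions where mates are expected and merge
;; nearby ones, so that overlapping parts of the file are visited only once.
(defn mate-region
  "Returns {:chr :start :end} for the mate of aln, or nil when the read is
  unpaired (0x1 unset) or its mate is unmapped (0x8 set)."
  [{:keys [flag rname rnext pnext]}]
  (when (and (pos? (bit-and flag 0x1))
             (zero? (bit-and flag 0x8)))
    ;; In SAM, an RNEXT of "=" means "same reference as RNAME".
    {:chr (if (= rnext "=") rname rnext), :start pnext, :end pnext}))

(defn merge-regions
  "Sorts regions by chromosome and start, then merges regions on the same
  chromosome that overlap or lie within `gap` bases of each other."
  [gap regions]
  (reduce (fn [acc {:keys [chr start end] :as r}]
            (let [prev (peek acc)]
              (if (and prev
                       (= chr (:chr prev))
                       (<= start (+ (:end prev) gap)))
                (conj (pop acc) (update prev :end max end))
                (conj acc r))))
          []
          (sort-by (juxt :chr :start) regions)))
```

For example, `(merge-regions 100 (keep mate-region xs))` would yield one query region per cluster of mate positions instead of one per alignment.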

Implementation details

cljam.io.sam/read-mate-alignments does not support SAM or CRAM yet.

When searching for mates, a large number of alignment blocks that do not meet the criteria must be discarded. Therefore, the current implementation decodes only the minimal set of fields necessary for condition evaluation and immediately rejects blocks that do not satisfy the criteria.
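
As a rough sketch of this kind of early rejection, the code below reads only the fixed-offset fields of a BAM record. The offsets come from the BAM record layout in the SAM specification, counted from the byte after block_size: refID at 0, pos at 4 (0-based), flag at 14 (uint16). `peek-fields` and `wanted?` are hypothetical names, and which fields the PR actually decodes is an assumption here.

```clojure
(import '(java.nio ByteBuffer ByteOrder))

;; Hypothetical sketch: look at a few fixed-offset fields of a BAM alignment
;; record without decoding the name, CIGAR, sequence, or qualities.
(defn peek-fields
  "Reads refID, position (converted to 1-based) and flag from a buffer whose
  index 0 is the first byte after block_size."
  [^ByteBuffer buf]
  (let [^ByteBuffer b (.order (.duplicate buf) ByteOrder/LITTLE_ENDIAN)]
    {:ref-id (.getInt b 0)
     :pos    (inc (.getInt b 4))
     ;; flag is an unsigned 16-bit value, so mask off the sign extension.
     :flag   (bit-and (long (.getShort b 14)) 0xFFFF)}))

(defn wanted?
  "True when the record's [ref-id pos] is one of the expected mate positions.
  `wanted` is a set of [ref-id pos] vectors."
  [wanted buf]
  (let [{:keys [ref-id pos]} (peek-fields buf)]
    (contains? wanted [ref-id pos])))
```

Only records that pass a check like `wanted?` would then need a full decode and a comparison of the read name.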

As a result, a significant part of the execution time is spent decompressing BGZF blocks, so reducing the number of chunks accessed would make a big difference.
However, the current implementation only combines existing functionality, which leaves room for further optimization in the future.

Tests

Since we don't have a good BAM file with paired reads of an appropriate size, I added auto-generated test cases for cljam.io.sam/read-mate-alignments. Each test writes a temporary BAM file containing paired alignments on the fly, then reads it back and checks the result.
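
The generated-data approach could look roughly like the sketch below. It only builds alignment maps in memory and computes the expected mates; writing the temporary BAM file and calling the reader are left out. `random-pair` and `expected-mates` are hypothetical helpers, and the map keys and fixed flags (99 for R1, 147 for R2) are assumptions chosen for illustration.

```clojure
;; Hypothetical sketch of generating paired test data with consistent
;; mate fields, plus an in-memory oracle for the expected result.
(defn random-pair
  "Returns [r1 r2] on chr1 whose :pnext fields point at each other."
  [^java.util.Random rng i]
  (let [pos1  (inc (.nextInt rng 10000))
        pos2  (+ pos1 (.nextInt rng 500))
        qname (str "read" i)]
    [{:qname qname :flag 99  :rname "chr1" :pos pos1 :rnext "=" :pnext pos2}
     {:qname qname :flag 147 :rname "chr1" :pos pos2 :rnext "=" :pnext pos1}]))

(defn expected-mates
  "Given every generated alignment and a subset xs, returns the alignments
  that share a read name with xs but are not themselves in xs."
  [all xs]
  (let [given (set xs)
        names (set (map :qname xs))]
    (filterv #(and (contains? names (:qname %))
                   (not (contains? given %)))
             all)))
```

A test can then compare what the reader returns for `xs` against `(expected-mates all xs)`, using a seeded `java.util.Random` so that failures are reproducible.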


codecov bot commented Dec 23, 2024

Codecov Report

Attention: Patch coverage is 97.77778% with 2 lines in your changes missing coverage. Please review.

Project coverage is 90.10%. Comparing base (18f8cca) to head (6337013).

Files with missing lines       Patch %   Lines
src/cljam/io/bam/reader.clj    97.91%    1 Missing ⚠️
src/cljam/io/sam.clj           95.45%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #329      +/-   ##
==========================================
+ Coverage   89.97%   90.10%   +0.12%     
==========================================
  Files         104      104              
  Lines        9341     9435      +94     
  Branches      490      491       +1     
==========================================
+ Hits         8405     8501      +96     
+ Misses        446      443       -3     
- Partials      490      491       +1     

