Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: bam::Record:new should return a valid record #361

Merged
merged 13 commits into from
May 22, 2024
54 changes: 53 additions & 1 deletion src/bam/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1392,7 +1392,7 @@
header_string.len(),
);

let rec = htslib::sam_hdr_parse((l_text + 1), text as *const c_char);

Check warning on line 1395 in src/bam/mod.rs

View workflow job for this annotation

GitHub Actions / Testing-Features (no-default-features)

unnecessary parentheses around function argument

Check warning on line 1395 in src/bam/mod.rs

View workflow job for this annotation

GitHub Actions / Testing-Features (no-default-features)

unnecessary parentheses around function argument

Check warning on line 1395 in src/bam/mod.rs

View workflow job for this annotation

GitHub Actions / Testing-Features (all-features)

unnecessary parentheses around function argument

Check warning on line 1395 in src/bam/mod.rs

View workflow job for this annotation

GitHub Actions / Testing-Features (all-features)

unnecessary parentheses around function argument
(*rec).text = text as *mut c_char;
(*rec).l_text = l_text;
rec
Expand Down Expand Up @@ -2451,7 +2451,7 @@
where
F: Fn(&record::Record) -> Option<bool>,
{
let mut bam_reader = Reader::from_path(bamfile).unwrap(); // internal functions, just unwarp
let mut bam_reader = Reader::from_path(bamfile).unwrap(); // internal functions, just unwrap
dcroote marked this conversation as resolved.
Show resolved Hide resolved
let header = header::Header::from_template(bam_reader.header());
let mut sam_writer = Writer::from_path(samfile, &header, Format::Sam).unwrap();
for record in bam_reader.records() {
Expand Down Expand Up @@ -3002,6 +3002,58 @@
assert_eq!(header_refseqs[0].get("LN").unwrap(), "10000000",);
}

#[test]
fn test_bam_new() {
// Create the path to write the tmp test BAM
let tmp = tempfile::Builder::new()
.prefix("rust-htslib")
.tempdir()
.expect("Cannot create temp dir");
let bampath = tmp.path().join("test.bam");

// write an unmapped BAM record (uBAM)
{
// Build the header
let mut header = Header::new();

// Add the version
header.push_record(
HeaderRecord::new(b"HD")
.push_tag(b"VN", &"1.6")
.push_tag(b"SO", &"unsorted"),
);

// Build the writer
let mut writer = Writer::from_path(&bampath, &header, Format::Bam).unwrap();

// Build an empty record
let mut record = Record::new();

// Write the record (this previously seg-faulted)
assert!(writer.write(&record).is_ok());
}

// Read the record
{
// Build th reader
let mut reader = Reader::from_path(&bampath).expect("Error opening file.");

// Read the record
let mut rec = Record::new();
match reader.read(&mut rec) {
Some(r) => r.expect("Failed to read record."),
None => panic!("No record read."),
};

// Check a few things
assert!(rec.is_unmapped());
assert_eq!(rec.tid(), -1);
assert_eq!(rec.pos(), -1);
assert_eq!(rec.mtid(), -1);
assert_eq!(rec.mpos(), -1);
}
}

#[test]
fn test_idxstats_bam() {
let mut reader = IndexedReader::from_path("test/test.bam").unwrap();
Expand Down
15 changes: 13 additions & 2 deletions src/bam/record.rs
Original file line number Diff line number Diff line change
Expand Up @@ -114,19 +114,30 @@
impl Record {
/// Create an empty BAM record.
pub fn new() -> Self {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal in this PR is to that one does not need to call any other methods on Record to have a valid record. Currently, this requires setting the read name and alignment positions (and unmapped flag if the positions are -1). So this PR has an opinion, that a new record is unmapped with empty name/seq/qual/cigar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for indirectly making me think about Default trait vs new(): https://users.rust-lang.org/t/when-to-use-default-trait/11190

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Record {
let mut record = Record {
inner: unsafe { MaybeUninit::zeroed().assume_init() },
own: true,
cigar: None,
header: None,
}
};
// The read/query name needs to be set as empty to properly initialize
// the record
record.set_qname(b"");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the writer would not write the record unless the query name is set to something (empty is ok)

// Developer note: these are needed so the returned record is properly
// initialized as unmapped.
record.set_unmapped();
record.set_tid(-1);
record.set_pos(-1);
record.set_mpos(-1);
record.set_mtid(-1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that we should also set the record to be unmapped by default. I mean, how can it be mapped if the tid/pos are -1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable but I'd quickly compare third party implementations (htslib/htsjdk/noodles) and/or spec footnotes (lots of gotchas there).

record
}

pub fn from_inner(from: *mut htslib::bam1_t) -> Self {
Record {
inner: {
#[allow(clippy::uninit_assumed_init)]
let mut inner = unsafe { MaybeUninit::uninit().assume_init() };

Check warning on line 140 in src/bam/record.rs

View workflow job for this annotation

GitHub Actions / Testing-Features (no-default-features)

the type `hts_sys::bam1_t` does not permit being left uninitialized

Check warning on line 140 in src/bam/record.rs

View workflow job for this annotation

GitHub Actions / Testing-Features (all-features)

the type `hts_sys::bam1_t` does not permit being left uninitialized
unsafe {
::libc::memcpy(
&mut inner as *mut htslib::bam1_t as *mut ::libc::c_void,
Expand Down
2 changes: 2 additions & 0 deletions src/bam/record_serde.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ use serde_bytes::{ByteBuf, Bytes};
use crate::bam::record::Record;

fn fix_l_extranul(rec: &mut Record) {
// first, reset the number of extranuls to 0 for calling .qname(); then calculate how many we actually have
rec.inner_mut().core.l_extranul = 0;
let l_extranul = rec.qname().iter().rev().take_while(|x| **x == 0u8).count() as u8;
rec.inner_mut().core.l_extranul = l_extranul;
}
Expand Down
Loading