Hello,
How is the output of modkit extract calls sorted? I have noticed that it does not seem to be sorted by genomic position, nor do the reads occur in the same order as in the input BAM file. This also seems to be the case when using the --bgzf flag. Isn't the idea of compressing with bgzip that an index can then be created using tabix, though this requires that the input file was originally sorted by sequence and position? I sorted the output files myself using sort -k4,4 -k3,3n, but this took several hours because of the size of the files produced by extract calls. Would it be possible to add a flag for pre-sorted output, to save having to perform this step?
Cheers,
Richard
If you use the --ignore-index flag, the reads in the output should be in the same order as in the input modBAM, but this routine doesn't leverage as much parallelism.
Isn't the idea of compressing with bgzip that an index can then be created using tabix, though this requires that the input file was originally sorted by sequence and position?
Actually, the idea is to make the output smaller. As you've likely found, the output is grouped by read, and within each group the records are sorted, but this isn't the sorting that tabix usually requires (contig/position). If you sort the table by contig and position, then joining the calls by read_id becomes more difficult.
If you want to look at methylation calls per genomic position, I'd recommend using pileup. On the other hand, if you want "rapid access" to the read-level information, I recommend running modkit extract calls with the --region option or an --include-bed file. These options will use an indexed, sorted modBAM and quickly retrieve the reads in the region you're querying.
One nice thing about the current grouping is that if you stream the output to another program, you can operate on each read's calls as a unit. I'll consider adding a flag that sorts the output the way you're asking, but I think using --region and piping to sort might be a good way to do it. Maybe you can tell me more about your use case?
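To illustrate the two access patterns mentioned above, here is a minimal Python sketch, not part of modkit itself. It assumes the table is tab-separated with a header row and that the columns are named read_id, forward_read_position, ref_position and chrom (the last two inferred from the sort -k4,4 -k3,3n command in the question). The function names and the sample rows are hypothetical, so check them against your actual output.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def calls_by_read(lines):
    """Yield (read_id, rows) for each run of consecutive rows sharing a read_id.

    Relies on the table being grouped by read, so only one read's rows
    are held in memory at a time.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for read_id, rows in groupby(reader, key=itemgetter("read_id")):
        yield read_id, list(rows)


def sort_by_position(rows):
    """Sort rows by contig, then numeric reference position.

    Comparable to `sort -k4,4 -k3,3n`; this loads everything into memory,
    so it is only sensible for region-sized tables.
    """
    return sorted(rows, key=lambda r: (r["chrom"], int(r["ref_position"])))


# Hypothetical sample table; the column names are assumptions.
SAMPLE = (
    "read_id\tforward_read_position\tref_position\tchrom\n"
    "read_b\t10\t500\tchr2\n"
    "read_b\t42\t532\tchr2\n"
    "read_a\t7\t120\tchr1\n"
)

if __name__ == "__main__":
    for read_id, rows in calls_by_read(io.StringIO(SAMPLE)):
        print(read_id, len(rows))
```

In practice you would pass sys.stdin (or an open file) in place of the io.StringIO sample. The grouping function only works because each read's rows are contiguous, which is exactly the property that a contig/position sort would destroy.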