add test cases for maximum e-value filter on alignment results #7

katrinakalantar · 2020-06-12T17:54:23Z

Assertion: The maximum e-value for alignments in IDseq is 1.

Implementation Details:
The maximum e-value threshold filter is applied in two different locations within the code base:

For short read alignments, the filter is applied inside the iterate_m8() function in the .m8 utils.
For contig alignments, the filter is applied using filters in PipelineStepBlastContigs.
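
As an illustration of the kind of filter described above, here is a minimal sketch. It is not the actual iterate_m8() or PipelineStepBlastContigs code. The names `MAX_EVALUE`, `EVALUE_COLUMN`, and `filter_by_evalue` are hypothetical. It assumes the standard 12-column tab-separated .m8 layout, with the e-value in the 11th column, and that the column holds the e-value itself rather than a log-transformed value.

```python
from typing import Iterable, Iterator

MAX_EVALUE = 1.0    # hypothetical constant name; the value comes from the assertion above
EVALUE_COLUMN = 10  # 0-based index of the e-value in 12-column BLAST tabular (.m8) output


def filter_by_evalue(lines: Iterable[str], max_evalue: float = MAX_EVALUE) -> Iterator[str]:
    """Yield only the .m8 rows whose e-value does not exceed max_evalue."""
    for line in lines:
        # Skip blank lines and comment lines, which carry no alignment.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Rows with e-value exactly equal to the threshold are kept,
        # matching "there should never be e-values > 1".
        if float(fields[EVALUE_COLUMN]) > max_evalue:
            continue
        yield line
```

Writing the filter as a generator means it can sit between the raw .m8 reader and whatever selects the top hits, so rows above the threshold never reach the downstream files.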

We expect that there may be alignments with e-values > 1 in the initial alignment files (gsnap.m8, rapsearch2.m8, gsnap.blast.m8, rapsearch2.blast.m8).
The filter is then applied to the raw .m8 results when parsing for the top hits. There should never be e-values > 1 in the following files:

gsnap.deduped.m8
rapsearch2.deduped.m8
gsnap.blast.top.m8
rapsearch2.blast.top.m8

This was implemented as part of chanzuckerberg/czid-dag#309

Test Sample:
This was tested on staging using the benchmark sample UnAmbiguouslyMapped_ds.gut. In particular, staging sample ID 19379 was run prior to the fix, and staging sample ID 19361 was run after the fix.

For example, in sample 19361:
gsnap.m8 has 32 rows with e-value > 1, but gsnap.deduped.m8 has zero.
rapsearch2.m8 has 45 rows with e-value > 1, but rapsearch2.deduped.m8 has zero.
rapsearch2.blast.m8 has 5172 rows with e-value > 1, but rapsearch2.blast.top.m8 has zero.
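
A test case for this could count the rows above the threshold in each output file. The sketch below is one possible shape for such a check, not an existing IDseq test. The helper name `count_rows_above_evalue` is hypothetical, and it assumes the files are 12-column tab-separated .m8 files with the raw e-value in the 11th column.

```python
def count_rows_above_evalue(m8_path: str, max_evalue: float = 1.0) -> int:
    """Count the rows in an .m8 file whose e-value is greater than max_evalue."""
    count = 0
    with open(m8_path) as handle:
        for line in handle:
            # Ignore blank lines and comment lines.
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            # Column 11 (0-based index 10) is the e-value in BLAST tabular output.
            if float(fields[10]) > max_evalue:
                count += 1
    return count
```

A test could then assert that the raw files are allowed to contain such rows while the filtered ones are not, for example `assert count_rows_above_evalue("gsnap.deduped.m8") == 0`, with the path pointing at wherever the sample's results are stored.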


katrinakalantar mentioned this issue Jun 12, 2020

add e-value threshold to require internal alignments have e-value < 1 chanzuckerberg/czid-dag#309

Merged