Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to indel calling. #1679

Closed
wants to merge 31 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
bf0cbf2
Attempt 1 to fix #1459; move left/right based on STR presence.
jkbonfield Feb 11, 2022
7710d96
WIP2. "Shift7-RF1c50-2ref" method so far.
jkbonfield Feb 18, 2022
0e050ce
Fixes to consensus.
jkbonfield Feb 21, 2022
d826490
Fix display of type/TYPE cons. Debug change only
jkbonfield Feb 22, 2022
7db37bb
Apply IDV per type, not just across all indels as a whole.
jkbonfield Feb 22, 2022
9ac3c72
Tweaks to try and improve on insertion calling.
jkbonfield Feb 22, 2022
c50d25b
First stab at returning dual consensus.
jkbonfield Feb 24, 2022
bfc951f
Improvements to 2nd consensus generation
jkbonfield Feb 24, 2022
72b88f7
Correct consensus to match type, even if data disagrees.
jkbonfield Feb 24, 2022
bae483d
Removed ref_sample usage, also fix type call.
jkbonfield Feb 25, 2022
76b542b
Change default indel-size and lower limit
jkbonfield Feb 28, 2022
1295ec1
Remove debugging
jkbonfield Feb 28, 2022
b1a2a10
Improvements to consensus generation
jkbonfield Mar 1, 2022
98f4b74
Adjustments to the ref vs query region for alignments.
jkbonfield Mar 2, 2022
36b972b
Code tidyup
jkbonfield Mar 3, 2022
68485fb
Give deletions the same treatment as insertions for consensus gen.
jkbonfield Mar 4, 2022
333f203
Minor tidying plus tweak STR iscore code
jkbonfield Mar 8, 2022
c450882
Disable INS_PLUS_BASE, plus a start on better band calc.
jkbonfield Mar 8, 2022
bc9cad0
Improve STR finder to cope with longer repeat units
jkbonfield Mar 9, 2022
fa8145d
Improvements for long reads.
jkbonfield Mar 9, 2022
e07fe35
More tweaking of consensus.
jkbonfield Mar 10, 2022
bbfadc1
Lots of tidying up of consensus code.
jkbonfield Mar 10, 2022
1173c62
Fix a small buffer overrun introduced by changing tbeg/tend usage.
jkbonfield Mar 11, 2022
ac4d267
Tidy the consensus code
jkbonfield Mar 11, 2022
5271e9c
Tidy up biggest_del / max_deletion.
jkbonfield Mar 14, 2022
eee86f0
Improve l_run estQ assignment.
jkbonfield Mar 14, 2022
ed5458b
Tidy the indel changes up a bit.
jkbonfield Mar 15, 2022
9de001b
Fix test mpileup files
jkbonfield Mar 15, 2022
16834c7
Further improve the indel calling.
jkbonfield Mar 17, 2022
155e36e
Add bounds checking to heti/hetd arrays.
jkbonfield Mar 17, 2022
72c0f76
Add a check for end of reference when indel calling.
jkbonfield Mar 23, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions INSTALL
Original file line number Diff line number Diff line change
Expand Up @@ -266,3 +266,22 @@ mingw-w64-x86_64-tools-git

(The last is only needed for building libraries compatible with MSVC.)

Windows MSYS2/MINGW64
---------------------

The configure script must be used as without it the compilation will
likely fail.

Follow MSYS2 installation instructions at
https://www.msys2.org/wiki/MSYS2-installation/

Then relaunch to MSYS2 shell using the "MSYS2 MinGW x64" executable.
Once in that environment (check $MSYSTEM equals "MINGW64") install the
compilers using pacman -S and the following package list:

base-devel mingw-w64-x86_64-toolchain
mingw-w64-x86_64-libdeflate mingw-w64-x86_64-zlib mingw-w64-x86_64-bzip2
mingw-w64-x86_64-xz mingw-w64-x86_64-curl mingw-w64-x86_64-autotools
mingw-w64-x86_64-tools-git

(The last is only needed for building libraries compatible with MSVC.)
15 changes: 14 additions & 1 deletion bam2bcf.c
Original file line number Diff line number Diff line change
Expand Up @@ -234,11 +234,14 @@ int bcf_call_glfgen(int _n, const bam_pileup1_t *pl, int ref_base, bcf_callaux_t
// particularly indicative of being a good REF match either,
// at least not in low coverage. So require solid coverage
// before we start utilising such quals.
b = 0;
if (b != 0)
b = 5;
q = (int)bam_get_qual(p->b)[p->qpos];
seqQ = (3*seqQ + 2*q)/8;
}
if (_n > 20 && seqQ > 40) seqQ = 40;
// Note baseQ changes some output fields such as I16, but has no
// significant affect on "call".
baseQ = p->aux>>8&0xff;

is_diff = (b != 0);
Expand Down Expand Up @@ -357,9 +360,19 @@ int bcf_call_glfgen(int _n, const bam_pileup1_t *pl, int ref_base, bcf_callaux_t
for (i=0; i<4; i++) r->ADF[i] += lroundf((float)dp_ambig * r->ADF[i]/dp);
}

// Else consider downgrading bca->bases[] scores by AD vs AD_ref_missed
// ratios. This is detrimental on Illumina, but beneficial on PacBio CCS.
// It's possibly related to the homopolyer error likelihoods or overall
// Indel accuracy. Maybe tie this in to the -h option?

r->ori_depth = ori_depth;
// glfgen
errmod_cal(bca->e, n, 5, bca->bases, r->p); // calculate PL of each genotype

// TODO: account for the number of unassigned reads. If depth is 50,
// but AD is 5,7 then it may look like a variant but it's probably
// should be low quality.

return n;
}

Expand Down
2 changes: 1 addition & 1 deletion bam2bcf.h
Original file line number Diff line number Diff line change
Expand Up @@ -156,7 +156,7 @@ extern "C" {
int bcf_call_combine(int n, const bcf_callret1_t *calls, bcf_callaux_t *bca, int ref_base /*4-bit*/, bcf_call_t *call);
int bcf_call2bcf(bcf_call_t *bc, bcf1_t *b, bcf_callret1_t *bcr, int fmt_flag,
const bcf_callaux_t *bca, const char *ref);
int bcf_call_gap_prep(int n, int *n_plp, bam_pileup1_t **plp, int pos, bcf_callaux_t *bca, const char *ref);
int bcf_call_gap_prep(int n, int *n_plp, bam_pileup1_t **plp, int pos, bcf_callaux_t *bca, const char *ref, int ref_len);
void bcf_callaux_clean(bcf_callaux_t *bca, bcf_call_t *call);

#ifdef __cplusplus
Expand Down
Loading