header format of .tsv did not match! #178

Open
zdk427 opened this issue Apr 25, 2024 · 3 comments

Comments

@zdk427

zdk427 commented Apr 25, 2024

Hi
I am using version 1.4.2 for SNV analysis. I ran into an issue in Part 4 (filtering the same SNVs from replicates): I get the SNV-Singlets, but 7.SNV-final contains only an empty TSV file. A screenshot of the error is below:
[Screenshot: iVar error]
Could you please provide any feedback on it?
Thanks

@cmaceves
Collaborator

Hi! Sorry that you're having issues, thanks for reaching out. Could you be more specific about the origin of "Part 4" and maybe share the .tsv files with me? Based on the given error, I would assume that the SNV variants files being used are not properly formatted but it's hard to tell without sample files!

@zdk427
Author

zdk427 commented Apr 25, 2024

Sure, here is the link to the .tsv files I got after process 3, in the folder 6.Singlet-SNVs.
Please let me know if you need any further information.
https://usaskca1-my.sharepoint.com/:f:/g/personal/zdk427_usask_ca/EolxZ7UcEDpPo6qFVXKB_KoBajXB_8WyJ6k5Kx3QoU3OjA?e=Y1ihRt

@Alex-Vasile

We also ran into this issue and dug into it a bit. It's caused by the extra POS_AA column at the end.

Temporary solution for anyone having this issue

If you don't need the POS_AA data column, pre-process your variant files to remove this column.
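For anyone who wants to script that workaround, here is a minimal sketch of the pre-processing step. It is not part of iVar: `split_tsv` and `drop_column` are hypothetical helper names, and the sketch assumes the first line of the file is the header and that no cell contains a tab.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one TSV line into cells, keeping empty cells.
std::vector<std::string> split_tsv(const std::string &line) {
  std::vector<std::string> cells;
  std::string::size_type start = 0;
  while (true) {
    std::string::size_type tab = line.find('\t', start);
    if (tab == std::string::npos) {
      cells.push_back(line.substr(start));
      break;
    }
    cells.push_back(line.substr(start, tab - start));
    start = tab + 1;
  }
  return cells;
}

// Hypothetical helper: copy a TSV stream to `out`, omitting the column whose
// header cell equals `column`. If the header has no such column, every
// column is kept.
void drop_column(std::istream &in, std::ostream &out,
                 const std::string &column) {
  std::string line;
  bool header_seen = false;
  std::size_t drop_idx = std::string::npos;  // npos means "nothing to drop"
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    std::vector<std::string> cells = split_tsv(line);
    if (!header_seen) {
      // Assumption: the first line is the header row.
      header_seen = true;
      for (std::size_t i = 0; i < cells.size(); ++i) {
        if (cells[i] == column) drop_idx = i;
      }
    }
    bool first = true;
    for (std::size_t i = 0; i < cells.size(); ++i) {
      if (i == drop_idx) continue;
      if (!first) out << '\t';
      out << cells[i];
      first = false;
    }
    out << '\n';
  }
}
```

Calling `drop_column(std::cin, std::cout, "POS_AA")` from a small `main` would turn this into a filter you can run over each variants file before passing it on.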

In-depth Info

  1. call_variants_from_plup prints a POS_AA column heading (from a hardcoded set of column headings inside the function):

    "\tALT_CODON"
    "\tALT_AA"
    "\tPOS_AA"
    << std::endl;

  2. common_variants calls read_variant_file which first checks if the headers are correct:

    while (std::getline(line_stream, cell, '\t')) {
      if (cell.compare(fields[ctr]) != 0) {
        return -1;
      }
      ctr++;
    }

However, this check has two issues:

  1. It uses a parallel, out-of-date set of header names that is missing POS_AA. This function and call_variants_from_plup should share a single set of fields so there aren't two parallel structures to update when a change happens.
  2. The code has an out-of-bounds error, which is what's happening now. The loop keeps reading header columns and indexing into fields even after ctr >= NUM_FIELDS. It should terminate if there are more than NUM_FIELDS entries.

const int NUM_FIELDS = 19;
const std::string fields[NUM_FIELDS] = {
"REGION", "POS", "REF", "ALT", "REF_DP",
"REF_RV", "REF_QUAL", "ALT_DP", "ALT_RV", "ALT_QUAL",
"ALT_FREQ", "TOTAL_DP", "PVAL", "PASS", "GFF_FEATURE",
"REF_CODON", "REF_AA", "ALT_CODON", "ALT_AA"};

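To make both points concrete, here is a rough sketch of what a bounds-checked header test against a single shared field list could look like. It is not the actual iVar code: `variant_fields` and `check_header` are hypothetical names. Rejecting headers with too few columns is also an assumption on my part, since it would reject older files written without POS_AA.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical single source of truth for the header, which both the writer
// (call_variants_from_plup) and the reader (read_variant_file) could use.
const std::vector<std::string> &variant_fields() {
  static const std::vector<std::string> fields = {
      "REGION",    "POS",      "REF",       "ALT",     "REF_DP",
      "REF_RV",    "REF_QUAL", "ALT_DP",    "ALT_RV",  "ALT_QUAL",
      "ALT_FREQ",  "TOTAL_DP", "PVAL",      "PASS",    "GFF_FEATURE",
      "REF_CODON", "REF_AA",   "ALT_CODON", "ALT_AA",  "POS_AA"};
  return fields;
}

// Returns 0 if the header line matches the expected fields exactly,
// -1 otherwise (wrong name, too many columns, or too few columns).
int check_header(const std::string &header_line) {
  const std::vector<std::string> &fields = variant_fields();
  std::istringstream line_stream(header_line);
  std::string cell;
  std::size_t ctr = 0;
  while (std::getline(line_stream, cell, '\t')) {
    // Check the bound before indexing, so an extra column is reported as a
    // mismatch instead of reading past the end of the field list.
    if (ctr >= fields.size() || cell != fields[ctr]) {
      return -1;
    }
    ++ctr;
  }
  // Assumption: a header with missing trailing columns is also a mismatch.
  return ctr == fields.size() ? 0 : -1;
}
```

With the list in one place, adding a column means touching one definition, and the writer can print its header by iterating over the same vector.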
Also worth considering is changing the error message from common_variants. It currently gives the incorrect impression that the header formats of A_variant and B_variant do not match each other, when what it actually means is that they don't match the expected header. It would be worth rewording that message and also printing both the received header and the expected header.
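As an illustration of the kind of message I mean, here is a small sketch. The wording and the names `join_with_tabs` and `header_mismatch_message` are made up for this example and are not taken from iVar.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: join the expected field names with tabs so they can
// be compared by eye with the header line read from the file.
std::string join_with_tabs(const std::vector<std::string> &cells) {
  std::string joined;
  for (std::size_t i = 0; i < cells.size(); ++i) {
    if (i > 0) joined += '\t';
    joined += cells[i];
  }
  return joined;
}

// Hypothetical helper: build an error message that names the offending file
// and shows both the expected and the received header.
std::string header_mismatch_message(
    const std::string &file_name, const std::string &received_header,
    const std::vector<std::string> &expected_fields) {
  std::string message = "Header of " + file_name +
                        " does not match the expected variant header.\n";
  message += "  expected: " + join_with_tabs(expected_fields) + "\n";
  message += "  received: " + received_header + "\n";
  return message;
}
```

Printing the result to `std::cerr` for whichever input file fails the check would make it clear that the comparison is against the expected header, not between the two files.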
