Many output errors and warnings when running the minimal example #193

Open
inodb opened this issue Jun 9, 2022 · 2 comments

Comments

@inodb
Member

inodb commented Jun 9, 2022

Seems like things get annotated correctly but tons of errors about reading fields:

Something went wrong reading field Entrez_Gene_Id
Something went wrong reading field dbSNP_RS
Something went wrong reading field Match_Norm_Seq_Allele1
Something went wrong reading field Validation_Method
Something went wrong reading field Match_Norm_Seq_Allele2
Something went wrong reading field n_ref_count
Something went wrong reading field t_alt_count
Something went wrong reading field BAM_File
Something went wrong reading field Variant_Classification
Something went wrong reading field dbSNP_Val_Status
Something went wrong reading field Mutation_Status
Something went wrong reading field Matched_Norm_Sample_Barcode
Something went wrong reading field Validation_Status
Something went wrong reading field Variant_Type
Something went wrong reading field Strand
Something went wrong reading field Hugo_Symbol
Something went wrong reading field Sequencer
Something went wrong reading field n_alt_count
Something went wrong reading field Center
Something went wrong reading field Match_Norm_Validation_Allele2
Something went wrong reading field Tumor_Sample_Barcode
Something went wrong reading field Verification_Status
Something went wrong reading field t_ref_count
Something went wrong reading field Tumor_Seq_Allele2
Something went wrong reading field Match_Norm_Validation_Allele1
Something went wrong reading field Score
Something went wrong reading field Sequencing_Phase
Something went wrong reading field Tumor_Validation_Allele2
Something went wrong reading field Tumor_Validation_Allele1
Something went wrong reading field NCBI_Build
Something went wrong reading field Sequence_Source
Something went wrong reading field Entrez_Gene_Id
Something went wrong reading field dbSNP_RS
Something went wrong reading field Match_Norm_Seq_Allele1
Something went wrong reading field Validation_Method
Something went wrong reading field Match_Norm_Seq_Allele2
Something went wrong reading field n_ref_count
Something went wrong reading field t_alt_count
Something went wrong reading field BAM_File
Something went wrong reading field Variant_Classification
Something went wrong reading field dbSNP_Val_Status
Something went wrong reading field Mutation_Status
Something went wrong reading field Matched_Norm_Sample_Barcode
Something went wrong reading field Validation_Status
Something went wrong reading field Variant_Type
Something went wrong reading field Strand
Something went wrong reading field Hugo_Symbol
Something went wrong reading field Sequencer
Something went wrong reading field n_alt_count
Something went wrong reading field Center
Something went wrong reading field Match_Norm_Validation_Allele2
Something went wrong reading field Tumor_Sample_Barcode
Something went wrong reading field Verification_Status
Something went wrong reading field t_ref_count
Something went wrong reading field Tumor_Seq_Allele2
Something went wrong reading field Match_Norm_Validation_Allele1
Something went wrong reading field Score
Something went wrong reading field Sequencing_Phase
Something went wrong reading field Tumor_Validation_Allele2
Something went wrong reading field Tumor_Validation_Allele1
Something went wrong reading field NCBI_Build
Something went wrong reading field Sequence_Source
Something went wrong reading field Entrez_Gene_Id
Something went wrong reading field dbSNP_RS
Something went wrong reading field Match_Norm_Seq_Allele1
Something went wrong reading field Validation_Method
Something went wrong reading field Match_Norm_Seq_Allele2
Something went wrong reading field n_ref_count
Something went wrong reading field t_alt_count
Something went wrong reading field BAM_File
Something went wrong reading field Variant_Classification
Something went wrong reading field dbSNP_Val_Status
Something went wrong reading field Mutation_Status
Something went wrong reading field Matched_Norm_Sample_Barcode
Something went wrong reading field Validation_Status
Something went wrong reading field Variant_Type
Something went wrong reading field Strand
Something went wrong reading field Hugo_Symbol
Something went wrong reading field Sequencer
Something went wrong reading field n_alt_count
Something went wrong reading field Center
Something went wrong reading field Match_Norm_Validation_Allele2
Something went wrong reading field Tumor_Sample_Barcode
Something went wrong reading field Verification_Status
Something went wrong reading field t_ref_count
Something went wrong reading field Tumor_Seq_Allele2
Something went wrong reading field Match_Norm_Validation_Allele1
Something went wrong reading field Score
Something went wrong reading field Sequencing_Phase
Something went wrong reading field Tumor_Validation_Allele2
Something went wrong reading field Tumor_Validation_Allele1
Something went wrong reading field NCBI_Build
Something went wrong reading field Sequence_Source

Annotation Summary:
	Records with ambiguous SNP and INDEL allele changes:  0
	All variants annotated successfully without failures!
@ozguzMete
Contributor

These are basically columns that are missing from the example MAF file. Interestingly, we get the column names from the file itself, but we then merge them with the predefined list of column names inside MutationRecord. MutationRecord has 36 headers, while the MAF format has 126 columns in total... These 36 columns look more "required" than the rest, but judging by your comment they are not that "required".

Do we really need to use the predefined column names list inside MutationRecord? If not we can solve the problem by removing this merge operation.

The code gets an IllegalArgumentException because a column with the given name is not defined. We can simply suppress this exception, since the annotation is "correct", or we can log a warning instead of an error.

In both cases, we should print a clearer error/warning message when we get the IllegalArgumentException at that point, something like this:
No such column name: XXXXXXXX -- or -- Missing column: XXXXXXX
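
A minimal sketch of the "log a warning instead of an error" option, assuming the field lookup throws IllegalArgumentException for an undefined column as described above. The class and method names (`MissingColumnSketch`, `readStrict`, `readOrEmpty`) are hypothetical and are not taken from the pipeline's code; a plain `Map` stands in for a parsed row.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

class MissingColumnSketch {
    private static final Logger LOG =
            Logger.getLogger(MissingColumnSketch.class.getName());

    // Hypothetical stand-in for a row lookup that throws when the
    // requested column is not defined in the input file.
    static String readStrict(Map<String, String> row, String column) {
        if (!row.containsKey(column)) {
            throw new IllegalArgumentException("No such column name: " + column);
        }
        return row.get(column);
    }

    // Returns the value if the column exists; otherwise logs a warning
    // that names the column and falls back to an empty string.
    static String readOrEmpty(Map<String, String> row, String column) {
        try {
            return readStrict(row, column);
        } catch (IllegalArgumentException e) {
            LOG.warning("Missing column: " + column);
            return "";
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("Tumor_Seq_Allele1", "T");

        System.out.println(readOrEmpty(row, "Tumor_Seq_Allele1"));
        // Hugo_Symbol is absent here, so a warning is logged and "" is returned.
        System.out.println(readOrEmpty(row, "Hugo_Symbol").isEmpty());
    }
}
```

With this shape the message names the offending column once per lookup, and the annotation can continue with an empty value for optional fields.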

@inodb
Member Author

inodb commented Jun 14, 2022

Let's change the behavior:

  1. Check that the minimum 5 columns exist (chrom, start_pos, end_pos, ref, Tumor_Seq_Allele1), same as in data/minimal_example.in.txt.
  2. Other columns can be safely ignored (we do want to keep them in the output file).
  3. For the output: right now a lot of extra empty columns are written (see minimal_example.out.uniprot.txt). Maybe we can have an optional argument to indicate "only output new columns", e.g. --output-format minimal or --output-format mskcc. Maybe in the future a format file, so you can add custom format files (Custom output format files #194).
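
Step 1 could be sketched roughly as below. This assumes the five names in the list are the literal header names of a tab-separated input file, which should be verified against data/minimal_example.in.txt; `MinimalColumnCheck` and `missingRequired` are hypothetical names, not existing pipeline code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MinimalColumnCheck {
    // Assumed header names, copied from the list in the comment above.
    static final List<String> REQUIRED = Arrays.asList(
            "chrom", "start_pos", "end_pos", "ref", "Tumor_Seq_Allele1");

    // Returns the required columns that are absent from a tab-separated
    // header line; an empty list means the file can be annotated.
    static List<String> missingRequired(String headerLine) {
        Set<String> present =
                new LinkedHashSet<>(Arrays.asList(headerLine.split("\t", -1)));
        List<String> missing = new ArrayList<>();
        for (String column : REQUIRED) {
            if (!present.contains(column)) {
                missing.add(column);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Extra columns such as Hugo_Symbol are simply ignored by the check.
        String header = "chrom\tstart_pos\tHugo_Symbol";
        System.out.println("Missing required columns: " + missingRequired(header));
    }
}
```

Checking the header once up front would replace the per-field error messages with a single report of what is actually required.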
