KeyError: 'Coding sequence' #255

Open
gianfilippo opened this issue Jan 9, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@gianfilippo

Description of the bug

Hi,
I am getting the following error. Can you please help?
I am quite new to this sort of analysis and not very familiar with the tools. From what I understand, I may be missing some fields in my VEP-annotated VCF.
Thanks

ERROR ~ Error executing process > 'NFCORE_EPITOPEPREDICTION:EPITOPEPREDICTION:EPYTOPE_PEPTIDE_PREDICTION_VAR (4)'

Caused by:
Process NFCORE_EPITOPEPREDICTION:EPITOPEPREDICTION:EPYTOPE_PEPTIDE_PREDICTION_VAR (4) terminated with an error exit status (1)

Command executed:

# create folder for MHCflurry downloads to avoid permission problems when running pipeline with docker profile and mhcflurry selected

mkdir -p mhcflurry-data
export MHCFLURRY_DATA_DIR=./mhcflurry-data

# specify MHCflurry release for which to download models, need to be updated here as well when MHCflurry will be updated

export MHCFLURRY_DOWNLOADS_CURRENT_RELEASE=1.4.0

# Add non-free software to the PATH

shopt -s nullglob
IFS=',' read -r -a netmhc_paths_string <<< ""
for p in "${netmhc_paths_string[@]}"; do
export PATH="$(realpath -s "$p"):$PATH";
done
shopt -u nullglob

epaa.py --identifier 649T.GATK_sn_L.gene_region.WES.output-tnhap2.vep.chr12 --alleles 'A01:01;A68:01;B27:05;B57:01;C02:02;C06:02' --tools 'mhcflurry' --max_length 11 --min_length 8 --versions versions.csv --genome_reference 'https://www.ensembl.org' --somatic_mutation 649T.GATK_sn_L.gene_region.WES.output-tnhap2.vep.chr12.vcf

cat <<-END_VERSIONS > versions.yml
"NFCORE_EPITOPEPREDICTION:EPITOPEPREDICTION:EPYTOPE_PEPTIDE_PREDICTION_VAR":
python: $(python --version 2>&1 | sed 's/Python //g')
epytope: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('epytope').version)")
pandas: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
pyvcf: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('PyVCF3').version)")
mhcflurry: $(mhcflurry-predict --version 2>&1 | sed 's/^mhcflurry //; s/ .*$//')
mhcnuggets: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('mhcnuggets').version)")
END_VERSIONS

Command exit status:
1

Command output:
2025-01-09 13:45:07,311 - main - INFO - Running Epitope Prediction And Annotation version: 1.1
2025-01-09 13:45:07,312 - main - INFO - Starting predictions at 2025-01-09 13:45:07
2025-01-09 13:45:07,312 - main - INFO - Running epaa for variants...
2025-01-09 13:45:07,338 - main - WARNING - FORMAT entry PID not defined for 649N. Skipping.
2025-01-09 13:45:07,338 - main - WARNING - FORMAT entry PGT not defined for 649N. Skipping.
2025-01-09 13:45:07,338 - main - WARNING - FORMAT entry PS not defined for 649N. Skipping.
2025-01-09 13:45:07,339 - main - WARNING - FORMAT entry PID not defined for 649T. Skipping.
2025-01-09 13:45:07,339 - main - WARNING - FORMAT entry PGT not defined for 649T. Skipping.
2025-01-09 13:45:07,339 - main - WARNING - FORMAT entry PS not defined for 649T. Skipping.
2025-01-09 13:45:07,339 - main - WARNING - FORMAT entry PID not defined for 649N. Skipping.
2025-01-09 13:45:07,339 - main - WARNING - FORMAT entry PGT not defined for 649N. Skipping.
2025-01-09 13:45:07,339 - main - WARNING - FORMAT entry PS not defined for 649N. Skipping.
2025-01-09 13:45:07,339 - main - WARNING - FORMAT entry PID not defined for 649T. Skipping.
2025-01-09 13:45:07,339 - main - WARNING - FORMAT entry PGT not defined for 649T. Skipping.
2025-01-09 13:45:07,339 - main - WARNING - FORMAT entry PS not defined for 649T. Skipping.

Command error:
INFO:main:Running epaa for variants...
WARNING:main:FORMAT entry PID not defined for 649N. Skipping.
WARNING:main:FORMAT entry PGT not defined for 649N. Skipping.
WARNING:main:FORMAT entry PS not defined for 649N. Skipping.
WARNING:main:FORMAT entry PID not defined for 649T. Skipping.
WARNING:main:FORMAT entry PGT not defined for 649T. Skipping.
WARNING:main:FORMAT entry PS not defined for 649T. Skipping.
WARNING:main:FORMAT entry PID not defined for 649N. Skipping.
WARNING:main:FORMAT entry PGT not defined for 649N. Skipping.
WARNING:main:FORMAT entry PS not defined for 649N. Skipping.
WARNING:main:FORMAT entry PID not defined for 649T. Skipping.
WARNING:main:FORMAT entry PGT not defined for 649T. Skipping.
WARNING:main:FORMAT entry PS not defined for 649T. Skipping.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Coding sequence'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/myhome/.nextflow/assets/nf-core/epitopeprediction/bin/epaa.py", line 1310, in <module>
main()
File "/home/myhome/.nextflow/assets/nf-core/epitopeprediction/bin/epaa.py", line 1146, in main
transcriptProteinTable,
File "/home/myhome/.nextflow/assets/nf-core/epitopeprediction/bin/epaa.py", line 722, in make_predictions_from_variants
generator.generate_transcripts_from_variants(variants_all, martsadapter, ID_SYSTEM_USED)
File "/home/myhome/.nextflow/assets/nf-core/epitopeprediction/bin/epaa.py", line 720, in
p
File "/usr/local/lib/python3.7/site-packages/epytope/Core/Generator.py", line 406, in generate_proteins_from_transcripts
for t in transcripts:
File "/usr/local/lib/python3.7/site-packages/epytope/Core/Generator.py", line 350, in generate_transcripts_from_variants
query = dbadapter.get_transcript_information(tId, type=id_type, _db=db)
File "/usr/local/lib/python3.7/site-packages/epytope/IO/MartsAdapter.py", line 462, in get_transcript_information
if result.empty or 'Sequence unavailable' in result.at[0, attributes["coding"]]:
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexing.py", line 2275, in __getitem__
return super().__getitem__(key)
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexing.py", line 2222, in __getitem__
return self.obj._get_value(*key, takeable=self._takeable)
File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 3568, in _get_value
series = self._get_item_cache(col)
File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 3884, in _get_item_cache
loc = self.columns.get_loc(item)
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'Coding sequence'
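
The last frames show the failure mechanism: `result.at[0, attributes["coding"]]` is evaluated on a non-empty DataFrame that has no `Coding sequence` column. The sketch below reproduces that pandas behaviour in isolation. It assumes, without confirmation from this thread, that the BioMart response was something other than the expected table. The response text and the `has_usable_sequence` helper are invented for illustration and are not part of epytope.

```python
import io

import pandas as pd

# Invented stand-in for a response that is not the expected TSV table.
unexpected_body = "placeholder error text\nplaceholder detail line\n"
result = pd.read_csv(io.StringIO(unexpected_body), sep="\t")

# The frame is non-empty, so a check like `result.empty or ...` falls through
# to the column lookup, which raises KeyError for the missing column.
try:
    result.at[0, "Coding sequence"]
    lookup_error = None
except KeyError as err:
    lookup_error = err


def has_usable_sequence(result, coding_col="Coding sequence"):
    """Hypothetical guard: check that the column exists before reading from it."""
    if result.empty or coding_col not in result.columns:
        return False
    return "Sequence unavailable" not in str(result.at[0, coding_col])


expected = pd.DataFrame({"Coding sequence": ["ATGGCC"]})
print(repr(lookup_error))
print(has_usable_sequence(result), has_usable_sequence(expected))
```

If this assumption holds, the VCF annotation may not be the cause at all: the lookup fails on the table returned for the transcript query, not on the input file.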

Command used and terminal output

nextflow run nf-core/epitopeprediction \
    -r 2.3.1 \
    -profile singularity \
    -c $inDIR/yale.config \
    -work-dir $workDIR \
    --input $inDIR/samplesheet.EpitopePrediction.csv \
    --outdir $outDIR \
    --genome_reference grch38

Relevant files

No response

System information

Nextflow/24.04.4
HPC
slurm
Singularity
Linux EL 8.8
epitopeprediction v2.3.1

@gianfilippo added the bug label Jan 9, 2025
@jonasscheid
Contributor

Could be. Did you run Sarek beforehand to obtain the VEP annotation?

@gianfilippo
Author

I did not use Sarek. I used Sentieon for variant calling and vcf2maf for VEP annotation (and conversion to MAF).

@gianfilippo
Author

I reprocessed the VCF with Sarek (annotation only), but I still get an error at the same step:
ERROR ~ Error executing process > 'NFCORE_EPITOPEPREDICTION:EPITOPEPREDICTION:EPYTOPE_PEPTIDE_PREDICTION_VAR (4)'

Caused by:
Process NFCORE_EPITOPEPREDICTION:EPITOPEPREDICTION:EPYTOPE_PEPTIDE_PREDICTION_VAR (4) terminated with an error exit status (1)

Command executed:

# create folder for MHCflurry downloads to avoid permission problems when running pipeline with docker profile and mhcflurry selected

mkdir -p mhcflurry-data
export MHCFLURRY_DATA_DIR=./mhcflurry-data

# specify MHCflurry release for which to download models, need to be updated here as well when MHCflurry will be updated

export MHCFLURRY_DOWNLOADS_CURRENT_RELEASE=1.4.0

# Add non-free software to the PATH

shopt -s nullglob
IFS=',' read -r -a netmhc_paths_string <<< ""
for p in "${netmhc_paths_string[@]}"; do
export PATH="$(realpath -s "$p"):$PATH";
done
shopt -u nullglob

epaa.py --identifier 661T.GATK_sn_L.gene_region.WES.output-tnhap2_snpEff_VEP.ann.chr15 --alleles 'A03:01;A03:01;B07:02;B15:01;C04:01;C07:02' --tools 'syfpeithi' --max_length 11 --min_length 8 --versions versions.csv --genome_reference 'https://www.ensembl.org' --somatic_mutation 661T.GATK_sn_L.gene_region.WES.output-tnhap2_snpEff_VEP.ann.chr15.vcf

cat <<-END_VERSIONS > versions.yml
"NFCORE_EPITOPEPREDICTION:EPITOPEPREDICTION:EPYTOPE_PEPTIDE_PREDICTION_VAR":
python: $(python --version 2>&1 | sed 's/Python //g')
epytope: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('epytope').version)")
pandas: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
pyvcf: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('PyVCF3').version)")
mhcflurry: $(mhcflurry-predict --version 2>&1 | sed 's/^mhcflurry //; s/ .*$//')
mhcnuggets: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('mhcnuggets').version)")
END_VERSIONS

Command exit status:
1

Command output:
2025-01-10 14:43:17,062 - main - INFO - Running Epitope Prediction And Annotation version: 1.1
2025-01-10 14:43:17,063 - main - INFO - Starting predictions at 2025-01-10 14:43:17
2025-01-10 14:43:17,063 - main - INFO - Running epaa for variants...
2025-01-10 14:43:17,070 - main - WARNING - FORMAT entry PID not defined for 661N. Skipping.
2025-01-10 14:43:17,070 - main - WARNING - FORMAT entry PGT not defined for 661N. Skipping.
2025-01-10 14:43:17,070 - main - WARNING - FORMAT entry PS not defined for 661N. Skipping.
2025-01-10 14:43:17,071 - main - WARNING - FORMAT entry PID not defined for 661T. Skipping.
2025-01-10 14:43:17,071 - main - WARNING - FORMAT entry PGT not defined for 661T. Skipping.
2025-01-10 14:43:17,071 - main - WARNING - FORMAT entry PS not defined for 661T. Skipping.
2025-01-10 14:43:17,071 - main - WARNING - FORMAT entry PID not defined for 661N. Skipping.
2025-01-10 14:43:17,071 - main - WARNING - FORMAT entry PGT not defined for 661N. Skipping.
2025-01-10 14:43:17,071 - main - WARNING - FORMAT entry PS not defined for 661N. Skipping.
2025-01-10 14:43:17,071 - main - WARNING - FORMAT entry PID not defined for 661T. Skipping.
2025-01-10 14:43:17,071 - main - WARNING - FORMAT entry PGT not defined for 661T. Skipping.
2025-01-10 14:43:17,071 - main - WARNING - FORMAT entry PS not defined for 661T. Skipping.

Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Using TensorFlow backend.
INFO:main:Running Epitope Prediction And Annotation version: 1.1
INFO:main:Starting predictions at 2025-01-10 14:43:17
INFO:main:Running epaa for variants...
WARNING:main:FORMAT entry PID not defined for 661N. Skipping.
WARNING:main:FORMAT entry PGT not defined for 661N. Skipping.
WARNING:main:FORMAT entry PS not defined for 661N. Skipping.
WARNING:main:FORMAT entry PID not defined for 661T. Skipping.
WARNING:main:FORMAT entry PGT not defined for 661T. Skipping.
WARNING:main:FORMAT entry PS not defined for 661T. Skipping.
WARNING:main:FORMAT entry PID not defined for 661N. Skipping.
WARNING:main:FORMAT entry PGT not defined for 661N. Skipping.
WARNING:main:FORMAT entry PS not defined for 661N. Skipping.
WARNING:main:FORMAT entry PID not defined for 661T. Skipping.
WARNING:main:FORMAT entry PGT not defined for 661T. Skipping.
WARNING:main:FORMAT entry PS not defined for 661T. Skipping.
Traceback (most recent call last):
File "/home/myhome/.nextflow/assets/nf-core/epitopeprediction/bin/epaa.py", line 1310, in <module>
main()
File "/home/myhome/.nextflow/assets/nf-core/epitopeprediction/bin/epaa.py", line 1133, in main
transcriptProteinTable = ma.get_protein_ids_from_transcripts(transcripts, type=EIdentifierTypes.ENSEMBL)
File "/usr/local/lib/python3.7/site-packages/epytope/IO/MartsAdapter.py", line 898, in get_protein_ids_from_transcripts
dataset_attributes = self.get_dataset_attributes(_db)
File "/usr/local/lib/python3.7/site-packages/epytope/IO/MartsAdapter.py", line 232, in get_dataset_attributes
df_part.columns = ['attribute_id', 'attribute_name', 'description']
File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 5500, in __setattr__
return object.__setattr__(self, name, value)
File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__
File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 766, in _set_axis
self._mgr.set_axis(axis, labels)
File "/usr/local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 216, in set_axis
self._validate_set_axis(axis, new_labels)
File "/usr/local/lib/python3.7/site-packages/pandas/core/internals/base.py", line 58, in _validate_set_axis
f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 1 elements, new values have 3 elements
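
This second traceback is a different pandas error from the first: `get_dataset_attributes` assigns three column names to a frame that has only one column. A minimal sketch of that mechanism follows, assuming the one-column frame comes from an unexpected response. The frame contents and the `name_attribute_columns` helper are hypothetical and not epytope code.

```python
import pandas as pd

ATTRIBUTE_COLUMNS = ["attribute_id", "attribute_name", "description"]

# Invented one-column frame standing in for whatever was actually parsed.
df_part = pd.DataFrame({0: ["placeholder text"]})

# Assigning three labels to a one-column frame raises ValueError.
try:
    df_part.columns = ATTRIBUTE_COLUMNS
    message = ""
except ValueError as err:
    message = str(err)


def name_attribute_columns(df):
    """Hypothetical guard: fail with a descriptive error on an unexpected shape."""
    if df.shape[1] != len(ATTRIBUTE_COLUMNS):
        raise ValueError(
            f"expected {len(ATTRIBUTE_COLUMNS)} columns, got {df.shape[1]}; "
            f"first row: {df.head(1).values.tolist()}"
        )
    renamed = df.copy()
    renamed.columns = ATTRIBUTE_COLUMNS
    return renamed


print(message)
```

A guard like this would not fix the underlying problem, but it would surface the content of the unexpected table instead of a bare length mismatch.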

@jonasscheid
Contributor

Can you go into epaa.py and use the input files in the task's work directory to debug what exactly is happening here?
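
One way to start on that, sketched under assumptions: each Nextflow task directory under the work directory normally holds hidden files such as `.command.sh` (the rendered script) and `.command.err` (stderr) alongside the staged inputs. The helpers below only list what is present. `task_dir` is a placeholder for the hashed directory of the failing task, and both function names are made up for this example.

```python
from pathlib import Path

# Files Nextflow usually writes into a task directory.
NEXTFLOW_TASK_FILES = [
    ".command.sh",
    ".command.run",
    ".command.out",
    ".command.err",
    ".command.log",
    ".exitcode",
]


def summarize_task_dir(task_dir):
    """Return {file name: size in bytes, or None if missing} for a task directory."""
    base = Path(task_dir)
    summary = {}
    for name in NEXTFLOW_TASK_FILES:
        path = base / name
        summary[name] = path.stat().st_size if path.is_file() else None
    return summary


def staged_vcfs(task_dir):
    """List VCF inputs staged (often as symlinks) in the task directory."""
    return sorted(p.name for p in Path(task_dir).glob("*.vcf"))
```

The arguments recorded in `.command.sh` are the ones to reuse when running `epaa.py` by hand in the same container.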

@christopher-mohr Have you seen this error before?
