column not in metadata? #815

Closed
marwa38 opened this issue Dec 12, 2024 · 8 comments

@marwa38

marwa38 commented Dec 12, 2024

Hi,

I am using Singularity v3.8.5, Nextflow v24.04.3 and ampliseq v2.12.0, and I get the following error even though I am sure my metadata has those column headers (columns Region and Regime).
Here is my .sh job script:

#!/bin/bash
#SBATCH --partition=uoa-compute
#SBATCH --cpus-per-task=24
#SBATCH --time=48:00:00
#SBATCH --mem-per-cpu=6g
module load nextflow/24.04
module load singularity/3.8.5
module load r/4.4.0
nextflow run nf-core/ampliseq -r 2.12.0 -name NP_intesParts_ampliseq_silva138_GG2 -profile singularity -resume -params-file nf-params.json

And here is a screenshot of the metadata:
[Image: metadata screenshot]

nf-core/ampliseq execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_ANCOM:ANCOMBC_FORMULA_TAX (filtered-table-"Region*Regime"-3)'

Caused by:
  Process NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_ANCOM:ANCOMBC_FORMULA_TAX (filtered-table-"Region*Regime"-3) terminated with an error exit status (1)


Command executed:

  export XDG_CONFIG_HOME="./xdgconfig"
  export MPLCONFIGDIR="./mplconfigdir"
  export NUMBA_CACHE_DIR="./numbacache"
  
  # Sum data at the specified level
  qiime taxa collapse \
      --i-table "filtered-table.qza" \
      --i-taxonomy "taxonomy.qza" \
      --p-level 3 \
      --o-collapsed-table "lvl3-"Region*Regime".qza"
  
  # Extract summarised table and output a file with the number of taxa
  qiime tools export \
      --input-path "lvl3-"Region*Regime".qza" \
      --output-path exported/
  biom convert \
      -i exported/feature-table.biom \
      -o "lvl3-"Region*Regime".feature-table.tsv" \
      --to-tsv
  
  if [ $(grep -v '^#' -c "lvl3-"Region*Regime".feature-table.tsv") -lt 2 ]; then
      mkdir differentials
      echo 3 > differentials/"WARNING Summing your data at taxonomic level 3 produced less than two rows (taxa), ANCOMBC can't proceed -- did you specify a bad reference taxonomy?".txt
      mkdir da_barplot
      echo 3 > da_barplot/"WARNING Summing your data at taxonomic level 3 produced less than two rows (taxa), ANCOMBC can't proceed -- did you specify a bad reference taxonomy?".txt
  else
      qiime composition ancombc \
          --i-table "lvl3-"Region*Regime".qza" \
          --m-metadata-file "metadata.tsv" \
          --p-reference-levels "Region=pyloric" --p-prv-cut 0.1 --p-lib-cut 500 --p-alpha 0.05 --p-conserve \
          --p-formula '"Region*Regime"' \
          --o-differentials "lvl3-"Region*Regime".differentials.qza" \
          --verbose
      qiime tools export \
          --input-path "lvl3-"Region*Regime".differentials.qza" \
          --output-path "differentials/Category-"Region*Regime"-level-3"
  
      # Generate tabular view of ANCOM-BC output
      qiime composition tabulate \
          --i-data "lvl3-"Region*Regime".differentials.qza" \
          --o-visualization "lvl3-"Region*Regime".differentials.qzv"
      qiime tools export \
          --input-path "lvl3-"Region*Regime".differentials.qzv" \
          --output-path "differentials/Category-"Region*Regime"-level-3"
  
      # Generate bar plot views of ANCOM-BC output
      qiime composition da-barplot \
          --i-data "lvl3-"Region*Regime".differentials.qza" \
          --p-effect-size-threshold 1 --p-significance-threshold 0.05 --p-label-limit 1000 \
          --o-visualization "lvl3-"Region*Regime".da_barplot.qzv"
      qiime tools export --input-path "lvl3-"Region*Regime".da_barplot.qzv" \
          --output-path "da_barplot/Category-"Region*Regime"-level-3"
  fi
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_ANCOM:ANCOMBC_FORMULA_TAX":
      qiime2: $( qiime --version | sed '1!d;s/.* //' )
  END_VERSIONS

Command exit status:
  1

Command output:
  Saved FeatureTable[Frequency] to: lvl3-Region*Regime.qza
  Exported lvl3-Region*Regime.qza as BIOMV210DirFmt to directory exported/

Command error:
  QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
  Saved FeatureTable[Frequency] to: lvl3-Region*Regime.qza
  Exported lvl3-Region*Regime.qza as BIOMV210DirFmt to directory exported/
  Traceback (most recent call last):
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
      return self._engine.get_loc(casted_key)
    File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
    File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
  KeyError: '"Region*Regime"'
  
  The above exception was the direct cause of the following exception:
  
  Traceback (most recent call last):
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/metadata/metadata.py", line 637, in get_column
      series = self._dataframe[name]
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/pandas/core/frame.py", line 3807, in __getitem__
      indexer = self.columns.get_loc(key)
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
      raise KeyError(key) from err
  KeyError: '"Region*Regime"'
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/q2cli/commands.py", line 478, in __call__
      results = self._execute_action(
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/q2cli/commands.py", line 539, in _execute_action
      results = action(**arguments)
    File "", line 2, in ancombc
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
      outputs = self._callable_executor_(
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/sdk/action.py", line 566, in _callable_executor_
      output_views = self._callable(**view_args)
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 41, in ancombc
      return _ancombc(
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 121, in _ancombc
      metadata.get_column(term)
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/metadata/metadata.py", line 640, in get_column
      raise ValueError(
  ValueError: '"Region*Regime"' is not a column in the metadata. Available columns: 'sample.name', 'sample.no', 'sample', 'Phase', 'Region', 'Regime', 'Region_Regime', 'sample_regime', 'sample_or_control', 'Sample_Type'
  
  Plugin error from composition:
  
    '"Region*Regime"' is not a column in the metadata. Available columns: 'sample.name', 'sample.no', 'sample', 'Phase', 'Region', 'Regime', 'Region_Regime', 'sample_regime', 'sample_or_control', 'Sample_Type'
  
  See above for debug info.

Work dir:
  /uoa/home/r02km21/Documents/work/9f/72747339026f10cdd85e839875bb31

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
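The error lists the metadata columns QIIME 2 actually found, and the name it failed to look up, `'"Region*Regime"'`, still carries literal double quotes. A quick pre-flight check of the formula against the metadata header can catch this before a long run. The sketch below is hypothetical: `formula_terms` and `missing_formula_terms` are illustrative helpers, not part of ampliseq or QIIME 2; the column list is copied from the error instead of being read from metadata.tsv; and splitting on `*`, `+` and `:` is an assumption about simple R-style formulas, not how q2-composition parses them.

```python
from __future__ import annotations

import re

# Column names copied from the "Available columns" list in the error above.
AVAILABLE_COLUMNS = [
    "sample.name", "sample.no", "sample", "Phase", "Region", "Regime",
    "Region_Regime", "sample_regime", "sample_or_control", "Sample_Type",
]


def formula_terms(formula: str) -> list[str]:
    """Split a simple R-style formula into variable names.

    Assumption: only the operators *, + and : appear, so splitting on them
    and trimming whitespace leaves the bare column names.
    """
    return [t.strip() for t in re.split(r"[*+:]", formula) if t.strip()]


def missing_formula_terms(formula: str, columns: list[str]) -> list[str]:
    """Return the terms that do not exactly match a metadata column name."""
    return [t for t in formula_terms(formula) if t not in columns]


# The value that reached QIIME 2 in the failing run still had its quotes:
print(missing_formula_terms('"Region*Regime"', AVAILABLE_COLUMNS))  # ['"Region', 'Regime"']
# The corrected value matches two existing columns:
print(missing_formula_terms("Region*Regime", AVAILABLE_COLUMNS))  # []
```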
@d4straub
Collaborator

d4straub commented Dec 12, 2024

Hi there, could you also show the contents of nf-params.json, specifically where you define the ANCOM-BC formula? I suspect there might be an issue there.

@marwa38
Author

marwa38 commented Dec 12, 2024

Thanks for your reply:
{
"input": "./samplesheet.txt",
"input_folder": "./samples/",
"FW_primer": "CCTACGGGNGGGCWGCAG",
"RV_primer": "GACTACHVGGGTATCTAATCC",
"metadata": "./metadata.tsv",
"outdir": "./results",
"save_intermediates": true,
"extension": "./samples/_R{1,2}.fq.gz",
"qiime_ref_taxonomy": "greengenes2",
"metadata_category": "Region,Regime",
"metadata_category_barplot": "Region,Regime",
"qiime_adonis_formula": ""Region
Regime"",
"picrust": true,
"tax_agglom_max": 7,
"ancom": true,
"ancombc": true,
"ancombc_formula": ""Region*Regime"",
"ancombc_formula_reflvl": ""Region=pyloric"",
"skip_cutadapt": true
}

@d4straub
Collaborator

Try it with a single `"` instead of `""`, i.e. without the extra quotes around the values.
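The `KeyError: '"Region*Regime"'` in the first log shows that the inner quotes ended up inside the value itself. In JSON, the surrounding quotes only delimit a string; any extra quotes become part of it. A minimal illustration with Python's standard `json` module follows. Nextflow reads the params file with its own parser, which evidently tolerated the doubled quotes, so this sketch only shows the general JSON behaviour, not Nextflow's exact handling.

```python
import json

# Correct: the JSON quotes only delimit the value.
good = json.loads('{"ancombc_formula": "Region*Regime"}')
print(good["ancombc_formula"])  # Region*Regime

# Escaped inner quotes become part of the value, which is the kind of
# string QIIME 2 then tried to look up as a column name.
quoted = json.loads('{"ancombc_formula": "\\"Region*Regime\\""}')
print(quoted["ancombc_formula"])  # "Region*Regime"

# Doubled quotes, as in the first nf-params.json, are rejected by a
# strict JSON parser such as Python's.
try:
    json.loads('{"ancombc_formula": ""Region*Regime""}')
except json.JSONDecodeError as err:
    print("invalid JSON:", err.msg)
```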

@marwa38
Author

marwa38 commented Dec 12, 2024

Well noted, here is what I submitted:

{
"input": "./samplesheet.txt",
"input_folder": "./samples/",
"FW_primer": "CCTACGGGNGGGCWGCAG",
"RV_primer": "GACTACHVGGGTATCTAATCC",
"metadata": "./metadata.tsv",
"outdir": "./results",
"save_intermediates": true,
"email": "[email protected]",
"extension": "./samples/_R{1,2}.fq.gz",
"qiime_ref_taxonomy": "greengenes2",
"metadata_category": "Region,Regime",
"metadata_category_barplot": "Region,Regime",
"qiime_adonis_formula": "Region
Regime",
"picrust": true,
"tax_agglom_max": 7,
"ancom": true,
"ancombc": true,
"ancombc_formula": "Region*Regime",
"ancombc_formula_reflvl": "Region=pyloric",
"skip_cutadapt": true
}

My jobs usually get their turn in the cluster queue around 11 pm!
I will let you know whether I need more help.
Thanks again,
Marwa

@marwa38
Author

marwa38 commented Dec 12, 2024

I got the following error:

nf-core/ampliseq execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_ANCOM:ANCOMBC_FORMULA_TAX (filtered-table-Region*Regime-5)'

Caused by:
  Process `NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_ANCOM:ANCOMBC_FORMULA_TAX (filtered-table-Region*Regime-5)` terminated with an error exit status (1)


Command executed:

  export XDG_CONFIG_HOME="./xdgconfig"
  export MPLCONFIGDIR="./mplconfigdir"
  export NUMBA_CACHE_DIR="./numbacache"
  
  # Sum data at the specified level
  qiime taxa collapse \
      --i-table "filtered-table.qza" \
      --i-taxonomy "taxonomy.qza" \
      --p-level 5 \
      --o-collapsed-table "lvl5-Region*Regime.qza"
  
  # Extract summarised table and output a file with the number of taxa
  qiime tools export \
      --input-path "lvl5-Region*Regime.qza" \
      --output-path exported/
  biom convert \
      -i exported/feature-table.biom \
      -o "lvl5-Region*Regime.feature-table.tsv" \
      --to-tsv
  
  if [ $(grep -v '^#' -c "lvl5-Region*Regime.feature-table.tsv") -lt 2 ]; then
      mkdir differentials
      echo 5 > differentials/"WARNING Summing your data at taxonomic level 5 produced less than two rows (taxa), ANCOMBC can't proceed -- did you specify a bad reference taxonomy?".txt
      mkdir da_barplot
      echo 5 > da_barplot/"WARNING Summing your data at taxonomic level 5 produced less than two rows (taxa), ANCOMBC can't proceed -- did you specify a bad reference taxonomy?".txt
  else
      qiime composition ancombc \
          --i-table "lvl5-Region*Regime.qza" \
          --m-metadata-file "metadata.tsv" \
          --p-reference-levels Region=pyloric --p-prv-cut 0.1 --p-lib-cut 500 --p-alpha 0.05 --p-conserve \
          --p-formula 'Region*Regime' \
          --o-differentials "lvl5-Region*Regime.differentials.qza" \
          --verbose
      qiime tools export \
          --input-path "lvl5-Region*Regime.differentials.qza" \
          --output-path "differentials/Category-Region*Regime-level-5"
  
      # Generate tabular view of ANCOM-BC output
      qiime composition tabulate \
          --i-data "lvl5-Region*Regime.differentials.qza" \
          --o-visualization "lvl5-Region*Regime.differentials.qzv"
      qiime tools export \
          --input-path "lvl5-Region*Regime.differentials.qzv" \
          --output-path "differentials/Category-Region*Regime-level-5"
  
      # Generate bar plot views of ANCOM-BC output
      qiime composition da-barplot \
          --i-data "lvl5-Region*Regime.differentials.qza" \
          --p-effect-size-threshold 1 --p-significance-threshold 0.05 --p-label-limit 1000 \
          --o-visualization "lvl5-Region*Regime.da_barplot.qzv"
      qiime tools export --input-path "lvl5-Region*Regime.da_barplot.qzv" \
          --output-path "da_barplot/Category-Region*Regime-level-5"
  fi
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_ANCOM:ANCOMBC_FORMULA_TAX":
      qiime2: $( qiime --version | sed '1!d;s/.* //' )
  END_VERSIONS

Command exit status:
  1

Command output:
  Saved FeatureTable[Frequency] to: lvl5-Region*Regime.qza
  Exported lvl5-Region*Regime.qza as BIOMV210DirFmt to directory exported/

Command error:
  QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
  Saved FeatureTable[Frequency] to: lvl5-Region*Regime.qza
  Exported lvl5-Region*Regime.qza as BIOMV210DirFmt to directory exported/
  Traceback (most recent call last):
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/q2cli/commands.py", line 478, in __call__
      results = self._execute_action(
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/q2cli/commands.py", line 539, in _execute_action
      results = action(**arguments)
    File "", line 2, in ancombc
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
      outputs = self._callable_executor_(
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/sdk/action.py", line 566, in _callable_executor_
      output_views = self._callable(**view_args)
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 41, in ancombc
      return _ancombc(
    File "/opt/conda/envs/qiime2-2023.7/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 143, in _ancombc
      raise ValueError('Too few column-value pair separators found'
  ValueError: Too few column-value pair separators found (`::`) in the following `reference_level`: "Region=pyloric"
  
  Plugin error from composition:
  
    Too few column-value pair separators found (`::`) in the following `reference_level`: "Region=pyloric"
  
  See above for debug info.

Work dir:
  /uoa/home/r02km21/Documents/work/45/03db1eb43b9f2f8007c3d7387d2f99

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

@marwa38
Author

marwa38 commented Dec 12, 2024

I reduced the memory and CPUs requested, and found that I was able to run my .sh job:
#!/bin/bash
#SBATCH --partition=uoa-compute
#SBATCH --cpus-per-task=6
#SBATCH --mem=36G # Request 36 GB of memory
#SBATCH --mail-type=ALL
module load nextflow/24.04
module load singularity/3.8.5
module load r/4.4.0
nextflow run nf-core/ampliseq -r 2.12.0 -name gg_ampliseq_intesParts_silva -profile singularity -resume -params-file nf-params.json
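For comparison, the first script asked for 24 CPUs with `--mem-per-cpu=6g`, i.e. 144 GB in total, while this one asks for 6 CPUs and 36 GB via `--mem` (which Slurm applies per node). The arithmetic as a small sketch; `total_memory_gb` is a hypothetical helper. A smaller request is usually easier for the scheduler to place, which may explain why this job could run.

```python
def total_memory_gb(cpus_per_task: int, mem_per_cpu_gb: float) -> float:
    """Total memory requested when Slurm memory is given per CPU."""
    return cpus_per_task * mem_per_cpu_gb


first = total_memory_gb(24, 6)  # --cpus-per-task=24, --mem-per-cpu=6g
second = 36                     # --mem=36G (per node, not per CPU)
print(f"first request:  {first:g} GB across 24 CPUs")
print(f"second request: {second} GB across 6 CPUs")
print(f"reduction: {first / second:g}x")
```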

@d4straub
Copy link
Collaborator

Please check the line
`Too few column-value pair separators found (::) in the following reference_level: "Region=pyloric"`
and correct it accordingly.
So that seems solved?
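The second error asks for a `::` separator between column and value in the reference level. A hypothetical helper that builds and checks such a string before submitting the job might look like this; `to_reference_level` and `check_reference_level` are illustrative names, and requiring exactly one `column::value` pair per string is an assumption of this sketch.

```python
from __future__ import annotations


def to_reference_level(column: str, value: str) -> str:
    """Build a reference level in the column::value form the error asks for."""
    return f"{column}::{value}"


def check_reference_level(ref: str, columns: list[str]) -> list[str]:
    """Return the problems found in one reference-level string (hypothetical check)."""
    problems = []
    if ref.count("::") != 1:
        problems.append(f"expected exactly one '::' separator in {ref!r}")
        return problems
    column, value = ref.split("::")
    if column not in columns:
        problems.append(f"{column!r} is not a metadata column")
    if not value:
        problems.append("empty reference value")
    return problems


columns = ["Region", "Regime"]
print(check_reference_level("Region=pyloric", columns))  # reports the missing '::'
print(check_reference_level(to_reference_level("Region", "pyloric"), columns))  # []
```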

@marwa38
Author

marwa38 commented Dec 12, 2024

Many thanks, so far so good.
This means that I have to remove the extra quotation marks from the formulas, since the values in nf-params.json are already enclosed in quotes.
I also have to use `::` rather than `=`.
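One way to avoid hand-quoting mistakes is to generate nf-params.json from a script, so the JSON encoder adds exactly one pair of quotes around each string. A minimal sketch with Python's `json` module, using a subset of the parameters from this thread (the remaining entries are omitted and would be added the same way):

```python
import json

# Subset of the corrected parameters from this thread.
params = {
    "metadata": "./metadata.tsv",
    "metadata_category": "Region,Regime",
    "ancombc": True,
    "ancombc_formula": "Region*Regime",           # no extra quotes inside the value
    "ancombc_formula_reflvl": "Region::pyloric",  # '::' instead of '='
}

text = json.dumps(params, indent=2)
print(text)

# Round-trip check: the values read back are exactly the bare strings.
reloaded = json.loads(text)
print(reloaded["ancombc_formula"], reloaded["ancombc_formula_reflvl"])
```

To write the file instead of printing it, `json.dump(params, fh, indent=2)` on an open file handle produces the same content.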
