env config not working with GCP Batch #5623

Open
nick-youngblut opened this issue Dec 21, 2024 · 2 comments

@nick-youngblut
Contributor

Bug report

My nextflow.config contains:

env {
    MY_ENV_VAR = "my_value"
}

When the executor is local, the process using MY_ENV_VAR via os.environ["MY_ENV_VAR"] runs successfully.
However, when the executor is google-batch, the process throws KeyError: 'MY_ENV_VAR'.
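For reference, a minimal sketch of the two lookups involved. It assumes nothing about the real process script beyond the `os.environ["MY_ENV_VAR"]` access described above; `describe_lookup` is a hypothetical helper used only for illustration.

```python
import os


def describe_lookup(name, environ=os.environ):
    """Report how the strict and lenient lookups behave for `name`."""
    try:
        strict = environ[name]        # raises KeyError when the variable is unset
    except KeyError as exc:
        strict = f"KeyError: {exc}"
    lenient = environ.get(name)       # like os.getenv: returns None when unset
    return strict, lenient


# Variable present (what the local executor appears to provide)
print(describe_lookup("MY_ENV_VAR", {"MY_ENV_VAR": "my_value"}))
# → ('my_value', 'my_value')

# Variable absent (what the process appears to see on google-batch)
print(describe_lookup("MY_ENV_VAR", {}))
# → ("KeyError: 'MY_ENV_VAR'", None)
```

Using `os.getenv` instead of `os.environ[...]` turns the failure into a `None` value, which makes the missing variable easier to report without crashing the task.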

https://www.nextflow.io/docs/latest/reference/config.html#env does not state that env is not supported for google-batch, so I'm assuming that there is either a bug or a lack of docs.

Expected behavior and actual behavior

Environment variables set via the env config scope should be available to GCP Batch jobs, or the docs should explicitly state that env is not supported for GCP Batch.

Steps to reproduce the problem

  • Set variables in the env scope
  • Run locally; processes using the env variable succeed
  • Run on GCP Batch; processes using the env variable fail

Environment

  • Nextflow version: 24.10.2
  • Java version: 21.0.0
  • Operating system: Ubuntu 22.04.5
  • Bash version: 5.1.16

@pditommaso
Member

I'm unable to replicate this. Can you please provide a test case?

@nick-youngblut
Contributor Author

My reproducible example:

main.nf

workflow { 
    PRINT_ENV()   
    PRINT_ENV_PY()  
}

process PRINT_ENV {
    publishDir file(params.output_dir),  mode: "copy", overwrite: true
    container "ubuntu:latest"
    
    output:
    path "out.txt"

    script:
    """
    echo "$my_env_var" > out.txt
    """
}

process PRINT_ENV_PY {
    publishDir file(params.output_dir),  mode: "copy", overwrite: true
    container "python:3.11"
    
    output:
    path "out_py.txt"

    script:
    """
    test.py > out_py.txt
    """
}

nextflow.config

params {
    output_dir = "pipeline_output"
}

env {
    my_env_var = "TEST"
}

profiles {
    gcp {
        workDir            = "gs://###/sandbox/work"
        fusion.enabled     = false
        wave.enabled       = false
        params.output_dir  = "gs://###/sandbox/pipeline_output"
        process {
            executor       = "google-batch"
            errorStrategy  = "retry"
            maxRetries     = 2
            scratch        = true
        }
        google {
            project   = "###"
            location  = "###"
            batch {
                serviceAccountEmail = "###"
                spot                = true
                maxSpotAttempts     = 3
            }
        }
    }
}

test.py

#!/usr/bin/env python3
import os

if __name__ == '__main__':
    print(os.getenv("my_env_var"))

Results

out.txt always contains TEST, both when run locally and on GCP Batch. However, out_py.txt contains TEST when run locally and None when run on GCP Batch.

Notes

If the following is used:

process PRINT_ENV_PY {
    publishDir file(params.output_dir),  mode: "copy", overwrite: true
    container "python:3.11"
    
    output:
    path "out_py.txt"

    script:
    """
    export my_env_var="$my_env_var"
    test.py > out_py.txt
    """
}

Then the env variable value ("TEST") is written to out_py.txt. Importantly, the explicit export is only needed when running the pipeline on GCP Batch; it is not needed with the local executor.
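The workaround is consistent with the value being known when the task script is built or run, but not exported to child processes such as test.py. That reading is an assumption and is not confirmed against what Nextflow actually generates for GCP Batch. The sketch below only demonstrates the general shell behaviour it relies on: a plain assignment is invisible to a child process, while an exported one is inherited. It assumes a POSIX `sh` on the PATH; `child_output` is a hypothetical helper.

```python
import os
import shlex
import subprocess
import sys

# Child command equivalent to test.py: print the variable as the child sees it.
CHILD = shlex.join(
    [sys.executable, "-c", "import os; print(os.getenv('my_env_var'))"]
)


def child_output(assignment: str) -> str:
    """Run `assignment` in sh, then start the Python child and return its output."""
    script = f"{assignment}\n{CHILD}"
    # Start from a parent environment that does not already define the variable.
    env = {k: v for k, v in os.environ.items() if k != "my_env_var"}
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True, env=env
    )
    return result.stdout.strip()


print(child_output('my_env_var="TEST"'))         # → None (shell variable, not exported)
print(child_output('export my_env_var="TEST"'))  # → TEST (exported, inherited by child)
```

Comparing the generated `.command.run` and `.command.sh` files from a local run and a GCP Batch run would show whether this is actually what differs between the two executors.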
