Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: CLIN-3267 draft #406

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

LysianeBouchard
Copy link
Contributor

@LysianeBouchard LysianeBouchard commented Oct 10, 2024

WORK IN PROGRESS

I am opening this pull request for transparency and welcome any feedback. However, please note that this work is not final and not ready to be merged.

DESCRIPTION

This PR introduces the etl variant annotation dag. It also restructure the nextflow operator to mimick the structure in cqdg, i.e. we introduce similar base config / operator classes.

So far, this code has been tested locally with Minikube for test_nextflow_operator and etl_variant_annotation. It has not yet been tested on etl.py.

There are a few TODOs in the code that need to be addressed, especially regarding the variant annotation DAG.

CQDG ALIGNMENT
This implementation reuses a configuration mechanism similar to CQDG, with the following key differences:

  • 2 different k8s contexts: In this repo, there are 2 different k8s contexts: ETL and DEFAULT. We assume that we will create 2 different instances of KubeConfig, one for each context. So far only the ETL context KubeConfig instance is created.
  • service_account_name specified in BaseConfig instead of KubeConfig: This is because nextflow and spark operators have different service account names. I thought the intention of the KubeConfig class is to capture what is always the same and thought that, since it can vary, perhaps it should go in BaseConfig instead. There are other solutions of course. Let me know if you are ok with this one.
  • shallow copy instead deep copy: In BaseConfig, shallow copy is used instead deep copy in build_operator and partial methods. This is to avoid serializing more complex objects as dictionaries.
  • BaseConfig.args renamed to BaseConfig.append_args: I suspect that one might think that args overwrite the arguments instead appending, but well this is subjective.

NEXTFLOW OPERATOR
The nextflow operator now inherits the BaseKubernetesOperator . It is also modified to support multiple config maps.

VARIANT ANNOTATION
We introduce a new variant annotation dag. This part needs a bit more work.

@LysianeBouchard LysianeBouchard marked this pull request as draft October 10, 2024 03:27
@LysianeBouchard LysianeBouchard force-pushed the feat/CLIN-3267-add-variant-annotation-dag branch 2 times, most recently from a28ffa9 to 7ed51f0 Compare October 10, 2024 04:26
@LysianeBouchard LysianeBouchard force-pushed the feat/CLIN-3267-add-variant-annotation-dag branch from 7ed51f0 to e84147f Compare October 10, 2024 19:37
Temporary commit to avoid losing work locally.

This code works locally with minikube for test_nextflow_operator and etl_variant_annotation.
It is not tested on etl.py.

It reuse a configuration mechanism similar as in cqdg
Key modifications in comparison to cqdg:
- service_account_name specified in BaseConfig instead KubeConfig
- shallow copy in build_operator and partial methods in BaseConfig
- BaseConfig.args renamed to BaseConfig.append_args

There are a few todos in the code to address.
@LysianeBouchard LysianeBouchard force-pushed the feat/CLIN-3267-add-variant-annotation-dag branch from e84147f to 8f21246 Compare October 18, 2024 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant