feat(ena-submission): Create ena projects #2293

Closed
wants to merge 6 commits into from

anna-parker
Contributor

@anna-parker anna-parker commented Jul 12, 2024

resolves #2399

preview URL: https://create-ena-projects.loculus.org/

Summary

This PR adds a project_creation rule to the ena-submission Snakemake file:

  • The project_creation rule continuously (in a loop) scans for new sequences for which a project needs to be created and triggers its creation. It also updates both the submission_table and the project_table.
  • The PR also adds some basic unit tests for the sub-functions used by create_projects.

High level overview of project_creation:

In a loop:

  1. Get sequences in submission_table in state READY_TO_SUBMIT.
     • If an entry exists in the project_table for the corresponding (group_id, organism):
       -- if the entry is in status SUBMITTED: update submission_table to SUBMITTED_PROJECT.
       -- else: update submission_table to SUBMITTING_PROJECT.
     • Else: create a project entry in project_table for (group_id, organism).
  2. Get sequences in submission_table in state SUBMITTING_PROJECT.
     • If the corresponding project_table entry is in state SUBMITTED: update the sequences to state SUBMITTED_PROJECT.
  3. Get entries in project_table in state READY, prepare the submission object, and set their status to SUBMITTING.
     • If the submission succeeds: set status to SUBMITTED and fill in the results. A successful submission returns a "bioproject_accession" and an ENA-internal "ena_submission_accession".
     • Else: set status to HAS_ERRORS and fill in the errors.
  4. Get entries in project_table that have been in state HAS_ERRORS or SUBMITTING for over 15 min. Handling these is still a TODO (see "ena-submission: Recover from failed project/sample/assembly submission", #2311); currently this just throws an error.
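
The loop above can be read as a small state machine over the two tables. The following is only a rough sketch of that reading, not the code in this PR: the tables are plain in-memory dicts instead of database tables, and the function names, the `started_at` field, and the `submit` callable are hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical stand-ins for the database tables:
#   submission_table: accession -> {"status", "group_id", "organism"}
#   project_table:    (group_id, organism) -> {"status", "started_at", ...}


def sync_ready_to_submit(submission_table, project_table, now: datetime):
    """Step 1: route READY_TO_SUBMIT sequences based on the project_table."""
    for row in submission_table.values():
        if row["status"] != "READY_TO_SUBMIT":
            continue
        key = (row["group_id"], row["organism"])
        project = project_table.get(key)
        if project is None:
            # No project yet: create the entry. Assumption: the sequence
            # itself is re-examined on the next pass of the loop.
            project_table[key] = {"status": "READY", "started_at": now}
        elif project["status"] == "SUBMITTED":
            row["status"] = "SUBMITTED_PROJECT"
        else:
            row["status"] = "SUBMITTING_PROJECT"


def sync_submitting_project(submission_table, project_table):
    """Step 2: promote sequences whose project has been submitted."""
    for row in submission_table.values():
        if row["status"] != "SUBMITTING_PROJECT":
            continue
        project = project_table.get((row["group_id"], row["organism"]))
        if project is not None and project["status"] == "SUBMITTED":
            row["status"] = "SUBMITTED_PROJECT"


def submit_ready_projects(project_table, submit, now: datetime):
    """Step 3: submit READY projects; `submit` is a hypothetical callable
    returning (ok, payload), where payload holds results or errors."""
    for key, project in project_table.items():
        if project["status"] != "READY":
            continue
        project["status"] = "SUBMITTING"
        project["started_at"] = now
        ok, payload = submit(key)
        if ok:
            project["status"] = "SUBMITTED"
            project["result"] = payload
        else:
            project["status"] = "HAS_ERRORS"
            project["errors"] = payload


def find_stale_projects(project_table, now: datetime,
                        limit: timedelta = timedelta(minutes=15)):
    """Step 4: keys of projects stuck in HAS_ERRORS or SUBMITTING too long."""
    return [
        key
        for key, project in project_table.items()
        if project["status"] in ("HAS_ERRORS", "SUBMITTING")
        and now - project["started_at"] > limit
    ]
```

Running the four functions once per iteration, in this order, reproduces the flow in the overview; a project created in step 1 is submitted in step 3 of the same pass, and its sequences move to SUBMITTED_PROJECT on a later pass.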

ENA Project

This PR will create projects in ENA with the following attributes:

<PROJECT_SET>
    <PROJECT center_name="{{ Institution }}" alias="{{ group_id }}:{{ organism }}:{{ unique_id }}">
        <NAME>{{ scientific_name }}</NAME>
        <TITLE>{{ scientific_name }} Genome sequencing</TITLE>
        <DESCRIPTION>Automated upload of {{ scientific_name }} sequences submitted by {{ Institution }} from {{ db }}.</DESCRIPTION>
        <SUBMISSION_PROJECT>
            <SEQUENCING_PROJECT/>
            <ORGANISM>
                <TAXON_ID>{{ taxon_id }}</TAXON_ID>
                <SCIENTIFIC_NAME>{{ scientific_name }}</SCIENTIFIC_NAME>
            </ORGANISM>
        </SUBMISSION_PROJECT>
        <PROJECT_LINKS>
            <PROJECT_LINK>
                <XREF_LINK>
                    <DB>{{ db }}</DB>
                    <ID>{{ group_accession }}</ID>
                </XREF_LINK>
            </PROJECT_LINK>
        </PROJECT_LINKS>
    </PROJECT>
</PROJECT_SET>
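
As an illustration of how the placeholders in the template above map to concrete values, here is a minimal sketch that builds the same document with Python's standard-library `xml.etree.ElementTree`. It is not necessarily how this PR generates the XML; the function name `build_project_set` and its parameters are hypothetical and simply mirror the template placeholders.

```python
import xml.etree.ElementTree as ET


def build_project_set(institution, group_id, organism, unique_id,
                      scientific_name, taxon_id, db, group_accession):
    """Build the PROJECT_SET XML shown above and return it as a string."""
    root = ET.Element("PROJECT_SET")
    project = ET.SubElement(root, "PROJECT", {
        "center_name": institution,
        # The alias combines group, organism and a unique id, as in the template.
        "alias": f"{group_id}:{organism}:{unique_id}",
    })
    ET.SubElement(project, "NAME").text = scientific_name
    ET.SubElement(project, "TITLE").text = f"{scientific_name} Genome sequencing"
    ET.SubElement(project, "DESCRIPTION").text = (
        f"Automated upload of {scientific_name} sequences "
        f"submitted by {institution} from {db}."
    )

    submission = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission, "SEQUENCING_PROJECT")  # empty element
    organism_el = ET.SubElement(submission, "ORGANISM")
    ET.SubElement(organism_el, "TAXON_ID").text = str(taxon_id)
    ET.SubElement(organism_el, "SCIENTIFIC_NAME").text = scientific_name

    links = ET.SubElement(project, "PROJECT_LINKS")
    link = ET.SubElement(links, "PROJECT_LINK")
    xref = ET.SubElement(link, "XREF_LINK")
    ET.SubElement(xref, "DB").text = db
    ET.SubElement(xref, "ID").text = str(group_accession)

    # encoding="unicode" makes tostring return a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


# Made-up example values, for illustration only.
xml_text = build_project_set(
    institution="Example Institute", group_id=1, organism="example-organism",
    unique_id="abc123", scientific_name="Example virus", taxon_id=12345,
    db="Loculus", group_accession=1,
)
```

Building the tree programmatically means special characters in an institution or organism name are escaped automatically, which plain string templating would have to handle separately.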

Screenshot

PR Checklist

[screenshot]
It gets uploaded to the submission_table and then the project_table: [screenshot]
When it succeeds: [screenshot] (the submission_table is also updated)
When not successful: [screenshot]

@anna-parker anna-parker force-pushed the create_ena_projects branch 2 times, most recently from 8814be7 to ab8ec79 Compare August 8, 2024 16:08
@anna-parker anna-parker marked this pull request as ready for review August 8, 2024 21:02
@anna-parker anna-parker force-pushed the create_ena_projects branch 4 times, most recently from 65ce5d2 to a11d5b0 Compare August 13, 2024 14:49
@anna-parker anna-parker added the preview Triggers a deployment to argocd label Aug 13, 2024
@@ -6,7 +6,7 @@ disableWebsite: false
 disableBackend: false
 disablePreprocessing: false
 disableIngest: false
-disableEnaSubmission: true
+disableEnaSubmission: false
Member

For the final PR, it would be nice to do this in values_preview_server.yaml rather than in the default Helm chart, at least until ENA submission is fully mature.

Member

(and actually even after that, because many Loculus instances wouldn't use this)

Contributor Author

added in #2831

@corneliusroemer
Contributor

Drive-by edits in this PR - might be good to merge these frequently to prevent merge conflicts from getting out of hand with the many branches open: #2663

@corneliusroemer corneliusroemer added the insdc tasks related to ENA/INSDC submission label Aug 29, 2024
@anna-parker anna-parker added preview Triggers a deployment to argocd and removed preview Triggers a deployment to argocd labels Aug 29, 2024
@anna-parker anna-parker removed preview Triggers a deployment to argocd labels Sep 11, 2024
@corneliusroemer corneliusroemer added preview Triggers a deployment to argocd and removed preview Triggers a deployment to argocd labels Sep 16, 2024
@anna-parker
Contributor Author

Closing as this was part of #2417

Labels
insdc tasks related to ENA/INSDC submission
Development

Successfully merging this pull request may close these issues.

ENA submission: Create assembly
3 participants