Add insdc_submission status table to the backend #2078
Comments
Should the backend send data to ENA, or would we again have a separate script for that? @corneliusroemer and I were thinking about the latter: a dedicated ENA upload service (see #331) would fetch data from the backend that has not yet been submitted and pass the INSDC submission status and accession information back to the backend. Instead of having dedicated INSDC accession and status columns in the database, we could then, more generally, introduce "managed metadata" that is associated with each sequence and can be set by the INSDC submission service (but not directly by the submitters). The advantages of that concept are that (1) the submission service can be developed more independently and can decide what it wants to store without modifying the database schema, and (2) it can be reused for other things in the future (e.g., if we want to submit the sequences not only to INSDC but also to another service).
Ah, sorry if this is unclear. The idea is to have a pod (most likely running a snakemake pipeline) that wraps submissions to ENA. This is just a preliminary list of the backend endpoints that such a pipeline would require. I will make that clearer.
@chaoran-chen I like the idea of having less structured submission metadata fields to enable upload to multiple databases. But I still think it might be good to have two tables (one for sequence submission status and one for group submission status), as I think this is a common structure across databases. Maybe I could create tables with a submission metadata column containing a dictionary that we can add any type of information to? I do think keeping the submission status in a table (in the same way as for preprocessing) is a good design idea. Also, after submission to ENA we want to add the GenBank accession to the sequence view page, so we will still have to structure the metadata in a specific manner so that we can retrieve this value.
For INSDC/ENA we want to have a pod (similar to the prepro pod) that handles the submission to ENA and any issues that arise. However, we will need backend endpoints that give this pod the data to be submitted and that store the submission status and the new ENA accessions. This means adding 1-2 tables and 2+ endpoints to the backend.
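To make the division of labour between the pod and the backend concrete, here is a rough sketch of one pass of such a pod, written in Python since the pod would most likely wrap a snakemake pipeline. Everything named here is hypothetical: `Backend`, `fetch_unsubmitted`, `report`, `run_once` and the `"accession"` key are placeholders for endpoints that do not exist yet, and `submit_to_ena` stands in for whatever actually talks to ENA.

```python
from typing import Callable, Dict, List, Optional, Protocol


class Backend(Protocol):
    """Hypothetical stand-in for the backend endpoints the pod would call."""

    def fetch_unsubmitted(self) -> List[Dict]:
        """Return entries that have not yet been submitted to ENA."""
        ...

    def report(self, accession: str, status: str, ena_accession: Optional[str]) -> None:
        """Store the submission status (and ENA accession, if any) for an entry."""
        ...


def run_once(backend: Backend, submit_to_ena: Callable[[Dict], str]) -> None:
    """Submit every unsubmitted entry once and report the outcome to the backend."""
    for entry in backend.fetch_unsubmitted():
        accession = entry["accession"]  # assumed key, not a confirmed field name
        backend.report(accession, "PROCESSING", None)
        try:
            ena_accession = submit_to_ena(entry)
        except Exception:
            # Any failure is recorded so that the entry can be retried later.
            backend.report(accession, "FAILED", None)
            continue
        backend.report(accession, "COMPLETED", ena_accession)
```

Passing the backend and the ENA call in as arguments keeps the pod logic testable without a running backend or ENA test server.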
Note that in order to submit to ENA we require:
and then finally we will need to submit the actual sequence data (an analysis in ENA with its own unique accession) using these two(+) accession values.
Ideally, we will create 2 tables in our postgres DB mapping:
Probably we would like to have the status fields: PENDING, PROCESSING, COMPLETED and FAILED.
We might also want to store the number of attempts.
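As a rough illustration of how the four status values and an attempt counter could fit together, here is a minimal Python sketch. The transition rules (for example, that a FAILED entry may go back to PROCESSING for a retry), the `SubmissionRecord` class, and the free-form `metadata` dictionary are assumptions for the sake of the example, not something already decided in this issue.

```python
from dataclasses import dataclass, field
from enum import Enum


class SubmissionStatus(Enum):
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Assumed transition rules; the issue does not specify them.
ALLOWED_TRANSITIONS = {
    SubmissionStatus.PENDING: {SubmissionStatus.PROCESSING},
    SubmissionStatus.PROCESSING: {SubmissionStatus.COMPLETED, SubmissionStatus.FAILED},
    SubmissionStatus.FAILED: {SubmissionStatus.PROCESSING},  # retry after a failure
    SubmissionStatus.COMPLETED: set(),  # terminal state
}


@dataclass
class SubmissionRecord:
    """Hypothetical in-memory mirror of one row in a submission status table."""

    accession: str
    status: SubmissionStatus = SubmissionStatus.PENDING
    attempts: int = 0
    metadata: dict = field(default_factory=dict)  # free-form, e.g. external accessions

    def transition(self, new_status: SubmissionStatus) -> None:
        """Move to new_status, counting each entry into PROCESSING as an attempt."""
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(
                f"cannot go from {self.status.value} to {new_status.value}"
            )
        if new_status is SubmissionStatus.PROCESSING:
            self.attempts += 1
        self.status = new_status
```

Counting attempts on every entry into PROCESSING means a record that failed once and then succeeded ends with `attempts == 2`, which would make it possible to cap retries at some limit.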
The Kotlin code