Incorrect validation messages for FASTA sequence upload errors #2922

Open
corneliusroemer opened this issue Oct 1, 2024 · 4 comments
Labels: backend (related to the loculus backend component), bug (Something isn't working)

Comments

@corneliusroemer
Contributor

The backend currently returns an unhelpful error message when a submitter uploads a malformed FASTA file. We should make the validation more precise. We could use an existing FASTA parsing library, if a good one exists, rather than hand-rolling our own.

Example error at the moment:

Metadata file contains 2 submissionIds that are not present in the sequence file: custom0, custom3; Sequence file contains 1 submissionIds that are not present in the metadata file: custom0

For this FASTA file (first block) and metadata TSV (second block, starting at the `submissionId` header row):

>custom1
ACTG
>custom2
ACTG
                    >custom3
ACTG
>custom4
ACTG
ACTG
>custom7
                                            
>custom8
ACTG
>custom9
>custom6


>custom5
ACTG
>custom0                       
ACTG
submissionId	date	region	country	division	host
custom4	2020-12-03	Europe	Switzerland	Zürich	Homo sapiens
custom0	2020-12-26	Europe	Switzerland	Bern	Homo sapiens
custom1	2020-12-15	Europe	Switzerland	Schaffhausen	Homo sapiens
custom2	2020-12-02	Europe	Switzerland	Bern	Homo sapiens
custom6	2020-12-16	Europe	Switzerland	Aargau
custom3		Europe	Switzerland	Schaffhausen	Homo sapiens
custom5	2020-12-23	Europe	Switzerland	Basel-Land	Homo sapiens
custom7	2XXXXX	Europe	Switzerland	Sankt Gallen	Homo sapiens
custom8	2020-12-16	Europe	Switzerland	Aargau	Homo sapiens
custom9	2020-12-01	Europe	Switzerland	Basel-Stadt	Homo sapiens
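As an illustration of what more precise validation could report for a file like the one above, here is a minimal Python sketch. It is not the Loculus backend code (the thread does not show that), and the function name `validate_fasta`, the rules it enforces, and the message wording are all assumptions made for this example. It flags headers preceded by whitespace, headers with trailing whitespace, and records without sequence data, each with a line number.

```python
from typing import List, Optional, Tuple

# Shortened sample in the spirit of the file above (whitespace written out explicitly).
SAMPLE = (
    ">custom1\n"
    "ACTG\n"
    "    >custom3\n"
    "ACTG\n"
    ">custom7\n"
    "        \n"
    ">custom8\n"
    "ACTG\n"
    ">custom0    \n"
    "ACTG\n"
)


def validate_fasta(text: str) -> Tuple[List[str], List[str]]:
    """Return (ids, errors); errors carry 1-based line numbers.

    Hypothetical rules for this sketch: a header must start with '>' in
    column 1, must not end in whitespace, and must be followed by at
    least one non-blank sequence line.
    """
    ids: List[str] = []
    errors: List[str] = []
    current_id: Optional[str] = None  # header currently being read
    current_line = 0                  # line number of that header
    has_sequence = False              # any sequence line seen for it?

    def close_record() -> None:
        if current_id is not None and not has_sequence:
            errors.append(f"line {current_line}: record {current_id!r} has no sequence")

    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank and whitespace-only lines are skipped
        if line.startswith(">"):
            close_record()
            header = line[1:]
            if not header.strip():
                errors.append(f"line {number}: empty header")
            elif header != header.rstrip():
                errors.append(f"line {number}: trailing whitespace in header {header!r}")
            current_id, current_line, has_sequence = header.rstrip(), number, False
            ids.append(current_id)
        elif stripped.startswith(">"):
            # Reported, but not treated as the start of a new record.
            errors.append(f"line {number}: header {stripped!r} is preceded by whitespace")
        elif current_id is None:
            errors.append(f"line {number}: sequence data before the first header")
        else:
            has_sequence = True
    close_record()
    return ids, errors


ids, errors = validate_fasta(SAMPLE)
for message in errors:
    print(message)
```

Whether an indented header should be rejected or silently trimmed is a policy decision; the sketch rejects it only to show that each problem can be pointed at by line.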
corneliusroemer added the bug and backend labels on Oct 1, 2024
@corneliusroemer
Contributor Author

Some other edge cases to try:

  • Emojis in submissionId
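To try the emoji case, one option is to check each submissionId against an explicit allowlist and name the offending characters in the error. The sketch below is in Python and purely illustrative: the allowed character set (`A-Z`, `a-z`, `0-9`, `.`, `_`, `-`) and the function name `invalid_id_reason` are assumptions, since the thread does not say which characters Loculus accepts.

```python
import re
from typing import Optional

# Hypothetical allowlist; the actual rule for submissionIds is not stated in this thread.
ALLOWED_ID = re.compile(r"[A-Za-z0-9._-]+")


def invalid_id_reason(submission_id: str) -> Optional[str]:
    """Return None if the ID passes the allowlist, else a readable reason."""
    if not submission_id:
        return "submissionId is empty"
    if ALLOWED_ID.fullmatch(submission_id):
        return None
    # Collect each distinct disallowed character with its code point.
    bad = sorted({ch for ch in submission_id if not ALLOWED_ID.fullmatch(ch)})
    details = ", ".join(f"{ch!r} (U+{ord(ch):04X})" for ch in bad)
    return f"submissionId {submission_id!r} contains disallowed characters: {details}"


print(invalid_id_reason("custom1"))
print(invalid_id_reason("custom\U0001F600"))
print(invalid_id_reason("custom0 "))
```

Printing the code point alongside the character helps with characters that render invisibly or inconsistently, such as trailing spaces.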

@chaoran-chen
Member

Interesting, what would be the correct behavior? Is this an invalid fasta file or should the whitespace be ignored?

@theosanderson
Member

IMO this is an invalid file. To me this is relatively low priority, though nice to have in due course.

@corneliusroemer
Contributor Author

corneliusroemer commented Oct 2, 2024

It's an invalid FASTA file, and it's a bug that we give submitters an incorrect error message, though as Theo says, not a high-priority one.

I think this is invalid per the FASTA spec (there isn't a single one, but I think there are few ambiguities), though of course it's better to double-check.
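The reported message lists `custom0` as missing from both files, which reads as a contradiction. One plausible explanation, not confirmed anywhere in this thread, is that the two IDs differ only in whitespace. Whatever the cause, quoting IDs in the message makes such differences visible. A minimal Python sketch, with a hypothetical function name `cross_check` and made-up message wording:

```python
from typing import Iterable, List


def cross_check(metadata_ids: Iterable[str], fasta_ids: Iterable[str]) -> List[str]:
    """Compare two ID collections and report mismatches with quoted IDs."""
    metadata_set, fasta_set = set(metadata_ids), set(fasta_ids)
    messages: List[str] = []
    missing_in_fasta = sorted(metadata_set - fasta_set)
    missing_in_metadata = sorted(fasta_set - metadata_set)
    if missing_in_fasta:
        # repr() quotes each ID, so leading/trailing whitespace shows up.
        messages.append(
            "In metadata but not in sequence file: "
            + ", ".join(repr(i) for i in missing_in_fasta)
        )
    if missing_in_metadata:
        messages.append(
            "In sequence file but not in metadata: "
            + ", ".join(repr(i) for i in missing_in_metadata)
        )
    return messages


for message in cross_check(["custom0", "custom1"], ["custom0    ", "custom1"]):
    print(message)
```

With quoted output, `'custom0'` and `'custom0    '` are visibly different strings, so the submitter can see why the two files do not match.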

3 participants