transform-strain-name: build strain name by concatenating fields #1515

Open
joverlee521 opened this issue Oct 3, 2023 · 1 comment
Labels
enhancement New feature or request

joverlee521 commented Oct 3, 2023

Context

Following the naming pattern set in SARS-CoV-2 sequences, strain names are usually <country>/<sample_id>/<year>. All three fields are typically available in the metadata so we can concatenate them to "build" a reasonable strain name.

Description

We could extend the existing augur curate transform-strain-name to accept input columns that are concatenated with a provided separator.
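
As a rough illustration of the idea (not augur's implementation), the concatenation could look something like the sketch below. The function name `build_strain_name`, its parameters, and the sample record are hypothetical; only the `<country>/<sample_id>/<year>` pattern comes from the context above.

```python
from typing import Mapping, Sequence


def build_strain_name(record: Mapping[str, str], fields: Sequence[str], separator: str = "/") -> str:
    """Join the values of `fields` from `record` with `separator`.

    Hypothetical helper for illustration only; not part of augur.
    """
    # Refuse to build a partial name if any input field is absent or empty.
    missing = [field for field in fields if not record.get(field)]
    if missing:
        raise ValueError(f"Cannot build strain name; missing fields: {missing}")
    return separator.join(str(record[field]).strip() for field in fields)


# Made-up record, shaped like one metadata row.
record = {"country": "USA", "sample_id": "ABC123", "year": "2022"}
record["strain"] = build_strain_name(record, ["country", "sample_id", "year"])
print(record["strain"])  # USA/ABC123/2022
```

Raising on missing fields is one possible design choice; a real command might instead fall back to another column or leave the existing strain name untouched.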

Examples

@joverlee521 joverlee521 added the enhancement New feature or request label Oct 3, 2023
joverlee521 commented Oct 3, 2023

I briefly explored whether I could recreate Cornelius' script with csvtk mutate2, but ran into an error:

$ csvtk -t mutate2 -e ' $country + "/" + $accession + "/" + $date ' -n strain_display -s monkeypox-metadata.tsv 
[ERRO] Cannot transition token types from MODIFIER [+] to TIME [2007-10-30 00:00:00 -0700 PDT]

Edit: csvtk also converts dates to floats. This behavior will not change until the underlying evaluation package is updated.
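
For comparison, a plain-Python sketch of the same concatenation sidesteps the type coercion, since the standard `csv` module reads every field as a string and dates pass through verbatim. The column names `country`, `accession`, `date`, and `strain_display` are taken from the command above; the function name `add_strain_display` and the sample row are made up for illustration.

```python
import csv
import io


def add_strain_display(tsv_in, tsv_out, fields=("country", "accession", "date"), separator="/"):
    """Copy a TSV stream, appending a `strain_display` column built from `fields`.

    Hypothetical helper; values are never parsed, so dates stay as written.
    """
    reader = csv.DictReader(tsv_in, delimiter="\t")
    fieldnames = list(reader.fieldnames or []) + ["strain_display"]
    writer = csv.DictWriter(tsv_out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Plain string join: no expression evaluation, no date or float conversion.
        row["strain_display"] = separator.join(row[field] for field in fields)
        writer.writerow(row)


# Made-up input row; in practice the streams would be open file handles.
source = io.StringIO("accession\tcountry\tdate\nACC0001\tCountryA\t2007-10-30\n")
result = io.StringIO()
add_strain_display(source, result)
print(result.getvalue())
```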
