[MRG] Build a LineageDB interface for taxonomy databases/information (#1651)

* start making a LineageDB
* further abstract taxonomy loading
* replace rest of taxonomy parsing
* get some basic sqlite stuff going
* refactoring, simplification, optimization
* fix a few things, alert on bad arg combination
* add combine_tax CLI command under tax
* rename 'combine_tax' to 'prepare'
* allow output database type for tax prepare
* format switching now works for sql and csv
* adjust loading to try/fail rather than suffix
* refactor taxonomy load
* fix tests
* basic tests for in/out formats
* add tests for trying to load bad files
* raise appropriate error message
* fix with pytest.raises foo
* some more tests
* more tests, fix bug
* make available_ranks work with sqlite db
* re-add test for available_ranks
* produce more useful errors for dups, and restore test code
* catch exception for database already exists
* test split idents and keep versions
* align keyword args with CLI args
* add more end-to-end tests
* alias --taxonomy-csv to --taxonomy
* 'prepare' docs
* clean up
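
The commit message mentions that loading was adjusted "to try/fail rather than suffix". The sketch below illustrates that general idea only, using the standard library: each loader is attempted in turn and the first one that succeeds wins, so a misleading file extension does not matter. The function names (`load_sqlite`, `load_csv`, `load_taxonomy`), the table name `taxonomy`, and the column names `ident` and `lineage` are all assumptions made for illustration and are not taken from the sourmash code.

```python
import csv
import sqlite3
from pathlib import Path


def load_sqlite(path):
    """Return {ident: lineage} from a SQLite file, or raise ValueError.

    Hypothetical schema: a table 'taxonomy' with columns 'ident', 'lineage'.
    """
    # mode=ro opens read-only, so a missing path is not silently created.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as exc:
        raise ValueError(f"cannot open '{path}' as SQLite") from exc
    try:
        rows = conn.execute("SELECT ident, lineage FROM taxonomy").fetchall()
    except sqlite3.Error as exc:
        raise ValueError(f"'{path}' is not a SQLite taxonomy database") from exc
    finally:
        conn.close()
    return dict(rows)


def load_csv(path):
    """Return {ident: lineage} from a CSV file, or raise ValueError."""
    with open(path, newline="") as fp:
        reader = csv.DictReader(fp)
        names = set(reader.fieldnames or [])
        if not {"ident", "lineage"} <= names:
            raise ValueError(f"'{path}' lacks 'ident'/'lineage' columns")
        return {row["ident"]: row["lineage"] for row in reader}


def load_taxonomy(path):
    """Try each loader in turn; the file suffix is never consulted."""
    errors = []
    for loader in (load_sqlite, load_csv):
        try:
            return loader(path)
        except (ValueError, csv.Error) as exc:
            errors.append(str(exc))
    raise ValueError(f"cannot load '{path}': " + "; ".join(errors))
```

With this arrangement a CSV file named `tax.db` and a SQLite file named `tax.csv` would both load, and a file that matches neither format would raise a single `ValueError` listing why each attempt failed.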
Showing 14 changed files with 1,003 additions and 197 deletions.
@@ -0,0 +1,60 @@
"""combine multiple taxonomy databases into one."""

usage="""
sourmash tax prepare --taxonomy-csv <taxonomy_file> [ ... ] -o <output>
The 'tax prepare' command reads in one or more taxonomy databases
and saves them into a new database. It can be used to combine databases
in the desired order, as well as output different database formats.
Please see the 'tax prepare' documentation for more details:
https://sourmash.readthedocs.io/en/latest/command-line.html#sourmash-tax-prepare-prepare-and-or-combine-taxonomy-files
"""

import sourmash
from sourmash.logging import notify, print_results, error


def subparser(subparsers):
    subparser = subparsers.add_parser('prepare',
                                      usage=usage)
    subparser.add_argument(
        '-q', '--quiet', action='store_true',
        help='suppress non-error output'
    )
    subparser.add_argument(
        '-t', '--taxonomy-csv', '--taxonomy', metavar='FILE',
        nargs="+", required=True,
        help='database lineages'
    )
    subparser.add_argument(
        '-o', '--output', required=True,
        help='output file',
    )
    subparser.add_argument(
        '-F', '--database-format',
        help="format of output file; default is 'sql'",
        default='sql',
        choices=['csv', 'sql'],
    )
    subparser.add_argument(
        '--keep-full-identifiers', action='store_true',
        help='do not split identifiers on whitespace'
    )
    subparser.add_argument(
        '--keep-identifier-versions', action='store_true',
        help='after splitting identifiers, do not remove accession versions'
    )
    subparser.add_argument(
        '--fail-on-missing-taxonomy', action='store_true',
        help='fail quickly if taxonomy is not available for an identifier',
    )
    subparser.add_argument(
        '-f', '--force', action='store_true',
        help='continue past errors in file and taxonomy loading',
    )

def main(args):
    import sourmash
    return sourmash.tax.__main__.prepare(args)
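
To see how the options in this file behave when parsed, the following sketch rebuilds a subset of them on a standalone `argparse.ArgumentParser`. It is an illustration only: the real module registers the options on a `sourmash tax` subparser, and the program name `tax-prepare-sketch` and the file names `gtdb.csv`, `ncbi.csv`, and `combined.db` are made up here.

```python
import argparse


def build_parser():
    # Standalone stand-in for the 'prepare' subparser (hypothetical prog name).
    parser = argparse.ArgumentParser(prog="tax-prepare-sketch")
    parser.add_argument(
        "-t", "--taxonomy-csv", "--taxonomy", metavar="FILE",
        nargs="+", required=True,
        help="database lineages",
    )
    parser.add_argument("-o", "--output", required=True, help="output file")
    parser.add_argument(
        "-F", "--database-format",
        default="sql", choices=["csv", "sql"],
        help="format of output file; default is 'sql'",
    )
    parser.add_argument("--keep-full-identifiers", action="store_true")
    parser.add_argument("--keep-identifier-versions", action="store_true")
    return parser


# The --taxonomy alias accepts several files; the order given is preserved.
args = build_parser().parse_args(
    ["--taxonomy", "gtdb.csv", "ncbi.csv", "-o", "combined.db"]
)
print(args.taxonomy_csv)     # ['gtdb.csv', 'ncbi.csv']
print(args.database_format)  # sql
```

Because argparse derives the attribute name from the first long option string, values passed through either `--taxonomy-csv` or the `--taxonomy` alias end up in `args.taxonomy_csv`. An unsupported value for `-F` is rejected by `choices` before any loading code would run.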