refactor!: update cli #324

Merged
korikuzma merged 15 commits into staging from issue-244
Apr 3, 2024

Conversation

korikuzma
Member

@korikuzma korikuzma commented Apr 2, 2024

Closes #244. I tried not to make too many changes. I do think we should split up the commands, but in a separate issue. This PR focuses only on updating the CLI for the normalizer updates.

Changes:

  • Removed --load_normalizers_db. I think this should be an option in the normalizers themselves, similar to --update_all and --update_merged. I removed it from the CLI because it would instantiate the normalizer ddb twice
  • Removed the class CLI
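
The double-instantiation concern behind removing --load_normalizers_db can be illustrated with a minimal sketch. Everything here is hypothetical: `FakeNormalizerDb`, `normalizer_update`, and the two `cli_*` functions are stand-ins invented for illustration, not MetaKB or normalizer APIs. The sketch only assumes that the normalizer's own update path opens its own database handle.

```python
class FakeNormalizerDb:
    """Hypothetical stand-in for a normalizer database handle (not a real API)."""

    created = 0  # counts how many handles have been instantiated

    def __init__(self) -> None:
        FakeNormalizerDb.created += 1


def normalizer_update() -> None:
    """Hypothetical normalizer-side update; assumed to open its own handle."""
    FakeNormalizerDb()


def cli_with_load_flag() -> int:
    """Old shape: the CLI opens a handle, then the normalizer opens another."""
    FakeNormalizerDb.created = 0
    FakeNormalizerDb()   # opened by the CLI for the load option
    normalizer_update()  # opened again inside the normalizer
    return FakeNormalizerDb.created


def cli_without_load_flag() -> int:
    """New shape: only the normalizer's own update path opens a handle."""
    FakeNormalizerDb.created = 0
    normalizer_update()
    return FakeNormalizerDb.created
```

Under these assumptions, leaving the load option to the normalizer means the handle is created once per run rather than twice.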

Running locally on an Intel Mac (I added all files to the S3 bucket):

(metakb) RESC02DJ303MD6R:metakb kxk102$ python3 -m metakb.cli -f -u

Loading Disease Normalizer data...
***Using Disease Database Endpoint: http://localhost:8000/***

Deleting NCIt...
Deleted NCIt in 0.07221 seconds.

Loading NCIt...
* Owlready2 * Warning: optimized Cython parser module 'owlready2_optimized' is not available, defaulting to slower Python implementation
Downloading Thesaurus_24.02d.OWL.zip...
100%|██████████████████████████████████████| 38.8M/38.8M [00:07<00:00, 5.79MB/s]
Transforming and loading data to DB...
Loaded NCIt in 347.95758 seconds.
Total time for NCIt: 348.02979 seconds.

Deleting Mondo...
Deleted Mondo in 0.00858 seconds.

Loading Mondo...
Downloading mondo.obo...
100%|██████████████████████████████████████| 31.8M/31.8M [00:04<00:00, 7.70MB/s]
Transforming and loading data to DB...
Loaded Mondo in 650.05009 seconds.
Total time for Mondo: 650.05866 seconds.

Deleting DO...
Deleted DO in 0.00497 seconds.

Loading DO...
Downloading v2024-03-28...
67.9MB [00:13, 5.39MB/s]
Transforming and loading data to DB...
Loaded DO in 201.55781 seconds.
Total time for DO: 201.56278 seconds.

Deleting OncoTree...
Deleted OncoTree in 0.01262 seconds.

Loading OncoTree...
Downloading tree?version=oncotree_latest_stable...
232kB [00:00, 806kB/s]
Transforming and loading data to DB...
Loaded OncoTree in 6.24740 seconds.
Total time for OncoTree: 6.26002 seconds.

Deleting OMIM...
Deleted OMIM in 0.00367 seconds.

Loading OMIM...
Transforming and loading data to DB...
Loaded OMIM in 75.21636 seconds.
Total time for OMIM: 75.22003 seconds.

Deleting normalized records...
Deleted normalized records in 0.00419 seconds.
Constructing normalized records...
Merged concept generation completed in 1354.51143 seconds

Loading Therapy Normalizer data...
***Using Therapy Database Endpoint: http://localhost:8000/***
***Using Disease Database Endpoint: http://localhost:8000/***

Deleting Wikidata...
Deleted Wikidata in 0.01224 seconds.

Loading Wikidata...
Transforming and loading data to DB...
Loaded Wikidata in 238.57198 seconds.
Total time for Wikidata: 238.58422 seconds.

Deleting ChEMBL...
Deleted ChEMBL in 0.00374 seconds.

Loading ChEMBL...
***Using Disease Database Endpoint: http://localhost:8000/***
Downloading chembl_33_sqlite.tar.gz...
100%|█████████████████████████████████████| 4.41G/4.41G [2:45:54<00:00, 476kB/s]
Transforming and loading data to DB...
Loaded ChEMBL in 16164.99018 seconds.
Total time for ChEMBL: 16164.99392 seconds.

Deleting NCIt...
Deleted NCIt in 0.00610 seconds.

Loading NCIt...
Transforming and loading data to DB...
Loaded NCIt in 443.75895 seconds.
Total time for NCIt: 443.76505 seconds.

Deleting DrugBank...
Deleted DrugBank in 0.00448 seconds.

Loading DrugBank...
Downloading all-drugbank-vocabulary...
100%|████████████████████████████████████████| 926k/926k [00:00<00:00, 1.48MB/s]
Transforming and loading data to DB...
Loaded DrugBank in 216.19705 seconds.
Total time for DrugBank: 216.20153 seconds.

Deleting ChemIDplus...
Deleted ChemIDplus in 0.00254 seconds.

Loading ChemIDplus...
Downloading CurrentChemID.xml...
100%|████████████████████████████████████████| 690M/690M [02:55<00:00, 4.11MB/s]
Transforming and loading data to DB...
Loaded ChemIDplus in 338.13642 seconds.
Total time for ChemIDplus: 338.13896 seconds.

Deleting RxNorm...
Deleted RxNorm in 0.00467 seconds.

Loading RxNorm...
Downloading RxNorm_full_04012024.zip...
239MB [00:59, 4.21MB/s] 
Transforming and loading data to DB...
Loaded RxNorm in 374.83363 seconds.
Total time for RxNorm: 374.83830 seconds.

Deleting HemOnc...
Deleted HemOnc in 0.00476 seconds.

Loading HemOnc...
***Using Disease Database Endpoint: http://localhost:8000/***
Downloading 9CY9C6...
267kB [00:02, 103kB/s]  
Transforming and loading data to DB...
Loaded HemOnc in 31.95612 seconds.
Total time for HemOnc: 31.96088 seconds.

Deleting DrugsAtFDA...
Deleted DrugsAtFDA in 0.00335 seconds.

Loading DrugsAtFDA...
Downloading drug-drugsfda-0001-of-0001.json.zip...
100%|██████████████████████████████████████| 8.31M/8.31M [00:01<00:00, 6.05MB/s]
Transforming and loading data to DB...
Loaded DrugsAtFDA in 408.94660 seconds.
Total time for DrugsAtFDA: 408.94995 seconds.

Deleting GuideToPHARMACOLOGY...
Deleted GuideToPHARMACOLOGY in 0.01188 seconds.

Loading GuideToPHARMACOLOGY...
Downloading ligands.tsv...
100%|███████████████████████████████████████| 5.69M/5.69M [00:09<00:00, 638kB/s]
Downloading ligand_id_mapping.tsv...
100%|██████████████████████████████████████| 2.23M/2.23M [00:01<00:00, 1.56MB/s]
Transforming and loading data to DB...
Loaded GuideToPHARMACOLOGY in 283.02096 seconds.
Total time for GuideToPHARMACOLOGY: 283.03284 seconds.

Deleting normalized records...
Deleted normalized records in 0.00369 seconds.
Constructing normalized records...
Merged concept generation completed in 22249.78497 seconds

Loading Gene Normalizer data...
***Using Gene Database Endpoint: http://localhost:8000/***

Deleting HGNC...
Deleted HGNC in 0.00753 seconds.

Loading HGNC...
Loaded HGNC in 921.76053 seconds.
Total time for HGNC: 921.76806 seconds.

Deleting Ensembl...
Deleted Ensembl in 0.00318 seconds.

Loading Ensembl...
Loaded Ensembl in 606.32836 seconds.
Total time for Ensembl: 606.33154 seconds.

Deleting NCBI...
Deleted NCBI in 0.00319 seconds.

Loading NCBI...
Loaded NCBI in 2237.05520 seconds.
Total time for NCBI: 2237.05839 seconds.

Deleting normalized records...
Deleted normalized records in 0.00377 seconds.
Constructing normalized records...
Merged concept generation completed in 6129.71212 seconds
Normalizers database loaded.
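
The per-source timings in the log above can be tallied with a short script. This is a minimal sketch, assuming the log keeps the "Total time for <source>: <seconds> seconds." line format shown above; `parse_totals` and `slowest` are hypothetical helper names, not part of MetaKB.

```python
import re

# Matches lines such as "Total time for OMIM: 75.22003 seconds."
TOTAL_RE = re.compile(r"^Total time for (.+?): (\d+(?:\.\d+)?) seconds\.$")


def parse_totals(log_text: str) -> list[tuple[str, float]]:
    """Return (source, seconds) pairs in log order.

    A list is used rather than a dict because a source name can appear
    more than once (NCIt is loaded for both disease and therapy).
    """
    totals = []
    for line in log_text.splitlines():
        match = TOTAL_RE.match(line.strip())
        if match:
            totals.append((match.group(1), float(match.group(2))))
    return totals


def slowest(totals: list[tuple[str, float]]) -> tuple[str, float]:
    """Return the (source, seconds) pair with the largest duration."""
    return max(totals, key=lambda item: item[1])
```

Passing the saved log text to `parse_totals` and then `slowest` identifies which source dominates the run time.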

@korikuzma korikuzma added the priority:medium and technical debt labels Apr 2, 2024
@korikuzma korikuzma requested a review from jsstevenson April 2, 2024 13:09
@korikuzma korikuzma self-assigned this Apr 2, 2024
@korikuzma korikuzma linked an issue Apr 2, 2024 that may be closed by this pull request
Base automatically changed from issue-322 to staging April 2, 2024 13:54
@korikuzma
Member Author

@jsstevenson I am going to re-run the CDMs using the prod env and update the S3 bucket with those

@korikuzma korikuzma requested a review from jsstevenson April 3, 2024 12:51
@korikuzma
Member Author

@jsstevenson I can re-order once you're good with final changes. Just want to make the review easier

@korikuzma
Member Author

@jsstevenson my bad. I thought you accepted this PR but it was the other MetaKB PR

Co-authored-by: James Stevenson <[email protected]>
jsstevenson
jsstevenson previously approved these changes Apr 3, 2024
Member

@jsstevenson jsstevenson left a comment

💯 it works again!

@jsstevenson
Member

ope this is my fault
Screenshot 2024-04-03 at 9 05 09 AM

@korikuzma korikuzma requested a review from jsstevenson April 3, 2024 13:14
@korikuzma korikuzma merged commit 7891226 into staging Apr 3, 2024
15 checks passed
@korikuzma korikuzma deleted the issue-244 branch April 3, 2024 13:53
@github-actions github-actions bot mentioned this pull request Apr 3, 2024
Labels
priority:medium, technical debt
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update CLI
2 participants