
Commit

switches build to multistage pixi + rocker/r2u. upgrades mgnifyr.
SandyRogers committed Oct 3, 2024
1 parent 7a27daa commit 4f78c4f
Showing 28 changed files with 13,083 additions and 234 deletions.
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
+ # GitHub syntax highlighting
+ pixi.lock linguist-language=YAML linguist-generated=true
5 changes: 4 additions & 1 deletion .gitignore
@@ -15,11 +15,14 @@ site_libs
src/docs/*.html
src/*.html
src/notebooks/**/*.html
src/notebooks/**/*_files/
src/*-listing.json

*.parquet
!**/example-data/**/*.parquet
*.sig
ko*.pathview.png
src/notebooks/R Examples/*.tsv
src/notebooks/R Examples/*.txt
src/notebooks/R Examples/*.txt# pixi environments
.pixi
*.egg-info
3 changes: 1 addition & 2 deletions README.md
@@ -72,7 +72,6 @@ It should be localhost port 8888, with a random token.

When you're finished editing, use normal `git add` and `git commit` to contribute your changes.

For info, "jovyan" is always the user for these Jupyter Docker images. Jovyan as in jovian (a being from the planet Jupiter), but from Jupyter!

#### Guidance for authoring notebooks

@@ -85,7 +84,7 @@ For info, "jovyan" is always the user for these Jupyter Docker images. Jovyan as
##### Caching data in the image

MGnifyR uses a cache of pulled MGnify data.
- This is populated during the Docker build, into `/home/jovyan/.mgnify_cache`, by the script in `dependencies/populate-mgnify-cache.R`.
+ This is populated during the Docker build, into `/home/mgnify/.mgnify_cache`, by the script in `dependencies/populate-mgnify-cache.R`.
Add commands to this to include other datasets in the cache.
The cache is zipped and checked into the repo for faster population during builds (`dependencies/mgnify-cache.tgz`), since it rarely changes.
To check in an updated version of the cache...
18 changes: 11 additions & 7 deletions Taskfile.yml
@@ -2,6 +2,8 @@ version: '3'

env:
NB_DOCKER_EXE: "docker"
+ NB_DOCKER_FLAGS: "--platform linux/amd64"
+ NB_IMAGE: "quay.io/microbiome-informatics/emg-notebooks.dev:latest"

tasks:
add-py-notebook:
@@ -46,7 +48,7 @@ tasks:
The files in src/notebooks are mounted as editable, and served on port 8888.
cmds:
- - $NB_DOCKER_EXE run -it -v $PWD/src/notebooks:/home/jovyan/mgnify-examples -p 8888:8888 quay.io/microbiome-informatics/emg-notebooks.dev:latest
+ - $NB_DOCKER_EXE run $NB_DOCKER_FLAGS -it -v $PWD/src/notebooks:/home/mgnify/mgnify-examples -p 8888:8888 $NB_IMAGE

build-notebook-docker:
summary: |
@@ -56,15 +58,15 @@ tasks:
NOT needed if you're just editing/adding notebooks with no additional dependencies.
cmds:
- - $NB_DOCKER_EXE build --load -f docker/Dockerfile -t quay.io/microbiome-informatics/emg-notebooks.dev:latest .
+ - $NB_DOCKER_EXE build $NB_DOCKER_FLAGS --load -f docker/Dockerfile -t $NB_IMAGE .

build-static-docker:
summary: |
Builds a docker image with Quarto included, for statically rendering the notebook outputs.
The built image is tagged as `notebooks-static`.
cmds:
- - $NB_DOCKER_EXE build --load -f docker/docs.Dockerfile -t notebooks-static .
+ - $NB_DOCKER_EXE build $NB_DOCKER_FLAGS --load -f docker/docs.Dockerfile -t notebooks-static .
sources:
- docker/docs.Dockerfile
- docker/Dockerfile
@@ -75,10 +77,12 @@ tasks:
The site is built to ./_site
cmds:
- - $NB_DOCKER_EXE run -it -v $PWD:/opt/repo -w /opt/repo notebooks-static render --execute
+ - $NB_DOCKER_EXE run $NB_DOCKER_FLAGS -it -v $PWD:/opt/repo -w /opt/repo notebooks-static render
+ # TODO: reenable --execute, once quarto>jupyter>R is working
deps: [build-static-docker]
sources:
- src/**/*
- _quarto.yml

serve-static:
summary: |
@@ -87,15 +91,15 @@ tasks:
This serves the contents of ./_site
cmds:
- echo "Browse to http://127.0.0.1:4444"
- - $NB_DOCKER_EXE run -it -v $PWD:/opt/repo -w /opt/repo/_site -p 4444:4444 --entrypoint python notebooks-static -m http.server 4444
+ - $NB_DOCKER_EXE run $NB_DOCKER_FLAGS -it -v $PWD:/opt/repo -w /opt/repo/_site -p 4444:4444 --entrypoint python notebooks-static -m http.server 4444
deps: [render-static]

preview-static:
summary: |
Runs, renders, and serves the notebooks as a static website, watching for changes
cmds:
- echo 'When the rendering is finished, the static preview of notebooks will be at http://127.0.0.1:4444 ...'
- - $NB_DOCKER_EXE run -it -v $PWD:/opt/repo -w /opt/repo -p 4444:4444 notebooks-static preview --no-browser --port 4444 --host 0.0.0.0
+ - $NB_DOCKER_EXE run $NB_DOCKER_FLAGS -it -v $PWD:/opt/repo -w /opt/repo -p 4444:4444 notebooks-static preview --no-browser --port 4444 --host 0.0.0.0
deps: [build-static-docker]

update-mgnifyr-cache:
@@ -108,4 +112,4 @@ tasks:
It writes a zip of the cache to dependencies/mgnify-cache.tgz
cmds:
- - $NB_DOCKER_EXE run -it -v $PWD/dependencies:/opt/dependencies -w /opt/dependencies quay.io/microbiome-informatics/emg-notebooks.dev:latest /bin/bash zip-mgnifyr-cache.sh
+ - $NB_DOCKER_EXE run $NB_DOCKER_FLAGS -it -v $PWD/dependencies:/opt/dependencies -w /opt/dependencies $NB_IMAGE /bin/bash zip-mgnifyr-cache.sh
1 change: 0 additions & 1 deletion _quarto.yml
@@ -1,7 +1,6 @@
project:
type: website
render:
- src/docs.qmd
- src/docs/*.md
- src/docs/*.qmd
- src/notebooks_list.qmd
6 changes: 0 additions & 6 deletions dependencies/dependencies.R

This file was deleted.

Binary file modified dependencies/mgnify-cache.tgz
Binary file not shown.
47 changes: 25 additions & 22 deletions dependencies/populate-mgnifyr-cache.R
@@ -12,36 +12,39 @@ library(truncnorm)
library(dplyr)


- mg <- mgnify_client(usecache = T, cache_dir = '/home/jovyan/.mgnify_cache')
+ mg <- MgnifyClient(useCache = T, cacheDir = '/home/mgnify/.mgnify_cache')

# For the "Comparative Metagenomics" notebook
- tara_all = mgnify_analyses_from_studies(mg, 'MGYS00002008')
- metadata = mgnify_get_analyses_metadata(mg, tara_all)
+ # TODO
+ #tara_all = mgnify_analyses_from_studies(mg, 'MGYS00002008')
+ #metadata = mgnify_get_analyses_metadata(mg, tara_all)

## To generate phyloseq object
- clean_acc=c('MGYA00590456','MGYA00590543','MGYA00593110','MGYA00590477','MGYA00593125','MGYA00590448','MGYA00590508','MGYA00589025','MGYA00593139','MGYA00593220','MGYA00590525','MGYA00590534','MGYA00593112','MGYA00590498','MGYA00590535','MGYA00593223','MGYA00590480','MGYA00590496','MGYA00590523','MGYA00590444','MGYA00590517','MGYA00590575','MGYA00589039','MGYA00590574','MGYA00590474','MGYA00590554','MGYA00590469','MGYA00590471','MGYA00590522','MGYA00593141','MGYA00589049','MGYA00593123','MGYA00590564','MGYA00589024','MGYA00590572','MGYA00590545','MGYA00590518','MGYA00593126','MGYA00590526','MGYA00590500','MGYA00590570','MGYA00590520','MGYA00590443','MGYA00589013','MGYA00590449','MGYA00589021','MGYA00593130','MGYA00589047','MGYA00589042','MGYA00590577','MGYA00590470','MGYA00590473','MGYA00593216','MGYA00590562','MGYA00590464','MGYA00590484','MGYA00590462','MGYA00590565','MGYA00590439','MGYA00590472','MGYA00590566','MGYA00590552','MGYA00590485','MGYA00593133','MGYA00590544','MGYA00590455','MGYA00590437','MGYA00589044')
- ps = mgnify_get_analyses_phyloseq(mg, clean_acc)
+ #clean_acc=c('MGYA00590456','MGYA00590543','MGYA00593110','MGYA00590477','MGYA00593125','MGYA00590448','MGYA00590508','MGYA00589025','MGYA00593139','MGYA00593220','MGYA00590525','MGYA00590534','MGYA00593112','MGYA00590498','MGYA00590535','MGYA00593223','MGYA00590480','MGYA00590496','MGYA00590523','MGYA00590444','MGYA00590517','MGYA00590575','MGYA00589039','MGYA00590574','MGYA00590474','MGYA00590554','MGYA00590469','MGYA00590471','MGYA00590522','MGYA00593141','MGYA00589049','MGYA00593123','MGYA00590564','MGYA00589024','MGYA00590572','MGYA00590545','MGYA00590518','MGYA00593126','MGYA00590526','MGYA00590500','MGYA00590570','MGYA00590520','MGYA00590443','MGYA00589013','MGYA00590449','MGYA00589021','MGYA00593130','MGYA00589047','MGYA00589042','MGYA00590577','MGYA00590470','MGYA00590473','MGYA00593216','MGYA00590562','MGYA00590464','MGYA00590484','MGYA00590462','MGYA00590565','MGYA00590439','MGYA00590472','MGYA00590566','MGYA00590552','MGYA00590485','MGYA00593133','MGYA00590544','MGYA00590455','MGYA00590437','MGYA00589044')
+ #ps = mgnify_get_analyses_phyloseq(mg, clean_acc)

# For the "Fetch Analaysis Metadata for a Study" notebook
- analyses_accessions <- mgnify_analyses_from_studies(mg, 'MGYS00005292')
- analyses_metadata_df <- mgnify_get_analyses_metadata(mg, head(analyses_accessions, 10))
- analyses_ps <- mgnify_get_analyses_phyloseq(mg, analyses_metadata_df$analysis_accession, tax_SU = "SSU")
+ analyses_accessions <- searchAnalysis(mg, "studies", "MGYS00005116")
+ analyses_metadata_df <- getMetadata(mg, head(analyses_accessions, 10));
+ analyses_tse <- getResult(mg, analyses_metadata_df$analysis_accession, get.taxa = TRUE, get.func = FALSE, taxa.su = "SSU")
+ analyses_phylo <- getResult(mg, analyses_metadata_df$analysis_accession, get.taxa = TRUE, get.func = FALSE, taxa.su = "SSU", output="phyloseq")

# For the "Pathways Visualization" notebook
- PATHWAY_STUDY_IDS = c('MGYS00006180', 'MGYS00006178')
- all_accessions = mgnify_analyses_from_studies(mg,PATHWAY_STUDY_IDS)
- all_metadata = mgnify_get_analyses_metadata(mg, all_accessions)
-
- samples_list = c('MGYA00642773','MGYA00642774','MGYA00642775','MGYA00642777','MGYA00642779','MGYA00642781','MGYA00642782','MGYA00642783','MGYA00642785','MGYA00642787','MGYA00642792','MGYA00642795','MGYA00642798','MGYA00642801','MGYA00642804','MGYA00642807','MGYA00642811','MGYA00642815','MGYA00642819','MGYA00642822','MGYA00642825','MGYA00642828','MGYA00642832','MGYA00642836','MGYA00642840','MGYA00642846','MGYA00642850','MGYA00642853','MGYA00642857','MGYA00642861','MGYA00642865','MGYA00642870','MGYA00642872','MGYA00642875','MGYA00642879','MGYA00642884','MGYA00642887','MGYA00642890','MGYA00642892','MGYA00643488','MGYA00642677','MGYA00642680','MGYA00642681','MGYA00642684','MGYA00642685','MGYA00642687','MGYA00642688','MGYA00642690','MGYA00642692','MGYA00642693','MGYA00642695','MGYA00642697','MGYA00642698','MGYA00642700','MGYA00642702','MGYA00642703','MGYA00642705','MGYA00642706','MGYA00642707','MGYA00642709','MGYA00642710','MGYA00642711','MGYA00642713','MGYA00642714','MGYA00642716','MGYA00642717','MGYA00642719','MGYA00642721','MGYA00642722','MGYA00642724','MGYA00642726','MGYA00642728','MGYA00642730','MGYA00642732','MGYA00642733','MGYA00642735','MGYA00642737','MGYA00642739','MGYA00642741','MGYA00642743','MGYA00642744','MGYA00642746','MGYA00642747','MGYA00642748','MGYA00642750','MGYA00642751','MGYA00642753','MGYA00642754','MGYA00642755','MGYA00642757','MGYA00642758','MGYA00642759','MGYA00642761','MGYA00642763')
-
- list_of_dfs = list()
- for (accession in samples_list) {
- ko_loc = paste0('analyses/',accession,'/kegg-orthologs')
- ko_json = mgnify_retrieve_json(mg, path = ko_loc)
- ko_data = as.data.frame(ko_json %>% spread_all)[ , c("attributes.accession", "attributes.count")]
- colnames(ko_data) = c('ko_id', accession)
- list_of_dfs = append(list_of_dfs, list(ko_data))
- }
+ # TODO
+ #PATHWAY_STUDY_IDS = c('MGYS00006180', 'MGYS00006178')
+ #all_accessions = mgnify_analyses_from_studies(mg,PATHWAY_STUDY_IDS)
+ #all_metadata = mgnify_get_analyses_metadata(mg, all_accessions)
+
+ #samples_list = c('MGYA00642773','MGYA00642774','MGYA00642775','MGYA00642777','MGYA00642779','MGYA00642781','MGYA00642782','MGYA00642783','MGYA00642785','MGYA00642787','MGYA00642792','MGYA00642795','MGYA00642798','MGYA00642801','MGYA00642804','MGYA00642807','MGYA00642811','MGYA00642815','MGYA00642819','MGYA00642822','MGYA00642825','MGYA00642828','MGYA00642832','MGYA00642836','MGYA00642840','MGYA00642846','MGYA00642850','MGYA00642853','MGYA00642857','MGYA00642861','MGYA00642865','MGYA00642870','MGYA00642872','MGYA00642875','MGYA00642879','MGYA00642884','MGYA00642887','MGYA00642890','MGYA00642892','MGYA00643488','MGYA00642677','MGYA00642680','MGYA00642681','MGYA00642684','MGYA00642685','MGYA00642687','MGYA00642688','MGYA00642690','MGYA00642692','MGYA00642693','MGYA00642695','MGYA00642697','MGYA00642698','MGYA00642700','MGYA00642702','MGYA00642703','MGYA00642705','MGYA00642706','MGYA00642707','MGYA00642709','MGYA00642710','MGYA00642711','MGYA00642713','MGYA00642714','MGYA00642716','MGYA00642717','MGYA00642719','MGYA00642721','MGYA00642722','MGYA00642724','MGYA00642726','MGYA00642728','MGYA00642730','MGYA00642732','MGYA00642733','MGYA00642735','MGYA00642737','MGYA00642739','MGYA00642741','MGYA00642743','MGYA00642744','MGYA00642746','MGYA00642747','MGYA00642748','MGYA00642750','MGYA00642751','MGYA00642753','MGYA00642754','MGYA00642755','MGYA00642757','MGYA00642758','MGYA00642759','MGYA00642761','MGYA00642763')
+
+ # list_of_dfs = list()
+ # for (accession in samples_list) {
+ # ko_loc = paste0('analyses/',accession,'/kegg-orthologs')
+ # ko_json = mgnify_retrieve_json(mg, path = ko_loc)
+ # ko_data = as.data.frame(ko_json %>% spread_all)[ , c("attributes.accession", "attributes.count")]
+ # colnames(ko_data) = c('ko_id', accession)
+ # list_of_dfs = append(list_of_dfs, list(ko_data))
+ # }



22 changes: 0 additions & 22 deletions dependencies/py-environment.yml

This file was deleted.

22 changes: 0 additions & 22 deletions dependencies/r-environment.yml

This file was deleted.

3 changes: 2 additions & 1 deletion dependencies/zip-mgnifyr-cache.sh
@@ -1,3 +1,4 @@
#!/bin/bash
rm mgnify-cache.tgz
- tar -czf mgnify-cache.tgz --absolute-names /home/jovyan/.mgnify_cache
+ Rscript populate-mgnifyr-cache.R
+ tar -czf mgnify-cache.tgz --absolute-names /home/mgnify/.mgnify_cache