Update workflows #2

Merged · 17 commits · Sep 18, 2024
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -21,6 +21,12 @@

## BUGFIXES -->

# dimensionality_reduction 0.1.1 2024-09-18

## NEW FUNCTIONALITY

* Updated workflows to work correctly for this task (PR #2)

# dimensionality_reduction 0.1.0 2024-09-05

## NEW FUNCTIONALITY
10 changes: 5 additions & 5 deletions README.md

@@ -93,7 +93,7 @@ flowchart LR

The dataset to pass to a method.

-Example file: `resources_test/common/pancreas/dataset.h5ad`
+Example file: `resources_test/common/cxg_mouse_pancreas_atlas/dataset.h5ad`

Format:

@@ -149,7 +149,7 @@ Arguments:
The dataset to pass to a method.

Example file:
-`resources_test/dimensionality_reduction/pancreas/dataset.h5ad`
+`resources_test/task_dimensionality_reduction/cxg_mouse_pancreas_atlas/dataset.h5ad`

Format:

@@ -181,7 +181,7 @@ Data structure:
The data for evaluating a dimensionality reduction.

Example file:
-`resources_test/dimensionality_reduction/pancreas/solution.h5ad`
+`resources_test/task_dimensionality_reduction/cxg_mouse_pancreas_atlas/solution.h5ad`

Format:

@@ -268,7 +268,7 @@ Arguments:
A dataset with dimensionality reduction embedding.

Example file:
-`resources_test/dimensionality_reduction/pancreas/embedding.h5ad`
+`resources_test/task_dimensionality_reduction/cxg_mouse_pancreas_atlas/embedding.h5ad`

Format:

@@ -298,7 +298,7 @@ Data structure:
Metric score file

Example file:
-`resources_test/dimensionality_reduction/pancreas/score.h5ad`
+`resources_test/task_dimensionality_reduction/cxg_mouse_pancreas_atlas/score.h5ad`

Format:

15 changes: 8 additions & 7 deletions _viash.yaml

@@ -67,11 +67,11 @@ info:
# Step 5: Replace the task_template to the name of the task.
test_resources:
 - type: s3
-  path: s3://openproblems-data/resources_test/common/pancreas/
-  dest: resources_test/common/pancreas/
+  path: s3://openproblems-data/resources_test/common/cxg_mouse_pancreas_atlas/
+  dest: resources_test/common/cxg_mouse_pancreas_atlas/
 - type: s3
-  path: s3://openproblems-data/resources_test/dimensionality_reduction/
-  dest: resources_test/dimensionality_reduction
+  path: s3://openproblems-data/resources_test/task_dimensionality_reduction/
+  dest: resources_test/task_dimensionality_reduction

# Step 6: Update the authors of the task.
authors:
@@ -121,7 +121,8 @@ config_mods: |
.runners[.type == "nextflow"].config.labels := { lowmem : "memory = 20.Gb", midmem : "memory = 50.Gb", highmem : "memory = 100.Gb", lowcpu : "cpus = 5", midcpu : "cpus = 15", highcpu : "cpus = 30", lowtime : "time = 1.h", midtime : "time = 4.h", hightime : "time = 8.h", veryhightime : "time = 24.h" }

 repositories:
-  - name: openproblems-v2
+  - name: core
     type: github
-    repo: openproblems-bio/openproblems-v2
-    tag: main_build
+    repo: openproblems-bio/core
+    tag: build/main
+    path: viash/core
3 changes: 0 additions & 3 deletions scripts/.gitignore

This file was deleted.

2 changes: 2 additions & 0 deletions scripts/create_component/.gitignore
@@ -0,0 +1,2 @@
# if users change the scripts, the changes should not be committed.
/create_*_*.sh
8 changes: 8 additions & 0 deletions scripts/create_component/create_python_method.sh
@@ -0,0 +1,8 @@
#!/bin/bash

set -e

common/scripts/create_component \
--name my_python_method \
--language python \
--type method
8 changes: 8 additions & 0 deletions scripts/create_component/create_python_metric.sh
@@ -0,0 +1,8 @@
#!/bin/bash

set -e

common/scripts/create_component \
--name my_python_metric \
--language python \
--type metric
8 changes: 8 additions & 0 deletions scripts/create_component/create_r_method.sh
@@ -0,0 +1,8 @@
#!/bin/bash

set -e

common/scripts/create_component \
--name my_r_method \
--language r \
--type method
8 changes: 8 additions & 0 deletions scripts/create_component/create_r_metric.sh
@@ -0,0 +1,8 @@
#!/bin/bash

set -e

common/scripts/create_component \
--name my_r_metric \
--language r \
--type metric
4 changes: 3 additions & 1 deletion scripts/create_readme.sh
@@ -1,3 +1,5 @@
#!/bin/bash

-common/scripts/create_task_readme
+set -e
+
+common/scripts/create_task_readme --input src/api
26 changes: 26 additions & 0 deletions scripts/create_resources/resources.sh
@@ -0,0 +1,26 @@
#!/bin/bash

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"

cat > /tmp/params.yaml << 'HERE'
input_states: s3://openproblems-data/resources/datasets/**/state.yaml
rename_keys: 'input:output_dataset'
output_state: '$id/state.yaml'
settings: '{"output_dataset": "$id/dataset.h5ad", "output_solution": "$id/solution.h5ad"}'
publish_dir: s3://openproblems-data/resources/task_dimensionality_reduction/datasets/
HERE

tw launch https://github.com/openproblems-bio/task_dimensionality_reduction.git \
--revision build/main \
--pull-latest \
--main-script target/nextflow/workflows/process_datasets/main.nf \
--workspace 53907369739130 \
--compute-env 6TeIFgV5OY4pJCk8I0bfOh \
--params-file /tmp/params.yaml \
--entry-name auto \
--config common/nextflow_helpers/labels_tw.config \
--labels task_dimensionality_reduction,process_datasets
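One subtlety in the params file written above: the `settings` field is a JSON string embedded inside YAML, not a nested YAML mapping. A minimal sketch of how to sanity-check the file before launching (the two-step decode is the point; the variable names are illustrative):

```python
import json
import yaml

# Same shape as the /tmp/params.yaml heredoc in resources.sh
params_text = """\
input_states: s3://openproblems-data/resources/datasets/**/state.yaml
rename_keys: 'input:output_dataset'
output_state: '$id/state.yaml'
settings: '{"output_dataset": "$id/dataset.h5ad", "output_solution": "$id/solution.h5ad"}'
publish_dir: s3://openproblems-data/resources/task_dimensionality_reduction/datasets/
"""

params = yaml.safe_load(params_text)

# `settings` arrives as a plain string; decode the embedded JSON separately
settings = json.loads(params["settings"])

print(settings["output_dataset"])   # $id/dataset.h5ad
print(settings["output_solution"])  # $id/solution.h5ad
```

If the JSON inside `settings` is malformed (a stray quote, a trailing comma), `tw launch` may only surface the error deep in the workflow run, so checking it locally first is cheap insurance.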
44 changes: 44 additions & 0 deletions scripts/create_resources/test_resources.sh
@@ -0,0 +1,44 @@
#!/bin/bash

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"

set -e

RAW_DATA=resources_test/common
DATASET_DIR=resources_test/task_dimensionality_reduction

mkdir -p $DATASET_DIR

# process dataset
echo Running process_dataset
viash run src/data_processors/process_dataset/config.vsh.yaml -- \
--input $RAW_DATA/cxg_mouse_pancreas_atlas/dataset.h5ad \
--output_dataset $DATASET_DIR/cxg_mouse_pancreas_atlas/dataset.h5ad \
--output_solution $DATASET_DIR/cxg_mouse_pancreas_atlas/solution.h5ad

# run one method
viash run src/methods/pca/config.vsh.yaml -- \
--input $DATASET_DIR/cxg_mouse_pancreas_atlas/dataset.h5ad \
--output $DATASET_DIR/cxg_mouse_pancreas_atlas/embedding.h5ad

# run one metric
viash run src/metrics/clustering_performance/config.vsh.yaml -- \
--input_embedding $DATASET_DIR/cxg_mouse_pancreas_atlas/embedding.h5ad \
--input_solution $DATASET_DIR/cxg_mouse_pancreas_atlas/solution.h5ad \
--output $DATASET_DIR/cxg_mouse_pancreas_atlas/score.h5ad

cat > $DATASET_DIR/cxg_mouse_pancreas_atlas/state.yaml << HERE
id: cxg_mouse_pancreas_atlas
output_dataset: !file dataset.h5ad
output_solution: !file solution.h5ad
HERE

# only run this if you have access to the openproblems-data bucket
aws s3 sync --profile op \
"resources_test/task_dimensionality_reduction" \
s3://openproblems-data/resources_test/task_dimensionality_reduction \
--delete --dryrun
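The `state.yaml` written by `test_resources.sh` uses `!file` tags, which standard YAML loaders reject as unknown. A minimal sketch of how such a file could be read with PyYAML, treating `!file` scalars as relative path strings (the loader name and the resolve-against-state-file convention are assumptions, not part of this repo):

```python
import yaml

# Custom loader that accepts the "!file" tag used in state.yaml.
# Assumption: a "!file" scalar is a path relative to the state.yaml location.
class StateLoader(yaml.SafeLoader):
    pass

def _file_constructor(loader, node):
    # Keep the scalar as a plain string; callers resolve it to a full path
    return loader.construct_scalar(node)

StateLoader.add_constructor("!file", _file_constructor)

# Same content as the state.yaml heredoc above
state_text = """\
id: cxg_mouse_pancreas_atlas
output_dataset: !file dataset.h5ad
output_solution: !file solution.h5ad
"""

state = yaml.load(state_text, Loader=StateLoader)
print(state["id"])              # cxg_mouse_pancreas_atlas
print(state["output_dataset"])  # dataset.h5ad
```

Subclassing `SafeLoader` keeps the safe-loading guarantees while registering the one extra tag, rather than falling back to the unrestricted `yaml.Loader`.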
38 changes: 0 additions & 38 deletions scripts/create_test_resources.sh

This file was deleted.

9 changes: 0 additions & 9 deletions scripts/download_resources.sh

This file was deleted.

6 changes: 6 additions & 0 deletions scripts/project/build_all_components.sh
@@ -0,0 +1,6 @@
#!/bin/bash

set -e

# Build all components in a namespace (refer https://viash.io/reference/cli/ns_build.html)
viash ns build --parallel
7 changes: 7 additions & 0 deletions scripts/project/build_all_docker_containers.sh
@@ -0,0 +1,7 @@
#!/bin/bash

set -e

# Build all components in a namespace (refer https://viash.io/reference/cli/ns_build.html)
# and set up the container via a cached build
viash ns build --parallel --setup cachedbuild
@@ -1,4 +1,6 @@
#!/bin/bash

set -e

# Test all components in a namespace (refer https://viash.io/reference/cli/ns_test.html)
viash ns test --parallel
23 changes: 0 additions & 23 deletions scripts/run_benchmark.sh

This file was deleted.

47 changes: 47 additions & 0 deletions scripts/run_benchmark/run_full_local.sh
@@ -0,0 +1,47 @@
#!/bin/bash

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"

# NOTE: depending on the datasets and components, you may need to launch this workflow
# on a different compute platform (e.g. an HPC, AWS Cloud, Azure Cloud, Google Cloud).
# Please refer to the Nextflow documentation for more details:
# https://www.nextflow.io/docs/latest/

# remove this when you have implemented the script
# echo "TODO: once the 'run_benchmark' workflow has been implemented, update this script to use it."
# echo " Step 1: replace 'task_template' with the name of the task in the following command."
# echo " Step 2: replace the rename keys parameters to fit your run_benchmark inputs"
# echo " Step 3: replace the settings parameter to fit your run_benchmark outputs"
# echo " Step 4: remove this message"
# exit 1

set -e

echo "Running benchmark on test data"
echo " Make sure to run 'scripts/project/build_all_docker_containers.sh'!"

# generate a unique id
RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)"
publish_dir="resources/results/${RUN_ID}"

# write the parameters to file
cat > /tmp/params.yaml << HERE
input_states: resources/datasets/**/state.yaml
rename_keys: 'input_dataset:output_dataset;input_solution:output_solution'
output_state: "state.yaml"
publish_dir: "$publish_dir"
HERE

# run the benchmark
nextflow run openproblems-bio/task_dimensionality_reduction \
  -revision build/main \
-main-script target/nextflow/workflows/run_benchmark/main.nf \
-profile docker \
-resume \
-entry auto \
-c common/nextflow_helpers/labels_ci.config \
-params-file /tmp/params.yaml