Merge pull request #98 from ENCODE-DCC/PIP-1432_auto_write_metadata
Pip 1432 auto write metadata
leepc12 authored Nov 4, 2020
2 parents 5799d0d + e0eba61 commit 18b1f27
Showing 12 changed files with 72 additions and 119 deletions.
2 changes: 1 addition & 1 deletion DETAILS.md
@@ -155,7 +155,7 @@ We highly recommend to use a default configuration file described in the section
--file-db, -d|File-based metadata DB for Cromwell's built-in HyperSQL database (UNSTABLE)
--db-timeout|Milliseconds to wait for DB connection (default: 30000)
--java-heap-server|Java heap memory for caper server (default: 10G)
--disable-auto-update-metadata| Disable auto update/retrieval/writing of `metadata.json` on workflow's output directory.
--disable-auto-write-metadata| Disable auto update/retrieval/writing of `metadata.json` on workflow's output directory.
--java-heap-run|Java heap memory for caper run (default: 3G)
--show-subworkflow|Include subworkflow in `caper list` search query. **WARNING**: If there are too many subworkflows, then you will see an HTTP 503 error (service unavailable) or the Caper/Cromwell server can crash.

66 changes: 0 additions & 66 deletions README.md
@@ -1,71 +1,5 @@
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![CircleCI](https://circleci.com/gh/ENCODE-DCC/caper.svg?style=svg)](https://circleci.com/gh/ENCODE-DCC/caper)

# Major changes for Caper 1.0.

If you are upgrading Caper from previous versions:
- Edit your `~/.caper/default.conf` to remove `cromwell=` and `womtool=` from it; Caper will then automatically download Cromwell/Womtool version 51, which supports the new Google Cloud Life Sciences API (v2beta). You can also use `caper init [YOUR_BACKEND]` to locally install the Cromwell/Womtool JARs.

> **CRITICAL**: Due to the Cromwell upgrade in Caper 1.0 (`47` to `51`), a metadata database (`--db`) generated before 1.0 will not work with >= 1.0. See details below.
Upgraded Cromwell from 47 to 51.
- Metadata DB generated with Caper<1.0 will not work with Caper>=1.0.
- See [this note](https://github.com/broadinstitute/cromwell/releases/tag/49) for DB migration instructions.
- We recommend using Cromwell-51 with Caper>=1.0 since it has been fully tested with Cromwell-51.

Changed hashing strategy for all local backends (`local`, `slurm`, `sge`, `pbs`).
- Default hashing strategy changed from `file` (based on md5sum, which is expensive) to `path+modtime`.
- Changing the hashing strategy while reusing the same metadata DB will result in cache misses.

Changed duplication strategy for all local backends (`local`, `slurm`, `sge`, `pbs`).
- Default file duplication strategy changed from `hard-link` to `soft-link`.
- This is for filesystems (e.g. beeGFS) that do not allow hard-linking.
- Caper<1.0 hard-linked input files even with `--soft-glob-output`.
- For Caper>=1.0, you still need to use `--soft-glob-output` for such filesystems.

Google Cloud Platform backend (`gcp`):
- Can use a service account instead of application default credentials (end user's auth).
- Added `--gcp-service-account-key-json`.
- Make sure that the service account has sufficient permissions (roles) on resources in the Google Cloud Platform project (`--gcp-prj`). See [details](docs/conf_gcp.md#how-to-run-caper-with-a-service-account).
- Can use Google Cloud Life Sciences API (v2beta) instead of the deprecated Google Cloud Genomics API (v2alpha1).
- Added `--use-google-cloud-life-sciences`.
- For `caper server/run`, you need to specify a region `--gcp-region` to use Life Sciences API. Check [supported regions](https://cloud.google.com/life-sciences/docs/concepts/locations). `--gcp-zones` will be ignored.
- Make sure to enable `Google Cloud Life Sciences API` on Google Cloud Platform console (APIs & Services -> `+` button on top).
- Also, if you use a service account, add the `Life Sciences Admin` role to it.
- We will deprecate old `Genomics API` support. `Life Sciences API` will become the new default after the next 2-3 releases.
- Added [`memory-retry`](https://cromwell.readthedocs.io/en/stable/backends/Google/) to Caper. This is for `gcp` backend only.
- Retries (controlled by `--max-retries`) on an instance with increased memory if a workflow fails due to an OOM (out-of-memory) error.
- Comma-separated keys to catch OOM errors: `--gcp-prj-memory-retry-error-keys`.
- Memory multiplier applied on each retry due to OOM: `--gcp-prj-memory-retry-multiplier`.

Changed parameter names (backward compatible):
- `--out-dir` -> `--local-out-dir`
- `--out-gcs-bucket` -> `--gcp-out-dir`
- `--out-s3-bucket` -> `--aws-out-dir`
- `--tmp-dir` -> `--local-loc-dir`
- `--tmp-gcs-bucket` -> `--gcp-loc-dir`
- `--tmp-s3-bucket` -> `--aws-loc-dir`

Added parameters
- `--use-google-cloud-life-sciences` and `--gcp-region`: Use Life Sciences API (Cromwell's v2beta scheme).
- `--gcp-service-account-key-json`: Use a service account for auth on GCP (instead of application default).
- `--gcp-prj-memory-retry-error-keys`: Comma-separated keys to catch OOM errors on GCP.
- `--gcp-prj-memory-retry-multiplier`: Memory multiplier applied on each retry due to an OOM error on GCP.
- `--cromwell-stdout`: Redirect Cromwell STDOUT to file.

Improved Python interface.
- Caper<1.0 was designed primarily for the CLI.
- Caper>=1.0 is designed Python-interface-first; the CLI is built on top of that interface.
- Can retrieve `metadata.json` with subworkflows' metadata JSON embedded.

Better logging and troubleshooting.
- Writes Cromwell STDOUT to `cromwell.out` by default (controlled by `--cromwell-stdout`).


> **IMPORTANT**: `--use-gsutil-for-s3` requires `gsutil` >= 4.47 installed on your system. This flag allows a direct transfer between `gs://` and `s3://`. See this [issue](https://github.com/GoogleCloudPlatform/gsutil/issues/935) for details. `gsutil` is based on Python 2.
```bash
$ pip install gsutil --upgrade
```

# Caper

Caper (Cromwell Assisted Pipeline ExecutoR) is a wrapper Python package for [Cromwell](https://github.com/broadinstitute/cromwell/).
2 changes: 1 addition & 1 deletion caper/__init__.py
@@ -2,4 +2,4 @@
from .caper_runner import CaperRunner

__all__ = ['CaperClient', 'CaperClientSubmit', 'CaperRunner']
__version__ = '1.4.1'
__version__ = '1.4.2'
7 changes: 7 additions & 0 deletions caper/backward_compatibility.py
@@ -10,3 +10,10 @@
'tmp_s3_bucket': 'aws_loc_dir',
'ip': 'hostname',
}

CAPER_1_4_2_PARAM_KEY_NAME_CHANGE = {'auto_update_metadata': 'auto_write_metadata'}

PARAM_KEY_NAME_CHANGE = {
**CAPER_1_0_0_PARAM_KEY_NAME_CHANGE,
**CAPER_1_4_2_PARAM_KEY_NAME_CHANGE,
}
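
The merged `PARAM_KEY_NAME_CHANGE` map above is what keeps configuration files written for older Caper releases working after the rename. A minimal sketch of the effect of such a key map on a legacy conf dict — `remap_conf_keys` is a hypothetical helper for illustration, not Caper's actual `update_parsers_defaults_with_conf`:

```python
# Illustrative sketch only: shows what remapping legacy conf keys amounts to.
# The key pairs below come from backward_compatibility.py; the helper and the
# sample conf values are made up.
PARAM_KEY_NAME_CHANGE = {
    'ip': 'hostname',                               # Caper 1.0.0 rename (one of several)
    'auto_update_metadata': 'auto_write_metadata',  # Caper 1.4.2 rename
}

def remap_conf_keys(conf, key_map):
    """Return a copy of conf with any old-style keys renamed to new ones."""
    return {key_map.get(key, key): value for key, value in conf.items()}

old_conf = {'ip': 'localhost', 'auto_update_metadata': False}
print(remap_conf_keys(old_conf, PARAM_KEY_NAME_CHANGE))
# -> {'hostname': 'localhost', 'auto_write_metadata': False}
```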
8 changes: 3 additions & 5 deletions caper/caper_args.py
@@ -5,7 +5,7 @@
from autouri import URIBase

from .arg_tool import update_parsers_defaults_with_conf
from .backward_compatibility import CAPER_1_0_0_PARAM_KEY_NAME_CHANGE
from .backward_compatibility import PARAM_KEY_NAME_CHANGE
from .caper_workflow_opts import CaperWorkflowOpts
from .cromwell import Cromwell
from .cromwell_backend import (
@@ -533,7 +533,7 @@ def get_parser_and_defaults(conf_file=None):
help='Cromwell Java heap size for "server" mode (java -Xmx)',
)
parent_server.add_argument(
'--disable-auto-update-metadata',
'--disable-auto-write-metadata',
action='store_true',
help='Disable automatic retrieval/update/writing of metadata.json upon workflow/task status change.',
)
@@ -859,9 +859,7 @@ def get_parser_and_defaults(conf_file=None):
]
if os.path.exists(conf_file):
conf_dict = update_parsers_defaults_with_conf(
parsers=subparsers,
conf_file=conf_file,
conf_key_map=CAPER_1_0_0_PARAM_KEY_NAME_CHANGE,
parsers=subparsers, conf_file=conf_file, conf_key_map=PARAM_KEY_NAME_CHANGE
)
else:
conf_dict = None
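
For context, the renamed option stays a plain `store_true` flag; `cli.py` (below) then inverts it into the positive `auto_write_metadata` keyword. A self-contained sketch of that pattern, assuming nothing about the rest of Caper's parser setup:

```python
import argparse

# Minimal sketch: only the option name, action and the negation mirror the
# diff; the surrounding parser is made up for illustration.
parser = argparse.ArgumentParser(prog='caper server')
parser.add_argument(
    '--disable-auto-write-metadata',
    action='store_true',
    help='Disable automatic retrieval/update/writing of metadata.json '
    'upon workflow/task status change.',
)

args = parser.parse_args(['--disable-auto-write-metadata'])
# cli.py forwards the flag as 'auto_write_metadata': not args.disable_auto_write_metadata
auto_write_metadata = not args.disable_auto_write_metadata
print(auto_write_metadata)  # False
```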
6 changes: 3 additions & 3 deletions caper/caper_runner.py
@@ -451,7 +451,7 @@ def server(
fileobj_stdout=None,
embed_subworkflow=False,
java_heap_server=Cromwell.DEFAULT_JAVA_HEAP_CROMWELL_SERVER,
auto_update_metadata=True,
auto_write_metadata=True,
work_dir=None,
dry_run=False,
):
@@ -486,7 +486,7 @@
This is to mimic behavior of Cromwell run mode's -m parameter.
java_heap_server:
Java heap (java -Xmx) for Cromwell server mode.
auto_update_metadata:
auto_write_metadata:
Automatic retrieval/writing of metadata.json upon workflow/task's status change.
work_dir:
Local temporary directory to store all temporary files.
@@ -518,7 +518,7 @@
fileobj_stdout=fileobj_stdout,
embed_subworkflow=embed_subworkflow,
java_heap_cromwell_server=java_heap_server,
auto_update_metadata=auto_update_metadata,
auto_write_metadata=auto_write_metadata,
dry_run=dry_run,
)
return th
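
Since the Python interface is first-class, the same switch is available as a keyword argument on `CaperRunner.server()`. A hedged sketch — `CaperRunner`'s constructor arguments are not part of this diff, so `runner` is assumed to be an already-configured instance:

```python
from caper import CaperRunner  # exported in caper/__init__.py


def start_server_without_metadata_writing(runner: CaperRunner):
    """Start a Caper server with automatic metadata.json writing turned off.

    auto_write_metadata defaults to True (see the signature above); passing
    False mirrors the --disable-auto-write-metadata CLI flag. The return
    value is whatever server() returns (`th` in the diff above).
    """
    return runner.server(auto_write_metadata=False)
```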
2 changes: 1 addition & 1 deletion caper/cli.py
@@ -317,7 +317,7 @@ def subcmd_server(caper_runner, args, nonblocking=False):
'server_heartbeat': sh,
'custom_backend_conf': get_abspath(args.backend_file),
'embed_subworkflow': True,
'auto_update_metadata': not args.disable_auto_update_metadata,
'auto_write_metadata': not args.disable_auto_write_metadata,
'java_heap_server': args.java_heap_server,
'dry_run': args.dry_run,
}
6 changes: 3 additions & 3 deletions caper/cromwell.py
@@ -326,7 +326,7 @@ def server(
fileobj_stdout=None,
embed_subworkflow=False,
java_heap_cromwell_server=DEFAULT_JAVA_HEAP_CROMWELL_SERVER,
auto_update_metadata=True,
auto_write_metadata=True,
on_server_start=None,
on_status_change=None,
cwd=None,
@@ -365,7 +365,7 @@ def server(
This is to mimic behavior of Cromwell run mode's -m parameter.
java_heap_cromwell_server:
Java heap (java -Xmx) for Cromwell server mode.
auto_update_metadata:
auto_write_metadata:
Automatic retrieval/writing of metadata.json upon workflow/task's status change.
on_server_start:
On server start.
@@ -429,7 +429,7 @@
server_port=server_port,
is_server=True,
embed_subworkflow=embed_subworkflow,
auto_update_metadata=auto_update_metadata,
auto_write_metadata=auto_write_metadata,
on_server_start=on_server_start,
on_status_change=on_status_change,
)
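
What `auto_write_metadata` ultimately controls — per the docstring above — is writing `metadata.json` whenever a workflow or task changes status. A purely hypothetical illustration of that step (Caper's real hook signature and metadata retrieval are not part of this diff):

```python
import json
from pathlib import Path


def write_metadata_on_status_change(metadata: dict, output_dir: str) -> Path:
    """Hypothetical: dump a workflow's metadata dict to <output_dir>/metadata.json.

    `metadata` is assumed to be the dict served by Cromwell's metadata
    endpoint and `output_dir` the workflow's output root; both names are
    assumptions for this sketch.
    """
    metadata_file = Path(output_dir) / 'metadata.json'
    metadata_file.parent.mkdir(parents=True, exist_ok=True)
    metadata_file.write_text(json.dumps(metadata, indent=4))
    return metadata_file
```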