Releases: ENCODE-DCC/caper

v0.8.0

31 Mar 17:42
533b060

Parameters

Deprecated parameters:

  • --use-netrc: Autouri now uses ~/.netrc by default.
  • --http-user and --http-password: Use ~/.netrc to access private URLs.
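
For reference, a minimal ~/.netrc entry uses the standard netrc format (the hostname and credentials below are placeholders; keep the file private, e.g. chmod 600 ~/.netrc):

    machine storage.example.com
    login my-username
    password my-password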

Change of parameters:

  • --use-gsutil-over-aws-s3 -> --use-gsutil-for-s3: Autouri uses the gsutil CLI only for direct transfers between S3 and GCS buckets. Otherwise, it always uses Python libraries like google-cloud-storage and boto3.

Added parameters:

  • --debug and --verbose: For better logging.

New features

Localization and preventing repetitive file transfer

  • When the new localization module makes a copy of a source file in a destination cache directory, it compares the md5 hashes of source and destination if a file already exists at the destination. All bucket URIs (s3://, gs://) and most URLs provide md5 hash information in their headers. If the md5 hashes match, Caper skips the unnecessary file transfer. For local paths, Caper calculates the md5 hash and stores the md5 hash string in a .md5 file, since md5 hash calculation is expensive. This happens only when Caper writes to local storage (i.e. when localizing files in a local cache). A .md5 file is not valid if its modification time (mtime) is older than that of the file itself.
  • If the md5 hash comparison fails, Caper compares file sizes and mtimes instead. If the file sizes match and the destination's mtime is newer, Caper skips the file transfer. See the sketch below.
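
A simplified, local-path-only sketch of this decision (hypothetical helper names, not Autouri's actual API; for bucket URIs and URLs the md5 would come from object headers instead):

    import hashlib
    import os

    def md5_of_local_file(path):
        # Trust an existing .md5 sidecar only if it is at least as new as the file.
        md5_file = path + '.md5'
        if os.path.exists(md5_file) \
                and os.path.getmtime(md5_file) >= os.path.getmtime(path):
            with open(md5_file) as fp:
                return fp.read().strip()
        h = hashlib.md5()
        with open(path, 'rb') as fp:
            for chunk in iter(lambda: fp.read(1 << 20), b''):
                h.update(chunk)
        md5 = h.hexdigest()
        # Cache the expensive calculation in a .md5 sidecar (local storage only).
        with open(md5_file, 'w') as fp:
            fp.write(md5)
        return md5

    def should_skip_transfer(src, dst):
        # Skip the copy if the destination already exists and matches the source.
        if not os.path.exists(dst):
            return False
        try:
            return md5_of_local_file(src) == md5_of_local_file(dst)
        except OSError:
            # md5 comparison failed: fall back to size + mtime.
            return os.path.getsize(src) == os.path.getsize(dst) \
                and os.path.getmtime(dst) > os.path.getmtime(src)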

File locking

  • Caper uses stable file locking, tested with up to 50 threads (local storage) or 10 threads (cloud URIs) competing to write to the same file.
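
Purely as an illustration of the pattern (Python's third-party filelock package as a stand-in; this is not Caper's actual implementation):

    from multiprocessing.pool import ThreadPool

    from filelock import FileLock  # pip install filelock

    TARGET = 'shared.txt'

    def append_line(i):
        # Every writer must hold the lock before touching the shared file.
        with FileLock(TARGET + '.lock', timeout=60):
            with open(TARGET, 'a') as fp:
                fp.write('thread %d\n' % i)

    # 50 local threads competing to write to the same file, as in the tests.
    with ThreadPool(50) as pool:
        pool.map(append_line, range(50))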

Automatic subworkflow zipping

  • Fixed bugs in the old auto-zipping module.
  • Caper can automatically zip subworkflow WDLs imported in the main WDL. A zip file can also be manually defined with the command-line argument --imports; Caper will skip auto-zipping if --imports is defined. See the sketch after this list.
  • Enabled for caper submit only, i.e. caper run does not use automatic subworkflow zipping since all sub-WDLs are assumed to be already localized for caper run.
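
A simplified sketch of the auto-zipping idea (hypothetical helper; the real module also needs to handle nested directories and URL imports):

    import os
    import re
    import zipfile

    IMPORT_RE = re.compile(r'^\s*import\s+"(.+?)"', flags=re.MULTILINE)

    def zip_subworkflows(main_wdl, zip_file):
        # Collect WDLs imported directly or transitively by the main WDL.
        root = os.path.dirname(os.path.abspath(main_wdl))
        seen, stack = set(), [os.path.abspath(main_wdl)]
        while stack:
            wdl = stack.pop()
            with open(wdl) as fp:
                contents = fp.read()
            for rel in IMPORT_RE.findall(contents):
                sub = os.path.abspath(os.path.join(os.path.dirname(wdl), rel))
                if sub not in seen:
                    seen.add(sub)
                    stack.append(sub)
        # Zip them with paths kept relative to the main WDL's directory.
        with zipfile.ZipFile(zip_file, 'w') as zf:
            for sub in sorted(seen):
                zf.write(sub, arcname=os.path.relpath(sub, root))
        return zip_file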

Womtool validation

  • If --imports is defined or there is an auto-zipped subworkflow zip, Caper creates a temporary directory, puts the main WDL there and unpacks the zip file there. Caper then runs Womtool to validate those WDLs, as sketched below.
  • You can still skip Womtool validation with --ignore-womtool.
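
Roughly, the validation step does the following (a sketch; the womtool path and file names are placeholders):

    import shutil
    import subprocess
    import tempfile
    import zipfile

    def validate_wdl(wdl, imports_zip=None, womtool_jar='womtool.jar'):
        # Work in a temporary directory so unpacked imports sit next to
        # a copy of the main WDL, where Womtool can resolve them.
        with tempfile.TemporaryDirectory() as tmp_d:
            main_wdl = shutil.copy(wdl, tmp_d)
            if imports_zip:
                with zipfile.ZipFile(imports_zip) as zf:
                    zf.extractall(tmp_d)
            subprocess.run(
                ['java', '-jar', womtool_jar, 'validate', main_wdl],
                check=True)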

v0.7.0

10 Mar 00:39
f4a56d0

New features

  • caper init downloads Cromwell/Womtool JARs and adds them to Caper's default conf file ~/.caper/default.conf (or whatever is defined with caper -c) so that Caper can work completely offline once those JARs are installed.
  • Caper used to make a copy of outputs for every re-run workflow (task) on GCP. Added --gcp-call-caching-dup-strat to control this behavior; it now defaults back to reference instead of copy. Define --gcp-call-caching-dup-strat copy to keep making copies for re-run (call-cached) tasks.
  • Caper can soft-link globbed outputs instead of hard-linking them. This is useful on file systems where hard-linking is not allowed (e.g. BeeGFS). Added a flag --soft-glob-output for local backends (local, slurm, sge and pbs). This flag cannot be used with docker (--docker) or docker-based backends (gcp and aws).

Documentation

  • Heartbeat file and how to run multiple caper server instances on a single machine.
  • How to configure Caper for a custom backend.
  • Important notes for storage choices on Sherlock cluster.

Bug fixes

  • metadata.json in the output directory/bucket is now updated correctly both while a workflow is running and after it is done.
  • caper list sent too many requests to get the labels of all workflows. Now it sends a single query to retrieve all workflow information.

v0.6.4

12 Feb 00:40

Improved job submission for SLURM backend (Sherlock, SCG, ...)

  • Fix for the following submission error when the server is busy. Caper now retries sbatch up to 3 times.
    sbatch: error: Batch job submission failed: Socket timed out on send/recv operation
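
Conceptually, the retry looks like this (a hedged sketch, not Caper's actual code):

    import subprocess
    import time

    def sbatch_with_retries(sbatch_args, max_tries=3):
        # Retry sbatch on transient submission failures such as socket timeouts.
        for i in range(max_tries):
            try:
                return subprocess.run(
                    ['sbatch'] + sbatch_args, check=True, capture_output=True)
            except subprocess.CalledProcessError:
                if i == max_tries - 1:
                    raise
                time.sleep(10)  # back off while the scheduler is busy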
    

Added warning for Stanford Sherlock platform (SLURM backend)

  • Do not install Caper, Conda or any executables on $OAK or $SCRATCH. Install them on $HOME or $PI_HOME.

Bug fixes

  • Fix for w['submission'] error.

v0.6.3

21 Dec 16:50

Added a warning for the parameter tmp-dir

Change in default parameters

  • Increased the default java-heap-run from 2G to 3G.

Bug fixes

  • Check for the presence of a metadata.json file when troubleshooting.
  • Fixed the submission = w['submission'] error for caper list.

v0.6.2

09 Dec 19:56

Bug fixes

  • Remove leading/trailing quotes (" and ') from values when reading the conf file (e.g. ~/.caper/default.conf). Users can now use quoted strings in a conf file.
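
The fix amounts to something like this (a sketch, not the exact code):

    def strip_quotes(value):
        # Remove one pair of matching leading/trailing quotes from a conf value.
        value = value.strip()
        if len(value) > 1 and value[0] == value[-1] and value[0] in '"\'':
            return value[1:-1]
        return value

    assert strip_quotes('"~/caper_out"') == '~/caper_out'
    assert strip_quotes("'3G'") == '3G'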

v0.6.1

15 Nov 23:04

Minor update for Croo's new feature (task graph)

Bug fixes

  • Fixed a permission-denied issue with the MySQL shell script for docker.

Updated documentation

  • MySQL docker

v0.6.0

07 Nov 03:15
f01fc90

IMPORTANT: Caper defaults back to NOT using a file-based metadata DB, which means no call-caching (re-using outputs from previous workflows) by default.

IMPORTANT: Even if you still want to use a file-based DB (--db file and --file-db [DB_PATH]), a metadata DB generated by Caper<0.6 (with Cromwell-42) is not compatible with one generated by Caper>=0.6 (with Cromwell-47). Refer to this doc for such a migration.

See this for details about the metadata DB. Define a DB type with db= in your conf file ~/.caper/default.conf to use a metadata DB.

Engine update

  • Upgraded default Cromwell JAR version: 42 -> 47.
    • Some features of Caper only work with 47 (e.g. PostgreSQL support, some bug fixes).

How to choose a DB?

  • You can choose a DB type with --db (or db= in the conf file ~/.caper/default.conf), then define the chosen DB's required parameters (nothing is required for the in-memory DB).
    • Choices: file (unstable), mysql (recommended), postgresql and in-memory (the new default, but no call-caching).
  • mysql is recommended. We provide shell scripts (run_mysql_server_docker.sh and run_mysql_server_singularity.sh) to run a MySQL server with docker/singularity (without root). See the example conf sketched after this list.
  • See details in the "Metadata database" section of the README.
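
For example, a MySQL setup in ~/.caper/default.conf could look like the sketch below (assuming conf keys mirror the CLI flags without the leading dashes, as db= does; all values are placeholders):

    db=mysql
    mysql-db-ip=localhost
    mysql-db-port=3306
    mysql-db-user=cromwell
    mysql-db-password=cromwell
    mysql-db-name=cromwell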

New features

  • Support for PostgreSQL DB for call-caching (Cromwell >= 43)

Change of parameters

  • New:
    • --db: in-memory (default), file (unstable), mysql (recommended) or postgresql (experimental).
    • --mysql-db-name: (optional) cromwell by default
    • --postgresql-db-ip: localhost by default
    • --postgresql-db-port: 5432 by default
    • --postgresql-db-user: (optional) cromwell by default
    • --postgresql-db-password: (optional) cromwell by default
    • --postgresql-db-name: (optional) cromwell by default
  • Deprecated:
    • --no-file-db: The file DB is now disabled by default, since many users reported that it is unstable.

Bug fixes

  • PAPI error code 10 (preemption error) on Google Cloud.
  • Caper didn't run WDLs without --docker on cloud backends (aws and gcp).
    • Some WDLs define a docker image in each task (runtime { docker : }), so users had to specify a dummy docker image (--docker ubuntu:latest) to bypass this error.

v0.5.6

06 Nov 01:18

Bug fix

  • Cloud backends were not working due to failed localization of an input JSON file.

v0.5.5

05 Nov 23:15

Bug fixes

  • Womtool validation failed on remote cloud backends (gcp and aws).

v0.5.4

02 Nov 21:42
641db11

Validation for WDL/input JSON

  • Added Womtool validation for WDL and input JSON files.
  • Useful for finding missing/wrong parameters in an input JSON.

New parameters

  • --womtool: Womtool JAR location (URL or path). Version 42 by default.
  • --ignore-womtool: Flag. Skip Womtool validation.

Added dict_tool.py (for the new tool qc2tsv)

  • Useful dict functions:
    • merge_dict(a, b): Merge dict b into dict a.
    • split_dict(): Split a dict into multiple dicts according to a given "split_rule" (regex).
    • flatten_dict()/unflatten_dict(): Flatten a dict into one with 1-level tuple keys. Each tuple key keeps the hierarchy of the original dict object (sketched below).
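
A minimal sketch of the flatten/unflatten idea (hypothetical implementation; the actual dict_tool.py may differ):

    def flatten_dict(d, parent_key=()):
        # Flatten a nested dict into a 1-level dict keyed by tuples.
        # The tuple key preserves the hierarchy of the original dict.
        flat = {}
        for k, v in d.items():
            key = parent_key + (k,)
            if isinstance(v, dict):
                flat.update(flatten_dict(v, key))
            else:
                flat[key] = v
        return flat

    def unflatten_dict(flat):
        # Inverse of flatten_dict: rebuild the nested dict from tuple keys.
        d = {}
        for key, v in flat.items():
            node = d
            for k in key[:-1]:
                node = node.setdefault(k, {})
            node[key[-1]] = v
        return d

    d = {'qc': {'align': {'mapped': 100}}, 'sample': 'rep1'}
    assert unflatten_dict(flatten_dict(d)) == d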