Releases: ENCODE-DCC/caper
v0.8.0
Parameters
Deprecated parameters:
- `--use-netrc`: Autouri defaults to using `~/.netrc`.
- `--http-user` and `--http-password`: Use `~/.netrc` to access private URLs.
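For reference, a `~/.netrc` entry follows the standard netrc layout (the host and credentials below are placeholders):

```
machine www.encodeproject.org
login MY_USERNAME
password MY_PASSWORD
```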
Change of parameters:
- `--use-gsutil-over-aws-s3` -> `--use-gsutil-for-s3`: Autouri uses the `gsutil` CLI only for direct transfer between S3 and GCS buckets. Otherwise, it always uses Python libraries like `google-cloud-storage` and `boto3`.
Added parameters:
- `--debug` and `--verbose`: For better logging.
New features
Localization and preventing repetitive file transfer
- When the new localization module makes a copy of a source file in the destination cache directory, it compares the md5 hashes of source and destination if a file already exists at the destination. All bucket URIs (`s3://`, `gs://`) and most URLs provide md5 hash information in their headers. If the md5 hashes match, Caper skips the unnecessary file transfer. For local paths, Caper calculates the md5 hash and stores the hash string in a `.md5` file, since md5 calculation is expensive. This happens only when Caper writes to local storage (i.e. when localizing files on a local cache). A `.md5` file is not valid if its modification time (mtime) is older than that of the file itself.
- If md5 comparison fails, Caper compares file sizes and mtimes instead. If the file sizes match and the destination's mtime is newer, Caper skips the file transfer.
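A minimal Python sketch of this skip rule (illustrative only, not Caper's actual code; all names are made up):

```python
import os

def cached_md5(path):
    """Return the md5 string cached in path + '.md5', or None if the
    cache file is missing or older than the file itself (i.e. stale)."""
    md5_file = path + '.md5'
    if os.path.exists(md5_file) \
            and os.path.getmtime(md5_file) >= os.path.getmtime(path):
        with open(md5_file) as fp:
            return fp.read().strip()
    return None

def should_skip_transfer(src_md5, src_size, src_mtime,
                         dst_md5, dst_size, dst_mtime):
    """Skip if md5 hashes match; otherwise fall back to size + mtime."""
    if src_md5 and dst_md5:
        return src_md5 == dst_md5
    # Fallback: same size and destination newer than source -> skip.
    return src_size == dst_size and dst_mtime > src_mtime
```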
File locking
- Caper uses stable file locking, tested with up to 50 threads (local paths) and 10 threads (cloud URIs) competing to write to the same file.
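The locking itself lives in Autouri; the snippet below only illustrates the tested pattern (many threads competing to write one file) using the third-party `filelock` package, not Caper's own code:

```python
from multiprocessing.pool import ThreadPool

from filelock import FileLock  # third-party: pip install filelock

def append_line(i):
    # Only the thread holding the lock may write at any given moment.
    with FileLock('out.txt.lock'):
        with open('out.txt', 'a') as fp:
            fp.write('thread {}\n'.format(i))

with ThreadPool(50) as pool:  # 50 competing writers, as in the local test
    pool.map(append_line, range(50))
```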
Automatic subworkflow zipping
- Fixed bugs in the old auto-zipping module.
- Caper can automatically zip subworkflow WDLs imported in the main WDL. A zip file can also be manually defined with the command-line argument `--imports`; Caper skips auto-zipping if `--imports` is defined.
- Enabled for `caper submit` only, i.e. `caper run` does not use automatic subworkflow zipping, since all sub-WDLs are assumed to be already localized for `caper run`.
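For example (file names are placeholders):

```
caper submit main.wdl -i input.json                     # sub-WDLs are auto-zipped
caper submit main.wdl -i input.json --imports subs.zip  # --imports skips auto-zipping
```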
Womtool validation
- If `--imports` is defined or there are auto-zipped subworkflow WDLs, Caper creates a temporary directory, puts the main WDL there, unpacks the zip file into it, and then runs Womtool to validate those WDLs.
- You can still skip Womtool validation with `--ignore-womtool`.
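For example, to skip validation (file names are placeholders):

```
caper submit main.wdl -i input.json --ignore-womtool
```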
v0.7.0
New features
- `caper init` downloads Cromwell/Womtool JARs and adds them to Caper's default conf file `~/.caper/default.conf` (or whatever is defined with `caper -c`), so that Caper can work completely offline once those JARs are installed.
- Caper made a copy of outputs for every re-run workflow (task) on GCP. Added `--gcp-call-caching-dup-strat` to control this behavior; it defaults back to `reference` instead of `copy`. Define `--gcp-call-caching-dup-strat copy` to keep making copies for re-run (call-cached) tasks.
- Caper can soft-link globbed outputs instead of hard-linking them. This is useful on file systems where hard-linking is not allowed (e.g. BeeGFS). Added a flag `--soft-glob-output` for local backends (`local`, `slurm`, `sge` and `pbs`). This flag cannot work with docker (with `--docker`) or docker-based backends (`gcp` and `aws`).
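For example, on a file system without hard-link support (file names are placeholders):

```
caper run main.wdl -i input.json --backend slurm --soft-glob-output
```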
Documentation
- Heartbeat file, and how to run multiple `caper server` instances on a single machine.
- How to configure Caper for a custom backend.
- Important notes for storage choices on the Sherlock cluster.
Bug fixes
- `metadata.json` in the output directory/bucket is now updated correctly while a workflow is running and after it is done.
- `caper list` sent too many requests to get labels of all workflows. Now it sends a single query to retrieve all workflow information.
v0.6.4
Improved job submission for SLURM backend (Sherlock, SCG, ...)
- Fix for the following submission error when the server is busy. Caper now retries `sbatch` up to 3 times.
  `sbatch: error: Batch job submission failed: Socket timed out on send/recv operation`
Added warning for Stanford Sherlock platform (SLURM backend)
- Do not install Caper, Conda or any executable on `$OAK` or `$SCRATCH`. Install them on `$HOME` or `$PI_HOME`.
Bug fixes
- Fix for the `w['submission']` error.
v0.6.3
v0.6.2
v0.6.1
v0.6.0
IMPORTANT: Caper defaults back to NOT using a file-based metadata DB, which means no call-caching (re-using outputs from previous workflows) by default.
IMPORTANT: Even if you still want to use a file-based DB (`--db file` and `--file-db [DB_PATH]`), a metadata DB generated by Caper<0.6 (with Cromwell-42) is not compatible with a metadata DB generated by Caper>=0.6 (with Cromwell-47). Refer to this doc for such migration.
See this for details about metadata DB. Define a DB type with `db=` in your conf file `~/.caper/default.conf` to use a metadata DB.
Engine update
- Upgraded default Cromwell JAR version: 42 -> 47.
- Some features of Caper will only work with 47 (e.g. PostgreSQL support, some bug fixes).
Then how to choose a DB?
- You can choose a DB type with `--db` (or `db=` in the conf file `~/.caper/default.conf`). Then define the chosen DB's required parameters (nothing is required for the `in-memory` DB).
- Choices: `file` (unstable), `mysql` (recommended), `postgresql` and `in-memory` (new default, but no call-caching).
- `mysql` is recommended. We provide shell scripts (`run_mysql_server_docker.sh` and `run_mysql_server_singularity.sh`) to run a MySQL server with docker/singularity (without root).
- See details in the "Metadata database" section of the README.
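For example, a conf file set up for the recommended MySQL DB might look like this (key names mirror the CLI flags without leading dashes; values are placeholders):

```
db=mysql
mysql-db-ip=localhost
mysql-db-port=3306
mysql-db-user=cromwell
mysql-db-password=cromwell
```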
New features
- Support for PostgreSQL DB for call-caching (Cromwell >= 43)
Change of parameters
- New:
  - `--db`: `in-memory` (default), `file` (unstable), `mysql` (recommended) or `postgresql` (experimental).
  - `--mysql-db-name`: (optional) `cromwell` by default.
  - `--postgresql-db-ip`: `localhost` by default.
  - `--postgresql-db-port`: `5432` by default.
  - `--postgresql-db-user`: (optional) `cromwell` by default.
  - `--postgresql-db-password`: (optional) `cromwell` by default.
  - `--postgresql-db-name`: (optional) `cromwell` by default.
- Deprecated:
  - `--no-file-db`: File DB is disabled by default. Many users reported that a file DB is unstable.
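For example, a server using the new PostgreSQL parameters (the values shown are the documented defaults):

```
caper server --db postgresql \
  --postgresql-db-ip localhost --postgresql-db-port 5432 \
  --postgresql-db-user cromwell --postgresql-db-password cromwell \
  --postgresql-db-name cromwell
```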
Bug fixes
- PAPI error code 10 (preemption error) on Google Cloud.
- Caper didn't run WDLs without `--docker` on cloud backends (`aws` and `gcp`). Some WDLs have a docker image defined in each task's `runtime { docker: ... }`, and users had to specify a dummy docker image (`--docker ubuntu:latest`) to bypass this error.
v0.5.6
v0.5.5
v0.5.4
Validation for WDL/input JSON
- Added `womtool` validation for WDL and input JSON. Useful to find missing/wrong parameters in the input JSON.
New parameters
- `--womtool`: Womtool JAR location (URL or path). Version 42 by default.
- `--ignore-womtool`: Flag. Ignore Womtool validation.
Added `dict_tool.py` (for the new tool `qc2tsv`)
- Useful dict functions:
  - `merge_dict(a, b)`: Merge dict `b` into dict `a`.
  - `split_dict()`: Split a dict into multiple dicts according to a given "split_rule" (REGEX).
  - `flatten_dict()`/`unflatten_dict()`: Flatten a dict into a 1-level dict with tuple keys. A tuple key keeps the hierarchy of the original dict object in it.
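A sketch of how these helpers behave, based only on the descriptions above; the import path and exact signatures are assumptions, so check `dict_tool.py` itself:

```python
from caper.dict_tool import merge_dict, flatten_dict, unflatten_dict

a = {'qc': {'rep1': {'reads': 100}}}
b = {'qc': {'rep2': {'reads': 200}}}

merge_dict(a, b)  # merges b into a
# a == {'qc': {'rep1': {'reads': 100}, 'rep2': {'reads': 200}}}

flat = flatten_dict(a)
# Tuple keys keep the original hierarchy, e.g.:
# {('qc', 'rep1', 'reads'): 100, ('qc', 'rep2', 'reads'): 200}

assert unflatten_dict(flat) == a  # round-trips back to the nested dict
```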