Releases: ropensci/targets
AWS/crew efficiency, random number safety
targets 1.4.0
Invalidating changes
Because of the changes below, upgrading to this version of `targets` will unavoidably invalidate previously built targets in existing pipelines. Your pipeline code should still work, but any targets you ran before will most likely need to rerun after the upgrade.
- Use SHA512 during the creation of target-specific pseudo-random number generator seeds (#1139). This change decreases the risk of overlapping/correlated random number generator streams. See the "RNG overlap" section of the `tar_seed_create()` help file for details and justification. Unfortunately, this change will invalidate all currently built targets because the seeds will be different. To avoid rerunning your whole pipeline, set `cue = tar_cue(seed = FALSE)` in `tar_target()` (see the sketch after this list).
- For cloud storage: instead of the hash of the local file, use the ETag for AWS S3 targets and the MD5 hash for GCP GCS targets (#1172). Sanitize the value with `targets:::digest_chr64()` in both cases before storing it in the metadata.
- For a cloud target to be truly up to date, the hash in the metadata now needs to match the current object in the bucket, not the version recorded in the metadata (#1172). In other words, `targets` now tries to ensure that the up-to-date data objects in the cloud are in their newest versions. So if you roll back the metadata to an older version, you will still be able to access historical data versions with e.g. `tar_read()`, but the pipeline will no longer be up to date.
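If you want to keep previously built targets after upgrading, here is a minimal sketch of the workaround above; the target name and command are hypothetical placeholders.

```r
# _targets.R
library(targets)

list(
  tar_target(
    name = sampled_data,          # hypothetical target name
    command = rnorm(100),         # hypothetical command
    cue = tar_cue(seed = FALSE)   # do not rerun this target just because its seed changed
  )
)
```

To apply the same cue to every target in the pipeline, `tar_option_set(cue = tar_cue(seed = FALSE))` also works.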
Other changes to seeds
- Add a new exported function `tar_seed_create()` which creates target-specific pseudo-random number generator seeds.
- Add an "RNG overlap" section in the `tar_seed_create()` help file to justify and defend how `targets` and `tarchetypes` approach pseudo-random numbers.
- Add a new function `tar_seed_set()` which sets a seed and sets all the RNG algorithms to their defaults in the user's R installation. Each target now uses `tar_seed_set()` to set its seed before running its R command (#1139).
- Deprecate `tar_seed()` in favor of the new `tar_seed_get()` function.
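A minimal sketch of how the new seed utilities fit together in an interactive session; the target name is a hypothetical placeholder.

```r
library(targets)

# Recreate the seed that targets would assign to a target named "analysis".
seed <- tar_seed_create("analysis")

# Set the session RNG state the same way a target does before running its command.
tar_seed_set(seed)
rnorm(3)

# Inside a running target, tar_seed_get() returns that target's seed.
```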
Other cloud storage improvements
- For all cloud targets, check hashes in batched LIST requests instead of individual HEAD requests (#1172). Dramatically speeds up the process of checking if cloud targets are up to date.
- For AWS S3 targets, `tar_delete()`, `tar_destroy()`, and `tar_prune()` now use efficient batched calls to `delete_objects()` instead of costly individual calls to `delete_object()` (#1171).
- Add a new `verbose` argument to `tar_delete()`, `tar_destroy()`, and `tar_prune()`.
- Add a new `batch_size` argument to `tar_delete()`, `tar_destroy()`, and `tar_prune()`.
- Add new arguments `page_size` and `verbose` to `tar_resources_aws()` (#1172).
- Add a new `tar_unversion()` function to remove version IDs from the metadata of cloud targets. This makes it easier to interact with just the current version of each target, as opposed to the version ID recorded in the local metadata (see the sketch after this list).
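A hedged sketch of the new cleanup arguments and `tar_unversion()`; the target name prefix and batch size are placeholders, and the calls assume an existing pipeline with AWS S3 storage.

```r
library(targets)

# Delete selected cloud targets with batched delete_objects() calls,
# printing progress along the way.
tar_delete(
  names = starts_with("model_"),  # hypothetical target name prefix
  batch_size = 500,               # objects per batched request
  verbose = TRUE
)

# Drop version IDs from the local metadata so the pipeline
# tracks the current object version of each cloud target.
tar_unversion()
```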
Other improvements
- Migrate to the changes in `clustermq` 0.9.0 (@mschubert).
- In progress statuses, change "started" to "dispatched" and change "built" to "completed" (#1192).
- Deprecate `tar_started()` in favor of `tar_dispatched()` (#1192).
- Deprecate `tar_built()` in favor of `tar_completed()` (#1192). See the sketch after this list.
- Console messages from reporters say "dispatched" and "completed" instead of "started" and "built" (#1192).
- The `crew` scheduling algorithm no longer waits on saturated controllers, and targets that are ready are greedily dispatched to `crew` even if all workers are busy (#1182, #1192). To appropriately set expectations for users, reporters print "dispatched (pending)" instead of "dispatched" if the task load is backlogged at the moment.
- In the `crew` scheduling algorithm, waiting for tasks is now a truly event-driven process and consumes 5-10x less CPU resources (#1183). Only the auto-scaling of workers uses polling (with an inexpensive default polling interval of 0.5 seconds, configurable through `seconds_interval` in the controller).
- Simplify stored target tracebacks.
- Print the traceback on error.
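A minimal sketch of the renamed progress helpers, assuming a pipeline has already run in the current project.

```r
library(targets)

# Targets currently running (previously tar_started()).
tar_dispatched()

# Targets that finished successfully (previously tar_built()).
tar_completed()
```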
CRAN patch
targets 1.3.2
- Try to fix function help files for CRAN.
Cloud metadata fixes
targets 1.3.1
- Add `tar_config_projects()` and `tar_config_yaml()` (#1153, @psychelzh).
- Apply error modes to `builder_wait_correct_hash()` in `target_conclude.tar_builder()` (#1154, @gadenbuie).
- Remove duplicated error message from `builder_error_null()`.
- Allow `tar_meta_upload()` and `tar_meta_download()` to avoid errors if one or more metadata files do not exist. Add a new argument `strict` to control error behavior.
- Add new arguments `meta`, `progress`, `process`, and `crew` to control individual metadata files in `tar_meta_upload()`, `tar_meta_download()`, `tar_meta_sync()`, and `tar_meta_delete()` (see the sketch after this list).
- Avoid newly deprecated arguments and functions in `crew` 0.5.0.9003 (https://github.com/wlandau/crew/issues/131).
- Allow `tar_read()` etc. inside a pipeline whenever it uses a different data store (#1158, @MilesMcBain).
- Set `seed = FALSE` in `future::future()` (#1166, @svraka).
- Add a new `physics` argument to `tar_visnetwork()` and `tar_glimpse()` (#925, @Bdblodgett-usgs).
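A hedged sketch of the new metadata-sync arguments described above; the argument values are placeholders, and the call assumes cloud metadata already exists for the project.

```r
library(targets)

# Download only the main metadata and progress files from the cloud,
# and do not error if some of the files are missing from the bucket.
tar_meta_download(
  meta = TRUE,
  progress = TRUE,
  process = FALSE,
  crew = FALSE,
  strict = FALSE
)
```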
Cloud metadata and settings to reduce overhead
targets 1.3.0
Invalidating changes
Because of these changes, upgrading to this version of `targets` will unavoidably invalidate previously built targets in existing pipelines. Your pipeline code should still work, but any targets you ran before will most likely need to rerun after the upgrade.
- In the `hash_deps()` method of the metadata class, exclude symbols which are not actually dependencies, rather than just giving them empty strings. This change decouples the dependency hash from the hash of the target's command (#1108).
Cloud metadata
- Continuously upload metadata files to the cloud during `tar_make()`, `tar_make_clustermq()`, and `tar_make_future()` (#1109). Upload them to the repository specified in the `repository_meta` option of `tar_option_set()`, and use the bucket and prefix set in the `resources` option of `tar_option_set()`. `repository_meta` defaults to the existing `repository` option of `tar_option_set()` (see the sketch after this list).
- Add new functions `tar_meta_download()`, `tar_meta_upload()`, `tar_meta_sync()`, and `tar_meta_delete()` to directly manage cloud metadata outside the pipeline (#1109).
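A hedged sketch of the cloud metadata setup described above; the bucket, prefix, and target are placeholders.

```r
# _targets.R
library(targets)

tar_option_set(
  repository = "aws",        # store target data in S3
  repository_meta = "aws",   # continuously upload metadata too (defaults to repository)
  resources = tar_resources(
    aws = tar_resources_aws(
      bucket = "my-bucket",  # placeholder bucket
      prefix = "my-project"  # placeholder prefix
    )
  )
)

list(
  tar_target(data, rnorm(100))  # placeholder target
)
```

Outside the pipeline, `tar_meta_sync()` reconciles the local and cloud copies of the metadata, and `tar_meta_upload()`, `tar_meta_download()`, and `tar_meta_delete()` manage them individually.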
Other changes
- Fix the solution of #1103 so the copy fallback actually runs (@jds485, #1102, #1103).
- Switch back to `tempdir()` for #1103.
- Move `path_scratch_dir_network()` to `file.path(tempdir(), "targets")` and make sure `tar_destroy("all")` and `tar_destroy("cloud")` delete it.
- Display `tar_mermaid()` subgraphs with transparent fills and black borders.
- Allow `database$get_data()` to work with list columns.
- Disallow functions that access the local data store (including metadata) from inside a target while the pipeline is running (#1055, #1063). The only exception is local file targets, such as the `tarchetypes` literate programming target factories `tar_render()` and `tar_quarto()`.
- In the `hash_deps()` method of the metadata class, use a new custom `sort_chr()` function which temporarily sets the `LC_COLLATE` locale to `"C"` for sorting. This ensures lexicographic comparisons are consistent across platforms (#1108).
- In `tar_source()`, use the `file` argument and `keep.source = TRUE` to help with interactive debugging (#1120).
- Deprecate `seconds_interval` in `tar_config_get()`, `tar_make()`, `tar_make_clustermq()`, and `tar_make_future()`. Replace it with `seconds_meta` (to control how often metadata gets saved) and `seconds_reporter` (to control how often to print messages to the R console) (#1119).
- Respect `seconds_meta` and `seconds_reporter` for writing metadata and console messages even for currently building targets (#1055).
- Retry all cloud REST API calls with HTTP error codes (429, 500-599) using the exponential backoff algorithm from `googleAuthR` (#1112).
- For `format = "url"`, only retry on the HTTP error codes above.
- Make cloud temporary file instances unique in order to avoid file conflicts for the same target.
- Un-deprecate `seconds_interval` and `seconds_timeout` in `tar_resources_url()`, and implement `max_tries` arguments in `tar_resources_aws()` and `tar_resources_gcp()` (#1127).
- Use `file` and `keep.source` in `parse()` in `callr` utilities and Target Markdown.
- Automatically convert the `"file_fast"` format to the `"file"` format for cloud targets.
- In `tar_prune()` and `tar_delete()`, do not try to delete pattern targets which have no cloud storage.
- Add new arguments `seconds_timeout`, `close_connection`, and `s3_force_path_style` to `tar_resources_aws()` to support the analogous arguments in `paws.storage::s3()` (#1134, @snowpong). See the sketch after this list.
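A hedged sketch of the new overhead and AWS connection settings; all values are placeholders, and the `tar_make()` call assumes an existing pipeline.

```r
library(targets)

# Reduce overhead: save metadata at most every 15 seconds and
# refresh console messages at most every 5 seconds.
tar_make(seconds_meta = 15, seconds_reporter = 5)

# AWS resources with the new retry and connection arguments.
tar_resources_aws(
  bucket = "my-bucket",         # placeholder
  prefix = "my-project",        # placeholder
  max_tries = 5,                # retry limit for REST API calls
  seconds_timeout = 30,         # connection timeout passed to paws.storage::s3()
  close_connection = TRUE,
  s3_force_path_style = FALSE
)
```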
CRAN patch
- Fix a documentation issue in an Rd file.
Storage improvements
targets 1.2.1
- Add `tar_prune_list()` (#1090, @mglev1n).
- Wrap `file.rename()` in `tryCatch()` and fall back on a copy-then-remove workaround (@jds485, #1102, #1103). See the sketch after this list.
- Stage temporary cloud upload/download files in `tools::R_user_dir(package = "targets", which = "cache")` instead of `tempdir()`. `tar_destroy(destroy = "cloud")` and `tar_destroy(destroy = "all")` remove any leftover files from failed uploads/downloads (@jds485, #1102, #1103).
- Use `paws.storage` instead of all of `paws`.
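The copy-then-remove fallback follows a common base-R pattern; the sketch below is an illustration of that pattern, not the exact internal code of `targets`.

```r
# Move a file: try a fast rename first, then fall back on copy + remove,
# which is needed when the source and destination are on different file systems.
move_file <- function(from, to) {
  moved <- tryCatch(
    file.rename(from, to),
    warning = function(condition) FALSE,
    error = function(condition) FALSE
  )
  if (!isTRUE(moved)) {
    stopifnot(file.copy(from, to, overwrite = TRUE))
    file.remove(from)
  }
  invisible(to)
}
```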
Improved crew integration
targets 1.2.0
`crew` integration
- Do not assume S3 classes when validating `crew` controllers.
- Suggest a `crew` controller in the `_targets.R` file from `use_targets()`.
- Make `tar_crew()` compatible with `crew` >= 0.3.0.
- Rename argument `terminate` to `terminate_controller` in `tar_make()`.
- Add argument `use_crew` in `tar_make()` and add an option in `tar_config_set()` to make it configurable. See the sketch after this list.
- Write progress data and metadata in `target_prepare()`.
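A minimal sketch of wiring a `crew` controller into a pipeline; the worker count and targets are placeholders.

```r
# _targets.R
library(targets)

tar_option_set(
  controller = crew::crew_controller_local(workers = 2)  # placeholder worker count
)

list(
  tar_target(data, runif(100)),       # placeholder targets
  tar_target(result, mean(data))
)
```

With this in place, `tar_make()` dispatches targets to the controller, `tar_make(use_crew = FALSE)` runs the pipeline without `crew` for a single call, and `tar_config_set(use_crew = FALSE)` makes that choice persistent.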
Other improvements
CRAN patch 3
targets 1.1.3
- Decide on `nanonext` usage in `time_seconds_local()` at runtime and not at installation time. That way, if `nanonext` is removed after `targets` is installed, functions in `targets` still work. Fixes the CRAN issues seen in `tarchetypes`, `jagstargets`, and `gittargets`.
Remarks
R CMD check shows a NOTE with messages such as "#STDOFF 2:05:08.9". This is caused by an issue in the `arrow` package (apache/arrow#35594), which is in "Suggests:" in the DESCRIPTION file of `targets`. The NOTE will go away on its own when the next version of `arrow` is released to CRAN.
CRAN patch 2
targets 1.1.2
- Remove `crew`-related startup messages.
Remarks
R CMD check shows a NOTE with messages such as "#STDOFF 2:05:08.9". This is caused by an issue in the `arrow` package (apache/arrow#35594), which is in "Suggests:" in the DESCRIPTION file of `targets`. The NOTE will go away on its own when the next version of `arrow` is released to CRAN.
CRAN patch
targets 1.1.1
- Pre-compute `cli` colors and bullets to improve performance in RStudio.
- Use `packageStartupMessage()` for package startup messages.
Remarks
R CMD check shows a NOTE with messages such as "#STDOFF 2:05:08.9". This is caused by apache/arrow#35594 because the `arrow` package is in "Suggests:" in the DESCRIPTION file of `targets`. The NOTE will go away on its own when the next version of `arrow` is released to CRAN.