parabar is a package designed to provide a simple interface for executing tasks in parallel, while also providing functionality for tracking and displaying the progress of the tasks.
This package is aimed at two audiences: (1) end-users who want to execute a task in parallel in an interactive R session and track the execution progress, and (2) R package developers who want to use parabar as a solution for parallel processing in their packages.
You can install parabar directly from CRAN using the following command:
# Install the package from `CRAN`.
install.packages("parabar")
# Load the package.
library(parabar)
Alternatively, you can install the latest development version from GitHub via:
# Install the package from `GitHub`.
remotes::install_github("mihaiconstantin/parabar")
# Load the package.
library(parabar)
Below you can find a few examples of how to use parabar in your R scripts, both for end-users and for developers. All examples below assume that you have already installed and loaded the package.
In general, the usage of parabar consists of the following steps:
- Start a backend for parallel processing.
- Execute a task in parallel.
- Stop the backend.
Optionally, you can also configure the progress bar if the backend created supports progress tracking, or perform additional operations on the backend.
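As a minimal sketch of these three steps, the helper below starts a backend, runs a task, and relies on `on.exit` so that the backend is stopped even if the task fails. The name `run_in_parallel` is hypothetical and not part of parabar, and two cores are used only to keep the example light.

```r
library(parabar)

# Hypothetical helper (not part of `parabar`) wrapping the three steps.
run_in_parallel <- function(x, fun, cores = 2) {
    # Start a backend for parallel processing.
    backend <- start_backend(cores = cores, cluster_type = "psock", backend_type = "async")

    # Stop the backend when the function exits, even on error.
    on.exit(stop_backend(backend), add = TRUE)

    # Execute the task in parallel and return the results.
    par_sapply(backend, x, fun)
}

# Example usage.
results <- run_in_parallel(1:10, function(x) x + 1)
```

Wrapping the steps this way may be convenient for one-off tasks. For repeated tasks, reusing a single backend avoids paying the cluster start-up cost each time.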
The simplest, and perhaps least interesting, way to use parabar is by requesting a synchronous backend.
# Start a synchronous backend.
backend <- start_backend(cores = 4, cluster_type = "psock", backend_type = "sync")
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
At this point you will notice the following warning message:
Warning message:
Progress tracking not supported for backend of type 'SyncBackend'.
The reason is that progress tracking only works for asynchronous backends, and parabar enables progress tracking by default at load time. We can disable it via the package options to get rid of the warning message.
# Disable progress tracking.
set_option("progress_track", FALSE)
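To confirm the change, the option can be read back. The short sketch below assumes the companion accessor `get_option` exported by the package; check its help page for the full list of option names.

```r
library(parabar)

# Disable progress tracking.
set_option("progress_track", FALSE)

# Read the option back to confirm the new value.
tracking <- get_option("progress_track")
print(tracking)
```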
We can verify that the warning message is gone by running the task again, reusing the backend we created earlier.
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
When we are done with this backend, we can stop it to free up the resources.
# Stop the backend.
stop_backend(backend)
The more interesting way to use parabar is by requesting an asynchronous backend. This is the default backend type, and highlights the strengths of the package.
First, let's ensure progress tracking is enabled (recall that we disabled it above).
# Enable progress tracking.
set_option("progress_track", TRUE)
Now, we can proceed with creating the backend and running the task.
# Start an asynchronous backend.
backend <- start_backend(cores = 4, cluster_type = "psock", backend_type = "async")
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
At this point, we can see that the progress bar is displayed, and that the progress is tracked. The progress bar is updated in real-time, after each task execution, e.g.:
> completed 928 out of 1000 tasks [ 93%] [ 3s]
We can also configure the progress bar. For example, suppose we want to display an actual progress bar.
# Change the progress bar options.
configure_bar(type = "modern", format = "[:bar] :percent")
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
The progress bar will now look like this:
[====================>-------------------------------------------------] 30%
By default, parabar uses the progress package to display the progress bar. However, we can easily swap it with another progress bar engine. For example, suppose we want to use the built-in utils::txtProgressBar.
# Change to and adjust the style of the `basic` progress bar.
configure_bar(type = "basic", style = 3)
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
Check out ?configure_bar for more information on the possible ways of configuring the progress bar.
We can also disable the progress bar for asynchronous backends altogether, by adjusting the package options.
# Disable progress tracking.
set_option("progress_track", FALSE)
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
We can stop the backend when we are done.
# Stop the backend.
stop_backend(backend)
Finally, we can also use the ?par_sapply function without a backend, which will resort to running the task sequentially by means of base::sapply.
# Run the task sequentially using `base::sapply`.
results <- par_sapply(backend = NULL, 1:300, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
As indicated above, the general workflow consists of starting a backend, executing a task in parallel, and stopping the backend. However, there are additional operations that can be performed on a backend (see the Developers section). The table below lists all available operations that can be performed on a backend.
Operation | Description |
---|---|
start_backend(cores, cluster_type, backend_type) | Start a backend. |
stop_backend(backend) | Stop a backend. |
clear(backend) | Remove all objects from a backend. |
peek(backend) | List the names of the variables on a backend. |
export(backend, variables, environment) | Export objects to a backend. |
evaluate(backend, expression) | Evaluate expressions on a backend. |
par_sapply(backend, x, fun) | Run tasks in parallel on a backend. |
par_lapply(backend, x, fun) | Run tasks in parallel on a backend. |
par_apply(backend, x, margin, fun) | Run tasks in parallel on a backend. |
Check the documentation corresponding to each operation for more information and examples.
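The sketch below strings several of these operations together, using the signatures listed in the table. The variable names (`offset`, `before`, `values`, `after`, `squares`, `m`, `row_sums`) are illustrative only, and the exact shape of the values returned by `peek` and `evaluate` is best checked in their help pages.

```r
library(parabar)

# Start a small asynchronous backend (two cores keep the sketch light).
backend <- start_backend(cores = 2, cluster_type = "psock", backend_type = "async")

# Define a variable in the current session and export it to the backend.
offset <- 10
export(backend, variables = "offset", environment = environment())

# List the variable names now available on the backend.
before <- peek(backend)

# Evaluate an expression on the backend that uses the exported variable.
values <- evaluate(backend, offset + 1)

# Remove all objects from the backend, then list the variable names again.
clear(backend)
after <- peek(backend)

# Run a task over the elements of a vector, returning a list.
squares <- par_lapply(backend, 1:3, function(x) x^2)

# Run a task over the rows of a matrix.
m <- matrix(1:6, nrow = 2)
row_sums <- par_apply(backend, m, margin = 1, sum)

# Stop the backend to free up the resources.
stop_backend(backend)
```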
parabar provides a rich API for developers who want to use the package in their own projects.
From a high-level perspective, the package consists of backends and contexts in which these backends are employed for executing tasks in parallel.
A backend represents a set of operations, defined by the ?BackendService interface. Backends can be synchronous (i.e., ?SyncBackend) or asynchronous (i.e., ?AsyncBackend). The former will block the execution of the current R session until the parallel task is completed, while the latter will return immediately and the task will be executed in a background R session.
The ?BackendService interface defines the following operations:
- start: Start the backend.
- stop: Stop the backend.
- clear: Remove all objects from the backend.
- peek: Show the variable names available on the backend.
- export: Export variables from a given environment to the backend.
- evaluate: Evaluate an arbitrary expression on the backend.
- sapply: Run a task on the backend.
- lapply: Run a task on the backend.
- apply: Run a task on the backend.
- get_output: Get the output of the task execution.
Check out the documentation for BackendService for more information on each method.
A context represents the specific conditions in which a backend object operates. The default context class (i.e., ?Context) simply forwards the call to the corresponding backend method. However, a more complex context can augment the operation before forwarding the call to the backend. One example of a complex context is the ?ProgressTrackingContext class. This class extends the regular ?Context class and decorates, e.g., the backend sapply operation to log the progress after each task execution and display a progress bar.
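The forwarding and decorating idea can be illustrated with a toy R6 sketch. The classes below (`ToyBackend`, `ToyContext`, `ToyCountingContext`) are hypothetical stand-ins written for this illustration, not parabar classes, and everything runs sequentially in the current session. The package's own contexts are more involved.

```r
library(R6)

# Hypothetical stand-in for a backend: runs the task sequentially.
ToyBackend <- R6Class("ToyBackend",
    public = list(
        sapply = function(x, fun) base::sapply(x, fun)
    )
)

# Hypothetical default context: forwards the call to the backend unchanged.
ToyContext <- R6Class("ToyContext",
    public = list(
        backend = NULL,
        set_backend = function(backend) {
            self$backend <- backend
            invisible(self)
        },
        sapply = function(x, fun) self$backend$sapply(x, fun)
    )
)

# Hypothetical decorating context: wraps the task to count completed
# executions, then forwards the call through the parent context.
ToyCountingContext <- R6Class("ToyCountingContext",
    inherit = ToyContext,
    public = list(
        completed = 0,
        sapply = function(x, fun) {
            counted <- function(value) {
                self$completed <- self$completed + 1
                fun(value)
            }
            super$sapply(x, counted)
        }
    )
)

# The caller uses the context exactly as it would use the backend.
context <- ToyCountingContext$new()
context$set_backend(ToyBackend$new())
results <- context$sapply(1:5, function(x) x + 1)
```

Because the context exposes the same `sapply` method as the backend, the caller does not need to know whether the call is forwarded as-is or decorated first.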
The following are the main classes provided by parabar:
- BackendService: Interface for backend operations.
- SyncBackend: Synchronous backend extending the abstract Backend class and implementing the BackendService interface.
- AsyncBackend: Asynchronous backend extending the abstract Backend class and implementing the BackendService interface.
- Specification: Backend specification used when starting a backend.
- BackendFactory: Factory for creating Backend objects.
- Context: Default context for executing backend operations without interference.
- ProgressTrackingContext: Context for decorating the sapply operation to track and display the progress.
- ContextFactory: Factory for creating Context objects.
- UserApiConsumer: Wrapper around the developer API.
Additionally, parabar also provides several classes for creating and updating different progress bars, namely:
- BasicBar: A simple, but robust, bar created via utils::txtProgressBar extending the Bar abstract class.
- ModernBar: A modern bar created via progress::progress_bar extending the Bar abstract class.
- BarFactory: Factory for creating Bar objects.
Below is an example of how to use the package's R6 class API.
We start by creating a ?Specification object instructing the ?Backend object how to create a cluster via the built-in function parallel::makeCluster.
# Create a specification object.
specification <- Specification$new()
specification$set_cores(4)
specification$set_type("psock")
We proceed by obtaining an asynchronous backend instance from the ?BackendFactory and starting the backend using the ?Specification instance above.
# Create a backend factory.
backend_factory <- BackendFactory$new()
# Get an asynchronous backend instance.
backend <- backend_factory$get("async")
# Start the backend.
backend$start(specification)
Finally, we can run a task in parallel by calling, e.g., the sapply method on the backend instance.
# Run a task in parallel.
backend$sapply(1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
At this point, the task was deployed in a background R session, and the caller process is free to do other things.
Calling backend$get_output immediately after the backend$sapply call will throw an error, indicating that the task is still running, i.e.:
Error: A task is currently running.
We can, however, block the caller process and wait for the task to complete before fetching the results.
results <- backend$get_output(wait = TRUE)
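Putting these pieces together, the sketch below deploys a task, attempts to fetch the output right away while catching the error instead of failing, and then falls back to a blocking call. It is a self-contained illustration using only the calls shown above. The shorter task and the two cores are choices made to keep the example quick, and the name `early` is illustrative.

```r
library(parabar)

# Describe the cluster (two cores keep the sketch light).
specification <- Specification$new()
specification$set_cores(2)
specification$set_type("psock")

# Obtain and start an asynchronous backend.
backend <- BackendFactory$new()$get("async")
backend$start(specification)

# Deploy a task that takes a moment to complete.
backend$sapply(1:100, function(x) {
    Sys.sleep(0.01)
    x + 1
})

# Try to fetch the output right away, capturing the error instead of failing.
early <- tryCatch(backend$get_output(), error = function(e) e)

if (inherits(early, "error")) {
    # The task was still running, so block until it completes.
    results <- backend$get_output(wait = TRUE)
} else {
    # The task had already finished, so the output was fetched directly.
    results <- early
}

# Stop the backend.
backend$stop()
```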
We can now introduce the context concept to decorate the backend instance and, in this example, track the progress of the task. First, we obtain a ?Context instance from the ?ContextFactory. Furthermore, since we are using an asynchronous backend, we can request a context that facilitates progress-tracking.
# Create a context factory.
context_factory <- ContextFactory$new()
# Get a progress-tracking context.
context <- context_factory$get("progress")
# Register the backend with the context.
context$set_backend(backend)
The ?Context class (and its subclasses) implements the ?BackendService interface, which means that we can use it to execute backend operations.
Since we are using the ?ProgressTrackingContext context, we also need to register a ?Bar instance with the context. First, let's obtain a ?Bar instance from the ?BarFactory.
# Create a bar factory.
bar_factory <- BarFactory$new()
# Get a `modern` bar (i.e., via `progress::progress_bar`).
bar <- bar_factory$get("modern")
We can now register the bar instance with the context instance.
# Register the `bar` with the `context`.
context$set_bar(bar)
We may also configure the bar, or change its appearance. For instance, it may be a good idea to show the progress bar right away.
# Configure the `bar`.
context$configure_bar(
    show_after = 0,
    format = " > completed :current out of :total tasks [:percent] [:elapsed]"
)
At this point, the backend$sapply operation is decorated with progress tracking. Finally, we can run the task in parallel and enjoy our progress bar using the context instance.
# Run a task in parallel with progress tracking.
context$sapply(1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
All there is left to do is to fetch the results and stop the backend.
# Get the results.
results <- context$get_output()
# Stop the backend.
context$stop()
Check out the UML diagram below for a quick overview of the package design.
Note. For the sake of clarity, the diagram only displays the sapply operation for running tasks in parallel. However, other operations are supported as well (see the table in the section Additional Operations).
- Any contributions are welcome and greatly appreciated. Please open a pull request on GitHub.
- To report bugs, or request new features, please open an issue on GitHub.
- The package source code in this repository is licensed under the MIT license.
- The documentation, vignettes, and other website materials by Mihai Constantin are licensed under CC BY 4.0.