HPC profiling #937
Comments
I can also run your test case on my SLURM cluster, in case the cluster scheduler makes a difference.
Awesome! I will let you know when the time comes.
Just uploaded a test case to https://github.com/wlandau/drake-examples/tree/master/hpc-profiling. Here are baseline results (without HPC). As expected, the bottlenecks are reading and writing the datasets (320 MB each). The next step is to try the same script on a cluster and show a flame graph.

library(dplyr)
library(drake)
library(jointprof) # remotes::install_github("r-prof/jointprof")
library(profile)
# Just throw around some medium-ish data.
create_plan <- function() {
plan <- drake_plan(
data1 = target(x, transform = map(i = !!seq_len(8))),
data2 = target(data1, transform = map(data1, .id = FALSE)),
data3 = target(data2, transform = map(data2, .id = FALSE)),
data4 = target(data3, transform = map(data3, .id = FALSE)),
data5 = target(bind_rows(data4), transform = combine(data4))
)
plan$format <- "fst"
plan
}
# Profiling.
run_profiling <- function(
path = "profile.proto",
n = 2e7,
parallelism = "loop",
jobs = 1L
) {
unlink(path)
unlink(".drake", recursive = TRUE, force = TRUE)
on.exit(unlink(".drake", recursive = TRUE, force = TRUE))
x <- data.frame(x = runif(n), y = runif(n))
plan <- create_plan()
rprof_file <- tempfile()
Rprof(filename = rprof_file)
options(
clustermq.scheduler = "sge",
clustermq.template = "sge_clustermq.tmpl"
)
make(
plan,
parallelism = parallelism,
jobs = jobs,
memory_strategy = "preclean",
garbage_collection = TRUE
)
Rprof(NULL)
data <- read_rprof(rprof_file)
write_pprof(data, path)
}
# Visualize the results.
vis_pprof <- function(path, host = "localhost") {
server <- sprintf("%s:%s", host, random_port())
message("local pprof server: http://", server)
system2(find_pprof(), c("-http", server, shQuote(path)))
}
# These ports should be fine for the pprof server.
random_port <- function(from = 49152L, to = 65355L) {
sample(seq.int(from = from, to = to, by = 1L), size = 1)
}
# Run stuff.
path <- "profile.proto"
run_profiling(path, n = 2e7)
vis_pprof(path)
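
As an aside, the same Rprof samples could be viewed with profvis instead of pprof (profvis is also mentioned in the issue description below), which avoids the Go pprof binary that find_pprof() needs. A minimal sketch, assuming run_profiling() were modified to return rprof_file rather than discarding it:

# Hedged sketch, not from the thread: browse the Rprof samples with profvis.
# Assumes a hypothetical version of run_profiling() that returns rprof_file.
vis_profvis <- function(rprof_file) {
  profvis::profvis(prof_input = rprof_file)
}
# rprof_file <- run_profiling(path, n = 2e7)  # hypothetical modified return value
# vis_profvis(rprof_file)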
Here's what the flame graph looks like on an HPC system (with, admittedly, …
I just tried a larger number of small targets on SGE, and there is no clear bottleneck. I think the biggest source of overhead is still reading and writing to the cache, which makes me question how much #933 (comment), mschubert/clustermq#147, and mschubert/clustermq#154 will really help. With #971, it should be possible to spoof the cache so nothing is saved at all, which should increase speed. I may implement this if someone requests it, but not until there is interest. Related: #575.

library(dplyr)
library(drake)
library(jointprof) # remotes::install_github("r-prof/jointprof")
library(profile)
create_plan <- function() {
plan <- drake_plan(
data1 = target(x, transform = map(i = !!seq_len(1e2))),
data2 = target(data1, transform = map(data1, .id = FALSE)),
data3 = target(data2, transform = map(data2, .id = FALSE))
)
plan
}
# Profiling.
run_profiling <- function(
path = "profile.proto",
n = 2e7,
parallelism = "loop",
jobs = 1L
) {
unlink(path)
clean(destroy = TRUE)
x <- data.frame(x = runif(n), y = runif(n))
plan <- create_plan()
rprof_file <- tempfile()
Rprof(filename = rprof_file)
options(
clustermq.scheduler = "sge",
clustermq.template = "sge_clustermq.tmpl"
)
make(
plan,
parallelism = parallelism,
jobs = jobs,
cache = storr::storr_environment(),
history = FALSE,
console_log_file = "drake.log"
)
Rprof(NULL)
data <- read_rprof(rprof_file)
write_pprof(data, path)
clean(destroy = TRUE)
}
# Visualize the results.
vis_pprof <- function(path, host = "localhost") {
server <- sprintf("%s:%s", host, random_port())
message("local pprof server: http://", server)
system2(find_pprof(), c("-http", server, shQuote(path)))
}
# These ports should be fine for the pprof server.
random_port <- function(from = 49152L, to = 65355L) {
sample(seq.int(from = from, to = to, by = 1L), size = 1)
}
# Run stuff.
path <- "profile.proto"
run_profiling(path, n = 10, parallelism = "clustermq", jobs = 16)
vis_pprof(path)
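
To sanity-check the claim that cache reads and writes dominate, one could time a raw storr set/get on similar data outside of drake. This is a minimal sketch of my own, not part of the original comment; the key name "data1" and the temporary cache directory are illustrative only:

# Hedged sketch: rough timings for writing/reading one dataset through storr,
# drake's caching backend, independent of any drake workflow.
library(storr)
n <- 2e7
x <- data.frame(x = runif(n), y = runif(n))
cache_dir <- tempfile("storr-timing-")
st <- storr_rds(cache_dir)                   # on-disk RDS cache, like drake's default storr
system.time(st$set("data1", x))              # approximate write cost
system.time(invisible(st$get("data1")))      # approximate read cost
unlink(cache_dir, recursive = TRUE)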
Prework
I agree to abide by drake's code of conduct.
Description
I have a workflow on an SGE cluster where the command of each target takes 5-6 seconds to run, but the total processing time is > 2 min. I think it is about time we did a profiling study on HPC. We can use Rprof() on both the master and a worker and visualize the results with profvis or pprof.
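
The scripts in the comments above profile only the master process. A minimal sketch of how the worker side might be captured, by wrapping a target's command in Rprof() so the samples land on shared storage; the helper profiled(), heavy_read(), and the output path are illustrative names, not drake or clustermq APIs:

# Hedged sketch: profile whatever runs inside one target on a clustermq worker.
profiled <- function(expr, prof_file = "worker_profile.out") {
  Rprof(filename = prof_file)          # start sampling in the worker process
  on.exit(Rprof(NULL), add = TRUE)     # stop sampling even if expr errors
  force(expr)                          # evaluate the wrapped command under the profiler
}
# Usage inside a plan, e.g.:
# plan <- drake_plan(data1 = profiled(heavy_read(file_in("data.fst"))))
# Afterwards, inspect worker_profile.out with
# profvis::profvis(prof_input = "worker_profile.out") or read_rprof()/write_pprof() as above.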