Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache engine for reticulate using dill #1210

Open
wants to merge 29 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
228624b
added unit tests that cover knitr #1505
tmastny Feb 26, 2018
b52ecaa
added cache_eng_python to add Python session caching between chunks. …
tmastny Feb 26, 2018
c345ce2
dill caching engine for knitr, with tests
tmastny Feb 28, 2018
ff39889
changes from feedback on knitr #1518 with updated tests
tmastny Apr 18, 2018
8e07779
fixed testing utils source in dill tests
tmastny Apr 19, 2018
638a4e7
Merge 'rstudio/main' with 'tmastny/master' into branch 'cache-engine'
leogama May 13, 2022
9753870
cache engine: update 'r' object identification logic
leogama Apr 14, 2022
02c1771
fix 'cache_path' when 'output.dir' is different from 'knitr:::input_d…
leogama Apr 20, 2022
dbebab3
cache loading should run in the input directory
leogama Apr 20, 2022
bd29f84
remove duplicated conversion functions
leogama May 26, 2022
f8497a0
Merge branch 'main' into cache-engine
leogama Sep 3, 2022
fe4cd9f
remove trailing whitespaces and empty line
leogama Sep 3, 2022
5d6f7a7
First version of cache implementation with new knitr API
leogama Sep 13, 2022
a33ed39
Expose the cache$available() method to knitr
leogama Sep 14, 2022
266463c
Use the same warning for missing and old dill module cases
leogama Sep 15, 2022
445a5ca
Set environment() as default argument in eng_python_initialize()
leogama Sep 15, 2022
c6a88ad
Basic test for knitr engine cache
leogama Sep 17, 2022
975c1b0
minor
leogama Sep 17, 2022
401b1ba
Workflows: install module dill in the testing virtualenv
leogama Sep 17, 2022
62c77d8
Docs: remove @params from cache_eng_python, add it to pkgdown index
leogama Sep 17, 2022
f487b52
Correctly initialize Python in knitr, honoring 'engine.path'
leogama Sep 19, 2022
cb9ee1f
Implement the 'cache.vars' chunk option; some style changes
leogama Sep 19, 2022
7d4eeec
Remove unused 'envir' parameter from 'eng_python_initialize*' functions
leogama Sep 20, 2022
395627e
update cache engine docs
leogama Dec 13, 2022
55d1e03
cache: adapt code and tests to dill package v0.3.6
leogama Dec 13, 2022
d43b593
fix typo, update generated documentation
leogama Dec 13, 2022
f354f60
Workflow: use PR branch from knitr for testing
leogama Dec 16, 2022
38ef3ce
fix typo
leogama Dec 20, 2022
79b9732
Merge branch 'main' into cache-engine
t-kalinowski Jun 21, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,7 @@ export("%as%")
export(PyClass)
export(array_reshape)
export(as_iterator)
export(cache_eng_python)
export(conda_binary)
export(conda_clone)
export(conda_create)
Expand Down
65 changes: 65 additions & 0 deletions R/knitr-engine.R
Original file line number Diff line number Diff line change
Expand Up @@ -244,6 +244,10 @@ eng_python <- function(options) {
outputs$push(output)
}

if(options$cache > 0) {
save_python_session(options$hash)
}

# if we had held outputs, add those in now (merging text output as appropriate)
text_output <- character()

Expand Down Expand Up @@ -292,6 +296,16 @@ eng_python_initialize <- function(options, envir) {
ensure_python_initialized()
eng_python_initialize_hooks(options, envir)

if (options$cache > 0) {
module <- tryCatch(import("dill"), error = identity)
if (inherits(module, "error")) {
if (module$message == "ImportError: No module named dill") {
leogama marked this conversation as resolved.
Show resolved Hide resolved
warning("The Python module dill was not found. This module is needed for full cache functionality.")
} else {
stop(module$message)
}
}
}
}

eng_python_knit_figure_path <- function(options, suffix = NULL) {
Expand Down Expand Up @@ -628,3 +642,54 @@ eng_python_autoprint <- function(captured, options, autoshow) {
}

}

save_python_session <- function(cache_path) {
module <- tryCatch(import("dill"), error = identity)
if (inherits(module, "error")) {
if (module$message == "ImportError: No module named dill") return()
leogama marked this conversation as resolved.
Show resolved Hide resolved
signalCondition(module$message)
}

r_obj_exists <- "'r' in globals()"
r_is_R <- "type(r).__module__ == '__main__' and type(r).__name__ == 'R'"
if (py_eval(r_obj_exists) && py_eval(r_is_R)) {
leogama marked this conversation as resolved.
Show resolved Hide resolved
py_run_string("del globals()['r']")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this have side effects (basically meaning the r object is no longer visible after this code is run)?

Copy link
Author

@leogama leogama May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is run at the end of a knitir block, and currently the r object in injected again in the Python namespace at the beginning of the next block. An alternative to putting this object directly in the user's __main__ namespace would be to add it to __builtins__. This would bring another advantage: the user could then create an r global variable without overwriting it, it would only be masked. Then del r would unmask it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't put the r object in __builtins__, but I did put the "R object class" in __builtins__. The r object is not removed before saving the cache anymore, it's just ignored. However, my previous suggestion is still open for discussion.

}

cache_path <- file.path(knitr::opts_knit$get("output.dir"), cache_path)
module$dump_session(filename = paste0(cache_path, ".pkl"), byref = TRUE)
}

#' A reticulate cache engine for Knitr
#'
#' This provides a `reticulate` cache engine for `knitr`. The cache engine
#' allows `knitr` to save and load Python sessions between cached chunks. The
#' cache engine depends on the `dill` Python module. Therefore, you must have
#' `dill` installed in your Python environment.
#'
#' The engine can be activated by setting (for example)
#'
#' ```
#' knitr::cache_engines$set(python = reticulate::cache_eng_python)
#' ```
#'
#' Typically, this will be set within a document's setup chunk, or by the
#' environment requesting that Python chunks be processed by this engine.
#'
#' @param options
#' List of chunk options provided by `knitr` during chunk execution.
#' Contains the caching path.
#'
#' @export
cache_eng_python <- function(options) {
module <- tryCatch(import("dill"), error = identity)
if (inherits(module, "error")) {
if (module$message == "ImportError: No module named dill") return()
leogama marked this conversation as resolved.
Show resolved Hide resolved
stop(module$message)
}

cache_path <- normalizePath(paste0(options$hash, ".pkl"), mustWork = TRUE)
knitr:::in_input_dir(module$load_session(filename = cache_path))
}


60 changes: 60 additions & 0 deletions R/python.R
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,66 @@ summary.python.builtin.object <- function(object, ...) {
str(object)
}


#' Convert between Python and R objects
#'
#' @inheritParams import
#' @param x Object to convert
#'
#' @return Converted object
#'
#' @name r-py-conversion
#' @export
py_to_r <- function(x) {

ensure_python_initialized()

if (!inherits(x, "python.builtin.object"))
stop("Object to convert is not a Python object")

# get the default wrapper
x <- py_ref_to_r(x)

# allow customization of the wrapper
wrapper <- py_to_r_wrapper(x)
attributes(wrapper) <- attributes(x)

# return the wrapper
wrapper
}

#' R wrapper for Python objects
#'
#' S3 method to create a custom R wrapper for a Python object.
#' The default wrapper is either an R environment or an R function
#' (for callable python objects).
#'
#' @param x Python object
#'
#' @export
py_to_r_wrapper <- function(x) {
UseMethod("py_to_r_wrapper")
}

#' @export
py_to_r_wrapper.default <- function(x) {
x
}





#' @rdname r-py-conversion
#' @export
r_to_py <- function(x, convert = FALSE) {

ensure_python_initialized()

r_to_py_impl(x, convert = convert)
}


#' @export
`$.python.builtin.module` <- function(x, name) {

Expand Down
25 changes: 25 additions & 0 deletions man/cache_eng_python.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

20 changes: 20 additions & 0 deletions tests/testthat/resources/eng-reticulate-cache-test.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
---
title: "Using reticulate's Python Engine with knitr"
---

```{r setup, include = FALSE}
library(reticulate)
knitr::opts_chunk$set(cache=TRUE)
```

Cache can handle changes to second chunk:

```{python}
x = 1
```

```{python}
print(x + 1)
```


46 changes: 46 additions & 0 deletions tests/testthat/test-python-cache-engine.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
context("knitr-cache")

test_that("An R Markdown document can be rendered with cache using reticulate", {

skip_on_cran()
skip_if_not_installed("rmarkdown")
skip_if_not_installed("callr")

unlink("resources/eng-reticulate-cache-test_cache/", recursive = TRUE)

path <- callr::r(
function() {
rmarkdown::render("resources/eng-reticulate-cache-test.Rmd", quiet = TRUE, envir = new.env())
})
expect_true(file.exists(path))
on.exit(unlink(path), add = TRUE)
})

test_that("An R Markdown document builds if a cache is modified", {

skip_on_cran()
skip_if_not_installed("rmarkdown")
skip_if_not_installed("callr")

old_var <- "1"
new_var <- "0"
mutate_chunk <- function(x) {
print_line <- 17
file_text <- readLines("resources/eng-reticulate-cache-test.Rmd")
file_text[print_line] <- paste0("print(x + ", x, ")")
writeLines(file_text, "resources/eng-reticulate-cache-test.Rmd")
}
mutate_chunk(old_var)
mutate_chunk(new_var)
path <- callr::r(
function() {
rmarkdown::render("resources/eng-reticulate-cache-test.Rmd", quiet = TRUE, envir = new.env())
})
mutate_chunk(old_var)
expect_true(file.exists(path))
on.exit(unlink(path), add = TRUE)
on.exit(unlink("resources/eng-reticulate-cache-test_cache/", recursive = TRUE), add = TRUE)
})



37 changes: 37 additions & 0 deletions tests/testthat/test-python-dill.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
context("dill")

source("helper-utils.R")

test_that("Interpreter sessions can be saved and loaded with dill", {
skip_if_no_python()
skip_if_not_installed("callr")

session_one_vars <- callr::r(
function() {
module_load <- tryCatch(
dill <- reticulate::import("dill"),
error = function(c) {
py_error <- reticulate::py_last_error()
if(py_error$type == "ImportError" && py_error$value == "No module named dill") {
"No dill"
}})
if (module_load == "No dill") return(module_load)
main <- reticulate::py_run_string("x = 1")
reticulate::py_run_string("y = x + 1")
dill$dump_session(filename = "x.dill", byref = TRUE)
c(main$x, main$y)
})
if (session_one_vars[1] == "No dill")
skip("The dill Python module is not installed")

session_two_vars <- callr::r(
function() {
dill <- reticulate::import("dill")
dill$load_session(filename = "x.dill")
main <- reticulate::py_run_string("pass")
c(main$x, main$y)
})
on.exit(unlink("x.dill"), add = TRUE)
expect_equal(session_one_vars, session_two_vars)
})

27 changes: 27 additions & 0 deletions tests/testthat/test-python-globals.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
context("globals")

source("utils.R")

test_that("Interpreter sessions can be saved and loaded with dill", {
skip_if_no_python()

py_run_string("x = 1")
py_run_string("y = 1")
py_run_string("[globals().pop(i) for i in ['x', 'y']]")

test_x <- tryCatch(
py_run_string("x = x + 1"),
error = function(e) {
py_last_error()$value
}
)
test_y <- tryCatch(
py_run_string("y = y + 1"),
error = function(e) {
py_last_error()$value
}
)
expect_equal(test_x, "name 'x' is not defined")
expect_equal(test_y, "name 'y' is not defined")
})