Cache engine for reticulate using dill #1210
base: main
This is what cache saving does; therefore load_session() must run in the same directory, or it won't find local Python modules.
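A minimal sketch of why the directory matters, using the standard-library `pickle` as a stand-in and assuming `dill` likewise stores functions from importable modules by reference (module name plus attribute name). The module name `local_helpers` and the temporary directory are hypothetical; `sys.path` is manipulated to simulate starting Python in a different working directory.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical project directory holding a local module.
project_dir = tempfile.mkdtemp()
with open(os.path.join(project_dir, "local_helpers.py"), "w") as f:
    f.write("def double(x):\n    return 2 * x\n")

# Simulate a session started inside the project directory.
sys.path.insert(0, project_dir)
importlib.invalidate_caches()
local_helpers = importlib.import_module("local_helpers")

# The function is saved by reference: only "local_helpers.double" is stored.
payload = pickle.dumps(local_helpers.double)

# Simulate a fresh session started somewhere else: the module is not importable.
sys.path.remove(project_dir)
del sys.modules["local_helpers"]
try:
    pickle.loads(payload)
    found_elsewhere = True
except ModuleNotFoundError:
    found_elsewhere = False

# Back in the project directory, loading works again.
sys.path.insert(0, project_dir)
importlib.invalidate_caches()
restored = pickle.loads(payload)
result = restored(21)
```

The same reasoning would apply to a whole saved session: any object that refers to a local module can only be restored where that module can be imported.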
Hi, thank you for reviving this PR! I will take a closer look at this next week, but after taking a glance I have a quick question: does this need to be rebased on the current main branch? It looks like there are some unrelated changes in the PR (e.g., changes to |
Question 1: Yes, it needs to be rebased on main because the merge causes conflicts in at least 2 files. I had to fix the conflicts manually. Question 2: This is precisely one of the parts I didn't have time to review. But, yes, I spotted this |
I don't think that any of the changes in the python.R file ( |
R/knitr-engine.R
Outdated
r_obj_exists <- "'r' in globals()"
r_is_R <- "type(r).__module__ == '__main__' and type(r).__name__ == 'R'"
if (py_eval(r_obj_exists) && py_eval(r_is_R)) {
py_run_string("del globals()['r']")
Wouldn't this have side effects (basically meaning the `r` object is no longer visible after this code is run)?
This is run at the end of a knitr block, and currently the `r` object is injected again into the Python namespace at the beginning of the next block. An alternative to putting this object directly in the user's `__main__` namespace would be to add it to `__builtins__`. This would bring another advantage: the user could then create an `r` global variable without overwriting it; it would only be masked. Then `del r` would unmask it.
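The masking behaviour described here can be sketched in plain Python. The class `R` below is a hypothetical stand-in for the proxy object, and the dictionary `ns` plays the role of the user's `__main__` namespace; this is only an illustration of name lookup, not the PR's implementation.

```python
import builtins


class R:
    """Hypothetical stand-in for the proxy object exposed as `r`."""


proxy = R()
builtins.r = proxy  # visible everywhere, but stored in no module's globals

# A fresh namespace standing in for the user's __main__ module.
ns = {"__builtins__": builtins}

exec("before = r", ns)              # name resolves through builtins
exec("r = 42\nmasked = r", ns)      # a user global masks the builtin
exec("del r\nafter = r", ns)        # deleting the global unmasks it

before, masked, after = ns["before"], ns["masked"], ns["after"]
in_user_globals = "r" in ns

del builtins.r  # clean up the demonstration
```

Since the proxy never lives in the user's globals, saving those globals would not pick it up unless the user binds it to another name.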
I didn't put the `r` object in `__builtins__`, but I did put the "R object class" in `__builtins__`. The `r` object is no longer removed before saving the cache; it's just ignored. However, my previous suggestion is still open for discussion.
Hello, there! I'm back to finish this PR. The One potential problem we'll face is that |
Hi @leogama, that's great to hear, this will be a great addition to reticulate! Regarding Python version compatibility, would it be possible to only enable this feature for newer versions of Python?
Of course, I prefer to start this as simple as possible and add features incrementally without many edge cases to care about. If you are OK with Python ≥ 3.7, then let's begin with that. I've been studying the Here is a diagram of the execution model I've found so far, which probably has some holes: |
@yihui Can you please advise on #1210 (comment)?
The diagram looks about right to me. Great job! :) BTW, it would be great to have @tmastny look at this PR if he has time.
@yihui, I'm trying to generate this with My idea is to generate something like a UML activity diagram to truthfully represent the execution model. When it's polished enough, e.g. with file/line references in the nodes, I may submit it to I'll probably also need to write down some schemes to wrap my head around the various cache options and the cache invalidation criteria, if we'd like to reproduce them for Python... |
@t-kalinowski: do you think it's better to add features to the cache mechanism in this single PR or to just implement the basics here and split the features into separate PRs?
@yihui We have a small issue concerning the working directories where the cache code chunks are run. It seems that the R cache code always runs in the Both functions
I advocate for the third option, and that cache engines should use the same API as the R cache, i.e. a list object with functions as "methods", like |
However you think is easiest. I'm happy to engage and review either way.
@leogama I agree with you. That seems to require a change in knitr, right? Please feel free to submit a PR there. Thanks!
Great. I think I'll restrict this PR to the basic cache mechanism and then open other PRs for extra features like the chunk options |
We have a test file now. I think it's time to run the workflows.
Thanks for authorizing the workflows. The new test didn't run because I hadn't added I'll work on the documentation next. |
@kevinushey How should I generate new |
I added the |
Hi @leogama, is this ready to merge?
Not yet. It's waiting for a new release of https://github.com/uqfoundation/dill
@t-kalinowski dill v0.3.6 was finally released. I'll update the PR (it needs some changes) and finish this by the weekend.
Hi, @t-kalinowski. Sorry for the hiatus. I've adapted the code and tests for the released (*) I think it'll be necessary to somehow install my approved (but not merged) branch from |
Hi @leogama, welcome back! I'm glad to help get this into main. In the interim, we can add a GitHub Actions workflow step that installs the appropriate knitr branch in the runners, just to confirm everything passes. Once we're happy, we can help coordinate getting the knitr PR merged and into the next CRAN release, and then merge this PR and release it to CRAN. It'll have to be done in stages, with knitr going to CRAN first, I believe.
Sure! How do I set up a custom package installation from a GitHub branch in Workflows? I have absolutely no idea.
@t-kalinowski I've added the |
@leogama I'd like to help get this ready to merge into main. I think that all the packages for which this branch depended on development versions have since been released to CRAN, which should simplify fixing the CI.
Hello there! I bring good news.
The Python package `dill` ("serialize all of python") is about to release a new version (0.3.5) that will likely include my patches for its session saving and restoring functionality. Therefore, I'm back to work on the cache feature for `reticulate`, based on @tmastny's proposal from some years ago (#167).
My idea is then to require `dill>=0.3.5`, as the previous versions have too many problems to work around.
I haven't reviewed the entire code from the original PR yet, just enough to make it work again with the current `main` branch, so there's probably a little more work to do (maybe even on `knitr`'s side). Criticism and suggestions are welcome.