-
-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a public API for getting a read-only view of the shared model #275
Add a public API for getting a read-only view of the shared model #275
Conversation
jupyter_collaboration/app.py
Outdated
fork_ydoc = Doc() | ||
fork_ydoc.apply_update(update) | ||
|
||
return YDOCS.get(content_type, YDOCS["file"])(fork_ydoc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this will be a snapshot of the document at the time this method is called, but don't you want a live document, with all future updates applied?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. For my use case I am fine with having just a snapshot, but maybe having a proper view which gets the future updates would make more sense for others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This view is what we call a fork in the suggestion system (see here). The fork is a copy of the root, just like you did, but it is continuously rebased on the root, so that it gets all the live updates from the root. On the other hand, the fork's updates are not applied to the root just yet, only when the suggestion is accepted do we merge it back into the root.
I don't know if you would be fine with having a live fork, or you strictly want a snapshot. In the latter case, I'm wondering why you still need a shared document, rather than just converting it to a plain Python object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if you would be fine with having a live fork.
Maybe.
I do not strictly want a snapshot. I am indifferent to that, as long as it does not force me to wait for the lock/update/etc.
In the latter case, I'm wondering why you still need a shared document, rather than just converting it to a plain Python object?
Converting a notebook to a plain Python object takes 4 times longer (#270 (comment)). 30 ms on a notebook with 100 empty cells makes me worried that this would be unsuitable for the completions use case especially when notebooks have large visualisations in outputs etc. Often we will only want last/next ~10 cells so we would not be converting full notebook to Python ever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also directly connect to the shared document (not a copy), and convert a subset of it in the frontend. For instance, get the content of the last 10 cells.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly what I am doing in jupyterlab/jupyter-ai#708 and plan to use this PR for. The question is do we want to return:
- a frozen read-only model copy
- a live read-only model
- a live read-write model
Again, for my use case this is really indifferent. From the point of view of API design I thought that starting with the smallest possible API surface (frozen model copy) and then expanding it in backward-compatible way (adding live updates, adding write access) seems like an easy decision (well maybe that is not fully backward compatible, but it would not be breaking for most use cases).
Currently this PR returns a frozen (but not read-only) copy of the model. I believe I can make it read-only by monkey-patching some methods in the Model instance after creating the copy so that they raise an exception when write is attempted.
What is the best next step/solution in your opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but I don't understand the benefit of having a frozen copy of the model. Memory-wise, it will be bigger than converting the live model to a Python object.
I think get_document()
should just return the shared model, not a copy of it. Then a subset of it can be converted to a Python object, which can be considered read-only as it won't affect the original shared model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I don't understand the benefit of having a frozen copy of the model
There is not any - other than that it is a simple way of ensuring that other extensions do not start writing to the shared model via this new public API.
I think
get_document()
should just return the shared model, not a copy of it.
I am happy with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about adding a parameter get_document(copy=True)
? In the future we could also add live=True
, which would correspond to the use-case we have in the suggestion system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fine to me, done in: 3a8ba7d
Co-authored-by: David Brochart <[email protected]>
24d90cc
to
a87e20e
Compare
@davidbrochart should we merge and iterate or do you have any further suggestions to address here first? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge, thanks Mike!
Closes #270
Adds
get_document(path, content_type, file_format)
method to theYDocExtension
.The returned object is a fork of the original document, thus any modifications are not propagated back.