Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a public API for getting a read-only view of the shared model #275

Merged

Conversation

krassowski
Copy link
Member

Closes #270

Adds get_document(path, content_type, file_format) method to the YDocExtension.

The returned object is a fork of the original document, thus any modifications are not propagated back.

@krassowski krassowski added the enhancement New feature or request label Apr 1, 2024
Copy link
Contributor

github-actions bot commented Apr 1, 2024

Binder 👈 Launch a Binder on branch krassowski/jupyter_collaboration/public-api-for-getting-document

fork_ydoc = Doc()
fork_ydoc.apply_update(update)

return YDOCS.get(content_type, YDOCS["file"])(fork_ydoc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will be a snapshot of the document at the time this method is called, but don't you want a live document, with all future updates applied?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. For my use case I am fine with having just a snapshot, but maybe having a proper view which gets the future updates would make more sense for others?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This view is what we call a fork in the suggestion system (see here). The fork is a copy of the root, just like you did, but it is continuously rebased on the root, so that it gets all the live updates from the root. On the other hand, the fork's updates are not applied to the root just yet, only when the suggestion is accepted do we merge it back into the root.
I don't know if you would be fine with having a live fork, or you strictly want a snapshot. In the latter case, I'm wondering why you still need a shared document, rather than just converting it to a plain Python object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if you would be fine with having a live fork.

Maybe.

I do not strictly want a snapshot. I am indifferent to that, as long as it does not force me to wait for the lock/update/etc.

In the latter case, I'm wondering why you still need a shared document, rather than just converting it to a plain Python object?

Converting a notebook to a plain Python object takes 4 times longer (#270 (comment)). 30 ms on a notebook with 100 empty cells makes me worried that this would be unsuitable for the completions use case especially when notebooks have large visualisations in outputs etc. Often we will only want last/next ~10 cells so we would not be converting full notebook to Python ever.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also directly connect to the shared document (not a copy), and convert a subset of it in the frontend. For instance, get the content of the last 10 cells.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what I am doing in jupyterlab/jupyter-ai#708 and plan to use this PR for. The question is do we want to return:

  • a frozen read-only model copy
  • a live read-only model
  • a live read-write model

Again, for my use case this is really indifferent. From the point of view of API design I thought that starting with the smallest possible API surface (frozen model copy) and then expanding it in backward-compatible way (adding live updates, adding write access) seems like an easy decision (well maybe that is not fully backward compatible, but it would not be breaking for most use cases).

Currently this PR returns a frozen (but not read-only) copy of the model. I believe I can make it read-only by monkey-patching some methods in the Model instance after creating the copy so that they raise an exception when write is attempted.

What is the best next step/solution in your opinion?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but I don't understand the benefit of having a frozen copy of the model. Memory-wise, it will be bigger than converting the live model to a Python object.
I think get_document() should just return the shared model, not a copy of it. Then a subset of it can be converted to a Python object, which can be considered read-only as it won't affect the original shared model.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I don't understand the benefit of having a frozen copy of the model

There is not any - other than that it is a simple way of ensuring that other extensions do not start writing to the shared model via this new public API.

I think get_document() should just return the shared model, not a copy of it.

I am happy with that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding a parameter get_document(copy=True)? In the future we could also add live=True, which would correspond to the use-case we have in the suggestion system.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fine to me, done in: 3a8ba7d

jupyter_collaboration/app.py Outdated Show resolved Hide resolved
jupyter_collaboration/app.py Show resolved Hide resolved
@krassowski krassowski force-pushed the public-api-for-getting-document branch from 24d90cc to a87e20e Compare April 11, 2024 16:32
@krassowski
Copy link
Member Author

@davidbrochart should we merge and iterate or do you have any further suggestions to address here first?

Copy link
Collaborator

@davidbrochart davidbrochart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge, thanks Mike!

@davidbrochart davidbrochart merged commit cf774a4 into jupyterlab:main Apr 12, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Public API to get a view of the shared document model
2 participants