-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve stores #86
base: main
Are you sure you want to change the base?
Improve stores #86
Conversation
Can you explain why this is needed? |
We need more control over the documents that are stored. I don't think the stores should be making decisions about what to do with the document, the room or the server should make those decisions. |
I may be missing something, but I don't understand why stores shouldn't be independent. Actually, I think they should even live in a separate package. That way we could use them for other transport layers than WebSockets.
The store is created from a document, so an existing store ensures the document exists.
We could add a method to the current YStores, to remove all updates of a document.
That should be at the WebSocket server IMO, not the stores.
I don't see why it's not currently possible. |
That kind of feature is useful for maintenance or debug tooling. It makes sense to provide an API to introspect stored documents outside of the collaboration context. I'm trying to compare the two approaches to get a balance of pros and cons; here is a starting point: Manager of YStores Current code encapsulated in a store manager. class StoreManager(Mapping[str, YStore]):
def __init__(self, store_factory):
self._factory = store_factory
def list(self):
return self.keys()
def __get(self, path):
# Create store if it does not exist
# start get called here I guess?
return store;
def delete(self, path):
# Destroy/stop the store ?
self.__get(path).delete()
def write(self, path, data):
self.__get(path).write(data)
def read(self):
return self.__get(path).read(data)
class BaseYStore:
@abstractmethod
def write(self, data):
pass
@abstractmethod
def read(self):
pass
# Option for start and stop who should be responsible for this?
# Create storage manager
def factory(...):
return MyStore(...)
manager = StoreManager(store_factory=factory) To avoid a risk of inconsistency the document store should not be accessed directly as otherwise the caller can temper with the lifecycle (start and stop) of each document store. And it is harder to control that lifecycle as caller can keep reference of a store they should not. Question: Are YStore stateful? If not - what I think it should -, what is the advantage of keeping in memory a YStore per document that is a stateless actor to carry out IO operations? YStoreManager This proposal class BaseStoreManager:
async def list(self):
return self.keys()
async def delete(self, path):
# Destroy/stop the store
@abstractmethod
async def write(self, path, data):
pass
@abstractmethod
async def read(self, path):
pass
# Start and stop could be handled by the
# object initiating the store manager or
# within the object.
# Create storage manager
manager = MyStoreManager(...) One advantage I see with this proposal is a simpler API. |
The advantage I see with a "Manager of YStores" is that it can manage heterogeneous YStores (for instance a mix of |
This constraint leads to duplicating the Store API in the StoreManager, therefore I agree that it complicates the design. Unless we need to use different stores in the same StoreManager, or we relax this constraint, the second solution looks better (the Store hierarchy class becomes an implementation detail for the end user, so let's keep it simple). |
You could easily achieve it with a multiplexer manager as done for example by some people for the jupyter server content manager. |
I agree that it's better to do simple things easily, and more complicated things with more effort, so let's go with the "YStoreManager" solution. |
5a1adb7
to
6f84e6f
Compare
Thanks, I'll take a closer look soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hbcarlos
I have one question. Otherwise code looks good for me.
path = tmp_path / "tmp" | ||
store = FileYStore(str(path)) | ||
await store.start() | ||
await store.initialize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what is the difference between start
and initialize
?
From what I can see, start
now does nothing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initialize
is used to create or ensure all the resources needed are available before using the store. I moved it out of start
because the entity that calls it should be the one deciding whether to call it and forget about it or wait until it finishes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that the new initialize
is the old start
. What is start
used for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To create the task group because I saw that some other classes are adding tasks there.
@@ -120,9 +120,6 @@ def on_message(self, value: Callable[[bytes], Awaitable[bool] | bool] | None): | |||
self._on_message = value | |||
|
|||
async def _broadcast_updates(self): | |||
if self.ystore is not None and not self.ystore.started.is_set(): | |||
self._task_group.start_soon(self.ystore.start) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain how the store is started?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One store contains multiple documents now, so it is no longer the responsibility of the room to start the store.
The store should be initialized by the same entity that instantiates it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, and in ypy-websocket
where is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not in ypy-websocket
. Where is the store instantiated in ypy-websocket
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The store is instantiated outside, but I think it should be the responsibility of the WebSocket server to start and stop the store, since the store lifetime is tied to the server's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other entities can access the store by starting and stopping it as they wish
Yes, but starting and stopping don't give you fine-grain control over what to start/stop.
We are designing a store that multiple rooms are going to access at the same time I can not just cancel everything I will be cancelling tasks from other rooms!
But it is better for users of ypy-websocket to not care about starting and stopping the store
No is not, less control is never better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put aside the question of who has the responsibility to start/stop the store for now.
My point is that you basically reverted the use of AnyIO in the way a store is started. A store should create a root task group in which every other tasks are run. Cancelling the root task group cancels all the tasks that were launched in it. It's the whole point of using AnyIO. It ensures that no task is running when the store is stopped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A store should create a root task group in which every other tasks are run. Cancelling the root task group cancels all the tasks that were launched in it. It's the whole point of using AnyIO. It ensures that no task is running when the store is stopped
No, it is not. The point of AnyIO is that every task should have a parent (a task group that handles its life cycle), but It doesn't have to be the same parent for every task. At the same time, the parent doesn't have to be in the class that implements the logic of a task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is the decision I made for ypy-websocket in general, and stores in particular. I want a store to be self-contained, as far as the tasks it launches. No task should escape from it. See #86 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with that decision, the store shouldn't be self-contained because the store is not the one launching those tasks.
ypy_websocket/stores/sqlite_store.py
Outdated
self._task_group = create_task_group() | ||
self.started.set() | ||
self._starting = False | ||
task_status.started() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You removed the starting logic that was launching the initialization task in a task group. This start
method now seems useless. It seems that you moved the logic to the initialize
method, but without the benefit of launching it in a task group, which was the whole point of using AnyIO.
Correct me if I'm wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the initialize
because the entity that calls it should decide to wait until it is done or forget about it.
Looking at the code now, I don't think we should use AnyIO in the stores. We should use it in the rooms or the server to organize the different tasks, but not here.
For example, if a room is writing to the store while also reading, then the room should have a task group called write_tasks
, and before cleaning that room, we should wait for that task group to finish, but we can cancel the reading task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we can cancel the reading task.
You can cancel tasks with AnyIO.
The goal of the start
method was to launch an initialization task in the background, but read/write operations must wait until initialization is done. This way starting the WebSocket server can be quick, if no access to the store is done yet. You should restore this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can accomplish the same behavior from outside and have more control over the different tasks running.
Creating a task group and adding every task there. This is not a good practice. We should differentiate between tasks., and the lifecycle of a task should be handled by the entity that launches the task not by the store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating a task group and adding every task there is not a good practice.
I disagree, it's the whole point of structured concurrency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are misunderstanding the idea of structured concurrency.
Where does it say every task should be in the same group? Where does it say every task should have the same direct parent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In structured concurrency, you have to launch a task in a task group. You can of course nest task groups, but cancelling the root task group cancels all the (sub)tasks.
In ypy-websocket, I made the choice for stores to have a root task group that makes it easy to start and stop them. All tasks a store creates are contained in it, they cannot leak outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All tasks a store creates are contained in it, they cannot leak outside.
The problem is that the store doesn't create those tasks. It is not the store that calls self.write()
or self.read()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes the stores to have one instance handling multiple documents instead of instantiating one store per document.