
Introduce redis storage which syncs between multiple instances #4074

Draft · wants to merge 2 commits into main
Conversation

@rodja (Member) commented Dec 8, 2024

This PR tries to solve #1606 by implementing a Redis variant of the persistent dictionaries currently used for app.storage. This PR will not solve every remote-syncing problem out there. It is meant as a transparent (i.e. non-API-breaking) drop-in replacement for the existing local storage mechanism.

NOTE: this is in very early development.

ToDos

  • make the Redis server URL configurable via ui.run and ui.run_with; if none is given, we fall back to the local storage solution
  • introduce an optional redis dependency in pyproject.toml
  • make the db prefix configurable?
  • think about an elegant way to avoid syncing the whole data set (and hence decide whether we can do this in the scope of this PR)
  • make it possible to use app.storage.user with Redis
  • make it possible to use app.storage.tab with Redis
  • documentation

@rodja added the enhancement (New feature or request) label on Dec 8, 2024
@rodja (Member, Author) commented Dec 8, 2024

I started experimenting and found a simple testing setup:

  1. start a Redis server locally with docker run -d --name redis -p 6379:6379 redis
  2. use the code in this PR and start two NiceGUI servers (with different ports):
    import sys

    from nicegui import ui, app

    @ui.page('/')
    def index():
        ui.input().bind_value(app.storage.general, 'text')

    ui.run(port=int(sys.argv[1]))
  3. if you type something into one input, it also appears in the input of the other instance

@Alyxion (Contributor) commented Dec 8, 2024

Very interesting feature. We are currently syncing manually with Redis, using the browser tab UID as part of the key; if this could be automated at some point, that would of course be great.

In a real application, changes to such "shared" variables are likely not as inconsequential as syncing two edit fields; they might need to trigger a programmatic reaction when some data changes. Is this integrated already?

@rodja (Member, Author) commented Dec 8, 2024

@Alyxion app.storage uses ObservableDict and hence can have code which executes when something changes. To make this available between instances, we would need to publish only the data which changed instead of the full dict. This is not easy because Redis only allows a flat key/value hierarchy. Another way would be to always publish the whole dict but, on the subscriber side, walk the whole dict and trigger events only for keys which have changed.
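
For illustration, the subscriber-side diff walk could look roughly like this (a minimal sketch with hypothetical names, not code from this PR):

    import json

    def apply_remote_snapshot(local: dict, payload: str, on_change) -> None:
        # merge a full-dict snapshot received via Redis pub/sub and fire
        # on_change only for keys whose values actually differ
        remote = json.loads(payload)
        for key in set(local) - set(remote):  # keys removed on the other instance
            del local[key]
            on_change(key, None)
        for key, value in remote.items():
            if local.get(key) != value:
                local[key] = value
                on_change(key, value)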

@Alyxion (Contributor) commented Dec 9, 2024

Thanks, I just found the time to do a full code review; really good job so far.

The Redis server URL cannot be configured yet, nor can the prefix, but I assume that's because it's still work in progress.

Conceptually though, my 2 cents:

  • The RedisDict always syncs the whole dict in one go. I think this will massively limit the real-world scenarios for this feature beyond some visitor or traffic counters or hello-world samples for the NiceGUI documentation.

    Imagine, for example, you have one big shared setting, 500 kB in size, which shall be synced via the Redis storage. And then you have e.g. a call meter, a token counter for an LLM or whatever dynamic kind of live data. Whenever you now just want to change the call meter, maybe a thousand times per hour, the 500 kB gets distributed over and over again as well, and the big JSON needs to be parsed, burning performance. In other words: I definitely think at least a minimal approach, e.g. different "groups" of data (changing three times a day vs. maybe 10 times a second), should be considered.

  • If I interpreted the code right, the lazy background task for publishing to Redis will trigger "without remorse" for every minor change. This could easily pile up to a bombardment of broadcast publishes and resulting Redis gets. Just imagine 1000 concurrent users, each triggering a change just once every 5 seconds; then each server would sync 200 times per second.
    Basically the same as the previous point: I think there are variables which need an ASAP sync and others which would be totally fine being synced just once every 10 seconds (see the sketch after this list).

  • Potential changes in the format of the data, e.g. version changes, are not considered yet. This could for example be handled via a configurable prefix.

  • One remark, independent of this feature but about NiceGUI's storage mechanisms in general: most modern Python solutions, from FastAPI itself via OpenAI, HuggingFace etc., nowadays use Pydantic to define data models, document them, and ensure type safety when parsing them and transporting them between different languages and/or systems.

    This is, in my opinion, also the way to go and thus mandatory for all projects at our company; accessing dictionaries by string constants is definitely a relic of the past. Unfortunately, this new feature does not embrace "modern Python" either, even though it would benefit a lot from it. For example, if the user provided a Pydantic model for the Redis storage, you could easily create an MD5 hash of the schema to ensure compatibility between what is stored in Redis and what the new software actually expects, e.g. by putting it into the prefix or by storing it in the data itself. Pydantic's Field metadata could also be used to solve the points above, e.g. "how often shall this sub-object be synchronized", such as:

    from pydantic import BaseModel, Field

    class MyRedisData(BaseModel):
        # the sub-models and the sync_interval metadata are illustrative only
        my_dynamic_data: MyNiceGuiAppsDynamicData = Field(..., description="Very often changing data...", sync_interval_s=5.0)
        my_large_data: MyNiceGuisLazyUpdatingData = Field(..., sync_interval=60)

    And yes, of course you can somehow "hack" Pydantic models into the current ObservableDicts by always dumping them completely and always parsing them completely, but it's not very elegant.
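
Regarding the publish throttling mentioned above, a coalescing publisher could look roughly like this (a minimal sketch with a hypothetical publish coroutine, not part of this PR):

    import asyncio

    class CoalescingPublisher:

        def __init__(self, publish, interval: float = 1.0) -> None:
            self._publish = publish  # async callable that pushes the whole dict to Redis
            self._interval = interval
            self._dirty = asyncio.Event()

        def mark_dirty(self) -> None:
            self._dirty.set()  # call this from the dict's on_change handler

        async def run(self) -> None:
            while True:
                await self._dirty.wait()
                self._dirty.clear()
                await self._publish()  # many rapid changes collapse into one publish
                await asyncio.sleep(self._interval)  # rate limit: one publish per interval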

@rodja (Member, Author) commented Dec 9, 2024

Thanks for these excellent observations. You are right, there are still a lot of things to be done. I've started a checklist in the PR description to clarify what is missing to go from draft to review.

The RedisDict always syncs the whole dict in one go

Yes. After further investigation, I believe it's beyond the scope of this pull request to handle it differently. The underlying ObservableDict simply fires an on_change event without indicating what has changed. We would either need to add this information or override all dict modification operators (pop, clear, update, ...) in the Redis persistent dict.
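
For illustration, overriding the modification operators could look roughly like this (a minimal sketch, not the implementation in this PR):

    class KeyTrackingDict(dict):
        # a dict that reports *which* key changed, so only deltas would need publishing

        def __init__(self, on_change, *args, **kwargs) -> None:
            super().__init__(*args, **kwargs)
            self._on_change = on_change

        def __setitem__(self, key, value) -> None:
            super().__setitem__(key, value)
            self._on_change(key, value)

        def __delitem__(self, key) -> None:
            super().__delitem__(key)
            self._on_change(key, None)

        def update(self, *args, **kwargs) -> None:
            for key, value in dict(*args, **kwargs).items():
                self[key] = value  # route through __setitem__ so each key is reported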

If I interpreted the code right, the lazy background task for publishing to Redis will trigger "without remorse" for every minor change.

Yes, this is by design to keep things simple. See below.

Potential changes in the format of the data, e.g. version changes, are not considered yet.

This is also true for the existing local persistence. app.storage is just a storage mechanism and does not care about migrations; these should be done in user code.

Most modern Python solutions, from FastAPI itself via OpenAI, HuggingFace etc., nowadays use Pydantic to define data models

You are right. But Pydantic is quite slow compared to plain json etc. We used to have Pydantic everywhere a few years back and had to throw it out the window because of performance considerations in our robotics applications. Pydantic is great when you receive external data where you cannot be sure about the structure, types etc. But I don't think this is the case when using app.storage.

As a closing comment, I want to clarify a point which I'll also put in the PR description because it is so fundamental:

This PR will not solve every remote-syncing problem out there. It is meant as a transparent (i.e. non-API-breaking) drop-in replacement for the existing local storage mechanism.

Your points are all valid, but some are just out of scope of what we want to provide here. In the future we may provide a more flexible storage API. See #1606 (reply in thread)

@Alyxion (Contributor) commented Dec 9, 2024

Thank you for the thorough explanation.

I personally wasn't aware of the performance implications of Pydantic, but I think that's because we are thinking here in dozens of messages exchanged between our servers per second rather than thousands per second on an edge device such as might be sent to your Feldfreund. Good to know though; I will keep an eye on it.

Regarding your closing comment: I'm totally aware of that. Though I still think that not being able to sync the storage "smarter" than always dumping and parsing the whole thing, and very quickly having 95% garbage and just 5% relevant update data, is suboptimal :).

@rodja (Member, Author) commented Dec 10, 2024

I think there are quite a few use cases which would work fine with a full dictionary update. Our existing local storage also does it that way, because app.storage was not meant to be a database but rather a quick way to save some settings. We can add smarter and more powerful storage options later on; these could be built with RedisJSON, etcd or even MongoDB. But for this PR I would like to keep things simple to allow easy setup and quickly have something to work with until more sophisticated solutions are implemented.
