-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subscriptions #337
Comments
Not at the moment or in the near-term future, but it could be something we'd want to add later on (sorry for a bit of a late response). |
On thanks .. will close then |
i Wonder if the new rust delta lane storage changes this ?? |
Not really; note that the delta-rs migration effectively only changed the storage layer behind DataFusion's If you have some particular scenario/problem in mind feel free to add it here, so that it can serve as a guideline if/when we come back to this issue (it may also be that we think of another solution). |
Well I am thinking about how a users does a mutation into data fusion. One way is use CDC off the original system like postresql or google sheets. So the mutations goes into that system. Then we have the reactive data scenarios. If you have a CDC then you can see the changes of the master data source like postresql or google sheets, so then you can get the the middle and correlate what data changes against what your uses are currently looking out that they did a query for a minute again , and so send them a change stream. This only works though if you can do that correlation. do you get the concept ? i already do cdc on all this stuff . also you guys are not using Flight SQL yet ? Cause this helps with doing the correlation . |
I'm probably missing something here, but if the underlying issue is simply about making sure the users see the latest data note that this happens automatically once a change has been made. This touches on
in Seafowl we support a couple of ways of changing data (all of them batch-oriented):
As soon as some data is written to a table, this results in a new table version. This annuls the cached responses, and all subsequent queries always target the latest version by default. You can also target previous table versions using the time travel syntax (as well as look at e.g. the diff between successive table versions).
More likely, maybe you're referring to enabling streaming replication from master data sources using CDC files in a particular format, as a new approach to changing/writing data in addition to the ones above?
No, not currently. |
Thanks @gruuya for the in-depth links. It’s me that’s missing something :) I did not realise these aspects were done by seafowl for mutations. I am still getting familiar with data fusion aspects. about Streaming changes via CDC. I currently do this already in other projects using nats message broker. I really want to adopt DataFusion and so am brain storming it with you. The more I think about it the more I see it as being non intrusive. The cdc system just needs a way to talk to data fusion system and transform the data in. I can track what user is “ watching what query “ out of bounds , and so then just rerun the query on data fusion and send the deltas to the user. I wonder if there is a way to track the version of a query and the data as a hash ? So then because data fusion versions data I can iWork out the diff a d send that to the user. So then I have subscriptions out of bounds of the data fusion system. If you want to have a chat / video about it I would really like to explore it .. |
Am reopening . Hoping to discuss this .. |
Is this available at the API / http level ? Cause then I can do subscriptions and only send what changed. I do all this by wrapping seafowl api with NATS message broker and so know what users are doing what. It also gives my global load balancing and auth / authz. |
Aha no worries, I think we're (getting) on the same page.
You could look at the ETag of the response, as this is currently based on the which specific data files are used in a query. Note that this doesn't involve changes to the query plan, hence issue #57. (There's also the
Oh sure, I'd love to see if we can converge here—how about discord for starters: https://discord.gg/eFEFRKm?
Yup, it works as any other query (check out the prior-to-last section of this blog post for an example); we just have the awkward syntax since the |
I'm also interested in this topic. I'm grateful to be trying Seafowl for log storage. As new data is constantly coming in, the table version keeps increasing, which causes the cache to be invalidated continuously. As a workaround strategy, I'm creating tables by date or using similar approaches. However, it would be great if we could achieve the same effect within a single table. I'm not quite sure how to do this yet! |
Is there a way to subscribe to a query and only get changed / new / deleted results ?
The text was updated successfully, but these errors were encountered: