feat: sink from Vector to Risingwave #21308
Comments
If RisingWave aims to be Postgres-compatible, can't this be done with the postgres sink? I've opened a PR for that: #21248. Also, it seems that your sink encodes the payload as Bytes, so it does not support structured data in arbitrary table schemas, right? Using jsonb_populate_record (https://docs.risingwave.com/docs/current/sql-function-json/#jsonb_populate_record), as I did in the postgres sink, would make it easier to support structured events. Does it make sense to have a completely separate sink if RisingWave seems Postgres-compatible, @jszwedko?
Thank you @jorgehermo9 for the pointer! I agree that almost all the logic in #21248 can be and should be reused for RW. By using the approach in PR #21248, in RW we can do the following:
Unfortunately,
is not supported in RW yet. But we can also support it in RW if this is required by Vector's PostgreSQL sink. Another difference between RW and Postgres is that RW needs a Instead of issuing an explicit |
risingwavelabs/risingwave#18601 We will support |
Hi @lmatz , sorry I couldn't answer earlier.
I could do that in my open PR... But maybe it makes sense to merge my PR without that (so it is not bloated with a lot of changes), and once it has stabilized, I can submit another PR addressing the implicit flush for RW via the sink config. What do you think?
That's very good news! Please note that in my PR I use If you need some help with that, I'm willing to help (on either the Vector or the RW side). Although I'm not familiar with RW code, I have always wanted to contribute to it, and this seems like a good chance. I may take a look these days if I'm bored enough 😄
If it is possible to reuse the PostgreSQL sink that is in-flight, that would be easier for us to maintain (the less code the better 😄). Would there be any downsides to that approach?
I think once the postgres sink stabilizes, we can add a sink config like |
A note for the community
Use Cases
Users can use vector.dev as a source for RW to easily re-ingest/replay events into RisingWave on demand.
RisingWave is a streaming database that tries to be as compatible with PostgreSQL as possible.
We have a working example in RW's repo: https://github.com/risingwavelabs/risingwave/tree/main/integration_tests/vector
Attempted Solutions
No response
Proposal
RW has forked Vector in https://github.com/risingwavelabs/vector.
The sink inserts data into RW with an "insert into" statement: https://github.com/risingwavelabs/vector/blob/f9c186f01b1a84ac402b6657e48d83e7af01b0c4/src/sinks/risingwave/service.rs#L133
It then issues a flush command, a special command in RW that asks RW to commit the data just inserted: https://github.com/risingwavelabs/vector/blob/f9c186f01b1a84ac402b6657e48d83e7af01b0c4/src/sinks/risingwave/service.rs#L145
After flush returns successfully, the data will still be stored in RW after recovery, even if RW fails. We have verified this in RW's customers' production environment.
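For readers who want the flow spelled out, here is a minimal sketch of the insert-then-flush sequence described above. It is not the code from the forked sink (which is written in Rust). The function name `insert_and_flush`, the single `payload` column, and the `%s` placeholder style (as used by psycopg-like drivers) are all assumptions made for illustration. Any object with a DB-API style `execute` method should work.

```python
from typing import Iterable, Protocol


class Cursor(Protocol):
    """The small slice of a DB-API 2.0 cursor that this sketch relies on."""

    def execute(self, sql: str, params: tuple = ()) -> object: ...


def insert_and_flush(cursor: Cursor, table: str, payloads: Iterable[bytes]) -> int:
    """Insert each payload, then ask RW to commit the batch with FLUSH.

    Assumptions (not taken from the issue): the target table has a single
    column named `payload`, and `table` comes from trusted configuration,
    since it is interpolated into the statement as an identifier.
    """
    insert_sql = f"INSERT INTO {table} (payload) VALUES (%s)"
    count = 0
    for payload in payloads:
        cursor.execute(insert_sql, (payload,))
        count += 1
    if count:
        # FLUSH is the RW-specific command mentioned above; once it returns,
        # the inserted rows are expected to survive a failure and recovery.
        cursor.execute("FLUSH")
    return count
```

Issuing one flush per batch rather than per row keeps the number of round trips down. Whether that matches the batching in the forked sink is not stated in this issue.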
References
No response
Version
No response