Goal: real-time Indexes and Primitives #199
@brennanjl do you have any ideas on how we can achieve this? Did you guys spin-off the doc by any chance?
Hi @zolotokrylin, in
I suggest utilizing the RPC or HTTP endpoints in our Python repos. So, after there is an insert to the Truflation Database, the Python repos also hit the RPC or HTTP endpoints of
@zolotokrylin it seems like there is no description for this goal. Could you provide more context?
I have some ideas. I'll share them soon.
@brennanjl - This refers to the goal of enabling Kwil data to be served in real-time, rather than in batch. A rudimentary approach is for RPC Nodes to facilitate WebSocket connections to clients where specific queries are passed as parameters. The connected Node will poll the query locally on each new block for the duration of the WebSocket connection, and serve the latest data over the connection. An alternative approach for near-real-time ticks is to:
If mempool data is indeed inconsistently distributed across Nodes, please let me know... Otherwise, I assume each data point is broadcasted, and that mempool is the state between received data, and consensus data. Furthermore, if Temp Tables are too persistent, database schema relevant to active WebSocket connections can be replicated in real-time to something like https://github.com/electric-sql/pglite, where relevant mempool data can reside for serving data until it is flushed after connection is closed + cooldown period.
Near real-time feeds must be optimistic to be fast. A security vulnerability: MEV - Bad actors can participate in the Network to serve bad data specifically for DeFi protocols powered by Truflation, where tampered data can yield a positive value extraction that outweighs the slash/loss as a Kwil Operator. Side note: All of these suggestions can be incrementally incorporated. PS. Happy to workshop this over a call.
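The rudimentary approach described above (the connected Node polls the query locally on each new block and serves the latest data over the WebSocket connection) can be sketched roughly as follows. This is a minimal asyncio sketch, not Kwil's actual API: `run_query`, `new_blocks`, and the `send` callback are hypothetical stand-ins for the node's query engine, block-commit events, and the WebSocket send.

```python
import asyncio

# Hypothetical stand-in: a real node would execute the client's query
# against its local state at the given block height.
async def run_query(query: str, height: int) -> dict:
    return {"query": query, "height": height, "value": 100 + height}

# Hypothetical stand-in: a real node would be driven by block-commit
# events; here we simulate three blocks.
async def new_blocks():
    for height in range(1, 4):
        await asyncio.sleep(0)  # yield control, as awaiting a commit would
        yield height

async def serve_connection(query: str, send):
    """Poll the query on each new block for the duration of the
    connection, pushing the latest result to the client."""
    async for height in new_blocks():
        result = await run_query(query, height)
        await send(result)

async def main():
    received = []

    async def send(msg):  # stand-in for websocket.send(json.dumps(msg))
        received.append(msg)

    await serve_connection("SELECT * FROM index_values", send)
    return received

results = asyncio.run(main())
print([r["height"] for r in results])  # [1, 2, 3]
```

In a real deployment the loop would terminate when the client disconnects, and the per-connection query would be validated before being polled.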
A very good question @zolotokrylin for this topic is: What's the business case? High-frequency ticks look good, but DeFi protocols are generally restricted regardless of the transaction speed of the destination blockchain. Furthermore, Oracles are generally pull-based. If this is for Web2 users... then MEV isn't an issue (but should be publicly documented) and optimistic data delivery works.
Similar to temp tables, the best bet here would be to use Postgres's materialized views. There's a really cool project that uses these to essentially create two states of a database: https://github.com/xataio/pgroll. They do it for migration purposes, but the concept is the same. This would require a fairly deep protocol-level change to Kwil, but nothing mind-blowing. We recently re-did how mempool gets applied, and I totally see how we could fit unconfirmed data into it. Agreed on the questions regarding business use case; I know @itscameronlee was mostly concerned with latency for changes to queries resulting from already settled data (this was what Pyth talked about in their whitepaper). This is already handled in the current system, but if there is a need for handling mempool data as well, it's certainly doable.
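The "two states of a database" idea can be illustrated with a view that superimposes the freshest unconfirmed row per stream over confirmed data. This sketch uses SQLite purely for self-containment (Postgres materialized views would play this role in practice), all table and column names are illustrative, and it relies on SQLite's documented bare-column behavior with a single MAX() aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Consensus (settled) state and mempool (unconfirmed) state, kept apart.
CREATE TABLE confirmed   (stream TEXT, height INTEGER, value REAL);
CREATE TABLE unconfirmed (stream TEXT, height INTEGER, value REAL);

-- A view presenting the highest-height row per stream across both states.
-- SQLite guarantees bare columns (value) come from the row that achieved
-- MAX(height) when a single min/max aggregate is used.
CREATE VIEW latest AS
SELECT stream, MAX(height) AS height, value FROM (
    SELECT stream, height, value FROM confirmed
    UNION ALL
    SELECT stream, height, value FROM unconfirmed
) GROUP BY stream;
""")

conn.execute("INSERT INTO confirmed   VALUES ('eth_index', 10, 101.5)")
conn.execute("INSERT INTO unconfirmed VALUES ('eth_index', 11, 102.3)")

row = conn.execute(
    "SELECT height, value FROM latest WHERE stream = 'eth_index'"
).fetchone()
print(row)  # (11, 102.3) -- the unconfirmed row wins while it is newer
```

When a block settles, rows would migrate from `unconfirmed` to `confirmed` (or be dropped if rejected), which is the part that would require the protocol-level change discussed above.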
@rsoury @brennanjl @itscameronlee @MicBun
I was told that you guys already discussed this issue with @itscameronlee. I have attached a document with the notes from this thread. Please add your comments directly in the document (same as with the previous EPIC/Goal).
I ported this question into the document. Let's continue there.
@zolotokrylin - Please give me Editor access to the Doc 🙏
@rsoury done
Ok @zolotokrylin and @brennanjl, @itscameronlee has provided an update. This means we can dismiss security concerns regarding MEV, fraud proofs, etc. @brennanjl - Is this a Kwil-centric issue you want to lead - serving mempool data over socket connections? What's your take?
Very interesting, thank you for sharing. I definitely need to get deeper into what it would take to get full-mempool functionality working, but I am confident that this should be done on the core Kwil protocol. Especially with the way weighting and stream composability are handled in TSN, I cannot imagine a solution that is able to achieve this external to the core Kwil codebase that is not 100x more complex. Is this the current top priority (cc @zolotokrylin)? I can make it a top priority to dive deeper and identify what I believe to be the optimal solution that is quickest to implement; however, I have other threads in parallel on finishing procedures, phase 2 of the indexer, and database drop restrictions. I can elevate this to be a top priority if needed, but want to make sure priorities are set properly.
@brennanjl what's the ETA for each of the "in-process threads" and completion %, and what's the rough estimation for this real-time goal (cost-wise, how long it might take)?
Apologies. ETA for procedures: Official beta will be cut today. There's likely an extra 3 days of work for bug fixes from the beta, as well as documentation improvements, which will need to take place over the next 2 weeks. ETA for phase 2 of indexer: I think this is a few days of work. One of our engineers is freeing up from another project early next week, and will be able to tackle it. ETA for db drop restrictions: unsure yet, we still need to figure out how to actually make this work. ETA for mempool: I'm not sure how long the mempool would take, but I think I should get on a call with Ryan (and ideally Cameron, and optionally yourself as well) to better understand the business case before properly recommending how we can get this done.
@zolotokrylin - Let's organise a call between us and Brennan to determine how to parse out this issue into problems relevant to Kwil and Truflation respectively.
@rsoury sure. What are the time slot options?
@zolotokrylin - Whenever.
Thank you guys for a call @brennanjl @rsoury @itscameronlee 👍
Meeting Notes:
I've spent a large chunk of the day digging into this, and have a few things. I've looked at how this could be done in both procedures and actions.

Procedures

(You can skip reading this section if you want. I merely provided it to explain what I looked at, and why I ultimately came to the solution I did.) It will be quite hard, if even possible, to build real-time data into procedures. The reason we cannot build real-time data directly into procedures is that procedures run 100% within Postgres. The only way to add real-time data and have it be 100% compatible (without breaking consensus) would be to rely on Postgres's transaction isolation to add real-time data to queries on the reader connection, while not adding it on the writer connection. Not only would this be an insane hack, but it would also not really fulfill the requirement of being "real-time": Postgres will write the temporary data to disk before reading it. Most operating systems don't try to make disk I/O real-time, as they don't schedule I/O to give priority to tasks with tighter real-time constraints. Furthermore, given the frequency at which we discussed real-time data might be updated (relative to candles), writing to disk would almost certainly not be a good idea. The only way (that I can think of) to handle in-memory data, in a table format, in Postgres is using temp tables. This, however, would also be sub-par, since it would still not be able to support the scale you need, and it would also require serious hacking around Kuneiform to get working. Unfortunately, Postgres just really isn't made for this. It would be insanely hard to implement, and likely would not be as fast as traders would wish. Therefore, procedures seem out of the question for handling real-time data.

Actions

We can use Kwil's extensions to superimpose real-time data on query results. While there would be some challenges with this, it should overall be quite performant and scalable.
This would function by having an extension that schemas can import to provide real-time data. The extension itself would maintain a cache of the child streams and the weights, based on the stored schema taxonomies, to query child schemas. This would function similarly to how the current stream composability is performed within actions. The primary difference is that the extension would also keep a thread-safe map of the most recent "real-time" value for any schema. While retrieving data, it would see if it had a more recent "unconfirmed" value for a stream, and if so, return it instead. Data could be written externally to the map by some other process on the TSN node (it would be up to the Truflation team to decide how users would write data to this). I am putting together a basic example that should display what I mean. Your team should be able to build on this with the logic you already have to implement real-time functionality. I'll open a PR with the example shortly.
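The thread-safe map described above can be sketched in a few lines. This is an illustrative Python sketch of the concept, not the actual Kwil extension (which would be written in the node's language); the class and method names are hypothetical.

```python
import threading

class RealtimeCache:
    """Thread-safe map of the most recent unconfirmed value per stream,
    consulted at read time to superimpose real-time data on query results."""

    def __init__(self):
        self._lock = threading.Lock()
        self._unconfirmed = {}  # stream -> (height, value)

    def put_unconfirmed(self, stream, height, value):
        """Record an unconfirmed value, keeping only the most recent."""
        with self._lock:
            cur = self._unconfirmed.get(stream)
            if cur is None or height > cur[0]:
                self._unconfirmed[stream] = (height, value)

    def resolve(self, stream, confirmed_height, confirmed_value):
        """Return the unconfirmed value if it is newer than the confirmed
        data, otherwise fall back to the confirmed value."""
        with self._lock:
            cur = self._unconfirmed.get(stream)
        if cur is not None and cur[0] > confirmed_height:
            return cur[1]
        return confirmed_value

cache = RealtimeCache()
cache.put_unconfirmed("cpi_stream", 101, 3.21)
print(cache.resolve("cpi_stream", 100, 3.18))  # 3.21 (unconfirmed is newer)
print(cache.resolve("cpi_stream", 102, 3.25))  # 3.25 (confirmed caught up)
```

Entries would also need eviction once the corresponding data reaches consensus, which is omitted here for brevity.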
@zolotokrylin - I've started the Spec outlining the base architecture that we will take to deliver on this requirement. @outerlook - Once Vadim confirms he's OK with the Spec, we'll need you to create the interface spec and map the architecture in a visual diagram, as we did with #137
@rsoury thank you. I will review the document and revert back to you.
@rsoury I used your calendar to book a call. @markholdex, you were invited as well; please get acquainted with the doc, and let's settle the requirements and pass this to development.
@rsoury did you discuss the current Spec with @brennanjl, and are you guys aligned?
I believe so; however, I'll ensure Brennan has a review before we settle.
Thank you.
We are meeting early next week to discuss; I will re-review this weekend.
A question on requirements:
@rsoury, I have replied in the Spec document.
@zolotokrylin - new comment questioning a requirement: https://docs.google.com/document/d/1Z9tJJ5ctLU7REGCh8gBchKBo-V3ADra7k497MZQ17W4/edit?disco=AAABOaBsmFI |
@brennanjl I have replied.
Update

@brennanjl and I are considering an architecture that separates the "real-time service" from the TSN. The services will be distinguished as follows:
By centralising the infrastructure for the real-time service, we aim to achieve near real-time performance while minimising complexity for Node Operators. Note on terminology: the term "candles" might be misleading. In traditional candlestick charts, a candle represents various underlying values, whereas our "candles" indicate a single value: the median aggregation. We will likely adopt a more appropriate naming convention for "candles", which will be detailed in a second iteration of the Spec.

Current Status

Brennan and I are evaluating how to process data external to TSN using Kuneiform's PL/pgSQL. Our options are as follows:
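To make the terminology note concrete: a "candle" here collapses each tick window to exactly one number, the median of the values reported in that window. A minimal sketch (window values are illustrative):

```python
from statistics import median

def candle_value(ticks):
    """Collapse a window of tick values to a single 'candle': the median.
    Unlike a traditional OHLC candle, only this one value is kept."""
    return median(ticks)

window = [101.2, 99.8, 100.5, 102.0, 100.1]
print(candle_value(window))  # 100.5
```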
@outerlook - What's your take on the options? Do you have any input?
After a discussion with @outerlook - he sparked an idea. @brennanjl - Can we just repurpose the Testing Framework, which computes seed data over a Kwil Procedure in an isolated and centralised manner, for delivery of real-time composed streams? ie. Seed data = Real-time "ticks" + Consensus data
Unfortunately, not that simple. The testing framework has the following flow: "read in test information" -> "parse schemas, generate plpgsql" -> "deploy plpgsql" -> "run test cases" We essentially need to be able to include real-time data functionality (custom TSN functionality) in the plpgsql code. In our current release cycle, we are changing Kuneiform in a way that might make this possible. Currently, we are switching Kuneiform from generating plpgsql to running an interpreter. We were planning on having this be closed off since it is under heavy development; however, we could add in the ability to create extensions within the interpreter. You could then run the interpreter with special logic to use in-memory data in place of SQL queries. Still a very rough idea because we have hardly begun work on the interpreter, but it is the only feasible solution I can think of.
@brennanjl - If there's an expected timeline, we could spec based on this and coordinate with @zolotokrylin to confirm next steps.
Again, my only other solution was a Kwil Node with Redirect Rules on Primitive Streams (Tables) to pull from a view that has real-time data superimposed.
We do not have an expected timeline yet. I'm happy to try to shift around internal priorities to make it happen faster, though. I can make it the next thing I focus on; currently we are focused on an improved query planner for DoS protection.
I will keep this on the table as an option, since it very well could be what we have to end up doing. My hesitancy here is that Truflation would then need to compile the realtime extension to a Postgres extension, which significantly raises the complexity of running a TSN node and onboarding node operators.
Agreed. However, Truflation would be the only party performing this additional bootstrapping. @zolotokrylin - What's your take, and what's our deadline on this?
@rsoury, our deadline with this Goal (#199) very much depends on the completion dates of these sequential Goals.
And probably these:
While we are working on and discussing the Specs of #199, I suggest we don't speculate on the deadline of this Goal (#199), as it seems relatively far away, and any prediction of the deadline here will mislead us. Focus 100% on the goals listed above based on their sequence in the list, and if there is free time and resources, we continue with this Goal. I.e., if someone no longer participates in any of the Goals above, please engage in these goals to complete the Specs so that we can understand:
Please get to the first Specs draft document you can confidently present so we can discuss where and how this Goal fits the roadmap. @rsoury, if I need to answer any Specs-related questions, please direct them to me inside the Spec document. You are the chief editor, so feel free to resolve any comments, modify copy, and manage the Spec the way you think is best.
@zolotokrylin - Nice, got it. There is a missing goal here, i.e., TSN Governance to be involved in managing System Streams. @outerlook - Maybe draft this goal up when you have a chance. @brennanjl - I'll leave this in your hands then, as you're best suited to determine whether we should modify Kwil at the data access layer, or whether we should adopt a Kuneiform interpreter.
Spec