
Goal: real-time Indexes and Primitives #199

Open
zolotokrylin opened this issue May 7, 2024 · 58 comments

Comments

@zolotokrylin
Contributor

zolotokrylin commented May 7, 2024

Spec

@zolotokrylin
Contributor Author

@brennanjl do you have any ideas on how we can achieve this? Did you guys spin-off the doc by any chance?

@MicBun
Contributor

MicBun commented May 7, 2024

Hi @zolotokrylin, in tsn-data-provider I provided two ways to trigger the fetch-and-push mechanism.
Fetch here means getting the records from the Truflation Database, and Push means pushing those records to tsn.

  1. The first way is a cron job that runs daily at 00:00, based on our server's time zone.
  2. The second way is via RPC or HTTP endpoints. The proto file can be seen here: Link

I suggest utilizing the RPC or HTTP endpoints in our Python repos: after an insert into the Truflation Database, the Python repos also hit the RPC or HTTP endpoints of tsn-data-provider to trigger the Fetch-and-Push mechanism.
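The trigger described above can be sketched as a small HTTP call fired right after the database insert. The endpoint path (`/v1/fetch-and-push`) and payload are illustrative assumptions, not the actual tsn-data-provider interface; a local stand-in server makes the sketch runnable end to end:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def notify_data_provider(base_url: str, stream_id: str) -> int:
    """POST to a (hypothetical) tsn-data-provider endpoint to trigger the
    Fetch-and-Push cycle immediately after a Truflation Database insert,
    instead of waiting for the daily cron."""
    req = urllib.request.Request(
        base_url + "/v1/fetch-and-push",
        data=json.dumps({"stream_id": stream_id}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Stand-in for tsn-data-provider, used only to make the demo self-contained.
class _Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), _Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

status = notify_data_provider(f"http://127.0.0.1:{server.server_port}", "truflation_cpi")
print(status)  # 200 — the provider would now fetch and push to tsn
server.shutdown()
```

In practice the insert hook in the Python repos would call `notify_data_provider` with the real provider URL.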

@brennanjl
Collaborator

@zolotokrylin it seems like there is not a description for this goal. Could you provide more context?

@rsoury

rsoury commented May 8, 2024

I have some ideas. I'll share them soon.

@rsoury

rsoury commented May 8, 2024

@brennanjl - This refers to the goal of enabling Kwil data to be served in real-time, rather than in batch.
i.e. the current Pull model versus a Push model.

A rudimentary approach is for RPC Nodes to facilitate WebSocket connections to clients where specific queries are passed as parameters. The connected Node will poll the query locally on each new block for the duration of the WebSocket connection, and serve the latest data over the connection.
However, this is restricted to ticks based on block time.
Each query should range between the last block time and the current block time.
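The poll-per-block idea can be sketched independently of any WebSocket library: for each new block, run the client's query over the window (last_block_time, current_block_time] and push only what is fresh over the connection. The block and record shapes below are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass
class Record:
    block_time: int
    value: float

def poll_per_block(
    blocks: Iterable[int],
    query: Callable[[int, int], List[Record]],
) -> List[Tuple[int, List[Record]]]:
    """On each new block, evaluate the client's query over the window
    (last_block_time, current_block_time] and collect what would be
    served over the open WebSocket connection."""
    pushed = []
    last = 0
    for now in blocks:
        fresh = query(last, now)
        if fresh:
            pushed.append((now, fresh))
        last = now
    return pushed

# Toy store keyed by block time, standing in for the node's local database.
store = [Record(5, 1.0), Record(12, 1.1), Record(13, 1.2)]

def query(after: int, upto: int) -> List[Record]:
    return [r for r in store if after < r.block_time <= upto]

ticks = poll_per_block(blocks=[10, 20, 30], query=query)
print(ticks)  # updates only land at block boundaries (blocks 10 and 20 here)
```

This also makes the limitation visible: no matter how often records arrive, subscribers only see updates at block-time granularity.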

An alternative approach for near-real-time ticks is to:

  1. Enable Kwil Nodes to execute local computation over mempool data.
  2. Mempool data could potentially be handled using Temporary Tables.
  3. Temp Tables will flush after each new block.
  4. Each Temp Table will map to the consensus tables where raw (primitive) data is stored.
  5. If Mempool data is inconsistently distributed across Nodes at the point of observation, Kwil's LS Sidecar (evolved to read/write Kwil as L1) can collect mempool data into the Node where connections are established -- however, I'm quite certain that this may not be necessary.

If mempool data is indeed inconsistently distributed across Nodes, please let me know... Otherwise, I assume each data point is broadcast, and that the mempool is the state between received data and consensus data.

Furthermore, if Temp Tables are too persistent, the database schema relevant to active WebSocket connections can be replicated in real time to something like https://github.com/electric-sql/pglite, where relevant mempool data can reside for serving until it is flushed after the connection is closed plus a cooldown period.

PGlite supports the pl/pgsql procedural language extension; it is included and enabled by default.

Near real-time feeds must be optimistic to be fast.
To align incentives, clients can participate in fishing for fraud proofs.
Fraud Proof = Kwil A served X data in real-time, however, post consensus, Kwil B and C served Y data for the given query.
If Kwil uses PoS or PoSA, clients can prove that real-time mempool data does not match consensus data after block time; the fraudulent RPC Node can then be slashed, and incentives ensure correct behaviour.
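The fraud-proof comparison reduces to checking the real-time values a node served against the post-consensus values for the same queries. A minimal sketch; the tolerance parameter is my assumption (the thread does not specify how exact the match must be):

```python
def find_fraud(served_realtime: dict, consensus: dict, tolerance: float = 0.0) -> list:
    """Return the queries for which the value a node served in real time
    diverges from the post-consensus value by more than `tolerance`.
    Each such divergence is the evidence a fraud proof would carry."""
    return [
        q for q, v in served_realtime.items()
        if q in consensus and abs(v - consensus[q]) > tolerance
    ]

served = {"cpi_us": 3.12, "cpi_uk": 4.01}   # what Kwil A served pre-consensus
settled = {"cpi_us": 3.12, "cpi_uk": 3.90}  # what Kwil B and C serve post-consensus

print(find_fraud(served, settled))  # ['cpi_uk'] — grounds for slashing Kwil A
```

The slashing and incentive mechanics themselves would live in the chain's staking logic; this only shows the detection step a client could run.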

A security vulnerability: MEV. Bad actors can participate in the Network to serve bad data specifically to DeFi protocols powered by Truflation, where tampered data can yield a positive value extraction that outweighs the slash/loss as a Kwil Operator.

Side note: All of these suggestions can be incrementally incorporated.

PS. Happy to workshop this over a call.

@rsoury

rsoury commented May 8, 2024

A very good question for this topic, @zolotokrylin, is: what's the business case?

High-frequency ticks look good, but DeFi protocols are generally restricted regardless of the transaction speed of the destination blockchain. Furthermore, Oracles are generally pull-based.

If this is for Web2 users... then MEV isn't an issue (but should be publicly documented) and optimistic data delivery works.

@brennanjl
Collaborator

Similar to temp tables, the best bet here would be to use Postgres's materialized views. There's a really cool project that uses these to essentially create two states of a database: https://github.com/xataio/pgroll. They do it for migration purposes, but the concept is the same.

This would require a fairly deep protocol level change to Kwil, but nothing mind-blowing. We recently re-did how mempool gets applied, and I totally see how we could fit unconfirmed data into it.

Agreed on the questions regarding the business use case; I know @itscameronlee was mostly concerned with latency for changes to queries resulting from already-settled data (this was what Pyth talked about in their whitepaper). This is already handled in the current system, but if there is a need for handling mempool data as well, it's certainly doable.

@zolotokrylin
Contributor Author

zolotokrylin commented May 11, 2024

@rsoury @brennanjl @itscameronlee @MicBun

@zolotokrylin it seems like there is not a description for this goal. Could you provide more context?

I was told that you guys already discussed this issue with @itscameronlee.
But fair enough, every Goal must have a spec document 👍

I have attached a document with the notes from this thread. Please add your comments directly in the document (same as with the previous EPIC/Goal).

A very good question @zolotokrylin for this topic is: What's the business case?

I ported this question into the document. Let's continue there.

@rsoury

rsoury commented May 14, 2024

@zolotokrylin - Please give me Editor access to the Doc 🙏

@zolotokrylin
Contributor Author

@rsoury done

@rsoury

rsoury commented May 21, 2024

Ok @zolotokrylin and @brennanjl,

@itscameronlee has provided an update.
The primary business case is serving sub-second data to Web2 clients.
Consider a UX like a Bloomberg Terminal with high tick speeds, where data is served by Truflation's TSN Node.

This means we can dismiss security concerns regarding MEV, fraud proofs, etc.

@brennanjl - Is this a Kwil-centric issue that you want to lead, i.e. serving mempool data over socket connections?
If you think that reworking how mempool data is applied is a separate concern from enabling a TSN Node to serve websocket connections, then we can separate responsibilities here.

What's your take?

@brennanjl
Collaborator

Very interesting, thank you for sharing.

I definitely need to get deeper into what it would take to get full-mempool functionality working, but I am confident that this should be done on the core Kwil protocol. Especially with the way weighting and stream composability is handled in TSN, I cannot imagine a solution that is able to achieve this external to the core Kwil codebase that is not 100x more complex.

Is this the current top priority (cc @zolotokrylin)? I can make it a top priority to dive deeper and identify what I believe to be the optimal solution that is quickest to implement, however I have other threads in parallel on finishing procedures, phase 2 of the indexer, and database drop restrictions. I can elevate this to be a top priority if needed, but want to make sure priorities are set properly.

@zolotokrylin
Contributor Author

zolotokrylin commented May 22, 2024

@brennanjl what's the ETA and completion % for each of the "in-process threads", and what's the rough estimate for this real-time goal (cost-wise, and how long it might take)?

@zolotokrylin
Contributor Author

@brennanjl

@brennanjl
Collaborator

Apologies.

ETA for procedures: the official beta will be cut today. There's likely an extra 3 days of work for bug fixes from the beta, as well as documentation improvements, which will need to take place over the next 2 weeks.

ETA for phase 2 of indexer: I think this is a few days of work. One of our engineers is freeing up from another project early next week, and will be able to tackle it.

ETA for db drop restrictions: unsure yet; we still need to figure out how to actually implement this.

ETA for mempool: I'm not sure how long the mempool would take, but I think I should get on a call with Ryan (and ideally Cameron, and optionally yourself as well) to better understand the business case before properly recommending how we can get this done.

@rsoury

rsoury commented May 27, 2024

@zolotokrylin - Let's organise a call between us and Brennan to determine how to parse out this issue into problems relevant to Kwil and Truflation respectively.

@zolotokrylin
Contributor Author

@rsoury sure. What are the time slot options?
@markholdex please also join the call.

@rsoury

rsoury commented May 27, 2024

@zolotokrylin - Whenever.
I believe you already have our Calendar links - so just pen in a time that's available for both Brennan and me 🙏


@zolotokrylin
Contributor Author

Thank you guys for the call @brennanjl @rsoury @itscameronlee 👍
The ball is in Brennan's court to provide us with a proposal on the architecture and how it might best be implemented.

@rsoury

rsoury commented May 30, 2024

Meeting Notes:

  1. "Unconfirmed" real-time data to be made available for users that do not require strong consensus - ie. High-frequency traders.
  2. "Unconfirmed" data is based on mempool (pre-consensus).
  3. This allows Data Providers to continue pushing data to TSN as usual - unifies the interface.
  4. Data disputes/resolutions are evaluated through consensus - which is bound to block time. Observed as "confirmed" data.
  5. Because consensus and mempool are repurposed, "unconfirmed" data that mismatches "confirmed" data will result in penalisation by default: either a bad-acting Node Operator takes a loss to stake or authority, or a Data Provider suffers reputational (and financial) penalisation.
  6. A single TSN Node is tasked with managing websocket connections and executing compute over its in-memory "unconfirmed" data.

@brennanjl
Collaborator

I've spent a large chunk of the day digging into this, and have a few findings. I've looked at how this could be done in both procedures and actions.

Procedures

(You can skip reading this section if you want. I merely provided it to explain what I looked at, and why I ultimately came to the solution I did.)

It will be quite hard, if even possible, to build real-time data into procedures.

The reason we cannot build real-time data directly into procedures is because procedures run 100% within Postgres. The only way to add real-time data to this and have it be 100% compatible (without breaking consensus) would be to rely on Postgres's transaction isolation to add real-time data to queries for the reader connection, while not adding it for the writer connection. Not only would this be an insane hack, but it would also not really fulfill the requirement of being "real-time". Postgres will write the temporary data to disk before reading it. Most operating systems don’t try to make disk I/O real-time, as they don’t try to schedule the I/O to give priority to tasks that have tighter real-time constraints. Furthermore, given the frequency at which we discussed real-time data might be updated (relative to candles), writing to disk would almost certainly not be a good idea.

The only way (that I can think of) to handle in-memory data, in a table format, in Postgres is using temp tables. This, however, would also be sub-par, since it would still not support the scale you need, and it would also require serious hacking around Kuneiform to get working.

Unfortunately, Postgres just really isn't made for this. It would be both insanely hard to implement, as well as likely would not be as fast as traders would wish. Therefore, procedures seem out of the question for handling real-time data.

Actions

We can use Kwil's extensions to superimpose real-time data on query results. While there would be some challenges with this, it should overall be quite performant and scalable.

This would function by having an extension that schemas can import to provide real-time data. The extension itself would maintain a cache of the child streams and the weights, based on the stored schema taxonomies, to query child schemas. This would function similarly to how the current stream composability is performed within actions. The primary difference is that the extension would also keep a thread-safe map of the most recent "real-time" value for any schema. While retrieving data, it would see if it had a more recent "unconfirmed" value for a stream, and if so, return it instead. Data could be written externally to the map by some other process on the TSN node (it would be up to the Truflation team to decide how users would write data to this).
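The thread-safe map behaviour described above can be sketched as follows. The class and method names are hypothetical, and the real extension would live inside Kwil's extension system rather than application code; this only illustrates the "return the unconfirmed value if it is newer, otherwise fall through to confirmed" rule:

```python
import threading

class RealtimeOverlay:
    """Thread-safe cache of the latest 'unconfirmed' value per stream,
    superimposed on confirmed query results (illustrative sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._unconfirmed = {}  # stream_id -> (timestamp, value)

    def push_unconfirmed(self, stream_id, ts, value):
        """Record a real-time value; older timestamps never overwrite newer ones."""
        with self._lock:
            cur = self._unconfirmed.get(stream_id)
            if cur is None or ts > cur[0]:
                self._unconfirmed[stream_id] = (ts, value)

    def flush(self, stream_id):
        """Drop the cached value once the data lands in a confirmed block."""
        with self._lock:
            self._unconfirmed.pop(stream_id, None)

    def latest(self, stream_id, confirmed_ts, confirmed_value):
        """Serve the unconfirmed value only if it is newer than confirmed."""
        with self._lock:
            cached = self._unconfirmed.get(stream_id)
        if cached and cached[0] > confirmed_ts:
            return cached
        return (confirmed_ts, confirmed_value)

overlay = RealtimeOverlay()
overlay.push_unconfirmed("cpi_us", ts=105, value=3.14)
print(overlay.latest("cpi_us", confirmed_ts=100, confirmed_value=3.12))  # (105, 3.14)
overlay.flush("cpi_us")
print(overlay.latest("cpi_us", confirmed_ts=100, confirmed_value=3.12))  # (100, 3.12)
```

Writes into the map would come from whatever external process the Truflation team chooses, per the paragraph above.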

I am putting together a basic example that should demonstrate what I mean. Your team should be able to build on it with the logic you already have to implement the real-time functionality. I'll open a PR with the example shortly.

@rsoury

rsoury commented Jun 22, 2024

@zolotokrylin - I've made a start on the Spec outlining the base architecture we will take to deliver on this requirement.
I've also noted considerations for how this scales, as the initial phase will be essentially centralised, but built atop the TSN Node.

@outerlook - Once Vadim confirms he's OK with the Spec, we'll need you to create the interface spec and map the architecture in a visual diagram, as we did with #137.

@zolotokrylin
Contributor Author

@rsoury thank you. I will review the document and get back to you.
As of now, the Billing Goal has priority:

zolotokrylin self-assigned this Jun 24, 2024
@zolotokrylin
Contributor Author

@rsoury I used your calendar to book a call

@markholdex you were invited as well; please get acquainted with the doc, and let's settle the requirements and pass them to development.

@zolotokrylin
Contributor Author

@rsoury did you discuss the current Spec with @brennanjl, and are you guys aligned?

@rsoury

rsoury commented Jul 3, 2024

I believe so; however, I'll ensure Brennan reviews it before we settle.

@zolotokrylin
Contributor Author

@rsoury

  1. could you please help review it with @brennanjl so we can have a final version of the Spec?
  2. I will discuss the priority of this goal with stakeholders once the Specs are ready. The qualities of this feature and the cost of implementation are the main factors in deciding its priority.

Thank you.

@brennanjl
Collaborator

We are meeting early next week to discuss, I will re-review this weekend.

@rsoury

rsoury commented Jul 12, 2024

@zolotokrylin -

A question on requirements:
Is access to historical data necessary for real-time data?
We could make things leaner if it is not.

@zolotokrylin
Contributor Author

@rsoury, I have replied in the Spec document.
Please address your following questions in the relevant section of the specification document.

@rsoury

rsoury commented Jul 17, 2024

@zolotokrylin - new comment questioning a requirement: https://docs.google.com/document/d/1Z9tJJ5ctLU7REGCh8gBchKBo-V3ADra7k497MZQ17W4/edit?disco=AAABOaBsmFI

@zolotokrylin
Contributor Author

@brennanjl I have replied.

@rsoury

rsoury commented Jul 22, 2024

Update

@brennanjl and I are considering an architecture that separates the "real-time service" from the TSN.
Instead of embedding the real-time service directly into TSN, it will rely on TSN as a source of truth for authenticating Primitive Streams and correlating "ticks" to "candles," with "candles" representing the aggregate of "ticks" submitted as transactions to TSN.

The services will be distinguished as follows:

  1. "Real-time service"
  2. "Consensus service"

By centralising the infrastructure for the real-time service, we aim to achieve near-real-time performance while minimising complexity for Node Operators.

Note on terminology: The term "candles" might be misleading. In traditional candlestick charts, a candle represents several underlying values, whereas our "candles" indicate a single value: the median aggregation. We will likely adopt a more appropriate naming convention for "candles", which will be detailed in a second iteration of the Spec.
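Under this terminology, building a "candle" is a single median over the window's ticks. A minimal sketch; the tick shape and windowing are assumptions, since the Spec iteration mentioned above has not settled them:

```python
from statistics import median

def candle(ticks: list) -> float:
    """Aggregate a window of real-time 'ticks' (the values submitted as
    transactions to TSN during the window) into the single-value 'candle':
    their median."""
    if not ticks:
        raise ValueError("no ticks in window")
    return median(ticks)

# Odd count: the middle tick itself is returned.
print(candle([3.10, 3.18, 3.12]))  # 3.12
# Even count: the mean of the two middle ticks.
print(candle([1, 2, 3, 4]))  # 2.5
```

Unlike a traditional OHLC candle, this carries no open/high/low/close structure, which is exactly why the name may change.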

Current Status

Brennan and I are evaluating how to process data external to TSN using Kuneiform's PL/pgSQL.
This is crucial for the real-time service when serving data relative to a Compose Stream.
A Primitive Stream collects "ticks" (which are external to TSN), but the logic to compose the "ticks" is dynamic and deployed to TSN. The challenge is to use this dynamic Kuneiform / PL/pgSQL logic to compose Primitive Stream "ticks" as they are delivered to subscribers of the relevant composed stream.

Our options are as follows:

  1. A standalone Kuneiform interpreter may be necessary.
  2. A TSN Node operating as a dependency in this architecture to block sync with the network and hydrate with the network's contracts. This setup would include an interface for injecting third-party data into this Node's data sourcing mechanism, allowing it to read the union of consensus data and real-time data.
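Option 2's read path, serving the union of consensus data and injected real-time data, can be sketched as a merge keyed by (stream, timestamp). Preferring the consensus value on conflict is my assumption; the thread leaves the precedence rule open:

```python
def read_union(consensus_rows, realtime_rows):
    """Merge consensus records with injected third-party real-time records
    so the node reads the union of both, with the consensus value winning
    when both exist for the same (stream, timestamp)."""
    merged = {}
    for stream, ts, value in realtime_rows:
        merged[(stream, ts)] = value
    for stream, ts, value in consensus_rows:  # consensus wins on conflict
        merged[(stream, ts)] = value
    return sorted((s, t, v) for (s, t), v in merged.items())

consensus = [("cpi_us", 100, 3.12)]
realtime = [("cpi_us", 100, 3.11), ("cpi_us", 105, 3.14)]

print(read_union(consensus, realtime))
# [('cpi_us', 100, 3.12), ('cpi_us', 105, 3.14)]
```

In the actual architecture this merge would sit inside the Node's data sourcing mechanism, so the deployed Kuneiform composition logic runs over the union transparently.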

@outerlook - What's your take on the options? Have any input?

@rsoury

rsoury commented Jul 22, 2024

After a discussion with @outerlook - he sparked an idea.

@brennanjl - Can we just repurpose the Testing Framework, which computes seed data over a Kwil Procedure in an isolated and centralised manner for delivery of real-time composed streams?

ie. Seed data = Real-time "ticks" + Consensus data

@brennanjl
Collaborator

Can we just repurpose the Testing Framework, which computes seed data over a Kwil Procedure in an isolated and centralised manner for delivery of real-time composed streams?

Unfortunately, not that simple. The testing framework has the following flow: "read in test information" -> "parse schemas, generate plpgsql" -> "deploy plpgsql" -> "run test cases"

We essentially need to be able to include real-time data functionality (custom TSN functionality) in the plpgsql code. In our current release cycle, we are changing Kuneiform in a way that might make this possible.

Currently, we are switching Kuneiform from generating plpgsql to running an interpreter. We were planning on keeping this closed off since it is under heavy development; however, we could add the ability to create extensions within the interpreter. You could then run the interpreter with special logic that uses in-memory data in place of SQL queries.

Still a very rough idea because we have hardly begun work on the interpreter, but it is the only feasible solution I can think of.

@rsoury

rsoury commented Jul 22, 2024

@brennanjl - If there's an expected timeline, we could spec based on it and coordinate with @zolotokrylin to confirm next steps.

@rsoury

rsoury commented Jul 22, 2024

Again, my only other solution was a Kwil Node with Redirect Rules on Primitive Streams (Tables) to pull from a view that has real-time data superimposed.

@brennanjl
Collaborator

If there's an expected timeline, we could spec based on it and coordinate with @zolotokrylin to confirm next steps.

We do not have an expected timeline yet. I'm happy to try to shift around internal priorities to make it happen faster, though. I can make it the next thing I focus on; currently we are focused on an improved query planner for DoS protection.

Again, my only other solution was a Kwil Node with Redirect Rules on Primitive Streams (Tables) to pull from a view that has real-time data superimposed.

I will keep this on the table as an option, since it very well could be what we end up having to do. My hesitancy here is that Truflation would then need to compile the realtime extension to a Postgres extension, which significantly raises the complexity of running a TSN node and onboarding node operators.

@rsoury

rsoury commented Jul 23, 2024

My hesitancy here is that Truflation would then need to compile the realtime extension to a Postgres extension, which significantly raises the complexity of running a TSN node and onboarding node operators.

Agreed.

However, Truflation would be the only party performing this additional bootstrapping.

@zolotokrylin - What's your take, what's our deadline on this?

@zolotokrylin
Contributor Author

@rsoury, our deadline for this Goal (#199) very much depends on the completion dates of these sequential Goals.
We definitely need to complete these Goals first:

And probably these:

While we are working on and discussing the Specs of #199, I suggest we don't speculate on the deadline of this Goal (#199): it seems relatively far away, and any prediction here would mislead us.


100% focus on the goals listed above, in their listed sequence; if there is free time and resources, we continue with this Goal. I.e., if someone is no longer participating in any of the Goals above, please engage with this Goal to complete the Specs so that we can understand:

Please get to a first draft of the Specs document that you can confidently present, so we can discuss where and how this Goal fits the roadmap.


@rsoury, if I need to answer any Specs-related questions, please direct them to me inside the Spec document. You are the chief editor, so feel free to resolve any comments, modify copy, and manage the Spec as you think best.

@rsoury

rsoury commented Jul 23, 2024

@zolotokrylin - Nice, got it.

There is a missing goal here, i.e. TSN Governance being involved in managing System Streams. @outerlook - Maybe draft this goal up when you have a chance.

@brennanjl - I'll leave this in your hands then, as you're best suited to determine whether we should modify Kwil at the data access layer, or whether we should adopt a Kuneiform interpreter.
