Goal: run load testing #441
@outerlook @rsoury, please suggest a simple and cost-effective action plan here.
@brennanjl - Do you have any benchmarking and pen-test tooling that you can provide? What's your take on the following?
@outerlook - Proposing a simple pen-test suite that:
- Determines: max storage limit, max queries per second, and max composition depth
- Seeds data with mock data using a Faker lib - https://github.com/go-faker/faker (see the sketch below)

Any additional recommendations for determining load are welcome.
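Not the actual seeding code, just a minimal sketch of what go-faker-based seeding could look like; the record fields, faker tags, and row count are illustrative assumptions rather than the real TSN schema:

```go
package main

import (
	"fmt"
	"log"

	"github.com/go-faker/faker/v4"
)

// mockRecord is an illustrative shape for a seeded data point;
// the real TSN stream schema will differ.
type mockRecord struct {
	StreamID string  `faker:"uuid_hyphenated"`
	Date     string  `faker:"date"`
	Value    float64 // untagged basic fields are filled with random values
}

func main() {
	const n = 1000 // arbitrary number of mock rows for the sketch
	records := make([]mockRecord, 0, n)
	for i := 0; i < n; i++ {
		var r mockRecord
		if err := faker.FakeData(&r); err != nil {
			log.Fatal(err)
		}
		records = append(records, r)
	}
	// In a real seeding step these rows would be inserted through the
	// TSN SDK or a SQL batch insert rather than printed.
	fmt.Printf("generated %d mock records, e.g. %+v\n", len(records), records[0])
}
```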
Our team just published metrics today here. Keep in mind there is a lot Kwil can still optimize, this is just how it is right now. This should answer:
For the max storage limit, it is totally dependent on your schemas. There are some hard Postgres limits, but we will hit slow queries due to data size much sooner.

For max queries per second, I would consider this a non-issue / essentially infinite. It is very easy to spin up read replicas to read more data, so it is horizontally scalable.

Finally, for max composition depth, this is currently a Postgres limit, but I am unsure what it is. In v0.9 we can enforce more controls on this, due to our interpreter usage, so I am curious if there is an "ideal value" here.

Since so much of this is dependent on your specific schema, indexes, and procedures, I would recommend performing benchmarking yourself (particularly for query speed / stream composition for reads). The above Docsend link should provide adequate information for node distribution and its impact on write performance, but it doesn't really cover reads. The reason I mention this is that the TSN-specific benchmarks shouldn't need to run more than one node, as it shouldn't affect the benchmarking of queries.
We do have IaC for a single region and could adapt it for multi-region. However:
This makes sense, as querying is dependent only on the queried node right now, correct?
@rsoury Can we use the results of the Kwil team for this? I assume so, because the bottleneck for the majority of transactions lies in their size, due to the consensus state over them, rather than in the step of committing to the underlying Postgres, which varies per logic. We have simple writes such as
Assuming Postgres is effectively infinite, for the current implementation we should try to relate compose depth vs. date range (or queried data points?) vs. query execution time, since we can arbitrarily increase the query timeout to whatever seems reasonable and make that the limit. Note: the worst-case scenario is when every ancestor stream is private, because more operations are performed per stream on a query.
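For illustration, a rough sketch of the benchmark matrix this implies; queryStream is a hypothetical placeholder for the real read call (SDK or gateway), and the depth/date-range values are arbitrary:

```go
package main

import (
	"fmt"
	"time"
)

// queryStream is a placeholder for the actual read call (e.g. through the
// TSN SDK or gateway); it is assumed here, not a real API.
func queryStream(composeDepth, daysQueried int) error {
	_ = composeDepth
	_ = daysQueried
	time.Sleep(10 * time.Millisecond) // stand-in for real work
	return nil
}

func main() {
	depths := []int{1, 5, 10, 50, 100} // composition depths to probe
	ranges := []int{7, 30, 180, 365}   // date ranges in days
	for _, d := range depths {
		for _, days := range ranges {
			start := time.Now()
			if err := queryStream(d, days); err != nil {
				fmt.Printf("depth=%d days=%d error=%v\n", d, days, err)
				continue
			}
			fmt.Printf("depth=%d days=%d elapsed=%s\n", d, days, time.Since(start))
		}
	}
}
```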
@brennanjl - Very cool report! Questions:
(Fun theoretical/research) Considering BFT is the limiting factor, I wonder what these numbers would look like if Sei Consensus or Sui were used with pruned features.
@outerlook - Re
We can push this back, especially considering we're in complete control of depth, as Truflation is the only analyst/stream-creator. All third-party analysis and stream creation should be performed on a Staging/Testnet -- not on the Production TSN. Only once we've performed our own stress test, and/or used metrics from Staging/Testnet produced by third parties, can we expose stream creation to third parties. @zolotokrylin - Please determine the course of action based on this advice.
Side note: Postgres performance can be heavily limited by bad/complex queries. I'm sure composition depth plays a role in this.
@rsoury, sorry, what's the question? We should run typical "read requests" load testing. Let's keep write requests out of this scope.
@zolotokrylin - Yep, you've basically answered it. The question was: given that write performance has been sufficiently established, what are the next steps? Note that "read requests" can be load tested, but as per Brennan's comment, they're essentially correlated to the scale of a single-instance Postgres DB + Golang app.
Gateways are the same as RPC service providers, right?
About read load testing, as @rsoury said:
We already faced some issues with it: in short, we had query timeouts (5 seconds) when fetching YoY values for month ranges, though it worked for a week range. Why is it complex?
As a bandaid, our solution was just to increase the timeout limit to 60 seconds. But I think we would benefit from knowing with more precision how depth influences the query time. Also, I don't think testing this would be too complex if Kwil's testing framework is a good representation of actual no-roundtrip query times, as it just uses Postgres directly (right, @brennanjl?). In that case, we could get values for:
Getting the responses for:
We could even get better at optimizing our queries if we had this table.
Yes. I agree with @outerlook, it's wise to proceed with composition depth measurements.
Yes, but we cannot optimise queries that external parties deploy. @brennanjl - Are YoY queries slow because they wait for the next block? Or do they use the last block where consensus time is established? Another bandaid solution is stuffing the entire DB in memory purely for faster queries.
Yes. We haven't measured the performance impact of running different amounts of nodes with proper tests, but CometBFT's tests show this, and we have a lot of anecdotal evidence as well.
Not really. An overview of how it works can be found here, but essentially, there is one tx per validator that votes, and that single tx contains the information for all resolutions they are voting on. However, the data for the event itself only goes through consensus once, since all other nodes voting in favor simply submit a hash of the event data.
I have actually spoken with a lot of the teams building these protocols. I'm pretty keen to try them because it would obviously open up a lot of opportunities for Kwil, but consensus engines aren't something I want to get fancy with (at least yet). They are very hard to get right, and we don't have an assurance that any of these protocols will be successful long term. We have significantly lower dependency risk with Comet, so we have chosen it for now.
It is slow due to some sort of logic being executed against the DB (all processing happens outside of consensus). There are two areas where we can improve:
A SQL optimizer probably wouldn't help here. I can take a scan through the current queries, but Postgres already has a really good optimizer. I think the issue is more likely the Kuneiform logic, and the relatively restrictive number of things you can do in Kuneiform. This weekend I will take a look at the procedures to see if there are any basic ways to optimize. I can also explore a code-gen tool for generating optimal stream code. On a slightly longer timeline, Kwil will have more control over Kuneiform execution soon as we move to an interpreter (starting work on it soon). This will make it much easier for us to add features that could allow you to create more optimal streams.
Hey @brennanjl, did you check the procedures for ways to optimize?
@rsoury I know we are still curious about how we can improve query time and what can influence it. But other than that, it feels like it's clear to everyone where we can improve and where it wouldn't make sense. Thus, do we need to create a short spec around the
@markholdex - Sure thing, we can do a small spec on this. I believe @outerlook has essentially outlined what's involved here: #441 (comment). However, we can formalise this into a set of experiments to conduct.
@rsoury, can you get started on it and then we all contribute?
@markholdex - Spec established and added to the GitHub issue: #441 (comment). @outerlook - Please complete it with action items more specific to the benchmark requirements outlined.
@markholdex apologies, I have not had a chance to do this yet. I have been really behind and dropped the ball on it. Getting on it now.
@brennanjl totally get it. Please do let me know when we can expect to hear back from you. Thank you!
@outerlook what is the status here? I can see we still have unresolved problems. What is the ETA? @MicBun, are you able to help here?
Hi @markholdex, about the unresolved problems: these two do not require a PR; instead, they are done on a server (AWS). I think it is better to skip this improvement, as changing the type from text to int has minimal impact and we would need to adjust all of the data types too. The node upgrade is dependent on this PR, which is still in progress.
Hey, team. Sorry about the delay with #530. I was finally able to get #534 into review to fix it. It had weird errors that only happened after 2 hours of running tests. I'm rerunning the tests to get the complete results and will share them here.

The neat result of this goal is that query times were improved a lot, and we can get reliable query times for the specified procedures. They are also now predictable (if the days/streams queried grow, the query time is pretty much linear).

Our load tests are simple and won't show other spec limitations like memory usage, etc., but with the experience from running them, I'd say we should continue to recommend at least

We could test a much smaller number of streams at once than originally spec'ed (800 vs 1M) due to testing constraints, but the results are linear enough to scale them consistently to get query numbers (maybe not the memory constraints).

Another limitation uncovered is a maximum composition depth of 180, due to the call stack size limitation in the default Postgres configuration. We could tweak our servers to support higher values, or even have some configuration implemented network-wide, but I guess it's not worth it right now because we're very far from this depth, as streams usually grow a lot horizontally first (our streams have a depth of 4 at most).
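As a reference, a minimal sketch of how a node operator could read the Postgres setting most likely behind this call-stack limit; it assumes the relevant knob is max_stack_depth (default 2MB) and uses a placeholder DSN:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

func main() {
	// placeholder DSN; point it at the node's Postgres instance
	db, err := sql.Open("postgres", "postgres://user:pass@localhost:5432/kwil?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// max_stack_depth caps deeply nested execution; it must stay below
	// the OS stack limit (ulimit -s) minus a safety margin.
	var stackDepth string
	if err := db.QueryRow("SHOW max_stack_depth").Scan(&stackDepth); err != nil {
		log.Fatal(err)
	}
	fmt.Println("max_stack_depth =", stackDepth)
	// Raising it would be a superuser change in postgresql.conf (or via
	// ALTER SYSTEM), rolled out per node.
}
```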
Next steps after #534 gets merged:
However, some are more async and could move along with other goals.
The complete results markdown can be seen here. Important tables that give the overview (values in milliseconds):

t3.large - get_index_change - Public
t3.medium - get_index_change - Public
t3.small - get_index_change - Public
I believe we can conclude that, although we should keep the 60-second timeout just in case, our 365-day YoY query of ~200 streams in CPI US can now be queried in under ~2 seconds in

There is a more thorough analysis under #534 that shows these values are linear. Looking at the private and public results, although private makes them slower, it is not significantly slower.
Hey guys, we had some issues with this upgrade. @zolotokrylin @markholdex

The TL;DR of what I want to know: do we keep taking until Friday to push it (with a less performant solution that doesn't break), or do we put it on hold and solve truflation/website#788, which, if I understood correctly, is a business priority?

What improves? Query times.

More details
Another note: we will be able to use the complete solution only after Kwil's next version, where
Hi @outerlook, forgot to mention: the contract on our server is already using the
Is there any ETA on this? If it will most likely be ready in the next few days, I think it is better to leave the current contract as it is (the most updated and optimized one).
I think they will start the beta in the next few days; @brennanjl can answer this better. However, there might be some weeks in beta until it's stable, and there may also be time needed for us to coordinate the upgrade. I don't think it's that immediate.
If that's the only thing that changed, then #550 contracts should be the same as what was deployed. Is that correct?
Yes, it is correct.
Then, this goal can be considered done after:
I'm running new tests to get the results, but for 4-branch categories the difference should not be so big; it's mainly for flat categories (where the cost of adding a new stream to a composed stream is bigger).
To confirm, beta got shipped today. That means that preview will converge with a production-ready branch in two weeks (assuming no critical bugs), as is documented by our release cycle here. This will include a number of other additions on top of what is on preview, including peer filtering and RPC security, and better processes for future coordinated upgrades (to make it easier for truflation to make these breaking changes in the future).
Thank you, Brennan! I'm closing this issue as everything planned has shipped and everything is normalized again. If anyone thinks something is pending, please reopen!
@outerlook is this the final report: #441 (comment) ?
I left a complete report generating yesterday. It timed out after 6 hours (before the reverts, it took 1h30m). I'll leave another one running today with a 48-hour time limit so we know how long the new one will take. I'll leave the goal open until we have it.
Results took 13 hours this time.

Note that in the previous implementation (with pending optimizations) the branching factor didn't matter. However, for now it does. That's why I'll also break results down by the factor here:

Branching factor = 8

It's a relatively common factor, 8 children per stream, but only a guess (I think this varies from 2~10).

t3.large - get_index_change - Public
t3.medium - get_index_change - Public
t3.small - get_index_change - Private
Branching factor = infinity

As if all streams of a category were directly below the root. This helps us know the cost of new streams per parent; see that the numbers grow a lot with new streams.

t3.large - get_index_change - Public
Yes, it took 2.3K seconds = ~38 min for the last one.

t3.medium - get_index_change - Public
t3.small - get_index_change - Public
(How can the small instance perform better than the medium one in this case? Well, as it's relatively close, I don't know if it's worth investigating this discrepancy.)

Conclusion: The remaining optimization is important, but although the numbers for flat trees are big, it isn't a critical situation, because we don't currently need categories that flat in production. It should be addressed in the future, though, because it's almost prohibitive to have one with these numbers.
@outerlook - Considering that vertically scaling the Node linearly impacts performance - should we bump the primary Node used for handling queries?
We can surely answer this by testing. I have a weak theory that Postgres doesn't split the load across threads for our single query, so I don't believe it would cut the time in half unless the CPUs themselves are very different here. I do think it could improve things a lot for concurrent queries, though. Let's see what happens.
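A rough sketch of what such a concurrent-read comparison could look like; readOnce is an assumed placeholder for one gateway/SDK read, and the worker counts are arbitrary:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// readOnce stands in for one gateway/SDK read request; it is an assumed
// placeholder, not a real API call.
func readOnce() error {
	time.Sleep(50 * time.Millisecond)
	return nil
}

func main() {
	const workers = 32   // concurrent clients to simulate
	const perWorker = 20 // requests per client
	var wg sync.WaitGroup
	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				_ = readOnce()
			}
		}()
	}
	wg.Wait()
	total := workers * perWorker
	elapsed := time.Since(start)
	fmt.Printf("%d requests in %s (%.1f req/s)\n", total, elapsed, float64(total)/elapsed.Seconds())
}
```

Running the same driver against each instance size would show whether the bigger machine helps aggregate read throughput even if a single query's latency stays roughly the same.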
@outerlook when will the results with the new machine become available?
I'll start it in the next few minutes. However, if it shows no difference, we'd better not commit to it (unless we spot a difference here).
@outerlook any updates here? |
It errored in the upload process after 11 hours. I'll try running it again to see if it was a one-time issue. Sorry about the delay; I'm treating this one as a lower-priority task compared to recent events.
Now for branching factor = 8:

t3.xlarge - get_index_change - Public
Branching factor = infinity:

t3.xlarge - get_index_change - Public
Conclusion: faster than the others, as expected, but by a small difference in % (~5%).
TSN Load Testing - Spec
Originally posted by @outerlook in #415 (comment)
cc: @rsoury
Blocker
Issues
- t3.micro instance doesn't work #528

Optimization issues
- get_index_change is inefficient #507
- Problem (contract): parsing timestamps for each row is inefficient #508 - not planned on this goal

TSN-SDK Task
Other issue