
[EPIC] Performance: SQLite backed cache #8527

Open
17 of 53 tasks
richard-cox opened this issue Mar 27, 2023 · 7 comments

Comments

@richard-cox
Member

richard-cox commented Mar 27, 2023

2.9

2.11 (WIP)

2.12 (WIP)

Backend Issues (not complete list)

Related

This issue requires a QA template.

@richard-cox
Member Author

richard-cox commented Mar 27, 2023

Some of the backend changes have already been implemented in 2.7.2

Anything marked as a Development note is an implementation detail for this issue and a candidate to spin out.

Some are incoming in 2.7 Q2

Some that we would like for 2.7 Q2 but are not blocking (in order of importance)

In addition, we need to consider more advanced ways to search and sort:

  • Sort IP addresses numerically (IPv6 also needs to be considered)
    • e.g. xx.xx.17.196, xx.xx.17.2, xx.xx.17.204, xx.xx.17.24, xx.xx.17.243 should be sorted numerically, octet by octet, giving xx.xx.17.2, xx.xx.17.24, xx.xx.17.196, xx.xx.17.204, xx.xx.17.243
  • Sort by duration
    • e.g. 7d4h, 6m56s, 23h should be sorted as 6m56s, 23h, 7d4h
  • There are more examples in Wrong sorting of various views and columns in Rancher UI #9782, but the remaining ones generally concern properties computed client-side, which therefore can't be supported with server-side pagination
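The two desired orderings above can be pinned down with a small comparator sketch. The function names (`compareIPv4`, `compareDuration`, etc.) are hypothetical and this is client-side TypeScript for illustration only; for server-side paginated lists the equivalent ordering would have to be implemented by the backend. Concrete addresses stand in for the masked `xx.xx` prefix, and IPv6 is deliberately not parsed: such values simply sort after valid IPv4 addresses.

```typescript
// Sketch only: hypothetical comparators showing the desired orderings.
// Not part of the Rancher codebase.

/** Parse a dotted-quad IPv4 address into four octets, or return null. */
function parseIPv4(ip: string): number[] | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  // NaN fails both comparisons, so malformed parts are rejected here too.
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

/** Plain string comparison used as a fallback. */
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

/**
 * Order IPv4 addresses octet by octet. Values that are not valid IPv4
 * (including IPv6, which would need its own parser) sort after valid
 * addresses and are compared as strings among themselves.
 */
function compareIPv4(a: string, b: string): number {
  const pa = parseIPv4(a);
  const pb = parseIPv4(b);
  if (pa && pb) {
    for (let i = 0; i < 4; i++) {
      if (pa[i] !== pb[i]) return pa[i] - pb[i];
    }
    return 0;
  }
  if (pa) return -1;
  if (pb) return 1;
  return compareStrings(a, b);
}

const UNIT_SECONDS: Record<string, number> = { d: 86400, h: 3600, m: 60, s: 1 };

/** Convert a duration such as "7d4h" or "6m56s" to seconds; NaN if malformed. */
function parseDuration(text: string): number {
  if (!/^(\d+[dhms])+$/.test(text)) return NaN;
  const re = /(\d+)([dhms])/g;
  let total = 0;
  let match: RegExpExecArray | null;
  while ((match = re.exec(text)) !== null) {
    total += Number(match[1]) * UNIT_SECONDS[match[2]];
  }
  return total;
}

/** Order durations by length; unparseable values go last, as strings. */
function compareDuration(a: string, b: string): number {
  const da = parseDuration(a);
  const db = parseDuration(b);
  const okA = !Number.isNaN(da);
  const okB = !Number.isNaN(db);
  if (okA && okB) return da - db;
  if (okA) return -1;
  if (okB) return 1;
  return compareStrings(a, b);
}

// Usage: ["7d4h", "6m56s", "23h"].sort(compareDuration) → ["6m56s", "23h", "7d4h"]
```

Keeping invalid values in a separate group (rather than mixing numeric and string comparisons pairwise) keeps each comparator a consistent total order, which `Array.prototype.sort` requires.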

@richard-cox
Member Author

richard-cox commented Apr 17, 2023

The backend will not support sorting/filtering on 'calculated' fields, e.g. current model getters. For the first iteration we should concentrate on badly performing lists that don't need them; this includes all workload types and secrets.

Possible Flow

  • user changes pagination based setting
  • list persists them in the cluster store
  • list shows loading / in progress indicator?
    • Tracked in the component or the store? What happens if there is a long-running HTTP request and the user leaves and then returns?
  • list triggers a cluster/request (via resource-fetch)
  • cluster/request uses the store settings to make the HTTP request (ignoring values in the Vuex store)
    • this includes namespace/project filters
  • list receives page
  • list cancels loading / in progress indicator
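A minimal sketch of the possible flow above, assuming the settings and loading flag are tracked in a store keyed by resource type, and that a response arriving after the user has left is discarded. `PaginationStore`, `buildQuery`, `loadPage` and the query parameter names (`page`, `pagesize`, `sort`, `filter`) are hypothetical: they stand in for the dashboard's real Vuex cluster store, `cluster/request` and resource-fetch code, and the HTTP call itself is injected as a `fetcher` function.

```typescript
// Sketch of the "possible flow". All names are hypothetical, not the
// Rancher dashboard's real store, cluster/request or resource-fetch API.

interface PaginationSettings {
  page: number; // 1-based page index
  pageSize: number;
  sortBy?: string; // must be a real field; 'calculated' getters can't be sorted server-side
  namespaces: string[]; // namespace/project filter, sent to the server
}

interface Page<T> {
  items: T[];
  total: number; // total matching resources across all pages
}

type Fetcher<T> = (type: string, query: string) => Promise<Page<T>>;

/** Persists per-list settings and loading state, keyed by resource type. */
class PaginationStore {
  private settings = new Map<string, PaginationSettings>();
  private loading = new Set<string>();

  save(type: string, s: PaginationSettings): void {
    this.settings.set(type, s);
  }

  get(type: string): PaginationSettings | undefined {
    return this.settings.get(type);
  }

  setLoading(type: string, on: boolean): void {
    if (on) this.loading.add(type);
    else this.loading.delete(type);
  }

  isLoading(type: string): boolean {
    return this.loading.has(type);
  }
}

/** Build the query string from stored settings only (parameter names assumed). */
function buildQuery(s: PaginationSettings): string {
  const parts = [`page=${s.page}`, `pagesize=${s.pageSize}`];
  if (s.sortBy) parts.push(`sort=${encodeURIComponent(s.sortBy)}`);
  for (const ns of s.namespaces) {
    parts.push(`filter=${encodeURIComponent(`metadata.namespace=${ns}`)}`);
  }
  return parts.join("&");
}

/**
 * Request one page using the stored settings. The loading flag lives in the
 * store so it outlives the list component; if the user has left
 * (token.cancelled), the late response is discarded.
 */
async function loadPage<T>(
  store: PaginationStore,
  type: string,
  fetcher: Fetcher<T>,
  token: { cancelled: boolean },
): Promise<Page<T> | undefined> {
  const settings = store.get(type);
  if (!settings) throw new Error(`No pagination settings for ${type}`);
  store.setLoading(type, true);
  try {
    const page = await fetcher(type, buildQuery(settings));
    return token.cancelled ? undefined : page;
  } finally {
    store.setLoading(type, false);
  }
}
```

Keeping the loading flag in the store rather than the component is only one answer to the open "component or store?" question; it lets a returning user see that a long-running request is still in flight.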

Questions

  • Do we reset these when a user returns to a previous list? What happens if the page has changed and now has no entries?
  • How do incremental loading and manual refresh tie in?
  • How do we work with, or around, the inability to restrict subscriptions (i.e. we still need to subscribe to all resource change events)?
    • What happens if a resource changes in a way that would alter the page?
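For the question of a returning user whose stored page no longer has entries, one possible answer (not a decision recorded in this issue) is to clamp the stored page to the last page that currently exists. `clampPage` is a hypothetical helper sketching that idea:

```typescript
// Hypothetical helper: clamp a stored 1-based page index to the range of
// pages that currently exist, so a returning user never lands on an empty page.
function clampPage(page: number, pageSize: number, total: number): number {
  // Treat an empty result set as a single (empty) page 1.
  const lastPage = Math.max(1, Math.ceil(total / pageSize));
  return Math.min(Math.max(1, page), lastPage);
}

// Usage: clampPage(5, 50, 120) → 3 (120 items at 50 per page span 3 pages)
```

The catch is that the new total is typically only known after a response arrives, so an out-of-range page would cost one extra request.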

@clayrisser

clayrisser commented Apr 18, 2024

I wanted to say this 2 years ago, but I decided to wait and give Rancher time to improve. I can't help but notice this issue is a year old. Still, in 2024 Rancher's performance makes the product completely unusable, while alternative tools are fast and snappy.

I quit using Rancher because it's soooo slow. It's hogging 4 GB of memory and 4 CPUs on a brand-new cluster. It's been doing this ever since Rancher 2.6, and there seem to be no improvements.

I don't understand how alternative tools can be so fast while the Rancher dashboard freezes all the time because the Rancher server's CPU and memory get maxed out. It makes no sense to me.

1 user, 1 cluster, 4 GB RAM, 4 CPUs, and it's still not enough, and it's been this way for years. It's the one issue that's forced me to look for alternatives.

The rancher product feature set is great, but I really wish I could say the same about its performance.

@richard-cox
Member Author

richard-cox commented Apr 19, 2024

@clayrisser This issue is a year old, and I can empathise with the frustration you've experienced. We are very actively working on a solution, though (see the linked issues and their PRs for UI progress). Unfortunately, the API side hit a roadblock, which has delayed things a bit. I'm happy to say, though, that we are targeting a tech preview of the feature in 2.9.0.

In addition, 2.9.0 contains many UI performance improvements that reduce both memory and CPU usage. It's worth giving it a try when it's released.

@clayrisser

clayrisser commented Apr 19, 2024

I will try again with 2.9.0, but if it requires running a large server (more than 4 CPUs), it's just not worth it.

I'm going to assume Rancher works well for very large deployments that have resources to throw at it, but small deployments can't justify dedicating that many resources to it.

@chad-barensfeld-exa

Please add pagination to the "new" UI. It's unusable when importing hundreds of clusters. We have to use the old UI (/g/clusters), which still has pagination.

@richard-cox
Member Author

@chad-barensfeld-exa Server-side pagination is a large project that we are currently targeting for 2.11, where it will be enabled by default. In parallel, we have been making a number of other performance improvements, which have received a lot of positive feedback from the community and SUSE customers, including those running Rancher with a large number of clusters.

Unfortunately, the old UI is not supported and will soon be unavailable. It also did not use server-side pagination (instead, it normally showed only a very restricted set of resources).

To help us with your specific performance issue, would you be able to open a new GitHub issue and provide as much detail as possible on what is slow and where, such as:

  • Does the blue spinner shown when reloading a page take a very long time, or never stop? Does that always happen, or only sometimes?
  • Does a page load slowly when the user first navigates to the UI?
  • Does it take a long time to go from page to page?
  • Does a list of resources take a long time to show any rows?
  • Does the UI become unresponsive (e.g. clicks don't do anything)?

Add concrete examples, e.g.:

  • "Viewing a cluster's config from the Cluster Management page was just sitting on Loading"
  • "Closing Rancher and opening a new Rancher instance in Chrome showed the blue spinning circle"
  • "After reloading the page (CTRL+R) in about 3 minutes the UI would either eventually load or show a Fail Whale error (~50% of cases)"

If there are lots of places where performance isn't great, focus on the one process where you are most likely to see issues, or where the issues are worst.

Do all network requests take a long time to complete, or is there one or two in particular that take time?

@nwmac nwmac modified the milestones: v2.12.0, v2.11.0 Nov 1, 2024
@moio moio changed the title Performance: Server-side pagination Performance: SQLite backed cache Dec 2, 2024
@moio moio changed the title Performance: SQLite backed cache [EPIC] Performance: SQLite backed cache Dec 2, 2024
Projects
None yet
Development

No branches or pull requests

5 participants