Bug Report: vttablet container OOMKilled at huge concurrent select query with consolidator #17243

Open
jwangace opened this issue Nov 16, 2024 · 3 comments

Comments

@jwangace
Contributor

jwangace commented Nov 16, 2024

Overview of the Issue

This is reproducible on Kubernetes deployments running vitess-operator with Vitess v16.
When the consolidator is enabled and a large number of select queries are run concurrently through vtgate, the vttablet container gets OOMKilled.

Consolidated Query Wait Count (vttablet_waits_count)
Screenshot 2024-11-16 at 7 59 54 AM

OOMKilled Metrics
Screenshot 2024-11-16 at 7 58 15 AM

Reproduction Steps

To reproduce this issue more easily:

  1. Set a relatively small memory limit for the vttablet container (1Gi, for example).
  2. Craft a select query whose result is relatively large (5Mi, for example).
  3. Run that select query concurrently at large scale through vtgate (10,000 queries, for example).
  4. Observe that the vttablet container is OOMKilled.

Binary Version

Vitess 16 and later versions.

/vt/bin$ ./vtgate --version
Version: 16.0.3-SNAPSHOT (Git revision 4335eaf8ce3fa328aacd36e66f4776bd5208c7c8 branch 'v16-hc-demonware') built on Tue Dec 12 18:02:03 UTC 2023 by vitess@buildkitsandbox using go1.20.5 linux/amd64

/vt/bin$ ./vttablet --version
Version: 16.0.3-SNAPSHOT (Git revision 4335eaf8ce3fa328aacd36e66f4776bd5208c7c8 branch 'v16-hc-demonware') built on Tue Dec 12 18:02:03 UTC 2023 by vitess@buildkitsandbox using go1.20.5 linux/amd64

Operating System and Environment details

kubernetes version: v1.27.11

Log Fragments

The OOM kill happens very quickly, before any log can be written.
@shlomi-noach
Contributor

shlomi-noach commented Nov 17, 2024

where runs vitess-operator with vitess v16.

@jwangace thank you for the report! Seeing that v16 is unsupported, could you please clarify whether the bug still appears on supported versions (v19, v20, v21 at this time)?

shlomi-noach added the Component: VTTablet label and removed the Needs Triage label on Nov 17, 2024
@jwangace
Contributor Author

jwangace commented Nov 17, 2024

Hi @shlomi-noach, as you might have noticed, I also opened a PR proposing a fix against the latest code. Unfortunately, we don't have any v22 deployments, so I did not reproduce the issue on v22. However, I cross-compared the related code (where I proposed updating execSelect), and I believe this bug is still present in the current code.

Do you think this is something PlanetScale can verify by following the Reproduction Steps?

@shlomi-noach
Contributor

@jwangace thank you, let us take a look!
