ADBDEV-6156 Count startup memory of each process when using resource groups #1023
base: adb-6.x-dev
Conversation
Failed job Deploy multiarch Dockerimages: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1807317
Allure report https://allure.adsw.io/launch/78383
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1807327
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1807328
Allure report https://allure.adsw.io/launch/78465
Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1817041
Failed job Regression tests with Postgres on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1817039
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1817048
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1818658
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1818659
Can you write some tests to check?
Failed job Build for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1850837
DROP

-- start_ignore
! gpstop -rai;
Maybe add
! gpconfig -r gp_resource_manager;
before this line?
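For illustration, with the suggested reset added, the tail of the snippet above would read roughly as follows (a sketch of the ordering only, not the full spec):

-- start_ignore
! gpconfig -r gp_resource_manager;
! gpstop -rai;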
Failed job Build for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1880757
Failed job Build for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1890216
Allure report https://allure.adsw.io/launch/79897
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1891649
Allure report https://allure.adsw.io/launch/80245
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1911167
Failed job Build ubuntu22 for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1924163
Failed job Build for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1924162
Allure report https://allure.adsw.io/launch/80289
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1926814
Allure report https://allure.adsw.io/launch/80301
Allure report https://allure.adsw.io/launch/80692
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1979177
Allure report https://allure.adsw.io/launch/83439
Allure report https://allure.adsw.io/launch/83499
Allure report https://allure.adsw.io/launch/83582
Allure report https://allure.adsw.io/launch/83693
Allure report https://allure.adsw.io/launch/83850
-- Start a session which will be detached from a group when the query is done
-- resource groups can't see startup chunks occupied by a detached session but
-- the vmem tracker can
0: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid != (SELECT pg_backend_pid());
Why is this query required? What exactly does this test check?
This test checks that the vmem tracker can see the startup chunks of processes which are not attached to any resource group. Session 0 first terminates all other processes that could interfere with the test's results and then creates a table, just to make sure this session is spawned on all segments. The session is then detached from its resource group and is tracked by the vmem tracker only.
The other sessions start transactions so that they stay attached to their resource groups. They allocate all of the memory available to the resource group. The last allocation should fail because of the vmem tracker, which confirms that the vmem tracker really does track the memory of sessions that are not attached to a resource group.
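Sketched in the isolation-test style used above, the flow is roughly the following. The session numbers, table name, allocation sizes, and the allocate_chunks() helper are illustrative placeholders, not the actual spec or UDF from the patch:

-- session 0 clears out other backends and touches every segment, then idles,
-- ending up detached from its group while still holding startup chunks
0: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid != (SELECT pg_backend_pid());
0: CREATE TABLE startup_mem_probe (a int);
-- sessions 1 and 2 keep transactions open so they stay attached to their group
1: BEGIN;
2: BEGIN;
-- grab the group's memory from the attached sessions; the last call is expected
-- to fail on the vmem tracker limit, because session 0's startup chunks are
-- still counted there even though its group no longer sees them
1: SELECT allocate_chunks(100);
2: SELECT allocate_chunks(100);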
What was the expected behavior of this test before the patch?
Maybe add
Yes, this is pretty strange, so I would agree with you. However, I'm not sure about ignoring .sql files, since they are actually tracked, though some are not, as far as I remember.
Allure report https://allure.adsw.io/launch/83966
Tests which have input files in
@bandetto I know that, but there are some tests which are generated from a .source file while the respective .sql is still tracked. Though I agree with you that if a test is generated, its .sql should be ignored.
In order to run the tests, we need the old version of resGroupPalloc. However, to keep the old tests intact, we need to add it under a new name, because the old logic of resGroupPalloc isn't compatible with the new behaviour.
Allure report https://allure.adsw.io/launch/84568
Allure report https://allure.adsw.io/launch/84658
Count startup memory of each process when using resource groups
The goal of the patch is to make the resource group manager track the startup
memory of each process, so that the redzone handler can estimate memory more
accurately. Some context is needed first.

Startup memory is counted correctly by the VMEM tracker, but it is not
accounted for by the resource group manager, because at process startup this
memory is never added to self->memUsage. This happens because startup memory is
counted at VMEM tracker initialization, and the ways the VMEM tracker and
resource groups count memory are not consistent with each other.

This patch adds the startup memory consumption to self->memUsage so that
resource groups take this memory into account.

However, note that once a process is detached from a resource group (it has
finished query execution), the resource group memory manager no longer sees
this memory. This is a consequence of the resource groups' design. We could
have moved inactive processes' memory to freeChunks, but that would drain
freeChunks rather quickly and cause new processes to fail: every process starts
without a resource group and acquires one later, so it must first add its
startup memory to freeChunks. If freeChunks is already exhausted, a new process
fails to start. There are also other complexities in making the resource group
manager fully account for startup memory. Fortunately, the VMEM tracker
calculates it correctly, so where the resource group manager fails to track it,
we can still rely on the VMEM tracker. Solving this fully is out of scope for
this patch, because it is a design flaw rather than a bug.
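One illustrative way to observe the remaining gap (the view names below assume the standard GPDB 6 gp_toolkit and session_state views and are not part of the patch): an idle, detached session's startup chunks still appear in the vmem tracker's per-session accounting, but not in any group's usage.

-- vmem tracker side: per-session memory, including startup chunks of idle sessions
SELECT * FROM session_state.session_level_memory_consumption;
-- resource group side: per-group usage, which does not include detached sessions
SELECT rsgname, memory_usage FROM gp_toolkit.gp_resgroup_status;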