ADBDEV-6156 Count startup memory of each process when using resource groups #1023
base: adb-6.x-dev
Failed job Deploy multiarch Dockerimages: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1807317
Allure report https://allure.adsw.io/launch/78383
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1807327
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1807328
Allure report https://allure.adsw.io/launch/78465
Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1817041
Failed job Regression tests with Postgres on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1817039
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1817048
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1818658
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1818659
Can you write some tests to check?
Failed job Build for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1850837
DROP

-- start_ignore
! gpstop -rai;
Maybe add
! gpconfig -r gp_resource_manager;
before this line?
This is done in the disable_resgroup test.
Failed job Build for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1880757
Failed job Build for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1890216
Allure report https://allure.adsw.io/launch/79897
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1891649
Allure report https://allure.adsw.io/launch/80245
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1911167
Failed job Build ubuntu22 for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1924163
Failed job Build for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1924162
Allure report https://allure.adsw.io/launch/80289
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1926814
Allure report https://allure.adsw.io/launch/80301
Allure report https://allure.adsw.io/launch/86597
Allure report https://allure.adsw.io/launch/86650
4a4a3c2 to 524afa5
Allure report https://allure.adsw.io/launch/90016
Allure report https://allure.adsw.io/launch/90168
-- The runaway detector test. A query with a large number of slices should
-- be terminated due to high memory consumption.
select count(*) from t1 a1 join t1 a2 using(a) join t1 a3 using(a) join t1 a4 using(a) join t1 a5 using(a) join t1 a6 using(a) join t1 a7 using(a) join t1 a8 using(a) join t1 a9 using(a) join t1 a10 using(a);
ERROR: Canceling query because of high VMEM usage. current group id is 712716, group memory usage 133 MB, group shared memory quota is 102 MB, slot memory quota is 0 MB, global freechunks memory is 277 MB, global safe memory threshold is 277 MB (runaway_cleaner.c:197) (seg1 slice10 172.18.0.3:6003 pid=88018) (runaway_cleaner.c:197)
Please rewrite the test so that the memory consumption error occurs for different reasons before and after the patch.
Allure report https://allure.adsw.io/launch/90809
Allure report https://allure.adsw.io/launch/90926
src/test/regress/regress_gp.c
Outdated
if (startUpMbRemains > 0)
{
    size = Max(0, size - startUpMbRemains);
    startUpMbRemains = Max(0, startUpMbRemains - size);
}
The current code calls MemoryContextAlloc even when size is 0.
With the input startUpMbRemains = 12 and size = 4, the output is size = Max(0, 4 - 12) = 0 and startUpMbRemains = Max(0, 12 - 0) = 12, but startUpMbRemains should be 8.
-if (startUpMbRemains > 0)
-{
-    size = Max(0, size - startUpMbRemains);
-    startUpMbRemains = Max(0, startUpMbRemains - size);
-}
+if (startUpMbRemains >= size)
+{
+    startUpMbRemains -= size;
+    PG_RETURN_INT32(0);
+}
+size -= startUpMbRemains;
+startUpMbRemains = 0;
done
"but after the backend is assigned a resource group, this memory is not counted as consumed by the group." This has been fixed, so let's write it in the past tense.
 * startup memory consumption, but let it be just for symmetry.
 */
void
ResGroupProcSubStartupChunks(int32 chunks)
What about removing this function and calling ResGroupProcAddStartupChunks(-startupChunks)
instead of ResGroupProcSubStartupChunks(startupChunks)?
Alternatively, remove the chunks argument and use VmemTracker_GetStartupChunks().
removed.
@@ -11,6 +11,10 @@
--
-- end_matchsubs

-- start_ignore
! gpstop -rai;
Why is this line added?
reverted
@@ -135,7 +139,7 @@ SELECT num_running FROM gp_toolkit.gp_resgroup_status WHERE rsgname='rg_move_que
1&: SELECT pg_sleep(3);
2: SET ROLE role_move_query_mem_small;
2: BEGIN;
-2: SELECT hold_memory_by_percent_on_qe(1,0.1);
+2: SELECT hold_memory_by_percent_on_qe(1,0.2);
Why is 0.1 replaced with 0.2 here and below?
reverted
ALTER RESOURCE GROUP admin_group SET memory_shared_quota 0;
ALTER RESOURCE GROUP default_group SET memory_shared_quota 0;

create resource group rg1 with (cpu_rate_limit=20, memory_limit=15, memory_shared_quota=100, memory_spill_ratio=0);
Do we really need memory_spill_ratio?
removed
ALTER RESOURCE GROUP admin_group SET memory_shared_quota 0;
ALTER RESOURCE GROUP default_group SET memory_shared_quota 0;
Why do we need to change memory_shared_quota?
removed
Allure report https://allure.adsw.io/launch/91028
Allure report https://allure.adsw.io/launch/91031
Allure report https://allure.adsw.io/launch/91084
Count the startup memory of each active process when using resource groups
Make the resource manager track the startup memory of each active backend so
that the runaway detector estimates memory consumption more accurately.
Startup memory is the memory a backend consumes after it starts but before the
memory managers (the Vmem tracker and resource groups) are initialized. The Vmem
tracker counts this memory as consumed by the segment, but once the backend was
assigned a resource group, this memory was not counted as consumed by the group.
This patch adds the startup memory consumption to self->memUsage so that
resource groups take this memory into account.
Additionally, this patch slightly modifies the resGroupPalloc function so that
it takes startup memory into account. This is necessary to avoid changing or
complicating the logic of existing tests.
Note that this patch fixes memory accounting only for active backends (those
executing a query). Accounting for the memory occupied by idle backends is a
more complex task that should be done separately.