forked from greenplum-db/diskquota-archive
Commit
ADBDEV-3685 Error handling for diskquota worker startup stage (#20)
During the diskquota worker's first run, the initial set of active tables and their sizes is loaded from the diskquota.table_size table in order to warm up the diskquota rejectmap and other shared memory objects. If an error occurs during this initialization, it is silently ignored in the PG_CATCH() block. As a result, local_active_table_stat_map is not filled properly, and on the next loop iteration the tables that are not in the active table list are marked as irrelevant and scheduled for deletion from both table_size_map and the table_size table in the flush_to_table_size function. When the initial set of active tables is huge (thousands of tables), ignoring this error can produce a DELETE statement too long for the SPI executor to process within its memory limits. This can lead to a segmentation fault in the worker or other erroneous behavior of the whole extension.

This commit adds handling for initialization errors that occur during the worker's first run. A bool field "corrupted" is added to the DiskquotaDBEntry structure to indicate that the worker was unable to initialize itself on the given database. The DiskquotaDBEntry is now also passed from the worker main loop to the refresh_disk_quota_model function, because the state of the dbEntry has to be changed there. The state is changed when the refresh_disk_quota_usage function catches an error raised during the initialization step in its PG_CATCH() block: the "corrupted" flag is set in the given dbEntry, and the error is then rethrown. This terminates the worker process. The launcher will not start the worker again, because the flag is set in the database structure and is checked inside the disk_quota_launcher_main function. The flag can be reset by calling the resetBackgroundWorkerCorruption function, which is currently called from the SIGHUP handler.
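The error path described above can be modeled in a few lines of Python 3. This is a simplified, hypothetical sketch, not the extension's C code: DBEntry, InitializationError, refresh_usage, databases_to_start, reset_corruption and stale_tables are illustrative names that only mirror DiskquotaDBEntry, the PG_CATCH() handling in refresh_disk_quota_usage, the check in disk_quota_launcher_main, resetBackgroundWorkerCorruption, and the irrelevant-table selection in flush_to_table_size. It also assumes that the reset clears the flag on every database entry.

```python
from dataclasses import dataclass


class InitializationError(Exception):
    """Hypothetical stand-in for an ERROR raised while loading diskquota.table_size."""


@dataclass
class DBEntry:
    """Hypothetical stand-in for the C DiskquotaDBEntry structure."""
    dbname: str
    corrupted: bool = False  # set when the worker's first run failed


def stale_tables(table_size_map, active_table_stat_map):
    """Tables the next flush would treat as irrelevant and delete.

    If the first-run load failed silently, active_table_stat_map is empty
    and every known table lands in the DELETE list.
    """
    return sorted(t for t in table_size_map if t not in active_table_stat_map)


def refresh_usage(entry, is_first_run, load_initial_tables):
    """Models refresh_disk_quota_usage: flag the entry, then re-raise."""
    try:
        if is_first_run:
            load_initial_tables(entry.dbname)
    except InitializationError:
        entry.corrupted = True  # remembered so the launcher can skip this db
        raise  # propagating the error terminates the worker


def databases_to_start(entries):
    """Models the launcher check: corrupted databases get no new worker."""
    return [e.dbname for e in entries if not e.corrupted]


def reset_corruption(entries):
    """Models resetBackgroundWorkerCorruption (assumed to clear every entry)."""
    for e in entries:
        e.corrupted = False


if __name__ == "__main__":
    def failing_loader(dbname):
        raise InitializationError(f"cannot load table sizes for {dbname}")

    entries = [DBEntry("db1"), DBEntry("db2")]
    try:
        refresh_usage(entries[0], True, failing_loader)
    except InitializationError:
        pass  # in the real extension the worker process exits here
    print(databases_to_start(entries))  # ['db2']
    reset_corruption(entries)  # what the SIGHUP handler triggers
    print(databases_to_start(entries))  # ['db1', 'db2']
    print(stale_tables({"t1": 10, "t2": 20}, {}))  # ['t1', 't2']
```

Re-raising instead of swallowing keeps the failure visible: the worker exits, the launcher skips the database until a reload, and no DELETE is ever built from an incomplete active-table map.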
1 parent be945ba
commit 3b06e37
Showing 7 changed files with 133 additions and 15 deletions.
@@ -0,0 +1,46 @@
--
-- Tests for error handling when the worker catches the error during
-- its first run.
--

-- Function checking whether worker on given db is up
CREATE or REPLACE LANGUAGE plpython2u;
CREATE
CREATE or REPLACE FUNCTION check_worker_presence(dbname text, wait_time int) RETURNS boolean AS $$ import psutil import time worker_name = 'bgworker: [diskquota] ' + dbname time.sleep(wait_time) for proc in psutil.process_iter(): try: if 'postgres' in proc.name().lower(): for val in proc.cmdline(): if worker_name in val: return True except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess): pass return False $$ LANGUAGE plpython2u EXECUTE ON MASTER;
CREATE

-- Test diskquota behavior when an error occurs during the worker's first run.
-- The error leads to process termination. And launcher won't start it again
-- until extension reload or SIGHUP signal.
CREATE EXTENSION diskquota;
CREATE
SELECT check_worker_presence(current_database(), 0);
check_worker_presence
-----------------------
t
(1 row)
SELECT gp_inject_fault('diskquota_worker_initialization', 'error', dbid) FROM gp_segment_configuration WHERE role='p' AND content=-1;
gp_inject_fault
-----------------
Success:
(1 row)
SELECT diskquota.init_table_size_table();
init_table_size_table
-----------------------

(1 row)
SELECT check_worker_presence(current_database(), current_setting('diskquota.worker_timeout')::int / 2);
check_worker_presence
-----------------------
f
(1 row)
-- Reload configuration and check that worker is up again
!\retcode gpstop -u;
(exited with code 0)
SELECT check_worker_presence(current_database(), current_setting('diskquota.worker_timeout')::int / 2);
check_worker_presence
-----------------------
t
(1 row)
DROP EXTENSION diskquota;
DROP
@@ -0,0 +1,40 @@
--
-- Tests for error handling when the worker catches the error during
-- its first run.
--

-- Function checking whether worker on given db is up
CREATE or REPLACE LANGUAGE plpython2u;
CREATE or REPLACE FUNCTION check_worker_presence(dbname text, wait_time int)
RETURNS boolean
AS $$
import psutil
import time
worker_name = 'bgworker: [diskquota] ' + dbname
time.sleep(wait_time)
for proc in psutil.process_iter():
    try:
        if 'postgres' in proc.name().lower():
            for val in proc.cmdline():
                if worker_name in val:
                    return True
    except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
        pass
return False
$$ LANGUAGE plpython2u EXECUTE ON MASTER;

-- Test diskquota behavior when an error occurs during the worker's first run.
-- The error leads to process termination. And launcher won't start it again
-- until extension reload or SIGHUP signal.
CREATE EXTENSION diskquota;
SELECT check_worker_presence(current_database(), 0);
SELECT gp_inject_fault('diskquota_worker_initialization', 'error', dbid)
FROM gp_segment_configuration WHERE role='p' AND content=-1;
SELECT diskquota.init_table_size_table();
SELECT check_worker_presence(current_database(),
current_setting('diskquota.worker_timeout')::int / 2);
-- Reload configuration and check that worker is up again
!\retcode gpstop -u;
SELECT check_worker_presence(current_database(),
current_setting('diskquota.worker_timeout')::int / 2);
DROP EXTENSION diskquota;
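The test sleeps for half of diskquota.worker_timeout and then scans the process list once. A possible alternative, not part of this commit, is to poll until the expected state is reached, so a passing check can return early. The helper below is a generic Python 3 sketch; wait_until and its parameters are hypothetical names. Inside check_worker_presence the predicate would presumably be the existing psutil scan, and for the "worker is gone" check the predicate should be negated, because an early scan might still see the worker before it exits. Under plpython2u, time.time would have to replace time.monotonic, which exists only in Python 3.

```python
import time


def wait_until(predicate, timeout, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it is true or timeout seconds have elapsed.

    Returns True as soon as predicate() is true, or False once the deadline
    passes. With timeout=0 the predicate is checked exactly once.
    clock and sleep are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(interval, remaining))  # never sleep past the deadline
```

For example, with a hypothetical worker_present(db) wrapper around the psutil loop, wait_until(lambda: worker_present(db), timeout) returns as soon as the worker appears, while wait_until(lambda: not worker_present(db), timeout) waits for it to disappear.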