forked from greenplum-db/gpdb-archive
Commit
Fix panic with 'stuck spinlock' during crash recovery tests (#901)
Problem:
Isolation crash recovery tests (for example, uao_crash_compaction_column and crash_recovery_dtm) failed with a 'stuck spinlock' panic at a very low reproduction rate.

Cause:
The problem occurred in the following scenario:
1. During the test, a backend process was forced into a panic (as required by the test scenario).
2. At the very same moment, the background writer process called 'SIMPLE_FAULT_INJECTOR("fault_in_background_writer_main")', which it does regularly from the loop in 'BackgroundWriterMain'. The bg writer acquired the spinlock inside the fault injector, but before it released it...
3. Postmaster started to send the SIGQUIT signal to all child processes.
4. The background writer process received SIGQUIT and halted its execution without releasing the spinlock.
5. In the background writer process, the SIGQUIT handler 'bg_quickdie' was invoked, and 'bg_quickdie' called 'SIMPLE_FAULT_INJECTOR("fault_in_background_writer_quickdie");'. As the spinlock had not been released, 'bg_quickdie' hung on the spinlock.
6. Postmaster timed out waiting for the child processes to exit. After the timeout it tried to check 'SIMPLE_FAULT_INJECTOR("postmaster_server_loop_no_sigkill")' and hung on the same spinlock as well.
7. Finally, both postmaster and 'bg_quickdie' failed with a 'stuck spinlock' panic.

Fix:
The general rule is that it is NOT allowed to access the fault injector from the postmaster process or from any SIGQUIT handler of a child process while the system is resetting after the crash of some backend. During a reset, postmaster terminates child processes, and it might terminate a process that has acquired the fault injector spinlock but has not released it yet. Any subsequent call to the fault injector API from postmaster or from a SIGQUIT handler will then deadlock. Only 'doomed' processes can still call the fault injector API, as they will soon be terminated anyway.

Based on the above, the following changes are made:
1. Access to the fault injector during a reset is removed from 'bg_quickdie' and from postmaster's ServerLoop.
2. The fts_segment_reset test and related code are redesigned. Instead of sleeping in 'bg_quickdie' for a delay defined by the test, postmaster now sends SIGSTOP to the bg writer and starts a timer for the delay. Once the delay elapses, the timer handler sends SIGCONT and SIGQUIT to the bg writer. This logic is triggered if the postmaster detects the new fault "postmaster_delay_termination_bg_writer" (which replaces "fault_in_background_writer_quickdie" and "postmaster_server_loop_no_sigkill"). This fault is checked only when the postmaster is in the PM_RUN state, so it should be safe to check for it.

New tests specific to this issue are not added because of the unstable nature of the problem.

Changes from original commit:
1. Add a NULL check to timeout.c (borrowed from GPDB 6).
2. Changed comments.

(cherry picked from commit dae2f08)
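
The failure mode described under "Cause" can be illustrated with a minimal, self-contained sketch. This is not the GPDB code: the lock, the handler, MAX_SPINS and simulate_quickdie() are hypothetical names invented for illustration, SIGUSR1 stands in for SIGQUIT, and a bounded spin stands in for the real 'stuck spinlock' detection. It only shows why a signal handler must not take a spinlock that the interrupted code may already hold, and why skipping the fault injector in the handler avoids the problem.

```c
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bound; stands in for the real stuck-spinlock timeout. */
#define MAX_SPINS 1000

/* Stand-in for the spinlock that protects the fault injector state. */
static atomic_flag injector_lock = ATOMIC_FLAG_INIT;

static volatile sig_atomic_t handler_ran = 0;
static volatile sig_atomic_t handler_stuck = 0;
static volatile sig_atomic_t handler_uses_injector = 1;

/* Bounded test-and-set spin: true on acquisition, false if the lock looks stuck. */
static bool spin_acquire(atomic_flag *lock)
{
    for (int i = 0; i < MAX_SPINS; i++)
        if (!atomic_flag_test_and_set(lock))
            return true;
    return false;
}

static void spin_release(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}

/* Plays the role of a quickdie-style handler. */
static void quickdie_handler(int signo)
{
    (void) signo;
    handler_ran = 1;

    /* The rule from the fix: do not touch the fault injector here. */
    if (!handler_uses_injector)
        return;

    /* The old behavior: try to take the lock the interrupted code still holds. */
    if (spin_acquire(&injector_lock))
        spin_release(&injector_lock);
    else
        handler_stuck = 1;      /* the real code would PANIC with 'stuck spinlock' */
}

/* Returns 1 if the handler got stuck on the lock, 0 otherwise. */
int simulate_quickdie(bool use_injector_in_handler)
{
    handler_ran = 0;
    handler_stuck = 0;
    handler_uses_injector = use_injector_in_handler ? 1 : 0;
    signal(SIGUSR1, quickdie_handler);

    /* The "background writer" is inside the fault injector, holding the lock... */
    bool acquired = spin_acquire(&injector_lock);
    assert(acquired);

    /* ...when the termination signal arrives; the handler runs before raise() returns. */
    raise(SIGUSR1);

    /* In the real scenario the process never gets here; the sketch releases for reuse. */
    spin_release(&injector_lock);
    assert(handler_ran);
    return handler_stuck;
}
```

Under these assumptions, simulate_quickdie(true) reports that the handler could not obtain the lock, while simulate_quickdie(false), which follows the rule of not calling into the fault injector from the handler, completes without touching it.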
1 parent 41ad71f, commit d15eade
Showing 5 changed files with 109 additions and 35 deletions.