From 8631c7c79a1adffaf769499e68c56adefcd92e7b Mon Sep 17 00:00:00 2001
From: Tobias Werth
Date: Sun, 24 Sep 2023 14:44:45 +0200
Subject: [PATCH] Disarm timer in runguard after child has exited.

If we do not disarm the timer, it may send us a SIGALRM while we are
still busy cleaning up the sandbox.

In these cases, a judging whose wall time is well below the time limit
is nevertheless judged as TLE, e.g.
```
Timelimit exceeded.
runtime: 0.288s cpu, 0.302s wall
memory used: 26066944 bytes
********** runguard stderr follows **********
/opt/domjudge/bin/runguard: warning: timelimit exceeded (hard wall time): aborting command
```

In practice, we saw this happen when running many judgedaemons and the
domserver on a single machine while rejudging the whole contest (i.e.
under quite high load). In that case, the call to
`cgroup_delete_cgroup_ext` sometimes hung for multiple seconds.

To reproduce this easily, add an artificial delay to the clean-up
code, e.g.:
```
const struct timespec artificial_delay = { 10, 0 };
nanosleep(&artificial_delay, NULL);
```
---
 judge/runguard.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/judge/runguard.c b/judge/runguard.c
index a1097b8b7b..2cd7fcc7bc 100644
--- a/judge/runguard.c
+++ b/judge/runguard.c
@@ -1428,8 +1428,23 @@ int main(int argc, char **argv)
 	} else {
 		exitcode = WEXITSTATUS(status);
 	}
+	verbose("child exited with exit code %d", exitcode);
 
-	check_remaining_procs();
+	if ( use_walltime ) {
+		/* Disarm timer we set previously so if any of the
+		 * clean-up steps below are slow we are not mistaking
+		 * this for a wall-time timeout. */
+		itimer.it_interval.tv_sec  = 0;
+		itimer.it_interval.tv_usec = 0;
+		itimer.it_value.tv_sec  = 0;
+		itimer.it_value.tv_usec = 0;
+
+		if ( setitimer(ITIMER_REAL,&itimer,NULL)!=0 ) {
+			error(errno,"disarming timer");
+		}
+	}
+
+	check_remaining_procs();
 
 	double cputime = -1;
 	output_cgroup_stats(&cputime);
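
For reviewers who want to see the race in isolation, the following is a minimal standalone sketch, not code from runguard. It arms a one-shot `ITIMER_REAL`, pretends the child has already exited, and then performs a deliberately slow "clean-up" with or without disarming the timer first. All names (`timer_fired`, `on_alarm`, `arm_walltime_timer`, `disarm_walltime_timer`, `slow_cleanup_hit_by_timer`) are hypothetical, and the `nanosleep` loop merely stands in for a slow call such as `cgroup_delete_cgroup_ext`.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical flag, set when the wall-time timer delivers SIGALRM. */
static volatile sig_atomic_t timer_fired = 0;

static void on_alarm(int sig)
{
	(void)sig;
	timer_fired = 1;
}

/* Install the SIGALRM handler and arm a one-shot ITIMER_REAL that
 * expires after `usec` microseconds. Returns 0 on success. */
int arm_walltime_timer(long usec)
{
	struct sigaction sa;
	struct itimerval itimer;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if ( sigaction(SIGALRM, &sa, NULL)!=0 ) return -1;

	itimer.it_interval.tv_sec  = 0;
	itimer.it_interval.tv_usec = 0;
	itimer.it_value.tv_sec  = usec / 1000000;
	itimer.it_value.tv_usec = usec % 1000000;
	return setitimer(ITIMER_REAL, &itimer, NULL);
}

/* Disarm the timer: an all-zero it_value stops ITIMER_REAL. */
int disarm_walltime_timer(void)
{
	struct itimerval itimer;

	memset(&itimer, 0, sizeof(itimer));
	return setitimer(ITIMER_REAL, &itimer, NULL);
}

/* Simulate "child exited in time, then clean-up is slow".
 * Returns 1 if SIGALRM arrived during clean-up, 0 if not, -1 on error. */
int slow_cleanup_hit_by_timer(long timer_usec, long cleanup_usec,
                              int disarm_first)
{
	struct timespec delay = {
		.tv_sec  = cleanup_usec / 1000000,
		.tv_nsec = (cleanup_usec % 1000000) * 1000
	};

	timer_fired = 0;
	if ( arm_walltime_timer(timer_usec)!=0 ) return -1;

	/* The child would run and exit here, well within the limit. */

	if ( disarm_first && disarm_walltime_timer()!=0 ) return -1;

	/* Stand-in for slow clean-up. nanosleep() returns early with
	 * EINTR when a signal arrives, so resume with the remainder. */
	while ( nanosleep(&delay, &delay)!=0 && errno==EINTR ) { }

	disarm_walltime_timer();
	return timer_fired;
}
```

With a 50 ms timer and a 200 ms clean-up, calling `slow_cleanup_hit_by_timer(50000, 200000, 0)` reports that the timer fired during clean-up, while passing `1` as the last argument (disarm first) does not. This mirrors the spurious TLE described in the commit message and the effect of the fix.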