Disarm timer in runguard after child has exited. #2157

Merged (1 commit) on Sep 30, 2023

meisterT
Member

If we do not disarm the timer, it can still deliver a SIGALRM while we are
busy cleaning up the sandbox, i.e. after the child has already exited.

What you observe in these cases is that a judging whose wall time is well
below the time limit is nevertheless judged as TLE, e.g.
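
To illustrate the idea, here is a minimal sketch of arming and disarming a wall-clock timer. It assumes the timer is a POSIX interval timer armed with `setitimer(ITIMER_REAL, ...)`; the actual runguard code is not shown in this PR description, and the helper names (`arm_wall_timer`, `disarm_wall_timer`, `wall_timer_armed`, `alarm_fired`) are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical flag: stands in for the "timelimit exceeded" code path. */
static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int sig) { (void)sig; alarm_fired = 1; }

/* Arm a one-shot wall-clock timer that delivers SIGALRM after `usec` microseconds. */
static int arm_wall_timer(long usec)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0) return -1;

    struct itimerval it;
    memset(&it, 0, sizeof(it));          /* it_interval = 0: no repeat */
    it.it_value.tv_sec = usec / 1000000;
    it.it_value.tv_usec = usec % 1000000;
    return setitimer(ITIMER_REAL, &it, NULL);
}

/* Disarm the timer: a zeroed it_value stops ITIMER_REAL. */
static int disarm_wall_timer(void)
{
    struct itimerval zero;
    memset(&zero, 0, sizeof(zero));
    return setitimer(ITIMER_REAL, &zero, NULL);
}

/* Report whether ITIMER_REAL still has time remaining. */
static bool wall_timer_armed(void)
{
    struct itimerval cur;
    int rc = getitimer(ITIMER_REAL, &cur);
    assert(rc == 0);                     /* cannot fail with valid arguments */
    (void)rc;
    return cur.it_value.tv_sec != 0 || cur.it_value.tv_usec != 0;
}
```

In this sketch, the disarm call would go right after the `waitpid` on the child returns and before any cleanup starts, so that slow cleanup can no longer be mistaken for a wall-time timeout.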
```
Timelimit exceeded.
runtime: 0.288s cpu, 0.302s wall
memory used: 26066944 bytes
********** runguard stderr follows **********
/opt/domjudge/bin/runguard: warning: timelimit exceeded (hard wall time): aborting command
```

In practice, we saw this happen when running many judgedaemons and the
domserver on a single machine while rejudging the whole contest (i.e.
under quite high load). In that case, the call to
`cgroup_delete_cgroup_ext` sometimes hung for multiple seconds.

To reproduce this easily, you can add an artificial delay to the cleanup
code, e.g. by adding something like:
```
/* Sleep for 10 seconds so the still-armed timer can fire during cleanup. */
const struct timespec artificial_delay = { 10, 0 };
nanosleep(&artificial_delay, NULL);
```
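
For experimenting outside of runguard, the race can also be sketched as a small standalone function. This is only an approximation under stated assumptions: it uses `setitimer(ITIMER_REAL, ...)` as the wall-time timer, a 300 ms "time limit", and a 600 ms sleep standing in for the slow cgroup deletion. The names `spurious_alarm_during_cleanup`, `slow_cleanup` and `got_sigalrm` are hypothetical and do not come from the runguard source.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigalrm = 0;

static void handle_sigalrm(int sig) { (void)sig; got_sigalrm = 1; }

/* Sleep for the full duration, resuming if a signal interrupts nanosleep. */
static void slow_cleanup(long millis)
{
    struct timespec left = { .tv_sec = millis / 1000,
                             .tv_nsec = (millis % 1000) * 1000000L };
    while (nanosleep(&left, &left) == -1 && errno == EINTR) { }
}

/*
 * Run a child that exits immediately, then do a slow cleanup.
 * Returns 1 if SIGALRM arrived (the spurious "timelimit exceeded"),
 * 0 if it did not, and -1 on a setup failure.
 */
static int spurious_alarm_during_cleanup(bool disarm_after_wait)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigalrm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0) return -1;
    got_sigalrm = 0;

    /* Assumed "hard wall time limit" of 300 ms, one-shot. */
    struct itimerval limit;
    memset(&limit, 0, sizeof(limit));
    limit.it_value.tv_usec = 300000;
    if (setitimer(ITIMER_REAL, &limit, NULL) != 0) return -1;

    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) _exit(0);            /* child finishes well within the limit */

    int status;
    while (waitpid(child, &status, 0) == -1) {
        if (errno != EINTR) return -1;
    }

    if (disarm_after_wait) {
        /* The fix: stop the timer as soon as the child has exited. */
        struct itimerval zero;
        memset(&zero, 0, sizeof(zero));
        if (setitimer(ITIMER_REAL, &zero, NULL) != 0) return -1;
    }

    slow_cleanup(600);                   /* stands in for a hanging cgroup deletion */
    return got_sigalrm ? 1 : 0;
}
```

Calling `spurious_alarm_during_cleanup(false)` is expected to report the alarm even though the child exited almost immediately, while `spurious_alarm_during_cleanup(true)` should not, provided the machine is not so loaded that the child itself takes longer than the assumed limit.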

@edomora97 (Contributor) left a comment

Makes total sense! Thanks!

@meisterT merged commit 85e8f9e into DOMjudge:main on Sep 30, 2023
21 checks passed