Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow setting shutdown timeout for primary and audit instances #4822

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

mauroservienti
Copy link
Member

No description provided.

// Windows services allow a maximum of 125 seconds when stopping a service.
// When shutting down or restarting the OS we have no control over the
// shutdown timeout
public TimeSpan ShutdownTimeout { get; set; } = TimeSpan.FromMinutes(2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When running in docker the timeout is 10s on linux and 30s on windows

https://docs.docker.com/reference/cli/docker/container/stop/#:~:text=The%20default%20timeout%20can%20be,30%20seconds%20for%20Windows%20containers.

We should make sure that the SCMU sets this value since it knows the host will be a windows service and similar for our docker compose examples we should set the value as well.

That said I wonder if selecting the "maximum allowed timeout for any of the known hosting environments" is the best thing to do?

This means that where we miss to set this the instance will always be killed by the host

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You say we should not have a default value but always set it according to the running platform. Is this correct?

Another option (which I'm not sure I like) would be to detect somehow in code the hosting platform and set a default value accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You say we should not have a default value but always set it according to the running platform. Is this correct?

Not sure, I think a default is good but not sure what it should be set to.

Regarding detecting inside: I have done some research about figuring out a good "instance id" and concluded that it's easier to set it from the outside since there you know what it would be.

In this case: SCMU => knows that it will be a Windows service we can set it to the maximum value that can be requested

https://stackoverflow.com/questions/13454054/what-is-the-maximum-time-windows-service-wait-to-process-stop-request-and-how-to

Are we requesting additional time when running as a win service?

Side note: I just realized that in docker we run with a separate database so there we can be more aggressive when shutting down since we don't have to wait for the database to stop3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we requesting additional time when running as a win service?

Our understanding is the host does that automatically by setting

services.Configure<HostOptions>(options => options.ShutdownTimeout = settings.ShutdownTimeout);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default should be 2 minutes, here is why:

  • With external mode, 10 seconds should be plenty and the container is resilient to termination. If we would be running for longer then 10 seconds, we likely were not able to adhere to any cancellation anyway.

  • With internal mode, the platform is Windows and we can keep running with 2 minutes. We could even run longer but likely the Windows host isn't configured to allow for it. We WANT to be able to run "cancellation" because we want to ensure the child process gets killed so not cause any issues when the service instance gets restarted.

  • With internal mode but a reboot, in that case we will be killed anyway. We extend stop to allow for the database to gracefully exit. In case of a reboot we'll be killed when taking longer than 30 seconds.

i.e. Setting ShutdownTimeout to 2 minutes is good. In external we would be gracefully existing in a few seconds. If not, we got other problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at the last two commits:

Aren't those more explicit?

// Windows services allow a maximum of 125 seconds when stopping a service.
// When shutting down or restarting the OS we have no control over the
// shutdown timeout
public TimeSpan ShutdownTimeout { get; set; } = TimeSpan.FromMinutes(2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only gives 5 seconds of "cancellation" which I think is sufficient because it should be a sort of let everything go and log about it but do not delay termination

…ume it is Windows, and thus we can set the shutdown timeout to 2 minutes
// When shutting down or restarting the OS we have no control over the
// shutdown timeout. This is by the installer engine that is run _only_ on
// Windows via SCMU or PowerShell
settings.Set(ServiceControlSettings.ShutdownTimeout, "00:02:00");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that familiar with this. This is added during an upgrade to the config?

Comment on lines +184 to +186
// The default value is set to the maximum allowed time by the most restrictive
// hosting platform, which Docker Linux containers.
public TimeSpan ShutdownTimeout { get; set; } = TimeSpan.FromSeconds(10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 seconds is when we get killed, you set the timeout to allow for cancellation.

As in:

  • if we're interested in cancellation, set it to less than the host its kill duration
  • if we're not interested in cancellation, set this to a high value that gives us time to run a graceful termination

With external, I think 5 seconds is already plenty
With internal, I want to wait as long as possible

So I like your last commits, but if we want to see cancellation to take place this number should be even lower:

Suggested change
// The default value is set to the maximum allowed time by the most restrictive
// hosting platform, which Docker Linux containers.
public TimeSpan ShutdownTimeout { get; set; } = TimeSpan.FromSeconds(10);
// The default value is set to the maximum allowed time by the most restrictive
// hosting platform, which Docker Linux containers minus a few seconds
// to actually allow for cancellation and logging to take place
public TimeSpan ShutdownTimeout { get; set; } = TimeSpan.FromSeconds(5);

@andreasohlund
Copy link
Member

Should this PR also update SCMU to set it to a higher value on upgrade? (since there we know its running as a win service)

@mauroservienti
Copy link
Member Author

Sure thing, @andreasohlund. I'll continue working on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants