Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to better kill the nested process tree of spawned processes? #2371

Closed
bahmutov opened this issue Mar 8, 2021 · 12 comments
Closed

How to better kill the nested process tree of spawned processes? #2371

bahmutov opened this issue Mar 8, 2021 · 12 comments
Labels
stale type: feature code contributing to the implementation of a feature and/or user facing functionality

Comments

@bahmutov
Copy link

bahmutov commented Mar 8, 2021

Which problem is this feature request solving?

Our users have noticed that nested backend child processes are not killed when running Cypress end-to-end tests during the build steps. For example, if we start a backend server using npm run develop which starts the gatsby develop process then only the top-level child process is killed (because utils.run is just execa.run which does childProcess.kill via https://nodejs.org/api/child_process.html#child_process_subprocess_kill_signal which does not kill "grand" children.

image

Describe the solution you'd like

It would be nice to kill somehow the entire process tree

Describe alternatives you've considered

Netlify build just shuts down everything, maybe warning about these orphan processes, but does not show it as a dangerous error

Can you submit a pull request?

No.

@bahmutov bahmutov added the type: feature code contributing to the implementation of a feature and/or user facing functionality label Mar 8, 2021
@ehmicky
Copy link
Contributor

ehmicky commented Mar 8, 2021

Hi @bahmutov,

Thanks for reaching out on this.

The behavior you are describing is the current behavior (although the red color should probably be updated to yellow instead):

Netlify build just shuts down everything, maybe warning about these orphan processes, but does not show it as a dangerous error

Please note that the lingering processes logic needs some serious improvements, which are outlined in https://github.com/netlify/pod-workflow/issues/74. It is possible that those improvements might at least partially solve some of the problems you are describing.

Please also note that terminating a process tree is not as trivial as it first sounds. This is the second most requested feature in Execa. We tried implementing twice, but both PRs did not complete.

The main problem is for the build system to be able to distinguish programmatically between the two following cases:

  1. A user intentionally lets their build command (or a plugin author lets their plugin) end, even though some child process is still ongoing, and should complete its task for the build to properly finish. In this instance, we want to warn users that they probably did not "await" that child process properly and that our current behavior (terminating it) might break their site.
  2. A library is known to "leak" child processes, but does it in a non-erroneous fashion. In this instance, our current approach would be to recommend patching this library so it terminates its child processes.

One question I am wondering is: wouldn't the problem above still be a problem when Cypress is run locally (without Netlify)? Namely, would the gatsby develop process still be ongoing when Cypress exits? If so, is there an option to terminate child processes in Cypress, to ensure that running Cypress does not leave some background processes behind?

@ehmicky
Copy link
Contributor

ehmicky commented Mar 9, 2021

Note: this problem was reported here as well.

@bahmutov
Copy link
Author

bahmutov commented Mar 9, 2021

I agree, there is no good option and this concerns the other Linux CIs - they just don't panic and shut down the container :)

@ehmicky
Copy link
Contributor

ehmicky commented Mar 9, 2021

they just don't panic

At the moment, the build in Netlify is not panicking. Currently, the only impact of lingering processes is the presence of this warning message in the build logs. We might be sending the wrong message with the poor choice of the red color instead of yellow though.

Netlify is not waiting for those processes to end: the container is shut down right away.

Another improvement we could do would be to ignore specific programs which are known to fall in the 2. category. For example, in the case above, we could choose not to warn about gatsby develop still running in the background. However, there is a chance this might actually be a user error (1. category), in which case we might miss a chance to warn the user about fixing their site setup.

(cc @mheffner who was involved in earlier discussions about this topic)

@mheffner
Copy link
Contributor

mheffner commented Mar 9, 2021

While not the best way to handle this, I did write an experimental build plugin to handle this: https://github.com/mheffner/netlify-plugin-process-cleanup

However, we should likely kill those in the netlify build framework.

@ehmicky
Copy link
Contributor

ehmicky commented Mar 10, 2021

@mheffner From https://github.com/netlify/buildbot/issues/984, it appears that we currently terminate lingering processes. However, this is done as part of terminating the container it seems (so terminating it inside @netlify/build would be more proper as you mention).

I was curious about your thoughts on the warning message? Is this something we still intend to print?

(Also tagging @fool who was part of the above issue)

@mheffner
Copy link
Contributor

@ehmicky Yeah, we don't currently terminate them in the build. IMO we should terminate all of them before proceeding to the upload phase as it can cause inconsistent behavior if something is still modifying files while uploading.

I think we should still print the warning, even if in the future we kill them before proceeding in the build process. If a user unintentionally left something running, it could be the source of a inconsistent deploy and the message would be helpful for debugging. We would change color though if we thought red was too much.

I would also consider excluding some known repeat offenders (looking at you gatsby telemetry) from the list if we find them running. If we know they are harmless we don't need to alert users as loudly IMO.

@ehmicky
Copy link
Contributor

ehmicky commented Mar 10, 2021

Yeah, we don't currently terminate them in the build.

You're right, thanks for correcting (I implied the opposite in my previous comment, but this was incorrect). I agree we should be explicit about terminating those.

I mentioned above about some of the potential problems in using process trees to figure out which processes to terminate. For example, when the tree is deep and some intermediary process exits, their children would be inherited by the init process (i.e. become zombie processes), which can complicate tracking them.

However, we might have a solution: comparing the flat list of processes before/after the build command/plugins, and terminating any new processes. This approach is outlined in this issue and is easier to implement now that the logic is in @netlify/build.

I would also consider excluding some known repeat offenders (looking at you gatsby telemetry) from the list if we find them running. If we know they are harmless we don't need to alert users as loudly IMO.

💯
This issue was filed to implement this (including for Gatsby telemetry).

I think we should still print the warning, even if in the future we kill them before proceeding in the build process. If a user unintentionally left something running, it could be the source of a inconsistent deploy and the message would be helpful for debugging. We would change color though if we thought red was too much.

👍


Going back to your original problem @bahmutov, would the following course of action help with Cypress?

  • Terminating any process started by the build command/plugins, that is still running at the end of the build. Note: the current warning will be printed. (issue)
  • Replacing the warning color from red to yellow (Improve warnings message color #2389)
  • Do not warn about specific processes which are known to linger in many builds (issue)

@bahmutov
Copy link
Author

I say yellow color is a good step :)

@ehmicky
Copy link
Contributor

ehmicky commented Apr 9, 2021

Hi @bahmutov,

We've just released some improvements to this warning message which should make it clearer. The color is now yellow and the message has been improved. Also the detection itself has been improved.

We've also added a support guide in Netlify Community to explain why this message appears and how to remove it.
Please note that this is only a warning message, i.e. it does not impact builds (except for the logs).

We make sure that any lingering processes are terminated at the end of the build, as you suggested in your initial comment. However, when we do, we print that warning message because it might indicate an error that users might want to fix. For example, they might spawn a process and forget to await it, which might break their site.

This gets a little clumsy when that behavior is performed by a widely used library (such as gatsby develop). In that case, many users see this warning message but are not able to directly fix it. For those cases, we maintain a hardcoded list of commands that are ignored before printing this warning message.

Please let me know what you think about that approach.

@github-actions
Copy link
Contributor

This issue has been automatically marked as stale because it has not had activity in 1 year. It will be closed in 14 days if no further activity occurs. Thanks!

@github-actions github-actions bot added the stale label Oct 11, 2022
@github-actions
Copy link
Contributor

This issue was closed because it had no activity for over 1 year.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale type: feature code contributing to the implementation of a feature and/or user facing functionality
Projects
None yet
Development

No branches or pull requests

3 participants