Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a nginxprocess package for working with NGINX processes #956

Open
wants to merge 3 commits into
base: v3
Choose a base branch
from

Conversation

ryepup
Copy link
Contributor

@ryepup ryepup commented Dec 27, 2024

Proposed changes

Add a new nginxprocess package to export useful utilities around working with NGINX processes, and update internals to use that package. NGINXaaS for Azure has some proprietary code that also wants to use these utilities.

This comes with a significant speed improvement by filtering for NGINX processes earlier and fetching less data from the OS. See individual commits for more details.

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING document
  • I have run make install-tools and have attached any dependency changes to this pull request
  • If applicable, I have added tests that prove my fix is effective or that my feature works
  • If applicable, I have checked that any relevant tests pass after adding my changes
  • If applicable, I have updated any relevant documentation (README.md)
  • If applicable, I have tested my cross-platform changes on Ubuntu 22, Redhat 8, SUSE 15 and FreeBSD 13

@ryepup ryepup requested a review from a team as a code owner December 27, 2024 19:50
@github-actions github-actions bot added the chore Pull requests for routine tasks label Dec 27, 2024
Copy link
Contributor

github-actions bot commented Dec 27, 2024

✅ All required contributors have signed the F5 CLA for this PR. Thank you!
Posted by the CLA Assistant Lite bot.

@ryepup
Copy link
Contributor Author

ryepup commented Dec 27, 2024

I have hereby read the F5 CLA and agree to its terms

@ryepup ryepup force-pushed the export-pid-features branch from 22a996f to 39ba7e4 Compare December 27, 2024 20:01
@ryepup
Copy link
Contributor Author

ryepup commented Dec 27, 2024

I'm not sure why the pipeline is failing; make unit-test and make performance-test work if I run them locally, and I didn't change the otel collector section.

@ryepup ryepup force-pushed the export-pid-features branch from 39ba7e4 to eed2042 Compare December 27, 2024 20:18
@@ -39,7 +36,7 @@ func (nhw *NginxHealthWatcher) Health(ctx context.Context, instance *mpi.Instanc
}

proc, err := nhw.processOperator.Process(ctx, instance.GetInstanceRuntime().GetProcessId())
if errors.Is(err, process.ErrorProcessNotRunning) {
if nginxprocess.IsNotRunningErr(err) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm matching the existing behavior here, but there's a slim chance that instance.GetInstanceRuntime().GetProcessId() is a running process but not an NGINX process. Should we add another branch using nginxprocess.IsNotNginxErr to catch that case?

@ryepup ryepup force-pushed the export-pid-features branch from eed2042 to f27ef37 Compare December 27, 2024 20:34
pkg/nginxprocess/process.go Outdated Show resolved Hide resolved
pkg/nginxprocess/process.go Outdated Show resolved Hide resolved
pkg/nginxprocess/process.go Outdated Show resolved Hide resolved
loadStatus bool
}

// Option customizes how processes are gathered from the OS.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe learning for me, the interface for apply seems extraneous to me. Why not just

type Option func(opts *options)

func WithStatus(v bool) Option {
	return func(opts *options) {
		opts.loadStatus = v
	}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the "functional options" implementation we settled on NGINXaaS. It does take more code, but sets a stronger boundary between exported and unexported symbols. This is a similar approach used by opentelemetry.

The downside of type Option func(opts *options) is that it exposes the unexported options type as part of the exported Option type. There's not much a caller can do with that information, but it's exposing details we might want to change in the future without having to think about compatibility.

The other (more hypothetical) benefit is flexibility. In the unlikely event this package grows a lot of more features, we could add another option interface (e.g maybe we split into FindOption and ListOption). WithStatus could return a type that implements two interfaces, but that's not possible with func types.

If this code were in an internal package, I'd probably just use type Option func(opts *options), as all the callers are known. Since this is a public codebase in a package meant to be consumed by other software, I figured I'd err on the side of paranoia.

@ryepup ryepup force-pushed the export-pid-features branch 3 times, most recently from 0235a9b to 2a306bf Compare December 30, 2024 19:06
This package is intended for use by any software that wants to
identify NGINX processes.

It's mostly a refactor of the existing `process.ProcessOperator` with
a few differences and some features wanted by NGINXaaS.

- minimizes `gopsutil` calls, many of which end up shelling out to
`ps`. `nginxprocess.List` filters out non-NGINX processes
early. `process.ProcessOperator` gathered details for all processes,
and then non-NGINX processes were filtered out later by
`instance.NginxProcessParser`. `nginxprocess.List` also fetches less
process information, keeping only what the agent codebase actually
used.

A quick benchmark on my laptop (~750 processes, 12 of which are NGINX)
showed a drastic speed improvement:

| Benchmark   | Iterations | Time (ns/op) | Memory (B/op) | Allocations (allocs/op) |
| ----------- | ---------- | ------------ | ------------- | ----------------------- |
| old         | 1          | 26832273195  | 18282768      | 133775                  |
| new         | 9          | 114558263    | 3439672       | 12836                   |

- uses a functional options pattern to optionally fetch process
status, which is only needed in a few places and saves some more
shelling out to `ps`. This could be expanded in the future to support
other options while retaining backwards compatibility.

- includes some error testing funcs to let callers handle different
errors without needing to know how those errors are implemented. This
helps trap `gopsutil` implementation details in `package nginxprocess`
and makes future `gopsutil` upgrades easier.

- respects context cancellation a little faster
Swap internal packages to use the shared `nginxprocess` package for
details about NGINX processes.

This is mostly switching `model.Process` for `nginxprocess.Process`,
updating tests, and using features of `nginxprocess. There is one
non-trivial change: `model.Process.Running` has been dropped, there is
no equivalent in `nginxprocess`.

`Running` was almost always true, because it was calculated immediately after
a successful call to `NewProcessWithContext`. In order for it to be
false, a process would have to end after `NewProcessWithContext`, but
before `IsRunningWithContext`, which is an incredibly tiny time
window. There is also no guarantee that the process is _still_ running
by the time callers check `Running`, so the field was a little
misleading. The system now relies solely on errors from
`NewProcessWithContext` to identify terminated processes. Worst case
we catch a dead process on the next tick of `HealthWatcherService`
Existing names were really similar to imported package names or other
variables.
@ryepup ryepup force-pushed the export-pid-features branch from 2a306bf to fe5c421 Compare December 30, 2024 19:43
@ryepup
Copy link
Contributor Author

ryepup commented Dec 30, 2024

I'm not sure why the pipeline is failing; make unit-test and make performance-test work if I run them locally, and I didn't change the otel collector section.

unit-test fixed itself, and the performance-test was my fault; I added a benchmark that failed if there were no nginx processes. I switched that up so hopefully it'll be happy now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chore Pull requests for routine tasks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants