-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[usm] service discovery, system-probe ignore configured services by name #30702
[usm] service discovery, system-probe ignore configured services by name #30702
Conversation
Test changes on VMUse this command from test-infra-definitions to manually test this PR changes on a VM: inv create-vm --pipeline-id=48395348 --os-family=ubuntu Note: This applies to commit a777efa |
@@ -0,0 +1,51 @@ | |||
// Unless explicitly stated otherwise all files in this repository are licensed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it in a new file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source file impl_linux.go already reached 600 lines, which I think it considered large file, I didn't want to increase this file further, this is why new file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is misleading to have methods of the same struct defined on multiple files
let's fold it back to the original file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// ignoreServices is a list of service names that should not be reported as a service. | ||
var ignoreServices = map[string]struct{}{ | ||
"datadog-agent": {}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why only datadog agent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is meantime only service defined in scope of the ticket to be bypassed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is pretty much irrelevant
Customers with the datadog agent don't set DD_SERVICE for us, and we don't do that by default
So I'm concerned the current implementation does not provide any actual value and impact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to delete this PR and close Jira ticket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, I want to improve the ticket description, and make the implementation better for the general case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated Jira ticket and title of this PR. Now this PR allows to add excluded service name to system-probe config. The config has one default service "datadog-agent"
pkg/collector/corechecks/servicediscovery/module/ignore_proc.go
Outdated
Show resolved
Hide resolved
pkg/collector/corechecks/servicediscovery/module/ignore_proc.go
Outdated
Show resolved
Hide resolved
pkg/collector/corechecks/servicediscovery/module/ignore_proc_test.go
Outdated
Show resolved
Hide resolved
… addIgnoredPid(proc.Pid).
Regression Detector |
func (s *discovery) shouldIgnoreService(name string, proc *process.Process) bool { | ||
s.mux.Lock() | ||
_, found := ignoreServices[name] | ||
if found { | ||
s.ignorePids[proc.Pid] = true | ||
} | ||
s.mux.Unlock() | ||
|
||
return found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a function called should....
should only report true/false, and not changing an internal state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this function
s.mux.Lock() | ||
s.ignorePids[pid] = true | ||
s.mux.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always prefer defered unlock, it guarantees future modifications won't miss the unlock
s.mux.Lock() | |
s.ignorePids[pid] = true | |
s.mux.Unlock() | |
s.mux.Lock() | |
defer s.mux.Unlock() | |
s.ignorePids[pid] = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very short frequently used function, 'defer' instruction requires additional allocation, which is not necessary in this case. This function is not expected to grow significantly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a very negligible optimization.
Please follow the effective go guidelines https://go.dev/doc/effective_go#defer
Those are meant to reduce human errors and mistakes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this function to use 'defer'
|
||
// shouldIgnoreService saves pid of the process if the service should be excluded from handling | ||
// and returns true for such process. | ||
func (s *discovery) shouldIgnoreService(name string, proc *process.Process) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to get Process
here, if you care only about the PID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this code and separated ignored service and ignored pid
makeAlias(t, test.comm, serverBin) | ||
bin := filepath.Join(serverDir, test.comm) | ||
cmd := exec.CommandContext(ctx, bin) | ||
cmd.Env = append(os.Environ(), test.envs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why os.Environ
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
err := cmd.Start() | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err := cmd.Start() | |
require.NoError(t, err) | |
require.NoError(t, cmd.Start()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// wait until the service name becomes available | ||
info, err := discovery.getServiceInfo(proc) | ||
assert.NoError(collect, err) | ||
assert.NotEmpty(collect, info.ddServiceName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're expecting for a certain value (based on the environment variable value), use it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this unit test intends to check that Pid has been bypassed, there are other unit test covering service name extraction. I do not think this unit test should check match of service name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then why do you test ddServiceName
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added service name verification
if s.shouldIgnoreService(name, proc) { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't you add the pid to the ignore list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this code and separated ignored service and addition of ignored pid
The ticket and PR should not focus merely on DD_SERVICE, but to filter entries by service name Please fix the code, PR description and title accordingly |
// loadIgnoredServices saves names that should not be reported as a service | ||
func (config *discoveryConfig) loadIgnoredServices(services []string) { | ||
if len(services) == 0 { | ||
log.Debug("loading ignored services found empty commands list") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy-paste bug, commands list
is inaccurate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
…trace-agent, process-agent, datadog-cluster-agent.
/merge |
Devflow running:
|
@@ -404,6 +404,7 @@ func InitSystemProbeConfig(cfg pkgconfigmodel.Config) { | |||
cfg.BindEnvAndSetDefault(join(discoveryNS, "enabled"), false) | |||
cfg.BindEnvAndSetDefault(join(discoveryNS, "cpu_usage_update_delay"), "60s") | |||
cfg.BindEnvAndSetDefault(join(discoveryNS, "ignored_command_names"), []string{"chronyd", "cilium-agent", "containerd", "dhclient", "dockerd", "kubelet", "livenessprobe", "local-volume-pr", "sshd", "systemd"}) | |||
cfg.BindEnvAndSetDefault(join(discoveryNS, "ignored_services"), []string{"datadog-agent", "trace-agent", "process-agent", "datadog-cluster-agent"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add security-agent
and other processes listed in cmd/*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APM Distributed Tracing team provided us with some test environment with service discovery enabled, I took some DD services from that environment. Apparently security is not enabled there, that's why I don't see that service-agent in the discovery list. I would be careful about adding names that I don't know how they show up in discovery. This is why I prefer not to extend this list at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuri-lipnesh - Bryce is mentioning another DD product that it is not enabled in your environment. Please open a follow up PR to address that, and run PR in dogfooding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created USMON-1339
What does this PR do?
This code adds a hashmap to store processes that should be excluded from processing based on the service name.
This code adds excludes service name 'datadog-agent' from processing.
Motivation
USMON-1226
Some processes cannot be excluded based on the command name, which may be a generic name such as "agent".
In this case, a service can be excluded based on the service name. When a service name is identified, the process ID is stored in an internal table to exclude it from processing.
Describe how to test/QA your changes
This code passed service discovery unit test on local VM
PASS
ok github.com/DataDog/datadog-agent/pkg/collector/corechecks/servicediscovery/module 38.766s
Started couple of programs on local VM and dogfeeded to dddev environment.
Checked that services were reported on APM page "Identify and close your observability gaps"
Checked that services are not reported anymore after adding services to the list of bypassed services.
Possible Drawbacks / Trade-offs
Additional Notes