DOTNET-5968 Add extra information to /metrics endpoint (SUP-5887) #228

Merged: 3 commits, Nov 18, 2024

@mattwarren mattwarren commented Nov 14, 2024

https://contrast.atlassian.net/browse/DOTNET-5968

From https://contrast.atlassian.net/browse/SUP-5887?focusedCommentId=492466:

Create a new release of the Agent Operator with the following:
- Additional information exposed via the /metrics endpoint (still researching how we do some of this)
  - Native cpu/memory usage of the Operator
  - Pod cpu/memory limits from the K8s API
  - "IsLeader" flag
- Extra logging to diagnose the uncaught .NET Exception that is causing the restart
  - We think it’s an "Out of Memory" exception, but need to confirm this
  - Currently dmesg just shows `.NET ThreadPool[1547673] general protection fault ip:7f54186ec50f sp:7f539e2d0c60 error:0 in libc.so.6`

@mattwarren mattwarren requested a review from a team as a code owner November 14, 2024 13:56
@mattwarren (Contributor, Author) commented:

I've tested this locally and it looks okay:

// http://localhost:5001/api/v1/metrics
{
  "UptimeSeconds": 94.9121163,
  "Resources.DaemonSetResource.NamespacesCount": 1,
  "Resources.DaemonSetResource.ResourcesCount": 1,
  "Resources.DeploymentResource.NamespacesCount": 1,
  "Resources.DeploymentResource.ResourcesCount": 1,
  "Resources.SecretResource.NamespacesCount": 1,
  "Resources.SecretResource.ResourcesCount": 1,
  "Resources.PodResource.NamespacesCount": 1,
  "Resources.PodResource.ResourcesCount": 9,
  "Resources.Global.NamespacesCount": 1,
  "Resources.Global.ResourcesCount": 12,
  "Injected.PodsCount": 0,
  "Performance.Gen2GCCount": 0,
  "Performance.Gen1GCCount": 0,
  "Performance.PercentTimeinGCsincelastGC": 0,
  "Performance.WorkingSet": 141,
  "Performance.CPUUsage": 0,
  "Performance.AllocationRate": 5149584,
  "Performance.Gen0Size": 292784,
  "Performance.MonitorLockContentionCount": 6,
  "Performance.NumberofMethodsJitted": 10935,
  "Performance.Gen1Size": 10442536,
  "Performance.ILBytesJitted": 690771,
  "Performance.POHPinnedObjectHeapSize": 212128,
  "Performance.GCHeapSize": 20,
  "Performance.NumberofAssembliesLoaded": 189,
  "Performance.GCCommittedBytes": 29,
  "Performance.ExceptionCount": 290,
  "Performance.NumberofActiveTimers": 26,
  "Performance.ThreadPoolQueueLength": 0,
  "Performance.LOHSize": 567672,
  "Performance.ThreadPoolThreadCount": 6,
  "Performance.TimespentinJIT": 635.7404000000001,
  "Performance.GCFragmentation": 3.630939892821068,
  "Performance.ThreadPoolCompletedWorkItemCount": 543,
  "Performance.Gen0GCCount": 0,
  "Performance.Gen2Size": 576,
  "IsLeader": "True"
}
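
For anyone consuming this payload, here is a minimal sketch of grouping the flat dotted keys by their first segment. It assumes the response body is plain JSON (the `//` URL line above is only an annotation, not part of the body) and that `IsLeader` is serialized as the string `"True"`/`"False"`, as in the sample. The `group_metrics` helper and the trimmed `sample` are illustrative only, not part of the Operator.

```python
import json
from collections import defaultdict


def group_metrics(payload: str) -> dict:
    """Group flat dotted metric keys by their first segment (hypothetical helper)."""
    flat = json.loads(payload)
    grouped = defaultdict(dict)
    for key, value in flat.items():
        prefix, _, rest = key.partition(".")
        if rest:
            # e.g. "Performance.WorkingSet" -> grouped["Performance"]["WorkingSet"]
            grouped[prefix][rest] = value
        else:
            # Keys without a dot (UptimeSeconds, IsLeader) go under the "" group.
            grouped[""][key] = value
    return dict(grouped)


# Trimmed copy of the output above; values are taken from that sample.
sample = """{
  "UptimeSeconds": 94.9121163,
  "Resources.PodResource.ResourcesCount": 9,
  "Performance.WorkingSet": 141,
  "IsLeader": "True"
}"""

grouped = group_metrics(sample)
# IsLeader arrives as a string in the sample, so compare against "True".
is_leader = grouped[""]["IsLeader"] == "True"
```

Note that a truthiness check on `IsLeader` would be wrong here, since the string `"False"` is truthy in Python.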

@mattwarren mattwarren marked this pull request as draft November 14, 2024 16:51
@mattwarren mattwarren changed the title from "Add extra information to /metrics endpoint (SUP-5887)" to "DOTNET-5968 Add extra information to /metrics endpoint (SUP-5887)" Nov 15, 2024
@mattwarren (Contributor, Author) commented:

Added detailed metrics with the `Process.` prefix; the new output is below.
Note that I tested this in a local cluster with `replicas: 2` and the `IsLeader` value works as expected, i.e. one Pod has `"IsLeader": "True"` and the other has `"IsLeader": "False"`.

{
  "Injected.PodsCount": 0,
  "Performance.AllocationRate": 1552280,
  "Performance.CPUUsage": 0,
  "Performance.ExceptionCount": 48,
  "Performance.GCCommittedBytes": 0,
  "Performance.GCFragmentation": 0,
  "Performance.GCHeapSize": 39,
  "Performance.Gen0GCCount": 0,
  "Performance.Gen0Size": 0,
  "Performance.Gen1GCCount": 0,
  "Performance.Gen1Size": 0,
  "Performance.Gen2GCCount": 0,
  "Performance.Gen2Size": 0,
  "Performance.ILBytesJitted": 875597,
  "Performance.LOHSize": 0,
  "Performance.MonitorLockContentionCount": 8,
  "Performance.NumberofActiveTimers": 18,
  "Performance.NumberofAssembliesLoaded": 179,
  "Performance.NumberofMethodsJitted": 13013,
  "Performance.PercentTimeinGCsincelastGC": 0,
  "Performance.POHPinnedObjectHeapSize": 0,
  "Performance.ThreadPoolCompletedWorkItemCount": 534,
  "Performance.ThreadPoolQueueLength": 0,
  "Performance.ThreadPoolThreadCount": 6,
  "Performance.TimespentinJIT": 16.364399999999932,
  "Performance.WorkingSet": 164,
  "Resources.DaemonSetResource.NamespacesCount": 1,
  "Resources.DaemonSetResource.ResourcesCount": 1,
  "Resources.DeploymentResource.NamespacesCount": 2,
  "Resources.DeploymentResource.ResourcesCount": 2,
  "Resources.Global.NamespacesCount": 2,
  "Resources.Global.ResourcesCount": 16,
  "Resources.PodResource.NamespacesCount": 2,
  "Resources.PodResource.ResourcesCount": 11,
  "Resources.SecretResource.NamespacesCount": 2,
  "Resources.SecretResource.ResourcesCount": 2,
  "UptimeSeconds": 315.2051509,
  "Process.WorkingSet64": 169861120,
  "Process.MinWorkingSet": 0,
  "Process.MaxWorkingSet": 536870912,
  "Process.PeakWorkingSet64": 169861120,
  "Process.PrivateMemorySize64": 253161472,
  "Process.VirtualMemorySize64": 6317158400,
  "Process.PeakVirtualMemorySize64": 6334402560,
  "Process.PagedMemorySize64": 0,
  "Process.PeakPagedMemorySize64": 0,
  "Process.NonpagedSystemMemorySize64": 0,
  "Process.TotalProcessorTime": "00:00:05.2300000",
  "Process.UserProcessorTime": "00:00:04.9100000",
  "Process.PrivilegedProcessorTime": "00:00:00.3200000",
  "Process.Thread": 19,
  "Process.Modules": 151,
  "IsLeader": "False"
}
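
As a rough illustration of how these fields could be used, the sketch below checks that exactly one replica reports itself as leader and derives an average CPU fraction from `Process.TotalProcessorTime` and `UptimeSeconds`. It assumes the processor-time strings always look like `hh:mm:ss.fffffff` (no day component) and that `UptimeSeconds` roughly matches the process lifetime. The helper names and the `pod_b` payload are hypothetical; only `pod_a` uses values from the output above.

```python
from datetime import timedelta


def parse_timespan(text: str) -> timedelta:
    """Parse an 'hh:mm:ss.fffffff' string; a leading day component is not handled."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def average_cpu_fraction(metrics: dict) -> float:
    """CPU seconds consumed divided by uptime seconds (1.0 means one full core)."""
    cpu_seconds = parse_timespan(metrics["Process.TotalProcessorTime"]).total_seconds()
    return cpu_seconds / metrics["UptimeSeconds"]


def leader_count(pods: list) -> int:
    """Count payloads whose IsLeader string equals 'True'."""
    return sum(1 for metrics in pods if metrics["IsLeader"] == "True")


# pod_a: values copied from the output above.
pod_a = {
    "UptimeSeconds": 315.2051509,
    "Process.TotalProcessorTime": "00:00:05.2300000",
    "Process.WorkingSet64": 169861120,
    "IsLeader": "False",
}
# pod_b: made-up payload standing in for the second replica.
pod_b = {
    "UptimeSeconds": 310.0,
    "Process.TotalProcessorTime": "00:00:06.0000000",
    "Process.WorkingSet64": 170000000,
    "IsLeader": "True",
}

leaders = leader_count([pod_a, pod_b])
cpu_fraction_a = average_cpu_fraction(pod_a)
```

With `replicas: 2`, a `leaders` value other than 1 would suggest a leader-election problem, or that the two payloads were scraped during a handover.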

@mattwarren mattwarren marked this pull request as ready for review November 15, 2024 12:39
@gamingrobot gamingrobot merged commit 8bf3e3f into master Nov 18, 2024
17 checks passed
@gamingrobot gamingrobot deleted the extra-metrics branch November 18, 2024 18:55