Live metrics and collection rules don't work when the application is also exporting Prometheus metrics #6937

Closed
jwreford99 opened this issue Jul 2, 2024 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@jwreford99
Contributor

jwreford99 commented Jul 2, 2024

Description

I have been having a real head-scratcher trying to get basic metrics out of dotnet monitor. There is a chance this is expected behaviour, but if so I would love to know what I can do about it, so I would be very appreciative of any response 😄

I set up a simple collection rule against our application that should have triggered when a very low memory limit was breached; however, even when memory stayed well above the threshold for a sustained period, the collection rule never fired. I then tried hitting the livemetrics endpoint and got no response (200 OK, but no content).
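
For reference, a memory-threshold rule of the kind described here looks roughly like the following in dotnet monitor configuration (an illustrative sketch only; the rule name, counter, threshold, action, and egress provider are assumptions rather than my exact setup):

"CollectionRules": {
  "HighMemoryRule": {
    "Trigger": {
      "Type": "EventCounter",
      "Settings": {
        "ProviderName": "System.Runtime",
        "CounterName": "working-set",
        "GreaterThan": 200,
        "SlidingWindowDuration": "00:00:10"
      }
    },
    "Actions": [
      {
        "Type": "CollectLiveMetrics",
        "Settings": {
          "Egress": "monitorBlob",
          "Duration": "00:00:30"
        }
      }
    ]
  }
}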

I then tried asking for a specific meter with the request:

{
  "includeDefaultProviders": false,
  "meters": [
    {
      "meterName": "System.Net.NameResolution",
      "instrumentNames": [
        "dns.lookup.duration"
      ]
    }
  ]
}

and started seeing the error "Another metrics collection session is already in progress for the target process." This was the first time I saw an actual error in the logs; otherwise it was all normal log messages.

I have managed to get a minimal reproduction by spinning up https://github.com/djluck/prometheus-net.DotNetRuntime. Specifically, just doing DotNetRuntimeStatsBuilder.Customize().StartCollecting() seems to cause both the livemetrics endpoint to come back empty and the collection rules to stop working.

The application code at its simplest is:
example.txt
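
For illustration, a minimal app of that kind (a sketch only, not the attached file; the port and class names are assumptions, following the prometheus-net.DotNetRuntime README usage) looks something like:

using System;
using System.Threading;
using Prometheus;
using Prometheus.DotNetRuntime;

class Program
{
    static void Main()
    {
        // Collect .NET runtime stats for Prometheus; this single call is enough
        // to reproduce the conflict with dotnet monitor described above.
        using var runtimeStats = DotNetRuntimeStatsBuilder.Customize().StartCollecting();

        // Expose /metrics for Prometheus to scrape (port is illustrative).
        using var metricServer = new MetricServer(port: 9184);
        metricServer.Start();

        Console.WriteLine("Serving Prometheus metrics; press Ctrl+C to exit.");
        Thread.Sleep(Timeout.Infinite);
    }
}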

With the Kubernetes deployment looking like:
example-deployment.txt

My expectation would be:

  • Collection rules and the livemetrics endpoint work correctly
  • Failing that, some error message to indicate that there is a problem
  • (And if that were the case, I would be massively appreciative if you could suggest something I could do differently)

Configuration

dotnet monitor version 8, running on x64 in a Kubernetes cluster.
Monitoring a simple application which exposes metrics in Prometheus format.

Regression?

I did check this against version 7 and it seems to be the same.

Other information

"Another metrics collection session is already in progress for the target process. Concurrent sessions are not supported."
The obvious workaround would be to stop reporting to Prometheus via this method, but I would really like to avoid having to change that 😅!

@jwreford99 jwreford99 added the bug Something isn't working label Jul 2, 2024
@kkeirstead
Member

Hi @jwreford99, hopefully I can help get you unblocked! What you're describing definitely does sound like it should be possible, so hopefully it's just an issue related to configuration/setup and not a bug in dotnet-monitor.

For starters, I ran your sample locally, and also wasn't getting any metrics back - which was surprising, because my personal sample was working right before that with metrics/live metrics/collection rules simultaneously. I did some preliminary debugging, and I think I uncovered at least part of the problem - in your configuration for dotnet monitor, there's a section you can fill in for GlobalCounter. For example, my settings.json looked like:

"GlobalCounter": { "MaxHistograms": 10, "MaxTimeSeries": 1000, "IntervalSeconds": 5 }

Internally, dotnet monitor checks to make sure any metrics that are being returned are on the requested interval - if they're not, we don't capture them. By the looks of it, the DotNetRuntimeStatsBuilder is using a refresh interval of 1 second, so dotnet monitor wouldn't pick up its metrics. Once I updated my configuration to have "IntervalSeconds": 1, I started seeing metrics and live metrics come through.
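
Concretely, the only change from the GlobalCounter snippet above is the interval value:

"GlobalCounter": {
  "MaxHistograms": 10,
  "MaxTimeSeries": 1000,
  "IntervalSeconds": 1
}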

If you can, give this a try and see if it does anything for you - if you still have more issues, let me know and we can dig further into what's going on 😄

@jwreford99
Contributor Author

@kkeirstead you absolute legend! Thanks so much, this is now working for me 😄
May I ask, how did you know that DotNetRuntimeStatsBuilder was using a refresh interval of 1 second? And, if it isn't a completely ridiculous question, why does it matter that a different thing had a different refresh interval? I would expect the value to come directly from the counters, not via the Prometheus stuff.

Potentially a thick question!

@juris-greitans

Hi. I am using a collection rule with a CollectLiveMetrics action and this action never returns any metrics. Setting IntervalSeconds to 1 in the settings file fixes the issue.

@kkeirstead
Member

@kkeirstead you absolute legend! Thanks so much, this is now working for me 😄 May I ask, how did you know that DotNetRuntimeStatsBuilder was using a refresh interval of 1 second? And, if it isn't a completely ridiculous question, why does it matter that a different thing had a different refresh interval? I would expect the value to come directly from the counters, not via the Prometheus stuff.

Potentially a thick question!

Glad to hear that worked! To answer your first question - I only knew that by debugging through dotnet monitor. I'm not sure if there was a way for the end-user to know what the issue was, which may end up being an action item on our part to improve clarity here. For your second question, I'll point you to our docs, which at least give a high-level explanation.

@jwreford99
Contributor Author

Thanks @kkeirstead, given you have answered my question and fixed the issue for me, I will close this off now, but I agree it would definitely be a nice QOL improvement for dotnet monitor to emit some logging to indicate this problem.

Thanks again, your team is great and always so speedy and helpful 😄
