
WIP: Add more testing + memory stats from cgroups #10

Closed

Conversation

hamiltont
Contributor

@hamiltont hamiltont commented Feb 3, 2020

PR adds:

  • Reports unit memory metrics by reading memory.stat from cgroupfs (a parsing sketch follows this description)
  • Pulls cgroup code into distinct package to prep for upstreaming
  • Improves TravisCI setup - example
    • Adds end-to-end integration test for two live systemd versions
    • Adds unit tests
  • Increases scrape speed by about 15% by reducing dbus calls
  • Uploads code coverage - example

Remaining TODOs

  • Decide what memory stats we should export. There are a lot. @povilasv any thoughts you have would be appreciated here. It seems silly to start exporting 20 new metrics per unit so a decision should be made w.r.t. what memory metrics are most impactful
  • Cleanup docs
  • Cleanup cgroup/cgroup.go file

fixes #2
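
For readers unfamiliar with cgroupfs, the sketch below shows roughly what reading memory.stat involves. It is a minimal illustration only, assuming a cgroup v1 layout; the selected keys, type names, and function names are illustrative and are not the ones this PR's cgroup package actually uses.

```go
// Illustrative only; the PR's real cgroup package differs.
package cgroup

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// MemoryStat holds a handful of the counters exposed by cgroupfs memory.stat.
// Which of the ~20 available keys to export is exactly the open question above.
type MemoryStat struct {
	Cache       uint64 // page cache, in bytes
	RSS         uint64 // anonymous + swap cache, in bytes
	MajorFaults uint64 // pgmajfault counter
}

// ReadMemoryStat parses <mountPoint>/memory/<unitPath>/memory.stat (cgroup v1
// layout), which is a plain "key value" per-line file.
func ReadMemoryStat(mountPoint, unitPath string) (*MemoryStat, error) {
	f, err := os.Open(filepath.Join(mountPoint, "memory", unitPath, "memory.stat"))
	if err != nil {
		return nil, err
	}
	defer f.Close()

	stat := &MemoryStat{}
	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) != 2 {
			return nil, fmt.Errorf("unexpected memory.stat line %q", s.Text())
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, err
		}
		switch fields[0] {
		case "cache":
			stat.Cache = v
		case "rss":
			stat.RSS = v
		case "pgmajfault":
			stat.MajorFaults = v
		}
	}
	return stat, s.Err()
}
```

For example, ReadMemoryStat("/sys/fs/cgroup", "system.slice/sshd.service") would return the selected counters for the sshd unit on a typical cgroup v1 host.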

The main problem is that this does not support test coverage: we are using an
external binary as the binary-under-test, while coverage works by compiling a custom
binary with coverage-tracking logic when running go test. It's not simple to force
compilation of those tracking statements into an external binary, so as long as we
use one we cannot collect coverage data.

A minor issue is that the process launch time is relatively slow. That is not a
problem when running 2-3 tests, but it may become an issue down the road.

This is a minimal proof-of-concept for how to test the exporter while
also getting code coverage details. It should also be a bit faster.

Bad things: if something ever goes wrong, cleaning up will be
much harder because we can no longer just kill an external process. So
arguably this could be a bad thing, or in the future we could split
testing into smoke tests that can run in this manner and larger
tests that should be run against an external process. But for now I'm happy
with this; I like seeing code coverage reports :-)

While it's tempting to think this magically enables parallelism in the
test suite, remember we are sharing the global default HTTP server here.
It would not be too hard to change that, and also to auto-update the port
numbers, but it's overkill for now. I'll just run things serially until it
becomes an issue.

Replaces the 'call an external binary' approach with the new
'call a handler function in main.go' approach. Much faster.

This commit also shows a few example test cases just to get
the ball rolling.
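
A minimal sketch of what such an in-process test could look like, assuming the handler exposed by main.go is the stock promhttp handler from client_golang; the wiring and assertions here are illustrative, not the PR's actual test code.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// TestMetricsEndpoint exercises the metrics handler in-process instead of
// launching an external exporter binary, so `go test -cover` sees the code paths.
func TestMetricsEndpoint(t *testing.T) {
	// promhttp.Handler() stands in for whatever handler main.go registers.
	srv := httptest.NewServer(promhttp.Handler())
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/metrics")
	if err != nil {
		t.Fatalf("scrape failed: %v", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		t.Fatalf("want status %d, got %d", http.StatusOK, resp.StatusCode)
	}

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		t.Fatalf("reading body: %v", err)
	}
	// The default registry always exposes the Go runtime collectors.
	if !strings.Contains(string(body), "go_goroutines") {
		t.Errorf("expected go_goroutines in /metrics output")
	}
}
```
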
CircleCI's focus on testing-inside-Docker means it does not have a good
testing-in-VM experience. Only one Ubuntu version is supported, and
upgrading it on each CI run is both wasteful and a huge pain to get
correct (we only want to upgrade golang, but they ship a customized
version of Ubuntu, so the typical backported-PPA approach does not work
out of the box). As a nice side bonus, Travis CI is mostly OSS while
CircleCI is not, so we can read the source if we have any issues with Travis.
See golangci/golangci-lint#658 for details

Summary - golangci-lint, built with Go 1.12, will generate this error when linting Go 1.13 code
@povilasv
Contributor

povilasv commented Feb 6, 2020

Hey 👋 let me know when it's ready for review :)

@hamiltont changed the title from "WIP: Add more testing + memory stats from cgroups" to "Add more testing + memory stats from cgroups" on Feb 6, 2020
@hamiltont
Contributor Author

@povilasv Sure, anytime now should be OK. This PR is big already (sorry, seems to be my bad habit), so let's aim to clean it up as needed and get it merged before the next push ;-) I could use any insight you have on "what memory values matter enough to export as metrics"

@hamiltont
Contributor Author

hamiltont commented Feb 6, 2020

Also, a second concern - what should the exported metrics be named? node_exporter moved to using the pattern node_memory_Capname_metric (see the screenshot from one of my prometheus consoles). I could not find a good discussion on why they changed their naming pattern, or why they export so many metric names instead of just having something like node_memory{stat="anon"}. Regardless, this is a good time to consider how systemd_exporter wants its time series for memory metrics to be organized.

[Screenshot: node_memory_* metric names in one of my Prometheus consoles, 2020-02-06]

@povilasv
Contributor

Will review this week thanks for this :)

Re naming, I am not sure why node_exporter does this, but I think we should follow https://prometheus.io/docs/practices/naming/ and do whatever makes sense for our use case and query patterns.

I feel like a stat="anon" label would make one metric super big and suffer from a huge cardinality explosion, so maybe a metric per stat.

@SuperQ Would be great to get some insights from you regarding naming :)

@SuperQ
Contributor

SuperQ commented Feb 10, 2020

Labels only make sense when there is an aggregation that can be applied. Using a label for node_memory_... doesn't make sense because the various values don't add up or average together. For example: sum(node_memory_bytes) doesn't make any sense since MemTotal + MemAvailable bytes is nonsensical.

Having separate metrics also helps for more typical use cases like this:

node_memory_MemAvailable_bytes /
node_memory_MemTotal_bytes

If it were a label, you would have to do something crazy like this:

sum without (stat) (node_memory_bytes{stat="MemAvailable"}) /
sum without (stat) (node_memory_bytes{stat="MemTotal"})

@hamiltont
Contributor Author

hamiltont commented Feb 10, 2020 via email

@baryluk

baryluk commented Feb 10, 2020

My opinion on metrics selection: I do not see harm in essentially exporting all of them. If they are constant, they will be compressed well by Prometheus and consume almost no resources.

The more interesting question is which metrics to keep as separate metrics/gauges, and which ones to group together under separate labels. I think it is useful to group some metrics into a single gauge with separate labels if it makes sense to 1) visualise all of them at the same time, or 2) sum them to get a total of something.

From a quick look, this only applies to a few metrics.

  1. rss and rss_huge could be put together, unless rss from cgroups already accounts for huge allocations.

  2. Faults: it might make sense to use a single counter with minor and major label values (sketched below). But I expect major to be what most people look at, with minor providing almost no information while often being significantly bigger, which would drown out the actual signal. So it might still be beneficial to keep them separate.

As for the other metrics, I think they should all be separate (because aggregating over them doesn't make much sense), but I didn't look too deeply into this, and there might be some exceptions.
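
To make the two shapes concrete, here is a hedged client_golang sketch; the descriptor names, labels, and helper function are illustrative only, not what this PR (or systemd_exporter) actually exports.

```go
// Illustrative descriptors only; not the PR's actual metric names.
package sketch

import "github.com/prometheus/client_golang/prometheus"

var (
	// Option A: one metric per stat (the node_exporter style).
	unitMemoryRSSBytes = prometheus.NewDesc(
		"systemd_unit_memory_rss_bytes",
		"Resident set size of the unit's cgroup, from memory.stat.",
		[]string{"name"}, nil,
	)

	// Option B: one metric with a label, only sensible where summing the
	// label values means something (e.g. minor + major page faults).
	unitMemoryFaultsTotal = prometheus.NewDesc(
		"systemd_unit_memory_page_faults_total",
		"Page faults of the unit's cgroup, split by severity.",
		[]string{"name", "type"}, nil, // type = "minor" or "major"
	)
)

// emit shows how a Collector's Collect method would publish both shapes.
func emit(ch chan<- prometheus.Metric, unit string, rss, minor, major float64) {
	ch <- prometheus.MustNewConstMetric(unitMemoryRSSBytes, prometheus.GaugeValue, rss, unit)
	ch <- prometheus.MustNewConstMetric(unitMemoryFaultsTotal, prometheus.CounterValue, minor, unit, "minor")
	ch <- prometheus.MustNewConstMetric(unitMemoryFaultsTotal, prometheus.CounterValue, major, unit, "major")
}
```

Option B keeps minor and major faults in one metric family, so sum by (name) over it yields total faults, which is the kind of aggregation that would justify a label here.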

@SuperQ
Contributor

SuperQ commented Feb 10, 2020

Thanks @baryluk

One thing to point out: if a metric has parts that can be summed and a "total" is also exposed as another part, Prometheus best practices say to avoid exposing the total. This allows cleaner use of sum(metric).

@baryluk

baryluk commented Feb 10, 2020

Thanks @baryluk

One thing to point out: if a metric has parts that can be summed and a "total" is also exposed as another part, Prometheus best practices say to avoid exposing the total. This allows cleaner use of sum(metric).

Sounds good!

Following the Prometheus best practices is a very good idea to reduce user confusion; the document itself is logical, short, and well written, and helps avoid common mistakes with naming metrics and labels.

However, regarding your comment, I can't find a general guideline in the Prometheus best practices about not exporting a total metric when sum(other_metric) provides the same information (modulo minor atomicity differences, moving the aggregation cost from one process to another, and a bit of extra memory/storage).

I checked https://prometheus.io/docs/practices/naming/

cgroup/cgroup.go Outdated (review thread resolved)
// path appends the given path elements to the filesystem path, adding separators
// as necessary.
func (fs FS) path(p ...string) string {
	return filepath.Join(append([]string{string(fs.mountPoint)}, p...)...)
}

cgroup/cgroup.go Outdated
Comment on lines 109 to 111
// if cgroupUnified != unifModeUnknown {
// return cgroupUnified, nil
// }
Contributor


Let's get rid of commented out code


switch fs.Type {
case cgroup2SuperMagic:
	log.Debugf("Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy")
Contributor


As this is a package, consider not logging, or allow users to choose their own logging package; more info: https://dave.cheney.net/2015/11/05/lets-talk-about-logging
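
One common way to follow that advice is sketched below; it is an assumption-laden illustration, not the PR's eventual solution, and the FS fields shown are simplified stand-ins for the real struct. The idea is to let callers inject a minimal logger interface and default to a no-op.

```go
// Hypothetical shape; the real FS struct in cgroup/cgroup.go differs.
package cgroup

// Logger is the minimal logging surface the package needs; callers can adapt
// logrus, zap, the stdlib log package, or pass nil for silence.
type Logger interface {
	Debugf(format string, args ...interface{})
}

type nopLogger struct{}

func (nopLogger) Debugf(string, ...interface{}) {}

// FS carries an injected logger instead of calling a global log package.
type FS struct {
	mountPoint string
	logger     Logger
}

// NewFS returns an FS rooted at mountPoint; a nil logger disables logging.
func NewFS(mountPoint string, logger Logger) FS {
	if logger == nil {
		logger = nopLogger{}
	}
	return FS{mountPoint: mountPoint, logger: logger}
}
```

A caller can then pass an adapter around whatever logger the exporter already uses, while library users who want silence pass nil.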

cgroup/cgroup_test.go Outdated (review thread resolved)

}

func testMain(wg *sync.WaitGroup) *http.Server {
Contributor


Consider moving this into a testing package.

Comment on lines +109 to +118
// b, err := ioutil.ReadAll(resp.Body)
// if err != nil {
// return nil, err
// }
// if err := resp.Body.Close(); err != nil {
// return nil, err
// }
// if want, have := http.StatusOK, resp.StatusCode; want != have {
// return nil, fmt.Errorf("want /metrics status code %d, have %d. Body:\n%s", want, have, b)
// }
Contributor


Let's remove commented out code

if found != "service" {
t.Errorf("Bad unit name parsing. Wanted %s got %s", "service", found)
}

Contributor


Let's get rid of empty space

break
}
err = c.collectUnitCPUMetrics(*cgroupPath, conn, ch, unit)
if err != nil {
Contributor


Suggested change:
-	if err != nil {
+	if err != nil && parseUnitType(unit) != "socket" {

if err != nil {
	// Most sockets do not have a cpu cgroupfs entry, but a few big ones do (notably docker.socket). Quiet down
	// error reporting if error came from a socket
	if parseUnitType(unit) != "socket" {
Contributor


Suggested change (drop the now-redundant inner check):
-	if parseUnitType(unit) != "socket" {

@SuperQ
Contributor

SuperQ commented Feb 14, 2020

@baryluk It's under the labels section of "writing exporters".

https://prometheus.io/docs/instrumenting/writing_exporters/#labels

Exports the ability to manually create a new cgroup filesystem struct,
as well as the various mount modes. Cleans up documentation
@hamiltont changed the title from "Add more testing + memory stats from cgroups" to "WIP: Add more testing + memory stats from cgroups" on Feb 24, 2020
@hamiltont
Contributor Author

Retitled as WIP while I work through the feedback

@hectorhuertas

@hamiltont thanks for all the amazing work you've done here. Do you have plans to resume work on this PR? Those cgroup rss metrics are very appealing :)

@hamiltont
Contributor Author

hamiltont commented Apr 29, 2020 via email

@talyz

talyz commented Sep 13, 2021

What's the status here? I've been trying this out for a while and it really makes the metrics much more useful. I had to add a small patch to ignore services with the RemainAfterExit property (talyz@c4f6128) in order for it to not spam my logs, but that's the only issue I've had with it so far.

@hamiltont
Contributor Author

hamiltont commented Sep 13, 2021 via email

@talyz

talyz commented Sep 14, 2021

I see :) Yes, it's really useful - great work!

Unfortunately, I don't think I can be of much help - I only know golang at a very basic level. I hope someone can, though.

@anarcat

anarcat commented May 10, 2022

I filed bug #46 about how memory usage is reported incorrectly by this exporter. Would this PR fix that problem?

@SuperQ
Contributor

SuperQ commented Jul 19, 2022

After a very long time, we've moved this to prometheus-community. If you're still interested in working on this, please rebase this change against the latest commits.

@oseiberts11

I tried to "rebase this change against the latest commits." but there were a lot of merging problems, but the result as it is (squashed, because there were simply way too many conflicts in the commit-by-commit rebase) is in #66 .

@SuperQ SuperQ closed this Dec 12, 2022
Successfully merging this pull request may close these issues.

Please read memory usage data from cgroups