
Conversation

@Finomosec Finomosec commented Apr 30, 2024

Added missing mdadm stats:

  • node_md_disks (added the {state="down"} label value)
  • node_md_sync_time_remaining (seconds)
  • node_md_blocks_synced_speed
  • node_md_blocks_synced_pct

Notes:

  • One drive was not being shown because it was in state="down" (recovering), a state that was not reported in the output.
  • Using node_md_blocks_synced / node_md_blocks as the progress percentage produced wrong results on my system, because the array's total block count differs from the number of blocks that actually need to be synced. This may be due to the RAID level in use (RAID 5).
md0 : active raid5 sdf1[4] sde1[1] sdc1[2] sdb1[0]
      14650718208 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [===================>.]  recovery = 99.9% (4882207424/4883572736) finish=7.8min speed=2908K/sec
      bitmap: 2/37 pages [8KB], 65536KB chunk
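
To put numbers on that (a sketch only; it assumes node_md_blocks reports the array size and node_md_blocks_synced the recovery position, both in mdstat's block units):

# naive ratio against the array's total block count:
#   4882207424 / 14650718208 ≈ 0.333  (reads as ~33%, clearly wrong)
node_md_blocks_synced / node_md_blocks

# ratio against the blocks that actually need to be synced (4883572736):
#   4882207424 / 4883572736 ≈ 0.9997  (matches the 99.9% shown by mdstat)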

Finomosec and others added 7 commits May 1, 2024 11:51
changed sync_minutes_remaining to sync_time_remaining (in seconds)

Signed-off-by: Frederic <[email protected]>
fixed code formatting

Signed-off-by: Frederic <[email protected]>
@Finomosec
Author

@SuperQ
I'm done for now.
Feel free to merge it at any time.

Comment on lines +114 to +119
blockSyncedSpeedDesc = prometheus.NewDesc(
    prometheus.BuildFQName(namespace, "md", "blocks_synced_speed"),
    "current sync speed (in Kilobytes/sec)",
    []string{"device"},
    nil,
)
Member

This doesn't seem necessary, we should be able to compute this from something like rate(node_md_blocks_synced[1m]) * <blocksize>.

Suggested change
blockSyncedSpeedDesc = prometheus.NewDesc(
    prometheus.BuildFQName(namespace, "md", "blocks_synced_speed"),
    "current sync speed (in Kilobytes/sec)",
    []string{"device"},
    nil,
)
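
For illustration, a rough PromQL sketch of that idea (assuming mdstat counts 1 KiB blocks, so the result would be in bytes per second):

# approximate resync speed in bytes/sec, derived from the existing blocks-synced metric;
# the 1024 factor is an assumption about mdstat's block unit
rate(node_md_blocks_synced[1m]) * 1024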

Author

@Finomosec Finomosec May 9, 2024


I think it is useful. It is the CURRENT speed, as shown in /proc/mdstat.
I have it showing on my Grafana board.
Plus, I guess <blocksize> is not included in the data, so it would require additional configuration for each md device.

Contributor

Note that the groundwork has already been laid for #1085, and we probably should not add any new parsing functionality relating to /proc/mdstat.

@SuperQ
Member

SuperQ commented May 7, 2024

Maybe instead of exposing the sync percent, we should expose the "TODO" blocks value. This way the completion ratio can be correctly calculated as node_md_blocks_synced / node_md_blocks_synced_todo.
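
If such a metric were added, related quantities could also be derived in PromQL rather than parsed from mdstat (a sketch, using the hypothetical node_md_blocks_synced_todo gauge):

# completion ratio
node_md_blocks_synced / node_md_blocks_synced_todo

# rough estimate of the remaining sync time in seconds
(node_md_blocks_synced_todo - node_md_blocks_synced) / rate(node_md_blocks_synced[5m])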

Frederic and others added 2 commits May 9, 2024 11:23
renamed pct to percent

Co-authored-by: Ben Kochie <[email protected]>
Signed-off-by: Frederic <[email protected]>
added unit "seconds"

Co-authored-by: Ben Kochie <[email protected]>
Signed-off-by: Frederic <[email protected]>
@Finomosec
Author

Finomosec commented May 9, 2024

Maybe instead of exposing the sync percent, we should expose the "TODO" blocks value. This way the completion ratio can be correctly calculated as node_md_blocks_synced / node_md_blocks_synced_todo.

That was my first idea, too, but the data-source (https://github.com/prometheus/procfs/blob/master/mdstat.go) does not (yet) capture/expose this value.

Also, node_md_blocks_synced_todo is not a good name. todo sounds like remaining, which is not correct.
Maybe to_be_synced would suffice.

But hey! We could calculate it using blocks_synced and the percentage.
What do you think, should we do this?

... but it might be imprecise, especially for low percentage values, plus it might yield slightly different results over time, which would be kind of awkward.

So maybe better not after all.
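
(For reference, the back-calculation would look roughly like this; the metric name and the numbers are only illustrative:)

# estimate the total-to-be-synced blocks from the synced count and the reported percentage
#   e.g. 4882207424 / 0.999 ≈ 4887094518, vs. the real 4883572736
node_md_blocks_synced / (node_md_blocks_synced_percent / 100)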

I added a request to add it:
prometheus/procfs#636

@Finomosec Finomosec requested a review from SuperQ May 9, 2024 09:39
@discordianfish
Member

Yeah I agree, let's add the TODO blocks to procfs

@SuperQ
Member

SuperQ commented May 14, 2024

Released updated procfs: https://github.com/prometheus/procfs/releases/tag/v0.15.0

@JoaoPPCastelo

Hi @Finomosec

Found your dashboard and was interested in adding it to my Grafana, but saw that this PR is still open. Any plans to finish the remaining work and get the changes into the node exporter?

Thanks

@Finomosec
Author

Finomosec commented Apr 30, 2025

Hello @JoaoPPCastelo, thanks for your interest.

It seems I missed the updated procfs :-)
I will see what I can do. Maybe this week.

Greetings

P.S. I have been using my own branched version in the meantime.

@sklyfr

sklyfr commented Jul 10, 2025

Hi, I see you have made (IMO) the best Grafana chart for RAID.
Asking for this to be merged, due to the upkeep associated with maintaining a forked node_exporter outside Helm.

@discordianfish
Member

@Finomosec Sorry for dropping the ball on this. Can you update this branch?

@discordianfish
Member

See failing checks

@Finomosec
Author

I fixed the problems in prometheus/procfs#737; please merge that first so this PR can work.
