Fragmentation is always 0 #2

Open · sdalu opened this issue Sep 8, 2023 · 14 comments

sdalu commented Sep 8, 2023

I don't have fragmentation information on vdev=root, only on the vdevs below it.
In the following example, fragmentation appears on vdev=root/raidz-0:

zpool_stats,name=data,state=ONLINE,vdev=root alloc=3011023380480u,free=43151285116928u,size=46162308497408u,read_bytes=64605253632u,read_errors=0u,read_ops=7770490u,write_bytes=1044095565824u,write_errors=0u,write_ops=95370763u,checksum_errors=0u,fragmentation=0u 1694174753643743077
zpool_stats,name=data,state=ONLINE,vdev=root/raidz-0 alloc=3011023380480u,free=43151285116928u,size=46162308497408u,read_bytes=64604286976u,read_errors=0u,read_ops=7770476u,write_bytes=959581487104u,write_errors=0u,write_ops=85064917u,checksum_errors=0u,fragmentation=5u 1694174753643743077
zpool_stats,name=data,state=ONLINE,path=/dev/gpt/data:1,vdev=root/raidz-0/disk-0 alloc=0u,free=0u,size=0u,read_bytes=11241701376u,read_errors=0u,read_ops=1293875u,write_bytes=159979962368u,write_errors=0u,write_ops=14239436u,checksum_errors=0u,fragmentation=0u 1694174753643743077
zpool_stats,name=data,state=ONLINE,path=/dev/gpt/data:2,vdev=root/raidz-0/disk-1 alloc=0u,free=0u,size=0u,read_bytes=9667366912u,read_errors=0u,read_ops=1229915u,write_bytes=159665225728u,write_errors=0u,write_ops=14166177u,checksum_errors=0u,fragmentation=0u 1694174753643743077
zpool_stats,name=data,state=ONLINE,path=/dev/gpt/data:3,vdev=root/raidz-0/disk-2 alloc=0u,free=0u,size=0u,read_bytes=10595532800u,read_errors=0u,read_ops=1320407u,write_bytes=160059457536u,write_errors=0u,write_ops=14141508u,checksum_errors=0u,fragmentation=0u 1694174753643743077
zpool_stats,name=data,state=ONLINE,path=/dev/gpt/data:4,vdev=root/raidz-0/disk-3 alloc=0u,free=0u,size=0u,read_bytes=11593834496u,read_errors=0u,read_ops=1302155u,write_bytes=159870656512u,write_errors=0u,write_ops=14208789u,checksum_errors=0u,fragmentation=0u 1694174753643743077
zpool_stats,name=data,state=ONLINE,path=/dev/gpt/data:5,vdev=root/raidz-0/disk-4 alloc=0u,free=0u,size=0u,read_bytes=10487341056u,read_errors=0u,read_ops=1272938u,write_bytes=159750729728u,write_errors=0u,write_ops=14148958u,checksum_errors=0u,fragmentation=0u 1694174753643743077
zpool_stats,name=data,state=ONLINE,path=/dev/gpt/data:6,vdev=root/raidz-0/disk-5 alloc=0u,free=0u,size=0u,read_bytes=11018510336u,read_errors=0u,read_ops=1351186u,write_bytes=160255455232u,write_errors=0u,write_ops=14160049u,checksum_errors=0u,fragmentation=0u 1694174753643743077
zpool_stats,name=data,state=ONLINE,vdev=root/hole-1 alloc=0u,free=0u,size=0u,read_bytes=0u,read_errors=0u,read_ops=0u,write_bytes=0u,write_errors=0u,write_ops=0u,checksum_errors=0u,fragmentation=0u 1694174753643743077
zpool_stats,name=data,state=ONLINE,vdev=root/mirror-2 alloc=3121152u,free=16639877120u,size=16642998272u,read_bytes=966656u,read_errors=0u,read_ops=14u,write_bytes=84514078720u,write_errors=0u,write_ops=10305846u,checksum_errors=0u,fragmentation=0u 1694174753643743077
zpool_stats,name=data,state=ONLINE,path=/dev/gpt/data.slog:left,vdev=root/mirror-2/disk-0 alloc=0u,free=0u,size=0u,read_bytes=483328u,read_errors=0u,read_ops=7u,write_bytes=42257039360u,write_errors=0u,write_ops=5152923u,checksum_errors=0u,fragmentation=0u 1694174753643743077
zpool_stats,name=data,state=ONLINE,path=/dev/gpt/data.slog:right,vdev=root/mirror-2/disk-1 alloc=0u,free=0u,size=0u,read_bytes=483328u,read_errors=0u,read_ops=7u,write_bytes=42257039360u,write_errors=0u,write_ops=5152923u,checksum_errors=0u,fragmentation=0u 1694174753643743077
bertiebaggio (Owner) commented

Thanks for the report. I can confirm I am seeing something similar here:

[screenshot: grafana-zfs-fragmentation (InfluxDB data explorer)]

I can add a variable to the dashboard (like the existing hostname, poolname, etc. ones) to specify a vdev, unless you have a better suggestion? I will also check the original dashboard to see how they handle it, though I suspect the bug might be present there too.
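
For reference, a vdev variable like that could probably be populated with a Flux query along these lines (the bucket variable and measurement name here are assumptions based on the data above, not the dashboard's actual definition):

import "influxdata/influxdb/schema"

// list the vdev tag values seen on zpool_stats, for use as a dashboard variable
schema.tagValues(
    bucket: "${bucket}",
    tag: "vdev",
    predicate: (r) => r._measurement == "zpool_stats",
)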

sdalu commented Sep 8, 2023

I don't feel having a variable is the right choice.

Use fragmentation from the telegraf inputs.zfs plugin instead?
Fetch all fragmentation values from root/[^/]+ and compute a global value?
Or is it a bug in zpool_influxdb?

bertiebaggio (Owner) commented

> I don't feel having a variable is the right choice.

I agree it's not an ideal choice. However, in its favour, it does let the user specify exactly which vdev they mean when there is more than one vdev in the pool:

[screenshot: zfs-vdevs]

I think your second suggestion is the best approach for now: get the fragmentation values from the actual vdev(s) (i.e. not the disks, not root) and average† those.

†: I can see edge cases where the average produces misleading results, like a tiny-but-highly-fragmented vdev with a huge-but-unfragmented vdev in a pool, but it's a start for now.

sdalu commented Sep 8, 2023

That's certainly not the right formula; I would instead use the vdev size as a weighting factor for the fragmentation, something like:
sum(frag_i * size_i) / sum(size_i)
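
For illustration only, a minimal Flux sketch of that weighted sum over made-up per-vdev numbers (the vdev names, sizes and fragmentation values below are invented):

import "array"

// hypothetical per-vdev fragmentation (%) and size
vdevs = array.from(rows: [
    {vdev: "raidz-0", frag: 50.0, size: 80.0},
    {vdev: "raidz-1", frag: 10.0, size: 20.0},
])

poolsize = 100.0 // = sum(size_i)

vdevs
    |> map(fn: (r) => ({r with weighted: r.frag * r.size / poolsize}))
    |> sum(column: "weighted") // 50 * 0.8 + 10 * 0.2 = 42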

bertiebaggio (Owner) commented

Yup, that's a better approach. I'm not sure what data is available at the scope of that dashboard panel, but it should be possible to pull it in (it's been some months since I looked at it).

sdalu commented Sep 8, 2023

I don't have errors on my pool, but it should be checked whether the same applies to the checksum/read/write error fields.

bertiebaggio (Owner) commented

Good thinking. I checked one of those earlier in the data explorer and it seems that the structures are indeed the same.
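
(For anyone wanting to repeat that check, a rough query along these lines in the data explorer should show which vdev levels carry the error counters; the bucket name and time range are placeholders:)

from(bucket: "${bucket}")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "zpool_stats")
    |> filter(fn: (r) => r._field == "checksum_errors" or r._field == "read_errors" or r._field == "write_errors")
    |> last()
    |> keep(columns: ["vdev", "path", "_field", "_value"])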

bertiebaggio commented Sep 9, 2023

I have been working on this.

The good news is that i) it's pretty easy to filter to vdevs (ignoring root and the individual disks), and ii) it's possible to pull in both vdev and pool size:

[screenshot: zpool_stats-influx-fragmentationandsize]

The bad-for-now news is that i) I need to learn joins / pivots / unions (no combination of those has worked so far), and ii) join() support varies with InfluxDB version, and mine doesn't have the latest join package.

At some point I'll also need to figure out how best to test multiple vdevs, I might need to rig up something in a VM.

Update +2.5h:

The good:

[screenshot: influxdb-flux-fragmentation-works (fragmentation is weighted)]

The bad:

[screenshot: influxdb-grafana-flux-fragmentation-doesnotwork (what works in InfluxDB does not seem to work in Grafana)]

sdalu commented Sep 10, 2023

Can you share your influxdb query?

bertiebaggio commented Sep 10, 2023 via email

bertiebaggio added a commit that referenced this issue Sep 16, 2023
Fragmentation was always being reported as 0% (issue #2) regardless of the
actual fragmentation.

This was due to the value generated by zpool_influxdb for the pool itself being 0,
whereas the vdevs below it do have fragmentation values.

So now we weight each vdev's fragmentation by its share of the pool size and sum the results to give a weighted average:

    dev   size  frag  note
    ---   ----  ----  ----
 eg vdev1  80GB  50%  (contributes 4/5)
    vdev2  20GB  10%  (contributes 1/5)
    pool  100GB

 → overall 50 * 4/5 + 10 * 1/5 = 42% weighted pool fragmentation

Thanks to sdalu for reporting this.
bertiebaggio (Owner) commented

Apologies for the delay in getting back to this. I lost the changes due to a browser segfault; thankfully I had an intermediate version saved elsewhere.

I've split the query into a step-by-step process so it's hopefully a bit easier to understand.

This is for the pool usage over time panel, for example:

// Note that we pull the overall pool size to weight vdev fragmentation
//
// eg vdev1 80GB  50%  (contributes 4/5)
//    vdev2 20GB  10%  (contributes 1/5)
//
// → overall 50 * 4/5 + 10 * 1/5 = 42% weighted pool fragmentation 
//
// see for more info: https://github.com/bertiebaggio/grafana-zfs-metrics/issues/2

import "strings"

KEEPCOLS = ${TSCOLS}

niceify_legend = ${NICEIFYLEG} // function defined as variable- changes underscores to spaces + title-cases

data =
    from(bucket: "${bucket}")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["name"] == "${poolname}")
    |> filter(fn: (r) => r["host"] == "${hostname}")
    |> filter(fn: (r) => r["_field"] == "alloc" or r["_field"] == "size" or r["_field"] == "free" or r["_field"] == "fragmentation")

sizes =
    data
      |> filter(fn: (r) => r["_field"] == "size")
      |> filter(fn: (r) => not exists r["path"]) // remove disks

// create a table to extract a scalar
poolsizetable =
    sizes
      |> filter (fn: (r) => not strings.containsStr(v: r["vdev"], substr: "/"))
      |> map(fn: (r) => ({r with _field: "pool_size"}))
      |> keep(columns: ["_value"])

poolsize = (poolsizetable |> findRecord(fn: (key) => true, idx:0))._value
// jeez! https://github.com/influxdata/flux/issues/3522
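// (findRecord() grabs the record at idx 0 from the first table matching fn,
// so poolsize ends up as a plain scalar that can be reused below)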

vdevs =
    sizes
      |> filter(fn: (r) => strings.containsStr(v: r["vdev"], substr: "/"))
      |> group(columns: ["_time", "vdev"])

vdevwithpoolsize =
    vdevs
    |> set(key: "zz_pool_size", value: string(v: poolsize))
    |> drop(columns: ["_start", "_stop"])

fragmentation =
   data
      |> filter(fn: (r) => r["_field"] == "fragmentation")

fragnodisks =
    fragmentation
      |> filter(fn: (r) => not exists r["path"])
      |> keep(columns: ["vdev", "_field", "_value", "_time", "_measurement", "host", "name", "state"])
      |> group(columns: ["_time"])

vdevfrag =
    fragnodisks
      |> unique(column: "vdev")
      |> filter(fn: (r) => strings.containsStr(v: r["vdev"], substr: "/"))

sizeandfrag =
  join(tables: {f: vdevfrag, s: vdevwithpoolsize}, on: ["vdev", "_time"])
  // https://github.com/EMCECS/influx/blob/master/query/docs/SPEC.md#join
  |> keep(columns: ["_time", "vdev", "zz_pool_size", "_value_f", "_value_s"]) // _f = fragmentation, _s = vdev size
  |> group(columns: ["_time"])

weightedfragmentationbyvdev =
  sizeandfrag // then rename fields
  |> map(fn: (r) => ({
      _time: r._time,
      fragmentation: r._value_f,
      size: r._value_s,
      vdev: r.vdev,
      pool_size: r.zz_pool_size,
      ratio: float(v: r._value_s) / float(v: r.zz_pool_size),
    })
  )
  |> map(fn: (r) => ({
        _time: r._time,
        fragmentation: r.fragmentation,
        vdev: r.vdev,
        ratio: r.ratio,
        weighted_fragmentation: float(v: r.fragmentation) * float(v: r.ratio),
  }))

poolfragmentation = //finally!
  weightedfragmentationbyvdev
  |> sum(column: "weighted_fragmentation")
  |> map(fn: (r) => ({
      _time: r._time,
      _field: "fragmentation",
      _value: r.weighted_fragmentation,
  }) )
  |> group(columns: ["_field"])


otherdata =
  data
  |> filter(fn: (r) => r["_field"] != "fragmentation")
  |> filter(fn: (r) => not exists r["path"]) // drop disks
  |> filter(fn: (r) => not strings.containsStr(v: r["vdev"], substr: "/")) // only pool
  |> keep(columns: ["_field", "_time", "_value"])

fragandother =
  union(tables: [otherdata, poolfragmentation])

NamedData = fragandother
  |> map(fn: (r) => ({_value:r._value, _time:r._time, _field:niceify_legend(leg: r["_field"])}))
  |> keep(columns: KEEPCOLS)
  |> yield()

It's not the prettiest, but it'll do for now. I've pushed this to a feature branch (fragmentation-update).

I would like to test it with multiple vdevs before I push out to main / grafana.com.

bertiebaggio (Owner) commented

I wrote a query to duplicate and modify a vdev for the purposes of testing.

For those wanting to do the same, this should give you an idea of my approach:

// create a duplicate table with a modified vdev
EXTRASIZE = 5141689420068  // 5TB ish- one seventh of my own pool size

vdevdup =
  data
  |> group(columns: ["host"])
  // grab the parts that reference a vdev - in my data's case the only vdev was 'raidz-0'
  |> filter(fn: (r) => strings.containsStr(v: r["vdev"], substr: "raidz-0"))
  // rename to second vdev
  |> map(fn: (r) => ({r with vdev: strings.replace(v: r.vdev, t: "raidz-0", u: "raidz-1", i: 1)}))
  // change fragmentation (76 is double my data's fragmentation of 38) and vdev size
  |> map(fn: (r) => ({
    r with
      _value: if r._field == "fragmentation" then
        76
      else if r._field == "size" then
        EXTRASIZE
      else
        r._value
  }))

// <snip>

// Change pool size too. I did this in the poolsizetable table,
// but a better approach would have been to map() and update the actual data records
// along the lines of-
//
// r with _value: if r._field == "size" then r._value + EXTRASIZE
//                else r._value
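//
// A rough sketch of that (untested), reusing the containsStr() check from the
// main query so that only the pool-level size row gets bumped:
//
//   data
//     |> map(fn: (r) => ({r with _value:
//            if r._field == "size" and not strings.containsStr(v: r.vdev, substr: "/")
//            then r._value + EXTRASIZE
//            else r._value}))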

poolsizetable =
    sizes
      |> filter (fn: (r) => not strings.containsStr(v: r["vdev"], substr: "/"))
      |> map(fn: (r) => ({r with _field: "pool_size"}))
      |> keep(columns: ["_value"])
      |> map(fn: (r) => ({r with _value: r._value + EXTRASIZE})) // bump up pool size

It's ugly but it works. It leaves 'free', 'alloc' and indeed 'size' unchanged (and therefore with nonsense values), but it at least let me test.

This returned 42.7%, which is almost spot on the expected 38 * (35/40) + 76 * (5/40) = 42.75%.

I welcome further test cases or corrections, but I am happy enough that this works to the point of merging into main and publishing. This is definitely because I am satisfied with its robustness and not at all because working in Grafana's query editor is painful compared to a proper development environment~

sdalu commented Sep 19, 2023

Seems to fail to calculate fragmentation with a pool such as:

	NAME           STATE     READ WRITE CKSUM
	quick          ONLINE       0     0     0
	  gpt/quick:1  ONLINE       0     0     0
	  gpt/quick:2  ONLINE       0     0     0

sdalu commented Sep 19, 2023

Perhaps the vdev should be selected as: %r|^(?<poolname>[^/]+)/(?<vdev>[^/]+)$| (Ruby regex)?
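
In Flux terms that selection might look something like the following (a sketch only, reusing the data stream from the query above; not tested against such a pool):

// keep only first-level children of the pool: exactly one "/" in the vdev tag
vdevsonly =
    data
        |> filter(fn: (r) => r.vdev =~ /^[^\/]+\/[^\/]+$/)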
