SNMP counter overflows #1

Open
waddles opened this issue Jul 8, 2015 · 15 comments

@waddles

waddles commented Jul 8, 2015

Great work on developing these modules, but I seem to be overflowing the 32-bit counters for my zpool info:

root@rubicon:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data    29T  3.59T  25.4T         -     3%    12%  1.00x  ONLINE  -

root@rubicon:~# snmptable -u chameleon6287188769 -c chameleon6287188769 -v 2c rubicon zfsPoolStatusTable
SNMP table: BAYOUR-COM-MIB::zfsPoolStatusTable

 zfsPoolName zfsPoolSize zfsPoolAlloc zfsPoolFree zfsPoolCap zfsPoolDedup zfsPoolHealth zfsPoolAltRoot zfsPoolUsedBySnaps zfsPoolUsed
        data           0    171798691  1717986918         12         1.00        online              -                  0   680399994
BAYOUR-COM-MIB::zfsPoolStatusTable: WARNING: More columns on agent than in MIB

Seems ok coming out of the perl script:

OID_BASE.5.1.2.1
string
data
OID_BASE.5.1.3.1
integer
31885837205504
OID_BASE.5.1.4.1
integer
3947246743715.84
OID_BASE.5.1.5.1
integer
27927595345510.4
OID_BASE.5.1.6.1
integer
12
OID_BASE.5.1.7.1
string
1.00
OID_BASE.5.1.8.1
integer
4
OID_BASE.5.1.9.1
string
-

Any suggestions?
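
To illustrate the symptom, here is a minimal Python sketch that checks the values from the script output above against the signed 32-bit range an Integer32 can hold. The helper name `check_integer32` is hypothetical and is not part of the modules; it only shows why values declared as `integer` cannot survive the trip through a 32-bit type.

```python
# Hypothetical helper for illustration; not part of the snmp-modules code.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def check_integer32(text):
    """Return the reasons why `text` cannot be sent as an Integer32."""
    problems = []
    try:
        value = int(text)
    except ValueError:
        # Values such as "3947246743715.84" are not whole numbers at all.
        problems.append("not a whole number")
        value = int(float(text))
    if not INT32_MIN <= value <= INT32_MAX:
        problems.append("outside the signed 32-bit range")
    return problems

# Values copied from the script output above.
samples = {
    "OID_BASE.5.1.3.1": "31885837205504",
    "OID_BASE.5.1.4.1": "3947246743715.84",
    "OID_BASE.5.1.5.1": "27927595345510.4",
    "OID_BASE.5.1.6.1": "12",
}

for oid, text in samples.items():
    print(oid, text, check_integer32(text) or "ok")
```

Only the capacity percentage fits; the three size values are both too large and, in two cases, fractional.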

@FransUrbo
Owner

The MIB was 'thrown together' without much regard (but 'some') to what the values actually were, so I'm not overly surprised by this. I haven't been running it myself in a while, because I have stability issues caused by 'load sensitivity' on my primary (bad SAS/SATA card/driver).

The Integer32 type on some/all of these needs to be updated to match the actual size of the value. This requires going through the code in ZFS/ZoL.

I'll see what I can do, but if you have concrete changes, feel free to open a pull request.

@waddles
Author

waddles commented Jul 8, 2015

Ok so I changed the MIB to use Integer64 for the values in zfsPoolStatusTable but Net-SNMP still does not return them properly. Then I found this patch https://sourceforge.net/p/net-snmp/patches/737/ but it does not appear to have been applied. I am running Ubuntu Vivid (15.04) with Net-SNMP 5.7.2 but even latest upstream doesn't look like it handles it properly.

@FransUrbo
Owner

Then I don't know off-hand what to do :(

@waddles
Author

waddles commented Jul 9, 2015

On a side note, I love the clean code in https://github.com/calmh/solaris-extra-snmp/blob/master/zfs-snmp although it depends on kstat and doesn't appear to keep persistency, but that could be fixed fairly easily.

I think a better way of getting the zpool usage (instead of using zpool iostat then converting it to a somewhat rough estimate by multiplying by powers of 1024) is to use zfs list -p <pool>

# zpool iostat
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data        3.74T  25.3T     41    129  4.46M  3.79M
# zfs list -p
NAME                      USED           AVAIL          REFER  MOUNTPOINT
data             3577557345060  23702699187420          30260  /data
data/atlassian      2227441270  23702699187420     2227441270  /data/atlassian
data/backup      3570098684040  23702699187420  3570098684040  /data/backup
data/bamboo          234328990  23702699187420      234328990  /data/bamboo
data/confluence     1980329210  23702699187420     1980329210  /data/confluence
data/crowd            47012470  23702699187420       47012470  /data/crowd
data/jira           2189892170  23702699187420     2189892170  /data/jira
data/postgresql      434886040  23702699187420      434886040  /data/postgresql
data/stash           268084910  23702699187420      268084910  /data/stash
# zfs list -p data
NAME           USED           AVAIL  REFER  MOUNTPOINT
data  3577557345060  23702699187420  30260  /data

Total capacity is obviously the sum of all 3 values
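
As a rough sketch of the difference between the two approaches, the Python below parses the parsable `zfs list -p` output shown above into exact byte counts and, for comparison, converts a human-readable value by multiplying by powers of 1024. The function names `parse_zfs_list` and `human_to_bytes` are made up for this example, and the parser assumes the four leading columns shown above (NAME, USED, AVAIL, REFER).

```python
def parse_zfs_list(output):
    """Parse `zfs list -p` text into {name: {"used", "avail", "refer"}} byte counts."""
    rows = {}
    lines = output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        name, used, avail, refer = line.split()[:4]
        rows[name] = {"used": int(used), "avail": int(avail), "refer": int(refer)}
    return rows

def human_to_bytes(text):
    """Rough estimate for values such as '3.59T', using powers of 1024."""
    units = "KMGTPE"
    number, unit = float(text[:-1]), text[-1]
    return number * 1024 ** (units.index(unit) + 1)

# Sample copied from the `zfs list -p data` output above.
SAMPLE = """\
NAME           USED           AVAIL  REFER  MOUNTPOINT
data  3577557345060  23702699187420  30260  /data
"""

pool = parse_zfs_list(SAMPLE)["data"]
print(pool["used"], pool["avail"], pool["refer"])  # exact integers
print(human_to_bytes("3.59T"))                     # rounded, fractional estimate
```

The estimate for `3.59T` comes out as a fractional number close to the 3947246743715.84 reported by the script earlier in this thread, which is why the parsable output is the safer source.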

@FransUrbo
Owner

That still doesn't help, unfortunately. 3577557345060 + 23702699187420 + 30260 = 27280256562740, which is still much, much higher than the maximum value of an (unsigned) 32-bit int (which is 4,294,967,295). The signed int is half that...

The maximum value of an (unsigned) 64-bit int is 18,446,744,073,709,551,615 (which would allow for roughly 18,446 petabytes :), which is plenty high. A signed 64-bit int is half that. I don't know whether Integer64 would be signed or unsigned, but either way, that would do it. But if snmpd doesn't support it, there's not much I can do :(

This discussion actually jogs some distant memories, though. It feels like I've had this discussion with myself but couldn't solve it…

https://en.wikipedia.org/wiki/Integer_(computer_science)#Common_integral_data_types
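
The arithmetic above can be checked with a few lines of Python; this is only a worked example of the limits quoted in this comment, using Python's arbitrary-precision integers.

```python
# Limits of the integer widths discussed above.
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

# used + available + referenced, taken from the `zfs list -p data` output.
total = 3577557345060 + 23702699187420 + 30260

print(total)
print("fits in unsigned 32-bit:", total <= UINT32_MAX)
print("fits in signed 64-bit:  ", total <= INT64_MAX)
```

The sum is far beyond the unsigned 32-bit limit but comfortably within either 64-bit range, so the signedness of the 64-bit type does not matter for pools of this size.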

@FransUrbo
Owner

I've been trying to do something about this in https://github.com/FransUrbo/snmp-modules/tree/int64_size-free, but it didn't work as I expected.

@FransUrbo
Owner

With this patch on the 5.7.3+dfsg-1 version, I got it to work. I'm currently trying to figure out how to implement this in the MIB.

http://sourceforge.net/p/net-snmp/mailman/message/34285720/

@FransUrbo
Owner

I took your recommendation to use zfs get to get the exact sizes, instead of the "human readable" values one gets from zpool list and "translate" that into bytes. There was a slight mismatch there. On my system, there was a 29MB discrepancy.

I'm still trying to figure out how to fix the MIB. BUT, the code in the int64_size-free branch will now correctly return an integer64 instead of an integer32:

$ snmpget localhost zfsPoolSize zfsPoolSize
BAYOUR-COM-MIB::zfsPoolSize.1 = Opaque: Int64: 8256506880
BAYOUR-COM-MIB::zfsPoolSize.1 = Opaque: Int64: 8256506880

The fact that it returns an Opaque: Int64 and not an Integer64 is the current problem. Not quite sure how to fix that just yet. I have some test MIB entries in that branch, but they don't seem to be working. I think I'm roughly on the right track here. There's something in https://tools.ietf.org/html/draft-perkins-bigint-00 that I need to figure out.

@waddles
Author

waddles commented Jul 13, 2015

https://tools.ietf.org/html/draft-perkins-opaque-01 might help you understand more.

Looking at that patch and the file it applies to, that section of code is all about unsigned longs which means it should be returning a type of ASN_OPAQUE_U64 and have a definition of 'Unsigned64'. That then leaves no 'integer64' (signed) in which case the #ifdef probably also needs another clause added to handle signed 64-bit integers. The implementation would be the same for all 3 if I'm not wrong.

The difference between Counter64, Integer64 and Unsigned64 is that Counters don't decrease and of course the interpretation of +/-. For our purposes we really want Unsigned64.

See also https://sourceforge.net/p/net-snmp/code/ci/1b4ca14972d39d61a93bb0e3e4eea76795bedb89/tree/include/net-snmp/library/asn1.h line 80 and onwards.
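
To make the signed/unsigned distinction concrete, here is a small Python sketch that interprets the same eight bytes both ways. The fixed-width big-endian layout is an assumption chosen for illustration only; it is not the encoding Net-SNMP puts on the wire.

```python
import struct

# Pack a large unsigned value into eight bytes (big-endian, illustration only).
raw = struct.pack(">Q", 18446744073709551614)

as_unsigned = struct.unpack(">Q", raw)[0]  # read back as unsigned 64-bit
as_signed = struct.unpack(">q", raw)[0]    # same bytes read as signed 64-bit

print(raw.hex())
print("unsigned:", as_unsigned)
print("signed:  ", as_signed)
```

The bit pattern is identical in both cases; only the interpretation changes, which is why a size in bytes is most naturally reported as an unsigned value.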

@FransUrbo
Owner

Triple-checking and actually LOOKING at the code more closely this time, you're probably right: the code should use an unsigned instead of a signed value, because we don't need negative values.

In practice though, it shouldn't really matter right now. We can return a value of roughly 9 EB (instead of roughly 18 EB with unsigned). That still isn't enough to account for the total size of a ZFS pool :). But it should be enough for almost everyone. For now. To be able to return the value of the maximum size of a ZFS pool (256ZB), we need a 128-bit value!

However, although you're right about that, the current problem is how to incorporate it into the MIB. I have added both an I64 and a U64, but neither works as expected.

But I'm starting to wonder whether it matters if it returns an Integer64 instead of Opaque: Int64. The value is what we need, not the type…

Could you try the int64_size-free branch and see if it works for you?

@FransUrbo
Owner

I've taken your suggestions for net-snmp and walked (not ran :) with it - http://sourceforge.net/p/net-snmp/mailman/message/34291537/.

However, my two patches aren't included in the web archive for some reason.

https://gist.github.com/FransUrbo/a2bfee606ffda0b7b81e
https://gist.github.com/FransUrbo/b891f94b1100f2a3b251

This gives me:

# for i in {1..6}; do snmpget localhost .1.3.6.1.4.1.22222.42.$i.0; done
SNMPv2-SMI::enterprises.22222.42.1.0 = INTEGER: 123456
SNMPv2-SMI::enterprises.22222.42.2.0 = Opaque: Int64: 9223372036854775806
SNMPv2-SMI::enterprises.22222.42.3.0 = Counter32: 123456
SNMPv2-SMI::enterprises.22222.42.4.0 = Counter64: 9223372036854775806
SNMPv2-SMI::enterprises.22222.42.5.0 = Gauge32: 4294967294
SNMPv2-SMI::enterprises.22222.42.6.0 = Opaque: UInt64: 18446744073709551614

which seems to work just fine (except that instead of a UInt32 (or whatever it should have been), I get a Gauge32). No biggie, but it looks strange...
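
If only the value matters to a consumer and not the type label, the output lines above can be read in a type-agnostic way. The following is a minimal Python sketch under that assumption; the function name `parse_numeric` and the regular expression are invented for this example and only cover lines whose value is a plain non-negative number.

```python
import re

# Matches e.g. "<oid> = INTEGER: 123456" and "<oid> = Opaque: UInt64: 184...".
LINE_RE = re.compile(
    r"^(?P<oid>\S+) = (?:Opaque: )?(?P<type>\w+): (?P<value>\d+)$"
)

def parse_numeric(line):
    """Return (oid, type_label, value) or None if the line is not a plain number."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match["oid"], match["type"], int(match["value"])

# Lines copied from the snmpget output above.
lines = [
    "SNMPv2-SMI::enterprises.22222.42.1.0 = INTEGER: 123456",
    "SNMPv2-SMI::enterprises.22222.42.2.0 = Opaque: Int64: 9223372036854775806",
    "SNMPv2-SMI::enterprises.22222.42.6.0 = Opaque: UInt64: 18446744073709551614",
]

for line in lines:
    print(parse_numeric(line))
```

Enumerated values such as `online(4)` are deliberately not matched and yield `None`.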

@FransUrbo
Owner

It doesn't seem to need any special stuff in the MIB. I just made zfsPoolSize and zfsPoolFree an Integer64 (although smilint complains about this) and return an unsigned64 value from the agent, and this all seems to be working just fine!

# snmpget localhost zfsPoolSize zfsPoolSize
BAYOUR-COM-MIB::zfsPoolSize.1 = Opaque: UInt64: 8256506880
BAYOUR-COM-MIB::zfsPoolSize.1 = Opaque: UInt64: 8256506880
# snmpwalk localhost zfsPoolStatusTable
BAYOUR-COM-MIB::zfsPoolStatusIndex.1 = INTEGER: 1
BAYOUR-COM-MIB::zfsPoolStatusIndex.2 = INTEGER: 2
BAYOUR-COM-MIB::zfsPoolName.1 = STRING: rpool
BAYOUR-COM-MIB::zfsPoolName.2 = STRING: rpool 2
BAYOUR-COM-MIB::zfsPoolGUID.1 = STRING: 4977845871582736322
BAYOUR-COM-MIB::zfsPoolGUID.2 = STRING: 3787144349319647945
BAYOUR-COM-MIB::zfsPoolSize.1 = Opaque: UInt64: 8256506880
BAYOUR-COM-MIB::zfsPoolSize.2 = Opaque: UInt64: 8256506880
BAYOUR-COM-MIB::zfsPoolAlloc.1 = INTEGER: 132096
BAYOUR-COM-MIB::zfsPoolAlloc.2 = INTEGER: 111616
BAYOUR-COM-MIB::zfsPoolFree.1 = Opaque: UInt64: 8256374784
BAYOUR-COM-MIB::zfsPoolFree.2 = Opaque: UInt64: 8256395264
BAYOUR-COM-MIB::zfsPoolCap.1 = INTEGER: 0
BAYOUR-COM-MIB::zfsPoolCap.2 = INTEGER: 0
BAYOUR-COM-MIB::zfsPoolDedup.1 = STRING: 1.00
BAYOUR-COM-MIB::zfsPoolDedup.2 = STRING: 1.00
BAYOUR-COM-MIB::zfsPoolHealth.1 = INTEGER: online(4)
BAYOUR-COM-MIB::zfsPoolHealth.2 = INTEGER: online(4)
BAYOUR-COM-MIB::zfsPoolAltRoot.1 = STRING: -
BAYOUR-COM-MIB::zfsPoolAltRoot.2 = STRING: -
BAYOUR-COM-MIB::zfsPoolUsedBySnaps.1 = INTEGER: 0
BAYOUR-COM-MIB::zfsPoolUsedBySnaps.2 = INTEGER: 0
BAYOUR-COM-MIB::zfsPoolUsed.1 = INTEGER: 282624
BAYOUR-COM-MIB::zfsPoolUsed.2 = INTEGER: 111616
# snmptable -CB localhost zfsPoolStatusTable         
SNMP table: BAYOUR-COM-MIB::zfsPoolStatusTable

 zfsPoolName         zfsPoolGUID zfsPoolSize zfsPoolAlloc zfsPoolFree zfsPoolCap zfsPoolDedup zfsPoolHealth zfsPoolAltRoot zfsPoolUsedBySnaps zfsPoolUsed
       rpool 4977845871582736322  8256506880       132096  8256374784          0         1.00        online              -                  0      282624
     rpool 2 3787144349319647945  8256506880       111616  8256395264          0         1.00        online              -                  0      111616
# 

@FransUrbo
Owner

zfsPoolAlloc also needs to be a UInt64, just-in-case...

@FransUrbo
Owner

Same code on a host that doesn't have a patched Net-SNMP:

# snmpget localhost zfsPoolSize zfsPoolSize zfsPoolAlloc
BAYOUR-COM-MIB::zfsPoolSize.1 = Gauge32: 3961545728
BAYOUR-COM-MIB::zfsPoolSize.1 = Gauge32: 3961545728
BAYOUR-COM-MIB::zfsPoolAlloc.1 = Gauge32: 51384320
# snmpwalk localhost zfsPoolStatusTable
BAYOUR-COM-MIB::zfsPoolStatusIndex.1 = INTEGER: 1
BAYOUR-COM-MIB::zfsPoolName.1 = STRING: rpool
BAYOUR-COM-MIB::zfsPoolGUID.1 = STRING: 11847949639043149139
BAYOUR-COM-MIB::zfsPoolSize.1 = Gauge32: 3961545728
BAYOUR-COM-MIB::zfsPoolAlloc.1 = Gauge32: 51384320
BAYOUR-COM-MIB::zfsPoolFree.1 = Gauge32: 3910161408
BAYOUR-COM-MIB::zfsPoolCap.1 = INTEGER: 0
BAYOUR-COM-MIB::zfsPoolDedup.1 = STRING: 1.00
BAYOUR-COM-MIB::zfsPoolHealth.1 = INTEGER: online(4)
BAYOUR-COM-MIB::zfsPoolAltRoot.1 = STRING: -
BAYOUR-COM-MIB::zfsPoolUsedBySnaps.1 = INTEGER: 0
BAYOUR-COM-MIB::zfsPoolUsed.1 = INTEGER: 153585254
# snmptable -CB localhost zfsPoolStatusTable
SNMP table: BAYOUR-COM-MIB::zfsPoolStatusTable

 zfsPoolName          zfsPoolGUID zfsPoolSize zfsPoolAlloc zfsPoolFree zfsPoolCap zfsPoolDedup zfsPoolHealth zfsPoolAltRoot zfsPoolUsedBySnaps zfsPoolUsed
       rpool 11847949639043149139  3961545728     51384320  3910161408          0         1.00        online              -                  0   153585254
# zfs get -H -oproperty,value -p used,available,referenced rpool
used    51384320
available       8205103104
referenced      25600
# expr 51384320 + 8205103104 + 25600 ; echo 3961545728
8256513024
3961545728
# zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool  7.94G  49.1M  7.89G         -      -     0%  1.00x  ONLINE  -
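
The Gauge32 value reported by the unpatched host is consistent with the 64-bit sum being cut down to its low 32 bits. The short Python check below reproduces that from the numbers in the transcript above; that truncation is what actually happens inside the unpatched agent is an assumption, not something confirmed in this thread.

```python
# Values copied from the `zfs get -p` output above.
used, available, referenced = 51384320, 8205103104, 25600

total = used + available + referenced
low32 = total & 0xFFFFFFFF  # keep only the low 32 bits

print(total)  # the real pool size in bytes
print(low32)  # what a 32-bit field would carry
```

The masked result matches the zfsPoolSize value returned as Gauge32 above, while the unmasked sum matches the output of `expr`.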

@FransUrbo
Owner

Querying an unpatched server from OS X Lion:

$ snmptable -CB unpatched-server zfsPoolStatusTable
SNMP table: BAYOUR-COM-MIB::zfsPoolStatusTable

 zfsPoolName          zfsPoolGUID zfsPoolSize zfsPoolAlloc zfsPoolFree zfsPoolCap zfsPoolDedup zfsPoolHealth zfsPoolAltRoot zfsPoolUsedBySnaps zfsPoolUsed
       rpool 11847949639043149139  3961545728     51362816  3910182912          0         1.00        online              -                  0   153585254

And to the patched server:

$ snmptable -CB patched-server zfsPoolStatusTable 
SNMP table: BAYOUR-COM-MIB::zfsPoolStatusTable

 zfsPoolName         zfsPoolGUID                                      zfsPoolSize                                     zfsPoolAlloc                                      zfsPoolFree zfsPoolCap zfsPoolDedup zfsPoolHealth zfsPoolAltRoot zfsPoolUsedBySnaps zfsPoolUsed
       rpool 4977845871582736322 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00           0         1.00        online              -                  0      613376
     rpool 2 3787144349319647945 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  2D 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00           0         1.00        online              -                  0      437248

So I guess the patch still needs some work. Or possibly the MIB.
