
No VM disk statistics #11

Open
fumok opened this issue Aug 8, 2016 · 6 comments

fumok commented Aug 8, 2016

Hi,

I'm testing a setup to monitor KVM with Ganglia, and everything seems fine except for the VM disk statistics (empty graph).
Some details:

  • RHEL 7.2 3.10.0-327.el7.x86_64
  • libvirt-1.2.17-13.el7.x86_64
  • ganglia-3.7.2-2.el7 (from epel)
  • hsflowd-2.0.1-1.x86_64.rpm (from here)

Can you help me? Thanks a lot.


sflow commented Aug 8, 2016

This is the relevant code:
https://github.com/sflow/host-sflow/blob/master/src/Linux/mod_kvm.c#L114-L174

Is it missing the capacity/allocation/available data from virDomainGetBlockInfo(), the reads/writes/errors counter data from virDomainBlockStats(), or both?

The binary is compiled with "-g -O2" so you should be able to set a breakpoint in gdb and debug like this:
sudo yum install gdb
sudo service hsflowd stop
sudo gdb hsflowd
gdb> set args -ddd
gdb> b mod_kvm.c:169
(say yes when gdb asks to make the breakpoint pending — it will be resolved when the module is loaded later)
gdb> r

If you'd rather build from source and add print statements, you'll need gcc and something like this:
sudo yum install libvirt-devel libxml2-devel
make FEATURES="kvm ovs"
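
If it's easier, a quick standalone check outside hsflowd could confirm which call is failing. This is just a sketch, not hsflowd code — the VM name "test", the disk target "vda" and the build command are assumptions — but it calls the same two libvirt APIs directly:

/* sketch: exercise the two libvirt calls used for VM disk stats        */
/* assumed build: gcc check_blk.c -lvirt -o check_blk                   */
#include <stdio.h>
#include <libvirt/libvirt.h>

int main(int argc, char **argv) {
  const char *domName = (argc > 1) ? argv[1] : "test"; /* placeholder VM name */
  const char *diskDev = (argc > 2) ? argv[2] : "vda";  /* placeholder disk target */
  virConnectPtr conn = virConnectOpenReadOnly(NULL);
  if (conn == NULL) { fprintf(stderr, "connect failed\n"); return 1; }
  virDomainPtr dom = virDomainLookupByName(conn, domName);
  if (dom == NULL) { fprintf(stderr, "domain %s not found\n", domName); return 1; }

  /* capacity/allocation data comes from virDomainGetBlockInfo() */
  virDomainBlockInfo info;
  if (virDomainGetBlockInfo(dom, diskDev, &info, 0) == 0)
    printf("capacity=%llu allocation=%llu physical=%llu\n",
           (unsigned long long)info.capacity,
           (unsigned long long)info.allocation,
           (unsigned long long)info.physical);
  else
    fprintf(stderr, "virDomainGetBlockInfo failed for %s\n", diskDev);

  /* reads/writes/errors counters come from virDomainBlockStats() */
  virDomainBlockStatsStruct stats;
  if (virDomainBlockStats(dom, diskDev, &stats, sizeof(stats)) == 0)
    printf("rd_req=%lld rd_bytes=%lld wr_req=%lld wr_bytes=%lld errs=%lld\n",
           stats.rd_req, stats.rd_bytes, stats.wr_req, stats.wr_bytes, stats.errs);
  else
    fprintf(stderr, "virDomainBlockStats failed for %s\n", diskDev);

  virDomainFree(dom);
  virConnectClose(conn);
  return 0;
}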


fumok commented Aug 9, 2016

Thanks for supporting me.
After further digging, I think the disk statistics are populated only if the VM uses the default storage pool.

I've created a new test VM with a qcow image in the default storage pool, and now, dumping data with sflowtool, I can see these counters populated:
vdsk_capacity
vdsk_allocation
vdsk_available
vdsk_rd_req
vdsk_rd_bytes
vdsk_wr_req
vdsk_wr_bytes
vdsk_errs

Incidentally, the breakpoint now works (b mod_kvm.c:169). Before, with the VM on a non-default storage pool, it was never reached.

EDIT: Perhaps I was too hasty in my diagnosis. I think the discriminating factor is the type of storage pool: with qcow images the counters are OK, with LVM they are not.

EDIT2: I can confirm it: the disk counters work only with qcow backing storage.


sflow commented Aug 9, 2016

Sounds like it's not recognizing the LVM storage when it parses the XML. If you could send the output of "virsh dumpxml <domain>" that would be helpful.
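
For reference, and only as an assumption about what we might see (not taken from your system): a file-backed qcow disk in the domain XML usually looks something like

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/test.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>

while an LVM-backed disk is usually type='block' with a <source dev='...'/> element instead of <source file='...'/>, e.g.

<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/vg0/test-vda'/>
  <target dev='vda' bus='virtio'/>
</disk>

so a parser that only handles the file case would miss the block devices. The dumpxml output will show which shape your disks have.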


fumok commented Aug 9, 2016

hsflowd in debug mode returns:

dbg1: attribute dev
dbg1: disk.dev=hda
dbg1: ignoring readonly device
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vda
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vdb
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vdc
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vdd
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vde
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vdf
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vdg

and this is correct: one virtual CD-ROM (hda) and seven virtual LVM disks (vda through vdg).

The XML of the test VM follows:

test.txt


sflow commented Sep 6, 2016

It looks like we might be able to pick up the disk stats a different way...

It depends on how a QEMU VM is treated with respect to Linux cgroups. For example, I have a KVM system running Ubuntu 14.04 with a VM called "test6", and it looks like I can get disk stats like this:

cat /sys/fs/cgroup/blkio/machine/test6.libvirt-qemu/blkio.throttle.*

Can you substitute "test6" with "test" on your system and get numbers this way? Do the numbers look as though they are specific to that VM?

We already pick up cgroup stats this way in hsflowd's mod_docker, so this could be quite straightforward to add. Just need to understand how it appears in different versions of KVM. I'll try Fedora 24 next and see what appears there.
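
As a rough idea of how that could look in code (a sketch only, not the mod_docker implementation; the cgroup path below is just the Ubuntu example above and would differ on other distros):

/* sketch: sum Read/Write byte counts from a blkio.throttle.io_service_bytes file */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

static int readBlkioThrottle(const char *path, uint64_t *rd_bytes, uint64_t *wr_bytes) {
  FILE *f = fopen(path, "r");
  if (f == NULL) return -1;
  char dev[64], op[32];
  unsigned long long val;
  *rd_bytes = 0;
  *wr_bytes = 0;
  /* lines look like "<major>:<minor> <Op> <value>", with a trailing "Total <value>" */
  while (fscanf(f, "%63s %31s %llu", dev, op, &val) == 3) {
    if (strcmp(op, "Read") == 0) *rd_bytes += val;
    else if (strcmp(op, "Write") == 0) *wr_bytes += val;
  }
  fclose(f);
  return 0;
}

int main(void) {
  /* path assumption, matching the Ubuntu layout mentioned above */
  const char *path = "/sys/fs/cgroup/blkio/machine/test6.libvirt-qemu/blkio.throttle.io_service_bytes";
  uint64_t rd, wr;
  if (readBlkioThrottle(path, &rd, &wr) == 0)
    printf("read_bytes=%llu write_bytes=%llu\n",
           (unsigned long long)rd, (unsigned long long)wr);
  else
    perror(path);
  return 0;
}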


fumok commented Sep 7, 2016

I think cgroups are the right way. On RHEL 7.2 (kernel 3.10.0-327.el7.x86_64) the relevant counters can be found in:

/sys/fs/cgroup/blkio/machine.slice/{weird_name}/

{weird_name} is something like this (in my case the hostname contains a hyphen):

machine-qemu\x2d{host_hostname}\x2dgo\x2d{guest_name}\x2da.scope

Most of the blkio.throttle.* files are not populated; only blkio.throttle.io_service_bytes and blkio.throttle.io_serviced are.

I have the same situation on a CentOS 7.2 host with much newer libvirtd and qemu-kvm, so I suppose only the host's kernel version matters. The cgroup approach is also used by other interesting projects; take a look, for example, at https://github.com/firehol/netdata/
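
Rather than hard-coding those escaping rules, a sketch like the following could just scan machine.slice and unescape the "\x2d" sequences before matching on the guest name (the base path matches what I see on RHEL 7.2; the guest name "test" is a placeholder):

/* sketch: locate a guest's blkio cgroup directory under machine.slice */
#include <stdio.h>
#include <string.h>
#include <dirent.h>

/* copy src to dst, turning every literal "\x2d" back into '-' */
static void unescapeSystemd(const char *src, char *dst, size_t len) {
  size_t i = 0, j = 0;
  while (src[i] && j + 1 < len) {
    if (strncmp(&src[i], "\\x2d", 4) == 0) { dst[j++] = '-'; i += 4; }
    else dst[j++] = src[i++];
  }
  dst[j] = '\0';
}

int main(int argc, char **argv) {
  const char *guest = (argc > 1) ? argv[1] : "test"; /* placeholder guest name */
  const char *base = "/sys/fs/cgroup/blkio/machine.slice";
  DIR *d = opendir(base);
  if (d == NULL) { perror(base); return 1; }
  struct dirent *e;
  char plain[512];
  while ((e = readdir(d)) != NULL) {
    unescapeSystemd(e->d_name, plain, sizeof(plain));
    if (strstr(plain, guest) != NULL)
      printf("%s/%s/blkio.throttle.io_service_bytes\n", base, e->d_name);
  }
  closedir(d);
  return 0;
}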

P.S.

systemd systems use different names; see https://libvirt.org/cgroups.html

Thanks.
