Phil Hofmann edited this page Feb 23, 2016 · 7 revisions

Seriously, at VR we take Monitoring seriously!

Local I: Monit

The local Monit configuration is managed by Cdist. Monit reads a monit.conf that is deployed with the Rails app.

Local II: Icinga

We use Icinga, a fork of Nagios, to monitor our servers.

Tomáš Pospíšek did the initial implementation, so he knows best how to configure that application.

To add a new check to Icinga, do the following:

Example: adding a Swap check

  1. Define the check command for the Icinga server itself (currently Staging)
  2. Define the check command for the remote servers (currently only Live)
  3. Add a service that executes the check on the Icinga server (currently Staging)
  4. Add a service that executes the check on the remote servers (currently only Live)

Configuration

New Files

  • local check (staging)
    • /etc/nagios-plugins/config/swap.cfg
# 'check_swap' command definition
define command{
        command_name    check_swap
        command_line    /usr/lib/nagios/plugins/check_swap -w '$ARG1$' -c '$ARG2$'
        }
  • remote check (live)
    • /etc/nagios-plugins/config/swap_ssh.cfg
# 'check_swap_ssh' command definition
define command{
        command_name    check_swap_ssh
        command_line    /usr/lib/nagios/plugins/check_by_ssh -H '$HOSTADDRESS$' -C "/usr/lib/nagios/plugins/check_swap -w '$ARG1$' -c '$ARG2$'"
        }

Execution

  • local check (staging)
    • /etc/icinga/objects/services_icinga.cfg
define service {
        hostgroup_name          debian-servers
        service_description     Swap
        check_command           check_swap!75!40
        use                     generic-service
        notification_interval   0
}
  • remote check (live)
    • /etc/nagios-plugins/config/voicerepublic.cfg
define service {
        host_name       production
        service_description Swap
        check_command           check_swap_ssh!75!40
        use generic-service
        notification_interval 0
}
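
To see how the service definitions above feed the command definitions, the following Python sketch mimics how Icinga splits a check_command on `!` and substitutes `$ARG1$`, `$ARG2$` and `$HOSTADDRESS$` into the command_line. It is an illustration only, not Icinga's own implementation; the function name `expand_check_command` and the address 203.0.113.10 are made up for this example.

```python
# Command lines copied from swap.cfg and swap_ssh.cfg above.
LOCAL_COMMAND_LINE = "/usr/lib/nagios/plugins/check_swap -w '$ARG1$' -c '$ARG2$'"
REMOTE_COMMAND_LINE = (
    "/usr/lib/nagios/plugins/check_by_ssh -H '$HOSTADDRESS$' "
    "-C \"/usr/lib/nagios/plugins/check_swap -w '$ARG1$' -c '$ARG2$'\""
)


def expand_check_command(check_command, command_line, host_address=""):
    """Substitute $ARGn$ and $HOSTADDRESS$ into a command_line.

    check_command has the form "name!arg1!arg2...". The first field is the
    command name; the remaining fields become $ARG1$, $ARG2$, and so on.
    """
    _name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for index, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", value)
    return expanded


# Local check on the Icinga server.
print(expand_check_command("check_swap!75!40", LOCAL_COMMAND_LINE))

# Remote check; 203.0.113.10 is a placeholder address, not a real server.
print(expand_check_command("check_swap_ssh!75!40", REMOTE_COMMAND_LINE, "203.0.113.10"))
```

The first call prints `/usr/lib/nagios/plugins/check_swap -w '75' -c '40'`, which is the command Icinga ends up running for the local Swap service. The second shows the same plugin call wrapped in check_by_ssh for the remote host.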

Remote I: Uptime Robot

TODO munen: add some details here

Remote II: Monit on s126.de (phil)

Monit config file /etc/monit/conf.d/monit_vr.conf:

check host voicerepublic.com with address voicerepublic.com
  if failed icmp type echo then alert
  if failed port 80 proto http then alert
  if failed port 443 then alert
  if failed port 1935 then alert
  if failed port 9292 then alert
  alert [email protected]

check host staging.voicerepublic.com with address staging.voicerepublic.com
  if failed icmp type echo then alert
  if failed port 80 proto http then alert
  if failed port 443 then alert
  if failed port 1935 then alert
  if failed port 9292 then alert
  alert [email protected]

This Monit instance also uses https://gist.github.com/branch14/9976231f2a9430b75f68 to push notifications to Slack.
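
For a quick manual cross-check of the plain port rules in the Monit config above, a TCP connect test can be sketched in Python. This is only a rough stand-in under stated assumptions: it covers the bare `if failed port N` rules, not the ICMP echo test or the HTTP protocol test on port 80, and the names `port_is_open`, `failed_ports` and `VR_PORTS` are made up for this example.

```python
import socket

# Ports taken from the Monit config above.
VR_PORTS = [80, 443, 1935, 9292]


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def failed_ports(host, ports, timeout=5.0):
    """Return the subset of ports that did not accept a TCP connection."""
    return [port for port in ports if not port_is_open(host, port, timeout)]
```

Calling `failed_ports("staging.voicerepublic.com", VR_PORTS)` returns the list of ports that could not be reached; an empty list means every port accepted a connection.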

Munin

On s126.de, /etc/munin/munin.conf holds the following configuration:

[staging.voicerepublic.com]
    #address 77.109.138.13
    #address 77.109.150.163
    #address 136.243.52.230
    address 136.243.197.189
    use_node_name yes

[voicerepublic.com]
    #address 77.109.150.133
    #address 136.243.52.231
    address 136.243.209.119
    use_node_name yes