A tool for executing scripts when ZooKeeper nodes change.

bdacode/twitcher

 
 

================
=== Twitcher ===
================

1. License
2. Overview
3. Installation
4. Configuration Language
4.1 Examples
4.2 Things to Avoid
5. Known Issues
6. Future Features
7. Change Log

1. License
==========

This software is licensed under the Apache 2.0 license. Please see LICENSE
in the top level of this git repository for more information.

2. Overview
===========

Twitcher is a minimal tool that runs scripts when a znode in ZooKeeper
changes. The intent is to let system tasks be triggered by watches rather
than by polling.

Twitcher is best described through an example of a problem that it solves
well. Let's assume we need a single git repository to be checked out and up
to date on 100 machines in a cluster. The typical solution is to add a cron
job that performs a git pull every minute. This can be problematic because
your git server has to endure the load of 100 servers performing a pull every
minute. Instead, you can configure Twitcher on all machines to watch
/git/head on your ZooKeeper instance. When that znode changes, Twitcher can
run git pull. Add a post-commit script in your main git repository that
updates that znode and you will be able to fetch updates within a few seconds
of a commit on all machines, without constant fetching when no changes are
being made.
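
As a rough sketch, the Twitcher side of this setup might look like the config
below. It uses the RegisterWatch() and Exec() functions documented in
section 4. The file name, the checkout path /srv/repo, and the description
text are assumptions made for illustration, and passing pipe_stdin and
description as keyword arguments is assumed to work the same way as znode and
action do in the examples in section 4.1.

```python
# Hypothetical /etc/twitcher/git_pull.twc; the checkout path is an assumption.
RegisterWatch(
    znode='/git/head',
    action=Exec('cd /srv/repo && git pull'),
    pipe_stdin=False,  # git pull has no use for the znode contents on stdin
    description='git pull when /git/head changes'
    )
```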

3. Installation
===============

The default installation of Twitcher places the twitcher binary (bin/twitcher)
in /usr/sbin and the twitcher module (twitcher/*) in the site-packages
directory, which is often /usr/lib/python2.6/site-packages/twitcher.

The configs are read recursively from /etc/twitcher while twitcher is running.

Note that Twitcher uses inotify to detect when config files change so it can
automatically reload them. This allows configs to be added and removed
quickly without restarting the server binary. If a config fails to parse,
it will not be updated in the running binary and a log message will be
written.
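
The "keep the old config if the new one fails to parse" behavior can be
sketched as follows. This is not Twitcher's own code: reload_config() is a
hypothetical helper, written for Python 3, that only illustrates the idea of
compiling a changed file and falling back to the previous version on error.

```python
import logging


def reload_config(path, current):
    """Compile the config at `path`; return `current` if it cannot be parsed."""
    try:
        with open(path) as handle:
            source = handle.read()
        # compile() only parses the file; nothing in it is executed here.
        return compile(source, path, 'exec')
    except (OSError, SyntaxError) as error:
        # Leave the running configuration untouched and record why.
        logging.error('keeping previous config for %s: %s', path, error)
        return current
```

A caller would invoke this each time a change to the file is reported and
only replace its watches when the returned object differs from the one it
already holds.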

By default Twitcher logs to syslog using the daemon facility.

4. Configuration Language
=========================

The configuration language for Twitcher is plain Python. Each configuration
must register a znode and an action for each watch it wants. This is done
via the RegisterWatch function, which takes several arguments.

RegisterWatch(): Creates a watch that will run a script when a ZooKeeper node
                 is updated.
    znode: This is the znode in ZooKeeper that is to be watched.
    action: This is a function or lambda that will perform an action when
            the znode is modified. The most common use here is to call Exec()
            which is a function documented later.
    pipe_stdin: If True then the contents of 'znode' will be piped to the stdin
                of the process running 'action'. Default is True.
    run_on_load: If True then the action will be executed when Twitcher starts.
                 Without this, updates made while Twitcher was not running
                 may be missed. The default is True.
    run_mode: This defines how Twitcher will react when 'znode' is updated
              while it is running 'action' for a previous update. The optional
              modes are:
                QUEUE: This will queue the watch until the currently running
                       'action' finishes, then run 'action' with the most
                       recent contents. If several watches are received while
                       'action' is running then only the last update will
                       be executed.
                DISCARD: Ignore the update, don't run the script, but
                         re-register the watch.
                PARALLEL: Run the script in parallel for every update received.
              The default run mode is QUEUE.
    uid: The user id to run the process as (string or int). The default is to
         run as root.
    gid: The group id to run the process as (string or int). The default is to
         run as root.
    notify_signal: This is the signal that will be sent to the spawned
                   process that is currently running when a new update has
                   been received. By default this is not used, and it will only
                   work if the run_mode is QUEUE.
    timeout: The number of seconds before any spawned process should be sent
             a SIGTERM. Five seconds later the process will be sent a SIGKILL.
    description: This is a text description of the watch which will be used
                 for logging. The default is to name watches after the file
                 they are configured in.
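
The three run modes can be compared with a small toy model. This is only a
sketch of the behavior described above, not Twitcher's implementation: the
WatchRunner class, the simulate() helper, and the use of plain strings for
the mode names are all assumptions made for illustration.

```python
class WatchRunner:
    """Toy model of run_mode: records which updates would start an action."""

    def __init__(self, run_mode):
        self.run_mode = run_mode  # 'QUEUE', 'DISCARD' or 'PARALLEL'
        self.running = 0          # number of actions currently running
        self.pending = None       # latest contents waiting in QUEUE mode
        self.started = []         # contents handed to each started action

    def _start(self, contents):
        self.running += 1
        self.started.append(contents)

    def on_update(self, contents):
        """Called when the watched znode changes."""
        if self.running == 0 or self.run_mode == 'PARALLEL':
            self._start(contents)
        elif self.run_mode == 'QUEUE':
            self.pending = contents  # overwrite: only the last update survives
        # DISCARD: drop the update; the watch itself stays registered.

    def on_action_finished(self):
        """Called when one running action exits."""
        self.running -= 1
        if self.run_mode == 'QUEUE' and self.pending is not None:
            contents, self.pending = self.pending, None
            self._start(contents)


def simulate(run_mode, updates):
    """Deliver every update while the first action is still running."""
    runner = WatchRunner(run_mode)
    for contents in updates:
        runner.on_update(contents)
    while runner.running:
        runner.on_action_finished()
    return runner.started
```

With three updates 'a', 'b' and 'c' arriving back to back, this model starts
the action for 'a' and 'c' in QUEUE mode, for 'a' only in DISCARD mode, and
for all three in PARALLEL mode.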

Exec(): Returns a lambda that will execute a given command when run.
    command: If this is a string then the command will be invoked in a
             shell interpreter. If it's a list then it will be executed
             exactly as specified.
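
The difference between the string and list forms can be illustrated with the
standard subprocess module. The make_exec() function below is a hypothetical
stand-in written for Python 3; it is not how Exec() is implemented, and it
ignores details such as pipe_stdin, uid/gid and timeout.

```python
import subprocess
import sys


def make_exec(command):
    """Return a callable that runs `command` and returns its exit status."""
    # A string is handed to the shell; a list is executed as an argument
    # vector with no shell parsing.
    use_shell = isinstance(command, str)
    return lambda: subprocess.call(command, shell=use_shell)


# String form: shell syntax (redirection, &&, builtins) is available.
via_shell = make_exec('exit 3')

# List form: each element is passed through exactly as written.
direct = make_exec([sys.executable, '-c', 'raise SystemExit(4)'])
```

The list form is the safer choice when arguments may contain spaces or shell
metacharacters, since nothing is re-interpreted by a shell.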

By default the configuration files should be placed in /etc/twitcher and
should use an extension of ".twc".


4.1 Examples
============

In this example config a node called '/twitcher/example1' will be watched for
updates, and when it changes it will execute
"date >> /tmp/twitcher_example1.out".

RegisterWatch(
    znode='/twitcher/example1',
    action=Exec('date >> /tmp/twitcher_example1.out')
    )

As another example we can run a python function rather than executing a
command. For this example we will execute the function example2() every time
'/twitcher/example2' is updated.

def example2():
  # Overwrite the output file each time the znode is updated.
  with open('/tmp/twitcher_example2.out', 'w') as out:
    out.write('testing')

RegisterWatch(
    znode='/twitcher/example2',
    action=example2
    )

Keep in mind that the action function is run in a process that has been forked
off from the main Twitcher instance. As such the following example
DOES NOT WORK AS EXPECTED. The output will always be 1 because the increment
happens in the child process, which exits after doesnotwork() finishes; the
copy of x in the main Twitcher process stays at 0.

x = 0
def doesnotwork():
  global x
  x += 1
  open('/tmp/twitcher_badexample.out', 'w').write(str(x))

RegisterWatch(
    znode='/twitcher/doesnotwork',
    action=doesnotwork
    )
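
If a counter like this is really needed, the state has to live somewhere that
survives the child process, for example in a file. The sketch below shows one
way to do that in Python 3; increment_counter() and the path passed to it are
hypothetical and not part of Twitcher.

```python
def increment_counter(path):
    """Read the count stored at `path`, add one, write it back, return it."""
    try:
        with open(path) as handle:
            count = int(handle.read().strip() or 0)
    except FileNotFoundError:
        count = 0  # first run: no state file exists yet
    count += 1
    with open(path, 'w') as handle:
        handle.write(str(count))
    return count
```

Such a function could be registered with something like
action=lambda: increment_counter('/tmp/twitcher_counter.out'). Note that the
read and write are not atomic, so runs that overlap (for example in PARALLEL
mode) could still lose an increment.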

4.2 Things to Avoid
===================

Any action run by Twitcher is expected to be idempotent. Since the script may
be executed at any time and can be triggered rapidly, each script should do
basic health and sanity checking before performing expensive or destructive
actions.

Even if the script is in QUEUE mode there is a possibility that two copies
of it run at once. If Twitcher is restarted while the script is running, the
script is started again when Twitcher comes back up, which leaves two copies
running at the same time. To prevent this, your script should use a
file-based sentinel (such as a lock file) or another method to ensure
exclusivity.
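
One common way to get this kind of exclusivity on Linux and other POSIX
systems is an advisory lock on a well-known file. The sketch below is one
possible approach in Python 3, not something Twitcher provides; try_lock()
and the lock path chosen by the caller are hypothetical.

```python
import fcntl


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, 'a')  # 'a' creates the file without truncating it
    try:
        # LOCK_NB makes flock() fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()  # another copy of the script holds the lock
        return None
    return handle  # keep this open; closing it releases the lock
```

A script would call try_lock() first and exit quietly if it returns None.
Because the lock is released when the holding process exits, a script that
crashes does not leave a stale lock behind.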

5. Known Issues
===============

None at the moment.

6. Future Features
==================

Failure handling on the child process: Allow the config to specify what should
be done when the script fails (returns nonzero). Some examples include
"retry x times", "fail but keep watch", and "fail and stop watching".

7. Change Log
=============

Release: 1.5

2010/11/08: Make connection re-establish all existing watches.

Release: 1.4

2010/10/25: Fix client session expiration recovery.

2010/8/3: Documentation fixes.

Release: 1.3

2010/8/2: Added notify_signal and timeout support and fix a few bugs.

Release: 1.2, Open source under the Apache 2.0 license.

2010/7/15: Added uid/gid support

Release: 1.1

2010/6/24: Fixed a bug that allowed child processes to get ignored forever.

Release: 1.0

2010/6/19: Initial review.
