Major release: streamparse runner, component API updates, Python logging support and more

@msukmanowsky msukmanowsky released this 25 Aug 16:24

This is a major release that introduces several potentially breaking API
changes, hence the major version advancement to 1.0.0. Please read the changes
below, as well as the migration guide on our wiki, before upgrading.

streamparse runner

  • Added a new runner for topology components and, as a result, a new way to
    define topologies (an extension to the existing Clojure DSL). More info can
    be found in the docs and in the 1.0.0 migration guide.
  • Quickstart projects have been updated to now have nested directories
    "src/bolts" and "src/spouts".

Component API updates

  • Added auto_anchor, auto_ack and auto_fail flags to the base Bolt class. See
    the docs for detailed descriptions of these flags and the migration page
    for info on how to safely upgrade your bolts.
  • BasicBolt is now deprecated.
  • Class var BatchingBolt.SECS_BETWEEN_BATCHES renamed to
    secs_between_batches since this isn't a constant, just a setting.
  • Spout.emit and Bolt.emit now return a list of the task IDs a tuple was
    emitted to unless the need_task_ids kwarg is set to False.
  • Spout.emit_many and Bolt.emit_many now return a two-dimensional list of
    the task IDs each emit tuple was sent to unless need_task_ids kwarg is set
    to False.
  • BatchingBolt does not return task IDs due to concurrency issues that will
    be addressed in a future release.
  • Spouts and bolts now have the following instance variables which users are
    free to use. These are initialized before the call to initialize():
    • _topology_name - name of the topology when submitted to Storm.
    • _task_id - task ID of the current component in the topology.
    • _component_name - the name of the current component as defined in the
      Clojure definition (e.g. "my-bolt").
    • _debug - the topology.debug setting (configured using the sparse
      --debug flag).
    • _storm_conf - the entire config dict received on initial handshake.
    • _context - the entire context dict received on initial handshake.
  • BatchingBolt threads (main and _batcher) now have more descriptive names.
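
The return shapes of emit and emit_many described above can be sketched as
follows. This is only an illustration and does not import streamparse:
SketchBolt, its constructor argument and the task IDs 4 and 7 are hypothetical
stand-ins, and returning None when need_task_ids is False is an assumption,
since the notes above do not say what is returned in that case.

```python
# Stand-in class only: this mimics the return shapes described in the
# release notes and is NOT the streamparse implementation.


class SketchBolt:
    """Hypothetical bolt whose output stream is consumed by fixed tasks."""

    def __init__(self, downstream_task_ids):
        self._downstream_task_ids = list(downstream_task_ids)

    def emit(self, tup, need_task_ids=True):
        # A real component would send the tuple to Storm at this point.
        if not need_task_ids:
            return None  # assumption: nothing useful is returned
        return list(self._downstream_task_ids)

    def emit_many(self, tuples, need_task_ids=True):
        # One inner list of task IDs per emitted tuple.
        if not need_task_ids:
            for tup in tuples:
                self.emit(tup, need_task_ids=False)
            return None  # assumption, as above
        return [self.emit(tup) for tup in tuples]


bolt = SketchBolt(downstream_task_ids=[4, 7])
single = bolt.emit(["hello"])                        # flat list of task IDs
many = bolt.emit_many([["hello"], ["world"]])        # two-dimensional list
skipped = bolt.emit(["hello"], need_task_ids=False)  # task IDs not requested
print(single, many, skipped)
```

The point of the sketch is the nesting: a single emit yields a flat list,
while emit_many yields one inner list per tuple.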

IPC

  • A lot of code cleanup around how we read from and write to stdin and stdout.
    This is much more stable now, with no breaking changes for users since they
    should never interface with these methods directly.

Logging

  • Full support for Python logging has now been added. See the docs and the
    migration guide for details. Logging config settings have been added to
    new quickstart projects.
  • print statements are now properly sent to the component's log file - feel
    free to add print statements for handy debugging or better yet, create a
    logger for nicely formatted messages.
  • Full support for Storm log levels will follow once STORM-414 is merged in.
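
As a rough illustration of the "create a logger" suggestion above, the
following uses only the standard library logging module. The logger name
"word_count_bolt", the process_word helper and the message format are
hypothetical, and the in-memory stream stands in for the component's log file
so the sketch runs anywhere; it does not show streamparse's own logging
configuration.

```python
import io
import logging

# In-memory stream used here in place of a component log file.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

# Hypothetical logger name; a per-component name keeps messages easy to find.
log = logging.getLogger("word_count_bolt")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False  # keep the example output limited to our handler


def process_word(word):
    """Hypothetical helper standing in for a bolt's per-tuple work."""
    log.debug("raw input=%r", word)       # dropped: below the INFO level
    log.info("processing word=%s", word)  # written to the handler
    return word.lower()


result = process_word("Storm")
output = stream.getvalue()
print(output, end="")
```

Compared with a bare print statement, the logger adds a level and a component
name to each message and lets the level filter out debug noise.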

Administration

  • Added pre and post submit hooks in fabric and invoke for topologies to
    enable users to run arbitrary code (e.g. send IRC message) after topologies
    are submitted (info in docs).
  • sparse tail command now requires -n <topology> argument as it will tail
    only the logs for a specific topology and environment.
  • Added remove_logs fab task which users can optionally hook up to a
    pre_submit hook to clear out Python logs.

Testing

  • Better test support added for all our components; more improvements to come
    here.