Releases: pystorm/streamparse
streamparse 3.0.0.dev0
This is the first developer preview release of streamparse 3.0. It has not been tested extensively in production yet, so we are looking for as much feedback as we can get from users who are willing to test it out.
You can install this release via pip with `pip install --pre streamparse==3.0.0.dev0`. It will not be installed automatically because it is a pre-release.
⚠️ API Breaking Changes ⚠️
- Topologies are now specified via a Python Topology DSL instead of the Clojure Topology DSL. This means you can/must now write your topologies in Python! Components can still be written in any language supported by Storm, of course. (Issues #84 and #136, PR #199, #226)
- The deprecated `Spout.emit_many` method has been removed. (pystorm/pystorm@004dc27)
- As a consequence of using the new Python Topology DSL, all Bolts and Spouts that emit anything are expected to have the `outputs` attribute declared. It must be either a list of `str` or `Stream` objects, as described in the docs.
- We temporarily removed the `sparse run` command, as we've removed all of our Clojure code, and this was the only thing that still had to be done in Clojure. (Watch issue #213 for future developments.)
Features
- Added a `sparse slot_usage` command that can show you how balanced your topologies are across nodes. This is something that isn't currently possible with the Storm UI on its own. (PR #218)
- Can now specify `ssh_password` in `config.json` if you don't have SSH keys set up. Storing your password in plaintext is not recommended, but it is nice to have for local VMs. (PR #224, thanks @motazreda)
- Now fully Python 3 compatible (and tested on versions up to 3.5), because we rely on fabric3 instead of plain old fabric now. (4acfa2f)
- Now remove the `_resources` directory after the JAR has been created.
Other Changes
- Now rely on the pystorm package for handling multi-lang IPC between Storm and Python. This library is essentially the same as our old `storm` subpackage with a few enhancements (e.g., the ability to use MessagePack instead of JSON to serialize messages). (Issue #174, commits aaeb3e9 and 1347ded)
- All Bolt-, Spout-, and Topology-related classes are available directly at the `streamparse` package level (i.e., you can just do `from streamparse import Bolt` now). (Commit b9bf4ae)
- `sparse kill` now will kill inactive topologies. (Issue #156)
- All examples now use the Python DSL.
- The Kafka-JVM example has been cleaned up a bit, so now you can click on Storm UI log links and they'll work.
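Storm's multi-lang protocol, which pystorm implements, frames every message as a JSON object followed by a line containing only `end`. A minimal sketch of that framing (hypothetical helper names, JSON serializer only):

```python
import json


def frame_message(msg):
    """Serialize a multi-lang message: a JSON object followed by a
    line containing only the 'end' sentinel."""
    return json.dumps(msg) + "\nend\n"


def parse_frames(stream):
    """Split a stream of framed messages back into dicts."""
    return [json.loads(chunk) for chunk in stream.split("\nend\n") if chunk.strip()]
```

Swapping JSON for MessagePack changes only the serializer, not the overall request/response flow between Storm and the Python worker.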
streamparse 2.1.4
This minor release adds support for specifying `ui.port` in `config.json` to make the `sparse stats` and `sparse worker_uptime` commands work when `ui.port` is not set to the default 8080.
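For example, if your Storm UI runs on port 8888 instead of 8080, a `config.json` env entry might look like the following (host names are placeholders, and this assumes `ui.port` sits alongside the other per-env settings):

```json
{
    "envs": {
        "prod": {
            "user": "storm",
            "nimbus": "nimbus.example.com",
            "workers": ["worker1.example.com"],
            "ui.port": 8888
        }
    }
}
```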
streamparse 2.1.3
Fix a race condition in `TicklessBatchingBolt` that could cause a tuple to be part of more than one batch. (PR #193)
streamparse 2.1.2
This release fixes an issue where `reraise` wasn't being imported from `six` in `bolt.py` (commit d743188).
streamparse 2.1.1
This bugfix release just fixes an issue where `TicklessBatchingBolt` was crashing when trying to handle exceptions in `TicklessBatchingBolt.run()` (commit 48bace6).
streamparse 2.1.0
Features
- Added back an updated version of the pre-2.0 `BatchingBolt` that does not rely on tick tuples, called `TicklessBatchingBolt`. This is useful in cases where you know your spout will not replay tuples after a topology shutdown. Because Storm is not guaranteed to continue sending tick tuples while the topology is shutting down, the standard `BatchingBolt` may have a batch of tuples waiting to be processed (that were never acked) sitting in it when the topology shuts down. When you resubmit and start it back up, those tuples will be lost unless the spout saves state between runs (which is pretty uncommon). With the `TicklessBatchingBolt` this is much less likely to happen, because it uses a timer thread that is independent of Storm and will continue to execute even while the topology is shutting down. As long as the time you give Storm to shut down is greater than the time it takes to process the last batch, your last batch will always be fully processed. (PR #191)
- Can now specify virtualenv command-line arguments in `config.json` via `virtualenv_flags`. (Issue #94, PR #159)
- Added support for pulling out the `source->stream->fields` mapping with Storm 0.10.0+. (Commit 61f163d)
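The timer-thread batching approach described above can be sketched in plain Python; this is a simplified illustration with hypothetical names, not streamparse's actual implementation:

```python
import threading


class TimedBatcher:
    """Collect items and flush them on a fixed interval using a
    timer thread that runs independently of the caller."""

    def __init__(self, secs_between_batches, flush_callback):
        self.secs_between_batches = secs_between_batches
        self.flush_callback = flush_callback
        self._batch = []
        self._lock = threading.Lock()
        self._timer = None

    def add(self, item):
        with self._lock:
            self._batch.append(item)

    def flush(self):
        # Swap the batch out under the lock so new items can keep arriving.
        with self._lock:
            batch, self._batch = self._batch, []
        if batch:
            self.flush_callback(batch)

    def start(self):
        # threading.Timer fires once, so each tick reschedules the next one.
        def _tick():
            self.flush()
            self.start()

        self._timer = threading.Timer(self.secs_between_batches, _tick)
        self._timer.daemon = True
        self._timer.start()
```

Because the timer thread is not driven by Storm's tuple stream, it keeps firing during topology shutdown, which is what lets the last batch drain.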
Bug fixes
- Restored the `--version` argument to `sparse` that was accidentally removed in the previous release. (Commit 48b6de7)
- Fixed a missing comma in `setup.py`. (Issue #160, commit bde3cc3)
- Fixed an issue where an empty `tasks.py` file (for invoke) was necessary to make fabric pre-submit hooks work. (Issue #157, commit a10c478)
- Fixed an issue where `run` and `submit` couldn't parse email addresses and git hashes properly. (PR #189, thanks @eric7j, commit 8670e3f)
- Fixed an issue where the fabric `env` wasn't being populated when `use_virtualenv` was `False`. (Commit a10c478)
- Fixed an issue where updating virtualenvs would hang when the VCS path changed. (Commits e923a3c and 3e27cf0)
Documentation
- Added documentation that explains how parallelism and workers work in Storm and streamparse. (issue #163, PR #165)
- Added documentation about tuple routing and direct streams. (issue #162, commit fff05a0)
- Fixed some inconsistencies in the capitalization of `Bolt` and `Spout` in our docs. (Issue #164, PR #166)
- Embedded PyCon 2015 presentation video in docs. (PR #161)
- Added some more FAQs to docs. (PR #88, thanks @konarkmodi)
Dependencies
streamparse 2.0.2
This release fixes an issue where tick tuples were not being acked in the new `BatchingBolt` implementation (e38a024). It also updates the documentation for `BatchingBolt` to indicate that you can enable tick tuples on a per-bolt basis in your topology file by adding `:conf {"topology.tick.tuple.freq.secs", 1}` to your `python-bolt-spec` arguments (d1c405a).
streamparse 2.0.1
This bugfix release fixes an issue where reading non-ASCII messages on Python 2.7 would cause a `UnicodeDecodeError` (#154). Thanks to @daTokenizer for reporting this!
streamparse 2.0.0
This release adds a bunch of new functionality (e.g., additional subcommands), but also changes some things that were not used by a lot of people in backward-incompatible ways.
⚠️ API BREAKING CHANGES ⚠️
- `BatchingBolt` now uses tick tuples instead of a separate timer thread. This is an API-breaking change, as `BatchingBolt.secs_between_batches` is now `BatchingBolt.ticks_between_batches`. You will also need to make sure you run your Storm topology with `topology.tick.tuple.freq.secs` set to how frequently you want the ticks to occur. Read the docs for more details. (#125, #137)
- streamparse fabric and invoke tasks have been moved to `sparse` sub-commands:
  - `fab remove_logs` ➡️ `sparse remove_logs`
  - `fab tail_logs` ➡️ `sparse tail`
  - `fab activate_env` is no longer necessary, as all commands that need the fabric environment modified do this automatically.
  - `fab create_or_update_virtualenvs` ➡️ `sparse update_virtualenv` (note the case change, since this only ever worked on a single virtualenv at a time)
  - `inv jar_for_deploy` ➡️ `sparse jar`
  - `inv list_topologies` ➡️ `sparse list`
  - `inv kill_topology` ➡️ `sparse kill`
  - `inv run_local_topology` ➡️ `sparse run`
  - `inv submit_topology` ➡️ `sparse submit`
  - `inv tail_topology` ➡️ `sparse tail`
  - `inv visualize_topology` ➡️ `sparse visualize`
  - `inv prepare_topology` has been removed, because the commands that relied on it (`sparse run`, `sparse submit`, and `sparse jar`) all call `streamparse.util.prepare_topology` automatically.
- The `streamparse.ext` package has been removed, and so have the `streamparse.ext.fabric` and `streamparse.ext.invoke` modules. `streamparse.ext.util` ➡️ `streamparse.util`
- Users should no longer do `from streamparse.ext.fabric import *` and `from streamparse.ext.invoke import *` in their projects' `fabfile.py` and `tasks.py` files. `pre_submit` and `post_submit` hooks will be executed automatically even without this.
Major enhancements
- `sparse run` now runs indefinitely by default. (#122)
- Added a `Bolt.process_tick(tup)` method for processing tick tuples. (#116, #124)
- Added `sparse worker_uptime` and `sparse stats` commands for getting information about running Storm topologies and their workers. (#17, #52)
- `--ackers` and `--workers` can now be specified as separate arguments to `sparse submit` and `sparse run`, instead of just using `--par`. (#74, #97)
- `Bolt.emit_many()` is now deprecated and will be removed in streamparse 3.0. Please just call `Bolt.emit()` repeatedly instead. (#66)
- Added lots of documentation about how topologies work and how to get started with streamparse. (#26, #103)
- Added a conda recipe template for building a streamparse conda package. (#105)
- SSH tunnels are no longer required for the `kill`, `list`, and `submit` commands. (#96, #98, #112)
- `env.use_ssh_config` is `True` by default now. (#54)
- Can now deploy/build simple JARs in addition to uber-JARs. This speeds up `sparse submit` for pure Python projects. (#106)
- Added `sparse jar`, `sparse remove_logs`, and `sparse update_virtualenv` commands to replace the old Fabric and Invoke tasks.
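Tick tuples, which a `Bolt.process_tick`-style hook handles, are identified in Storm's multi-lang protocol by their origin: the `__system` component and the `__tick` stream. A minimal check (a hypothetical helper for illustration, not streamparse's internal code):

```python
def is_tick(tup_component, tup_stream):
    """Return True for Storm tick tuples, which come from the
    __system component on the __tick stream."""
    return tup_component == "__system" and tup_stream == "__tick"
```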
Minor enhancements
- Removed dependency on `docopt` and switched to using `argparse` for command-line arguments. Now sub-commands all have their own detailed `--help` switches (e.g., `sparse run --help`), and `sparse --help` will list all of the available commands with a brief description of what they do. (#115, #152)
- Added the first pieces of support for a Python DSL for defining topologies (#84) as part of a grander vision to move away from Clojure (#136). Please note that this cannot actually be used yet, because the utility that takes the Python DSL and generates something Storm understands from it has not been written yet.
- Overhauled unit tests to simplify IPC testing. (#41, #47)
- Added documentation on using an unofficial version of Storm (#142)
- Added support for Tox (#128)
- Updated spouts and bolts to allow Python tuples to be emitted. (#119)
- Switched to using Travis Docker containers for building (#90)
- Made updating of virtualenvs optional by checking whether `requirements.txt` exists. (#60)
- Created a new `storm` subpackage, which will be split off into its own package (pystorm) for version 3.0 of streamparse. This contains all of the IPC/multi-lang related code. In the future, streamparse will just be a collection of utilities for managing Storm topologies/clusters.
- Moved a lot of code from the `Spout` and `Bolt` classes into the `Component` parent class to cut down on code duplication.
Bugfixes
- Fixed multithreaded emitting (#101, #133)
- `sparse` commands that use `lein` underneath now display output from `lein` immediately. (#109)
- Fixed a typo in the config name used to get max bytes. (#110)
- We now reset `Bolt.current_tups` even when receiving a heartbeat. (#107)
- `Spout.emit_many()` works again. (#144)
- `sparse tail` now tails all machines. (#104)
Contributors for this release (by number of commits)
- Dan Blanchard (@dan-blanchard)
- Keith Bourgoin (@kbourgoin)
- Viktor Shlapakov (@vshlapakov)
- Curtis Vogt (@omus)
- Andrew Montalenti (@amontalenti)
- Tim Hopper (@tdhopper)
- Daniel Hodges (@hodgesds)
- Arturo Filastò (@hellais)
- Wieland Hoffmann (@mineo)
- Aiyesha Ma (@Aiyesha)
- Cody Wilbourn (@codywilbourn)
Thanks to all our contributors!
Storm 0.9.3 support
This release adds support for Storm 0.9.3 in addition to a number of bug fixes.
New and updated examples available.
- Adds: Support for Storm 0.9.3 heartbeats (#82)
- Adds: `StormHandler` class for logging to Storm
- Adds: `--wait` timeout to `sparse kill` and `sparse submit`
- Adds: "kafka-jvm" example -- a mixed-language topology (JVM/Clojure + Python) with a JVM-based Kafka spout
- Adds: "wordcount-on-redis" example
- Updates: wordcount example
- Fixes: #64: `sparse tail` fails when logs are missing
- Fixes: `flush` method on `LogStream`
- Fixes: #100: SSH tunnels are not always closed
- Fixes: `-o` option string issues
- Documentation updates