#BirdWatch
BirdWatch is a reactive web application that visualizes a live stream of Tweets using AngularJS, Bootstrap, Crossfilter, D3.js, ElasticSearch and Play Framework (in alphabetical order).
EDIT 04/2014: To allow a comparison of frameworks, there is now a ReactJS version in addition to the AngularJS version. A detailed blog post will follow shortly. The ReactJS version includes a trend-aware bar chart built entirely with React, without relying on D3.js. That chart will be the topic of another article soon.
Here is an overview of the information flow in the system:
A Play application connects to the Twitter Streaming API and receives all Tweets that include at least one of a set of configured words. Twitter caps this stream at 1% of the Firehose, which means the application will not receive more than one percent of all Tweets at any given moment. This limit still amounts to millions of Tweets per day, so a well-defined area of interest should fit in comfortably.
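As a rough illustration of the cap, the sketch below computes how many matching Tweets would be delivered in one interval. The function name and the figures are hypothetical, and treating the cap as a simple ceiling is an assumption; only the 1% limit itself comes from the description above.

```python
def delivered_tweets(matching: int, firehose_total: int, cap_percent: int = 1) -> int:
    """Return how many matching Tweets are delivered under the cap.

    Assumption: the cap acts as a plain ceiling of cap_percent percent
    of all Tweets in the interval being considered.
    """
    ceiling = firehose_total * cap_percent // 100
    return min(matching, ceiling)


# Hypothetical figures for one interval with 6000 Tweets overall.
print(delivered_tweets(matching=40, firehose_total=6000))   # below the cap
print(delivered_tweets(matching=500, firehose_total=6000))  # limited by the cap
```

A narrowly defined set of terms stays in the first case, where every matching Tweet is delivered.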
Incoming Tweets are inserted into an ElasticSearch index, where they are available for querying almost instantly. Each Tweet is also matched against the percolation queries: one pre-registered query for each connected client. Every registered query is run against every new Tweet, and whenever a query matches, the corresponding client is informed immediately by means of Server-Sent Events (SSE).
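The idea behind percolation can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the ElasticSearch API and not the code used in this application: the `Percolator` class, the client ids, and the whitespace-based matching are all hypothetical. The `sse_message` helper shows the basic wire framing of a Server-Sent Events message, a `data:` line followed by a blank line.

```python
import json


class Percolator:
    """Toy stand-in for percolation: queries are stored, documents are matched against them."""

    def __init__(self):
        self._queries = {}  # client id -> set of lower-cased query terms

    def register(self, client_id: str, query: str) -> None:
        # Simplification: a query is a list of whitespace-separated terms,
        # all of which must be present (AND).
        self._queries[client_id] = set(query.lower().split())

    def percolate(self, tweet: dict) -> list:
        """Return the ids of all clients whose registered query matches the Tweet."""
        words = set(tweet["text"].lower().split())
        return sorted(cid for cid, terms in self._queries.items() if terms <= words)


def sse_message(tweet: dict) -> str:
    """Frame a Tweet as one Server-Sent Events message."""
    return "data: " + json.dumps(tweet) + "\n\n"


percolator = Percolator()
percolator.register("client-a", "scala akka")
percolator.register("client-b", "python")

tweet = {"id": 1, "text": "Streaming with Scala and Akka"}
for client_id in percolator.percolate(tweet):
    # Only clients whose query matched would receive the message.
    print(client_id, repr(sse_message(tweet)))
```

The point of the design is that the search is inverted: instead of running one query over many stored documents, one new document is run over many stored queries.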
AngularJS clients hold a local copy of all the Tweets they have asked for using the ElasticSearch query syntax, with 'AND' being the default operator. Every query is not only run against the existing Tweets in the ElasticSearch index but is also registered as a percolation query. A user-selectable number of previous Tweets is loaded, and then every new Tweet that matches the query is appended immediately, allowing analysis of Tweets in near real time. Queries are bookmarkable, which makes it easy to revisit interesting and potentially complex queries.
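The client-side behaviour described above (load a chosen number of previous Tweets, then append live matches) might look roughly like the following Python sketch. The actual clients are written in JavaScript; the `TweetStore` class and its method names are hypothetical and only illustrate the data flow.

```python
class TweetStore:
    """Hypothetical client-side store: preload history, then append live matches."""

    def __init__(self, history: list, preload: int):
        # Keep only the most recent `preload` Tweets from the initial search result.
        self.tweets = list(history[-preload:]) if preload > 0 else []

    def on_live_tweet(self, tweet: dict) -> None:
        # Live matches pushed by the server are appended as they arrive.
        self.tweets.append(tweet)


history = [{"id": i} for i in range(1, 6)]  # five previous Tweets, oldest first
store = TweetStore(history, preload=3)
store.on_live_tweet({"id": 6})
print([t["id"] for t in store.tweets])  # prints [3, 4, 5, 6]
```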
Client-side analysis of the (live) search result is performed using Crossfilter.
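Crossfilter is a JavaScript library, so the following Python sketch does not use its API. It only illustrates, under that caveat, the kind of analysis involved: records are mapped to a dimension value, optionally filtered, and counted per group. The field names `followers` and `retweets` and the bucket boundaries are hypothetical.

```python
from collections import Counter

tweets = [
    {"id": 1, "followers": 120, "retweets": 0},
    {"id": 2, "followers": 4500, "retweets": 3},
    {"id": 3, "followers": 80, "retweets": 1},
    {"id": 4, "followers": 15000, "retweets": 12},
]


def follower_bucket(tweet: dict) -> str:
    """Dimension: map a Tweet to a coarse follower-count bucket."""
    followers = tweet["followers"]
    if followers < 1000:
        return "<1k"
    if followers < 10000:
        return "1k-10k"
    return ">=10k"


def group_count(records, dimension, keep=lambda r: True) -> Counter:
    """Group: count records per dimension value, after applying a filter."""
    return Counter(dimension(r) for r in records if keep(r))


# Counts over all Tweets, then only over Tweets that were retweeted.
print(group_count(tweets, follower_bucket))
print(group_count(tweets, follower_bucket, keep=lambda t: t["retweets"] > 0))
```

Because all data is already held on the client, changing the filter only requires recomputing the counts locally, with no further request to the server.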
A live version of this application is available. This instance listens to a number of software- and data-related terms; see the application.conf file for details. Interesting queries on this data set include:
Please feel free to contribute; pull requests are happily accepted. I use this project to study the technologies involved, and I would appreciate learning better ways of doing things.
A detailed description of the application can be found on my blog.
##Setup
The application runs on the Play Framework, which requires a JVM on your machine. On a Mac, the easiest way to install Play is with Homebrew:

    brew install play

If Homebrew was already installed on your machine, you will want to run this first:

    brew update
    brew upgrade

You also need ElasticSearch:

    brew install elasticsearch

Then start it:

    elasticsearch

BEWARE: this application has recently been upgraded to work with ElasticSearch v1.0.0. There have been breaking changes in the Percolation Query API (for the better, for sure), and because of these changes the latest version will not work with earlier versions of ElasticSearch. If for some reason you cannot run v1.0.0 yet, you can check out an earlier commit of this application.
Then, inside the application folder, run:

    play run

A Twitter API consumer key and access token are required to consume the Twitter Streaming API. You need to create a Twitter application and store its keys and secrets in a twitter.conf file, using the commented-out section in application.conf as a template.
That should be all there is to it before you can run your own instance, listening on localhost:9000. This opens the ReactJS version. For the AngularJS version, open localhost:9000/angular/.
##Configuration
Inside conf/application.conf you can change the terms that the application subscribes to from the Twitter Streaming API. The application will then receive all tweets that contain one or more of these terms, as long as the total number of matching tweets is no more than 1% of all tweets that Twitter is receiving at any time. Otherwise, the delivery is capped at 1%. Since February 2014 you can also subscribe to tweets from a list of Twitter IDs, either in addition to the terms or exclusively (in that case: application.topics="").
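The subscription rule described above can be sketched as follows. The matching itself is performed by Twitter, not by this application, so this is only a simplified model: the function name, the `user_id` field, and the whitespace-based word matching are assumptions made for illustration.

```python
def is_delivered(tweet: dict, terms: set, user_ids: set) -> bool:
    """Return True if the Tweet matches any subscribed term or any followed user ID."""
    words = set(tweet["text"].lower().split())
    matches_term = any(term.lower() in words for term in terms)
    matches_user = tweet["user_id"] in user_ids
    return matches_term or matches_user


terms = {"scala", "akka"}
user_ids = {42}

print(is_delivered({"user_id": 7, "text": "Learning Scala today"}, terms, user_ids))
print(is_delivered({"user_id": 42, "text": "Good morning"}, terms, user_ids))
print(is_delivered({"user_id": 7, "text": "Good morning"}, terms, user_ids))

# With an empty set of terms, only the followed IDs decide.
print(is_delivered({"user_id": 7, "text": "Learning Scala today"}, set(), user_ids))
```

Passing an empty set of terms corresponds to subscribing exclusively to a list of IDs.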
You may want to remove the Google Analytics script in main.scala.html or adapt the Analytics setting in application.conf to your own needs.
###Streaming API limitations
Please be aware that only one connection to the Twitter Streaming API is possible from any one public IP address. Starting a connection to the Streaming API will potentially end other connections from the same network if NAT is in place using the same public IP address. Access from mobile networks is discouraged and most likely won't work.
This software is licensed under the Apache 2 license, quoted below.
Copyright © 2013 Matthias Nehlsen.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this project except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.