
Setting up EmrEtlRunner

Brandon Amos edited this page Jan 17, 2014 · 1 revision

HOME > SNOWPLOW SETUP GUIDE > Step 3: Setting up Enrich > Step 3.1: setting up EmrEtlRunner

Snowplow EmrEtlRunner is an application that parses the log files generated by your Snowplow collector and:

  1. Cleans up the data into a format that is easier to parse / analyse
  2. Enriches the data (e.g. infers the visitor's location from their IP address and the search engine keywords from the query string)
  3. Stores that cleaned, enriched data in S3
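To make the enrichment step more concrete, here is a minimal Python sketch of inferring search keywords from a referrer URL's query string. It is an illustration only, not EmrEtlRunner's actual implementation: the `SEARCH_PARAMS` table and the `search_keywords` function are hypothetical names introduced for this example, and the host-to-parameter mapping is an assumption.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical lookup table (an assumption for this sketch): maps a search
# engine host to the query parameter that carries the search terms.
SEARCH_PARAMS = {
    "www.google.com": "q",
    "www.bing.com": "q",
    "search.yahoo.com": "p",
}


def search_keywords(referrer_url):
    """Return the search terms found in a referrer URL, or None."""
    parts = urlparse(referrer_url)
    param = SEARCH_PARAMS.get(parts.netloc)
    if param is None:
        # Not a search engine we know about.
        return None
    # parse_qs decodes percent-escapes and turns '+' into spaces.
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None


print(search_keywords("http://www.google.com/search?q=snowplow+analytics&hl=en"))
# prints "snowplow analytics"
```

A real enrichment process would rely on a maintained list of search engines rather than a hard-coded table like this one.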

This guide covers how to set up EmrEtlRunner, including scheduling it so that your event data is automatically fetched from the collector logs, processed, and updated in your cleaned data store on S3. It is divided into three sections:

  1. Installation. You need to install EmrEtlRunner on your own server. It will interact with Amazon Elastic MapReduce and S3 via the Amazon API
  2. Usage. How to use EmrEtlRunner at the command line to instruct it to process data from your collector
  3. Scheduling. How to schedule the tool so that you always have an up-to-date set of cleaned, enriched data available for analysis

To start, install EmrEtlRunner.

Note: We recommend running all Snowplow AWS operations through an IAM user with the bare minimum permissions required to run Snowplow. Please see our IAM user setup page for more information on doing this.


