Setting up EmrEtlRunner
Snowplow EmrEtlRunner is an application that parses the log files generated by your Snowplow collector and:
- Cleans up the data into a format that is easier to parse and analyse
- Enriches the data (e.g. infers the location of the visitor from their IP address and infers the search engine keywords from the query string)
- Stores that cleaned, enriched data in S3
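To make the keyword enrichment step concrete, here is a minimal Python sketch of inferring search terms from a referrer URL's query string. It is an illustration only, not EmrEtlRunner's actual implementation: the function name `search_keywords` and the `SEARCH_PARAMS` lookup table are hypothetical, and the three hosts listed are just examples.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical lookup table (an assumption for this sketch): which query
# parameter carries the search terms for a given search engine host.
SEARCH_PARAMS = {
    "www.google.com": "q",
    "www.bing.com": "q",
    "search.yahoo.com": "p",
}


def search_keywords(referrer_url):
    """Return the search terms found in a referrer URL, or None."""
    parsed = urlparse(referrer_url)
    # Unknown hosts (or a missing host) are not treated as search engines.
    param = SEARCH_PARAMS.get(parsed.hostname)
    if param is None:
        return None
    # parse_qs decodes percent-escapes and turns '+' into spaces.
    values = parse_qs(parsed.query).get(param)
    return values[0] if values else None


if __name__ == "__main__":
    print(search_keywords("http://www.google.com/search?q=snowplow+analytics"))
```

A referrer from a host outside the table, or a search URL without the expected parameter, yields `None`, so the event would simply carry no keyword field.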
This guide covers how to set up EmrEtlRunner, including scheduling it so that your event data is automatically fetched from the collector logs, processed and updated in your cleaned data store on S3. It is divided into three sections:
- [Installation](1-Installing-EmrEtlRunner). You need to install EmrEtlRunner on your own server. It will interact with Amazon Elastic MapReduce and S3 via the Amazon API
- [Usage](2-Using-EmrEtlRunner). How to use EmrEtlRunner at the command line to instruct it to process data from your collector
- [Scheduling](3-scheduling-EmrEtlRunner). How to schedule the tool so that you always have an up-to-date set of cleaned, enriched data available for analysis

To start, [install](1-Installing-EmrEtlRunner) EmrEtlRunner.
Note: We recommend running all Snowplow AWS operations through an IAM user with the bare minimum permissions required to run Snowplow. Please see our IAM user setup page for more information on doing this.
Home | About | Project | Setup Guide | Technical Docs | Copyright © 2012-2013 Snowplow Analytics Ltd
- [Step 1: Setup a Collector](setting-up-a-collector)
- [Step 2: Setup a Tracker](setting-up-a-tracker)
- [Step 3: Setup Enrich](setting-up-enrich)
  - [3.1: Setup EmrEtlRunner](setting-up-EmrEtlrunner)
    - [3.1.1: install EmrEtlRunner](1-Installing-EmrEtlRunner)
    - [3.1.2: using EmrEtlRunner](2-Using-EmrEtlRunner)
    - [3.1.3: scheduling EmrEtlRunner](3-scheduling-EmrEtlRunner)
  - [3.2: Setup Scala Kinesis Enrich](setting-up-scala-kinesis-enrich)
- [Step 4: Setup alternative data stores](setting-up-alternative-data-stores)
- [Step 5: Analyze your data!](<Getting started analyzing Snowplow data>)