➡️ Complete this end-to-end tutorial on guides.snowflake.com
This demo shows how to auto-ingest streaming and event-driven data from Twitter into Snowflake using Snowpipe. By completing this demo, you will have built a Docker image containing a Python application that listens for live tweets and saves them; those tweets are then loaded into Snowflake using AWS S3 as a file stage.
The lessons learned in this demo can be applied to any streaming or event-driven data source.
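The listen-and-save flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `fake_stream` stands in for the live Twitter stream, `flush_batch`, `save_stream`, and the batch size are hypothetical, and the upload to S3 is left out (each file written here corresponds to one file that would land in the stage).

```python
import json
import tempfile
from pathlib import Path


def fake_stream(keyword, count):
    """Stand-in for a live Twitter stream (hypothetical, for illustration)."""
    for i in range(count):
        yield {"id": i, "text": f"tweet {i} about #{keyword}"}


def flush_batch(batch, out_dir, batch_no):
    """Write one batch as newline-delimited JSON, one tweet per line."""
    path = Path(out_dir) / f"tweets_{batch_no:05d}.json"
    with path.open("w", encoding="utf-8") as fh:
        for tweet in batch:
            fh.write(json.dumps(tweet) + "\n")
    return path


def save_stream(stream, out_dir, batch_size=2):
    """Group incoming tweets into small files; return the files written."""
    files, batch = [], []
    for tweet in stream:
        batch.append(tweet)
        if len(batch) == batch_size:
            files.append(flush_batch(batch, out_dir, len(files)))
            batch = []
    if batch:  # write any leftover tweets
        files.append(flush_batch(batch, out_dir, len(files)))
    return files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written = save_stream(fake_stream("wednesdaymotivation", 5), tmp)
        print([p.name for p in written])
```

Writing many small files rather than one growing file suits an event-driven load, because each new file in the bucket can trigger its own load.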
The core topics covered in this demo include:
- Data Loading: Load Twitter streaming data in an event-driven, real-time fashion into Snowflake with Snowpipe
- Semi-structured data: Querying semi-structured data (JSON) without needing transformations
- Secure Views: Create a Secure View to allow data analysts to query the data
- Snowpipe: Overview and configuration
You will need:
- git
- Docker Desktop
- Twitter Developer account (free)
- AWS account (12-month free tier)
Clone this repository locally:
git clone https://github.com/Snowflake-Labs/demo-twitter-auto-ingest
Navigate to the repository you just cloned:
cd demo-twitter-auto-ingest
Use your text editor of choice to edit the following files:
- Dockerfile (lines 9 to 16)
- 0_setup_twitter_snowpipe.sql (lines 23 to 25)
As you will see in the files, you also need to specify your AWS S3 bucket (where the data will be stored) and a default search keyword.
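To make the role of these values concrete, here is a minimal sketch of how a Python application might read such settings and fail early when one is missing. It assumes the settings reach the application as environment variables, which may not match this repository. The names `S3_BUCKET` and `DEFAULT_KEYWORD` and the fallback keyword are hypothetical; use whatever names actually appear in the Dockerfile.

```python
import os

# Hypothetical setting names; the real ones are defined in the Dockerfile.
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET")


def load_settings(env=None):
    """Return settings as a dict, raising if a required value is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    # The keyword is optional in this sketch and falls back to a default.
    settings["KEYWORD"] = env.get("DEFAULT_KEYWORD", "snowflake")
    return settings
```

Checking everything up front means a typo in a credential name surfaces as one clear error at start-up instead of a failed upload later.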
- While in your demo-twitter-auto-ingest directory, run:
docker build . -t snowflake-twitter
This command builds an image from the Dockerfile in the current directory and tags the built image as snowflake-twitter.
The last two lines of the output should look similar to the following:
Successfully built c1c0b7262436
Successfully tagged snowflake-twitter:latest
Note: In the above example, c1c0b7262436 is the image id - yours will likely be different.
$ docker run --name <YOUR_CONTAINER_NAME> snowflake-twitter:latest <YOUR_TWITTER_KEYWORD>
Example (searching for #wednesdaymotivation):
$ docker run --name twitter-wednesdaymotivation snowflake-twitter:latest wednesdaymotivation
At this point you should be able to see the tweets coming in... (every "." represents two tweets)
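A progress indicator of this kind can be sketched in a few lines. This is an illustration only, not the application's actual code; `report_progress` and its parameters are hypothetical names.

```python
import sys


def report_progress(tweets, every=2, out=sys.stdout):
    """Print one '.' for each group of `every` tweets; return the dot count."""
    dots = 0
    for seen, _tweet in enumerate(tweets, start=1):
        if seen % every == 0:
            out.write(".")
            out.flush()  # show the dot immediately instead of buffering it
            dots += 1
    return dots
```

The explicit flush matters in a container: output that is not attached to a terminal is usually buffered, so without it the dots might only appear in bursts.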
- Log in to your Snowflake demo account and load the 0_setup_twitter_snowpipe.sql script (edited at point 2).
- Execute the script one statement at a time.
- Make sure to configure event notifications in AWS S3 as described here.
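For orientation, the event notification amounts to a small piece of bucket configuration: S3 sends object-created events to the SQS queue associated with the pipe. The sketch below only builds and prints that configuration as JSON; it does not call AWS. The queue ARN is a placeholder (the real value appears in the notification_channel column of SHOW PIPES in Snowflake), and the `tweets/` prefix and the function name are assumptions for illustration.

```python
import json


def notification_config(queue_arn, prefix=None):
    """Build an S3 bucket notification that forwards new-object events to SQS."""
    queue = {"QueueArn": queue_arn, "Events": ["s3:ObjectCreated:*"]}
    if prefix:
        # Optional: only notify for keys under this prefix.
        queue["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"QueueConfigurations": [queue]}


if __name__ == "__main__":
    # Placeholder ARN; copy the real one from SHOW PIPES in Snowflake.
    arn = "arn:aws:sqs:us-east-1:123456789012:example-queue"
    print(json.dumps(notification_config(arn, prefix="tweets/"), indent=2))
```

The resulting JSON has the shape expected by the S3 bucket notification API, so it can serve as a reference when checking what the AWS console steps produced.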
Once you have finished the setup, stop your container so that you do not hit your Twitter API rate limits.
Go back to Terminal, open a new Terminal tab (you can use the shortcut ⌘T) and execute the following command:
docker stop <YOUR_CONTAINER_NAME>
Note: the container has a "safety" timeout of 15 minutes.
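A safety timeout like this can be sketched as a deadline check inside the processing loop. This is a simplified illustration, not the container's actual mechanism; `run_with_timeout`, `handle`, and the injectable `clock` are hypothetical names.

```python
import time


def run_with_timeout(stream, handle, timeout_s=15 * 60, clock=time.monotonic):
    """Process items until the stream ends or the deadline passes."""
    deadline = clock() + timeout_s
    processed = 0
    for item in stream:
        if clock() >= deadline:
            break  # safety stop: do not keep consuming the API
        handle(item)
        processed += 1
    return processed
```

Note that this version only checks the deadline when an item arrives, so a completely silent stream would not be interrupted. Stopping the container explicitly with docker stop remains the reliable way to end the run.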