yangwhale/raycom

Apache Beam Sample


You can use this master branch as a skeleton Java project.

Proposed streaming pipeline

IMPORTANT: the sample code assumes each Pub/Sub message is CSV text encoded in UTF-8.

Pub/Sub -> Dataflow -> GCS (Avro and CSV, for both data and deadletter) + BigQuery
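Since each message payload is assumed to be UTF-8 CSV text, decoding a payload can be sketched with the standard library alone (the class name and sample values below are illustrative, not taken from the repo):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class CsvPayloadDemo {
    // Decode a raw Pub/Sub payload as UTF-8 text and split it into CSV fields.
    public static List<String> parsePayload(byte[] payload) {
        String line = new String(payload, StandardCharsets.UTF_8);
        // limit -1 keeps trailing empty fields instead of dropping them
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        byte[] msg = "1,alice,2021-01-01".getBytes(StandardCharsets.UTF_8);
        System.out.println(parsePayload(msg)); // [1, alice, 2021-01-01]
    }
}
```

In the real pipeline, rows that fail this kind of parsing are what end up on the deadletter path.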

Current pipeline DAG

Quick start

Prerequisites

Java dev environment

  • JDK8+
  • Maven

This branch focuses on streaming, so the sample subscribes to messages from Pub/Sub. It's easy to switch to KafkaIO in Beam. The quickest way to produce some dummy data and send it to Pub/Sub for fun is to use this project.
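If you only need a one-off test message rather than a stream, a plain gcloud command can publish a dummy CSV row as well (the topic name is a placeholder):

```shell
# Publish one dummy CSV row (UTF-8 text) to a Pub/Sub topic.
# Assumes gcloud is installed and authenticated against your project.
gcloud pubsub topics publish my-topic --message="1,alice,2021-01-01"
```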

If you use the GCP Play Ground to produce the Pub/Sub messages, there isn't much to do. Simply update the run shell script and make sure you have the corresponding permissions to manipulate the GCP resources. Then:

./run df

FAQ

  1. Do I need to setup the BigQuery table in advance?

A: No. The application will create the table for you, and appends to an existing table by default.
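In Beam terms, this behavior corresponds to BigQueryIO's create and write dispositions. A sketch of what the write step likely looks like (the table name is a placeholder, and `rows`/`schema` are assumed to come from earlier in the pipeline):

```java
// Sketch only: assumes a PCollection<TableRow> named rows and a
// TableSchema named schema already exist in the pipeline.
rows.apply(BigQueryIO.writeTableRows()
    .to("my-project:my_dataset.my_table")
    .withSchema(schema)
    // Create the table if it does not exist yet ...
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
    // ... and append to it on every write.
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
```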

  2. How do I control permissions?

A: This project currently relies on the service account specified by the GOOGLE_APPLICATION_CREDENTIALS environment variable. Consult here for details.
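The variable itself is just a path to a service-account key file, so a minimal setup before running the pipeline looks like this (the key path is a placeholder):

```shell
# Point Google client libraries at a service-account key file (placeholder path).
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
```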
