The purpose of this walkthrough is to create a Dataflow streaming pipeline that reads XML-encoded messages from Pub/Sub.
This pipeline is developed with the Apache Beam Python SDK:
- Please refer to the Python codebase
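
As a rough illustration of the XML decoding step, the following is a minimal sketch and not the code from the referenced codebase. It assumes each Pub/Sub payload is a small, well-formed XML document whose direct child elements hold the fields of interest. The function name `parse_xml_message`, the `_root` key, and the sample payload are all hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_xml_message(data: bytes) -> dict:
    """Turn one XML-encoded payload into a flat dict.

    Assumption: the fields of interest are direct children of the root
    element. Nested elements and attributes are ignored in this sketch.
    """
    root = ET.fromstring(data)
    record = {"_root": root.tag}  # keep the root tag for reference
    for child in root:
        record[child.tag] = (child.text or "").strip()
    return record


# Hypothetical payload, for illustration only.
sample = b"<order><id>42</id><item>widget</item></order>"
parsed = parse_xml_message(sample)
```

In a Beam pipeline, a function like this could be applied with `beam.Map(parse_xml_message)` downstream of `beam.io.ReadFromPubSub`, which emits message payloads as bytes by default.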
If you wish to execute this code within a Qwiklabs environment, you can use the following: Stream messages from Pub/Sub by using Dataflow.
Best practice recommends that a Dataflow job use:
- A worker service account to access the pipeline's files and resources
- The minimum necessary IAM permissions for the worker service account
- The minimum required Google Cloud services
For redundancy, Beam may sometimes process a message more than once, and message ordering is not guaranteed by design. However, in-order and exactly-once processing of messages is possible when Pub/Sub and Dataflow are used together. If this is a solution requirement, please refer to the following Google Cloud Blog entry: After Lambda: Exactly-once processing in Google Cloud Dataflow, Part 1
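
To make the effect of duplicate and out-of-order delivery concrete, here is a small sketch in plain Python. It is not how Dataflow implements exactly-once processing. It only assumes, hypothetically, that every message carries a unique `message_id` and a sequence number `seq` assigned by the publisher.

```python
def deduplicate(messages):
    """Yield each message once, keyed on its ID, in first-seen order."""
    seen = set()
    for msg in messages:
        if msg["message_id"] in seen:
            continue  # a redelivered copy, skip it
        seen.add(msg["message_id"])
        yield msg


# Hypothetical delivery: message "a" arrives twice, and before "b".
delivered = [
    {"message_id": "a", "seq": 2},
    {"message_id": "b", "seq": 1},
    {"message_id": "a", "seq": 2},
]

unique = list(deduplicate(delivered))
# Restore publisher order using the assumed sequence number.
ordered = sorted(unique, key=lambda m: m["seq"])
```

The in-memory `seen` set grows without bound, so a real streaming job would need bounded or windowed state instead.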