Merge pull request #1 from scentbird/master
Update readme and minor changes
daigotanaka authored May 13, 2020
2 parents c40e317 + 9bef49f commit 6474a9c
Showing 3 changed files with 48 additions and 31 deletions.
65 changes: 36 additions & 29 deletions README.md
@@ -3,8 +3,7 @@
Reverse ETL: Extract data from BigQuery tables.

This is a [Singer](https://singer.io) tap that produces JSON-formatted data
following the [Singer
spec](https://github.com/singer-io/getting-started/blob/master/SPEC.md).
following the [Singer spec](https://github.com/singer-io/getting-started/blob/master/SPEC.md).

This tap:

@@ -19,45 +18,53 @@ This tap:
(originally found in the [Google API docs](https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html))

1. Use [this wizard](https://console.developers.google.com/start/api?id=bigquery-json.googleapis.com) to create or select a project in the Google Developers Console and activate the BigQuery API. Click Continue, then Go to credentials.
1. On the **Add credentials to your project** page, click the **Cancel** button.
1. At the top of the page, select the **OAuth consent screen** tab. Select an **Email address**, enter a **Product name** if not already set, and click the **Save** button.
1. Select the **Credentials** tab, click the **Create credentials** button and select **OAuth client ID**.
1. Select the application type **Other**, enter the name "Singer BigQuery Target", and click the **Create** button.
1. Click **OK** to dismiss the resulting dialog.
1. Click the Download button to the right of the client ID.
1. Move this file to your working directory and rename it *client_secrets.json*.
2. On the **Add credentials to your project** page, click the **Cancel** button.
3. At the top of the page, select the **OAuth consent screen** tab. Select an **Email address**, enter a **Product name** if not already set, and click the **Save** button.
4. Select the **Credentials** tab, click the **Create credentials** button and select **OAuth client ID**.
5. Select the application type **Other**, enter the name "Singer BigQuery Tap", and click the **Create** button.
6. Click **OK** to dismiss the resulting dialog.
7. Click the Download button to the right of the client ID.
8. Move this file to your working directory and rename it *client_secrets.json*.

### Step 2: Configure

Create a file called `config.json` in your working directory, following `config.sample.json`. The required parameters are `start_datetime` and at least one stream (one BigQuery table) to copy.

### Step 3: Install and Run

First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.

tap-bigquery can be run with any Singer target. As an example, let's use target-redshift.

These commands install target-redshift and tap-bigquery with pip, export the path of the Google client secrets file to authenticate with Google Cloud, run tap-bigquery in discovery mode to generate the JSON schema (catalog) file, and then run the tap with that catalog for a given time window. The output of the last command can be piped to target-redshift:

```
> pip install tap-bigquery pipelinewise-target-redshift
> export GOOGLE_APPLICATION_CREDENTIALS="./client_secrets.json"
> tap_bigquery -c config.json -d > catalog.json
> tap_bigquery -c config.json --catalog catalog.json --start_datetime '2020-05-01T00:00:00Z' --end_datetime '2020-05-01T01:00:00Z'
```

### Authentication

It is recommended to use `target-bigquery` with a service account.
* Download the client_secrets.json file for your service account, and place it on the machine where `target-bigquery` will be executed.
It is recommended to use `tap-bigquery` with a service account.
* Download the client_secrets.json file for your service account, and place it on the machine where `tap-bigquery` will be executed.
* Set a `GOOGLE_APPLICATION_CREDENTIALS` environment variable on the machine, where the value is the fully qualified path to client_secrets.json
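
Before running the tap, it can help to confirm that the environment variable really points at a readable file. The following is a minimal sketch, not part of tap-bigquery: the helper name `credentials_path` is hypothetical, and it only checks that the variable is set and that the file exists, not that the credentials are valid.

```python
import os


def credentials_path(env=None):
    """Return the absolute path named by GOOGLE_APPLICATION_CREDENTIALS.

    Hypothetical helper for illustration; raises RuntimeError when the
    variable is unset or does not point at an existing file.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    # Resolve "~" and relative paths so the error message is unambiguous.
    path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(path):
        raise RuntimeError("credentials file not found: " + path)
    return path


if __name__ == "__main__":
    print(credentials_path())
```

Resolving the path to an absolute one mirrors the advice above to use a fully qualified path, so the tap behaves the same regardless of the directory it is started from.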

It should be possible to use the OAuth flow to authenticate to GCP as well:
* `target-bigquery` will attempt to open a new window or tab in your default browser. If this fails, copy the URL from the console and manually open it in your browser.
* `tap-bigquery` will attempt to open a new window or tab in your default browser. If this fails, copy the URL from the console and manually open it in your browser.
* If you are not already logged into your Google account, you will be prompted to log in.
* If you are logged into multiple Google accounts, you will be asked to select one account to use for the authorization.
* Click the **Accept** button to allow `target-bigquery` to access your Google BigQuery table.
* Click the **Accept** button to allow `tap-bigquery` to access your Google BigQuery table.
* You can close the tab after the signup flow is complete.

The data will be written to the table specified in your `config.json`.


## Development Note

Python tap tutorial
https://github.com/singer-io/getting-started/blob/master/docs/RUNNING_AND_DEVELOPING.md#a-python-tap

BigQuery Python example
https://cloud.google.com/bigquery/docs/quickstarts/quickstart-client-libraries#client-libraries-install-python

Use Cookiecutter
https://github.com/cookiecutter/cookiecutter

To create the skeleton for BigQuery tap
https://github.com/singer-io/singer-tap-template


---
## Original repo
https://github.com/anelendata/tap_bigquery

Copyright © 2019 Anelen Data
5 changes: 3 additions & 2 deletions setup.py
@@ -3,7 +3,7 @@

setup(
name="tap_bigquery",
version="0.1.0",
version="0.1.4",
description="Singer.io tap for extracting data",
author="Stitch",
url="http://singer.io",
@@ -12,7 +12,8 @@
install_requires=[
"singer-python>=5.0.12",
"requests",
"google-cloud-bigquery==1.16.0"
"google-cloud-bigquery==1.16.0",
"attrs==19.3.0"
],
entry_points="""
[console_scripts]
9 changes: 9 additions & 0 deletions tap_bigquery/sync_bigquery.py
@@ -109,6 +109,13 @@ def do_sync(config, stream):
}
query = """SELECT {columns} FROM {table} WHERE 1=1""".format(**keys)

state = {
    "type": "STATE",
    "value": {
        metadata["table"]: {metadata.get("datetime_key"): end_datetime}
    }
}

for f in metadata.get("filters", []):
    query = query + " AND " + f
if keys.get("datetime_key") and keys.get("start_datetime"):
@@ -137,3 +144,5 @@ def do_sync(config, stream):
"schema": stream.stream,
"record": record}
print(json.dumps(out_row))

print(json.dumps(state))
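
The STATE message added in this hunk can be exercised in isolation. The sketch below is an illustration rather than the code in `sync_bigquery.py`: the helper names `build_state_message` and `emit`, and the sample table and column names, are assumptions. It builds the same `{"type": "STATE", "value": {table: {datetime_key: end_datetime}}}` shape and writes it as one JSON line.

```python
import json


def build_state_message(table, datetime_key, end_datetime):
    """Build a Singer STATE message keyed by table, then by datetime column.

    Hypothetical helper mirroring the dict literal added to do_sync.
    """
    return {
        "type": "STATE",
        "value": {table: {datetime_key: end_datetime}},
    }


def emit(message):
    """Serialize one message as a single JSON line on stdout and return it."""
    line = json.dumps(message)
    print(line)
    return line


if __name__ == "__main__":
    # Sample values are placeholders, not taken from the repository.
    state = build_state_message(
        "my_dataset.my_table", "created_at", "2020-05-01T01:00:00Z"
    )
    emit(state)
```

One detail worth checking in review: the committed code reads the key with `metadata.get("datetime_key")`, which returns `None` when the key is absent, and `json.dumps` serializes a `None` dictionary key as the string `"null"`. A guard for a missing `datetime_key` may be worth adding before the state is emitted.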
