Google BigQuery Crawler

This crawler searches the BigQuery projects available to you for dependencies: it finds the schemas, views, and scheduled queries that reference a given search string.

Setup

1. Install the Google Cloud SDK and set it up
2. Install the Python dependencies from the Pipfile with pipenv install
3. Start the Pipenv shell with pipenv shell
4. Execute main.py with py main.py (or python main.py / python3 main.py)

Get dependencies from BigQuery

After you start the script, it prompts you for the following search parameters.

Enter project id: The project to search. Enter all to search every project available to you, or enter a specific project ID.
Enter the mode:
1. schema: Searches all schemas
2. query: Searches the queries of all views
3. schquery: Searches the queries of all scheduled queries
4. schquerytab: Searches the destination tables of all scheduled queries
Search string: The string to search for
Name of the result file: The name of the generated file. Defaults to result.json. The script always generates a JSON file.
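
The four prompts boil down to four parameters with one default. As a minimal sketch of how such answers could be checked before a search starts: the helper normalize_params and its error messages are hypothetical and not taken from main.py, and rejecting empty answers is an assumption.

```python
# Hypothetical helper, not taken from main.py: validates the four prompt answers.
VALID_MODES = ("schema", "query", "schquery", "schquerytab")
DEFAULT_RESULT_FILE = "result.json"


def normalize_params(project_id, mode, search_string, result_file=""):
    """Return the cleaned answers as a dict, or raise ValueError."""
    project_id = project_id.strip()
    mode = mode.strip()
    result_file = result_file.strip()
    # Assumption: empty answers are rejected, except for the file name.
    if not project_id:
        raise ValueError("project id must not be empty (use 'all' for every project)")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {VALID_MODES}")
    if not search_string:
        raise ValueError("search string must not be empty")
    return {
        "project_id": project_id,
        "mode": mode,
        "search_string": search_string,
        # An empty answer falls back to the documented default file name.
        "result_file": result_file or DEFAULT_RESULT_FILE,
    }


params = normalize_params("project_id_1", "query", "dataset_2.table_2")
print(params["result_file"])
```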

Examples

Example 1

Suppose we want to find all scheduled queries in the project project_id_1 that use the table dataset_1.table_1.

Enter project id: project_id_1
Enter mode: schquery
Enter search string: dataset_1.table_1
Name of result file: table1Results.json

The result is written to a JSON file called table1Results.json in the root directory. For this example it looks like this:

{
  "mode": "schquery",
  "search_string": "dataset_1.table_1",
  "results": [
    {
      "project_id": "project_id_1",
      "resources": [
        {
          "type": "SCHEDULED QUERY",
          "source": "sq_1",
          "destination": "dataset_2.t_1",
          "number_of_appearances": 1
        },
        {
          "type": "SCHEDULED QUERY",
          "source": "sq_2",
          "destination": "dataset_1.table_3",
          "number_of_appearances": 1
        }
      ]
    }
  ]
}

The results field holds the outcome of the search: one entry per project, each with a list of matching resources. The first resource is:

{
  "type": "SCHEDULED QUERY",
  "source": "sq_1",
  "destination": "dataset_2.t_1",
  "number_of_appearances": 1
}

type is the type of the resource
source is the table, view, or scheduled query in which the search string was found
destination is the destination table the scheduled query writes to (null for views, as in Example 2)
number_of_appearances is how often the search string was found in this resource

→ The scheduled query sq_1 references the table dataset_1.table_1 in its query and writes the result of that query to the table dataset_2.t_1.
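
The same reading can be done programmatically. The following is a small sketch, not part of the crawler, that turns each resource of a result document into a sentence like the one above. The function name describe_resources is an assumption. The sample is an inline copy of the example result, trimmed to its first resource, so that the snippet runs on its own.

```python
import json

# Inline copy of the example result, trimmed to the first resource.
SAMPLE = """
{
  "mode": "schquery",
  "search_string": "dataset_1.table_1",
  "results": [
    {
      "project_id": "project_id_1",
      "resources": [
        {
          "type": "SCHEDULED QUERY",
          "source": "sq_1",
          "destination": "dataset_2.t_1",
          "number_of_appearances": 1
        }
      ]
    }
  ]
}
"""


def describe_resources(result):
    """Return one sentence per resource in a parsed result document."""
    search = result["search_string"]
    sentences = []
    for project in result["results"]:
        for res in project["resources"]:
            sentence = f'{res["type"]} {res["source"]} references {search}'
            # Views have no destination, so only mention it when it is set.
            if res["destination"] is not None:
                sentence += f' and writes to {res["destination"]}'
            sentences.append(sentence)
    return sentences


sentences = describe_resources(json.loads(SAMPLE))
for sentence in sentences:
    print(sentence)
```

To run it against a real result file, replace json.loads(SAMPLE) with json.load on the opened file, for example table1Results.json.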

Example 2

As a second example, we want to find all views in the project project_id_1 that use the table dataset_2.table_2.

Enter project id: project_id_1
Enter mode: query
Enter search string: dataset_2.table_2
Name of result file: <Nothing entered>

The generated result.json (the default name, since no file name was entered):

{
  "mode": "query",
  "search_string": "dataset_2.table_2",
  "results": [
    {
      "project_id": "project_id_1",
      "resources": [
        {
          "type": "VIEW",
          "source": "dataset_2.v_xyz",
          "destination": null,
          "number_of_appearances": 2
        },
        {
          "type": "VIEW",
          "source": "dataset_3.v_abc",
          "destination": null,
          "number_of_appearances": 3
        },
        {
          "type": "VIEW",
          "source": "dataset_3.v_lmn",
          "destination": null,
          "number_of_appearances": 1
        },
        {
          "type": "VIEW",
          "source": "dataset_5.v_151",
          "destination": null,
          "number_of_appearances": 2
        }
      ]
    }
  ]
}

The first result is:

{
  "type": "VIEW",
  "source": "dataset_2.v_xyz",
  "destination": null,
  "number_of_appearances": 2
}

→ The view dataset_2.v_xyz references the table dataset_2.table_2 twice in its query.
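
When a search returns many views, it can help to see the heaviest users of a table first. A minimal sketch, assuming the four resources from the example result above. The function name rank_by_appearances is hypothetical and not part of the crawler.

```python
# Resources copied from the example result above (JSON null becomes None).
RESOURCES = [
    {"type": "VIEW", "source": "dataset_2.v_xyz", "destination": None, "number_of_appearances": 2},
    {"type": "VIEW", "source": "dataset_3.v_abc", "destination": None, "number_of_appearances": 3},
    {"type": "VIEW", "source": "dataset_3.v_lmn", "destination": None, "number_of_appearances": 1},
    {"type": "VIEW", "source": "dataset_5.v_151", "destination": None, "number_of_appearances": 2},
]


def rank_by_appearances(resources):
    """Sort by descending appearance count, then by source name for ties."""
    return sorted(resources, key=lambda r: (-r["number_of_appearances"], r["source"]))


ranked = rank_by_appearances(RESOURCES)
total = sum(r["number_of_appearances"] for r in RESOURCES)

for r in ranked:
    print(r["number_of_appearances"], r["source"])
print("total appearances:", total)
```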

Notes

If you select all projects, the resources are grouped by project in the result file: the results list contains one entry per project.
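
If a flat list is easier to work with than the per-project grouping, the entries can be merged. A short sketch under stated assumptions: the function flatten is hypothetical, and project_id_2 with its resource is made up purely to show a second project.

```python
# Hypothetical result of a search across all projects.
# project_id_2 and dataset_9.v_example are invented for illustration.
result = {
    "mode": "query",
    "search_string": "dataset_2.table_2",
    "results": [
        {"project_id": "project_id_1", "resources": [
            {"type": "VIEW", "source": "dataset_2.v_xyz", "destination": None, "number_of_appearances": 2},
        ]},
        {"project_id": "project_id_2", "resources": [
            {"type": "VIEW", "source": "dataset_9.v_example", "destination": None, "number_of_appearances": 1},
        ]},
    ],
}


def flatten(result):
    """Return one flat row per resource, tagged with its project id."""
    return [
        {"project_id": project["project_id"], **res}
        for project in result["results"]
        for res in project["resources"]
    ]


rows = flatten(result)
for row in rows:
    print(row["project_id"], row["source"])
```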

Koenigseder/google-bigquery-crawler
