Skip to content

Commit

Permalink
Merge pull request #5 from Jigsaw-Code/tables
Browse files Browse the repository at this point in the history
add view creation to query and update docs on how to query the view
  • Loading branch information
ohnorobo authored Oct 30, 2020
2 parents 6be130b + 79e937f commit fc22669
Show file tree
Hide file tree
Showing 2 changed files with 39 additions and 15 deletions.
28 changes: 14 additions & 14 deletions docs/tables.md
Original file line number Diff line number Diff line change
Expand Up @@ -120,21 +120,28 @@ This table is created by the script

### Reduced Table

These two tables are intended to be used together, they contain scan information
in one table `reduced_scans` and AS information in another `net_as`.
This table is actually a view joining two tables, in order to read over less
data with every request.

To use these tables together join on the `date` and `netblock` fields.

These tables contain only HTTPS scan data.
This table contains only HTTPS scan data.

These tables are created by the script
[https_reduced_scans.sql](../table/queries/https_reduced_scans.sql).

#### Table names

##### View

- firehook-censoredplanet.https_results.reduced_scans_geolocated

##### Joined Sub-tables

- `firehook-censoredplanet.https.reduced_scans`
- `firehook-censoredplanet.https.net_as`

These two tables are joined on their `date` and `netblock` fields to create the
view.

#### Table Formats

Reduced Scans
Expand All @@ -145,14 +152,7 @@ Reduced Scans
| domain | STRING | The domain being tested, eg. `example.com` |
| country | STRING | Autonomous system country, eg. `US` |
| netblock | STRING | Netblock of the IP, eg. `1.1.1.0/24` |
| asn | INTEGER | Autonomous system number, eg. `13335` |
| as_name | STRING | Autonomous system long name, eg. `Cloudflare, Inc.` |
| result | STRING | The source type, followed by the `: null` (meaning success) or error returned. eg. `HTTPS: null`, `HTTPS: Incorrect web response: status lines don't match` |
| count | INTEGER | How many measurements fit the exact pattern of this row? |

Net AS

| Field Name | Type | Contains |
| ------------ | ------- | -------- |
| date | DATE | Date that an individual measurement was taken |
| netblock | STRING | Netblock of the IP, eg. `1.1.1.0/24` |
| asn | INTEGER | Autonomous system number, eg. `13335` |
| as_full_name | STRING | Autonomous system long name, eg. `Cloudflare, Inc.` |
26 changes: 25 additions & 1 deletion table/queries/https_reduced_scans.sql
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

CREATE TEMP FUNCTION CleanError(error string) AS (
CREATE TEMP FUNCTION CleanError(error STRING) AS (
REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(
IF(error = "", "null", IFNULL(error, "null")),
"[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+", "[IP]"),
Expand Down Expand Up @@ -49,3 +49,27 @@ AS (
NOT SAFE.REGEXP_CONTAINS(error, "too many open files|address already in use|no route to host|connection refused|connect: connection timed out")
GROUP BY date, country, domain, netblock, result
);

# Drop the temp function before creating the view
# Since any temp functions in scope block view creation.
DROP FUNCTION CleanError;

CREATE OR REPLACE VIEW `firehook-censoredplanet.https_results.reduced_scans_geolocated`
OPTIONS(
friendly_name="Reduced Scan View",
description="A join of reduced scans with ASN info."
)
AS (
SELECT
date,
domain,
country,
netblock,
asn,
as_full_name AS as_name,
result,
count
FROM `firehook-censoredplanet.https_results.reduced_scans`
LEFT JOIN `firehook-censoredplanet.https_results.net_as`
USING (date, netblock)
);

0 comments on commit fc22669

Please sign in to comment.