
Commit

checkpoint
ohnorobo committed Sep 22, 2023
1 parent 3ff545d commit 9705083
Showing 1 changed file with 40 additions and 39 deletions.
79 changes: 40 additions & 39 deletions docs/base_tables.md
@@ -48,7 +48,7 @@ The json data is processed into a flat table format which looks like this.
| server_as_full_name | STRING | Autonomous system long name, eg. `Cloudflare, Inc.` |
| server_as_class | STRING | The type of AS eg. `Transit/Access`, `Content` (for CDNs) or `Enterprise` |
| server_country | STRING | Autonomous system country, eg. `US` |
| server_organization | STRING | The IP organization, eg. `US` |
| server_organization | STRING | The IP organization, eg. `United Technical Services` |
| |
| **Received Fields** | | :warning: These fields differ between scan types |
| |
@@ -171,14 +171,14 @@ The DNS (Satellite) data included the following alternative set of columns. (Man
| |
| resolver_ip | STRING | The ip address of the resolver being tested, eg. `1.1.1.1` |
| resolver_netblock | STRING | Netblock of the IP, eg. `1.1.1.0/24` |
| resolver_name | STRING | The domain name of the resolver. ex: `ns1.uts.ae.` |
| resolver_name | STRING | The domain name of the resolver. ex: `ns2.tower.com.ar.` |
| resolver_is_trusted | BOOLEAN | Whether the resolver is considered a 'trusted' resolver, ie '1.1.1.1', '8.8.8.8', '9.9.9.9' |
| resolver_asn | INTEGER | Autonomous system number, eg. `13335` |
| resolver_as_name | STRING | Autonomous system short name, eg. `CLOUDFLARENET` |
| resolver_as_full_name | STRING | Autonomous system long name, eg. `Cloudflare, Inc.` |
| resolver_as_class | STRING | The type of AS eg. `Transit/Access`, `Content` (for CDNs) or `Enterprise` |
| resolver_country | STRING | Autonomous system country, eg. `US` |
| resolver_organization | STRING | The IP organization, eg. `US` |
| resolver_organization | STRING | The IP organization, eg. `United Technical Services` |
| |
| **DNS Resolver Properties** |
| |
@@ -191,7 +191,7 @@ The DNS (Satellite) data included the following alternative set of columns. (Man
| **DNS Responses** |
| |
| received_error | STRING | Any error received from the resolver |
| received_rcode | INTEGER | Any [RCode](https://datatracker.ietf.org/doc/html/rfc5395#section-2.3) response received from the resolver. In the case of an error this is `-1`. ex: `2` representing `SERVFAIL`` |
| received_rcode | INTEGER | Any [RCode](https://datatracker.ietf.org/doc/html/rfc5395#section-2.3) response received from the resolver. In the case of an error this is `-1`. ex: `2` representing `SERVFAIL` |
| |
| **Analysis** | These analysis fields are generally obsolete |
| |
@@ -209,49 +209,50 @@ The DNS (Satellite) data included the following alternative set of columns. (Man
| measurement_id | STRING | A uuid which is the same for observations which are part of the same measurement. </br> If there are 5 retries of a scan they will all have the same id. </br> eg. `a08df2fe70d54092916b8df87e330f47` |
| source | STRING | The name of the .tar.gz scan file this row came from. </br> eg. `CP_Satellite-2020-08-20-05-58-35` </br> Used internally and for debugging |
| |
| **Answers** | REPEATED STRUCT | Contains the following fields. Each represents an IP address answer received from the resolver, and subsequent metadata for that IP. |
| **Answers** | Each answer represents an IP address answer received from the resolver, and subsequent metadata for that IP. |
| |
| answers.ip | STRING | |
| answers.asn | INTEGER | |
| answers.as_name | STRING | |
| answers.ip_organization | STRING | |
| answers.censys_http_body_hash | STRING | |
| answers.censys_ip_cert | STRING | |
| answers | REPEATED STRUCT | |
| answers.ip | STRING | IP address received from the resolver eg. `1.2.3.4` |
| answers.asn | INTEGER | Autonomous system number for the received IP address eg. `13335` |
| answers.as_name | STRING | Name of the autonomous system eg. `CLOUDFLARENET` |
| answers.ip_organization | STRING | IP organization of the IP address eg. `United Technical Services` |
| answers.censys_http_body_hash | STRING | The hash of the HTTP body taken from Censys |
| answers.censys_ip_cert | STRING | The IP cert taken from Censys |
| |
| **Matches Control** | Whether the metadata of the returned IP matches the expected metadata of a control measurement |
| |
| answers.matches_control | REPEATED RECORD | |
| answers.matches_control.ip | BOOLEAN | |
| answers.matches_control.censys_http_body_hash | BOOLEAN | |
| answers.matches_control.censys_ip_cert | BOOLEAN | |
| answers.matches_control.asn | BOOLEAN | |
| answers.matches_control.as_name | BOOLEAN | |
| answers.match_confidence | FLOAT | Value from 0-1. Confidince that this IP response matches a control measurement |
| answers.matches_control.ip | BOOLEAN | Whether the IP matches an expected control IP |
| answers.matches_control.censys_http_body_hash | BOOLEAN | Whether the HTTP body hash matches an expected control |
| answers.matches_control.censys_ip_cert | BOOLEAN | Whether the Censys IP cert matches a control |
| answers.matches_control.asn | BOOLEAN | Whether the ASN matches an expected control ASN |
| answers.matches_control.as_name | BOOLEAN | Whether the AS name matches an expected control AS name |
| answers.match_confidence | FLOAT | Value from 0-1. Confidence that this IP response matches a control measurement |
| |
| **HTTP Request** | Metadata from the HTTP request made to the returned IP |
| |
| answers.http_error | STRING | |
| answers.http_response_status | STRING | |
| answers.http_response_headers | REPEATED STRING | |
| answers.http_response_body | STRING | |
| answers.http_analysis_is_known_blockpage | BOOLEAN | |
| answers.http_analysis_page_signature | STRING | |
| answers.http_error | STRING | Any received error, eg. `Network Timeout` |
| answers.http_response_status | STRING | The HTTP response status, eg. `301 Moved Permanently` |
| answers.http_response_headers | REPEATED STRING | Each HTTP header in the response eg. `Content-Type: text/html` |
| answers.http_response_body | STRING | The HTTP response body </br> eg. `<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD></HTML>` </br> Truncated to 64k. |
| answers.http_analysis_is_known_blockpage | BOOLEAN | True if the received page matches a blockpage, False if it matches a known false positive blockpage, None otherwise. |
| answers.http_analysis_page_signature | STRING | A string describing the matched page </br> ex: `a_prod_cisco` (a known blockpage) or `x_document_moved` (a known false positive). </br> To see the pattern a signature matches check [blockpage signatures](https://github.com/censoredplanet/censoredplanet-analysis/blob/master/pipeline/metadata/data/blockpage_signatures.json) or [false positive signatures](https://github.com/censoredplanet/censoredplanet-analysis/blob/master/pipeline/metadata/data/false_positive_signatures.json) |
| |
| **HTTPS Request** | Metadata from the HTTPS request made to the returned IP |
| |
| answers.https_error | STRING | |
| answers.https_tls_version | INTEGER | |
| answers.https_tls_cipher_suite | STRING | |
| answers.https_tls_cert | BYTES | |
| answers.https_tls_cert_common_name | STRING | |
| answers.https_tls_cert_issuer | STRING | |
| answers.https_tls_cert_start_date | TIMESTAMP | |
| answers.https_tls_cert_end_date | TIMESTAMP | |
| answers.https_tls_cert_alternative_names | REPEATED STRING | |
| answers.https_tls_cert_has_trusted_ca | BOOLEAN | |
| answers.https_tls_cert_matches_domain | BOOLEAN | |
| answers.https_response_status | STRING | |
| answers.https_response_headers | REPEATED STRING | |
| answers.https_response_body | STRING | |
| answers.https_analysis_is_known_blockpage | BOOLEAN | |
| answers.https_analysis_page_signature | STRING | |
| answers.https_error | STRING | Any received error, eg. `TLS error` |
| answers.https_tls_version | INTEGER | The TLS version number eg. `771` (meaning TLS 1.2) |
| answers.https_tls_cipher_suite | STRING | The TLS cipher suite number </br> eg. `49199` (meaning TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) |
| answers.https_tls_cert | BYTES | The TLS certificate eg. `MIIG1DCCBb...` |
| answers.https_tls_cert_common_name | STRING | Common name of the TLS certificate, eg. `example.com` |
| answers.https_tls_cert_issuer | STRING | Issuer of the TLS certificate, eg. `Verisign` |
| answers.https_tls_cert_start_date | TIMESTAMP | The issue date of the certificate |
| answers.https_tls_cert_end_date | TIMESTAMP | The expiration date of the certificate |
| answers.https_tls_cert_alternative_names | REPEATED STRING | Alternative names from the TLS certificate, eg. `www.example.com` |
| answers.https_tls_cert_has_trusted_ca | BOOLEAN | Whether the issuing CA was trusted by the [Mozilla root CA list](https://wiki.mozilla.org/CA/Included_Certificates) when the request was made |
| answers.https_tls_cert_matches_domain | BOOLEAN | Whether the certificate is valid for the test domain |
| answers.https_response_status | STRING | The HTTP response status, eg. `301 Moved Permanently` |
| answers.https_response_headers | REPEATED STRING | Each HTTP header in the response eg. `Content-Type: text/html` |
| answers.https_response_body | STRING | The HTTP response body </br> eg. `<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD></HTML>` </br> Truncated to 64k. |
| answers.https_analysis_is_known_blockpage | BOOLEAN | True if the received page matches a blockpage, False if it matches a known false positive blockpage, None otherwise. |
| answers.https_analysis_page_signature | STRING | A string describing the matched page </br> ex: `a_prod_cisco` (a known blockpage) or `x_document_moved` (a known false positive). </br> To see the pattern a signature matches check [blockpage signatures](https://github.com/censoredplanet/censoredplanet-analysis/blob/master/pipeline/metadata/data/blockpage_signatures.json) or [false positive signatures](https://github.com/censoredplanet/censoredplanet-analysis/blob/master/pipeline/metadata/data/false_positive_signatures.json) |
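
Two of the fields documented in this table, `received_rcode` and `answers.https_tls_version`, are stored as raw integer codes. As a minimal sketch of how a reader might turn them into labels, the Python below maps a few common values to names. It assumes the standard DNS RCODE numbering and the TLS two-byte wire version read as an integer. The helper names `describe_rcode` and `describe_tls_version` are hypothetical and are not part of the pipeline.

```python
from typing import Optional

# Subset of the standard DNS RCODE values.
DNS_RCODES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}

# TLS protocol versions: the two-byte wire value read as an integer.
TLS_VERSIONS = {
    769: "TLS 1.0",  # 0x0301
    770: "TLS 1.1",  # 0x0302
    771: "TLS 1.2",  # 0x0303
    772: "TLS 1.3",  # 0x0304
}


def describe_rcode(rcode: Optional[int]) -> str:
    """Hypothetical helper: label a received_rcode value."""
    if rcode is None:
        return "no rcode"
    if rcode == -1:
        # The table documents -1 as the value used when an error occurred.
        return "error (no rcode received)"
    return DNS_RCODES.get(rcode, f"RCODE {rcode}")


def describe_tls_version(version: Optional[int]) -> str:
    """Hypothetical helper: label an answers.https_tls_version value."""
    if version is None:
        return "no TLS version"
    return TLS_VERSIONS.get(version, f"unknown (0x{version:04x})")


if __name__ == "__main__":
    print(describe_rcode(2))
    print(describe_tls_version(771))
```

Values outside the two lookup tables fall through to a generic label rather than raising, since scan data may contain codes this sketch does not list.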
