This is not an official Google product
This Data Studio Community Connector lets users query data from AWS S3 Buckets directly.
The connector is using AWS Athena for underlying queries.
This example is running in the us-west-2
region.
Create an IAM User with programmatic access.
Attach managed policies AmazonAthenaFullAccess
and AmazonS3ReadOnlyAccess
to this user.
Remember the user's access key and secret.
Visit the Athena Console and create a sample table:
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (
LogDate DATE,
Time STRING,
Location STRING,
Bytes INT,
RequestIP STRING,
Method STRING,
Host STRING,
Uri STRING,
Status INT,
Referrer STRING,
os STRING,
Browser STRING,
BrowserVersion STRING
) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$"
) LOCATION 's3://athena-examples-us-west-2/cloudfront/plaintext/';
You could then try SELECT * FROM "default"."cloudfront_logs" limit 10;
to preview the table.
In the connector, fill in the values like this:
Key | Value |
---|---|
AWS_ACCESS_KEY_ID |
{KEY} |
AWS_SECRET_ACCESS_KEY |
{SECRET} |
AWS Region |
{AWS_REGION} |
Glue Database Name |
default |
Work Group |
{YOUR WORKGROUP} |
Glue Table Name |
cloudfront_logs |
Query Output Location |
s3://aws-athena-query-results-{account_id}-us-west-2/data-studio |
Date Range Column Name |
LogDate |
For Query Output Location
, AWS should have created a S3 bucket to store the query results, you could find the bucket name in S3 console.
If not, you could create a S3 bucket that starts with the name aws-athena-query-results-
yourself.
Data Studio will automatically crawls the table schema.
You could then try to explore the data. Note that the sample data is ranged from 2014-07-05
to 2014-08-05
.