Amazon S3 Select - Phonebook Search is a simple serverless Java application illustrating the usage of Amazon S3 Select to execute a SQL query on a comma separated value (CSV) file stored on Amazon Simple Storage Service (Amazon S3). S3 Select does not require any database servers and runs directly on S3.
Generally available since April 2018, S3 Select and Amazon S3 Glacier Select let customers run SQL queries directly on data stored in Amazon S3 and Amazon S3 Glacier. Previously, customers needed to deploy a database to query this data. With Amazon S3 Select, you simply store your data in S3 and use simple SQL statements to filter the contents of Amazon S3 objects and retrieve only the subset of data that you need. Retrieving only a subset of the data reduces the amount of data that Amazon S3 transfers, which in turn reduces the cost and latency of retrieving it.
Amazon S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. Amazon S3 Select also supports compression on CSV and JSON objects with GZIP or BZIP2, and server-side encrypted objects.
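The compression rule above can be captured in a few lines. This is a minimal sketch, assuming the object key's file extension tells you how the object was compressed; the `Format` and `Compression` enums and the `forKey` helper are hypothetical local stand-ins, not AWS SDK types. It rejects GZIP/BZIP2 for Parquet because whole-object compression applies only to CSV and JSON objects.

```java
import java.util.Locale;

/**
 * Hypothetical helper: choose the whole-object compression setting for an
 * S3 Select request from the object key's extension. The enums are local
 * stand-ins that mirror the formats and codecs named in the text.
 */
class SelectCompression {

    enum Format { CSV, JSON, PARQUET }

    enum Compression { NONE, GZIP, BZIP2 }

    static Compression forKey(String key, Format format) {
        String lower = key.toLowerCase(Locale.ROOT);
        Compression compression;
        if (lower.endsWith(".gz")) {
            compression = Compression.GZIP;
        } else if (lower.endsWith(".bz2")) {
            compression = Compression.BZIP2;
        } else {
            compression = Compression.NONE;
        }
        // Whole-object GZIP/BZIP2 compression is supported only for CSV and JSON.
        if (format == Format.PARQUET && compression != Compression.NONE) {
            throw new IllegalArgumentException(
                "GZIP/BZIP2 object compression is not supported for Parquet: " + key);
        }
        return compression;
    }

    public static void main(String[] args) {
        System.out.println(forKey("sample_data.csv", Format.CSV));    // NONE
        System.out.println(forKey("sample_data.csv.gz", Format.CSV)); // GZIP
        System.out.println(forKey("records.json.bz2", Format.JSON));  // BZIP2
    }
}
```

Keying off the extension is only a convention; if your objects do not follow it, pass the compression type explicitly instead.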
You can perform SQL queries using AWS SDKs, the SELECT Object Content REST API, the AWS Command Line Interface (AWS CLI), or the Amazon S3 console.
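Whichever interface you use, the query itself is a SQL string. The sketch below builds one for the phonebook case. It assumes the CSV has a header row, that the column names are the ones that appear in the API's JSON output (Name, PhoneNumber, City, Occupation), and that a single quote inside a string literal is escaped by doubling it, as in standard SQL. `PhonebookQuery` and `buildQuery` are hypothetical names, not part of the project.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper: build an S3 Select SQL expression that filters a CSV
 * object (read with its header row) by one column.
 */
class PhonebookQuery {

    // Only these header names may appear in the query, so user input never becomes SQL syntax.
    private static final Set<String> COLUMNS = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList("Name", "PhoneNumber", "City", "Occupation")));

    static String buildQuery(String column, String value) {
        if (!COLUMNS.contains(column)) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        // Assumed escaping rule: double any single quote inside the literal.
        String literal = value.replace("'", "''");
        return "SELECT * FROM S3Object s WHERE s.\"" + column + "\" = '" + literal + "'";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("Name", "Jane"));
        // SELECT * FROM S3Object s WHERE s."Name" = 'Jane'
    }
}
```

The allowlist is the important design choice: only the value is user-controlled, and it is always emitted as a quoted literal.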
In addition to using Amazon S3 for storage and for running SQL queries, our simple phone book application uses Amazon API Gateway and AWS Lambda. In this sample, an AWS Lambda function runs the Amazon S3 Select SQL query, and Amazon API Gateway is used to interact with that function.
The architecture for this workshop is the following: client → Amazon API Gateway → AWS Lambda → Amazon S3 (S3 Select).
The ‘Amazon S3 Select – Phonebook search’ demo showcases the power of S3 Select. S3 Select enables applications to retrieve only a subset of data from an object by using simple SQL expressions. By using S3 Select to retrieve only the data needed by your application, you can drastically improve performance and reduce cost.
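To make the "only a subset" idea concrete, here is a purely local illustration of what a filter such as `SELECT * FROM S3Object s WHERE s."Name" = 'Jane'` returns. It is not S3 Select: with S3 Select this filtering runs inside Amazon S3 and nothing here calls AWS. It assumes simple CSV with no quoted fields or embedded commas, and the rows are made up, not taken from sample_data.csv. `LocalSelectDemo` and `selectWhere` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Local illustration only: keep the CSV rows whose column equals a value,
 * the way a simple WHERE clause would. Assumes no quoted fields or embedded commas.
 */
class LocalSelectDemo {

    static List<String> selectWhere(String csv, String column, String value) {
        String[] lines = csv.split("\n");
        String[] header = lines[0].split(",");
        int index = -1;
        for (int i = 0; i < header.length; i++) {
            if (header[i].equals(column)) {
                index = i;
            }
        }
        if (index < 0) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        List<String> matches = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String[] fields = lines[i].split(",", -1);
            if (fields.length > index && fields[index].equals(value)) {
                matches.add(lines[i]);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration; not the project's sample data.
        String csv = "Name,PhoneNumber,City,Occupation\n"
            + "Jane,(555) 555-0100,Austin,Engineer\n"
            + "Sam,(555) 555-0101,Seattle,Teacher\n"
            + "Alex,(555) 555-0102,Denver,Nurse\n";
        List<String> rows = selectWhere(csv, "Name", "Jane");
        int objectBytes = csv.getBytes(StandardCharsets.UTF_8).length;
        int returnedBytes = String.join("\n", rows).getBytes(StandardCharsets.UTF_8).length;
        System.out.println(rows + " -> " + returnedBytes + " of " + objectBytes + " bytes");
    }
}
```

The byte comparison in `main` is the point: only the matching rows would cross the network, not the whole object.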
This project contains a sample_data.csv file that you can query to search for users by name, occupation, or location. Requests are sent through API Gateway to a Lambda function, which selects a subset of data from the sample file. The Lambda function uses the AWS SDK for Java to issue the S3 Select query and returns the result in JSON format.
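When S3 Select is asked for JSON output, it returns one JSON record per line (assuming the default newline record delimiter), while the API responds with a JSON array. The sketch below shows one way to bridge the two; it is not the project's code. In the Lambda, the `InputStream` would be the records stream from the select response; here any `InputStream` works. Records are copied verbatim rather than parsed. `SelectResultFormatter` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: join newline-delimited JSON records into one JSON array.
 */
class SelectResultFormatter {

    static String toJsonArray(InputStream records) {
        StringBuilder out = new StringBuilder("[");
        boolean first = true;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(records, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String record = line.trim();
                if (record.isEmpty()) {
                    continue; // skip blank lines, e.g. after the last delimiter
                }
                if (!first) {
                    out.append(',');
                }
                out.append(record);
                first = false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.append(']').toString();
    }

    public static void main(String[] args) {
        String ndjson = "{\"Name\":\"Jane\",\"City\":\"Chicago\"}\n";
        InputStream in = new ByteArrayInputStream(ndjson.getBytes(StandardCharsets.UTF_8));
        System.out.println(toJsonArray(in));
        // [{"Name":"Jane","City":"Chicago"}]
    }
}
```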
- Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance.
- Amazon S3 Select enables applications to retrieve only a subset of data from an object by using simple SQL expressions.
- Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale.
- AWS Lambda lets you run code without provisioning or managing servers.
- AWS CloudFormation provides a common language for you to model and provision AWS and third-party application resources in your cloud environment.
This quick start guide deploys the sample application to your own AWS account using an AWS CloudFormation template.
- Sign in to your existing AWS account, or create a new AWS account.
- Create an Amazon S3 bucket and note the name of the bucket you created, as this will be used throughout this project.
- Upload the sample_data.csv file located in the project's /src/test/resources directory to your S3 bucket.
- Upload the packaged code, lambdaCode-1.0.0.jar, from the project's /target directory to your S3 bucket.
- In the AWS Management Console, select ‘CloudFormation’ from the list of AWS services.
- Choose ‘Create Stack’.
- Select ‘Template is ready’ and ‘Upload a template file’.
- Choose the cloud_formation_template.yaml file located in the project root directory and click ‘Next’.
- On the next page, specify the stack details:
a. Choose a stack name, such as "s3-select-phonebook-demo".
b. Specify your bucket name (the bucket you created previously).
c. Specify the uploaded Lambda code (the lambdaCode-1.0.0.jar file you uploaded previously).
d. Specify the SampleData file name (the sample_data.csv file you uploaded previously).
- On subsequent pages, leave all other fields at their default values.
- On the final page, acknowledge all ‘Transform might require access capabilities’ items.
- Choose ‘Create Stack’.
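Because the bucket name typed into the stack details must be a valid S3 bucket name, a quick client-side check can catch obvious typos before you create the stack. This sketch covers only a simplified subset of the S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starts and ends with a letter or digit; no two adjacent dots; not shaped like an IP address). Amazon S3 remains the authority, and `BucketNameCheck` is a hypothetical name.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper: simplified check of S3 bucket naming rules.
 * Not exhaustive; Amazon S3 enforces additional rules.
 */
class BucketNameCheck {

    // First and last characters: letter or digit; 1..61 characters in between => 3..63 total.
    private static final Pattern ALLOWED = Pattern.compile("^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$");
    // Names formatted like an IPv4 address are not allowed.
    private static final Pattern IP_LIKE = Pattern.compile("^\\d{1,3}(\\.\\d{1,3}){3}$");

    static boolean looksValid(String name) {
        return ALLOWED.matcher(name).matches()
            && !name.contains("..")
            && !IP_LIKE.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksValid("s3selectdemobucket")); // true
        System.out.println(looksValid("My_Bucket"));          // false
    }
}
```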
When stack creation finishes, the stack status shows CREATE_COMPLETE.
The S3 Select phonebook application lets you query a subset of the fictitious sample data. Take a look at the uploaded sample file and perform the following steps to query a subset of the data.
- In the CloudFormation console, open your stack, select the ‘Outputs’ tab, and copy the value of your API Gateway endpoint.
- Using Postman or curl, issue a request to retrieve a subset of the data.
For example:
curl -d '{"name":"Jane"}' -X POST {ENTER_API_GATEWAY_ENDPOINT};
The above call should return Jane's information.
For example, against a stack deployed in us-west-2:
curl -d '{"name":"Jane"}' -X POST https://3otm935he1.execute-api.us-west-2.amazonaws.com/Prod/s3-select-demo
The output is:
[{"Occupation":"Developer","PhoneNumber":"(949) 555-6704","City":"Chicago","Name":"Jane"}]
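For callers written in Java, the curl request above can be reproduced with the JDK's `HttpURLConnection`, which also runs on Java 8. This is a sketch, not part of the project: `PhonebookClient` and its methods are hypothetical, the endpoint is whatever your stack's Outputs tab shows, and only the `{"name": ...}` request shape used in the examples is built.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical client: POST a {"name": ...} search to the API Gateway endpoint.
 */
class PhonebookClient {

    // Builds {"name":"..."}, escaping backslashes and double quotes in the value.
    static String nameRequestBody(String name) {
        String escaped = name.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"name\":\"" + escaped + "\"}";
    }

    static String post(String endpoint, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        // getInputStream() throws IOException for 4xx/5xx responses.
        try (InputStream is = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = is.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PhonebookClient <api-gateway-endpoint>");
            return;
        }
        System.out.println(post(args[0], nameRequestBody("Jane")));
    }
}
```

Run it with your endpoint URL as the only argument; the response body is printed as returned by the API.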
A user interface is coming soon.
This section is for developers who are looking to customize the application.
This section lists the prerequisites for building the ‘s3-select phonebook search’ application.
- Sign in to AWS or create an account
- Install Java SE Development Kit 8
- Install the AWS SAM CLI
a. Note that while the installation instructions list Docker as a prerequisite, Docker is only necessary for local development and testing with SAM local. Feel free to skip installing Docker if you are not running the application locally.
- Install Maven
Download the S3 Select demo application to your local machine, and in the AWS console pick the AWS Region that matches your local configuration.
- Create an Amazon S3 bucket
- Upload the sample_data.csv file located in the project's /src/test/resources directory to your Amazon S3 bucket
- Before deploying the project with SAM for the first time, you'll need to update some variables with your bucket name. Update the following in the template.yaml file located in the project root directory.
a. Update Environment variables
Enter the name of your bucket.
BUCKET_NAME: {ENTER_BUCKET_NAME}
Enter the name/location of your sample file (e.g. sample_data.csv).
SAMPLE_DATA: {SAMPLE_DATA.csv}
b. Update Lambda Policy.
Enter the ARN of your S3 sample data object
(e.g. 'arn:aws:s3:::s3selectdemobucket/sample_data.csv')
Resource: 'arn:aws:s3:::{BUCKET_NAME}/sample_data.csv'
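The two template changes have to agree: the function reads the bucket and key from `BUCKET_NAME` and `SAMPLE_DATA`, and the policy's `Resource` must be the ARN of that same object. The sketch below shows one way a Lambda could read those variables and derive the ARN. The variable names come from the template; `PhonebookConfig` and its methods are hypothetical, and the `arn:aws` prefix assumes the standard AWS partition.

```java
import java.util.Map;

/**
 * Hypothetical configuration holder for the Lambda function.
 */
class PhonebookConfig {

    final String bucketName;
    final String sampleDataKey;

    PhonebookConfig(String bucketName, String sampleDataKey) {
        this.bucketName = bucketName;
        this.sampleDataKey = sampleDataKey;
    }

    // Takes the environment as a Map so it can be tested without real variables.
    static PhonebookConfig fromEnv(Map<String, String> env) {
        return new PhonebookConfig(require(env, "BUCKET_NAME"), require(env, "SAMPLE_DATA"));
    }

    // Convenience overload for use inside the Lambda.
    static PhonebookConfig fromSystemEnv() {
        return fromEnv(System.getenv());
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable " + name);
        }
        return value.trim();
    }

    // S3 object ARNs carry no Region or account ID: arn:aws:s3:::bucket/key
    String objectArn() {
        return "arn:aws:s3:::" + bucketName + "/" + sampleDataKey;
    }
}
```

Failing fast on a missing variable surfaces a template mistake on the first invocation instead of as an opaque S3 access error.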
Go to the root folder of the ‘directory search’ project and run the following SAM commands to build, package, and deploy the application.
sam build
sam package --output-template-file packaged.yaml --s3-bucket {name_of_your_bucket}
sam deploy --template-file packaged.yaml --stack-name s3-select-phonebook-stack --capabilities CAPABILITY_IAM
The S3 Select phonebook application lets you query a subset of the fictitious sample data, which is stored in comma-separated values (CSV) format. Take a look at the uploaded sample file and perform the following steps to query a subset of the data.
- In the CloudFormation console, select the ‘Outputs’ tab of the stack you deployed and copy the value of your API Gateway endpoint.
- Using Postman or curl, issue a request to retrieve a subset of the data.
For example:
curl -d '{"name":"Sam"}' -X POST {ENTER_API_GATEWAY_ENDPOINT};
The above call should return Sam’s information.
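- In the AWS Management Console, select ‘CloudFormation’ from the list of AWS services.
- Select the stack you created.
- Choose ‘Delete’ to delete the stack and the resources it created. The S3 bucket you created manually is not part of the stack; delete it separately if you no longer need it.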
This library is licensed under the MIT-0 License. See the LICENSE file.