Commit cdd319e

Merge pull request #7822 from segmentio/amazon-s3-support
Amazon S3 not supported by Segment
2 parents ae8b7e1 + 566326b

File tree

1 file changed: +22 -19 lines
  • src/connections/sources/catalog/cloud-apps/amazon-s3


src/connections/sources/catalog/cloud-apps/amazon-s3/index.md

Lines changed: 22 additions & 19 deletions
@@ -5,41 +5,40 @@ id: GNLT5OQ45P
---
{% include content/source-region-unsupported.md %}

-This document contains a procedure that enables you to upload a CSV file containing data to Amazon S3, where it uses Lambda to automatically parse, format, and upload the data to Segment.
+This document outlines how to upload a CSV file containing data to [Amazon S3](https://aws.amazon.com/s3/){:target="_blank"}, which uses [Lambda](https://aws.amazon.com/lambda/){:target="_blank"} to automatically parse, format, and upload the data to Segment.

You might have sources of data where you can't instrument Segment's SDKs, including other SaaS tools for which a Segment integration is not yet available. In many of these cases, you can extract data from these sources in CSV format, and then use Segment's server-side SDKs or HTTP tracking API to push the data to Segment.
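For instance, once you've extracted a row from a CSV, replaying it to Segment can be as small as a single `curl` call to the HTTP tracking API (the write key and payload below are placeholders):

```bash
# Sketch: send one track event to Segment's HTTP tracking API.
# The write key is passed as the basic-auth username; payload values are made up.
curl https://api.segment.io/v1/track \
  -u "YOUR_WRITE_KEY:" \
  -H "Content-Type: application/json" \
  -d '{"userId": "u_123", "event": "Order Completed", "properties": {"total": 42.5}}'
```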

-The goal of this walkthrough is to make this process easier by providing an automated process that ingests this data. Once you complete this walkthrough, you will have the following Segment, Amazon S3, Lambda, and IAM resources deployed:
+The goal of this walkthrough is to make this process easier by providing an automated process that ingests this data. Once you complete this walkthrough, you will have the following Segment, Amazon S3, Lambda, and [IAM](https://aws.amazon.com/iam/){:target="_blank"} resources deployed:

- a Segment S3 source
- an AWS Lambda function
- an access policy for the Lambda function that grants Amazon S3 permission to invoke it
- an AWS IAM execution role that grants the permissions your Lambda function needs through the permissions policy associated with this role
- an AWS S3 source bucket with a notification configuration that invokes the Lambda function

-
## Prerequisites

-This tutorial assumes that you have some basic understanding of S3, Lambda and the `aws cli` tool. If you haven't already, follow the instructions in [Getting Started with AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/getting-started.html){:target="_blank"} to create your first Lambda function. If you're unfamiliar with `aws cli`, follow the instructions in [Setting up the AWS Command Line Interface](https://docs.aws.amazon.com/polly/latest/dg/setup-aws-cli.html){:target="_blank"} before you proceed.
+This tutorial assumes that you have some basic understanding of S3, Lambda and the `aws cli` tool. If you haven't already, follow the instructions in [Getting Started with AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/getting-started.html){:target="_blank"} to create your first Lambda function. If you're unfamiliar with `aws cli`, follow the instructions in [Setting up the AWS Command Line Interface](https://docs.aws.amazon.com/polly/latest/dg/setup-aws-cli.html){:target="_blank"} before you proceed.
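If `aws cli` is new to you, a minimal setup and sanity check looks like this (credentials and region are your own):

```bash
# Prompts for access key, secret key, default region, and output format
aws configure

# Confirms that the credentials resolve to your AWS account
aws sts get-caller-identity
```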

This tutorial uses a command line terminal or shell to run commands. Commands appear preceded by a prompt symbol (`$`) and the name of the current directory, when appropriate.

On Linux and macOS, use your preferred shell and package manager. On macOS, you can use the Terminal application. On Windows 10, you can [install the Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10){:target="_blank"} to get a Windows-integrated version of Ubuntu and Bash.

[Install NPM](https://www.npmjs.com/get-npm){:target="_blank"} to manage the function's dependencies.
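For example, assuming the function's only dependency is Segment's `analytics-node` package (adjust this to whatever your `index.js` actually imports):

```bash
npm init -y
npm install analytics-node
```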

-## Getting Started
-
+## Getting started

### 1. Create an S3 source in Segment
+
Remember the write key for this source; you'll need it in a later step.

-### 2. Create the Execution Role
+### 2. Create the execution role

-Create the [execution role](https://docs.aws.amazon.com/lambda/latest/dg/lambda-intro-execution-role.html){:target="_blank"} that gives your function permission to access AWS resources.
+Create the [execution role](https://docs.aws.amazon.com/lambda/latest/dg/lambda-intro-execution-role.html){:target="_blank"} that gives your function permission to access AWS resources.

**To create an execution role**

-1. Open the [roles page](https://console.aws.amazon.com/iam/home#/roles){:target="_blank"} in the IAM console.
+1. Open the [roles page](https://console.aws.amazon.com/iam/home#/roles){:target="_blank"} in the IAM console.
2. Choose **Create role**.
3. Create a role with the following properties:
- Set the **Trusted entity** to **AWS Lambda**.
@@ -53,7 +52,7 @@ Create the [execution role](https://docs.aws.amazon.com/lambda/latest/dg/lambda

The **AWSLambdaExecute** policy has the permissions that the function needs to manage objects in Amazon S3, and write logs to CloudWatch Logs.
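If you prefer the CLI to the console, the same role can be sketched in two commands; the role name `s3-lambda-segment-role` is a placeholder:

```bash
# Create a role that Lambda is trusted to assume
aws iam create-role \
  --role-name s3-lambda-segment-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }]
  }'

# Attach the AWSLambdaExecute managed policy described above
aws iam attach-role-policy \
  --role-name s3-lambda-segment-role \
  --policy-arn arn:aws:iam::aws:policy/AWSLambdaExecute
```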

-### 3. Create Local Files, an S3 Bucket and Upload a Sample Object
+### 3. Create local files, an S3 bucket and upload a sample object

Follow these steps to create your local files, S3 bucket and upload an object.

@@ -73,7 +72,7 @@ Follow these steps to create your local files, S3 bucket and upload an object.
3. Create your bucket. **Record your bucket name** - you'll need it later!
4. In the source bucket, upload `track_1.csv`.
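With `aws cli`, steps 3 and 4 can look like this; the bucket name is a placeholder, and yours must be globally unique:

```bash
# Create the source bucket
aws s3 mb s3://my-segment-csv-bucket

# Upload the sample CSV to it
aws s3 cp track_1.csv s3://my-segment-csv-bucket/
```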
-### 4. Create the Function
+### 4. Create the function

Next, create the Lambda function, install dependencies, and zip everything up so it can be deployed to AWS.
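Deploying the zipped package from the CLI is a single command. The runtime, handler, and role below are assumptions, so match them to your own setup:

```bash
aws lambda create-function \
  --function-name <!Your Lambda Name!> \
  --runtime nodejs18.x \
  --handler index.handler \
  --role arn:aws:iam::<account-id>:role/s3-lambda-segment-role \
  --zip-file fileb://function.zip
```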
@@ -260,11 +259,11 @@ The command above sets a 90-second timeout value as the function configuration.
S3-Lambda-Segment$ aws lambda update-function-configuration --function-name <!Your Lambda Name!> --timeout 180
```

-### 5. Test the Lambda Function
+### 5. Test the lambda function

In this step, you invoke the Lambda function manually using sample Amazon S3 event data.

-**To test the Lambda function**
+**To test the lambda function**

1. Create an empty file named `output.txt` in the `S3-Lambda-Segment` folder - the aws cli complains if it's not there.
```bash
@@ -281,7 +280,7 @@ In this step, you invoke the Lambda function manually using sample Amazon S3 eve
**Note**: Calls to Segment's Object API don't show up in the Segment debugger.
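The manual invocation itself is one CLI call, assuming you saved the sample S3 event as `event.json`:

```bash
# --cli-binary-format is needed on aws cli v2 so the payload is read as raw JSON
S3-Lambda-Segment$ aws lambda invoke \
    --function-name <!Your Lambda Name!> \
    --cli-binary-format raw-in-base64-out \
    --payload file://event.json \
    output.txt
```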
-### Configure Amazon S3 to Publish Events
+### Configure Amazon S3 to publish events

In this step, you add the remaining configuration so that Amazon S3 can publish object-created events to AWS Lambda and invoke your Lambda function.
You'll do the following:
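In CLI terms the wiring amounts to two calls: grant S3 permission to invoke the function, then attach the bucket notification (all names and ARNs below are placeholders):

```bash
# Allow the source bucket to invoke the Lambda function
aws lambda add-permission \
  --function-name <!Your Lambda Name!> \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::my-segment-csv-bucket

# Ask S3 to publish object-created events to the function
aws s3api put-bucket-notification-configuration \
  --bucket my-segment-csv-bucket \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:MyFunction",
      "Events": ["s3:ObjectCreated:*"]
    }]
  }'
```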
@@ -348,11 +347,15 @@ Last, test your system to make sure it's working as expected:
### Timestamps
This script automatically transforms all CSV timestamp columns named `createdAt` and `timestamp` to timestamp objects, regardless of nesting, in preparation for Segment ingestion. If your timestamps have a different name, search the example `index.js` code for the `colParser` function, and add your column names there for automatic transformation. If you make this modification, re-zip the package (using `zip -r function.zip .`) and upload the new zip to Lambda.
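The re-zip and re-deploy, assuming you're in the function's folder, looks like:

```bash
S3-Lambda-Segment$ zip -r function.zip .
S3-Lambda-Segment$ aws lambda update-function-code --function-name <!Your Lambda Name!> --zip-file fileb://function.zip
```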

-## CSV Formats
+## CSV formats

Define your CSV file structure based on the method you want to execute.

-#### Identify Structure
+> warning "CSV support recommendation"
+>
+> Implementing a production-grade solution with this tutorial can be complex. Segment recommends that you submit feature requests for Segment reverse ETL for CSV support.
+
+#### Identify structure

An `identify_XXXXX` .csv file uses the following field names:

@@ -367,7 +370,7 @@ An `identify_XXXXX` .csv file uses the following field names:
In the above structure, the `userId` is required, but all other items are optional. Start all traits with `traits.` and then the trait name, for example `traits.account_type`. Similarly, start context fields with `context.` followed by the canonical structure. The same structure applies to `integrations.` too.
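As an illustration, a minimal identify CSV under this structure could look like the following (the trait columns are made up):

```
userId,traits.account_type,traits.email
u_123,premium,jane@example.com
u_456,free,sam@example.com
```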


-#### Page/Screen Structure
+#### Page/Screen structure

For example, a `screen_XXXXX` or `page_YYYY` file has the following field names:

@@ -380,7 +383,7 @@ For example a `screen_XXXXX` or `page_YYYY` file has the following field names:
7. `timestamp` (Unix time) - Optional
8. `integrations.<integration>` - Optional

-#### Track Structure
+#### Track structure

For example, a `track_XXXXX` file has the following field names:

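The field list is trimmed from this hunk, but by analogy with the structures above, a minimal track CSV might look like this (column names are assumed from Segment's Track spec, not taken from the doc):

```
userId,event,properties.revenue,timestamp
u_123,Order Completed,42.5,1622540000
```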
@@ -409,7 +412,7 @@ For any of these methods, you might need to pass nested JSON to the tracking or

The example `index.js` sample code above does not support ingestion of arrays. If you need this functionality, you can modify the sample code as needed.

-#### Object Structure
+#### Object structure

There are cases when Segment's tracking API is not suitable for datasets that you might want to move to a warehouse. This could be e-commerce product data, media content metadata, campaign performance, and so on.
