reflekt-registry

A lightweight, serverless schema registry that hosts schemas for product analytics events defined in a Reflekt project. It is compatible with popular event producers and consumers.

Currently, reflekt-registry is focused on supporting Segment as a producer and consumer. Support for more producers and consumers is on the way.

Getting Started

You can deploy your own instance of reflekt-registry to your AWS account in a few simple steps. We used the AWS Free Tier to host our registry, so you can too!

Requirements

You will need:

  1. An Amazon Web Services (AWS) account.
  2. The AWS CLI installed and configured.
  3. A Reflekt project with schemas defined. See Reflekt for details.
  4. An AWS S3 bucket to host schemas from your Reflekt project. See here for instructions on creating an S3 bucket.
  5. Schemas pushed from your Reflekt project to your S3 bucket by running reflekt push. See these Reflekt docs for details.
  6. A local clone of this repo: git clone https://github.com/GClunies/reflekt-registry.git
  7. A virtual environment with the project's dependencies installed.
    • This repo contains a pyproject.toml file, so you can use poetry to create the virtual environment.
    • Or use pip with reflekt-registry/requirements.txt and your favorite virtual environment manager.

Configure

Set up the following environment variables in reflekt-registry/.chalice/config.json:

Variable                   Description
REGISTRY_BUCKET            The name of the S3 bucket that hosts schemas from your Reflekt project.
REGISTRY_BUCKET_REGION     The name of the region where the S3 bucket is located.
SEGMENT_WRITE_KEY_VALID    The write key for the Segment source where VALID events should be sent.
SEGMENT_WRITE_KEY_INVALID  The write key for the Segment source where INVALID events should be sent.
DEBUG                      Set to "true" to enable debug logging to AWS CloudWatch.
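
As a rough illustration of where these variables live, the sketch below builds a dictionary shaped like a Chalice config.json with a single dev stage and prints it as JSON. The app name, stage name, and API Gateway stage are inferred from the deploy output later in this README, and every value is a placeholder. The config.json shipped in the repo may contain additional settings, so treat this as a guide rather than a replacement.

```python
import json


def build_chalice_config(bucket, region, valid_key, invalid_key, debug=False):
    """Return a dict shaped like .chalice/config.json for a single 'dev' stage."""
    return {
        "version": "2.0",
        "app_name": "reflekt-registry",  # assumed from the Lambda name reflekt-registry-dev
        "stages": {
            "dev": {
                "api_gateway_stage": "api",  # assumed from the /api/ suffix of the Rest API URL
                "environment_variables": {
                    "REGISTRY_BUCKET": bucket,
                    "REGISTRY_BUCKET_REGION": region,
                    "SEGMENT_WRITE_KEY_VALID": valid_key,
                    "SEGMENT_WRITE_KEY_INVALID": invalid_key,
                    # Environment variable values are strings, so the flag
                    # is written as "true" or "false" rather than a boolean.
                    "DEBUG": "true" if debug else "false",
                },
            }
        },
    }


config = build_chalice_config(
    bucket="my-reflekt-schemas",  # placeholder bucket name
    region="us-west-1",
    valid_key="<VALID_SOURCE_WRITE_KEY>",
    invalid_key="<INVALID_SOURCE_WRITE_KEY>",
    debug=True,
)
print(json.dumps(config, indent=2))
```

The printed JSON can be compared against your own .chalice/config.json to confirm that all five variables are present under environment_variables.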

Deploy

Inside the reflekt-registry/ directory, run the following to deploy your registry to AWS:

$ chalice deploy

Creating deployment package.
Updating policy for IAM role: reflekt-registry-dev
Updating lambda function: reflekt-registry-dev
Updating rest API
Resources deployed:
    # NAME OF LAMBDA FUNCTION (reflekt-registry-dev)
  - Lambda ARN: arn:aws:lambda:us-west-1:012345678987:function:reflekt-registry-dev
    # API ENDPOINT TO BE USED IN SDK CLIENTS
  - Rest API URL: https://foo77bar99.execute-api.us-west-1.amazonaws.com/api/

Chalice handles creating IAM roles, the Lambda function, and API Gateway endpoint for you. You can view these resources in the AWS Console.

After your first deploy only, you will need to grant your Lambda IAM role (reflekt-registry-dev in the example above) permission to access your S3 bucket. For instructions on adding a policy to an IAM role, see here (read the section "To embed an inline policy for a user or role (console)"). Add this policy to your IAM role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": [
                "arn:aws:s3:::<YOUR_BUCKET_NAME>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "s3:*Object",
            "Resource": "arn:aws:s3:::<YOUR_BUCKET_NAME>/*"
        }
    ]
}
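
If you prefer scripting this step over using the console, the following sketch builds the same policy document in Python and attaches it inline with boto3's put_role_policy. It assumes boto3 is installed and your AWS credentials are allowed to modify the role. The function names and the policy name reflekt-registry-s3 are hypothetical and not part of this repo.

```python
import json


def build_bucket_policy(bucket_name):
    """Build the inline policy document shown above for the given bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                "Effect": "Allow",
                "Action": "s3:*Object",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


def attach_bucket_policy(role_name, bucket_name, policy_name="reflekt-registry-s3"):
    """Embed the policy inline on the role. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so build_bucket_policy works without boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,  # hypothetical name; choose your own
        PolicyDocument=json.dumps(build_bucket_policy(bucket_name)),
    )
```

With the role from the deploy output above, the call would look like attach_bucket_policy("reflekt-registry-dev", "<YOUR_BUCKET_NAME>").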

Usage

Most SDKs (e.g., Segment's analytics.js, analytics-python) support custom endpoints. Simply configure the SDK client to send events to your reflekt-registry endpoint. The registry will validate the events against schemas in your S3 bucket and send them to the appropriate consumer.

import segment.analytics as segment_analytics
from datetime import datetime

segment_analytics.write_key = "abc123def456ghi789jkl012mno345pqr678stu901vwx234yz567"

# Specify the custom endpoint + 'validate/<sdk_vendor>' (e.g., 'validate/segment')
# so reflekt-registry knows how to handle events from this SDK
segment_analytics.host = "https://foo77bar99.execute-api.us-west-1.amazonaws.com/api/validate/segment"

segment_analytics.track(
    user_id="test_user",
    event="Test Event",
    timestamp=datetime.now(),
    properties={
        "schema_id": "segment/demo/Test_Event/1-0.json",  # REQUIRED TO VALIDATE EVENT
        "test_property": "test_value",
    },
)

When sending events to a reflekt-registry, you must include the schema_id as a property in the event, set to the $id of the schema in your Reflekt project (and S3 bucket) that the event should be validated against. For example, the schema_id in the example above validates against this schema:

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "segment/demo/Test_Event/1-0.json",
    "description": "User viewed their shopping cart.",
    "self": {
        "vendor": "com.reflekt-ci",
        "name": "Test Event",
        "format": "jsonschema",
        "version": "1-0",
        "metadata": {
            "code_owner": "Maura",
            "product_owner": "Greg"
        }
    },
    "type": "object",
    "properties": {
        "schema_id": {
            "description": "The schema ID of the event.",
            "const": "segment/demo/Test_Event/1-0.json"
        },
        "test_property": {
            "type": "string",
            "description": "This is a test property."
        }
    },
    "required": [
        "schema_id",
        "test_property"
    ],
    "additionalProperties": false
}

Valid events will be sent to the Segment consumer specified by SEGMENT_WRITE_KEY_VALID. Invalid events will be sent to the Segment consumer specified by SEGMENT_WRITE_KEY_INVALID.
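
To make the validate-then-route behavior concrete, here is a simplified sketch. It checks event properties against only the JSON Schema keywords used in the example schema (required, additionalProperties set to false, const, and type "string"), then picks a write key. The function names are hypothetical. This hand-rolled check is not the registry's actual code and stands in for a complete JSON Schema validator.

```python
def validate_properties(properties, schema):
    """Return a list of error strings; an empty list means the event is valid.

    Covers only the keywords used in the example schema: required,
    additionalProperties (false), const, and type "string".
    """
    errors = []
    declared = schema.get("properties", {})

    for name in schema.get("required", []):
        if name not in properties:
            errors.append(f"missing required property: {name}")

    if schema.get("additionalProperties") is False:
        for name in properties:
            if name not in declared:
                errors.append(f"unexpected property: {name}")

    for name, rules in declared.items():
        if name not in properties:
            continue
        value = properties[name]
        if "const" in rules and value != rules["const"]:
            errors.append(f"{name} must equal {rules['const']!r}")
        if rules.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name} must be a string")
    return errors


def choose_write_key(properties, schema, valid_key, invalid_key):
    """Pick the Segment write key an event would be forwarded with."""
    return invalid_key if validate_properties(properties, schema) else valid_key


# The validation-relevant parts of the example schema above.
schema = {
    "properties": {
        "schema_id": {"const": "segment/demo/Test_Event/1-0.json"},
        "test_property": {"type": "string"},
    },
    "required": ["schema_id", "test_property"],
    "additionalProperties": False,
}
good = {"schema_id": "segment/demo/Test_Event/1-0.json", "test_property": "test_value"}
bad = {"schema_id": "segment/demo/Test_Event/1-0.json", "test_property": 42}

print(choose_write_key(good, schema, "VALID_KEY", "INVALID_KEY"))
print(choose_write_key(bad, schema, "VALID_KEY", "INVALID_KEY"))
```

In this sketch, an event with a non-string test_property, a missing required property, or an undeclared extra property is routed with the invalid write key.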

Producers

Supported producers:

  • Segment SDKs (e.g. analytics.js). See Segment docs for full SDK list.

👀 More producers to come! 👀

Consumers

Supported consumers:

  • Segment Sources. By default, reflekt-registry is configured to:
    • Send valid events to a Segment source with write key SEGMENT_WRITE_KEY_VALID
    • Send invalid events to a Segment source with write key SEGMENT_WRITE_KEY_INVALID

👀 More consumers to come! 👀

Architecture

reflekt-registry is built on top of AWS and Chalice, making it easy to manage and deploy. We used an AWS Free Tier to host our registry, so you can too!

Behind the scenes, reflekt-registry is composed of 3 AWS components:

  1. An S3 bucket to store schemas from a Reflekt project.
  2. An API Gateway endpoint that accepts events from producers.
  3. A Lambda function to validate events against schemas in the S3 bucket, routing them to the appropriate consumer.
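
One way the Lambda function could look up a schema in the S3 bucket is sketched below. It assumes the object key in the bucket equals the event's schema_id, which may not match how reflekt push lays out your bucket. It also keeps parsed schemas in memory, since Lambda execution environments are often reused between invocations. The function names are hypothetical, and the S3 fetcher requires boto3.

```python
import json


def make_schema_loader(fetch_text):
    """Return a loader that caches parsed schemas by schema_id.

    fetch_text(schema_id) -> str is injected so the loader can be exercised
    without AWS; s3_fetcher() below builds one backed by S3.
    """
    cache = {}

    def load(schema_id):
        if schema_id not in cache:
            cache[schema_id] = json.loads(fetch_text(schema_id))
        return cache[schema_id]

    return load


def s3_fetcher(bucket, region):
    """Build a fetch_text function that reads objects from S3 (needs boto3)."""
    import boto3  # lazy import: only needed when talking to AWS

    s3 = boto3.client("s3", region_name=region)

    def fetch_text(schema_id):
        # Assumption: the object key in the bucket equals the schema's $id.
        response = s3.get_object(Bucket=bucket, Key=schema_id)
        return response["Body"].read().decode("utf-8")

    return fetch_text
```

A loader wired to S3 would be created with make_schema_loader(s3_fetcher(bucket, region)), using the REGISTRY_BUCKET and REGISTRY_BUCKET_REGION values. Passing a plain function instead makes it easy to try the lookup locally.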