Trellis PGVector Demo

This project demonstrates how to combine Trellis, which extracts structured, SQL-queryable data from unstructured content, with pgvector for vector similarity search in PostgreSQL.

Introduction

This demo showcases how to:

- Use Trellis to extract structured data from unstructured email content.
- Generate vector embeddings for the extracted data using OpenAI's API.
- Store and query the extracted data and embeddings using PostgreSQL with the pgvector extension.

Note: This project is a work in progress. The logic to combine data from the Trellis API with vector embeddings and perform SQL searches across both is still under development.

Setup

Prerequisites

- Docker
- Node.js
- npm

PostgreSQL with pgvector

Pull the pgvector PostgreSQL Docker image:
```
docker pull pgvector/pgvector:pg16
```
Start the PostgreSQL container:
```
docker compose up -d
```
To stop the container (this will wipe the database):
```
docker compose down
```

Environment

Create a .env file with the following:

```
TRELLIS_API_KEY=your_trellis_api_key
OPENAI_API_KEY=your_openai_api_key
```

You can get a Trellis API key here.
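
Since both keys are needed at runtime, it can help to fail fast when one is missing. The following is a minimal sketch, not code from this repository: `findMissingEnvVars` is a hypothetical helper, and it assumes the variables have already been loaded into an object such as `process.env`.

```typescript
// Hypothetical helper, not part of this repo: reports which required
// environment variables are absent or blank.
const REQUIRED_ENV_VARS = ["TRELLIS_API_KEY", "OPENAI_API_KEY"];

function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: string[] = REQUIRED_ENV_VARS
): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Example with a plain object; in the server you would pass process.env.
const missingKeys = findMissingEnvVars({ TRELLIS_API_KEY: "abc123" });
// missingKeys is ["OPENAI_API_KEY"]
```

If the returned array is non-empty, the server could log the missing names and exit before making any API calls.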

Server

Run `npm install` to install the dependencies.

Run `npm run start` to start the server.

Trellis pgvector Demo Steps

Setup

Upload emails to Trellis

Place your email assets in ./assets (Enron demo data provided)

Run:

```
curl -X PUT http://localhost:3000/upload-emails \
-H "Content-Type: application/json" \
-d '{
  "projectName": "your_project_name"
}'
```

Check the status of the Trellis upload (optional)

Run:

```
curl -X GET "http://localhost:3000/check-upload-status?projectName=your_project_name"
```
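
The project name travels in the query string, so names containing spaces or characters such as `&` need percent-encoding. A minimal sketch of building the URL from code, where `buildUploadStatusUrl` is a hypothetical helper and only the endpoint path and parameter name come from the command above:

```typescript
// Hypothetical helper: builds the status URL with the project name
// percent-encoded so spaces and "&" survive intact.
function buildUploadStatusUrl(baseUrl: string, projectName: string): string {
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const encodedName = encodeURIComponent(projectName);
  return `${trimmedBase}/check-upload-status?projectName=${encodedName}`;
}

const uploadStatusUrl = buildUploadStatusUrl("http://localhost:3000/", "enron demo & tests");
// uploadStatusUrl is
// "http://localhost:3000/check-upload-status?projectName=enron%20demo%20%26%20tests"
```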

Embedding and Trellis Transformation

Embed the emails and store them in the DB

Run:

```
curl -X POST http://localhost:3000/embed-emails
```
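
pgvector accepts vectors as text literals of the form `[0.1,0.2,0.3]`. The sketch below shows one way an embedding (assumed here to arrive as an array of numbers) could be serialized before insertion. `toPgVectorLiteral`, the `emails` table, and the column list are illustrative assumptions, not necessarily what this server does.

```typescript
// Hypothetical helper: serializes an embedding into pgvector's text
// input format, e.g. "[0.1,0.2,0.3]".
function toPgVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0) {
    throw new Error("Embedding must contain at least one dimension");
  }
  if (!embedding.every((x) => isFinite(x))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

// Assumed table and column names. The literal is passed as a query
// parameter and cast to the vector type on the SQL side.
const insertSql =
  "INSERT INTO emails (ext_file_id, email_content, embedding) VALUES ($1, $2, $3::vector)";
const insertParams = ["example_id", "Example email content", toPgVectorLiteral([0.25, -1, 3])];
// insertParams[2] is "[0.25,-1,3]"
```

Passing the vector as a parameter rather than concatenating it into the SQL string keeps the statement safe to reuse with any PostgreSQL client that supports positional parameters.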

Initiate the Trellis transformation process

Run:

```
curl -X POST http://localhost:3000/transform-emails \
-H "Content-Type: application/json" \
-d '{
  "projectName": "your_project_name"
}'
```

Save the returned transformationId for future use.

Check the status of the Trellis transformation (optional)

Run:

```
curl -X GET "http://localhost:3000/fetch-transformation-results?transformationId=your_transformation_id"
```

Replace your_transformation_id with the transformationId returned when you initiated the transformation.

Fetch and save the Trellis transformation results to existing data

Run:

```
curl -X GET "http://localhost:3000/fetch-transformation-results?transformationId=your_transformation_id"
```
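
Saving the results "to existing data" amounts to matching each extracted record to an email that is already stored. The sketch below illustrates that merge in memory. Every shape in it is an assumption: the field names are borrowed from the examples in this README, and the real Trellis response and database schema may differ.

```typescript
// All shapes below are assumptions for illustration only.
interface EmailRow {
  ext_file_id: string;
  email_content: string;
  emotional_tone?: string;
  compliance_risk?: boolean;
}

interface ExtractedFields {
  ext_file_id: string;
  emotional_tone?: string;
  compliance_risk?: boolean;
}

// Returns new rows in which extracted fields are merged into the email
// with the same ext_file_id; emails without a match are left unchanged.
function mergeExtractedFields(emails: EmailRow[], results: ExtractedFields[]): EmailRow[] {
  const byFileId: Record<string, ExtractedFields> = {};
  for (const result of results) {
    byFileId[result.ext_file_id] = result;
  }
  return emails.map((email) => {
    const match = byFileId[email.ext_file_id];
    return match ? { ...email, ...match } : email;
  });
}

const mergedRows = mergeExtractedFields(
  [
    { ext_file_id: "a", email_content: "Thanks so much!" },
    { ext_file_id: "b", email_content: "Hello" },
  ],
  [{ ext_file_id: "a", emotional_tone: "gratitude", compliance_risk: false }]
);
// mergedRows[0] now carries emotional_tone and compliance_risk;
// mergedRows[1] is unchanged because no result matched it.
```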

Search

Search for emails using column filter and vector search

Run:

```
curl http://localhost:3000/search-emails \
-H "Content-Type: application/json" \
-d '{
  "query": "HOW ABOUT SOME ICE CREAM?????",
  "filters": {
    "emotional_tone": "gratitude",
    "compliance_risk": false
  },
  "limit": 3
}'
```
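
A request like this has to become a single SQL statement that applies the column filters and orders by vector distance. The sketch below shows one possible translation using pgvector's `<->` (L2 distance) operator. It is not the server's actual query: `buildSearchQuery`, the `emails` table, the `embedding` column, and the filter whitelist are assumptions, and the query text is assumed to have been embedded already.

```typescript
// Assumed schema: table "emails" with a pgvector column "embedding".
// Column names cannot be sent as query parameters, so only whitelisted
// columns are accepted as filters.
const FILTERABLE_COLUMNS = ["emotional_tone", "compliance_risk"];

type FilterValue = string | number | boolean;

function buildSearchQuery(
  queryVector: string, // pgvector literal such as "[0.1,0.2]"
  filters: Record<string, FilterValue>,
  limit: number
): { text: string; values: FilterValue[] } {
  const values: FilterValue[] = [queryVector];
  const conditions: string[] = [];
  for (const column of Object.keys(filters)) {
    if (FILTERABLE_COLUMNS.indexOf(column) === -1) {
      throw new Error(`Unsupported filter column: ${column}`);
    }
    values.push(filters[column]);
    conditions.push(`${column} = $${values.length}`);
  }
  values.push(limit);
  const where = conditions.length > 0 ? `WHERE ${conditions.join(" AND ")} ` : "";
  const text =
    "SELECT *, embedding <-> $1::vector AS similarity_score FROM emails " +
    `${where}ORDER BY similarity_score LIMIT $${values.length}`;
  return { text, values };
}

const searchQuery = buildSearchQuery(
  "[0.1,0.2]",
  { emotional_tone: "gratitude", compliance_risk: false },
  3
);
// searchQuery.values is ["[0.1,0.2]", "gratitude", false, 3]
```

The resulting `text` and `values` pair matches the parameterized-query shape that common PostgreSQL clients accept.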

The search ranks results by L2 (Euclidean) distance between embeddings, so a lower similarity_score means a closer match.
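
To make the scoring concrete, here is a small self-contained sketch of the L2 distance itself. The vectors are toy two-dimensional examples rather than real embeddings, and `l2Distance` is an illustrative function, not part of this project.

```typescript
// Euclidean (L2) distance between two equal-length vectors.
function l2Distance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let sumOfSquares = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sumOfSquares += diff * diff;
  }
  return Math.sqrt(sumOfSquares);
}

const queryVec = [1, 0];
const nearScore = l2Distance(queryVec, [1, 0.5]); // 0.5
const farScore = l2Distance(queryVec, [-2, 4]); // 5
// nearScore < farScore, so the first vector would rank higher.
```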

Note: Make sure to replace placeholder IDs with actual IDs returned from the API calls.

Other API Endpoints

You can also find most of these requests in this Postman collection.

Seeding Data

To seed the database with sample data:

```
curl -X POST http://localhost:3000/seeder
```

This will insert predefined email data into the database.

Check Embedding Column Type

To check the data type of the embedding column:

```
curl -X GET http://localhost:3000/check-embedding-type
```

This endpoint is useful for verifying that the embedding column is correctly set up as a vector type.
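
A check like this can be expressed as a lookup in PostgreSQL's `information_schema.columns` view, where extension types such as pgvector's are typically reported with `data_type` set to `USER-DEFINED` and the concrete type name in `udt_name`. The sketch below is an assumption about how such a check could look, not the endpoint's actual implementation; the table and column names are guesses.

```typescript
// Parameterized query against the standard information_schema view.
const embeddingTypeSql =
  "SELECT data_type, udt_name FROM information_schema.columns " +
  "WHERE table_name = $1 AND column_name = $2";
const embeddingTypeParams = ["emails", "embedding"]; // assumed names

interface ColumnTypeRow {
  data_type: string;
  udt_name: string;
}

// Returns true when the row describes a pgvector column. A missing row
// (undefined) means the column does not exist at all.
function isVectorColumn(row: ColumnTypeRow | undefined): boolean {
  return row !== undefined && row.udt_name === "vector";
}

const looksCorrect = isVectorColumn({ data_type: "USER-DEFINED", udt_name: "vector" });
const looksWrong = isVectorColumn({ data_type: "text", udt_name: "text" });
```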

Fetch All Emails

To retrieve all emails from the database:

```
curl -X GET http://localhost:3000/emails
```

This will return a JSON array of all email records in the database.

Save a New Email

To save a new email to the database:

```
curl -X POST http://localhost:3000/emails \
-H "Content-Type: application/json" \
-d '{
  "ext_file_id": "example_id",
  "email_content": "Example email content",
  "email_from": "sender@example.com"
}'
```

Note: Make sure to include all required fields in the JSON payload.
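
A client can check the payload before sending it. This sketch assumes that the three fields in the example request are the required ones, which the README does not state explicitly; `findMissingEmailFields` is a hypothetical helper.

```typescript
// Assumption: the three fields from the example request are required.
const REQUIRED_EMAIL_FIELDS = ["ext_file_id", "email_content", "email_from"];

// Returns the names of required fields that are absent, not strings,
// or blank.
function findMissingEmailFields(payload: Record<string, unknown>): string[] {
  return REQUIRED_EMAIL_FIELDS.filter((field) => {
    const value = payload[field];
    return typeof value !== "string" || value.trim() === "";
  });
}

const missingFields = findMissingEmailFields({
  ext_file_id: "example_id",
  email_content: "Example email content",
});
// missingFields is ["email_from"]
```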

Note on pgvector Installation

If you encounter issues with the pgvector extension, ensure you have the correct Docker image. You may need to manually install the pgvector extension in your PostgreSQL instance. Refer to the pgvector documentation for detailed installation instructions.
