This project demonstrates the integration of Trellis for unstructured data to SQL extraction combined with pgvector for vector similarity search in PostgreSQL.
This demo showcases how to:
- Use Trellis to extract structured data from unstructured email content.
- Generate vector embeddings for the extracted data using OpenAI's API.
- Store and query the extracted data and embeddings using PostgreSQL with pgvector extension.
Note: This project is a work in progress. The logic to combine data from the Trellis API with vector embeddings and perform SQL searches across both is still under development.
- Docker
- Node.js
- npm
PostgreSQL with pgvector
-
Pull the pgvector PostgreSQL Docker image:
docker pull pgvector/pgvector:pg16
-
Start the PostgreSQL container:
docker compose up -d
-
To stop the container (this will wipe the database):
docker compose down
Create a .env
file with the following:
TRELLIS_API_KEY=your_trellis_api_key
OPENAI_API_KEY=your_openai_api_key
You can get a Trellis API key here.
Run npm install
to install the dependencies.
Run npm run start
to start the server.
-
Upload emails to Trellis
- Place your email assets in
./assets
(Enron demo data provided) - Run:
curl -X PUT http://localhost:3000/upload-emails \ -H "Content-Type: application/json" \ -d '{ "projectName": "your_project_name" }'
- Place your email assets in
-
Check the status of the Trellis upload (optional)
- Run:
curl -X GET "http://localhost:3000/check-upload-status?projectName=your_project_name"
- Run:
-
Embed the emails and store them in the DB
- Run:
curl -X POST http://localhost:3000/embed-emails
- Run:
-
Initiate the Trellis transformation process
- Run:
curl -X POST http://localhost:3000/transform-emails \ -H "Content-Type: application/json" \ -d '{ "projectName": "your_project_name" }'
- Save the returned
transformationId
for future use
- Run:
-
Check the status of the Trellis transformation (optional)
- Run:
Replace
curl -X GET http://localhost:3000/fetch-transformation-results?transformationId=your_transformation_id
your_transformation_id
with the ID from step 4
- Run:
-
Fetch and save the Trellis transformation results to existing data
- Run:
curl -X GET "http://localhost:3000/fetch-transformation-results?transformationId=your_transformation_id"
- Run:
- Search for emails using column filter and vector search
- Run:
curl http://localhost:3000/search-emails \ -H "Content-Type: application/json" \ -d '{ "query": "HOW ABOUT SOME ICE CREAM?????", "filters": { "emotional_tone": "gratitude", "compliance_risk": false }, "limit": 3 }'
- This is using L2 distance (Euclidean distance) for vector similarity search
- Meaning lower
similarity_score
results are more similar
- Run:
Note: Make sure to replace placeholder IDs with actual IDs returned from the API calls.
You can also find most of these requests in this Postman collection.
To seed the database with sample data:
curl -X POST http://localhost:3000/seeder
This will insert predefined email data into the database.
To check the data type of the embedding column:
curl -X GET http://localhost:3000/check-embedding-type
This endpoint is useful for verifying that the embedding column is correctly set up as a vector type.
To retrieve all emails from the database:
curl -X GET http://localhost:3000/emails
This will return a JSON array of all email records in the database.
To save a new email to the database:
curl -X POST http://localhost:3000/emails \
-H "Content-Type: application/json" \
-d '{
"ext_file_id": "example_id",
"email_content": "Example email content",
"email_from": "[email protected]"
}'
Note: Make sure to include all required fields in the JSON payload.
If you encounter issues with the pgvector extension, ensure you have the correct Docker image. You may need to manually install the pgvector extension in your PostgreSQL instance. Refer to the pgvector documentation for detailed installation instructions.