The primary objectives of this recommendation engine are twofold. First, it develops an experimental recommendation engine using the pgvector extension and PL/Python. Second, it builds the same engine with the aidb extension, demonstrating how easily aidb can be implemented and how it abstracts complexity without compromising functionality.
The experiment leverages the CLIP model together with PostgreSQL, using the pgvector extension and PL/Python to run the transformation functions directly inside the database for efficient searching. The setup uses a catalog dataset of roughly 44k images.
Instead of storing the images themselves in the database, we store only their full file paths. The data actually stored in the database consists of image embeddings, generated by the CLIP model and encapsulated in 512-dimensional vectors as the model requires. This approach enables rapid search on a standard laptop.
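When inserting embeddings from Python, pgvector accepts a bracketed text literal for each `vector(512)` value. A minimal sketch of the serialization step (the helper name `to_pgvector_literal` is ours, not from the project code):

```python
def to_pgvector_literal(embedding):
    """Serialize a list of floats into pgvector's text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# A CLIP image embedding is a 512-dimensional float vector; faked here for illustration.
fake_embedding = [0.0] * 512
literal = to_pgvector_literal(fake_embedding)
```

The resulting string can be passed as a query parameter and cast with `::vector` on the SQL side.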
We also demonstrate text-to-image search: searching the catalog with free text as input.
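Text-to-image search works because CLIP encodes text and images into the same vector space: the query text is encoded with the same model, and rows are ranked by vector distance. A hedged sketch of the SQL side, assuming the `products_emb` schema defined below and pgvector's `<->` (L2 distance) operator; the encoded query vector would be bound to the `%s` placeholder:

```python
# Sketch: rank catalog images by distance between their stored CLIP embeddings
# and the embedding of a free-text query. The query text must first be encoded
# to a 512-dim vector by the same CLIP model used for the images.
def build_text_search_sql(limit=12):
    return (
        "SELECT id, productDisplayName, image_path "
        "FROM products_emb "
        "ORDER BY embedding <-> %s::vector "
        f"LIMIT {int(limit)};"
    )

sql = build_text_search_sql(5)
```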
Download the dataset from https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-small/download?datasetVersionNumber=1 and unzip it into a folder such as dataset/images.
PostgreSQL 16 installed.
aidb is the EDB Postgres AI database extension and must be installed. If it is not, follow the step-by-step installation guide at https://www.enterprisedb.com/docs/edb-postgres-ai/ai-ml/install-tech-preview/
EDB Language Pack installed. Install the pgvector 0.6 extension from https://github.com/pgvector/pgvector. Validate that plpython3u is working by running select public.test_plpython(); inside the database:
postgres=# select public.test_plpython();
     test_plpython
-----------------------
 PL/Python is working!
(1 row)
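The project's DDL should already define `public.test_plpython`; if you need to create it yourself, a function producing the output above could look like this (a sketch, not necessarily the repository's exact definition):

```sql
CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION public.test_plpython()
RETURNS text AS $$
    return "PL/Python is working!"
$$ LANGUAGE plpython3u;
```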
Python environment: the Python environment accessible to PostgreSQL must have the necessary libraries installed. Install them from the EDB Python directory with: pip install -r requirements.txt
You can run the aidb recommendation app in two ways. First, run it as a full Streamlit app, from initializing the database to running the search; every operation is behind the UI:
%streamlit run code/app.py
Second, run the Python scripts yourself to initialize the database and load the data. Start with connect_encode.py, which installs the aidb extension as shown below, then creates and refreshes a retriever to collect and generate embeddings from the image data in an S3 bucket.
The images must be stored in that S3 bucket before running the script. The S3 endpoint is optional; leave it blank if the bucket is not public. Pass the S3 bucket name as an argument as shown below:
%python code/connect_encode.py retriever_name s3_bucket_name s3_endpoint
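The argument convention above (retriever name, bucket name, optional endpoint) can be sketched as follows; this is an illustration of the calling convention, not the script's actual parsing code:

```python
import sys

# Positional arguments: retriever name, S3 bucket name, and an optional
# S3 endpoint (empty when the bucket is not public).
def parse_args(argv):
    retriever_name = argv[1]
    s3_bucket_name = argv[2]
    s3_endpoint = argv[3] if len(argv) > 3 else ""
    return retriever_name, s3_bucket_name, s3_endpoint

args = parse_args(["connect_encode.py", "recommendation_engine", "my-bucket"])
```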
Change the database connection with the necessary port, username, and password in the create_db_connection function and the DATABASE_URL variable.
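A minimal sketch of what such a connection configuration might look like; the defaults and environment-variable names here are our assumptions, so adjust them to match your own setup:

```python
import os

# Hypothetical defaults: override via environment variables or edit in place.
def build_database_url(
    user=os.getenv("PGUSER", "postgres"),
    password=os.getenv("PGPASSWORD", "postgres"),
    host=os.getenv("PGHOST", "localhost"),
    port=os.getenv("PGPORT", "5432"),
    dbname=os.getenv("PGDATABASE", "postgres"),
):
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"

DATABASE_URL = build_database_url()
```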
To run with aidb, use the command below. s3_endpoint is optional, but Streamlit does not support omitting positional command-line arguments the way typical Python scripts do, so pass a single-quoted empty string '' if the S3 bucket is not public.
% streamlit run code/app_search_aidb.py retriever_name s3_bucket_name s3_endpoint
Example, if the S3 bucket is not public and the AWS secret and access keys were exported as environment variables beforehand:
% streamlit run code/app_search_aidb.py recommendation_engine bilge-ince-test ''
Example search texts : red shoes, red women shoes, black shoes....
Open psql and create the table:
drop table if exists products_emb;
CREATE TABLE products_emb (
    id INTEGER,
    gender VARCHAR(50),
    masterCategory VARCHAR(100),
    subCategory VARCHAR(100),
    articleType VARCHAR(100),
    baseColour VARCHAR(50),
    season TEXT,
    year INTEGER,
    usage TEXT NULL,
    productDisplayName TEXT NULL,
    image_path TEXT NULL,
    embedding vector(512)
);
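Once the table is populated, nearest-neighbour queries can be accelerated with an approximate index. A sketch using pgvector's ivfflat index and L2 operator class; the `lists` value and the truncated query vector are illustrative only:

```sql
-- Illustrative: approximate-nearest-neighbour index on the stored embeddings.
CREATE INDEX ON products_emb USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- Find the 10 catalog images closest to a given 512-dim query vector.
SELECT id, productDisplayName, image_path
FROM products_emb
ORDER BY embedding <-> '[0.01, 0.02, ...]'::vector
LIMIT 10;
```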
Install the functions inside the DDL folder:
1 - load_fashion_tag -- this function reads the products table, inserts the rows into the new products_emb table, and populates the two extra columns, embedding and image_path
postgres=# select load_fashion_tag('dataset/images','product', 32);
NOTICE: Processed 32 images in 1.172111988067627 seconds. rows inserted 32
NOTICE: Processed 32 images in 0.810783863067627 seconds. rows inserted 64
NOTICE: Processed 32 images in 0.8137722015380859 seconds. rows inserted 96
NOTICE: Processed 32 images in 0.9017457962036133 seconds. rows inserted 128
NOTICE: Processed 32 images in 0.8105340003967285 seconds. rows inserted 160
NOTICE: Processed 32 images in 0.8043057918548584 seconds. rows inserted 192
...
NOTICE: Processed 32 images in 0.786837100982666 seconds. rows inserted 44411
NOTICE: Processed 32 images in 0.6174778938293457 seconds. rows inserted 44435
NOTICE: Total Rows: 44435 Total function execution time: 1283.9920008182526 seconds. Model loading time: 2.249537944793701 seconds. Fetching time: 0.05452418327331543 seconds.
From psql, run the following, with the path set to the directory where you unzipped the images:
postgres=# select load_fashion_tag('/Users/francksidi/Downloads/archive/images','product', 32);
Change the connection info inside the script, then run it from the command line (the application is inside the code directory). Copy the logo.png image into the directory from which the Python program is run. For instance, search for: red shoes, red women shoes, black shoes...
%streamlit run app_search_adv.py
Similarity search using the Streamlit application: catalog search, free-text search on the catalog, and search on similar images.
Change the connection info inside the script, then run it from the command line (the application is inside the code directory). Copy the logo.png image into the directory from which the Python program is run. For instance, search for: red shoes, red women shoes, black shoes... or upload a similar image and search.
%streamlit run app_search_final.py