Skip to content

francksidi-edb/Fashion

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Recommendation Engine

The primary objectives of this recommendation engine are twofold. Firstly, it aims to develop an experimental recommendation engine utilizing the pgvector extension and PL/Python. Secondly, it seeks to construct the same engine using the aidb extension, with the intent of demonstrating the extension's ease of implementation and its capability to abstract complexities without compromising functionality.

Catalog Using Postgresql & PGvector

The objective of this experiment is to leverage the CLIP model in conjunction with PostgreSQL, employing the pgvector extension and PL/Python to execute transformation functions directly within the database for efficient searching. This setup involves a dataset of 44k Images of catalog.

Instead of storing the images directly in the database, we store only their full file paths. The actual data stored in the database consists of image embeddings, which are generated by the CLIP model and encapsulated in 512-dimensional vectors as required by the model. This approach enables rapid search capabilities on a standard laptop.

We are showing also text to image search, searching on catalog passing text as input.

Sample Dataset

Download and unzip the dataset from https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-small/download?datasetVersionNumber=1 into a folder like following dataset/images

Install all dependencies for transformers and validate that CLIP model is available

Postgresql 16 installed.

If aidb extension will be used;

aidb is EDB Postgres AI database extension and it should be installed. If not installed please install it by following the step by step installation guide in the following link: https://www.enterprisedb.com/docs/edb-postgres-ai/ai-ml/install-tech-preview/

If aidb won't be used but pgvector and PL/Python will be used;

EDB Language pack installed. Install pgvector 0.6 extension from https://github.com/pgvector/pgvector Validate that pl-python3u is working well

run select public.test_plpython() inside the database;

postgres=# select public.test_plpython(); test_plpython
PL/Python is working! (1 row)

After above installations install requirements;

Run pip install from EDB Python directory as: pip install -r requirements.txt

Python Environment: The Python environment accessible to PostgreSQL should have the necessary libraries installed:

Run

Run with aidb extension

You can run aidb recommendation app in two ways. First you can run as a full streamlit app from initializing the db to running the search. Every operation is behind the UI;

%streamlit run code/app.py

Second, you can first run the python script to initialize the database and to load the data. Then initially run connect_encode.py file to install aidb extension as shown in below, create and refresh retriever to collect and generate embeddings from image datas in an S3 bucket. The images should be stored into that S3 bucket to run the python script. S3 endpoint is optional leave blank if the s3 bucket is not public. Then you should pass the name of the S3 bucket name as an argument like in below;

%python code/connect_encode.py retriver_name s3_bucket_name s3_endpoint

Similarity Search using Streamlit application Catalog Search and Free Text Search on Catalog.

Change the db connection with the necessary port, username, password from create_db_connection function and DATABASE_URL variable.

To run with aidb use the below code. s3_endpoint is optional Streamlit doesn't natively support command-line arguments in the same way as typical Python scripts. Therefore enter single quoted empty string '' if the s3 bucket is not public.

% streamlit run code/app_search_aidb.py retriever_name s3_bucket_name s3_endpoint

Example if the S3 bucket is not public and I added AWS Secret and AWS Access key initially as environment variables;

% streamlit run code/app_search_aidb.py recommendation_engine bilge-ince-test ''

Example search texts : red shoes, red women shoes, black shoes....

Run without aidb (pgvector and PL/Python only)

Create products_emb, install Pl_python3u functions

Open psql and create the table

drop table products_emb;

CREATE TABLE products_emb (
    Id integer,
    gender VARCHAR(50),
    masterCategory VARCHAR(100),
    subCategory VARCHAR(100),
    articleType VARCHAR(100),
    baseColour VARCHAR(50),
    Season text,
    year INTEGER,
    usage text null,
    productDisplayName TEXT null,
    Image_path text null, 
    embedding vector(512)
);

Install the functions inside DDL folder:

1 - load_fashion_tag -- this function will read the products table and insert inside the new products_emb and add 2 columns embedding and image_path

postgres=# select load_fashion_tag('dataset/images','product', 32);

NOTICE:  Processed 32 images in 1.172111988067627 seconds. rows inserted 32

NOTICE:  Processed 32 images in 0.810783863067627 seconds. rows inserted 64

NOTICE:  Processed 32 images in 0.8137722015380859 seconds. rows inserted 96

NOTICE:  Processed 32 images in 0.9017457962036133 seconds. rows inserted 128

NOTICE:  Processed 32 images in 0.8105340003967285 seconds. rows inserted 160

NOTICE:  Processed 32 images in 0.8043057918548584 seconds. rows inserted 192
...

NOTICE:  Processed 32 images in 0.786837100982666 seconds. rows inserted 44411


NOTICE:  Processed 32 images in 0.6174778938293457 seconds. rows inserted 44435

NOTICE:  Total Rows: 44435 Total function execution time: 1283.9920008182526 seconds. Model loading time: 2.249537944793701 seconds. Fetching time: 0.05452418327331543 seconds.

Generate Embedding

from psql run the following with the path as final path where you unzip all images

postgres=# select load_fashion_tag('/Users/francksidi/Downloads/archive/images','product', 32);

Similarity Search using Streamlit application Catalog Search and Free Text Search on Catalog.

Change the connection info inside. Run from the command line. Copy the logo.png image in the directory in which the python program is running. For instance look for : red shoes, red women shoes, black shoes.... the application is inside code directory.

%streamlit run app_search_adv.py

Similarity Search using Streamlit application Catalog Search and Free Text Search on Catalog and Search on Similar Images

Change the connection info inside. Run from the command line. Copy the logo.png image in the directory in which the python program is running. For instance look for : red shoes, red women shoes, black shoes.... the application is inside code directory. upload a similar image and search

%streamlit run app_search_final.py

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •