Current and planned features:
- PostgreSQL Full Text Search
- Vector text-to-image search
- Vector image-to-image search
- Hybrid search with RRF (Reciprocal Rank Fusion; see the sketch below)
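Hybrid search with RRF combines the rankings produced by different retrievers (for example, full text search and vector search) into a single ranking. As a rough illustration of the idea only, not this project's implementation, a minimal Reciprocal Rank Fusion in Python looks like this:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists into one list using RRF.

    rankings: list of ranked lists of document ids, each ordered best-first.
    k: smoothing constant (60 is the value commonly used in the literature).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # higher ranks contribute more
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a full-text ranking with a vector-search ranking
fused = reciprocal_rank_fusion([["a", "b", "c"], ["c", "a", "d"]])
print(fused)  # ['a', 'c', 'b', 'd']
```

Each document's fused score is the sum of 1/(k + rank) over the rankings it appears in, so items ranked highly by several retrievers float to the top.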
This project uses pixi to manage dependencies and environments.
If you're on Linux or macOS, you can install pixi using the following command:

```bash
curl -fsSL https://pixi.sh/install.sh | bash
```
Then clone the repository:

```bash
git clone https://github.com/dnth/pgsql-search.git
cd pgsql-search
```
Install the project:

```bash
pixi install
```
This should install all of the project's dependencies, including PostgreSQL, CUDA, PyTorch, and pgvector, into a virtual environment.
> **Tip: Why `pixi` and not `uv`?**
>
> We use a PostgreSQL database in this project, and PostgreSQL is not installable directly via `uv` or `pip`. But it is installable via `conda`. Instead of using `conda`, we use `pixi` to manage the environment and dependencies. Plus, `pixi` uses `uv` under the hood to pull Python packages. This gives us the speed of `uv` for Python packages and the flexibility of `conda` for system-level dependencies.
Start the local database server:

```bash
pixi run configure-db
```
This initializes the database and starts the server. You should see a folder named `mylocal_db` in your current directory. This folder contains the database files.
Once the database is set up, let's run the quickstart script:

```bash
pixi run quickstart
```
This script loads a dataset of images and captions, creates a database, inserts the dataset into it, runs a full text search, and prints the results.
If everything goes well, you should see the results printed in the terminal.
Currently, we only support Hugging Face datasets. Let's load a dataset with images and captions.
```python
from pgsql_search.loader import HuggingFaceDatasets

ds = HuggingFaceDatasets("UCSC-VLAA/Recap-COCO-30K")   # Load the dataset
ds.save_images("../data/images")                       # Save the images to a local folder
ds = ds.select_columns(["image_filepath", "caption"])  # Select the columns we want to use
```
`ds.dataset` is a Hugging Face `Dataset` object. You are free to perform any operations supported by the `datasets` package.
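For example (illustrative only; these are standard `datasets` methods, not part of pgsql-search), you could sample a subset or filter by caption length:

```python
# Illustrative only: standard Hugging Face Datasets operations on ds.dataset
subset = ds.dataset.shuffle(seed=42).select(range(1000))             # random 1,000-row sample
long_caps = ds.dataset.filter(lambda row: len(row["caption"]) > 20)  # keep longer captions
```

Printing `ds.dataset` shows the dataset's features and size: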
```python
ds.dataset
```

```
Dataset({
    features: ['image_filepath', 'caption'],
    num_rows: 30504
})
```
From `ds.dataset` we see that we have 30504 rows in the dataset with two columns: `image_filepath` and `caption`. Now we can create a database and insert the dataset into it.
```python
from pgsql_search.database import PostgreSQLDatabase, ColumnType

PostgreSQLDatabase.create_database("my_database")
```
Insert the dataset into the database:

```python
df = ds.dataset.to_pandas()

with PostgreSQLDatabase("my_database") as db:
    db.initialize_table("image_metadata")
    db.add_column("image_filepath", ColumnType.TEXT, nullable=False)
    db.add_column("caption", ColumnType.TEXT, nullable=True)
    db.insert_dataframe(df)
```
Once completed, we can run a full text search on the database.
```python
from pgsql_search.database import PostgreSQLDatabase

query = "man in a yellow shirt"

with PostgreSQLDatabase("my_database") as db:
    res = db.full_text_search(
        query=query,
        table_name="image_metadata",
        search_column="caption",
        num_results=10,
    )
```
The output is a pandas DataFrame containing the query and the matching results.
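Since the result is a regular pandas DataFrame (`res` in the snippet above), you can inspect it with the usual pandas tools, for example:

```python
# Inspect the search results; the exact columns depend on the table schema
print(res.columns.tolist())  # which columns the search returns
print(res.head())            # top-ranked rows for the query
```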
*(Demo video: output.mp4)*
If you'd like to inspect the database, you can do so with the following command:

```bash
pixi run inspect-db
```
This will open an interactive terminal to inspect the database.
*(Demo video: pgsql-inspectdb.mp4)*
If you want to stop the database server, you can do so with the following command:

```bash
pixi run stop-db
```
And to remove the database entirely:

```bash
pixi run remove-db
```
To run the tests:

```bash
pixi run -e test pytest
```