This project implements a Library Recommendation System using Python, DuckDB, and OpenAI's GPT-based language models. It includes data ingestion, a recommender module, and an application for suggesting books to users based on their preferences and past interactions.
-
Data Ingestion:
- Loads data from CSV files into a DuckDB database.
- Prepares the
books
,users
, anduser_checkouts
tables.
-
Recommender Module:
- Uses OpenAI's GPT model (via LangChain) to generate book recommendations.
- Structured responses include book details, a synopsis, and a personalized email-ready summary.
-
Main Application:
- Executes the ingestion process and runs the main recommendation application seamlessly.
-
Streamlit Application:
- User interface that also that acts as a database manager and recommender UI for a librarian
- Contains ability to add users/books and track checkouts
- Ability in future to send user an email with recommendation
-
Dockerized Setup:
- Dockerfile provided for easy deployment and environment consistency.
- Runs on Python 3.11.
- Docker
- Python 3.11 (optional, if running locally)
- OpenAI API Key
project/
│
├── Dockerfile # Docker configuration
├── requirements.txt # Python dependencies
├── main.py # Entry point for the application
├── recommender.py # Recommender system implementation
├── app.py # UI and Streamlit App
├── data_collection/ # Directory for data ingestion scripts
│ └── ingest.py # Data ingestion script
├── data/ # Directory for data and database files
│ ├── library_db.duckdb # DuckDB database (auto-created on start-up)
| ├── ingest.py # Running data ingestions
│ └── csv/ # Directory for CSV input files
│ ├── book.csv # Books data
│ ├── checkouts.csv # User checkouts data
│ └── users.csv # Users data
Build the Docker Image
docker build -t library-recommender --build-arg OPENAI_API_KEY=your_openai_api_key .
Run the Application
docker run -p 8501:8501 library-recommender
- The container mounts the data directory for accessing CSV files and saving the DuckDB database.
Install Dependencies
Ensure you have Python 3.11 and install the required libraries:
pip install -r requirements.txt
Set Environment Variables
Create a .env file for the OpenAI API key:
export OPENAI_API_KEY=your_openai_api_key
Run data ingest
python ./data/ingest.py
Run the Application
streamlit run app.py
Below is a preview of the Library Recommendation System in action:
The entry point for the application:
- Runs ingest.py to load data into DuckDB.
- Executes the recommendation system via app.py.
The entry point for the application:
- Has ability to interact with database, like an admin terminal (Not all ACID methods)
- Can display recommendations and updates recommendations based on database changes
Handles data ingestion:
- Reads book.csv, checkouts.csv, and users.csv.
- Creates and populates the DuckDB tables: books, users, and user_checkouts.
Implements the Library Recommender:
- Defines a LibraryRecommender class that:
- Connects to the DuckDB database.
- Fetches user and catalog data.
- Generates structured recommendations using OpenAI’s GPT models via LangChain.
- Responses include:
- Recommended book title.
- A brief synopsis.
- Personalized feedback and a grade.
- A summarized email-ready message.
Builds a containerized environment with:
- Python 3.11.
- Dependencies from requirements.txt.
- Prepares /app/data/csv for input data.
Example Workflow
- Data Ingestion:
- ingest.py processes and uploads data to DuckDB.
- Recommendation Generation:
- Input: User name and query (e.g., “I enjoy fantasy novels”).
- Output: A recommendation response with details about a relevant book.
- Dockerized Execution:
- Run the complete workflow in a reproducible container.
Input:
User: Alice
Query: I love books with strong female protagonists and magical worlds.
Output:
{
"book_id": "1",
"title": "Matilda",
"feedback": "This book features a bright young girl with extraordinary powers who overcomes challenges.",
"grade": 95,
"synopsis": "A story about Matilda, a gifted child navigating a world of neglectful parents and an oppressive headmistress.",
"summarized_email_body": "Hi Alice, based on your interest in strong female protagonists and magical worlds, we recommend 'Matilda'. Here's why you'll love it: 'Matilda' tells the inspiring story of a brilliant young girl overcoming the odds with her intelligence and extraordinary abilities."
}
Python Dependencies (requirements.txt)
pandas
duckdb
langchain
langchain_openai
openai
python-dotenv
pydantic
Common Issues
- DuckDB Data Ingestion Fails:
- Ensure CSV files are correctly formatted.
- Verify that all data values are compatible with DuckDB column types.
- OpenAI API Key Error:
- Ensure the .env file contains a valid OPENAI_API_KEY.
- Check API usage limits on your OpenAI account.
- Docker Volume Mount Issues:
- Verify that the data directory exists locally and is mounted correctly to /app/data inside the container.
Acknowledgments
- DuckDB: In-memory SQL database engine.
- LangChain: Framework for building applications with LLMs.
- OpenAI: Provider of GPT-based language models.
- Developers: Use Docker for consistent environments.
- Data Scientists: Extend the recommendation logic by experimenting with LangChain prompts or DuckDB queries.
- Demonstrations: Showcase the workflow end-to-end with easy Docker deployment.