Merge pull request #1338 from qdrant/blog-facial-recognition
[blog] Facial Recognition - Twin Celebrity App
davidmyriel authored Dec 4, 2024

r-n-o Arnaud
2 parents b59910d + 9da2cab commit 50adcff
Showing 5 changed files with 146 additions and 0 deletions.
146 changes: 146 additions & 0 deletions qdrant-landing/content/blog/facial-recognition.md
@@ -0,0 +1,146 @@
---
title: "Building a Facial Recognition System with Qdrant"
draft: false
short_description: "Combine AI, FaceNet, and Qdrant to build a cool app."
description: "Build an AI app that uses facial recognition embeddings & vector search to match users with their celebrity look-alikes."
preview_image: /blog/facial-recognition/social_preview.png
social_preview_image: /blog/facial-recognition/social_preview.png
date: 2024-12-03T00:00:00-08:00
author: David Myriel
featured: false
tags:
- vector search
- embeddings
- facial recognition
- Qdrant
- Streamlit
- ZenML
- data visualization
---

# The Twin Celebrity App

Combining modern machine learning with a bit of fun can produce applications that users genuinely enjoy. One such project is the [**Twin Celebrity app**](https://github.com/neural-maze/vector-twin), a tool that matches users with their celebrity look-alikes using facial recognition embeddings and [**vector search**](/advanced-search/) powered by Qdrant. This blog post covers the architecture, tools, and practical advice for developers who want to build this app or something similar.

The [**Twin Celebrity app**](https://github.com/neural-maze/vector-twin) identifies which celebrity a user resembles by analyzing a selfie. The app uses:
- **Face recognition embeddings**: Generated by a ResNet-based **FaceNet** model.
- **Vector similarity search**: Powered by Qdrant to find the closest match.
- **ZenML**: For orchestrating data pipelines.
- **Streamlit**: As the front-end interface.

> This project not only demonstrates the capabilities of modern vector databases but also serves as an exciting introduction to embedding-based applications.
---

## Learn From the App's Creator

We interviewed the engineer behind this project, [**Miguel Otero Pedrido**](https://www.linkedin.com/in/migueloteropedrido/), who is also the founder of [**The Neural Maze**](https://www.youtube.com/@TheNeuralMaze). Miguel explains in detail how he put the app together, as well as his choice of tools.

<iframe width="560" height="315" src="https://www.youtube.com/embed/UJ2jTEBae3A?si=m9sHtiXTY4n0OsB2" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

Miguel recently published a video on his YouTube channel: [**The Neural Maze**](https://www.youtube.com/@TheNeuralMaze).

For detailed steps to build the app, watch [**Building a Twin Celebrity App**](https://www.youtube.com/watch?v=LltFAum3gVg).
___

## Architecture

**Search Engine:** [Qdrant](https://qdrant.tech) stands out as a high-performance [**vector database**](/qdrant-vector-database/) built in Rust, known for its reliability and speed. Its advanced features, such as [**vector visualization**](/documentation/web-ui/) and efficient [**querying**](/documentation/concepts/search/), make it a go-to choice for developers working on embedding-based projects.

![architecture](/blog/facial-recognition/architecture.png)

**ML Framework:** [ZenML](https://www.zenml.io) simplifies pipeline creation with a modular, cloud-agnostic framework that ensures clean, scalable, and portable code, ideal for cross-platform workflows.

**Face Detection:** [MTCNN](https://github.com/ipazc/mtcnn#) detects faces and keeps their alignment consistent, making the embeddings more reliable.

**Embedding Model:** [FaceNet](https://github.com/davidsandberg/facenet) provides lightweight, pre-trained facial embeddings that balance accuracy and efficiency, which suits tasks like the Twin Celebrity app.

**Frontend:** [Streamlit](https://github.com/streamlit) speeds up UI development and enables rapid prototyping, so developers can focus on core functionality.

## Application Workflows

The app is divided into two phases: **The Offline Phase**, where the celebrity images are vectorized, and **The Online Phase**, which carries out a live **similarity search**.

![online-offline](/blog/facial-recognition/online-offline.png)

**The Offline Phase**

The first step is dataset preparation. Celebrity images are fetched from **Hugging Face’s dataset library** to serve as the source material for the embeddings.

Next, [**MTCNN**](https://github.com/ipazc/mtcnn#) detects and aligns the celebrities' faces within the images. Then, a pre-trained [**FaceNet**](https://en.wikipedia.org/wiki/FaceNet) model generates a 512-dimensional embedding for each image. This gives every face a consistent, high-quality representation of its features.

Finally, these embeddings, along with metadata, are stored in [**Qdrant Cloud**](/cloud/). This enables efficient retrieval and management of the data for later use.
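
Here's a minimal sketch of what that storage step could look like with the Qdrant Python client. It is not taken from the project's repository: the payload schema (`{"name": ...}`), the sequential integer IDs, and the choice of cosine distance are assumptions made for illustration. The collection name `celebrities` and the 512 dimensions come from this post.

```python
from typing import Any, Sequence

COLLECTION = "celebrities"  # collection name used in this post
VECTOR_SIZE = 512           # FaceNet embedding size stated in this post


def ensure_collection(client: Any) -> None:
    """Create the collection if it does not exist yet."""
    # Imported lazily so store_embeddings() can be exercised with a stub
    # client even when qdrant-client is not installed.
    from qdrant_client.models import Distance, VectorParams

    if not client.collection_exists(COLLECTION):
        client.create_collection(
            collection_name=COLLECTION,
            # Cosine distance is an assumption, not confirmed by the post.
            vectors_config=VectorParams(size=VECTOR_SIZE, distance=Distance.COSINE),
        )


def store_embeddings(
    client: Any,
    names: Sequence[str],
    embeddings: Sequence[Sequence[float]],
) -> int:
    """Upload one vector per image, with the celebrity name as payload."""
    if len(names) != len(embeddings):
        raise ValueError("names and embeddings must have the same length")
    for vector in embeddings:
        if len(vector) != VECTOR_SIZE:
            raise ValueError(f"expected {VECTOR_SIZE} dimensions, got {len(vector)}")

    client.upload_collection(
        collection_name=COLLECTION,
        vectors=[list(v) for v in embeddings],
        payload=[{"name": name} for name in names],  # hypothetical payload schema
        ids=list(range(len(names))),                 # hypothetical ID scheme
    )
    return len(names)
```

For local experiments, `QdrantClient(":memory:")` avoids the need for a running server. For Qdrant Cloud, you would pass your cluster URL and API key to `QdrantClient` instead.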

---

**The Online Phase**

In the online phase, user interaction begins with a **Streamlit app**. The app captures a selfie and converts it into an embedding using the same FaceNet model.

The generated embedding is then queried against Qdrant, which retrieves the top matches based on similarity.

Finally, the results are displayed in an intuitive interface, showing the user their **closest celebrity match** and making the interaction engaging and seamless.
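
To make "top matches based on similarity" concrete, here's a small pure-Python sketch of the idea behind the retrieval step. It assumes cosine similarity as the metric and uses a brute-force scan over a dictionary. Qdrant does this far more efficiently with an index, so treat this as an illustration of the concept rather than how the app or Qdrant is implemented. The function names are made up for this example.

```python
import math
from typing import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_matches(
    selfie: Sequence[float],
    celebrities: Mapping[str, Sequence[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k celebrities whose embeddings are most similar to the selfie."""
    scored = [(name, cosine_similarity(selfie, vec)) for name, vec in celebrities.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real FaceNet embeddings have 512 dimensions.
    toy = {"A": [0.9, 0.1, 0.0], "B": [0.0, 1.0, 0.0], "C": [-1.0, 0.0, 0.0]}
    for name, score in top_matches([1.0, 0.0, 0.0], toy, k=2):
        print(f"{name}: {score:.3f}")
```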

---

## How to Build the App

### 1. Set Up the Offline Pipeline
The offline pipeline, built with ZenML, consists of:
- **Data Loading**: Fetch images and labels (e.g., "Brad Pitt") from Hugging Face.
- **Sampling**: Reduce dataset size for faster processing, selecting around 3,000 images.
- **Embedding Generation**: Convert images into embeddings using MTCNN for face detection and FaceNet for embedding creation.
- **Storage in Qdrant**: Save embeddings into a collection named `celebrities`.
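
As an example of one of these steps, here's a hedged sketch of the sampling stage. The post only says the dataset is reduced to around 3,000 images, so the uniform random sampling, the fixed seed, and the function name are assumptions. In the real project this logic would live inside a ZenML step.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def sample_dataset(items: Sequence[T], target_size: int = 3000, seed: int = 42) -> list[T]:
    """Return a reproducible random subset of at most target_size items."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    if len(items) <= target_size:
        return list(items)  # nothing to trim
    rng = random.Random(seed)  # local generator keeps the global RNG untouched
    return rng.sample(list(items), target_size)
```

A fixed seed makes reruns of the pipeline pick the same images, which helps when you compare embedding quality between runs.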

### 2. Create the Online Application
The Streamlit app handles:
- **Image Capture**: Takes a selfie from a webcam or an uploaded file.
- **Embedding Querying**: Sends the embedding to Qdrant, retrieves the top matches, and visualizes the similarity.
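
The querying step could look roughly like the sketch below. It assumes a `qdrant-client` instance is passed in, that each point's payload stores the celebrity under a `name` key (a hypothetical schema), and that the selfie has already been converted into an embedding. The helper name `find_twins` is made up for this example.

```python
from typing import Any, Sequence


def find_twins(
    client: Any,
    embedding: Sequence[float],
    limit: int = 3,
) -> list[tuple[str, float]]:
    """Query the `celebrities` collection and return (name, score) pairs."""
    response = client.query_points(
        collection_name="celebrities",
        query=list(embedding),
        limit=limit,
        with_payload=True,  # needed to read the celebrity name back
    )
    return [
        # "name" is an assumed payload key; fall back if it is missing.
        ((point.payload or {}).get("name", "unknown"), point.score)
        for point in response.points
    ]
```

In a Streamlit app, the image itself would typically come from `st.camera_input` or `st.file_uploader`, and the returned pairs could be rendered next to the matching celebrity photos.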

### 3. Deployment Options

- Deploy the app on platforms like **Google Cloud**, **AWS**, or **Azure**.

- Containerize the application with **Docker**. For hosting, **Google Cloud Run** is a good choice, as it runs containerized applications without requiring extensive infrastructure management.

- Automate building, testing, and deploying updates with CI/CD pipelines, such as those provided by **Cloud Build** or **GitHub Actions**.

### 4. Test the Quality of Your Embeddings

Use [**Qdrant’s visualization tools**](/documentation/web-ui/) to inspect your embeddings and check that clusters align with expectations.

![web-ui](/blog/facial-recognition/web-ui.png)

If your data is properly embedded, images of the same celebrity will appear grouped together in the visualization.

---

## Lessons and Takeaways

Scalability poses challenges when working with large datasets, such as 20,000+ images. Optimizations like [**quantization**](/documentation/guides/quantization/), which reduces memory usage, or precomputing average embeddings for clusters can significantly reduce storage and computational costs. These strategies help the system remain performant as the dataset grows.
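
To get a feel for why quantization helps, here's a toy sketch of the general idea: each 32-bit float is mapped to an 8-bit integer code, cutting vector memory roughly fourfold. This is a simplified min/max scheme written for illustration, not Qdrant's implementation. In Qdrant you would enable quantization through the collection settings described in the linked guide.

```python
from typing import Sequence


def vector_megabytes(num_vectors: int, dims: int, bytes_per_value: int) -> float:
    """Raw storage for the vectors alone, ignoring index and payload overhead."""
    return num_vectors * dims * bytes_per_value / 1_000_000


def quantize_int8(vector: Sequence[float]) -> tuple[list[int], float, float]:
    """Map each float to an integer code in [-128, 127] using min/max scaling."""
    low, high = min(vector), max(vector)
    scale = (high - low) / 255 or 1.0  # avoid dividing by zero for constant vectors
    codes = [round((x - low) / scale) - 128 for x in vector]
    return codes, low, scale


def dequantize_int8(codes: Sequence[int], low: float, scale: float) -> list[float]:
    """Approximate reconstruction of the original floats."""
    return [(code + 128) * scale + low for code in codes]


if __name__ == "__main__":
    # 20,000 images x 512 dimensions, as discussed in this post.
    print(vector_megabytes(20_000, 512, 4))  # float32
    print(vector_megabytes(20_000, 512, 1))  # int8
```

The reconstruction is lossy: each value can be off by up to half a quantization step, which is the trade-off you accept in exchange for the smaller footprint.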

The potential real-world applications of this technology extend far beyond entertainment. Similar systems can be used in security applications for embedding-based facial recognition to secure access to buildings or devices.

In **healthcare**, they can assist in analyzing features such as moles or skin textures. In **retail**, they enable personalized recommendations based on user photos, demonstrating the versatility of this approach.

---

## Next Steps for Developers

- Start by [**cloning the project repository**](https://github.com/neural-maze/vector-twin) to understand the architecture and functionality.

- Expand the dataset with more celebrity images for diversity or fine-tune the FaceNet model for improved accuracy.

- Consider deploying a mobile-friendly version using frameworks like **Flutter** or **React Native** for a seamless user experience.

> For scalability, implement **multi-GPU setups** to speed up embedding generation and optimize storage with techniques like quantization or average embeddings.

To enhance functionality, explore features like **video input for real-time matches** or add **metadata such as celebrity bios** to enrich user interaction. Experiment with custom similarity scoring for more tailored results.
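
If you want to try the average-embeddings idea mentioned above, a minimal sketch could look like this. It assumes each "cluster" is simply all images of one celebrity, so you store a single mean vector per name instead of one vector per image. The function name and grouping rule are assumptions for illustration.

```python
from collections import defaultdict
from typing import Sequence


def average_embeddings(
    names: Sequence[str],
    embeddings: Sequence[Sequence[float]],
) -> dict[str, list[float]]:
    """Compute one mean embedding per celebrity name."""
    if len(names) != len(embeddings):
        raise ValueError("names and embeddings must have the same length")

    groups: dict[str, list[Sequence[float]]] = defaultdict(list)
    for name, vector in zip(names, embeddings):
        groups[name].append(vector)

    centroids: dict[str, list[float]] = {}
    for name, vectors in groups.items():
        dims = len(vectors[0])
        if any(len(v) != dims for v in vectors):
            raise ValueError(f"inconsistent dimensions for {name!r}")
        # Element-wise mean across all images of this celebrity.
        centroids[name] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]
    return centroids
```

Keep in mind that averaging trades detail for size: a single mean vector can blur differences between photos of the same person taken from different angles or years apart.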

## More Links

- Miguel's [LinkedIn profile](https://www.linkedin.com/in/migueloteropedrido/)
- Miguel's [Substack blog](https://theneuralmaze.substack.com)
- The Neural Maze [YouTube channel](https://www.youtube.com/@TheNeuralMaze)
- Twin Celebrity [GitHub Repository](https://github.com/neural-maze/vector-twin)