Merge pull request #1338 from qdrant/blog-facial-recognition

[blog] Facial Recognition - Twin Celebrity App
qdrant · Dec 4, 2024 · 50adcff
2 parents b59910d + 9da2cab
commit 50adcff
Showing 5 changed files with 146 additions and 0 deletions.
diff --git a/qdrant-landing/content/blog/facial-recognition.md b/qdrant-landing/content/blog/facial-recognition.md
@@ -0,0 +1,146 @@
+---
+title: "Building a Facial Recognition System with Qdrant"
+draft: false
+short_description: "Combine AI, FaceNet, and Qdrant to build a cool app."
+description: "Build an AI app that uses facial recognition embeddings & vector search to match users with their celebrity look-alikes."
+preview_image: /blog/facial-recognition/social_preview.png
+social_preview_image: /blog/facial-recognition/social_preview.png
+date: 2024-12-03T00:00:00-08:00
+author: David Myriel
+featured: false
+tags:
+  - vector search
+  - embeddings
+  - facial recognition
+  - Qdrant
+  - Streamlit
+  - ZenML
+  - data visualization
+---
+
+# The Twin Celebrity App 
+
+Combining cutting-edge technology with fun can produce applications that users genuinely enjoy. One such project is the [**Twin Celebrity app**](https://github.com/neural-maze/vector-twin), a tool that matches users with their celebrity look-alikes using facial recognition embeddings and [**vector search**](/advanced-search/) powered by Qdrant. This blog post covers the architecture, tools, and practical advice for developers who want to build this app or something similar.
+
+The [**Twin Celebrity app**](https://github.com/neural-maze/vector-twin) identifies which celebrity a user resembles by analyzing a selfie. The app utilizes:
+- **Face recognition embeddings**: Generated by a ResNet-based **FaceNet** model.
+- **Vector similarity search**: Powered by Qdrant to find the closest match.
+- **ZenML**: For orchestrating data pipelines.
+- **Streamlit**: As the front-end interface.
+
+> This project not only demonstrates the capabilities of modern vector databases but also serves as an exciting introduction to embedding-based applications.
+
+---
+
+## Learn From the App's Creator
+
+We interviewed the engineer behind this project, [**Miguel Otero Pedrido**](https://www.linkedin.com/in/migueloteropedrido/), who is also the founder of [**The Neural Maze**](https://www.youtube.com/@TheNeuralMaze). Miguel explains in detail how he put the app together, as well as his choice of tools.
+
+<iframe width="560" height="315" src="https://www.youtube.com/embed/UJ2jTEBae3A?si=m9sHtiXTY4n0OsB2" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+
+For detailed steps to build the app, watch [**Building a Twin Celebrity App**](https://www.youtube.com/watch?v=LltFAum3gVg) on Miguel's YouTube channel, [**The Neural Maze**](https://www.youtube.com/@TheNeuralMaze).
+
+---
+
+## Architecture
+
+**Search Engine:** [Qdrant](https://qdrant.tech) stands out as a high-performance [**vector database**](/qdrant-vector-database/) built in Rust, known for its reliability and speed. Its advanced features, such as [**vector visualization**](/documentation/web-ui/) and efficient [**querying**](/documentation/concepts/search/), make it a go-to choice for developers working on embedding-based projects. 
+
+![architecture](/blog/facial-recognition/architecture.png)
+
+**ML Framework:** [ZenML](https://www.zenml.io) simplifies pipeline creation with a modular, cloud-agnostic framework that ensures clean, scalable, and portable code, ideal for cross-platform workflows.
+
+**Face Detection:** [MTCNN](https://github.com/ipazc/mtcnn#) detects and aligns faces before they are embedded, which makes the embeddings more reliable.
+
+**Embedding Model:** [FaceNet](https://github.com/davidsandberg/facenet) provides lightweight, pre-trained facial embeddings that balance accuracy and efficiency, a good fit for tasks like the Twin Celebrity app.
+
+**Frontend:** [Streamlit](https://github.com/streamlit) enables rapid UI prototyping with minimal effort, so developers can focus on core functionality.
+
+## Application Workflows
+
+The app is divided into two phases: **The Offline Phase**, where the celebrity images are vectorized, and **The Online Phase**, which carries out a live **similarity search**.
+
+![online-offline](/blog/facial-recognition/online-offline.png)
+
+**The Offline Phase**
+
+The first step is dataset preparation. Celebrity images are fetched from **Hugging Face’s dataset library** to serve as the foundation for embeddings.
+
+Next, [**MTCNN**](https://github.com/ipazc/mtcnn#) detects and aligns the celebrities' faces within the images.
+Then, a pre-trained [**FaceNet**](https://en.wikipedia.org/wiki/FaceNet) model generates a 512-dimensional embedding for each image, giving every face a consistent, fixed-size representation.
+
+Finally, these embeddings, along with metadata, are stored in [**Qdrant Cloud**](/cloud/). This enables efficient retrieval and management of the data for later use.
+
+---
+
+**The Online Phase**
+
+In the online phase, user interaction begins with a **Streamlit app**. The app captures a selfie and converts it into an embedding using the same FaceNet model.
+
+The generated embedding is then queried against Qdrant, which retrieves the top matches based on similarity.
+
+Finally, the results are displayed in an intuitive interface, showing the user their **closest celebrity match** and making the interaction engaging and seamless.
+
+---
+
+## How to Build the App
+
+### 1. Set Up the Offline Pipeline
+The offline pipeline is orchestrated with ZenML and consists of:
+- **Data Loading**: Fetch images and labels (e.g., "Brad Pitt") from Hugging Face.
+- **Sampling**: Reduce dataset size for faster processing, selecting around 3,000 images.
+- **Embedding Generation**: Convert images into embeddings using MTCNN for face detection and FaceNet for embedding creation.
+- **Storage in Qdrant**: Save embeddings into a collection named `celebrities`.
+
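The sampling and storage steps above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the `name` payload key, the cosine distance, the random seed, and the helper names `sample_records`, `build_points`, and `upload` are assumptions made for illustration. Only the 512-dimensional size, the `celebrities` collection name, and the sample size of about 3,000 come from the post. The `upload` helper assumes the `qdrant-client` package is installed.

```python
import random
import uuid
from typing import Dict, List, Sequence, Tuple

EMBEDDING_DIM = 512              # FaceNet embedding size mentioned in the post
COLLECTION_NAME = "celebrities"  # collection name mentioned in the post


def sample_records(
    records: Sequence[Tuple[str, str]], n: int = 3000, seed: int = 0
) -> List[Tuple[str, str]]:
    """Pick at most n (image_path, name) records at random, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(list(records), min(n, len(records)))


def build_points(
    embeddings: Sequence[Sequence[float]], names: Sequence[str]
) -> List[Dict]:
    """Pair each embedding with its label as a point dict (id, vector, payload)."""
    if len(embeddings) != len(names):
        raise ValueError("embeddings and names must have the same length")
    points = []
    for vector, name in zip(embeddings, names):
        if len(vector) != EMBEDDING_DIM:
            raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
        points.append(
            {"id": str(uuid.uuid4()), "vector": list(vector), "payload": {"name": name}}
        )
    return points


def upload(points: List[Dict], url: str, api_key: str) -> None:
    """Create the collection if it is missing, then upsert the points."""
    # Imported lazily because qdrant-client is a third-party dependency.
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url=url, api_key=api_key)
    if not client.collection_exists(COLLECTION_NAME):
        client.create_collection(
            collection_name=COLLECTION_NAME,
            # Cosine distance is an assumption; the post does not name the metric.
            vectors_config=models.VectorParams(
                size=EMBEDDING_DIM, distance=models.Distance.COSINE
            ),
        )
    client.upsert(
        collection_name=COLLECTION_NAME,
        points=[models.PointStruct(**point) for point in points],
    )
```

With real credentials, `upload(build_points(embeddings, names), url, api_key)` would push the vectors to Qdrant Cloud; `url` and `api_key` stand in for your own cluster's values.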
+### 2. Create the Online Application
+The Streamlit app handles:
+- **Image Capture**: Takes a selfie through a webcam or uploaded file.
+- **Embedding Querying**: Sends the embedding to Qdrant, retrieves the top matches, and visualizes the similarity.
+
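The querying step above can be sketched as follows. The first two functions show, in plain Python, what a top-k cosine similarity search computes; Qdrant answers the same question much faster by using an approximate nearest-neighbor index. The last function shows one way to run the query through the `qdrant-client` package with `query_points`. The `name` payload key, the cosine metric, and all function names are assumptions for illustration, not taken from the project.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(
    query: Sequence[float], celebrities: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored vector and keep the k best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in celebrities.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def query_qdrant(client, query_vector: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Same search against Qdrant; `client` is a qdrant_client.QdrantClient."""
    response = client.query_points(
        collection_name="celebrities",
        query=list(query_vector),
        limit=k,
        with_payload=True,
    )
    return [(point.payload["name"], point.score) for point in response.points]
```

In the Streamlit app, the first entry of the returned list would be presented as the user's closest celebrity match.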
+### 3. Deployment Options
+
+- Deploy the app on platforms like **Google Cloud**, **AWS**, or **Azure**.
+
+- The application can be containerized using **Docker**. For hosting, **Google Cloud Run** is a good choice, as it runs containerized applications without requiring extensive infrastructure management.
+
+- CI/CD pipelines, such as those provided by **Cloud Build** or **GitHub Actions**, automate the steps for building, testing, and deploying updates.
+
+### 4. Test the Quality of Your Embeddings
+
+You can use [**Qdrant’s visualization tools**](/documentation/web-ui/) to inspect the embeddings and check that clusters align with expectations.
+
+![web-ui](/blog/facial-recognition/web-ui.png)
+
+If your data is properly embedded, images of the same celebrity should appear close together in the visualization and form distinct groups.
+
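As a numeric complement to the visual check, one could compare how similar embeddings are within one celebrity and across different celebrities. This is an illustrative sketch and not part of the project; the function names are assumptions. If the embeddings are good, the first returned value (same-name similarity) should be clearly higher than the second (different-name similarity).

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean(values: List[float]) -> float:
    """Arithmetic mean, or NaN when there is nothing to average."""
    return sum(values) / len(values) if values else float("nan")


def cluster_separation(
    embeddings: Sequence[Sequence[float]], names: Sequence[str]
) -> Tuple[float, float]:
    """Return (mean same-name similarity, mean different-name similarity)."""
    same, different = [], []
    # Compare every unordered pair of labeled embeddings exactly once.
    for (vec_a, name_a), (vec_b, name_b) in combinations(zip(embeddings, names), 2):
        bucket = same if name_a == name_b else different
        bucket.append(cosine_similarity(vec_a, vec_b))
    return mean(same), mean(different)
```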
+---
+
+## Lessons and Takeaways
+
+Scalability poses challenges when working with large datasets, such as 20,000+ images. Optimizations like [**quantization**](/documentation/guides/quantization/), which reduces memory usage, or precomputing average embeddings for clusters can significantly reduce storage and computational costs. These strategies help the system remain performant as the dataset grows.
+
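To make the memory argument concrete, the sketch below applies a simple form of scalar quantization in plain Python: each float is mapped to one of 256 levels and stored in a single byte instead of four. It only illustrates the idea and is not Qdrant's implementation; in Qdrant, quantization is configured on the collection as described in the linked guide. The function names are assumptions.

```python
from array import array
from typing import List, Sequence, Tuple


def quantize_uint8(vector: Sequence[float]) -> Tuple[array, float, float]:
    """Map each float to an integer code in 0..255; return (codes, offset, scale)."""
    low, high = min(vector), max(vector)
    scale = (high - low) / 255 or 1.0  # avoid a zero scale for constant vectors
    codes = array("B", (min(255, round((x - low) / scale)) for x in vector))
    return codes, low, scale


def dequantize(codes: array, offset: float, scale: float) -> List[float]:
    """Approximately restore the original floats from their codes."""
    return [offset + code * scale for code in codes]


original = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, offset, scale = quantize_uint8(original)
restored = dequantize(codes, offset, scale)

# Stored as float32, each dimension takes 4 bytes; each code takes 1 byte.
float_bytes = array("f", original).itemsize * len(original)
code_bytes = codes.itemsize * len(codes)
```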
+The potential real-world applications of this technology extend far beyond entertainment. Similar systems can be used in security applications for embedding-based facial recognition to secure access to buildings or devices. 
+
+In **healthcare**, they can assist in analyzing features such as moles or skin textures. In **retail**, they enable personalized recommendations based on user photos, demonstrating the versatility of this approach.
+
+---
+
+## Next Steps for Developers
+
+- Start by [**cloning the project repository**](https://github.com/neural-maze/vector-twin) to understand the architecture and functionality. 
+
+- Expand the dataset with more celebrity images for diversity or fine-tune the FaceNet model for improved accuracy. 
+
+- Consider deploying a mobile-friendly version using frameworks like **Flutter** or **React Native** for a seamless user experience.
+
+> For scalability, implement **multi-GPU setups** to speed up embedding generation and optimize storage with techniques like quantization or average embeddings. 
+
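The average-embedding idea can be sketched in a few lines: all embeddings that share a celebrity name are collapsed into one mean vector, so the collection holds one point per celebrity instead of one per image. This is an illustrative sketch with an assumed function name; when searching by cosine similarity, you may also want to re-normalize the averaged vectors.

```python
from typing import Dict, List, Sequence


def average_embeddings(
    embeddings: Sequence[Sequence[float]], names: Sequence[str]
) -> Dict[str, List[float]]:
    """Return one mean vector per name."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vector, name in zip(embeddings, names):
        if name not in sums:
            sums[name] = [0.0] * len(vector)
            counts[name] = 0
        # Accumulate each dimension, then divide by the count at the end.
        for i, value in enumerate(vector):
            sums[name][i] += value
        counts[name] += 1
    return {name: [total / counts[name] for total in sums[name]] for name in sums}
```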
+To enhance functionality, explore features like **video input for real-time matches** or add **metadata such as celebrity bios** to enrich user interaction. Experiment with custom similarity scoring for more tailored results. 
+
+## More Links
+
+- Miguel's [LinkedIn profile](https://www.linkedin.com/in/migueloteropedrido/)
+- Miguel's [Substack blog](https://theneuralmaze.substack.com)
+- The Neural Maze [YouTube channel](https://www.youtube.com/@TheNeuralMaze)
+- Twin Celebrity [GitHub Repository](https://github.com/neural-maze/vector-twin)
diff --git a/qdrant-landing/static/blog/facial-recognition/architecture.png b/qdrant-landing/static/blog/facial-recognition/architecture.png
diff --git a/qdrant-landing/static/blog/facial-recognition/online-offline.png b/qdrant-landing/static/blog/facial-recognition/online-offline.png
diff --git a/qdrant-landing/static/blog/facial-recognition/social_preview.png b/qdrant-landing/static/blog/facial-recognition/social_preview.png
diff --git a/qdrant-landing/static/blog/facial-recognition/web-ui.png b/qdrant-landing/static/blog/facial-recognition/web-ui.png