[DMP 2024]: Clustering large amount of videos #81

dennyabrain · 2024-02-16T02:40:58Z

Ticket Contents

Description

Feluda allows researchers, fact-checkers and journalists to explore and analyze large quantities of multimedia content. One important modality on Indian social media is video. The scope of this task is to explore various automated techniques suited to this modality and, after consultation with the team, implement an end-to-end workflow that can be used to surface visual or temporal trends in a large collection of videos.

Goals

Review literature with our team, and research and prototype state-of-the-art ML and classical DSP techniques
Optimize the solution for consistent RAM and CPU usage (limit the spikes caused by variables like file size, video length etc.), since it will need to scale up to millions of videos.
Integrate the solution into Feluda by creating an operator that adheres to the Feluda operator interface
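
One way to keep memory flat regardless of video length, as the second goal asks, is to aggregate per-frame vectors incrementally instead of holding every frame in memory. The sketch below is only an illustration under stated assumptions: `iter_frame_vectors` is a hypothetical stand-in for a video decoder plus embedding model, and nothing here is taken from Feluda's code.

```python
from typing import Iterable, Iterator, List


def iter_frame_vectors(num_frames: int, dim: int) -> Iterator[List[float]]:
    """Hypothetical stand-in for decoder + model: yields one vector per frame."""
    for i in range(num_frames):
        yield [float(i + j) for j in range(dim)]


def streaming_mean(vectors: Iterable[List[float]]) -> List[float]:
    """Average vectors one at a time so memory does not grow with video length."""
    mean: List[float] = []
    count = 0
    for vec in vectors:
        count += 1
        if count == 1:
            mean = list(vec)
            continue
        # Incremental update: mean += (x - mean) / n
        for k, value in enumerate(vec):
            mean[k] += (value - mean[k]) / count
    if count == 0:
        raise ValueError("no frames were provided")
    return mean


# Only one frame vector is alive at any time, whatever the video length.
video_vector = streaming_mean(iter_frame_vectors(num_frames=5, dim=3))
```

Because the generator is consumed lazily, peak memory depends on the vector dimension rather than on the number of frames.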

Expected Outcome

Feluda's goal is to provide a simple CLI or scriptable interface for analysing multimodal social media data. In that vein, all the work that you do should be executable and configurable via scripts and config files. The solution should look at Feluda's architecture and its various components to identify the best ways to enable this.
The solution should have a way to configure the data source (a database with file IDs or an S3 bucket with files), specify and implement the data processing pipeline, and configure where the result will be stored. Our current implementation uses S3 and a SQL database as data sources and Elasticsearch for storing results, but additional sources or stores can be added if apt for this project.

Acceptance Criteria

Regular Interactive Demos with the team using a public jupyter notebook pushed to our experiments repository
Working feluda operator with tests that can be run as an independent worker in the cloud to schedule processing jobs over a large dataset
Output Structured data that can be passed onto a UI service (web or mobile) for downstream use cases

Implementation Details

One way we have approached this is by using vector embeddings, which worked well for surfacing visual trends in images. We used a ResNet model to generate vector embeddings and stored them in Elasticsearch. We also used t-SNE to reduce the dimensionality of the embeddings so they could be displayed in a 2D visualization. It can be viewed here
A detailed report on Feluda's usage in a project to analyze images can be read here
The relevant Feluda operator can be studied here
The code for t-SNE is here
A prior study of various ways to get insights out of images has been documented here

Mockups/Wireframes

This is an interactive visualization of Image clustering done using Feluda.

Doing UI development or integrating with any UI software is not part of this project but it might help to see what sort of downstream applications we use Feluda for.

Product Name

Feluda

Organisation Name

Tattle

Domain

Open Source Library

Tech Skills Needed

Computer Vision, Docker, Machine Learning, Performance Improvement, Python

Mentor(s)

@dennyabrain @duggalsu

Category

Data Science, Machine Learning

Sayanjones · 2024-04-08T19:55:04Z

Hey @dennyabrain, I'm Sayan, and I'm interested in contributing to the video analysis project! My skills in computer vision, machine learning, and Python are a great fit. I'm eager to explore video analysis using techniques like vector embeddings.

Proficient in Docker and performance optimization, I can ensure the solution scales efficiently. I value open-source development and look forward to contributing demos.

Is there a way you prefer for me to reach out? I'm looking forward to exploring how I can contribute.

dennyabrain · 2024-04-09T03:44:31Z

Hi @Sayanjones we can use this issue to communicate approaches. If you start concretely implementing something, you can make a new issue specific to your approach and we can take the conversation there.

Ris-code · 2024-04-10T22:42:12Z

Hi @dennyabrain

I'm Rishav Aich, pursuing my BTech in artificial intelligence and data science from IIT Jodhpur. Being a student of AI, I have done courses on deep learning, machine learning, and AI. I am proficient in C++, Python, and R programming languages. I have a strong background in development, more specifically, backend development. I have used Docker in various projects.

This project completely aligns with my skills. It would be great to contribute to this.

Please advise me on how to get started with the project.

AbhimanyuSamagra · 2024-04-12T12:45:00Z

Do not ask process related questions about how to apply and who to contact in the above ticket. The only questions allowed are about technical aspects of the project itself. If you want help with the process, you can refer instructions listed on Unstop and any further queries can be taken up on our Discord channel titled DMP queries.

Aryankb · 2024-04-15T21:54:08Z

Hey @dennyabrain, this is Aryan from IIIT Naya Raipur. I am currently pursuing my B.Tech in Data Science and Artificial Intelligence. I have good experience in deep learning, computer vision, and NLP. I've worked on several projects, such as self-driving cars using camera input. I am really excited to work on this project as I feel it is a perfect match for me. I am also going to learn Docker in the future.

dennyabrain · 2024-04-16T10:48:30Z

Hi everyone,

Thank you for expressing interest in this issue. Depending on your interests and skills, you can take ANY ONE of the following approaches :

1. Look at the problem statement and propose your approach
   Remember the main problem statement: given a large number of video files, find a way to group identical and similar video files. This approach would be ideal for anyone who is interested in or studies ML and/or DSP. By thinking about the problem statement, reviewing existing literature on it and proposing your approach here, we would all learn something from it and the mentors should be able to nudge you in the right direction.

2. Try getting Feluda working on your machine
   Feluda is moderately complex software with many moving parts. Getting it working on your machine can itself be a challenge. We have a guide on it [here](https://github.com/tattle-made/feluda/wiki/Setup-Feluda-Locally). If you are a software developer/tinkerer, this might be a good place to start, because once you have Feluda working locally and can see the various existing functionalities, that might give you an idea of how to proceed.

3. Recreate our code in a Jupyter notebook or Google Colab notebook
   We already have some code that takes [video files and converts them into vectors](https://github.com/tattle-made/feluda/blob/main/src/core/operators/vid_vec_rep_resnet.py). We also have code that takes these vectors and [clusters them](https://github.com/tattle-made/data-experiments/blob/master/tSNE-clustering.ipynb). I would take this approach if you are a software engineer with some ML engineering skills and you know your way around using ML models. Once you get this working in your notebook, we can try out different pretrained models to evaluate performance.

You'll have me or members of our team to guide you if you get stuck on any of these approaches. Taking some concrete steps on any of these 3 approaches would help us understand your interests and skills and let us give you concrete feedback when you get stuck.

All the best!

AbhimanyuSamagra · 2024-04-23T10:48:13Z

Do not ask process related questions about how to apply and who to contact in the above ticket. The only questions allowed are about technical aspects of the project itself. If you want help with the process, you can refer instructions listed on Unstop and any further queries can be taken up on our Discord channel titled DMP queries. Here's a Video Tutorial on how to submit a proposal for a project.

Aryankb · 2024-04-25T21:51:05Z

Hey @dennyabrain, I have some queries regarding the project:

1. What will be the length of the videos?
2. Is there any available dataset with pre-defined classes?
3. A video is a combination of audio, images and text. Which of these should be the most important classification criterion?
4. How many classes should there be for classification? Please give some examples.

aatmanvaidya · 2024-04-26T03:19:01Z

> Hey @dennyabrain, I have some queries regarding the project:
>
> 1. What will be the length of the videos?
> 2. Is there any available dataset with pre-defined classes?
> 3. A video is a combination of audio, images and text. Which of these should be the most important classification criterion?
> 4. How many classes should there be for classification? Please give some examples.

Hi @Aryankb

1. Generally, expect the length to be anywhere between 30 seconds and 20 minutes.
2. Currently we don't have a dataset with pre-defined classes, but feel free to look for such datasets.
3. To the best of my knowledge, a video is just a series of images, so to answer your question, the most important classification criterion would be the image content. Please investigate a bit deeper into this. Also take a look at the 3rd point in @dennyabrain's comment. That is an example of clustering images using a certain type of embedding.
4. There is no specific number of classes, but think of classes as metadata for these videos in the context of social media. Some examples could be memes, political, health, paper documents, news etc. These are very broad labels; you can think of some specific ones too.

Hope this helps

Mithilesh1609 · 2024-04-27T07:38:59Z

Hey @dennyabrain, Mithilesh here. I have experience in, and a passion for, building end-to-end, highly scalable computer vision pipelines, and I currently work as a machine learning engineer at a young start-up. I led a similar project for one of the largest edtech companies in the world, where we clustered similar videos (average length of 10 minutes) and recommended videos based on the mistakes a user made in a test, which involved embedding creation and efficient search algorithms. Apart from this, I led the creation and scaling of a computer-vision-based exam grading tool from 50 users to 4 million users with Docker and AWS, bringing the running time down by 70% over three iterations, which helped the government organize the world's largest AI-graded examination. I am very eager to contribute to this project, make video clustering more efficient, and scale it fast.

Aryankb · 2024-05-01T14:15:12Z

Hey @dennyabrain, I am Aryan Kumar Baghel, from IIIT Naya Raipur.
I was exploring ways to extract unique frames from a video. I extracted keyframes from some videos using ffmpeg, then ran k-means on those keyframes to keep only the unique ones. Here are the results:

(We can select one image from each cluster as the representative of that cluster, then use an image captioning model to generate a short caption for each image. Next, we can combine all the captions into a final caption for the video, or use them to classify the video accurately.)

Google Colab Notebook

Video Link : https://drive.google.com/file/d/1Qr08m4Bf0JjTszExDLoey2LCqcJjJl3n/view?usp=drive_link
Clusters :

Video 2 link : https://drive.google.com/file/d/1QnupjsK7ILQUYrqlPT2pTdTAzoy8Wi-C/view?usp=drive_link
Clusters :

I'll now be working on ways to cluster the images so that the number of clusters is selected automatically. Please share your reviews and directions for future work.
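
The idea of keeping one image per cluster can be sketched as picking, for each cluster, the member closest to that cluster's centroid. This is a minimal illustration, not the notebook's code: it assumes each keyframe has already been reduced to a feature vector and assigned a cluster label, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def pick_representatives(
    features: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, int]:
    """Map each cluster label to the index of the frame nearest its centroid."""
    members: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        members.setdefault(label, []).append(idx)

    representatives: Dict[int, int] = {}
    for label, idxs in members.items():
        center = centroid([features[i] for i in idxs])
        representatives[label] = min(
            idxs, key=lambda i: math.dist(features[i], center)
        )
    return representatives


# Toy keyframe features (assumed values) with their cluster labels.
features = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [10.0, 12.0]]
labels = [0, 0, 0, 1, 1, 1]
representatives = pick_representatives(features, labels)
```

Choosing an actual member frame, rather than the centroid itself, guarantees the representative is a real image that can be shown or captioned.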

aaradhyasinghgaur · 2024-05-04T10:37:24Z

> Hi everyone,
>
> Thank you for expressing interest in this issue. Depending on your interests and skills, you can take ANY ONE of the following approaches :
>
> 1. Look at the problem statement and propose your approach
>    Remember the main problem statement: given a large number of video files, find a way to group identical and similar video files. This approach would be ideal for anyone who is interested in or studies ML and/or DSP. By thinking about the problem statement, reviewing existing literature on it and proposing your approach here, we would all learn something from it and the mentors should be able to nudge you in the right direction.
>
> 2. Try getting Feluda working on your machine
>    Feluda is moderately complex software with many moving parts. Getting it working on your machine can itself be a challenge. We have a guide on it [here](https://github.com/tattle-made/feluda/wiki/Setup-Feluda-Locally). If you are a software developer/tinkerer, this might be a good place to start, because once you have Feluda working locally and can see the various existing functionalities, that might give you an idea of how to proceed.
>
> 3. Recreate our code in a Jupyter notebook or Google Colab notebook
>    We already have some code that takes [video files and converts them into vectors](https://github.com/tattle-made/feluda/blob/main/src/core/operators/vid_vec_rep_resnet.py). We also have code that takes these vectors and [clusters them](https://github.com/tattle-made/data-experiments/blob/master/tSNE-clustering.ipynb). I would take this approach if you are a software engineer with some ML engineering skills and you know your way around using ML models. Once you get this working in your notebook, we can try out different pretrained models to evaluate performance.
>
> You'll have me or members of our team to guide you if you get stuck on any of these approaches. Taking some concrete steps on any of these 3 approaches would help us understand your interests and skills and let us give you concrete feedback when you get stuck.
>
> All the best!

Hey @dennyabrain,
I'm Aaradhya Singh, currently a 2nd-year undergraduate in computer science and engineering. I am proficient in C/C++, Python, deep learning and machine learning, and I like researching and learning upcoming technologies and tech stacks. After reading your suggested approaches, I think I could fine-tune some models, mostly built on CNN/RNN architectures, and use a pipeline/hierarchical approach to solve the complex problem of classifying the content or creating clusters of it. Looking forward to working on it and posting updates on my findings.

dennyabrain · 2024-06-17T03:52:17Z

@Snehil-Shah can you comment here, so I can assign the issue to you?

Snehil-Shah · 2024-06-17T04:07:13Z

@dennyabrain Yes.

aatmanvaidya · 2024-06-17T11:19:07Z

Snehil-Shah · 2024-06-25T19:20:47Z

Weekly Learnings & Updates

Week 1

Set up my local development environment and workflow.
Set up a timeline and weekly check-ins with the mentors as part of the onboarding process.

Week 2

Benchmarked popular image embedding models to extract semantic features from video frames and average them into a video vector.
Compared pre-trained models like ResNet18, CLIP-ViT-B-32, EfficientNet-B0 and DeiT-medium-16.
Encoded around 100 videos from a combined UCF101 subset and a custom dataset of popular topics like memes, nature, and commentary.
Plotted t-SNE reduced vectors to evaluate clustering and visual distribution.
Clustered them using k-means and examined each cluster to evaluate the clusters and spot outliers.
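
As a rough illustration of the Week 2 pipeline (average frame embeddings into one video vector, then run k-means), here is a small pure-Python sketch. It is not the benchmark code: the frame embeddings are made-up 2D values, and the deterministic farthest-point seeding is a simplification chosen so the toy example is reproducible; a real run would more likely use a library implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(points: Sequence[Vector]) -> List[float]:
    """Component-wise mean; used both for frame pooling and center updates."""
    return [sum(p[k] for p in points) / len(points) for k in range(len(points[0]))]


def farthest_point_init(points: Sequence[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding: first point, then the point farthest from all seeds."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd iterations: assign to nearest center, then recompute centers."""
    centers = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_vector(members)
    return labels, centers


# Each video is a list of per-frame embeddings (toy values, assumed).
videos = [
    [[0.0, 0.0], [0.0, 2.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[9.0, 9.0], [11.0, 9.0]],
    [[10.0, 10.0]],
]
video_vectors = [mean_vector(frames) for frames in videos]
labels, centers = kmeans(video_vectors, k=2)
```

Averaging discards frame order, which is consistent with the later observation that image embedding models miss motion.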

Week 3

Reviewed literature and ran experiments on video transformers and 3D neural net architectures.
Benchmarked various video embedding models to extract active features and capture frame interpolation.
Compared pre-trained models like I3D-R50, R3D-18, SlowFast-R50, VideoMAE, ViViT and X-CLIP.
Evaluated the above models by plotting t-SNE reduced vectors and individual clusters.
Achieved near-zero outliers and true action recognition, which was missing with image embedding models.
Successfully clustered 100 videos into 13 clusters, each correctly corresponding to one of the original classes, with just one outlier.
Finalized a hybrid approach simulating a multi-stream pipeline to capture the static and active aspects of a video.
Notebook containing all benchmarks, experiments, inferences, and opinions mentioned till now for reference.

Week 4

Explored and implemented zero-shot classification with promising results utilizing the multi-modal nature of CLIP-based transformer models.
This meant being able to classify videos into newer classes without any fine-tuning and retraining of a new linear head. It works by relying on vector similarity between video embeddings and text embeddings (made from the class names) in a common vector space.

This would allow fact-checkers and researchers to surface visual trends based on labels such as "newspaper", "screenshots", "memes" etc. without any additional training overhead.
Benchmarked various clustering algorithms from scikit-learn on efficiency and results by plotting individual clusters.
Finalized k-means and agglomerative clustering for when the number of clusters is known, and Affinity Propagation for when it is unknown.
Notebook containing all benchmarks, experiments, inferences, and opinions on zero-shot classification and clustering algorithms.
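
The zero-shot idea described in Week 4 (compare a video embedding with text embeddings of the class names in a shared vector space) can be sketched as follows. This is only an illustration: the vectors are made-up 3D values standing in for real CLIP-style embeddings, the `scale` factor is an assumed value, and `zero_shot_classify` is a hypothetical name rather than anything from the notebook or Feluda.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(
    video_vec: Sequence[float],
    label_vecs: Dict[str, Sequence[float]],
    scale: float = 100.0,  # assumed sharpening factor for the similarities
) -> Tuple[str, Dict[str, float]]:
    """Return the best label and a probability per label."""
    names = list(label_vecs)
    sims = [cosine(video_vec, label_vecs[name]) for name in names]
    probs = softmax([scale * s for s in sims])
    scores = dict(zip(names, probs))
    return max(scores, key=scores.get), scores


# Toy text embeddings for the class names and one toy video embedding.
label_vecs = {
    "newspaper": [0.9, 0.1, 0.0],
    "screenshot": [0.1, 0.9, 0.1],
    "meme": [0.0, 0.2, 0.9],
}
video_vec = [0.2, 0.8, 0.3]
best_label, scores = zero_shot_classify(video_vec, label_vecs)
```

Adding a new class only requires embedding one more label string, which is why no retraining of a classification head is needed.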

Week 5

Explored various frame sampling strategies for sampling both static and active aspects of a video.
Tried sampling static aspects using methods like QR decomposition, shot-transition detection, sampling I-frames using ffmpeg, and sampling cluster centroids by clustering frames.
Tried sampling active aspects using methods like simple RGB subtraction between near-adjacent frames and Farneback's optical flow algorithm, improved with background subtraction.
Extracting the most active parts of a video is tricky because the measurement is susceptible to noise such as frequent shot transitions and shaky camera work, whereas datasets like Kinetics (which most of the video embedding models mentioned above are trained on) contain short, simple action sequences with a static background and a clear subject. In practice, the model would struggle to identify action in multi-angle action sequences as seen in movies.
One solution could be to isolate each shot using shot-transition detection, and then measure optical flow within each shot individually.
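
The "split into shots first, then measure motion inside each shot" idea can be illustrated with a toy sketch. To stay self-contained it swaps optical flow for plain frame differencing (the simpler method also mentioned above), treats frames as flat lists of grayscale pixel values, and uses an assumed cut threshold; all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[int]


def frame_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def split_shots(frames: Sequence[Frame], cut_threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) ranges, end exclusive; a large jump starts a new shot."""
    boundaries = [0]
    for i in range(1, len(frames)):
        if frame_diff(frames[i - 1], frames[i]) > cut_threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    return list(zip(boundaries[:-1], boundaries[1:]))


def motion_per_shot(frames: Sequence[Frame], cut_threshold: float) -> List[float]:
    """Average within-shot difference, so the jump at a cut is not counted as motion."""
    scores = []
    for start, end in split_shots(frames, cut_threshold):
        diffs = [frame_diff(frames[i - 1], frames[i]) for i in range(start + 1, end)]
        scores.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return scores


# Three dark frames with slight motion, then a hard cut to two static bright frames.
frames = [
    [10, 10, 10, 10],
    [12, 10, 10, 10],
    [14, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
shots = split_shots(frames, cut_threshold=50.0)
scores = motion_per_shot(frames, cut_threshold=50.0)
```

Without the shot split, the cut between frames 2 and 3 would dominate any motion score for this clip.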

Week 6

Built a custom dataset of ~120 videos depicting the Indian social media context with videos of varying lengths, qualities, and subjects meant to capture the diversity of media the operator can expect when deployed for downstream tasks like fact-checking and finding visual trends on social media research in India.
This will allow us to run inference on production expectations and further tune the pipeline.
Profiled and benchmarked CLIP-ViT-B-32 and ResNet18 for CPU and memory usage using memray and pyinstrument to estimate deployment requirements.
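
For readers who want a quick feel for this kind of measurement without installing memray or pyinstrument, the sketch below uses the standard library's `tracemalloc` and `time.perf_counter` as a stand-in. It is not the profiling setup used in the project, and `fake_embed` is a hypothetical workload in place of real model inference.

```python
import time
import tracemalloc
from typing import Any, Callable, List, Tuple


def profile_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, int]:
    """Run fn once and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - started
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def fake_embed(num_frames: int, dim: int = 512) -> List[float]:
    """Hypothetical workload: build per-frame vectors and average them."""
    frames = [[0.5] * dim for _ in range(num_frames)]
    return [sum(column) / num_frames for column in zip(*frames)]


result, seconds, peak_bytes = profile_call(fake_embed, 200)
```

Note that `tracemalloc` only sees allocations made through Python's allocator, so it would under-report memory held by native libraries such as deep learning frameworks; that is one reason a dedicated profiler is the better fit for the real models.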

Week 7

Ran inference on our clustering and zero-shot classification pipelines using our custom dataset to gauge performance on our use case.
Achieved 76-82% accuracy on our zero-shot classifier. This is encouraging given that no additional training was done on our custom dataset (consisting of various Indian-context videos of varied lengths and qualities), and the classes are ones the model has never seen before.
Notebooks containing inferences on clustering and zero-shot classification using a custom dataset.

Week 8

Worked on a Feluda operator implementing the above pipeline and wrote tests for it, adhering to the Feluda operator interface.
Pull request for the same.
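
For orientation, an operator of this kind might look roughly like the sketch below. It assumes, without confirmation from this thread, that an operator is a module exposing an `initialize(param)` function and a `run(...)` function; the actual interface and the real operator are in the Feluda repository and the pull request. The `normalize` setting and the frame-vector input are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

# Module-level state set by initialize(); assumed pattern, not Feluda's code.
_settings: Optional[Dict[str, bool]] = None


def initialize(param: Optional[Dict[str, bool]] = None) -> None:
    """Prepare the operator once; a real one would load a model here."""
    global _settings
    _settings = {"normalize": True}
    if param:
        _settings.update(param)


def run(frame_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Turn per-frame vectors into one video vector (mean, optionally L2-normalized)."""
    if _settings is None:
        raise RuntimeError("initialize() must be called before run()")
    if not frame_vectors:
        raise ValueError("at least one frame vector is required")
    count = len(frame_vectors)
    mean = [sum(column) / count for column in zip(*frame_vectors)]
    if _settings["normalize"]:
        norm = math.sqrt(sum(v * v for v in mean))
        if norm > 0:
            mean = [v / norm for v in mean]
    return mean


initialize()
embedding = run([[3.0, 0.0], [3.0, 8.0]])
```

Separating one-time setup from per-item work is what lets a worker load the model once and then process many files from a queue.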

Week 9

Profiled the above operator for CPU and memory usage using memray and pyinstrument to estimate deployment requirements.

Check out the full profiling findings and conclusions here.
Worked on a Feluda operator for video classification using a zero-shot approach, adhering to the Feluda operator interface.
Pull request for the same.

Week 10

On leave

Week 11

Worked on a simple Feluda operator for clustering embeddings from sources of various modalities and supporting multiple modes of operation, adhering to the Feluda operator interface.
Pull request for the same.

Week 12

Started work on the worker for clustering media items. Worked on setting up config files and Dockerfile for the Feluda worker with relevant operators and RabbitMQ queue configurations.
Documented the new operators.

Week 13

Completed the worker logic and payload writer.
Documented the worker.
Pull request for the worker.

dennyabrain added the DMP 2024 label Feb 16, 2024

dennyabrain mentioned this issue Apr 16, 2024

[April 14 - April 27] Engage with contributors #270

Closed

dennyabrain mentioned this issue Apr 25, 2024

feat: Improve UI for Slur Crowdsource Feature tattle-made/Uli#546

Merged

Snehil-Shah mentioned this issue Apr 27, 2024

Clustering videos using vector-similarity #277

Closed

Aryankb mentioned this issue May 1, 2024

Extraction of unique keyframes from video #287

Closed

dennyabrain mentioned this issue Jun 14, 2024

Clustering large amount of videos #354

Closed

7 tasks

aatmanvaidya assigned Snehil-Shah Jun 17, 2024

dennyabrain closed this as completed Oct 17, 2024
