Clustering videos using vector-similarity #277
@Snehil-Shah This is very good progress :) I have some general improvements/ideas about this project to share, and then some specific things about Feluda.
General Improvements:
Specific Feedback for Feluda
Largely, the project is composed of the following components:
Generally, go through the wiki to learn more. I think this should be pretty useful for setting up Feluda locally: https://github.com/tattle-made/feluda/wiki/Setup-Feluda-Locally
A quick note about operators: so far, our operators work on individual items, but this project may be the first time we figure out how to make operators that work on collections. That part would be novel, so feel free to think about how you'd solve it.
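To make the per-item vs. collection distinction concrete, here is a minimal sketch. Everything in it (`run_on_collection`, `fake_embed`, `group_by_first_dim`) is hypothetical and is not Feluda's operator API; it only illustrates a collection-level step that reuses an existing per-item step and then needs all results at once (as clustering does).

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical per-item operator: maps one item (e.g. a video path) to a feature vector.
ItemOperator = Callable[[str], List[float]]
# Hypothetical collection step: maps all features at once to a cluster label per item.
GroupOperator = Callable[[Dict[str, List[float]]], Dict[str, int]]


def run_on_collection(item_ids: Sequence[str], item_op: ItemOperator,
                      group_op: GroupOperator) -> Dict[str, int]:
    # Step 1: reuse the existing per-item operator on every item.
    features = {item_id: item_op(item_id) for item_id in item_ids}
    # Step 2: a collection-level step that only makes sense over the whole set.
    return group_op(features)


def fake_embed(item_id: str) -> List[float]:
    # Stand-in feature extractor (hypothetical): uses the item id's length.
    return [float(len(item_id)), 1.0]


def group_by_first_dim(features: Dict[str, List[float]]) -> Dict[str, int]:
    # Toy grouping: items whose first feature exceeds the mean go to cluster 1.
    mean = sum(v[0] for v in features.values()) / len(features)
    return {k: int(v[0] > mean) for k, v in features.items()}


labels = run_on_collection(["a.mp4", "bb.mp4", "cccc.mp4"], fake_embed, group_by_first_dim)
```

In a real pipeline, `group_op` would be a clustering algorithm over the embeddings, and the per-item step could be batched or cached, since it is the expensive part.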
Hi @Snehil-Shah, great work! I also wanted to ask one thing:
```python
from sklearn.manifold import TSNE
tsne_embeddings = TSNE(n_components=2, learning_rate=150, perplexity=20, angle=0.2, verbose=2).fit_transform(X)  # X: per-video embedding matrix
```
Just from a visual interpretation of the graph (see the regions circled in the issue description): not all of the original labels are distinctly separated, but similar videos are still grouped together.
@aatmanvaidya I went ahead and clustered them into 10 labels (using K-Means) and plotted a visualization that shades each image according to its cluster label (notebook updated). This is how it went: The coordinates of the images are different from before as the
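The notebook's K-Means code is not shown here, so as a self-contained illustration, this is a minimal NumPy sketch of Lloyd's K-Means algorithm. It uses a deterministic farthest-first initialization (an assumption for this sketch; scikit-learn's `KMeans` uses k-means++ by default) and runs on whatever embedding matrix is passed in. The toy data is two synthetic blobs, not the video dataset.

```python
import numpy as np


def farthest_first_init(X: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    # Start from one random point, then repeatedly add the point farthest from all chosen centroids.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids, dtype=float)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    centroids = farthest_first_init(X, k, seed)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points; keep it if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy data: two well-separated blobs of 8-dimensional "embeddings".
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 8)), rng.normal(5.0, 0.1, size=(20, 8))])
labels, centroids = kmeans(X, k=2)
```

The resulting `labels` can then be used to shade the 2D plot, for example by passing them as the `c` argument of matplotlib's `scatter`.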
I wanted to add to your discussion about dimensions: this is consistent with how we have done it in the past. For search, we use multi-dimensional vectors; dimensionality reduction is used strictly for visualization. In the first iteration, we chose 2D for t-SNE simply because it was easier to render on a 2D web canvas. We could try 3D if we have the time.
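To illustrate the split between search and visualization, here is a minimal NumPy sketch of nearest-neighbour search by cosine similarity over the full-dimensional vectors; any 2D projection would only be used for plotting. The function name `cosine_top_k`, the brute-force approach, and the sizes (300 vectors of 512 dimensions) are assumptions chosen for illustration, not how Feluda's search is implemented.

```python
import numpy as np


def cosine_top_k(query: np.ndarray, index: np.ndarray, k: int = 5) -> np.ndarray:
    # Normalize rows so that a dot product equals cosine similarity.
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = index_n @ query_n
    # Indices of the k most similar vectors, most similar first.
    return np.argsort(-scores)[:k]


# Hypothetical index: 300 video embeddings with 512 dimensions each (random for illustration).
rng = np.random.default_rng(0)
index = rng.normal(size=(300, 512))
top = cosine_top_k(index[42], index, k=5)
```

Querying with a vector that is already in the index returns that vector first. For larger collections, a vector database or approximate-nearest-neighbour index would replace the brute-force scan.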
Okay, understood the point about dimensionality reduction: if it's strictly for visualization, then 2D is fine. @Snehil-Shah I looked at the updated notebook code, and things look good to me for now.
@Snehil-Shah Not sure if you have made up your mind between Uli and Feluda, but do consider submitting a proposal for the clustering videos project in Feluda. I think you'll appreciate the complexity of this one, and we could also use some focused, in-depth exploration of this problem as part of this project.
Also, do consider joining our Slack. As we near the proposal submission deadline, it could be handy for resolving any doubts. https://admin417477.typeform.com/to/nVuNyG?typeform-source=tattle.co.in
@dennyabrain I tried doing that, but it says it requires a Tattle email address or an invitation.
@dennyabrain I am definitely inclined towards Feluda; it feels more challenging and will be a great learning experience. On a side note, I was thinking of submitting all three of my proposals to Tattle's projects, just so there is some flexibility on your end. Is that alright?
@Snehil-Shah That works for me. Hopefully the audio and video issues have a lot in common, which means less work for you :) Regarding Slack, please share your email and I'll send an invite.
Closing this issue because the DMP program has started.
Related to #81
Description
@dennyabrain I tried clustering around 300 videos (from this dataset) using algorithms from your experiment's repo.
Google Colab notebook
I first used your approach of taking 5 frames from each video, extracting their features with a ResNet model, and averaging them to produce the final embedding. Then, following your t-SNE reduction approach, I plotted the thumbnails on a graph:
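The frame-averaging step can be sketched as follows. This is a minimal sketch under stated assumptions: frames are sampled at evenly spaced indices (the original sampling scheme is not specified here), and the `extract` callable is a hypothetical stand-in for the ResNet forward pass. The toy extractor just takes the per-channel mean of a frame.

```python
from typing import Callable

import numpy as np


def sample_frame_indices(n_frames: int, n_samples: int = 5) -> np.ndarray:
    # Evenly spaced frame indices spanning the whole video (assumed sampling scheme).
    return np.linspace(0, n_frames - 1, num=n_samples).round().astype(int)


def video_embedding(frames: np.ndarray, extract: Callable[[np.ndarray], np.ndarray],
                    n_samples: int = 5) -> np.ndarray:
    idx = sample_frame_indices(len(frames), n_samples)
    # One feature vector per sampled frame, stacked into shape (n_samples, feature_dim).
    feats = np.stack([extract(frames[i]) for i in idx])
    # Average over the sampled frames to get a single video-level embedding.
    return feats.mean(axis=0)


# Toy video: 100 frames of 4x4 RGB, where every pixel in frame t has value t.
frames = np.broadcast_to(np.arange(100.0)[:, None, None, None], (100, 4, 4, 3))
# Hypothetical stand-in for a CNN feature extractor: per-channel mean of the frame.
emb = video_embedding(frames, lambda frame: frame.mean(axis=(0, 1)))
```

Averaging keeps the embedding size fixed regardless of video length, at the cost of discarding temporal order.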
Observations listed in the notebook
I will be doing some R&D on other ways to extract features from videos, and will also try different models in our current approach (such as CLIP, which I have used before).
I will now be working on setting up Feluda and studying how Feluda operators work. I would appreciate some direction...