
Recipe for embedding reduction [Videos] #418

Merged: 3 commits merged on Oct 24, 2024

plon-Susk7

This PR is related to issue #410.

  1. Added a notebook example for embedding reduction using the dimension_reduction operator.
  2. Used the Hugging Face dataset "sayakpaul/ucf101-subset", which has 10 classes, as the example.
  3. Used the vid_vec_rep_clip operator to extract embeddings.
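
For readers unfamiliar with embedding reduction, here is a minimal sketch of the general idea: projecting high-dimensional video embeddings down to 2-D points that can be plotted. It does not call the dimension_reduction or vid_vec_rep_clip operators. Plain PCA via NumPy's SVD stands in for whatever algorithm the operator uses, the embeddings are random placeholders, and the 512-dimensional size is an assumption.

```python
import numpy as np


def reduce_to_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project row-vector embeddings onto their top two principal components."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # SVD of the centered data: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
# Synthetic stand-in for video embeddings: 50 videos x 512 dimensions (assumed size).
fake_embeddings = rng.normal(size=(50, 512))
points_2d = reduce_to_2d(fake_embeddings)
print(points_2d.shape)  # (50, 2)
```

Each row of `points_2d` is one video's position in the 2-D plot.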

@aatmanvaidya aatmanvaidya self-requested a review October 24, 2024 11:17
@aatmanvaidya
Collaborator

@plon-Susk7 great work!
Just a few things:

  1. Can we add one or two more lines of description in the markdown cells? You have the headings; just add a line or two explaining each step, e.g. that you are extracting embeddings using CLIP.
  2. I am guessing you have to install the huggingface-hub and matplotlib libraries. Where are you installing them, in the .venv? What if we install them in the notebook instead? What I mean is, what if the first cell is something like this:
!pip install huggingface-hub
!pip install matplotlib
!pip install datasets

     This way the user only has to run the notebook and does not have to worry about fixing package install issues.
  3. The final plot looks great. Is there a chance the plot could be more spread out? Right now the thumbnails overlap a lot.
  4. Also, how many videos are there in the dataset?
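
One possible way to reduce thumbnail overlap, sketched under assumptions: draw on a larger figure and shrink each thumbnail with the `zoom` argument of matplotlib's `OffsetImage`. The function name `plot_thumbnails`, the default values, and the random points and images are all hypothetical; this is not the notebook's plotting code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.offsetbox import AnnotationBbox, OffsetImage


def plot_thumbnails(points, thumbnails, zoom=0.4, figsize=(14, 14)):
    """Draw one thumbnail per 2-D point; a smaller zoom and larger figure reduce overlap."""
    fig, ax = plt.subplots(figsize=figsize)
    # Invisible scatter so the axis limits cover every point.
    ax.scatter(points[:, 0], points[:, 1], s=0)
    boxes = []
    for (x, y), thumb in zip(points, thumbnails):
        box = AnnotationBbox(OffsetImage(thumb, zoom=zoom), (x, y), frameon=False)
        ax.add_artist(box)
        boxes.append(box)
    ax.margins(0.1)  # padding so thumbnails at the edges are not clipped
    return fig, boxes


rng = np.random.default_rng(0)
points = rng.normal(size=(50, 2))             # placeholder 2-D coordinates
thumbnails = rng.random(size=(50, 32, 32, 3))  # placeholder RGB thumbnails
fig, boxes = plot_thumbnails(points, thumbnails)
```

Lowering `zoom` or raising `figsize` trades thumbnail legibility for separation, so the right values depend on how many videos are plotted.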

"\n",
"dataset_name = \"UCF101_subset/train\"\n",
"hf_dataset_identifier = \"sayakpaul/ucf101-subset\"\n",
"filename = \"UCF101_subset.tar.gz\"\n",
Collaborator

Does the tar.gz file get deleted in the end?

Author

I added code to remove the tar.gz file from the cache.
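
A cleanup step like this can be sketched as follows, assuming the archive is an ordinary local file. The helper name `extract_and_remove` and the temporary paths are hypothetical, and the notebook's actual code (which deals with the Hugging Face cache) may differ.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_and_remove(archive_path: Path, dest_dir: Path) -> None:
    """Unpack a .tar.gz archive into dest_dir, then delete the archive."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support a safer extraction filter.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    archive_path.unlink()  # remove the archive once its contents are extracted


# Demo with a throwaway archive so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    sample = tmp_path / "clip.txt"
    sample.write_text("placeholder")
    archive = tmp_path / "UCF101_subset.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(sample, arcname="UCF101_subset/clip.txt")
    out_dir = tmp_path / "extracted"
    extract_and_remove(archive, out_dir)
    archive_exists_after = archive.exists()
    extracted_ok = (out_dir / "UCF101_subset" / "clip.txt").exists()
```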

@plon-Susk7
Author

plon-Susk7 commented Oct 24, 2024

> This way we make sure the user has to only run the notebook and they should not worry about fixing package install issue.
> 3. The final plot looks great, is there a chance that the plot is a bit more spatial, like right now the thumbnails overlap a lot
> 4. Also, how many videos are there in the dataset?

There are 10 classes in the dataset, so I took 5 from each. In total the notebook processes 50 videos.
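
The selection described above (5 videos from each of 10 classes, 50 in total) could look roughly like this. It assumes each video's class is its parent directory name and takes the first five per class in sorted order. The comment does not say how the five were chosen, and the file paths below are made up.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def take_per_class(paths, per_class=5):
    """Group video paths by parent directory and keep the first few from each group."""
    by_class = defaultdict(list)
    for path in paths:
        by_class[PurePosixPath(path).parent.name].append(path)
    selected = []
    for class_name in sorted(by_class):
        selected.extend(sorted(by_class[class_name])[:per_class])
    return selected


# Hypothetical layout: 10 class folders with 12 videos each.
paths = [
    f"UCF101_subset/train/class_{c:02d}/video_{v:02d}.avi"
    for c in range(10)
    for v in range(12)
]
selected = take_per_class(paths)
print(len(selected))  # 50
```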

@aatmanvaidya
Collaborator

@plon-Susk7 this looks great, merging the PR now.

Can you also do one small thing: on issue #410, can you write detailed instructions on how to download the Jupyter notebook into the .venv, then exec into the Docker container and run it?
I will add those instructions to the wiki.

@aatmanvaidya aatmanvaidya merged commit 98b9336 into tattle-made:development Oct 24, 2024
4 of 5 checks passed
@plon-Susk7 plon-Susk7 deleted the development branch October 25, 2024 06:19