Assign topics to videos and playlists #584

Merged 4 commits on Mar 8, 2024

mbertrand (Member) commented Mar 5, 2024

What are the relevant tickets?

Closes #579

Description (What does it do?)

For videos, runs an OpenSearch course query for each video to estimate the topics it covers, based on the video's title and description fields, and assigns the 3 most common topics from the results. (This is a port of the open-discussions functionality.)
For video playlists, assigns the 3 most common topics among the videos in the playlist.
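
The "3 most common topics" step described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name most_common_topics, its parameters, and the sample topic lists are all assumptions, and the OpenSearch query that would supply the topics of similar courses is left out.

```python
from collections import Counter
from typing import Iterable


def most_common_topics(
    topic_lists: Iterable[Iterable[str]], max_topics: int = 3
) -> list[str]:
    """Return up to max_topics topic names, most frequent first."""
    counts = Counter(topic for topics in topic_lists for topic in topics)
    return [name for name, _ in counts.most_common(max_topics)]


# Hypothetical topics of the courses returned by a similarity query for one video.
similar_course_topics = [
    ["Mathematics", "Probability and Statistics", "Computer Science"],
    ["Mathematics", "Probability and Statistics", "Computer Science"],
    ["Mathematics", "Probability and Statistics"],
    ["Mathematics", "Engineering"],
]
video_topics = most_common_topics(similar_course_topics)

# For a playlist, the same counting is applied to the topics of its videos
# (again, hypothetical data).
playlist_video_topics = [
    video_topics,
    ["Mathematics", "Probability and Statistics", "Engineering"],
    ["Mathematics", "Probability and Statistics"],
    ["Mathematics", "Engineering"],
    ["Mathematics"],
]
playlist_topics = most_common_topics(playlist_video_topics)
```

Because the count is purely frequency-based, topics that are common across the whole course catalog will tend to win whenever the similarity query returns loosely related courses.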

How can this be tested?

Follow the testing instructions for PR #558. Everything should work the same, except that most video and video playlist resources should now have up to 3 topics assigned once the ETL pipeline completes. This assumes you already have a reasonable number of courses with topics available. If you don't, run ./manage.py backpopulate_ocw_data --skip-contentfiles first; it will likely take 20-30 minutes.

codecov bot commented Mar 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.26%. Comparing base (c48fa83) to head (2d3f970).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #584      +/-   ##
==========================================
+ Coverage   77.18%   77.26%   +0.07%     
==========================================
  Files         243      243              
  Lines       10525    10561      +36     
  Branches     1786     1793       +7     
==========================================
+ Hits         8124     8160      +36     
  Misses       2230     2230              
  Partials      171      171              


@mbertrand mbertrand added the Needs Review label and removed the Work in Progress label Mar 6, 2024
@abeglova abeglova self-assigned this Mar 7, 2024
abeglova (Contributor) left a comment

The code looks fine. I have all the OCW courses in my local database, and the topic assignments were not very good. In particular, they seemed quite bad for humanities videos. It seemed to assign the top topics (Probability and Statistics, Engineering, Computer Science, Business, etc.) to everything.

abeglova (Contributor) commented Mar 7, 2024

I'm not sure whether expanding the columns used to make the assignments beyond title, course number, and the descriptions would help. Attaching the course that the video came from to the video and getting the topics that way would definitely be better.
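
The suggestion of taking topics from the video's source course might look roughly like the sketch below. Everything in it is hypothetical: the Course and Video classes, the course attribute linking a video to its course, and the fallback to similarity-based topics are assumptions for illustration and do not reflect the project's actual models.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Course:
    """Hypothetical stand-in for a course resource with topics."""
    title: str
    topics: list[str] = field(default_factory=list)


@dataclass
class Video:
    """Hypothetical stand-in for a video resource."""
    title: str
    course: Optional[Course] = None  # assumed link to the source course


def topics_for_video(video: Video, fallback_topics: list[str]) -> list[str]:
    """Prefer the source course's topics; otherwise use the fallback guess."""
    if video.course is not None and video.course.topics:
        return list(video.course.topics)
    return list(fallback_topics)


poetry = Course(title="Reading Poetry", topics=["Literature", "Humanities"])
linked = Video(title="Lecture 1", course=poetry)
unlinked = Video(title="Standalone talk")

linked_topics = topics_for_video(linked, ["Engineering"])
unlinked_topics = topics_for_video(unlinked, ["Engineering"])
```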

mbertrand (Member, Author) commented

I created a new issue to improve the video topics assignment: #594

@mbertrand mbertrand merged commit 8defe2d into main Mar 8, 2024
8 checks passed
@odlbot odlbot mentioned this pull request Mar 12, 2024
9 tasks
@mbertrand mbertrand deleted the mb/video_topics branch May 13, 2024 16:11
Issue that merging this pull request may close: Video Search: generate topic data for videos based on content similarity to courses