Assign topics to videos and playlists #584
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #584      +/-   ##
==========================================
+ Coverage   77.18%   77.26%   +0.07%
==========================================
  Files         243      243
  Lines       10525    10561      +36
  Branches     1786     1793       +7
==========================================
+ Hits         8124     8160      +36
  Misses       2230     2230
  Partials      171      171
08146b6 to cea9dea
The code looks fine. I have all the OCW courses in my local database, and the topic assignments were not very good. They seemed particularly bad for humanities videos: the most common topics (Probability and Statistics, Engineering, Computer Science, Business, etc.) seemed to get assigned to everything.
I'm not sure whether expanding the columns used to make the assignments beyond title, course number, and descriptions would help. Attaching the course that the video came from to the video and getting the topics that way would definitely be better.
I created a new issue to improve the video topics assignment: #594
5b20894 to ab8f4a1
What are the relevant tickets?
Closes #579
Description (What does it do?)
For videos, runs an OpenSearch course query for each video to guesstimate the covered topics based on the video's title and description fields, and assigns the 3 most common topics from the results (a port of the open-discussions functionality).
For video playlists, assigns the 3 most common topics from the videos in the playlist.
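The "3 most common topics" step can be sketched as below. This is a minimal illustration, not the code from this PR: the helper name `most_common_topics` and the sample data are hypothetical, and the input is assumed to be one list of topic names per search result (for a video) or per video (for a playlist).

```python
from collections import Counter
from typing import Iterable


def most_common_topics(topic_lists: Iterable[Iterable[str]], limit: int = 3) -> list[str]:
    """Return up to `limit` topic names, ordered from most to least frequent."""
    # Flatten all topic lists and count how often each topic appears.
    counts = Counter(topic for topics in topic_lists for topic in topics)
    return [topic for topic, _ in counts.most_common(limit)]


# Hypothetical topics of the courses returned by the search for one video.
course_topics = [
    ["Mathematics", "Probability and Statistics", "Computer Science"],
    ["Mathematics", "Probability and Statistics", "Computer Science"],
    ["Mathematics", "Probability and Statistics"],
    ["Mathematics", "Engineering"],
]
video_topics = most_common_topics(course_topics)

# A playlist reuses the same helper on the topics already assigned to its videos.
playlist_topics = most_common_topics([video_topics, ["Mathematics", "Physics"]])
```

Because the helper only counts frequencies, topics that dominate the course catalog will also tend to dominate the results, which is consistent with the review comment about the top topics being assigned to everything.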
How can this be tested?
Follow the testing instructions for PR #558. Everything should work the same, except that most video and video playlist resources should now have up to 3 topics assigned to them after the ETL pipeline is complete. This assumes you already have a decent number of courses with topics available. If you don't, run
./manage.py backpopulate_ocw_data --skip-contentfiles
first; it will likely take 20-30 minutes or so.