-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Youtube API to download english transcripts #1139
base: main
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
Pull Request Test Coverage Report for Build 2982211757
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amirtds Thanks a lot for working on this issue.
I have one concern though. This fix introduces a lot of new git conflicts (#1139 (comment)). It seems like a very costly addition to the platform, what do you think?
What's the customer need for this fix? Is it a blocker for most customers or it's an important feature for a niche segment?
If it's the latter, I suggest keeping the solution out of the platform and seek an upstream-first fix. Again, we're looking at a 27 conflicts which is more costly than a complete re-do of the whole pull request.
If it's a critical fix for all customers, then we should be looking to invest in a more packaged feature appsembler/transcripts
that integrates smoothly with the platform until we fix it upstream.
I don't think this should be merged as-is.
A lot of the conflicts appear to be pretty simple things like |
Good catch Anders. It's great that the conflicts are for that reason. @amirtds would you mind reverting style changes that are causing conflicts? |
d5c67cc
to
c12d2f2
Compare
This comment has been minimized.
This comment has been minimized.
c12d2f2
to
7f49484
Compare
This comment has been minimized.
This comment has been minimized.
7f49484
to
a1e0e00
Compare
This comment has been minimized.
This comment has been minimized.
a1e0e00
to
5296063
Compare
This comment has been minimized.
This comment has been minimized.
5296063
to
6e7b792
Compare
This comment has been minimized.
This comment has been minimized.
6e7b792
to
a6a10b3
Compare
This comment has been minimized.
This comment has been minimized.
Checking git merge conflicts against https://github.com/edx/edx-platform.git
New conflicting files with 'open-release/koa.master'
New conflicting files with 'open-release/lilac.master'
New conflicting files with 'open-release/maple.master'
New conflicting files with 'open-release/nutmeg.master'
New conflicting files with 'master'
|
Background
https://appsembler.atlassian.net/browse/BLACK-2163
By the end of 2021 YouTube Deprecated their Timedtext API. This caused problem for our customers to download automatically the transcripts for their YouTube video for their videos in Open edX. Open edX team is aware of this error https://openedx.atlassian.net/browse/TNL-9460, but there was no movement on solving this issue from open edx end.
We tried to solve this by replacing the API with a new YouTube API, but the new API requires OAuth authentication and only works with videos in the same channel and even with corresponding changes in the codebase the Transcripts didn't work.
Our solution
I developed a new API that responds with SRT version of transcripts for a given YouTube video ID for example https://us-central1-appsembler-tahoe-0.cloudfunctions.net/youtube-transcript?video_id=AcZZlbWRyUM You can try this with different videos just replace what comes after
video_id=
with your YouTube video ID.After receiving the Transcripts from our API we make a call to
transcripts/upload
to upload the transcript for a given unit component location, and we store the transcripts. To make it visible in Studio and LMS.The functionality should work as the following video
Filmage.2022-02-28_204527.mp4