Install dependencies with uv #81
.autoupdate/preupdate
This is for Jenkins when it runs dependency updates.
.github/workflows/test.yml
Need to run tests using uv now.
Dockerfile
Update to running with uv, plus a notable change to the base Docker image to get Python 3.10 so we can manage Whisper dependencies with uv.
This is documentation for the benchmarking process. You can ignore the output files below.
docs/report.py
This is a program for generating a report using the baseline transcripts and the latest transcripts in the SDR.
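The contents of `docs/report.py` aren't shown in this conversation. As a rough, hypothetical sketch of what comparing a baseline transcript with the latest one could involve, the code below extracts the spoken text from two WebVTT files and scores their word-level similarity with the standard library's `difflib`. The function names `vtt_text`, `similarity` and `report`, and the assumption that baseline and latest files share a filename in two separate directories, are illustrative only; fetching transcripts from the SDR is not covered.

```python
import difflib
import re
from pathlib import Path

# A WebVTT cue timing line, e.g. "00:00:01.000 --> 00:00:04.000" (hours optional).
TIMING = re.compile(r"^\s*(\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s")


def vtt_text(vtt: str) -> str:
    """Return only the cue payload text of a WebVTT document.

    The WEBVTT header, NOTE blocks, cue identifiers and timing lines are
    dropped, because payload lines are the ones that follow a timing line.
    """
    payload = []
    in_cue = False
    for line in vtt.splitlines():
        if TIMING.match(line):
            in_cue = True  # payload lines follow the timing line
        elif not line.strip():
            in_cue = False  # a blank line ends the cue
        elif in_cue:
            payload.append(line.strip())
    return " ".join(payload)


def similarity(baseline_vtt: str, latest_vtt: str) -> float:
    """Word-level similarity in [0, 1]; 1.0 means identical transcript text."""
    a = vtt_text(baseline_vtt).split()
    b = vtt_text(latest_vtt).split()
    # autojunk=False so frequent words like "the" still count in long transcripts.
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def report(baseline_dir: Path, latest_dir: Path) -> dict[str, float]:
    """Score each baseline .vtt file against the same-named file in latest_dir (hypothetical layout)."""
    scores = {}
    for baseline in sorted(baseline_dir.glob("*.vtt")):
        latest = latest_dir / baseline.name
        if latest.exists():
            scores[baseline.name] = similarity(
                baseline.read_text(encoding="utf-8"),
                latest.read_text(encoding="utf-8"),
            )
    return scores


if __name__ == "__main__":
    base = "WEBVTT\n\n00:00:00.000 --> 00:00:02.000\nHello world\n"
    new = "WEBVTT\n\n00:00:00.000 --> 00:00:02.500\nHello there world\n"
    print(similarity(base, new))  # 0.8
```

Comparing only the cue text means that timing shifts alone (as in the example, where one cue ends half a second later) don't lower the score; a real report might want to track timing drift separately.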
A couple of small suggestions, and it needs a rebase, but looks great!
It's a Python project, but the update hook was already added in sul-dlss/speech-to-text#81. Closes sul-dlss/speech-to-text#65.
Note: it looks like A LOT of files have changed, but you can ignore the majority of them in `docs/reports` and `docs/baseline`, which are the output of running a benchmark.

We've been installing Python dependencies with pip and not tracking their versions. Since we've started using uv in some other infrastructure team Python projects, it makes sense to add it here so that speech-to-text can be tracked by infra-team's weekly dependency update process.

Unlike pip, uv always installs system-specific Python wheels. There wasn't a wheel available for triton (an openai-whisper dependency) under Python 3.8, so the installation of dependencies failed.

So, in addition to adding uv, this PR also upgrades our base Docker image from `nvidia/cuda:12.1.0-devel-ubuntu20.04` to `nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04`, which allows us to install Python 3.10 when we `apt install python3`.

Since this could significantly change Whisper's behavior, I wanted to be able to compare the VTT transcript output before and after the Docker image change. I added the start of a benchmarking system that will allow us to compare the output for a set of 22 SDR items with a previous benchmark. Ideally this benchmark would be human-vetted and actually represent a ground truth for what we believe the transcript should be. For the time being, though, it is simply a snapshot of what the transcripts looked like today. See the benchmark/README.md file for details.
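If the baseline does become human-vetted ground truth, one common way to score Whisper's output against it is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is an illustration under that assumption, not what benchmark/README.md describes; the function name `word_error_rate` is hypothetical, and it only lowercases and splits on whitespace without normalizing punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Words are compared case-insensitively after splitting on whitespace;
    punctuation is not normalized in this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript has no words")
    # prev[j] is the edit distance between the reference words processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,  # insertion: extra word in hypothesis
                prev[j - 1] + (ref_word != hyp_word),  # substitution, or 0 on a match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Here one substituted word out of six reference words gives a WER of 1/6. Unlike a similarity ratio, WER is asymmetric: it is normalized by the reference, so it only makes sense once one side is trusted as ground truth.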
Closes #80
Refs #65