Install dependencies with uv #81

edsu · 2025-01-31T15:15:09Z

Note: it looks like A LOT of files have changed, but you can ignore most of them: the files in docs/reports and docs/baseline are the output of running a benchmark.

We've been installing Python dependencies with pip without tracking their versions. Since we've started using uv in some other infrastructure team Python projects, it makes sense to add it here too, so that speech-to-text can be included in the infrastructure team's weekly dependency update process.

Unlike pip, uv always installs system-specific Python wheels. There was no wheel available for triton (a dependency of openai-whisper) under Python 3.8, so installing the dependencies failed.

So, in addition to adding uv, this PR also upgrades our base Docker image from `nvidia/cuda:12.1.0-devel-ubuntu20.04` to `nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04`, which gives us Python 3.10 when we `apt install python3`.

Since this could significantly change Whisper's behavior, I wanted to be able to compare the VTT transcript output before and after the Docker image change. I added the start of a benchmarking system that compares the output for a set of 22 SDR items against a previous benchmark. Ideally this benchmark would be human-vetted and represent a ground truth for what we believe the transcripts should be, but for the time being it is simply a snapshot of what the transcripts look like today. See the benchmark/README.md file for details.
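To illustrate what comparing VTT output before and after the image change can involve, here is a minimal sketch that reduces WebVTT files to their cue text and reports how much changed. It is not the code in this PR: the function names `vtt_text` and `diff_transcripts` are hypothetical, and it assumes simple VTT files with no cue identifiers, NOTE, STYLE or REGION blocks.

```python
import difflib


def vtt_text(vtt: str) -> list[str]:
    """Return the cue text lines of a WebVTT document.

    Drops the WEBVTT header, blank lines and timestamp lines. Assumes
    (hypothetically) there are no cue identifiers or NOTE/STYLE/REGION blocks.
    """
    lines = []
    for line in vtt.splitlines():
        line = line.strip()
        if not line or line.startswith("WEBVTT") or "-->" in line:
            continue
        lines.append(line)
    return lines


def diff_transcripts(baseline_vtt: str, latest_vtt: str) -> tuple[float, list[str]]:
    """Compare two transcripts by text only (cue timings are ignored).

    Returns a line-level similarity ratio in [0, 1] and a unified diff
    of the cue text lines that changed.
    """
    old = vtt_text(baseline_vtt)
    new = vtt_text(latest_vtt)
    ratio = difflib.SequenceMatcher(None, old, new).ratio()
    diff = list(difflib.unified_diff(old, new, "baseline", "latest", lineterm=""))
    return ratio, diff


if __name__ == "__main__":
    baseline = "WEBVTT\n\n00:00.000 --> 00:02.000\nWelcome to the archive.\n\n00:02.000 --> 00:04.000\nThis is a test recording.\n"
    latest = "WEBVTT\n\n00:00.000 --> 00:02.100\nWelcome to the archive.\n\n00:02.100 --> 00:04.000\nThis is a test recording!\n"
    ratio, diff = diff_transcripts(baseline, latest)
    print(f"similarity: {ratio:.2f}")  # one of two cue lines changed
    print("\n".join(diff))
```

Ignoring timings is a deliberate choice in this sketch: a newer CUDA/Python stack may shift cue boundaries slightly without changing the words, and a text-only comparison keeps those shifts from drowning out real transcription changes.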

Closes #80
Refs #65

edsu · 2025-01-31T23:57:58Z

.autoupdate/preupdate

This is for Jenkins when it runs dependency updates.

edsu · 2025-01-31T23:58:20Z

.github/workflows/test.yml

Need to run tests using uv now.

edsu · 2025-01-31T23:59:04Z

Dockerfile

Updated to run with uv, plus a notable change to the base Docker image to get Python 3.10 so that we can manage Whisper's dependencies with uv.

edsu · 2025-01-31T23:59:34Z

docs/README.md

This is the documentation for the benchmarking process. You can ignore the output files below.

edsu · 2025-02-01T00:00:47Z

docs/report.py

This is a program for generating a report using the baseline transcripts and the latest transcripts in the SDR.
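To give a sense of what such a report could contain, here is a minimal sketch that scores each item's latest transcript against its baseline using word error rate (WER), treating the baseline as the reference. This is an assumption for illustration only: nothing here says docs/report.py uses WER, and the `word_error_rate` and `report` functions and the item identifiers are hypothetical. Transcripts are passed in as plain text (for example, cue text already extracted from the VTT files).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level Levenshtein distance (substitutions + deletions +
    insertions) divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # Convention for an empty reference: perfect if both are empty.
        return 0.0 if not hyp else 1.0
    # prev[j] is the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def report(baseline: dict[str, str], latest: dict[str, str]) -> str:
    """Build a Markdown table with one WER row per baseline item.
    Items with no latest transcript are flagged as missing."""
    rows = ["| item | WER |", "| --- | --- |"]
    for item_id in sorted(baseline):
        if item_id not in latest:
            rows.append(f"| {item_id} | missing |")
            continue
        wer = word_error_rate(baseline[item_id], latest[item_id])
        rows.append(f"| {item_id} | {wer:.1%} |")
    return "\n".join(rows)


if __name__ == "__main__":
    baseline = {"item-a": "this is a test recording", "item-b": "hello world"}
    latest = {"item-a": "this is the test recording"}
    print(report(baseline, latest))
```

Since the baseline is currently a snapshot rather than a human-vetted ground truth, a nonzero WER in a report like this would only indicate that the output changed, not that it got worse.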

jmartin-sul

A couple of small suggestions, and it needs a rebase, but looks great!


It's a Python project, but the update hook was already added in sul-dlss/speech-to-text#81. Closes sul-dlss/speech-to-text#65.

edsu marked this pull request as draft January 31, 2025 15:15

edsu force-pushed the uv branch 3 times, most recently from 6946ed7 to e2553aa, January 31, 2025 15:26

jmartin-sul mentioned this pull request Jan 31, 2025 in: send Honeybadger alert on nonzero script exit #78 (Merged)

edsu force-pushed the uv branch 15 times, most recently from 5b0e19e to 49950f8, January 31, 2025 23:57


jmartin-sul approved these changes Feb 1, 2025

docs/README.md Outdated

docs/README.md Outdated

edsu force-pushed the uv branch from 49950f8 to 90f926f February 3, 2025 23:11

edsu force-pushed the uv branch from 90f926f to a6f2c1c February 3, 2025 23:16

edsu marked this pull request as ready for review February 3, 2025 23:18

edsu merged commit 65e6100 into main Feb 3, 2025
3 checks passed

edsu deleted the uv branch February 3, 2025 23:24

jmartin-sul mentioned this pull request Feb 7, 2025 in: [REVIEW BUT DON'T MERGE] add speech-to-text service to weekly dependency updates sul-dlss/access-update-scripts#293 (Draft)
