add speech-to-text to weekly dependency updates #65

Open
Tracked by #1 ...
jmartin-sul opened this issue Dec 10, 2024 · 0 comments · May be fixed by sul-dlss/access-update-scripts#293
@jmartin-sul

Here's an example from another Python project: https://github.com/sul-dlss/was-pywb/blob/main/.autoupdate/preupdate

That project uses Poetry, but rialto-airflow uses uv to manage dependencies, and we may want to adapt that example to uv instead. The author of this ticket is unsure of the pros and cons offhand, so anyone picking this up who isn't already familiar with both tools should ask the people who worked on rialto-airflow what they'd recommend, since that's the more recent of the two projects.
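
For illustration, here is a minimal sketch of what a uv-based update step could look like, written in Python. `uv lock --upgrade` is uv's command for re-resolving dependencies and rewriting `uv.lock`; the function name `upgrade_lockfile` and its `dry_run` parameter are hypothetical, and the hook format that access-update-scripts actually expects is not shown in this issue.

```python
import shutil
import subprocess


def upgrade_lockfile(project_dir=".", dry_run=False):
    """Upgrade the locked dependency versions of a uv-managed project.

    Hypothetical helper: returns the command it ran (or would run).
    """
    # `uv lock --upgrade` re-resolves all dependencies, ignoring the
    # versions currently pinned in uv.lock, and rewrites the lockfile.
    cmd = ["uv", "lock", "--upgrade"]
    if dry_run:
        # Report the command without touching the project.
        return cmd
    if shutil.which("uv") is None:
        raise RuntimeError("uv is not installed or not on PATH")
    # check=True raises CalledProcessError if resolution fails.
    subprocess.run(cmd, cwd=project_dir, check=True)
    return cmd
```

Calling `upgrade_lockfile(dry_run=True)` only returns the command list, which makes it easy to check the step without modifying a lockfile.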

@edsu self-assigned this Jan 30, 2025
edsu added a commit that referenced this issue Jan 31, 2025
We've been installing Python dependencies with pip and not tracking
their versions. Since we've started using uv in some other Python
projects, it probably makes sense to add it here so that speech-to-text
can be tracked by infra-team's weekly dependency update process.

Closes #80
Refs #65
edsu added a commit that referenced this issue Jan 31, 2025
We've been installing Python dependencies with pip and not tracking
their versions. Since we've started using uv in some other
infrastructure team Python projects, it makes sense to add it here so
that speech-to-text can be tracked by infra-team's weekly dependency
update process.

Unlike pip, uv always installs system-specific Python wheels. There
wasn't a wheel available for triton (an openai-whisper dependency)
under Python 3.8, so the installation of dependencies failed.

So, in addition to adding uv this PR also upgrades our base Docker image
from `nvidia/cuda:12.1.0-devel-ubuntu20.04` to
`nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04` which allows us to install
python3.10 when we `apt install python3`.

Since this could significantly change Whisper's behavior, I wanted to be
able to compare the VTT transcript output before and after the Docker
image change. I added the start of a benchmarking system that will allow
us to compare the output of a set of 22 SDR items with a previous benchmark.
Ideally this benchmark would be human vetted, and actually represent a
ground truth for what we believe the transcript should be. But for the
time being it is simply a snapshot in time of what the transcript looked
like today. See the benchmark/README.md file for details.

Closes #80
Refs #65
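
As a rough illustration of the before/after comparison described in the commit message above, the sketch below scores how similar two VTT transcripts are at the word level. It is not the code from benchmark/README.md, which is not shown here; the function names `vtt_text` and `similarity` are hypothetical, and using `difflib.SequenceMatcher` is an assumption about one reasonable way to compare transcripts.

```python
import difflib


def vtt_text(vtt):
    """Return the spoken text of a WebVTT document as one string.

    Drops the WEBVTT header, NOTE blocks, cue identifiers and timing lines.
    """
    lines = []
    # Cues are separated by blank lines.
    for block in vtt.strip().split("\n\n"):
        block_lines = block.strip().splitlines()
        if not block_lines or block_lines[0].startswith(("WEBVTT", "NOTE")):
            continue
        # The cue text is everything after the timing line.
        for i, line in enumerate(block_lines):
            if "-->" in line:
                lines.extend(block_lines[i + 1:])
                break
    return " ".join(lines)


def similarity(reference_vtt, candidate_vtt):
    """Word-level similarity ratio between 0 and 1 for two transcripts."""
    ref = vtt_text(reference_vtt).lower().split()
    cand = vtt_text(candidate_vtt).lower().split()
    return difflib.SequenceMatcher(None, ref, cand).ratio()
```

A ratio of 1.0 means the two transcripts contain the same words in the same order; lower values indicate that the transcript drifted from the stored snapshot and may deserve a manual look.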
edsu added a commit that referenced this issue Feb 3, 2025
jmartin-sul added a commit to sul-dlss/access-update-scripts that referenced this issue Feb 7, 2025
it's a python project, but the update hook was already added in sul-dlss/speech-to-text#81

closes sul-dlss/speech-to-text#65