app is unexpectedly slow #24
I can't reproduce this problem while running the same video asset on my laptop. I tried for-loops over the same invocations and compared running times.
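A minimal sketch of that kind of timing loop (the glob and model size here are placeholders, not the exact assets used):

```
# Sketch only: time plain Whisper CLI runs over a set of files.
for f in /path/to/assets/*.mp4; do
  echo "== $f =="
  time whisper --fp16 False --language en --model tiny "$f" > /dev/null
done
```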
cpb-aacip-dummy1_WhisperDummyTEST_0.mmif.json

Running on an EC2 g5.2xlarge instance, I ran this command:
yielding:
In contrast, I ran this
yielding:
For both runs, the CLAMS app downloaded the "tiny" model, which is only 72MB and downloads in less than 2s. The CLAMS app takes about 2.5x the time compared to the standard CLI version of Whisper.
Using a much shorter test file,

```
$ ffmpeg -i /home/krim/working-data/aes-data/wgbh-samples2-less/cpb-aacip-528-0g3gx45t80.mp3 2>&1 | grep Duration
  Duration: 00:03:11.62, start: 0.023021, bitrate: 192 kb/s
```

I can observe significantly more time consumed when time-stamping words:

```
$ time whisper --fp16 False --language en --model tiny /home/krim/working-data/aes-data/wgbh-samples2-less/cpb-aacip-528-0g3gx45t80.mp3 > /dev/null

real    0m16.040s
user    1m50.845s
sys     0m5.132s

$ time whisper --fp16 False --language en --model tiny --word_timestamps True /home/krim/working-data/aes-data/wgbh-samples2-less/cpb-aacip-528-0g3gx45t80.mp3 > /dev/null

real    2m24.231s
user    30m49.201s
sys     0m21.988s
```

Then, with the said (long) file,

```
$ time whisper --fp16 False --language en --model tiny ~/cpb-aacip-002d7648e70.mp4 > /dev/null

real    2m40.821s
user    20m12.450s
sys     0m41.715s

$ time whisper --fp16 False --language en --model tiny --word_timestamps True ~/cpb-aacip-002d7648e70.mp4 > /dev/null

real    17m28.261s
user    232m24.400s
sys     2m48.780s
```
The CLAMS whisper wrapper always uses word-level timestamps.
A possible "fix" would be to expose a parameter that turns off the word-level timestamps, to get transcripts as fast as possible, and then post-process subsets of the output MMIFs (on demand) with a force-aligner app that can consume those transcript-only MMIFs.
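Purely as an illustration of how such a toggle could look in HTTP mode: CLAMS apps accept runtime parameters as query-string arguments on the POST request, so a hypothetical switch (the name `noWordTimestamps` is made up here, not an existing parameter of the wrapper) could be set per request:

```
# Hypothetical parameter name for illustration only; port and file names are placeholders.
curl -s -X POST -d @input.mmif "http://localhost:5000/?noWordTimestamps=true" > output.mmif
```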
I'm seeing only a minor difference in runtime that depends on the presence of word-level timestamping: 69s vs 67s. See results in this file. Here are the actual commands I'm using to run these tests:
On this system

```
$ fastfetch --logo none
...
OS: Red Hat Enterprise Linux 9.3 x86_64
Host: PowerEdge R820
Kernel: Linux 5.14.0-362.18.1.el9_3.x86_64
Uptime: 229 days(!), 23 hours, 54 mins
Packages: 1059 (rpm)
Shell: bash 5.1.8
CPU: Intel(R) Xeon(R) E5-4620 0 (64) @ 2.60 GHz
GPU: Matrox Electronics Systems Ltd. G200eR2
Memory: 33.20 GiB / 125.29 GiB (26%)
Swap: 43.83 MiB / 32.00 GiB (0%)

$ cat /proc/cpuinfo | grep flag | head -n1
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid xsaveopt dtherm ida arat pln pts vnmi md_clear flush_l1d
```

I tried the short file (mentioned above) with 4 different invocations. All models were pre-downloaded, and I ran each invocation three times; here are the measured runtimes:
I'm running the same loop on other files (one more "short" one and two "long" files) and will report more numbers once it has all finished (might take 1-2 days).
While waiting for the whisper(s) to complete transcription, I ran the same script on a machine with powerful GPUs.
And here are the numbers (the non-GPU numbers in the top half below are from the previous experiment and should be identical to those in the above comment).
I'm seeing that the effect of word-level timestamping is gone with a GPU, and for some reason, the dockerized HTTP mode of the CLAMS app runs many-fold faster than the vanilla CLI invocation.
I noticed that. In fact, that seems to be true on both the GPU system and the less powerful system. It really doesn't make any sense. I think it must be a bug! Maybe some parameter isn't getting passed?
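One way to check whether parameters are actually reaching the app is to inspect the view metadata recorded in the output MMIF. MMIF is plain JSON, so this assumes only that `jq` is available; the exact keys may differ by SDK version:

```
# Dump each view's metadata from the output MMIF to compare runs
# (e.g., CLI vs. dockerized HTTP) and see which parameters were recorded.
jq '.views[].metadata' output.mmif
```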
These are really helpful figures, Keigh! Awesome. In some ways, they raise more questions than they answer. I'm looking forward to discussing on Monday. If you have time, could you please try adding that configuration as well? I ask because it's the one that's most relevant to GBH -- the one we're most likely to use in production. (It's also the configuration for which I've observed slowness.)
After the discussion yesterday, I ran some additional tests and found:
So to address the first problem, I simply ran the containers without that option. Here are more findings from runtime reports under the GPU environment after fixing the above issues:
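For reference, a sketch of how a container can be given GPU access with the standard Docker CLI (the image name and port are placeholders, not the exact invocation used in these tests; the host needs the NVIDIA Container Toolkit installed):

```
# Expose all host GPUs to the container and publish the app's HTTP port.
docker run --rm --gpus all -p 5000:5000 example/whisper-wrapper:latest
```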
Bug Description
Originally reported by @owencking via slack