yunitate segmentation outside audio duration #153

Open
gobbios opened this issue Oct 20, 2019 · 1 comment

gobbios commented Oct 20, 2019

yunitate.sh seems to produce rttm files with segments that go beyond (or even are completely outside) the duration of the source wave file.

The audio I'm using is this:
vagrant ssh -c "sox --i '/vagrant/data/0513.wav'"

Input File : '/vagrant/data/0513.wav'
Channels : 1
Sample Rate : 44100
Precision : 16-bit
Duration : 00:10:04.12 = 26641575 samples = 45308.8 CDDA sectors
File Size : 53.3M
Bit Rate : 706k
Sample Encoding: 16-bit Signed Integer PCM

So that amounts to 604.12 seconds duration.

After running vagrant ssh -c "yunitate.sh data/", I get the following rttm (only last few lines shown):

SPEAKER 0513.rttm 1 601.4 0.1 CHI
SPEAKER 0513.rttm 1 601.5 1.2 FEM
SPEAKER 0513.rttm 1 602.7 2.1 CHI

where the last segment starts inside the source wave file's duration, but goes beyond the end (602.7 + 2.1 = 604.8, versus a duration of 604.12).
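The comparison above can be automated. What follows is only a minimal sketch, not part of the tool: it assumes the six-field line layout shown in the listing (label, file, channel, onset, duration, class), and the names `Segment`, `parse_segment` and `find_overruns` are made up for illustration.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    onset: float     # start time in seconds
    duration: float  # length in seconds
    label: str       # e.g. CHI, FEM, MAL

    @property
    def end(self) -> float:
        return self.onset + self.duration


def parse_segment(line: str) -> Segment:
    # Assumed layout, taken from the listing in this issue:
    # SPEAKER <file> <channel> <onset> <duration> <label>
    fields = line.split()
    return Segment(float(fields[3]), float(fields[4]), fields[5])


def find_overruns(lines: List[str], audio_duration: float) -> List[Segment]:
    """Return the segments that end after the audio does."""
    segments = [parse_segment(line) for line in lines if line.strip()]
    return [seg for seg in segments if seg.end > audio_duration]


# Duration from the sox report: samples / sample rate (about 604.12 s).
audio_duration = 26641575 / 44100

rttm_tail = [
    "SPEAKER 0513.rttm 1 601.4 0.1 CHI",
    "SPEAKER 0513.rttm 1 601.5 1.2 FEM",
    "SPEAKER 0513.rttm 1 602.7 2.1 CHI",
]

for seg in find_overruns(rttm_tail, audio_duration):
    overshoot = seg.end - audio_duration
    print(f"{seg.label} segment at {seg.onset} s ends {overshoot:.2f} s past the audio")
```

A segment whose onset is already past the duration is reported as well, since its end is necessarily past the end of the audio.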

When running vagrant ssh -c "yunitate.sh data/ english" things become even stranger:

SPEAKER 0513.rttm 1 601.6 0.6 FEM
SPEAKER 0513.rttm 1 603.3 0.1 CHI
SPEAKER 0513.rttm 1 603.6 0.1 CHI
SPEAKER 0513.rttm 1 603.9 0.3 CHI
SPEAKER 0513.rttm 1 604.2 0.1 FEM

Here the last segment starts after the end of the original source.

This becomes problematic when using the latter file for vagrant ssh -c "~/launcher/WCE_from_SAD_outputs.sh /vagrant/data/ yunitator_english". Here, the tool finishes without an error message, but doesn't produce the word count output. The wav_tmp folder is still present and contains this empty (corrupt?) wav file:

Input File : '/vagrant/data/wav_tmp/yunitator_english_0513_00604200-00000100.wav'
Channels : 1
Sample Rate : 44100
Precision : 16-bit
Sample Encoding: 16-bit Signed Integer PCM
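
The missing Duration line suggests a file with a header but no audio data, although that is a guess and not confirmed here. A minimal sketch for spotting such files with Python's standard `wave` module follows; `frame_count` and `write_wav` are hypothetical helper names, and the demo file is generated on the fly, not taken from wav_tmp.

```python
import os
import tempfile
import wave


def frame_count(path: str) -> int:
    """Return the number of audio frames, or -1 if the file cannot be read as WAV."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getnframes()
    except (wave.Error, EOFError):
        return -1


def write_wav(path: str, n_frames: int, rate: int = 44100) -> None:
    """Write a mono 16-bit WAV with n_frames of silence (demo helper only)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * n_frames)


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty.wav")
    write_wav(empty, 0)
    # A header-only file parses fine but reports zero frames.
    print(frame_count(empty))
```

Running `frame_count` over every file in wav_tmp and listing those that return 0 or -1 would show which segments produced unusable clips.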

And finally, if I use this file in the analyze.sh pipeline, I get the following message:

(MSG) [2] in SMILExtract : openSMILE starting!
(MSG) [2] in SMILExtract : config file is: MED_2s_100ms_htk.conf
(MSG) [2] in cComponentManager : successfully registered 96 component types.
(MSG) [2] in cComponentManager : successfully finished createInstances
(19 component instances were finalised, 1 data memories were finalised)
(MSG) [2] in cComponentManager : starting single thread processing loop
(MSG) [2] in cComponentManager : Processing finished! System ran for 60436 ticks.
sox WARN trim: End position is after expected end of audio.
sox WARN trim: Last 1 position(s) not reached.
/home/vagrant/utils/analyze.sh: line 40: /vagrant/data//detailed_outputs/WCE_yunitator_english_0513.rttm: No such file or directory
paste: /vagrant/data//wce.temp: No such file or directory

vcm_0513.rttm and yunitator_english_0513.rttm are present in detailed_outputs, but the corresponding WCE file (WCE_yunitator_english_0513.rttm, going by the error message above) is missing.

One hackish solution might be to append a second or two of silence to the end of the source wave file, I suppose. I haven't tried that yet.
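
Another possible workaround, leaving the audio untouched, would be to post-process the rttm so that no segment extends past the file's duration. This is only a sketch under assumptions: it relies on the six-field layout shown in this issue, `clip_rttm` is a made-up name, and it has not been tried against WCE_from_SAD_outputs.sh or analyze.sh.

```python
import math
from typing import Iterable, List


def clip_rttm(lines: Iterable[str], audio_duration: float) -> List[str]:
    """Drop segments that start at or after the end of the audio and
    shorten those that run past it (rounded down to 0.1 s)."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        onset, duration = float(fields[3]), float(fields[4])
        if onset >= audio_duration:
            continue  # segment lies completely outside the audio
        if onset + duration > audio_duration:
            # Round down so the clipped segment cannot overshoot again.
            clipped = math.floor((audio_duration - onset) * 10) / 10
            if clipped <= 0:
                continue
            fields[4] = f"{clipped:.1f}"
        kept.append(" ".join(fields))
    return kept


rttm_tail = [
    "SPEAKER 0513.rttm 1 601.6 0.6 FEM",
    "SPEAKER 0513.rttm 1 603.3 0.1 CHI",
    "SPEAKER 0513.rttm 1 603.6 0.1 CHI",
    "SPEAKER 0513.rttm 1 603.9 0.3 CHI",
    "SPEAKER 0513.rttm 1 604.2 0.1 FEM",
]

for line in clip_rttm(rttm_tail, 604.12):
    print(line)
```

With the lines from the "english" run, the segment starting at 604.2 is dropped and the one starting at 603.9 is shortened from 0.3 to 0.2; the other three are passed through unchanged.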

gobbios commented Oct 20, 2019

I tried the silence approach with partial success:

vagrant ssh -c "sox /vagrant/data/0513.wav /vagrant/data/0513padded.wav pad 0 5"
vagrant ssh -c "sox --i '/vagrant/data/0513padded.wav'"

Input File : '/vagrant/data/0513padded.wav'
Channels : 1
Sample Rate : 44100
Precision : 16-bit
Duration : 00:10:09.12 = 26862075 samples = 45683.8 CDDA sectors
File Size : 53.7M
Bit Rate : 706k
Sample Encoding: 16-bit Signed Integer PCM

vagrant ssh -c "yunitate.sh data/"

SPEAKER 0513padded.rttm 1 600.0 4.2 CHI
SPEAKER 0513padded.rttm 1 604.2 2.0 FEM
SPEAKER 0513padded.rttm 1 606.2 3.8 MAL
SPEAKER 0513padded.rttm 1 610.2 0.8 MAL

This still goes over the end of the audio: the padded file is 609.12 seconds long, but the last segment starts at 610.2 and ends at 611.0.

while vagrant ssh -c "yunitate.sh data/ english"

SPEAKER 0513padded.rttm 1 596.2 0.3 CHI
SPEAKER 0513padded.rttm 1 597.2 0.7 CHI
SPEAKER 0513padded.rttm 1 598.0 6.1 CHI

works fine now. Incidentally, the last segment now ends at the original file's duration (598.0 + 6.1 = 604.1).

WCE_from_SAD_outputs.sh /vagrant/data/ yunitator_english and analyze.sh data/ work as expected with the latter.
