Few questions #1

Fhrozen · 2023-04-13T04:08:05Z

Thank you very much for your excellent work with this tool.

I am currently trying to use it in ESPnet to generate the alignments for training a model with durations similar to the MFA tool.
However, I am facing some minor issues.

I understand the tool is still under construction, but a little guidance would help me continue with my implementation.

You need to add an __init__.py to the alqalign folder so that alqalign.run can be executed as a command.
In run.py, what is step 1 supposed to do? Does it transcribe the audio into words/text, or only into phonemes? Should it behave similarly to step 2?

If step 1 is used, the text is no longer required, right? Does the text argument then become unnecessary? Is it also necessary to move the text processing after step 1 (

alqalign/alqalign/run.py

Lines 70 to 92 in 10081d5

    
    if text_file.is_dir():
        for text_path in sorted(text_file.glob('*')):
            utt_id = text_path.stem
            if utt_id in utt2audio:
                audio_files.append(utt2audio[utt_id])
                text_files.append(text_path)
                output_dirs.append(output_dir / utt_id)
                utt_ids.append(utt_id)
    else:
        for i, line in enumerate(open(text_file, 'r')):
            if text_format == 'kaldi':
                fields = line.strip().split()
                utt_id = fields[0]
                sent = ' '.join(fields[1:])
            else:
                utt_id = str(i)
                sent = line
            if utt_id in utt2audio:
                audio_files.append(utt2audio[utt_id])
                text_files.append(sent)
                output_dirs.append(output_dir / utt_id)
                utt_ids.append(utt_id)

) to the location below?

alqalign/alqalign/run.py

Line 107 in 10081d5

When using an scp file, do you need to use a text format to load the file?

alqalign/alqalign/run.py

Lines 60 to 62 in 10081d5

    
    for line in open(audio_file):
        utt_id, ark_key = line.strip().split()
        utt2audio[utt_id] = ark_key

Is it not possible to just use kaldiio.load_scp to load the file?
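For reference, here is a minimal sketch of a slightly more tolerant plain-text loader, in case adding kaldiio as a dependency is not wanted. The quoted loop unpacks exactly two whitespace-separated fields, so it would raise on an scp entry that itself contains spaces (for example a piped command). The function name parse_scp and the sample paths are made up for illustration and are not taken from the repository.

```python
from typing import Dict, Iterable


def parse_scp(lines: Iterable[str]) -> Dict[str, str]:
    """Map utterance IDs to their scp entries (a path, an ark offset, or a command)."""
    utt2audio: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only on the first whitespace run so that entries containing
        # spaces (for example a piped command) stay in one piece.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"malformed scp line: {line!r}")
        utt_id, entry = parts
        utt2audio[utt_id] = entry
    return utt2audio


# Hypothetical sample entries, for illustration only.
example = [
    "utt1 /data/utt1.wav",
    "utt2 sox /data/utt2.flac -t wav - |",
    "",
]
print(parse_scp(example))
```

With a real file this could be called as parse_scp(open(audio_file)), since a file object iterates over its lines.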


Fhrozen · 2023-04-13T04:09:51Z

I forgot: you need to add these packages to your setup.py:
transphone loguru
However, I am not sure why you need loguru, since logging should be enough (?)
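To illustrate what I mean by logging being enough, here is a rough sketch that uses only the standard library. The helper name get_logger, the logger name "alqalign", and the format string are my own assumptions and are not taken from the repository.

```python
import logging
import sys


def get_logger(name: str = "alqalign", level: int = logging.INFO) -> logging.Logger:
    """Return a stderr logger with a timestamped format, configured only once."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)-8s | %(name)s - %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # do not also emit through the root logger
    return logger


logger = get_logger()
logger.info("aligning %s", "utt1")
```

This would remove one third-party dependency, although loguru may well be used for features that this sketch does not cover.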
