Few questions #1

Open
Fhrozen opened this issue Apr 13, 2023 · 1 comment
Comments

Fhrozen commented Apr 13, 2023

Thank you very much for your excellent work with this tool.

I am currently trying to use it in ESPnet to generate the alignments for training a model with durations similar to the MFA tool.
However, I am facing some minor issues.

I understand the tool is still under construction, but a little guidance would help me continue with my implementation.

  • An __init__.py needs to be added to the alqalign folder so that the command alqalign.run can be executed.
  • In run.py, what is step 1 supposed to do? Does it transcribe the audio into words/text, or only into phonemes? Should it behave similarly to step 2?
  • When step 1 is used, text is no longer required, right? So the text argument becomes unnecessary? Would it also be necessary to move the text processing after step 1 (

    alqalign/alqalign/run.py

    Lines 70 to 92 in 10081d5

    if text_file.is_dir():
        for text_path in sorted(text_file.glob('*')):
            utt_id = text_path.stem
            if utt_id in utt2audio:
                audio_files.append(utt2audio[utt_id])
                text_files.append(text_path)
                output_dirs.append(output_dir / utt_id)
                utt_ids.append(utt_id)
    else:
        for i, line in enumerate(open(text_file, 'r')):
            if text_format == 'kaldi':
                fields = line.strip().split()
                utt_id = fields[0]
                sent = ' '.join(fields[1:])
            else:
                utt_id = str(i)
                sent = line
            if utt_id in utt2audio:
                audio_files.append(utt2audio[utt_id])
                text_files.append(sent)
                output_dirs.append(output_dir / utt_id)
                utt_ids.append(utt_id)
    )?
  • When using an scp file, do you need to parse it as plain text to load it?

    alqalign/alqalign/run.py

    Lines 60 to 62 in 10081d5

    for line in open(audio_file):
        utt_id, ark_key = line.strip().split()
        utt2audio[utt_id] = ark_key

    Would it not be possible to just use kaldiio.load_scp to load the file?
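
To make the scp question concrete, here is a minimal sketch of a slightly more defensive text-based reader. The helper name read_scp is hypothetical and not part of alqalign. It assumes each line has the form "<utt_id> <value>" and splits only on the first run of whitespace, so values containing spaces (for example Kaldi-style piped commands) are kept whole, whereas line.strip().split() would fail to unpack on such lines. As far as I understand, kaldiio.load_scp returns a lazy dict-like object that loads the referenced data on access rather than the raw path strings, so it may not be a drop-in replacement if the rest of run.py expects paths.

```python
from pathlib import Path
from typing import Dict, Union


def read_scp(scp_path: Union[str, Path]) -> Dict[str, str]:
    """Map each utterance id to its audio path or ark key.

    Hypothetical helper for illustration; not part of alqalign.
    """
    utt2audio: Dict[str, str] = {}
    with open(scp_path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the first run of whitespace only, so a value that
            # itself contains spaces stays intact.
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(
                    f"{scp_path}:{line_no}: expected '<utt_id> <value>'"
                )
            utt_id, value = parts
            utt2audio[utt_id] = value
    return utt2audio
```

Using a with block also closes the file handle, which the bare open(audio_file) in the loop above leaves to the garbage collector.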

Fhrozen commented Apr 13, 2023

I forgot: these packages need to be added to your setup.py:
transphone loguru
That said, I am not sure why loguru is needed, since the standard logging module should be enough (?)
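
For reference, a minimal sketch of what a standard-library replacement could look like. The function name get_logger, the logger name "alqalign", and the format string are my assumptions for illustration and are not taken from the repository.

```python
import logging
import sys


def get_logger(name: str = "alqalign", level: int = logging.INFO) -> logging.Logger:
    """Return a stderr logger configured once (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach the handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter(
                "%(asctime)s | %(levelname)-8s | %(name)s:%(funcName)s:%(lineno)d - %(message)s"
            )
        )
        logger.addHandler(handler)
    # Avoid double printing if the root logger is also configured.
    logger.propagate = False
    return logger


logger = get_logger()
logger.info("aligning %d utterances", 3)
```

This would drop one third-party dependency, at the cost of a few lines of setup that loguru otherwise handles by default.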
