Can't find language "eng" when converting #21

Closed
dhdaines opened this issue Nov 7, 2019 · 11 comments

@dhdaines
Collaborator

dhdaines commented Nov 7, 2019

Not sure what's going on here, since the log indicates that it has definitely found the languages.

Also, is there some good reason why the lists of mappings are now stored in binary pickle files instead of, say, just looking them up in the filesystem like we were doing before? It seems brittle and opaque to me.

INFO - Adding mapping between eng-ipa and eng-arpabet to composite transducer.
ERROR - No lang called eng. Please try again.
Traceback (most recent call last):
  File "/home/dhd/py/readalongs3.7/bin/readalongs", line 11, in <module>
    load_entry_point('readalongs', 'console_scripts', 'readalongs')()
  File "/home/dhd/py/readalongs3.7/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/dhd/py/readalongs3.7/lib/python3.7/site-packages/flask/cli.py", line 557, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/home/dhd/py/readalongs3.7/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/dhd/py/readalongs3.7/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/dhd/py/readalongs3.7/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/dhd/py/readalongs3.7/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/dhd/py/readalongs3.7/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/dhd/py/readalongs3.7/lib/python3.7/site-packages/flask/cli.py", line 412, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/home/dhd/py/readalongs3.7/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/dhd/work/ReadAlong-Studio/readalongs/cli.py", line 103, in align
    if kwargs['save_temps'] else None))
  File "/home/dhd/work/ReadAlong-Studio/readalongs/align.py", line 109, in align_audio
    xml = convert_xml(xml)
  File "/home/dhd/work/ReadAlong-Studio/readalongs/text/convert_xml.py", line 179, in convert_xml
    convert_words(xml_copy, word_unit, output_orthography)
  File "/home/dhd/work/ReadAlong-Studio/readalongs/text/convert_xml.py", line 126, in convert_words
    converter = make_g2p(unit['lang'], output_orthography)
  File "/home/dhd/py/readalongs3.7/lib/python3.7/site-packages/g2p/__init__.py", line 132, in make_g2p
    raise(FileNotFoundError)
FileNotFoundError

@roedoejet
Owner

Hey David,

There are two parts to this. The error you're getting is because there's no mapping in g2p between eng and eng-arpabet, and that's because the g2p that we currently use for eng is lexicon-based and g2p only does rule-based conversion. There's an issue in the RAS repo (https://github.com/dhdaines/ReadAlong-Studio/issues/15) about it, but I was a little unclear about whether we were going to go ahead with that solution or not.

The second part is about the pickle. I agree that this isn't a great solution and is brittle. Part of the reasoning was that I was trying to make it a little faster and future-proof it for when we have possibly 100+ mappings. RAS only accepts conversion languages that have a valid path to eng-arpabet, and I'm currently using the networkx package to create the graph of all of the mappings. That graph takes longer to create at runtime than I would like, so that's why I save it as a pickle. For the g2p library, I want to be able to access some of the mapping metadata without having to loop through the entire filesystem.
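
The reachability check and caching described above can be sketched roughly as follows. This is a minimal illustration, not the g2p library's code: the mapping list and the names `build_graph`, `has_path`, and `load_or_build` are hypothetical, and a plain dict with a breadth-first search stands in for the networkx graph so the sketch has no dependencies.

```python
import os
import pickle
from collections import deque

# Hypothetical (input, output) mapping pairs, for illustration only.
MAPPINGS = [("ctp", "ctp-ipa"), ("ctp-ipa", "eng-ipa"), ("eng-ipa", "eng-arpabet")]


def build_graph(mappings):
    """Build a directed adjacency dict: source language -> set of targets."""
    graph = {}
    for src, dst in mappings:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    return graph


def has_path(graph, start, goal):
    """Breadth-first search: True if goal is reachable from start."""
    if start not in graph:
        return False
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


def load_or_build(cache_path, mappings):
    """Reuse a pickled graph if one exists, otherwise build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    graph = build_graph(mappings)
    with open(cache_path, "wb") as f:
        pickle.dump(graph, f)
    return graph
```

With these made-up mappings, `has_path(graph, "ctp", "eng-arpabet")` is true while `has_path(graph, "eng", "eng-arpabet")` is false, since nothing maps from plain `eng`. Note that `load_or_build` returns the cached graph even if the mappings have changed since it was written, which is one way a pickle cache can turn out brittle.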

In any case, the solution to the first part could be either:

  1. Move the LexiconG2P over to g2p (I'm not too keen on this because the whole g2p studio works on an assumption of rule-based g2p)

  2. Put an if statement that uses the RAS LexiconG2P if the input language is eng

  3. Implement an entirely different g2p, like the one in issue 15

  4. Add a rule-based g2p table for eng in g2p
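
Option 2 might look something like the sketch below. The two converter factories are passed in as parameters because their real signatures are not shown in this thread; `pick_converter`, `rule_based_factory`, and `lexicon_factory` are hypothetical names, not part of g2p or RAS.

```python
def pick_converter(lang, output_orthography, rule_based_factory, lexicon_factory):
    """Return a converter, using the lexicon-based one only for plain 'eng'.

    rule_based_factory(lang, output_orthography) stands in for the g2p lookup;
    lexicon_factory() stands in for the RAS lexicon-based English G2P.
    Both are assumptions made for this sketch.
    """
    if lang == "eng":
        # There is no rule-based table for eng, so fall back to the lexicon.
        return lexicon_factory()
    return rule_based_factory(lang, output_orthography)
```

Keeping the special case in one small function would make it easy to delete later if one of the other options is adopted.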

@roedoejet
Owner

I know your demo is today so hopefully you got whatever you needed working for that. I'm around if you want to talk and fix this up quickly if it's necessary for your demo.

@dhdaines
Collaborator Author

dhdaines commented Nov 7, 2019

Hi! Luckily it wasn't actually a demo... I guess I wasn't aware that the effect of https://github.com/dhdaines/ReadAlong-Studio/pull/8 was going to be to completely break ReadAlong-Studio (since we always have to convert things to eng-arpabet at the moment...). Or am I incorrect in assuming that it doesn't work at all?

@dhdaines
Collaborator Author

dhdaines commented Nov 7, 2019

The issue is that it doesn't work even when no actual English G2P is required, as far as I can tell...

@roedoejet
Owner

Ah, no, well then something else is happening. It works for me with every language other than an English readalong. That is, it works for eng-ipa, and eng-arpabet, just not eng, because there is no rule-based eng -> eng-ipa lookup table.

@roedoejet
Owner

What is the input text that you're using?

@dhdaines
Collaborator Author

dhdaines commented Nov 7, 2019

The Chatino counting demo...

(readalongs3.7) dhd@minipc:~/work/ReadAlong-Samples/ctp/counting$ readalongs align counting.xml counting_mono.wav counting_aligned
...
INFO - Adding mapping between ctp-ipa and eng-ipa to composite transducer.
INFO - Adding mapping between eng-ipa and eng-arpabet to composite transducer.
ERROR - No lang called eng. Please try again.

@dhdaines
Collaborator Author

dhdaines commented Nov 7, 2019

oh. I see, it does have English in it! oops :)

@dhdaines
Collaborator Author

dhdaines commented Nov 7, 2019

did you actually make it work in the past? I think Pat mentioned it was a good one to use, which is why I tried it...

@roedoejet
Owner

roedoejet commented Nov 7, 2019

Yes, it's just the name "Skyler" that is "English" so just remove the xml:lang="eng" and it should align.
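
If editing the file by hand is inconvenient, the attribute can also be removed with a few lines of Python. This is a sketch using only the standard library; `strip_lang` is a hypothetical helper, and the sample XML is made up for illustration rather than copied from counting.xml.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def strip_lang(root, lang="eng"):
    """Delete xml:lang from every element where it equals `lang`; return the count."""
    removed = 0
    for elem in root.iter():
        if elem.get(XML_LANG) == lang:
            del elem.attrib[XML_LANG]
            removed += 1
    return removed


# Made-up sample; the real file's structure may differ.
sample = '<text xml:lang="ctp"><s>one <w xml:lang="eng">Skyler</w></s></text>'
root = ET.fromstring(sample)
strip_lang(root)
```

Once the attribute is gone, the element no longer declares its own language, so under the usual xml:lang inheritance rule it falls under the language of its nearest ancestor that sets one.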

@dhdaines
Collaborator Author

dhdaines commented Nov 7, 2019

haha ok!
