Allow loading patterns and exceptions from files or an array reference #1

Open
wants to merge 1 commit into
base: master

bpj commented Nov 22, 2016

The place to find suitable files is
http://mirror.ctan.org/language/hyph-utf8/tex/generic/hyph-utf8/patterns/txt/
right from the people who maintain TeX hyphenation files.

@PhilterPaper

Indeed, CTAN hosts a large supply of hyphenation pattern files (and exception lists). What would be the most useful way to make them available to users of Text::Hyphen? I suspect that most users will not want the overhead (and the failures due to connectivity problems) of reading directly from the CTAN site every time they use Text::Hyphen. I'm not sure that packaging all of these files into Text::Hyphen (rather than into individual Text::Hyphen::XX modules) would be good either: it would be bulky, and any given user will need only a few of the languages. Is there a way to subscribe to the pattern and exception files for just the languages one is interested in? Perhaps Text::Hyphen could then build a local Text::Hyphen::XX on the fly, or add the files to (or update) Text::Hyphen itself?

I think it would be useful to extend Text::Hyphen::new() with a $lang language option (defaulting to en_US) that brings in the appropriate pattern and exception files for that language, in the manner of Text::Hyphen::XX, for this hyphenation object. That way, you could hyphenate multiple languages in one session by creating multiple hyphenation objects. $lang might then trigger loading the pattern and exception files from a local library under Text::Hyphen. It would be up to the owner of the system to keep these files in sync with CTAN (I doubt that they change very often), and there would be no need to maintain separate Text::Hyphen::XX modules. Perhaps Text::Hyphen could look in a local cache for the desired language content and, if it is not found, fetch it from CTAN? It might also be good to have a local utility that checks with CTAN and refreshes any changed language files, as well as loading new ones on request.

Text/
   Hyphen/
      existing code
      languages/
         en_US/    shipped with Text::Hyphen
         de_DE/    optional German pattern/exceptions, etc.
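
A minimal sketch of the cache lookup over a layout like the one above, not anything Text::Hyphen does today. The function name cached_pattern_file, the file name patterns.txt, and the fetch callback are all hypothetical; how a language code such as de_DE maps to a CTAN file name is deliberately left to the callback, since that mapping is not settled here.

```perl
use strict;
use warnings;
use File::Spec;
use File::Path qw(make_path);

# Return the path of the cached pattern file for $lang under $cache_root.
# On a cache miss, call $fetch (a code ref taking the language code and
# the target path) to populate the file, e.g. by downloading it.
# All names here are illustrative assumptions, not Text::Hyphen API.
sub cached_pattern_file {
    my ($cache_root, $lang, $fetch) = @_;

    my $dir  = File::Spec->catdir($cache_root, $lang);
    my $file = File::Spec->catfile($dir, 'patterns.txt');

    return $file if -e $file;    # cache hit: no network access needed

    make_path($dir);             # create languages/<lang>/ if missing
    $fetch->($lang, $file);      # cache miss: let the caller fetch it
    die "no pattern file available for $lang\n" unless -e $file;

    return $file;
}
```

The callback could, for instance, wrap the mirror method of the core HTTP::Tiny module, which only downloads a file when the remote copy has changed. Keeping the network step behind a callback means en_US can ship with the distribution and never trigger a download.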

OK, enough coffee-fueled ramblings for today!

PhilterPaper commented Nov 19, 2020

  1. I vaguely recall that some languages, such as German (DE), behave oddly at word splits, doubling one of the letters or something similar. Does anyone know whether the Text::Hyphen package handles that correctly? Is it specified in the CTAN and Text::Hyphen::DE patterns? Even if this behavior has been dropped from current German orthography, I'm sure that some people will want it for older texts. It's not so much a matter of finding the hyphenation point as of knowing which letter to double (and where), and of accounting for the change in fragment lengths.
  2. This winter I might have some time to extend this package to implement reading from CTAN, and to come up with a PR. But first, I need to know whether KAPPA would be happy to merge such a PR. I'll open an issue to draw attention to this discussion (being in a PR, it may be hidden from many).
  3. In the package TeX::Hyphen there is a parameter "file" for specifying the file to load (apparently combined patterns and exceptions), and a parameter "style" that has something to do with the language. The documentation mumbles something about language-specific shortcuts, but I haven't explored it. I should probably do a thorough comparison with Text::Hyphen before investing any more time in either.
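
To make the concern in item 1 concrete: in pre-1996 German spelling, a "ck" broken between its letters was written "k-k" (Zucker becomes Zuk-ker), and a consonant dropped from a compound such as Schiffahrt reappeared at the break (Schiff-fahrt). The sketch below only illustrates the effect on fragment lengths. The function split_old_german and its $restore argument are hypothetical, and it says nothing about what Text::Hyphen or the CTAN patterns actually do.

```perl
use strict;
use warnings;

# Split $word after $pos characters, applying two pre-1996 German rules.
# $restore is an optional consonant that was elided in the unbroken
# compound and must be written again after the break. A real hyphenator
# would need that information from its patterns or exception list.
sub split_old_german {
    my ($word, $pos, $restore) = @_;

    my $left  = substr($word, 0, $pos);
    my $right = substr($word, $pos);

    # "ck" broken between c and k is written "k-k"
    $left =~ s/c\z/k/ if $right =~ /\Ak/;

    # the elided consonant reappears at the start of the second fragment
    $right = $restore . $right if defined $restore;

    return ($left, $right);
}

my @zucker = split_old_german('Zucker', 3);           # ('Zuk', 'ker')
my @schiff = split_old_german('Schiffahrt', 6, 'f');  # ('Schiff', 'fahrt')
```

The "ck" case leaves the total length unchanged (3 + 3 letters for a 6-letter word), while the restored consonant makes the fragments one letter longer than the original word (6 + 5 for a 10-letter word), which is what would have to be accounted for when fitting fragments to a line.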
