proper name removal? #2

Open
eoghanmurray opened this issue Jun 28, 2019 · 1 comment
Comments

@eoghanmurray

I was wondering why 'dobhar' appeared so high up in the list. After puzzling over the dictionary entries on focloir & teanglann, I remembered that Gaoth Dobhair is a Gaeltacht placename that is likely to be mentioned often in the source texts. I'm raising it as an issue in case others use this repository, and to ask whether proper names were correctly identified. (I know Gaillimh is in the list and kept capitalized, which is fine.)

@michmech
Owner

Yes, this is a problem that happens a lot when lemmatizing Irish-language texts. Irish-language placenames often consist of normal, perfectly meaningful words. It is difficult to (automatically) separate the occurrences of such words inside placenames from their occurrences outside placenames. This skews the frequency statistics a bit, especially for words that appear in frequent placenames.
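One possible mitigation, sketched here only as an illustration and not as how this repository's list was built, is to match known placenames first and count each match as a single unit, so that words inside them are not lemmatized as ordinary words. The `PLACENAMES` gazetteer, the `LEMMAS` lookup, and the `lemma_counts` function are all hypothetical stand-ins; a real pipeline would need a full gazetteer and a real lemmatizer, and would also have to handle initial mutations (e.g. "i nGaoth Dobhair"), which this sketch ignores.

```python
from collections import Counter

# Hypothetical gazetteer: multiword placenames stored as tuples of tokens.
PLACENAMES = {("Gaoth", "Dobhair")}

# Hypothetical toy lemma lookup standing in for a real lemmatizer.
LEMMAS = {"Dobhair": "dobhar", "dobhair": "dobhar", "Gaoth": "gaoth"}


def lemma_counts(tokens, placenames=PLACENAMES, lemmas=LEMMAS):
    """Count lemmas, treating gazetteer matches as single proper-name units."""
    max_len = max((len(p) for p in placenames), default=1)
    counts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest placename match starting at position i.
        for n in range(max_len, 0, -1):
            span = tuple(tokens[i:i + n])
            if len(span) == n and span in placenames:
                # Count the whole placename once and skip its tokens.
                counts[" ".join(span)] += 1
                i += n
                break
        else:
            # No placename starts here: count the token's lemma,
            # falling back to the lowercased token if it is unknown.
            tok = tokens[i]
            counts[lemmas.get(tok, tok.lower())] += 1
            i += 1
    return counts


counts = lemma_counts(["Gaoth", "Dobhair", "agus", "dobhar"])
```

With this toy input, "Gaoth Dobhair" is counted once as a placename and only the standalone "dobhar" contributes to the count for the lemma "dobhar". Exact-match lookup like this only helps for placenames that are in the gazetteer; it does nothing for names it has not seen.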
