Thanglish Word Issue #1

Open
josephmiller2000 opened this issue Aug 24, 2021 · 7 comments

Comments

@josephmiller2000

[Image: Snap_Shot_02419]

enna = என்ன
en = என்
na = ன

@subins2000
Member

I don't know Tamil, so I can't really fix this problem myself. This must be a problem with the transliteration scheme. Pinging @Kishore96in since he's familiar with this.

The "How To Write A Word" feature is actually reverse transliteration. It shouldn't give output for English words. That was a bug, which I've fixed in subins2000/varnamd@7563a9e.

@josephmiller2000
Author

[Image: Snap_Shot_02425]

OK, I'm just listing out the issue.

@Kishore96in

Kishore96in commented Aug 25, 2021

This issue should be fixed by the changes to the scheme file in varnamproject/libvarnam#152.

This is what I get on my system (with the scheme file from that MR):
[Image: Screenshot_20210825_182121_crop]

@subins2000
Member

Thank you for confirming it, @Kishore96in. I have merged it into GoVarnam. The changes are now live at https://varnam.subinsb.com as well.

@Kishore96in

The issue still seems to be reproducible at https://varnam.subinsb.com. @subins2000, are you not using any wordlist to train that instance for Tamil? If you are interested, I can provide the wordlist that I am using to train varnam.

The 'canonical' way to type 'என்ன' would be 'ennnna', but of course this is not intuitive.

The 'root cause' is that like many other Indian languages, Tamil has multiple sounds which would get mapped to the same English string 'na'. The workaround used in the scheme file for Tamil was to map these sounds to 'na', 'Na', 'nna', and so on (I don't know what the other languages do). In an attempt to allow more 'natural' input, I had modified the scheme file so that all these sounds also have 'na' as a 'secondary' transliteration (the ones inside the nested square brackets). Even with the changes, varnam only shows such suggestions if it is trained with a wordlist (before the changes to the scheme, varnam would not show such suggestions even after learning from a wordlist). Is there some better way to implement this?
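To make the interaction between secondary transliterations and wordlist training concrete, here is a minimal sketch in Python. It is not varnam's actual algorithm or scheme file syntax: the `TOKEN_OUTPUTS` table, the `tokenize`, `candidates`, and `rank` helpers, and the learned count of 5 are all hypothetical and chosen only for illustration. It assumes every 'n'-like token can produce all three Tamil letters, so the scheme alone cannot tell which candidate is a real word.

```python
from itertools import product

PULLI = "\u0bcd"          # Tamil virama (pulli), removes the inherent vowel
E = "\u0b8e"              # எ
NA_DENTAL = "\u0ba8"      # ந
NA_ALVEOLAR = "\u0ba9"    # ன
NA_RETROFLEX = "\u0ba3"   # ண

# Hypothetical table: the first output plays the role of the "primary"
# transliteration, the rest are "secondary" ones. Not the real scheme.
TOKEN_OUTPUTS = {
    "e": [E],
    "n": [c + PULLI for c in (NA_DENTAL, NA_ALVEOLAR, NA_RETROFLEX)],
    "na": [NA_DENTAL, NA_ALVEOLAR, NA_RETROFLEX],
}


def tokenize(text):
    """Split Latin input into known tokens using greedy longest match."""
    tokens, i = [], 0
    longest = max(len(key) for key in TOKEN_OUTPUTS)
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in TOKEN_OUTPUTS:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r}")
    return tokens


def candidates(text):
    """Every combination of outputs, in scheme order (primary first)."""
    options = [TOKEN_OUTPUTS[token] for token in tokenize(text)]
    return ["".join(parts) for parts in product(*options)]


def rank(text, learned_counts=None):
    """Put learned words first (by count); the rest keep scheme order."""
    learned_counts = learned_counts or {}
    # sorted() is stable, so unlearned candidates stay in scheme order.
    return sorted(candidates(text), key=lambda w: -learned_counts.get(w, 0))


ENNA = E + NA_ALVEOLAR + PULLI + NA_ALVEOLAR   # என்ன
untrained = rank("enna")
trained = rank("enna", {ENNA: 5})              # 5 is a made-up count
```

In this toy model the scheme change is what puts என்ன among the candidates at all, and the learned wordlist is what moves it to the top, which mirrors the two-part fix described here.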

To summarize, completely fixing this issue would require changes to the scheme file (already merged) and training with a wordlist.

@subins2000
Member

Thank you for the explanation. It makes more sense now. It's a bit difficult for me to follow since I don't know much about the language.

> The issue still seems to be reproducible at https://varnam.subinsb.com. @subins2000, are you not using any wordlist to train that instance for Tamil? If you are interested, I can provide the wordlist that I am using to train varnam.

On the server https://varnam.subinsb.com there were no words in the dictionary except for Malayalam. I have now imported roughly 1 lakh (100,000) words for Tamil. The suggestion என்ன now appears for "enna", but only in 7th position. Do you have a good word corpus, or is this alright?

> The 'canonical' way to type 'என்ன' would be 'ennnna', but of course this is not intuitive.

How many sounds are there for 'na' in Tamil? In Malayalam there are the following (this is also the mapping used in the Malayalam varnam scheme):

  • na (single) - ന
  • nna (double na) - ന്ന
  • Na (single) - ണ
  • NNa (double Na) - ണ്ണ
  • n or n_ (chill of na) - ൻ
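The list above can be read as a lookup table that is applied with greedy longest match, so that 'nna' is taken as one unit rather than 'n' followed by 'na'. The sketch below is only an illustration of that idea: `MAPPING` and `transliterate` are hypothetical names, and this is not how the varnam scheme is actually written or evaluated.

```python
# Hypothetical lookup table built from the five entries listed above.
MAPPING = {
    "na": "\u0d28",               # ന
    "nna": "\u0d28\u0d4d\u0d28",  # ന്ന (na + virama + na)
    "Na": "\u0d23",               # ണ
    "NNa": "\u0d23\u0d4d\u0d23",  # ണ്ണ (Na + virama + Na)
    "n": "\u0d7b",                # ൻ (chillu n)
    "n_": "\u0d7b",               # ൻ, explicit form
}
MAX_KEY = max(len(key) for key in MAPPING)


def transliterate(text):
    """Greedy longest-match lookup; unknown characters pass through."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(MAX_KEY, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in MAPPING:
                out.append(MAPPING[piece])
                i += size
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

With this table, `transliterate("nna")` picks the double letter, while `transliterate("n_na")` uses the explicit underscore form to get the chillu followed by a single na.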

Looking at the Tamil scheme file, ன் is marked as a chill letter. Is that right? In the Malayalam scheme, to get a chillaksharam in the middle of a word, we add an underscore after the letter, as in n_. This was a recent change. In Malayalam, chillaksharam usually don't occur in the middle of words, with rare exceptions. Is it the same for Tamil as well?

subins2000 reopened this Aug 26, 2021
@Kishore96in

> How many sounds are there for na in Tamil ?

In Tamil, for 'na', we have:

  • ந - tongue touches teeth
  • ன - tongue touches alveolar ridge
  • ண - tongue is slightly curled backwards

As far as I understand, it seems ந and ன are both denoted by ന in Malayalam. The double na-s which you mention would be written in Tamil as ன்ன and ண்ண, i.e. we don't have dedicated conjoined characters to represent those.
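To see how the Tamil doubles are spelled out letter by letter, one can list the Unicode code points of each string. This is a small sketch using Python's standard `unicodedata` module; the `describe` helper is a hypothetical name introduced here for illustration.

```python
import unicodedata


def describe(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


double_alveolar = "\u0ba9\u0bcd\u0ba9"   # ன்ன
double_retroflex = "\u0ba3\u0bcd\u0ba3"  # ண்ண

for word in (double_alveolar, double_retroflex):
    print(word, describe(word))
```

Each double comes out as consonant, pulli (named `TAMIL SIGN VIRAMA` in Unicode), consonant, and is displayed as two separate letters with a dot on the first.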

> From looking at the Tamil scheme file ன் is mentioned as chill letter. Is it so ?

I don't completely understand the concept of 'chill letters' in Malayalam, but it seems to be a variation of other letters that appears only at the end of words. If so, I don't think that concept exists in Tamil.
