I wonder if we shouldn't normalize Unicode as part of our Atlas data prep. I was looking online for how to do it and found this code from some guy named Tauber .... @jtauber @lcerrato @AlisonBabeu
from unicodedata import normalize
curword = normalize("NFC",m[1])
My thinking:
Anything in our repos should probably be normalized (e.g., the Greek from the Greco-Arabic corpus).
Anything we import into Atlas should be normalized too. That would imply some code in the Atlas data prep pipeline (I think).
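Roughly what I have in mind for the pipeline step is below. This is only a sketch: `normalize_line` and `normalize_lines` are names I made up for illustration, not anything that exists in Atlas, and I am assuming the prep step sees the text as an iterable of Python strings.

```python
from unicodedata import normalize


def normalize_line(line: str, form: str = "NFC") -> str:
    """Return the line in the requested Unicode normalization form."""
    return normalize(form, line)


def normalize_lines(lines, form: str = "NFC"):
    """Yield each line normalized (hypothetical hook for a data prep step)."""
    for line in lines:
        yield normalize_line(line, form)


# Three encodings of an accented Greek alpha that look alike on screen:
decomposed = "\u03b1\u0301"   # alpha followed by a combining acute accent
with_tonos = "\u03ac"         # precomposed alpha with tonos
with_oxia = "\u1f71"          # precomposed alpha with oxia (Greek Extended block)

# Before normalization the three strings are all different.
raw_forms = {decomposed, with_tonos, with_oxia}

# After NFC they collapse to a single code point sequence.
nfc_forms = set(normalize_lines([decomposed, with_tonos, with_oxia]))
```

The point of the example is that string comparison, searching, and counting all treat the raw forms as different words, which is why normalizing once at import time seems safer than normalizing at every lookup.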
Thoughts?