ConcretePlanning
Alex Rudnick edited this page Oct 1, 2013 · 3 revisions
Notes from Alex's discussion with Mike about the system and the LREC paper we want to write about it.
- input verification: is the user putting in obviously wrong text?
- character set identification
- Unicode checks
- langid -- maybe with langid.py by default?
- sentence segmenters for your language: this should be pluggable.
- check out OmegaT's segmenters and how they work.
- Also text normalization routines. Let's write about GuaraniTextProcessing in a separate page...
- can we just use OAuth or something like it? How does logging in to, e.g., AskBot work?
- Would that prevent spam?
- devil's advocate: could we just include a CAPTCHA?
- what do we want to keep track of, for each user?
- Can you anonymously upload documents?
- Can you anonymously provide translations?
- Other sites (installs of Guampa) might want to do political things. There's that site for leftist translators already -- what is it? Does Mike have the link?
- tag: who's the intended audience?
- tag: what is the genre of this document?
- maybe the interface should be in the document target language
- this should be configurable for sure. Does Angular have built-in support for l10n?
- For me and Mike, for example, it would be easy to translate Spanish articles into English, and easiest for us if the interface is in English.
- How do we prevent (or at least channel) edit wars?
- Well, for one thing, Guampa is novel in part because it's FOSS, and there doesn't seem to be a good FOSS tool for this.
- unlike Tatoeba:
- we are for TM, MT, and CAT (in the long run). But we're explicitly about MT.
- We are for translating documents.
- unlike traduwiki:
- we are for MT and have proper sentence segmentation
- traduwiki is for translating a document too.
- deep question: what's the benefit for volunteer translators?
- Well, we do have activists and students...
- how is this different from Pootle?
- Pootle's interface is kind of nonobvious, and it seems to be meant for UI strings, not documents. And it's not for reading.
- but it does have terminology come up...
- canned data?
- maybe we should use it ourselves, e.g. translate some Wikipedia documents es->en.
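
A rough sketch of the input verification and pluggable sentence segmenter ideas from the notes above, using only the Python standard library. Every name here (`check_input`, `normalize`, `register_segmenter`, `segment`) is hypothetical and made up for illustration; none of this is existing Guampa code. It assumes uploads arrive as bytes that should be UTF-8, and the fallback segmenter is deliberately naive. Language identification (e.g. with langid.py) is left out and would slot in as one more check.

```python
import re
import unicodedata


def check_input(raw):
    """Return a list of problems found in uploaded bytes (empty list = looks OK)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: we only accept UTF-8; real charset detection would go here.
        return ["not valid UTF-8"]
    problems = []
    if not text.strip():
        problems.append("empty document")
    # Control characters other than ordinary whitespace usually mean binary junk.
    if any(unicodedata.category(ch) == "Cc" and ch not in "\n\r\t" for ch in text):
        problems.append("contains control characters")
    # U+FFFD suggests the text was already mis-decoded somewhere upstream.
    if "\ufffd" in text:
        problems.append("contains U+FFFD replacement characters")
    return problems


def normalize(text):
    """Minimal normalization: compose characters into Unicode NFC form."""
    return unicodedata.normalize("NFC", text)


# Registry mapping a language code to a segmenter function (str -> list of str).
SEGMENTERS = {}


def register_segmenter(lang):
    """Decorator that plugs in a segmenter for one language code."""
    def decorator(func):
        SEGMENTERS[lang] = func
        return func
    return decorator


def default_segmenter(text):
    """Naive fallback: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def segment(text, lang):
    """Use the registered segmenter for lang, or the naive fallback."""
    return SEGMENTERS.get(lang, default_segmenter)(text)
```

A site that needs special handling for its language would register its own function, e.g. `@register_segmenter("gn")` above a Guarani-specific segmenter, and everything else keeps calling `segment(text, lang)`.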