Text Cleaning
Yimin Jing edited this page Feb 18, 2022 · 7 revisions
import takin
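# Sample inputs containing escape characters (\n and \t).
# zh_text in English: "China is a beautiful place / Please tell me where you are. /
# I will definitely come to find you / in my arms / in your eyes"
zh_text = "中国是一个美丽的地方\n请告诉我你在哪儿。\n我一定会去找你\t在我的怀里\t在你的眼里"
en_text = "Today is sunday\nwe are happy\nwe are fun."

# add_punc=False: escape characters are removed and the pieces are joined directly.
print(takin.delete_escape_character(zh_text, lang="zh", add_punc=False))
# Output: 中国是一个美丽的地方请告诉我你在哪儿。我一定会去找你在我的怀里在你的眼里

# add_punc=True: escape characters are replaced with sentence-ending punctuation
# ("。" for zh, ". " for en); a piece that already ends with punctuation is not punctuated twice.
print(takin.delete_escape_character(zh_text, lang="zh", add_punc=True))
# Output: 中国是一个美丽的地方。请告诉我你在哪儿。我一定会去找你。在我的怀里。在你的眼里

print(takin.delete_escape_character(en_text, lang="en", add_punc=False))
# Output: Today is sundaywe are happywe are fun.

print(takin.delete_escape_character(en_text, lang="en", add_punc=True))
# Output: Today is sunday. we are happy. we are fun.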
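For readers who want to see what this kind of cleaning involves, here is a minimal sketch that reproduces the outputs shown above using only the standard `re` module. It is not takin's implementation: the function name `strip_escape_characters`, the separator table, and the rule for skipping already-punctuated pieces are all assumptions inferred from the sample outputs.

```python
import re

# Hypothetical helper, NOT part of takin: approximates the behaviour
# seen in the sample outputs above.
_ESCAPES = re.compile(r"[\n\r\t]+")
_SEPARATOR = {"zh": "。", "en": ". "}   # assumed punctuation per language
_END_PUNCT = "。！？.!?"                  # assumed sentence-ending marks


def strip_escape_characters(text, lang="zh", add_punc=False):
    # Split on runs of escape characters and drop empty pieces.
    pieces = [p for p in _ESCAPES.split(text) if p]
    if not add_punc:
        return "".join(pieces)

    out = []
    for i, piece in enumerate(pieces):
        out.append(piece)
        if i == len(pieces) - 1:
            break  # nothing is appended after the final piece
        if piece[-1] in _END_PUNCT:
            # Already punctuated: only restore the word gap for English.
            out.append(" " if lang == "en" else "")
        else:
            out.append(_SEPARATOR[lang])
    return "".join(out)


print(strip_escape_characters("Today is sunday\nwe are happy\nwe are fun.",
                              lang="en", add_punc=True))
# Today is sunday. we are happy. we are fun.
```

The sketch treats a run of consecutive escape characters as a single break, which is a design choice of the sketch rather than documented takin behaviour.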
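# Sample inputs with stray whitespace.
# zh_text in English: "We are all very happy."
zh_text = "我 们 都非 常快 乐 。 "
en_text = "Takin , is very useful . "

# lang="zh": all whitespace between characters is removed.
print(takin.delete_extra_whitespace(zh_text, lang="zh"))
# Output: 我们都非常快乐。

# lang="en": spaces between words are kept; spaces before punctuation and at the ends are removed.
print(takin.delete_extra_whitespace(en_text, lang="en"))
# Output: Takin, is very useful.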
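The whitespace cleanup above can be approximated with two regular expressions. The sketch below is an illustration under assumptions, not takin's code: `squeeze_whitespace` is a hypothetical name, and the punctuation set used for English is a guess based on the single sample output.

```python
import re


# Hypothetical helper, NOT part of takin.
def squeeze_whitespace(text, lang="zh"):
    if lang == "zh":
        # Chinese text does not separate words with spaces,
        # so every whitespace run is dropped.
        return re.sub(r"\s+", "", text)
    # English: collapse whitespace runs to one space and trim the ends...
    text = re.sub(r"\s+", " ", text).strip()
    # ...then remove the space left in front of punctuation (assumed set).
    return re.sub(r"\s+([,.!?;:])", r"\1", text)


print(squeeze_whitespace("我 们 都非 常快 乐 。 ", lang="zh"))
# 我们都非常快乐。
print(squeeze_whitespace("Takin , is very useful . ", lang="en"))
# Takin, is very useful.
```

Note that the `zh` branch of this sketch would also delete spaces around any Latin words embedded in Chinese text; whether takin handles that case differently is not shown on this page.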