Commit

more gentle russian word extraction (to extract "пациент" from "пациент1")
Илья Лебедев committed Jul 29, 2019
1 parent 8da67ee commit 7bdaa71
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion rozental_as_a_service/rozental.py
@@ -105,7 +105,12 @@ def extract_words(raw_constants: List[str], min_word_length: int = 3, only_russi
     })
     processed_words = list(set(processed_words))
     if only_russian:
-        processed_words = [w for w in processed_words if re.match(r'[а-я-]+', w)]
+        russian_words = []
+        for word in processed_words:
+            match = re.match(r'[а-яйё-]+', word)
+            if match:
+                russian_words.append(match.group())
+        processed_words = russian_words
     return processed_words
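
The effect of this change can be illustrated with a small standalone sketch. The helper names `filter_russian_old` and `filter_russian_new` and the sample list are hypothetical and are not part of rozental.py; only the two regular expressions and the loop body are taken from the diff.

```python
import re
from typing import List


def filter_russian_old(words: List[str]) -> List[str]:
    # Hypothetical helper mirroring the removed line: the whole word is kept
    # as soon as it starts with a run of characters from [а-я-].
    return [w for w in words if re.match(r'[а-я-]+', w)]


def filter_russian_new(words: List[str]) -> List[str]:
    # Hypothetical helper mirroring the added lines: only the leading run
    # matched by the pattern is kept, so trailing digits are cut off.
    russian_words = []
    for word in words:
        match = re.match(r'[а-яйё-]+', word)
        if match:
            russian_words.append(match.group())
    return russian_words


# Sample input chosen for illustration only.
sample = ['пациент1', 'ёлка', 'patient', 'код']
old_result = filter_russian_old(sample)
new_result = filter_russian_new(sample)
print(old_result)  # ['пациент1', 'код']
print(new_result)  # ['пациент', 'ёлка', 'код']
```

Two differences show up here. The new code returns `match.group()` instead of the original word, which turns "пациент1" into "пациент". The new character class also lists "ё" explicitly, which lies outside the `а-я` range, so "ёлка" is no longer dropped. Because `re.match` anchors at the start of the string, both versions still skip words whose first character is not matched by the pattern.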

