Skip to content

KOrean Rpc-based Handy Application for Language processing

Notifications You must be signed in to change notification settings

ingtranet/korhal

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Korhal

GitHub tag (latest SemVer) PyPI Travis (.com) branch Codacy branch grade

Korhal(KOrean Rpc-based Handy Application for Language-processing) is a python wrapper for several korean Part-Of-Speech taggers.

How to install

pip install korhal

Available taggers

  • KOMORAN with korhal.komoran
  • Hannanum with korhal.hannanum
  • Open-source Korean Text Processor with korhal.openkoreantext

How to use

from korhal.komoran import tokenize

result = tokenize("집에 가서 잠을 자고 싶다")
# result => Token(text=집,pos=NNG), Token(text=에,pos=JKB), Token(text=가,pos=VV), Token(text=아서,pos=EC), Token(text=잠,pos=NNG), Token(text=을,pos=JKO), Token(text=자,pos=VV), Token(text=고,pos=EC), Token(text=싶,pos=VX), Token(text=다,pos=EC)]
print(result.text) # => 집
print(result.pos) # => NNG

nouns = [token.text for token in result if token.pos.startswith('N')]

Asynchronous methods

With korhal.aio, you can use asynchronous methods. The performance of multi-core systems can be slightly improved when performing extensive processing.

from korhal.aio.opentextkorean import tokenize
 
texts = ['달디단 맛있는 케이크가 있었다', '솜사탕 같이 귀여운 구름']
futures = [tokenize(text) for text in texts]
results = [f.result() for f in futures]

Thanks to