vncorenlp-wrapper
is a Python wrapper for VnCoreNLP. It provides a simple API for text processing tasks such as tokenization, part-of-speech tagging, named entity recognition, constituency parsing, and dependency parsing.
Java 1.8+ (check with the command: java -version) (Download Page)
VnCoreNLP (Download Page)
Py4j (Download Page)
Clone this repository, then put VnCoreNLP-1.1.jar and the models directory of VnCoreNLP in the same working folder.
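Before starting the wrapper, it can help to confirm that the layout described above is in place. The following is a minimal sketch using only the Python standard library; the helper name `check_setup` and its parameters are hypothetical and not part of this repository. It assumes the jar is named VnCoreNLP-1.1.jar and the models directory is named models, as stated above.

```python
import shutil
from pathlib import Path


def check_setup(workdir=".", jar_name="VnCoreNLP-1.1.jar", models_dir="models"):
    """Return a list of setup problems; an empty list means the layout looks right.

    Hypothetical helper: it only checks that files exist, not that they are valid.
    """
    root = Path(workdir)
    problems = []
    # VnCoreNLP runs on the JVM, so a `java` executable must be on the PATH.
    if shutil.which("java") is None:
        problems.append("java not found on PATH")
    # The jar and the models directory are expected side by side in the working folder.
    if not (root / jar_name).is_file():
        problems.append(f"missing {jar_name} in working folder")
    if not (root / models_dir).is_dir():
        problems.append(f"missing {models_dir}/ directory in working folder")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("setup problem:", problem)
```

Running the script from the working folder prints one line per missing piece and nothing when everything is found. It does not check the Java version itself; java -version remains the way to confirm 1.8+.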
# Simple usage
from vncorenlp import VnCoreNLP
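# Point the wrapper at the VnCoreNLP jar (adjust the path for your machine)
nlp = VnCoreNLP("/home/workspace/tokenizer/VnCoreNLP-1.1.jar")
sentence = 'xin chào các bạn!'
# Split the sentence into a list of tokens
tokens = nlp.word_tokenizer(sentence)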
print(tokens)
Output format:
# Tokenize
['xin', 'chào', 'các', 'bạn', '!']
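Since the tokenizer output shown above is a plain Python list of strings, ordinary list operations apply to it. As a small sketch, the snippet below drops punctuation-only tokens; it uses the output above as literal data so it runs without Java, and the helper name `drop_punctuation` is hypothetical, not part of the wrapper.

```python
def drop_punctuation(tokens):
    """Keep only tokens that contain at least one letter or digit."""
    return [tok for tok in tokens if any(ch.isalnum() for ch in tok)]


# Literal copy of the tokenizer output shown above.
tokens = ['xin', 'chào', 'các', 'bạn', '!']
words = drop_punctuation(tokens)
print(words)  # ['xin', 'chào', 'các', 'bạn']
```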