vncorenlp-wrapper


vncorenlp-wrapper is a Python wrapper for VnCoreNLP. It provides a simple API for text processing tasks such as tokenization, part-of-speech tagging, named entity recognition, constituency parsing, and dependency parsing.

Prerequisites

Java 1.8+ (Check with command: java -version) (Download Page)

VnCoreNLP (Download Page)

Py4j (Download Page)
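
The Java requirement above can also be checked from Python. The following is a minimal sketch, not part of this wrapper: the helper names `parse_java_version`, `java_is_recent_enough`, and `read_java_banner` are hypothetical, and the parsing assumes the usual `version "…"` banner that `java -version` prints.

```python
import re
import shutil
import subprocess


def parse_java_version(banner):
    """Return (major, minor) parsed from a `java -version` banner, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor)


def java_is_recent_enough(banner):
    """True if the banner reports Java 1.8 or newer (e.g. 1.8, 11, 17)."""
    version = parse_java_version(banner)
    return version is not None and version >= (1, 8)


def read_java_banner():
    """Run `java -version`; return its output, or None if java is not on PATH."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # The JVM normally prints its version banner to stderr, not stdout.
    return result.stderr or result.stdout
```

Calling `read_java_banner()` and passing a non-None result to `java_is_recent_enough` gives a quick yes/no answer before starting VnCoreNLP.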

Installation

Clone this repository, then place VnCoreNLP-1.1.jar and the models directory from VnCoreNLP in the same working folder.
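
To confirm the folder is laid out as described, a small check like the one below may help. It is only a sketch: `missing_vncorenlp_files` is a hypothetical helper, not part of this wrapper, and it assumes the jar is named VnCoreNLP-1.1.jar and the models directory is named models, as stated above.

```python
from pathlib import Path


def missing_vncorenlp_files(folder, jar_name="VnCoreNLP-1.1.jar", models_name="models"):
    """Return the names of the expected items that are absent from `folder`."""
    folder = Path(folder)
    missing = []
    # The jar must be a regular file in the working folder.
    if not (folder / jar_name).is_file():
        missing.append(jar_name)
    # The models directory must sit next to the jar.
    if not (folder / models_name).is_dir():
        missing.append(models_name)
    return missing
```

An empty list means both items were found; otherwise the list names what still needs to be copied into the folder.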

Example

Simple Usage

from vncorenlp import VnCoreNLP

# Point the wrapper at the VnCoreNLP jar (adjust the path for your machine)
nlp = VnCoreNLP("/home/workspace/tokenizer/VnCoreNLP-1.1.jar")

sentence = 'xin chào các bạn!'

tokens = nlp.word_tokenizer(sentence)

print(tokens)

Output format:

# Tokenize
['xin', 'chào', 'các', 'bạn', '!']