
Commit

Merge pull request #39 from vunb/dev
Add an example to build an NLP API Server
vunb authored May 30, 2018

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
2 parents b46a3de + 437e4eb commit 4254239
Showing 10 changed files with 195 additions and 6 deletions.
53 changes: 52 additions & 1 deletion README.md
@@ -12,10 +12,20 @@ Vietnamese NLP Toolkit for Node
# Installation In A Nutshell

1. Install [Node.js](http://nodejs.org/)
2. Run: `$ npm install -g vntk`
2. Run: `$ npm install vntk --save`

If you are interested in contributing to **vntk**, or just hacking on it, then fork it away!

Jump to guide: [How to build an NLP API Server using Vntk](#nlp-api-server).

# CLI Utilities

Install the vntk CLI with:

> npm install -g @vntk/cli
Then see the notes at the end of each API usage section for how to use these CLI utilities to preprocess text from files, especially Vietnamese text. If you wish to improve the tool, please fork it [here](https://github.com/vntk/vntk-cli).

# API Usage

* [1. Tokenizer](#1-tokenizer)
@@ -34,6 +44,7 @@ If you are interested in contributing to **vntk**, or just hacking on it, then f
* [Naive Bayes](#bayes-classifier)
* [fastText](#fasttext-classifier)
* [9. Language identification](#9-language-identification)
* [10. CRFSuite](#10-crfsuite)

## 1. Tokenizer

@@ -381,6 +392,46 @@ List of supported languages

> af als am an ar arz as ast av az azb ba bar bcl be bg bh bn bo bpy br bs bxr ca cbk ce ceb ckb co cs cv cy da de diq dsb dty dv el eml en eo es et eu fa fi fr frr fy ga gd gl gn gom gu gv he hi hif hr hsb ht hu hy ia id ie ilo io is it ja jbo jv ka kk km kn ko krc ku kv kw ky la lb lez li lmo lo lrc lt lv mai mg mhr min mk ml mn mr mrj ms mt mwl my myv mzn nah nap nds ne new nl nn no oc or os pa pam pfl pl pms pnb ps pt qu rm ro ru rue sa sah sc scn sco sd sh si sk sl so sq sr su sv sw ta te tg th tk tl tr tt tyv ug uk ur uz vec vep vi vls vo wa war wuu xal xmf yi yo yue zh

## 10. CRFSuite

For quick access to the `CRFSuite` bindings that ship with `vntk`, use the following API.

> var crfsuite = require('vntk').crfsuite()
Then create a `Tagger` or `Trainer`:

```js
var crfsuite = require('vntk').crfsuite()
var tagger = new crfsuite.Tagger()
var trainer = new crfsuite.Trainer()
```

For detailed documentation, click [here](https://github.com/vunb/node-crfsuite).

# NLP API Server

Follow these steps to quickly serve an NLP API server using vntk:

```bash
# Clone the repository
git clone https://github.com/vunb/vntk

# Move to source code folder
cd vntk

# Install dependencies
npm install

# Run NLP API server
npm run server

# Open the following link in your browser to see the result
# http://localhost:3000/api/tok/Phó Thủ tướng Vương Đình Huệ yêu cầu điều chỉnh tên gọi “trạm thu giá” BOT
```

For details, check out: [./server](./server)

# Contributing

Pull requests and stars are highly welcome.
11 changes: 11 additions & 0 deletions kites.config.json
@@ -0,0 +1,11 @@
{
"extensionsLocationCache": true,
"httpPort": 3000,
"logger": {
"console": {
"transport": "console",
"level": "info"
}

}
}
11 changes: 11 additions & 0 deletions lib/vntk.js
@@ -74,6 +74,10 @@ exports.langid = (modelFileName) => {
}
};

/**
* Get vntk dictionary.
* @param {String} modelFileName path to new updated dictionary
*/
exports.dictionary = (modelFileName) => {
if(modelFileName && fs.existsSync(modelFileName)) {
return new require('@vntk/dictionary').Dictionary(modelFileName)
@@ -82,6 +86,13 @@ exports.dictionary = (modelFileName) => {
}
}

/**
* Get the CRFSuite module that ships with vntk.
*/
exports.crfsuite = () => {
return require('crfsuite');
}

// exports class
// Use with CamelCase convention.
exports.TfIdf = require('./tfidf');
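
As far as this hunk shows, `exports.crfsuite` calls `require` inside the function body, so the module is loaded when a caller first asks for it, and Node's module cache hands back the same object on later calls. A minimal sketch of that behaviour, assuming the built-in `path` module as a stand-in so it runs without `crfsuite` installed (the `toolkit` object is hypothetical and only mirrors the shape of the accessor):

```javascript
'use strict';

// Hypothetical stand-in for lib/vntk.js. `path` replaces the native
// `crfsuite` package purely so this sketch needs no extra installs.
const toolkit = {
  crfsuite: () => require('path')
};

// Each call goes through require(), which resolves from Node's module cache.
const first = toolkit.crfsuite();
const second = toolkit.crfsuite();

console.log(first === second); // true
```

Callers can therefore invoke the accessor wherever it is convenient without paying the load cost more than once.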
11 changes: 7 additions & 4 deletions package.json
@@ -1,13 +1,14 @@
{
"name": "vntk",
"version": "1.3.0",
"version": "1.4.0",
"description": "Vietnamese NLP Toolkit for Node",
"main": "index.js",
"bin": {
"vntk": "./bin/vntk.js"
},
"scripts": {
"start": "node server/app.js",
"server": "node server/app.js",
"start": "npm run server",
"test": "tape test/start.js | tap-spec"
},
"repository": {
@@ -27,12 +28,14 @@
},
"homepage": "https://github.com/vunb/vntk",
"dependencies": {
"@kites/engine": "^0.1.3",
"@kites/express": "^0.1.4",
"@vntk/dictionary": "^1.0.0",
"async": "2.0.1",
"commander": "2.9.0",
"crfsuite": "^0.9.3",
"crfsuite": "^0.9.4",
"debug": "^3.1.0",
"fasttext": "^0.2.2",
"fasttext": "^0.2.3",
"lodash": "4.15.0",
"title-case": "^2.1.1"
},
19 changes: 19 additions & 0 deletions server/README.md
@@ -0,0 +1,19 @@
# NLP API Server

You can easily create an NLP API server using `Vntk`. All you need to do is decide where to deploy the server. Here are some ways to do this.


## NPM Script

In the project directory, run:

* `$ npm start`

Then open your browser or `Postman` to test the APIs:

* http://localhost:3000/api/tok/xin%20chào%20các%20bạn
* http://localhost:3000/api/pos/xin%20chào%20các%20bạn
* http://localhost:3000/api/chunking/xin%20chào%20các%20bạn
* http://localhost:3000/api/ner/xin%20chào%20các%20bạn
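
The URLs above carry the input text as a path segment, so spaces and Vietnamese characters have to be percent-encoded when the request is built in code. A minimal sketch of one way to do that, assuming the server runs on `localhost:3000` as configured above; `buildApiUrl` is a hypothetical helper, not part of vntk, and the `format` query value is an assumption:

```javascript
'use strict';

// Assumed from the examples above: server on localhost:3000,
// routes shaped like /api/<task>/<text>.
const BASE_URL = 'http://localhost:3000';

// Hypothetical helper: percent-encodes the text so it is a valid path segment.
function buildApiUrl(task, text, format) {
  const url = new URL(`/api/${task}/${encodeURIComponent(text)}`, BASE_URL);
  if (format) {
    // Optional output format, sent as a query parameter (assumed value).
    url.searchParams.set('format', format);
  }
  return url.href;
}

// Print one URL per endpoint listed above.
for (const task of ['tok', 'pos', 'chunking', 'ner']) {
  console.log(buildApiUrl(task, 'xin chào các bạn'));
}
```

Browsers usually encode the text automatically when it is pasted into the address bar; scripts and HTTP clients generally need the explicit `encodeURIComponent` call.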

## Docker
61 changes: 61 additions & 0 deletions server/api.js
@@ -0,0 +1,61 @@
'use strict';
const vntk = require('../lib/vntk');

const tokenizer = vntk.wordTokenizer();
const posTag = vntk.posTag();
const chunking = vntk.chunking();
const ner = vntk.ner();

// kites extension definition
module.exports = function (kites) {
kites.on('expressConfigure', (app) => {

/**
* API Homepage
*/
app.get('/', (req, res) => {
res.send('This is an example Vntk Server!')
})

/**
* Word Tokenizer
*/
app.get('/api/tok/:text', (req, res) => {
var text = req.param('text')
var format = req.param('format')
var result = tokenizer.tag(text, format)
res.ok(result)
});

/**
* POS Tagging
*/
app.get('/api/pos/:text', (req, res) => {
var text = req.param('text')
var format = req.param('format')
var result = posTag.tag(text, format)
res.ok(result)
})

/**
* Chunking
*/
app.get('/api/chunking/:text', (req, res) => {
var text = req.param('text')
var format = req.param('format')
var result = chunking.tag(text, format)
res.ok(result)
})

/**
* Named Entity Recognition
*/
app.get('/api/ner/:text', (req, res) => {
var text = req.param('text')
var format = req.param('format')
var result = ner.tag(text, format)
res.ok(result)
})

})
}
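
The four tagging routes above share one shape: read `text` and `format`, call `.tag()`, and reply with `res.ok()`. A minimal sketch of how they could be registered from a single helper, assuming the kites app is Express-compatible. It reads `req.params` and `req.query` in place of `req.param()`, which Express 4 marks as deprecated. `registerTagRoutes`, `fakeApp`, and `fakeTaggers` are hypothetical names used so the sketch runs without vntk or kites installed; `res.ok` is taken from the code above.

```javascript
'use strict';

// Hypothetical helper: registers one GET route per tagger.
function registerTagRoutes(app, taggers) {
  Object.keys(taggers).forEach((name) => {
    app.get(`/api/${name}/:text`, (req, res) => {
      const text = req.params.text;     // text from the path segment
      const format = req.query.format;  // optional ?format=... value
      res.ok(taggers[name].tag(text, format));
    });
  });
}

// Stand-ins for the kites/Express app and the vntk taggers.
const routes = {};
const fakeApp = { get: (path, handler) => { routes[path] = handler; } };
const fakeTaggers = {
  tok: { tag: (text, format) => ({ task: 'tok', text, format }) },
  pos: { tag: (text, format) => ({ task: 'pos', text, format }) }
};

registerTagRoutes(fakeApp, fakeTaggers);

// Simulate one request against the registered tokenizer route.
let sent;
routes['/api/tok/:text'](
  { params: { text: 'xin chào' }, query: { format: 'text' } },
  { ok: (body) => { sent = body; } }
);
console.log(sent);
```

With the real taggers, the second argument would be something like `{ tok: tokenizer, pos: posTag, chunking: chunking, ner: ner }`.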
17 changes: 17 additions & 0 deletions server/app.js
@@ -0,0 +1,17 @@
'use strict'
const engine = require('@kites/engine');
const apiRoutes = require('./api')

engine({
loadConfig: true,
discover: true
})
.use(apiRoutes)
.init()
.then(function (kites) {
kites.logger.info('VNTK API Server has initialized!');
})
.catch(function (e) {
console.error(e.stack);
process.exit(1);
})
15 changes: 15 additions & 0 deletions test/specs/crfsuite.js
@@ -0,0 +1,15 @@
'use strict';
var test = require('tape'),
path = require('path'),
vntk = require('../../lib/vntk'),
crfsuite = vntk.crfsuite();

test('crfsuite load trained model', function (t) {
t.plan(1);

var tagger = new crfsuite.Tagger();
var modelFilename = path.resolve(__dirname, './models/model.bin');
var result = tagger.open(modelFilename);

t.true(result, 'Open model file should be ok!');
})
2 changes: 1 addition & 1 deletion test/specs/ner.js
@@ -44,7 +44,7 @@ test('load custom model from file (2)', function (t) {
t.deepEqual(tags[7][3], 'I-PER', 'I-PER from new model');
});

test('chucking format text', function (t) {
test('ner format text', function (t) {
t.plan(1);

let text = 'Chưa tiết lộ lịch trình tới Việt Nam của Tổng thống Mỹ Donald Trump';
1 change: 1 addition & 0 deletions test/start.js
@@ -16,6 +16,7 @@ var dir = '../test/specs/';
'bayes_classifier',
'langid',
'dictionary',
'crfsuite',
].forEach((script) => {
require(path.join(dir, script));
});
