Support Usage with docker image #1

Open
wants to merge 5 commits into
base: master
8 changes: 8 additions & 0 deletions .dockerignore
@@ -0,0 +1,8 @@
*/sftp-config.json
*/*.sublime-*
.*
node_modules/
logs/*
scripts/
.vscode
*/dist
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
node_modules/
data
16 changes: 16 additions & 0 deletions Dockerfile
@@ -0,0 +1,16 @@
FROM node:7.9.0
MAINTAINER Hain Wang <[email protected]>

RUN apt-get update
RUN apt-get install openjdk-7-jdk -yy

RUN npm install -g cnpm --registry=https://registry.npm.taobao.org
RUN /bin/bash -c "mkdir -p /hanlp-api"
COPY . /hanlp-api
WORKDIR /hanlp-api
RUN cnpm install

ENTRYPOINT ["node"]
CMD ["app.js"]

EXPOSE 3001
111 changes: 100 additions & 11 deletions README.md
@@ -1,3 +1,5 @@
[![Docker Pulls](https://img.shields.io/docker/pulls/samurais/hanlp-api.svg?maxAge=2592000)](https://hub.docker.com/r/samurais/hanlp-api/) [![Docker Stars](https://img.shields.io/docker/stars/samurais/hanlp-api.svg?maxAge=2592000)](https://hub.docker.com/r/samurais/hanlp-api/) [![Docker Layers](https://images.microbadger.com/badges/image/samurais/hanlp-api.svg)](https://microbadger.com/#/images/samurais/hanlp-api) [![](https://images.microbadger.com/badges/version/samurais/hanlp-api.svg)](https://microbadger.com/images/samurais/hanlp-api "Get your own version badge on microbadger.com")

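HanLP natural language processing for nodejs
=====
* Supports Chinese word segmentation (N-shortest-path segmentation, CRF segmentation, index segmentation, user-defined dictionaries, part-of-speech tagging), named entity recognition (Chinese personal names, transliterated personal names, Japanese personal names, place names, organization names), keyword extraction, automatic summarization, phrase extraction, pinyin conversion, Simplified/Traditional Chinese conversion, text recommendation, and dependency parsing (MaxEnt dependency parsing, CRF dependency parsing)
@@ -7,20 +9,109 @@ HanLP natural language processing for nodejs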
java 1.8
nodejs >= 6

### Install
npm install
### docker

### Configuration
* Configuration file path: ./lib/src-java/hanLP.properties
* Please change root to your own directory path
* build image
```
cd node-hanlp
./scripts/build-docker-image.sh
```

* Dictionary directory: ./data
* Please download the dictionaries from https://pan.baidu.com/s/1pKUVNYF and put them in the ./data directory
Or pull image
```
docker pull samurais/hanlp-api:1.0.0
```

### Usage
* start container
```
docker run -it --rm -p 3002:3000 samurais/hanlp-api:1.0.0
```

* access service

```
POST /tokenizer HTTP/1.1
Host: localhost:3002
Content-Type: application/json

{
"type": "nlp",
"content": "刘德华和张学友创作了很多流行歌曲"
}

RESPONSE
{
"status": "success",
"data": [
{
"word": "刘德华",
"nature": "nr",
"offset": 0
},
{
"word": "和",
"nature": "cc",
"offset": 0
},
{
"word": "张学友",
"nature": "nr",
"offset": 0
},
{
"word": "创作",
"nature": "v",
"offset": 0
},
{
"word": "了",
"nature": "ule",
"offset": 0
},
{
"word": "很多",
"nature": "m",
"offset": 0
},
{
"word": "流行歌曲",
"nature": "n",
"offset": 0
}
]
}
```

* Other APIs

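- tokenizer: word segmentation
- keyword: keyword extraction
- summary: summarization
- phrase: phrase extraction
- query: keywords and summary
- conversion: Simplified, Traditional, and pinyin conversion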

[Source code](/router.js)
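
The `/tokenizer` request shown above could be sent from Node.js roughly as follows. This is a minimal sketch, assuming the container is reachable on `localhost:3002` as in the `docker run` example and that responses have the `status`/`data` shape of the sample. `buildTokenizerRequest`, `wordsByNature`, and `tokenize` are hypothetical helper names and are not part of node-hanlp.

```js
const http = require("http");

// Build the request options and JSON body for POST /tokenizer.
// Host and port are assumptions taken from the docker run example.
function buildTokenizerRequest(content, type = "nlp") {
  const body = JSON.stringify({ type, content });
  return {
    options: {
      host: "localhost",
      port: 3002,
      path: "/tokenizer",
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body),
      },
    },
    body,
  };
}

// Collect the words that carry a given part-of-speech tag ("nature").
// Returns an empty list when the response does not report success.
function wordsByNature(response, nature) {
  if (response.status !== "success") return [];
  return response.data.filter((t) => t.nature === nature).map((t) => t.word);
}

// Send the request and resolve with the parsed JSON response.
function tokenize(content) {
  const { options, body } = buildTokenizerRequest(content);
  return new Promise((resolve, reject) => {
    const req = http.request(options, (res) => {
      let raw = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (raw += chunk));
      res.on("end", () => {
        try {
          resolve(JSON.parse(raw));
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on("error", reject);
    req.end(body);
  });
}
```

With the container running, `tokenize(text).then((r) => console.log(wordsByNature(r, "nr")))` would list the tokens tagged `nr`, which in the sample response are the two personal names.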

### node module

* Install

```
npm install node-hanlp
```

* Config
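- Configuration file path: node_modules/node-hanlp/lib/src-java/hanLP.properties
- **Please change root to your own directory path**

- Dictionary directory: ./data
- Please download the dictionaries from https://pan.baidu.com/s/1pKUVNYF (about 800 MB of files) and put them in the ./data directory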

* Usage

```js
const Hanlp = require("../lib/index");
const Hanlp = require("node-hanlp");
//Initialize and configure the segmentation library
const HanLP = new Hanlp({
CustomDict : true, //use the custom dictionary
@@ -33,8 +124,6 @@ const HanLP = new Hanlp({
let words = HanLP.Tokenizer("商品和服务");
```

API
=====
### Standard segmentation HanLP.Tokenizer( text )
@param String text [the text]
@return Object
4 changes: 4 additions & 0 deletions index.js
@@ -1 +1,5 @@
/**
* hanlp toolkit
*/

module.exports = require("./lib/index");
4 changes: 2 additions & 2 deletions lib/hanlp-1.3.2/src-java/hanLP.properties
@@ -1,6 +1,6 @@
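#Root directory for the paths in this configuration file; root directory + other path = absolute path
#Windows users, please note: always use / as the path separator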
root=/Volumes/www/node-hanlp/
root=/node-hanlp
#Core dictionary path
CoreDictionaryPath=data/dictionary/CoreNatureDictionary.txt
#Bigram dictionary path
@@ -26,4 +26,4 @@ HMMSegmentModelPath=data/model/segment/HMMSegmentModel.bin
ShowTermNature=true
#IO adapter: implement the com.hankcs.hanlp.corpus.io.IIOAdapter interface to run HanLP on different platforms (Hadoop, Redis, etc.)
#The default IO adapter is shown below; it is based on the ordinary file system.
#IOAdapter=com.hankcs.hanlp.corpus.io.FileIOAdapter
#IOAdapter=com.hankcs.hanlp.corpus.io.FileIOAdapter
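
The first comment in this properties file states that the root directory plus a relative path gives the absolute path. The sketch below illustrates that rule in Node.js so the effect of changing `root` is easy to check. It is not HanLP's own loader: `parseProperties` and `resolveDataPath` are hypothetical helpers, and only simple `key=value` lines are handled.

```js
// Parse .properties text into a plain object, skipping blank lines and
// lines that start with '#'. Only simple key=value lines are supported.
function parseProperties(text) {
  const props = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    props[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return props;
}

// root + relative path = absolute path, always joined with '/'.
// Backslashes in root are normalized and trailing slashes are dropped first.
function resolveDataPath(props, key) {
  const root = props.root.replace(/\\/g, "/").replace(/\/+$/, "");
  return root + "/" + props[key];
}

const props = parseProperties(
  "root=/node-hanlp\n" +
  "CoreDictionaryPath=data/dictionary/CoreNatureDictionary.txt\n"
);
console.log(resolveDataPath(props, "CoreDictionaryPath"));
// prints /node-hanlp/data/dictionary/CoreNatureDictionary.txt
```

Under this reading, the dictionaries must exist at `<root>/data/...` inside whatever environment runs the service, so the `root` value and the location of the `data` directory need to agree.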
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "node-hanlp",
"version": "1.0.0",
"version": "1.0.2",
"description": "HanLP for nodejs",
"main": "index.js",
"dependencies": {},
22 changes: 22 additions & 0 deletions scripts/build-docker-image.sh
@@ -0,0 +1,22 @@
#! /bin/bash
###########################################
# Build Docker Image
###########################################

# constants
baseDir=$(cd "$(dirname "$0")" && pwd)
# functions

# main
[ -z "${BASH_SOURCE[0]}" -o "${BASH_SOURCE[0]}" = "$0" ] || return
cd "$baseDir/.."

# The version key/value should be on its own line
PACKAGE_VERSION=$(cat package.json \
| grep version \
| head -1 \
| awk -F: '{ print $2 }' \
| sed 's/[",]//g' | xargs)

echo "$PACKAGE_VERSION"
docker build --force-rm=true --tag "samurais/hanlp-api:$PACKAGE_VERSION" .
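
The script above derives the image tag by grepping the first `version` line out of package.json, which is why the key has to sit on its own line. As a point of comparison, the same tag can be computed by parsing the JSON, which does not depend on the file's layout. This is only a sketch: `imageTag` is a hypothetical helper, and the repository name is copied from the `docker build` command.

```js
// Compute the Docker image tag from the text of package.json.
// Parsing the JSON avoids relying on where the "version" key appears.
function imageTag(packageJsonText, repository = "samurais/hanlp-api") {
  const { version } = JSON.parse(packageJsonText);
  if (typeof version !== "string" || version === "") {
    throw new Error("package.json has no version field");
  }
  return `${repository}:${version}`;
}

// Usage from the repository root (the relative path is an assumption):
// console.log(imageTag(require("fs").readFileSync("package.json", "utf8")));
```

For the package.json in this change, whose version is `1.0.2`, the function returns `samurais/hanlp-api:1.0.2`.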