wzl789139/bert_tokenization_for_java

 
 


This is a Java implementation of the Chinese tokenization described in BERT, including basic tokenization and WordPiece tokenization.
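As a rough illustration of the WordPiece step, the sketch below applies the greedy longest-match-first split used by BERT to a single token. The class name `WordpieceSketch`, the method `wordpiece`, and the tiny vocabulary are assumptions made for this example; they are not taken from this repository, which may structure the code differently and loads a real vocabulary file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordpieceSketch {
    // Hypothetical toy vocabulary; a real run would load BERT's vocab.txt.
    static final Set<String> VOCAB = new HashSet<>(Arrays.asList(
            "[UNK]", "un", "##aff", "##able"));

    // Greedy longest-match-first split of one token produced by basic tokenization.
    static List<String> wordpiece(String token) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < token.length()) {
            int end = token.length();
            String match = null;
            // Shrink the candidate from the right until it is found in the vocabulary.
            while (start < end) {
                String sub = token.substring(start, end);
                if (start > 0) {
                    sub = "##" + sub; // continuation pieces carry the "##" prefix
                }
                if (VOCAB.contains(sub)) {
                    match = sub;
                    break;
                }
                end--;
            }
            if (match == null) {
                // No piece fits at this position: the whole token maps to [UNK].
                return Collections.singletonList("[UNK]");
            }
            pieces.add(match);
            start = end;
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(wordpiece("unaffable")); // prints [un, ##aff, ##able]
        System.out.println(wordpiece("xyz"));       // prints [[UNK]]
    }
}
```

For Chinese text, basic tokenization already separates each CJK character into its own token, so this step mostly matters for embedded Latin words and numbers.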

Motivation

In production, we usually deploy BERT-based models with TensorFlow Serving for high performance and flexibility. However, the calling application may not be written in Python, so the tokenization module has to be rewritten in the application's language.

Usage

Run Preprocess.java to get the result. Both single sentences and sentence pairs are supported.
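To make the single versus pair distinction concrete, here is a minimal sketch of the standard BERT input layout: `[CLS] A [SEP]` for one sentence and `[CLS] A [SEP] B [SEP]` for a pair, with segment ids 0 for the first part and 1 for the second. The class `PairInputSketch` and its methods are hypothetical names for illustration and do not describe the actual interface of Preprocess.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairInputSketch {
    // Builds the token sequence; pass null for b to encode a single sentence.
    static List<String> buildTokens(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        out.add("[CLS]");
        out.addAll(a);
        out.add("[SEP]");
        if (b != null) {
            out.addAll(b);
            out.add("[SEP]");
        }
        return out;
    }

    // Segment ids: 0 for [CLS], sentence A and its [SEP]; 1 for sentence B and its [SEP].
    static List<Integer> buildSegmentIds(List<String> a, List<String> b) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < a.size() + 2; i++) {
            ids.add(0);
        }
        if (b != null) {
            for (int i = 0; i < b.size() + 1; i++) {
                ids.add(1);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Already-tokenized inputs, written as Unicode escapes: "ni hao" and "shi jie".
        List<String> a = Arrays.asList("\u4F60", "\u597D");
        List<String> b = Arrays.asList("\u4E16", "\u754C");
        System.out.println(buildTokens(a, b));
        System.out.println(buildSegmentIds(a, b)); // prints [0, 0, 0, 0, 1, 1, 1]
        System.out.println(buildTokens(a, null));
    }
}
```

The token strings would then be mapped to vocabulary ids and padded to the model's maximum sequence length before being sent to the serving endpoint.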

Moreover, for Chinese natural language processing, we add full-width to half-width character conversion and uppercase-to-lowercase conversion.
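One way such a normalization can be written is sketched below. It relies on the Unicode layout in which full-width ASCII variants (U+FF01 to U+FF5E) sit exactly 0xFEE0 above their half-width counterparts, and it maps the ideographic space U+3000 to an ordinary space. The class `NormalizeSketch` and the method `normalize` are assumed names for this example, not the repository's own code.

```java
public class NormalizeSketch {
    // Converts full-width characters to half-width, then lowercases each character.
    static String normalize(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\u3000') {
                c = ' '; // ideographic (full-width) space
            } else if (c >= '\uFF01' && c <= '\uFF5E') {
                c = (char) (c - 0xFEE0); // full-width ASCII variant to plain ASCII
            }
            sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Full-width "BERT", an ideographic space, full-width "2018" and "!".
        String input = "\uFF22\uFF25\uFF32\uFF34\u3000\uFF12\uFF10\uFF11\uFF18\uFF01";
        System.out.println(normalize(input)); // prints bert 2018!
    }
}
```

Running this before basic tokenization means that full-width and half-width spellings of the same word reach the vocabulary lookup in the same form.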

Reporting issues

Please let me know if you encounter any problems.
