Skip to content

jakartaresearch/word-segmentation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 

Repository files navigation

Indonesian Word Segmentation


IWS is a research project that focuses on word segmentation in Bahasa. According to Wikipedia, word segmentation is also known as text segmentation is the process of separating the text into meaningful units, such as Words, Sentence, or Topics. The problem is non-trivial because while some written languages have explicit word boundary markers, such as the word spaces of written English and the distinctive initial, medial and final letter shapes of Arabic, such signals are sometimes ambiguous and not present in all written languages.

Bahasa has a straight boundary between words, but we found that some people write unconsciously for their writtings. Therefore, this research aims to help writers for better writings.

Why IWS: IWS

  • focus on seperating words without dictionary
  • Enables you to correct indonesian text with easy load

Project Trees:

|--data 
    |--raw
    |--clean
|--notebook
|--reports
|--models
|--requirements.txt
|--README.md

Instalation


Instalation Requirements:

pip install -r requirements.txt

About

research and dataset for word segmentation Indonesia

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published