Bahasa Indonesia Natural Language Processing (Indo NLP) Resource
A collection of Bahasa Indonesia (Indonesian) Natural Language Processing (NLP) software libraries, dictionaries, and corpora.
Pull requests are always welcome.
Bahasa Indonesia NLP Libraries/Services
| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| bahasa | Pre-alpha development stage NLP toolkit for Bahasa Indonesia | Python | MIT License (MIT) | Sutrisno Efendi |
| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| python-sentianalysis-id | Sentiment Analysis for Bahasa Indonesia | Python | | yasirutomo |
Part of Speech Tagging (POS Tagging)
| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| Open NLP | POS tagging with predefined training and test data | Java | | yohanesgultom |
| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| indonesia-ner | Named Entity Recognition for Bahasa Indonesia | Java | MIT License (MIT) | yusufsyaifudin |
| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| sastrawi | High quality stemmer library for Indonesian Language (Bahasa) | PHP | MIT License (MIT) | sastrawi |
| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| indonesian-word-embedding | A web application that demonstrates Indonesian word embedding | Python | | galuhsahid |
Question Answering (Machine Comprehension)
| Service | Description | Language | Author & Link |
| --- | --- | --- | --- |
| QA | Question Answering System for Bahasa Indonesia | Java | takin |
Dictionaries / Translation Pairs / Parallel Corpus
| Library | Description | Size | Features | License | Link |
| --- | --- | --- | --- | --- | --- |
| MALINDO_Morph | Morphological dictionary for Malay / Indonesian | | English-Malay, English-Indonesian | CC BY-NC-SA 4.0 TH | english |
| TALPCo | The TUFS Asian Language Parallel Corpus | | Japanese -> Indonesian | Creative Commons Attribution 4.0 International (CC BY 4.0) license | matbahasa |
| Library | Description | Size | Features | License | Link |
| --- | --- | --- | --- | --- | --- |
| Indonesian-annotated-conll17 | CoNLL Universal Dependency Parsing | 29.64 GB | Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts, provided for the CoNLL 2017 Shared Task in UD Parsing. | CC BY-NC-SA 4.0 TH | LINDAT / CLARIN |
| ID-OpinionWords | List of Opinion Words (positive/negative) in Bahasa Indonesia for Sentiment Analysis | | | | masdevid |
| freq-dist-id | Most Common Bahasa Words on Twitter, Wikipedia and other sources | | | | ardwort |
| idn-tagged-corpus | Indonesian Manually Tagged Corpus | | | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License | famrashel |
Library
Description
License
Link
WordNet Bahasa
Wordnet Bahasa, inspired by the Princeton WordNet and the Global WordNet Grid
Large scale, freely available, semantic dictionary
MIT License (MIT)
Universal Dependency Treebank Bahasa
Library
Description
License
Link
UD_Indonesian-GSD
The Indonesian UD is converted from the content head version of the universal dependency treebank v2.0
Query text by genre, domain
CC BY-NC-SA 3.0 US
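Universal Dependencies treebanks such as the one listed above are distributed in the CoNLL-U format: ten tab-separated columns per token, `#` comment lines, and a blank line between sentences. The following is a minimal sketch of reading that format with only the Python standard library. The helper name `read_conllu` and the two-token sample sentence are made up for illustration and are not taken from the treebank.

```python
import io

# CoNLL-U column names, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(stream):
    """Yield each sentence as a list of token dicts (hypothetical helper)."""
    sentence = []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) are skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            continue  # ignore malformed lines in this sketch
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        yield sentence


# Made-up sample, NOT real treebank data.
SAMPLE = (
    "# text = Saya makan\n"
    "1\tSaya\tsaya\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tmakan\tmakan\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(io.StringIO(SAMPLE)))
for token in sentences[0]:
    print(token["form"], token["upos"], token["deprel"])
```

To read a real file, pass an open file handle (for example `open(path, encoding="utf-8")`) instead of the `io.StringIO` sample. Multiword-token and empty-node lines are returned like ordinary tokens in this sketch.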
| Pre-trained Model | Description | Size | Dimensions | License | Link |
| --- | --- | --- | --- | --- | --- |
| fastText | Skip-Gram model trained on Wikipedia using fastText | | 300 | CC BY-SA 3.0 | Facebook + Bin & Text + Text Only |
| word2vec Indonesian | | 402MB | 300 | | Indonesian |
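Text-format vector files, such as the `.vec` files that fastText publishes, start with a header line giving the vocabulary size and the dimension, followed by one word and its values per line. The sketch below assumes that layout and uses only the standard library. The helper names `load_vec` and `cosine` and the three-word, two-dimensional sample are hypothetical, for illustration only.

```python
import io
import math


def load_vec(stream):
    """Read text-format word vectors into a dict (hypothetical helper)."""
    header = stream.readline().split()
    dim = int(header[1])  # header is "<vocab size> <dimensions>"
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up sample, NOT real embedding values.
SAMPLE = "3 2\nmakan 1.0 0.0\nminum 0.8 0.6\nrumah 0.0 1.0\n"

dim, vectors = load_vec(io.StringIO(SAMPLE))
print(dim, len(vectors))
print(cosine(vectors["makan"], vectors["minum"]))
```

For a real 300-dimensional file, pass `open(path, encoding="utf-8")` instead of the sample. Binary model files need the corresponding library's own loader and are not covered by this sketch.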
Grammar Resource Framework
| Model | Description | License | Link |
| --- | --- | --- | --- |
| INDRA | Indonesian Resource Grammar (INDRA) - an implemented HPSG grammar for Indonesian | MIT license | INDRA |
Not found? Try another Bahasa Indonesia NLP awesome list or resource, such as:
https://github.com/kmkurn/id-nlp-resource