baseresearch/Bahasa-Indo-NLP-Dataset

 
 


Bahasa Indonesia Natural Language Processing (Indo NLP) Resource

A collection of Bahasa Indonesia (Indonesian) Natural Language Processing (NLP) software libraries, dictionaries, and corpora. Pull requests are always welcome.

Bahasa Indonesia NLP Libraries/Services

Natural Language Toolkit

| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| bahasa | Pre-alpha development stage NLP toolkit for Bahasa Indonesia | Python | MIT License (MIT) | Sutrisno Efendi |

Sentiment Analysis

| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| python-sentianalysis-id | Sentiment Analysis for Bahasa Indonesia | Python | | yasirutomo |

Part of Speech Tagging (POS Tagging)

| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| Open NLP | POS tagging with predefined training and test data | Java | | yohanesgultom |

Named Entity Recognition

| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| indonesia-ner | Named Entity Recognition for Bahasa Indonesia | Java | MIT License (MIT) | yusufsyaifudin |

Stemmer

| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| sastrawi | High quality stemmer library for Indonesian Language (Bahasa) | PHP | MIT License (MIT) | sastrawi |

Word Embedding

| Library | Description | Programming Languages | License | Author & Link |
| --- | --- | --- | --- | --- |
| indonesian-word-embedding | A web application that demonstrates Indonesian word embedding | Python | | galuhsahid |

Question Answering (Machine Comprehension)

| Service | Description | Language | Author & Link |
| --- | --- | --- | --- |
| QA | Question Answering System for Bahasa Indonesia | Java | takin |

Dictionaries / Translation Pairs / Parallel Corpus

| Library | Description | Size | Features | License | Link |
| --- | --- | --- | --- | --- | --- |
| MALINDO_Morph | Morphological dictionary for Malay / Indonesian | | English-Malay, English-Indonesian | CC BY-NC-SA 4.0 TH | english |
| TALPCo | The TUFS Asian Language Parallel Corpus | | Japanese -> Indonesian | Creative Commons Attribution 4.0 International (CC BY 4.0) license | matbahasa |

Downloadable Text Corpus

| Library | Description | Size | Features | License | Link |
| --- | --- | --- | --- | --- | --- |
| Indonesian-annotated-conll17 | CoNLL Universal Dependency Parsing | 29.64 GB | Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts, provided for the CoNLL 2017 Shared Task in UD Parsing. | CC BY-NC-SA 4.0 TH | LINDAT / CLARIN |
| ID-OpinionWords | List of Opinion Words (positive/negative) in Bahasa Indonesia for Sentiment Analysis | | | | masdevid |
| freq-dist-id | Most Common Bahasa Words on Twitter, Wikipedia and other sources | | | | ardwort |
| idn-tagged-corpus | Indonesian Manually Tagged Corpus | | | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License | famrashel |

WordNet Bahasa

| Library | Description | License | Link |
| --- | --- | --- | --- |
| WordNet Bahasa | Wordnet Bahasa, inspired by the Princeton WordNet and the Global WordNet Grid. Large scale, freely available, semantic dictionary. | MIT License (MIT) | |

Universal Dependency Treebank Bahasa

| Library | Description | License | Link |
| --- | --- | --- | --- |
| UD_Indonesian-GSD | The Indonesian UD is converted from the content head version of the universal dependency treebank v2.0. Query text by genre, domain. | CC BY-NC-SA 3.0 US | |
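
Universal Dependencies treebanks such as the ones listed above are normally distributed as CoNLL-U files: one token per line with ten tab-separated fields, `#` comment lines, and a blank line between sentences. The snippet below is a minimal sketch of reading that format with the standard library only. The `read_conllu` helper and the three-word sample sentence are made up for illustration and are not taken from any of the listed corpora.

```python
# The ten CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment / metadata
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            continue  # skip malformed lines
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample sentence, built with explicit tabs.
SAMPLE_ROWS = [
    ["1", "Saya", "saya", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "makan", "makan", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "nasi", "nasi", "NOUN", "_", "_", "2", "obj", "_", "_"],
]
SAMPLE = ("# text = Saya makan nasi\n"
          + "\n".join("\t".join(row) for row in SAMPLE_ROWS)
          + "\n\n")

sentences = read_conllu(SAMPLE)
for token in sentences[0]:
    print(token["form"], token["upos"], token["head"], token["deprel"])
```

For real treebank files, pass the file contents (read as UTF-8) to the same function. Multiword-token lines (IDs such as `1-2`) are kept as ordinary entries in this sketch.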

Pre-trained Word Vectors

| Pre-trained Model | Description | Size | Dimensions | License | Link |
| --- | --- | --- | --- | --- | --- |
| fastText | Skip-Gram model trained on Wikipedia using fastText | | 300 | CC BY-SA 3.0 | Facebook + Bin & Text + Text Only |
| word2vec | Indonesian | 402MB | 300 | | Indonesian |
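
Pre-trained vectors in the plain-text word2vec/fastText layout start with a `count dim` header line, followed by one `word v1 ... vN` line per word. The following is a minimal sketch of loading such a file and comparing two words by cosine similarity, using only the standard library. It assumes the download is in this text layout (not a binary model); the `load_text_vectors` and `cosine` helpers and the tiny 4-dimensional sample are hypothetical stand-ins, not data from the models listed above.

```python
import io
import math


def load_text_vectors(handle, limit=None):
    """Parse text-format vectors: a 'count dim' header, then 'word v1 ... vN' lines."""
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break  # optionally load only the first `limit` words
    return dim, vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up sample standing in for a real 300-dimensional file.
SAMPLE = (
    "3 4\n"
    "makan 1.0 0.0 0.0 1.0\n"
    "minum 1.0 0.0 0.0 0.5\n"
    "rumah 0.0 1.0 1.0 0.0\n"
)

dim, vecs = load_text_vectors(io.StringIO(SAMPLE))
print(dim, len(vecs))
print(cosine(vecs["makan"], vecs["minum"]))
print(cosine(vecs["makan"], vecs["rumah"]))
```

With a real file, replace the `io.StringIO` sample with `open(path, encoding="utf-8")`; the `limit` argument helps when the full vocabulary does not fit comfortably in memory.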

Grammar Resource Framework

| Model | Description | License | Link |
| --- | --- | --- | --- |
| INDRA | Indonesian Resource Grammar (INDRA) - an implemented HPSG grammar for Indonesian | MIT license | INDRA |

Not found? Try another Bahasa Indonesia NLP awesome list or resource, such as this one:

https://github.com/kmkurn/id-nlp-resource
