Creating a Language Model for Singlish

Singlish is a mixture of English, Chinese, Malay, and dialect. It is a unique and efficient way for Singaporeans to communicate with each other, and it does not adhere to any formal grammatical rules. Here, I attempt to build a language model using a corpus of more than 67,000 SMS messages sent in Singapore.

Data Source

A corpus of more than 67,000 SMS messages in Singapore English and Mandarin. (c) The National University of Singapore SMS Corpus: https://www.kaggle.com/datasets/rtatman/the-national-university-of-singapore-sms-corpus?resource=download
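
As a rough illustration of how SMS text like this could feed a language model, here is a minimal sketch of a bigram model in plain Python. This README does not say which kind of model the project uses, so the bigram approach is an assumption, and the sample messages and the function names (`tokenize`, `train_bigram`, `bigram_prob`) are hypothetical stand-ins rather than anything taken from the corpus or this repository.

```python
from collections import Counter, defaultdict

# Hypothetical sample messages standing in for rows of the SMS corpus.
messages = [
    "can lah no problem",
    "can meh you sure",
    "ok lah see you later",
]

def tokenize(text):
    # Lowercase, split on whitespace, and pad with start/end markers.
    return ["<s>"] + text.lower().split() + ["</s>"]

def train_bigram(corpus):
    # Count how often each word follows each preceding word.
    counts = defaultdict(Counter)
    for msg in corpus:
        tokens = tokenize(msg)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def bigram_prob(counts, prev, nxt):
    # Maximum-likelihood estimate of P(nxt | prev); 0.0 for unseen contexts.
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

model = train_bigram(messages)
print(bigram_prob(model, "can", "lah"))  # 0.5
```

Whitespace tokenization keeps particles such as "lah" and "meh" as ordinary tokens, which suits text that does not follow standard grammar. A real run over the full corpus would likely also need smoothing for unseen word pairs.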

Set up

Run application