usage instructions, requirements
matbahasa authored May 4, 2021
1 parent ff883b9 commit 6cbdd64
Showing 1 changed file with 351 additions and 0 deletions.
351 changes: 351 additions & 0 deletions contoh_penggunaan.ipynb
@@ -0,0 +1,351 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example usage of `morph_analyzer.py`\n",
"\n",
"Required package: `pyspellchecker` (get it from https://pypi.org/project/pyspellchecker/ if it is not already installed on your system.)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import pickle\n",
"\n",
"import morph_analyzer as ma\n",
"\n",
"# Load the root-word list that ma.morph() takes as its second argument\n",
"with open(\"rootlist.pkl\", \"rb\") as f:\n",
"    rootlist = pickle.load(f)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Without the MALINDO Morph dictionary"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{('kait', 'keberkaitananlah', '0', '-lah', 'ber--an+ke--an', '0'),\n",
" ('kait', 'keberkaitananlah', 'ber-', '-an+-lah', 'ke--an', '0'),\n",
" ('kait', 'keberkaitananlah', 'ke-', '-an+-lah', 'ber--an', '0')}"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ma.morph(\"keberkaitananlah\", rootlist)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{('nyampa', 'nyampai', '0', '-i', '0', '0')}"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ma.morph(\"nyampai\", rootlist)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The `Indo` parameter\n",
"\n",
"The `Indo` parameter enables the prefix _N-_ (e.g. _N-_ + _kopi_ = _ngopi_), so that a form such as _nyampai_ can also be analysed as _N-_ + _sampai_."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{('campa', 'nyampai', 'N-', '-i', '0', '0'),\n",
" ('sampai', 'nyampai', 'N-', '0', '0', '0')}"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ma.morph(\"nyampai\", rootlist, Indo=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The `n` parameter\n",
"\n",
"The `n` parameter controls the number of candidates generated. Its default value is 10."
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{('bibkah', 'mengebibkah', 'ke-+meN-', '0', '0', '0'),\n",
" ('ebib', 'mengebibkah', 'meN-', '-kah', '0', '0'),\n",
" ('ebibkah', 'mengebibkah', 'meN-', '0', '0', '0'),\n",
" ('kebib', 'mengebibkah', 'meN-', '-kah', '0', '0'),\n",
" ('ngebib', 'mengebibkah', 'meN-', '-kah', '0', '0')}"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ma.morph(\"mengebibkah\", rootlist)"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{('bib', 'mengebibkah', 'meN-', '-kah', '0', '0'),\n",
" ('bibkah', 'mengebibkah', 'ke-+meN-', '0', '0', '0'),\n",
" ('ebib', 'mengebibkah', 'meN-', '-kah', '0', '0'),\n",
" ('ebibkah', 'mengebibkah', 'meN-', '0', '0', '0'),\n",
" ('kebib', 'mengebibkah', 'meN-', '-kah', '0', '0'),\n",
" ('kebibkah', 'mengebibkah', 'meN-', '0', '0', '0'),\n",
" ('mengebib', 'mengebibkah', '0', '-kah', '0', '0'),\n",
" ('mengebibkah', 'mengebibkah', '0', '0', '0', '0'),\n",
" ('ngebib', 'mengebibkah', 'meN-', '-kah', '0', '0'),\n",
" ('ngebibkah', 'mengebibkah', 'meN-', '0', '0', '0')}"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ma.morph(\"mengebibkah\", rootlist, n=15)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Together with the MALINDO Morph dictionary\n",
"\n",
"Although `morph_analyzer.py` can be used on its own, it is more realistic to use it together with the MALINDO Morph dictionary, whose morphological analyses have already been checked by humans. In the code below, `morph_analyzer.py` is used only when the word to be analysed is not in the MALINDO Morph dictionary."
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [],
"source": [
"# Build a dictionary from MALINDO Morph, keyed by surface form\n",
"with open(\"malindo_dic_20200917.tsv\", \"r\", encoding=\"utf-8\") as f:  # Use the latest version of MALINDO Morph\n",
"    katakata = []\n",
"    for l in f:\n",
"        items = l.strip().split(\"\\t\")\n",
"        katakata.append(tuple(items[1:7]))  # columns 1-6 only (without ID, base, lemma)\n",
"\n",
"kamus = dict()\n",
"for kata in katakata:\n",
"    surface = kata[1]\n",
"    if surface not in kamus:\n",
"        kamus[surface] = []\n",
"    kamus[surface].append(kata)"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [],
"source": [
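"def analisis(w, Indo=False, n=5):\n",
"    # Prefer the human-checked MALINDO Morph entries\n",
"    try:\n",
"        return kamus[w][:n]\n",
"    except KeyError:\n",
"        # Word not in the dictionary: generate candidate analyses instead\n",
"        return ma.morph(w, rootlist, Indo=Indo, n=n)"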
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Words found in the MALINDO Morph dictionary"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('pos', 'mengeposkan', 'meN-', '-kan', '0', '0')]"
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"analisis(\"mengeposkan\")"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('mereka', 'mereka', '0', '0', '0', '0'),\n",
" ('reka', 'mereka', 'meN-', '0', '0', '0')]"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"analisis(\"mereka\")"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('mereka', 'mereka', '0', '0', '0', '0')]"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"analisis(\"mereka\", n=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Words not found in the MALINDO Morph dictionary"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{('epob', 'mengepobkan', 'meN-', '-kan', '0', '0'),\n",
" ('epobk', 'mengepobkan', 'meN-', '-an', '0', '0'),\n",
" ('kepob', 'mengepobkan', 'meN-', '-kan', '0', '0'),\n",
" ('ngepob', 'mengepobkan', 'meN-', '-kan', '0', '0'),\n",
" ('pobkan', 'mengepobkan', 'ke-+meN-', '0', '0', '0')}"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"analisis(\"mengepobkan\")"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{('epob', 'mengepobkan', 'meN-', '-kan', '0', '0'),\n",
" ('epobk', 'mengepobkan', 'meN-', '-an', '0', '0'),\n",
" ('kepob', 'mengepobkan', 'meN-', '-kan', '0', '0')}"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"analisis(\"mengepobkan\", n=3)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
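
Both `ma.morph` and the MALINDO Morph lookups in the notebook above return six-field tuples. Judging only from the printed outputs, the fields appear to be root, surface form, prefix, suffix, circumfix and reduplication, with `'0'` marking an empty slot. That reading is an assumption, not something the notebook documents. The sketch below wraps such tuples in a `namedtuple` so the parts can be read by name; `Analysis`, `to_analysis` and `affixes` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical field names, inferred from the notebook's outputs (not from morph_analyzer.py)
Analysis = namedtuple(
    "Analysis", ["root", "surface", "prefix", "suffix", "circumfix", "reduplication"]
)


def to_analysis(t):
    """Wrap a 6-tuple from ma.morph() or the MALINDO Morph lookup."""
    return Analysis(*t)


def affixes(a):
    """Return only the filled affix slots; '0' is treated as 'no affix'."""
    return {
        name: value
        for name, value in a._asdict().items()
        if name not in ("root", "surface") and value != "0"
    }


a = to_analysis(("kait", "keberkaitananlah", "ber-", "-an+-lah", "ke--an", "0"))
print(a.root)      # kait
print(affixes(a))  # {'prefix': 'ber-', 'suffix': '-an+-lah', 'circumfix': 'ke--an'}
```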

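The notebook cell that builds `kamus` from the MALINDO Morph TSV file can also be packaged as a reusable function. The sketch below uses the standard `csv` module and `collections.defaultdict`. It assumes the same column layout that cell relies on (ID in column 0, the six analysis fields in columns 1 to 6, surface form at position 1 of each tuple). `build_kamus` is a hypothetical name. Unlike the original cell, it skips blank or truncated lines rather than storing short tuples.

```python
import csv
from collections import defaultdict


def build_kamus(lines):
    """Group MALINDO Morph rows by surface form (hypothetical helper).

    Assumes tab-separated rows whose columns 1-6 hold
    (root, surface, prefix, suffix, circumfix, reduplication);
    column 0 (ID) and any later columns are dropped, as in the notebook cell.
    """
    kamus = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 7:
            continue  # skip blank or truncated lines
        entry = tuple(row[1:7])
        kamus[entry[1]].append(entry)  # entry[1] is the surface form
    return dict(kamus)


# Usage in the notebook (file name taken from the original cell):
# with open("malindo_dic_20200917.tsv", encoding="utf-8", newline="") as f:
#     kamus = build_kamus(f)
```

Because `csv.reader` accepts any iterable of strings, the function works on an open file or on a plain list of lines, which makes it easy to try on a few rows.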

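In the notebook, `analisis` returns a list when the word is in MALINDO Morph but a set when it falls back to `ma.morph` (the outputs are printed with `{...}`), and a set has no stable order. The sketch below keeps the same dictionary-first logic but always returns a list, sorting the generated candidates. It also takes the fallback analyser as a parameter, so it can be tried without `morph_analyzer.py`. `analyse_with_fallback` and `fallback` are hypothetical names.

```python
def analyse_with_fallback(word, kamus, fallback, Indo=False, n=5):
    """Dictionary-first analysis with a deterministic result (hypothetical helper).

    kamus    -- dict mapping surface forms to lists of analysis tuples
    fallback -- callable taking (word, Indo=..., n=...) and returning an
                iterable of analysis tuples, e.g. a wrapper around ma.morph
    """
    if word in kamus:
        # Human-checked MALINDO Morph entries: keep dictionary order, cap at n
        return list(kamus[word])[:n]
    # Unknown word: sort the generated candidates so repeated runs print the same list
    return sorted(fallback(word, Indo=Indo, n=n))
```

In the notebook, the fallback could be `lambda w, Indo, n: ma.morph(w, rootlist, Indo=Indo, n=n)`. This assumes `ma.morph` accepts `Indo` and `n` as keyword arguments, as the earlier cells suggest.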