Commit

release 4.0
huseinzol05 committed Nov 16, 2020
1 parent e356fc9 commit 9a771e5
Showing 15 changed files with 1,359 additions and 130 deletions.
2 changes: 1 addition & 1 deletion README-pypi.rst
@@ -103,7 +103,7 @@ Features
Provide Zero-shot classification interface using Transformer-Bahasa to recognize texts without any labeled training data.
- **Hybrid 8-bit Quantization**

Provide hybrid 8-bit quantization for all models to reduce speed inference up to 2x and model size up to 4x.
Provide hybrid 8-bit quantization for all models to reduce inference time up to 2x and model size up to 4x.

Pretrained Models
------------------
2 changes: 1 addition & 1 deletion README.rst
@@ -123,7 +123,7 @@ Features
Provide Zero-shot classification interface using Transformer-Bahasa to recognize texts without any labeled training data.
- **Hybrid 8-bit Quantization**

Provide hybrid 8-bit quantization for all models to reduce speed inference up to 2x and model size up to 4x.
Provide hybrid 8-bit quantization for all models to reduce inference time up to 2x and model size up to 4x.

Pretrained Models
------------------
2 changes: 1 addition & 1 deletion docs/README.rst
@@ -123,7 +123,7 @@ Features
Provide Zero-shot classification interface using Transformer-Bahasa to recognize texts without any labeled training data.
- **Hybrid 8-bit Quantization**

Provide hybrid 8-bit quantization for all models to reduce speed inference up to 2x and model size up to 4x.
Provide hybrid 8-bit quantization for all models to reduce inference time up to 2x and model size up to 4x.

Pretrained Models
------------------
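
The "up to 4x" size figure in the Features entry above is what one would expect from storing each 32-bit float weight in 8 bits. The sketch below illustrates that arithmetic with a simple min/max affine scheme. This scheme is an assumption chosen for illustration and is not necessarily what Malaya's hybrid quantization does; the function names `quantize_8bit` and `dequantize_8bit` are hypothetical.

```python
import struct


def quantize_8bit(weights):
    """Map floats to integer codes in [0, 255] with a min/max affine scheme."""
    lo, hi = min(weights), max(weights)
    # Fall back to a scale of 1.0 when every weight is identical.
    scale = (hi - lo) / 255 or 1.0
    codes = bytes(round((w - lo) / scale) for w in weights)
    return codes, lo, scale


def dequantize_8bit(codes, lo, scale):
    """Recover approximate floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = quantize_8bit(weights)
restored = dequantize_8bit(codes, lo, scale)

# Storage cost: 4 bytes per float32 weight versus 1 byte per 8-bit code.
float32_size = len(struct.pack(f"{len(weights)}f", *weights))
int8_size = len(codes)
print(float32_size, int8_size)
```

Each restored value differs from the original by at most half a quantization step, which is one way to picture the small accuracy drop that comes with the smaller model.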
11 changes: 11 additions & 0 deletions docs/load-language-detection.ipynb
@@ -18,6 +18,17 @@
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\">\n",
"\n",
"This module was trained on both standard and local (including social media) language structures, so it is safe to use for both.\n",
" \n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 1,
157 changes: 144 additions & 13 deletions docs/load-translation-en-ms.ipynb
@@ -38,8 +38,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 5.01 s, sys: 749 ms, total: 5.75 s\n",
"Wall time: 5.09 s\n"
"CPU times: user 5.26 s, sys: 1.01 s, total: 6.27 s\n",
"Wall time: 7.2 s\n"
]
}
],
@@ -60,6 +60,13 @@
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:tested on 77k EN-MY sentences.\n"
]
},
{
"data": {
"text/html": [
@@ -82,34 +89,38 @@
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Size (MB)</th>\n",
" <th>Quantized Size (MB)</th>\n",
" <th>BLEU</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>small</th>\n",
" <td>42.7</td>\n",
" <td>13.4</td>\n",
" <td>0.142</td>\n",
" </tr>\n",
" <tr>\n",
" <th>base</th>\n",
" <td>234.0</td>\n",
" <td>82.7</td>\n",
" <td>0.696</td>\n",
" </tr>\n",
" <tr>\n",
" <th>large</th>\n",
" <td>817.0</td>\n",
" <td>244.0</td>\n",
" <td>0.699</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Size (MB) BLEU\n",
"small 42.7 0.142\n",
"base 234.0 0.696\n",
"large 817.0 0.699"
" Size (MB) Quantized Size (MB) BLEU\n",
"small 42.7 13.4 0.142\n",
"base 234.0 82.7 0.696\n",
"large 817.0 244.0 0.699"
]
},
"execution_count": 2,
@@ -159,6 +170,34 @@
"transformer_large = malaya.translation.en_ms.transformer(model = 'large')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Quantized model\n",
"\n",
"To load the 8-bit quantized model, simply pass `quantized = True`; the default is `False`.\n",
"\n",
"We can expect a slight accuracy drop from the quantized model, and it is not necessarily faster than the normal 32-bit float model; this depends entirely on the machine."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:Load quantized model will cause accuracy drop.\n"
]
}
],
"source": [
"quantized_transformer = malaya.translation.en_ms.transformer(quantized = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -187,7 +226,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
@@ -196,7 +235,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 13,
"metadata": {},
"outputs": [
{
@@ -220,7 +259,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 14,
"metadata": {},
"outputs": [
{
@@ -246,7 +285,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 15,
"metadata": {
"scrolled": false
},
@@ -273,7 +312,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 16,
"metadata": {},
"outputs": [
{
@@ -296,7 +335,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 17,
"metadata": {},
"outputs": [
{
@@ -320,7 +359,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 18,
"metadata": {},
"outputs": [
{
@@ -549,6 +588,44 @@
"pprint(transformer.translate([string_news1, string_news2, string_news3], beam_search = False))"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['KUALA LUMPUR 1 Julai - Datuk Seri Anwar Ibrahim tidak sesuai menjadi calon '\n",
" 'Perdana Menteri kerana beliau didakwa tidak \"popular\" dalam kalangan orang '\n",
" 'Melayu, Tun Dr Mahathir Mohamad mendakwa, bekas Perdana Menteri itu '\n",
" 'dilaporkan berkata Presiden PKR itu memerlukan seseorang seperti dirinya '\n",
" 'bagi mendapatkan sokongan daripada orang Melayu dan memenangi pilihan raya.',\n",
" '(CNN) Peguam Negara New York Letitia James pada hari Isnin memerintahkan '\n",
" 'Black Lives Matter Foundation - yang menurutnya tidak berafiliasi dengan '\n",
" 'gerakan Black Lives Matter yang lebih besar - untuk berhenti mengumpulkan '\n",
" 'sumbangan di New York. \"Saya memerintahkan Black Lives Matter Foundation '\n",
" 'untuk berhenti secara haram menerima sumbangan yang ditujukan untuk gerakan '\n",
" '#BlackLivesMatter. Yayasan ini tidak berafiliasi dengan gerakan itu, namun '\n",
" 'ia menerima banyak sumbangan dan muhibah yang ditipu,\" tweet James.',\n",
" 'Di antara inisiatif luas yang diusulkan adalah kerangka pelabelan makanan '\n",
" 'yang berkelanjutan, penyusunan semula makanan yang diproses, dan bab '\n",
" 'keberlanjutan dalam semua perjanjian perdagangan dua hala EU. EU juga '\n",
" 'berencana untuk menerbitkan proposal untuk kerangka perundangan untuk sistem '\n",
" 'makanan lestari pada tahun 2023 untuk memastikan semua makanan di pasar EU '\n",
" 'menjadi semakin lestari.']\n",
"CPU times: user 25.3 s, sys: 13.3 s, total: 38.6 s\n",
"Wall time: 10.3 s\n"
]
}
],
"source": [
"%%time\n",
"\n",
"pprint(quantized_transformer.translate([string_news1, string_news2, string_news3], beam_search = False))"
]
},
{
"cell_type": "code",
"execution_count": 12,
@@ -579,6 +656,36 @@
"pprint(transformer.translate([string_article1, string_article2], beam_search = False))"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Halaman ini berkongsi artikel terbaik saya untuk dibaca mengenai topik '\n",
" 'seperti kesihatan, kebahagiaan, kreativiti, produktiviti dan banyak lagi. '\n",
" 'Soalan utama yang mendorong kerja saya adalah, \"Bagaimana kita dapat hidup '\n",
" 'lebih baik?\" Untuk menjawab soalan itu, saya suka menulis mengenai kaedah '\n",
" 'berasaskan sains untuk menyelesaikan masalah praktikal.',\n",
" 'Pemadanan kabur pada skala. Dari 3.7 jam hingga 0.2 saat. Cara melakukan '\n",
" 'pemadanan rentetan pintar dengan cara yang dapat meningkatkan bahkan set '\n",
" 'data terbesar. Data di dunia nyata tidak kemas. Berurusan dengan set data '\n",
" 'yang tidak kemas menyakitkan dan terbakar sepanjang masa yang dapat '\n",
" 'dihabiskan untuk menganalisis data itu sendiri.']\n",
"CPU times: user 17 s, sys: 9.56 s, total: 26.5 s\n",
"Wall time: 5.83 s\n"
]
}
],
"source": [
"%%time\n",
"\n",
"pprint(quantized_transformer.translate([string_article1, string_article2], beam_search = False))"
]
},
{
"cell_type": "code",
"execution_count": 13,
@@ -603,6 +710,30 @@
"pprint(transformer.translate([random_string1, random_string2], beam_search = False))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['saya di sekolah perubatan.',\n",
" 'Emmerdale adalah album studio debut, lagu-lagu tidak dikeluarkan di A.S <> '\n",
" 'Lagu-lagu ini tidak dikeluarkan dalam edisi A.S. album tersebut dan '\n",
" 'sebelumnya tidak tersedia pada sebarang pelepasan A.S.']\n",
"CPU times: user 10.8 s, sys: 6.33 s, total: 17.1 s\n",
"Wall time: 3.63 s\n"
]
}
],
"source": [
"%%time\n",
"\n",
"pprint(quantized_transformer.translate([random_string1, random_string2], beam_search = False))"
]
},
{
"cell_type": "markdown",
"metadata": {},
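
The size table added in this notebook diff lists both the original and the quantized size of each EN-MS model. As a quick check against the "up to 4x" claim in the README hunks, the sketch below computes the reduction ratio for each model. The figures are copied from the table; the variable names are illustrative only.

```python
# (original size in MB, quantized size in MB), as listed in the notebook table.
sizes_mb = {
    "small": (42.7, 13.4),
    "base": (234.0, 82.7),
    "large": (817.0, 244.0),
}

# Reduction ratio = original size / quantized size, rounded to two decimals.
ratios = {
    name: round(full / quantized, 2)
    for name, (full, quantized) in sizes_mb.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x smaller")
```

The ratios come out at roughly 3.2x, 2.8x and 3.3x, which is consistent with "up to 4x" being an upper bound and not a typical value.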

0 comments on commit 9a771e5