release version 1.7, added word-mover distance, text similarity and etc
huseinzol05 committed Feb 15, 2019
1 parent ed0dee5 commit 8057d61
Showing 79 changed files with 42,637 additions and 7,514 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -9,3 +9,5 @@ malaya/__pycache__
 docs/_build
 docs/_static
 docs/_templates
+siamese
+skipthought
10 changes: 5 additions & 5 deletions docs/Api.rst
@@ -103,19 +103,19 @@ malaya.summarize
 .. automodule:: malaya.summarize
 :members:

-malaya.topics_influencer
+malaya.similarity
 -------------------------

-.. automodule:: malaya.topic_influencer
+.. automodule:: malaya.similarity
 :members:

-.. autoclass:: malaya.topic_influencer._DEEP_SIAMESE_SIMILARITY()
+.. autoclass:: malaya.similarity._DEEP_SIAMESE_SIMILARITY()
 :members:

-.. autoclass:: malaya.topic_influencer._DEEP_SIMILARITY()
+.. autoclass:: malaya.similarity._DEEP_SIMILARITY()
 :members:

-.. autoclass:: malaya.topic_influencer._FAST_SIMILARITY()
+.. autoclass:: malaya.similarity._FAST_SIMILARITY()
 :members:

 malaya.topic_model
18 changes: 17 additions & 1 deletion docs/Dataset.rst
@@ -40,6 +40,8 @@ Total size: 8.5 MB
 `Gender <https://github.com/huseinzol05/Malaya-Dataset/blob/master/gender>`__
 -----------------------------------------------------------------------------

+Total size: 2.2 MB
+
 1. Unknown
 2. Male
 3. Female
@@ -153,7 +155,7 @@ Total size: 496 KB
 `Sentiment Twitter <https://github.com/huseinzol05/Malaya-Dataset/blob/master/twitter-sentiment>`__
 ---------------------------------------------------------------------------------------------------

-Total size: 27.4 MB
+Total size: 50.6 MB

 1. Positive
 2. Negative
@@ -226,6 +228,20 @@ Total size: 1.4 MB
 1. Positive
 2. Negative

+`Toxicity <https://github.com/huseinzol05/Malaya-Dataset/blob/master/toxicity>`__
+-----------------------------------------------------------------------------------------
+
+Total size: 70 MB
+
+Toxicity is multilabel, prefer to use sigmoid based.
+
+1. toxic
+2. severe toxic
+3. obscene
+4. threat
+5. insult
+6. identity hate
+
 `Subtitle <https://github.com/huseinzol05/Malaya-Dataset/blob/master/subtitle>`__
 ---------------------------------------------------------------------------------

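The Toxicity entry added to docs/Dataset.rst recommends sigmoid-based outputs because the data is multilabel. The following is a minimal sketch of why that matters, not code from the repository: the logits are made-up illustrative values, and the 0.5 threshold is an assumption. A sigmoid scores each label independently, so several labels can be active at once, whereas a softmax forces the scores to compete and sum to 1.

```python
import math

# Label names taken from the Toxicity dataset entry.
LABELS = ["toxic", "severe toxic", "obscene", "threat", "insult", "identity hate"]


def sigmoid(x):
    """Squash a single logit into an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Turn a list of logits into one distribution that sums to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one comment (illustrative only, not model output).
logits = [2.0, -1.0, 1.5, -3.0, 0.5, -2.0]

sigmoid_probs = {label: sigmoid(z) for label, z in zip(LABELS, logits)}
softmax_probs = dict(zip(LABELS, softmax(logits)))

# Multilabel decision: keep every label whose own probability clears the
# (assumed) 0.5 threshold.
predicted = [label for label, p in sigmoid_probs.items() if p >= 0.5]
print(predicted)
```

With these illustrative logits, three labels clear the threshold together, which a single softmax arg-max could not express.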
6 changes: 3 additions & 3 deletions docs/Topics.rst → docs/Mover.rst
@@ -1,9 +1,9 @@
-Topics & Influencers Analysis
+Word-Mover Distance
 ==============================

 .. note::

 This tutorial is available as an IPython notebook
-`here <https://github.com/huseinzol05/Malaya/tree/master/example/topics-influencers>`_.
+`here <https://github.com/huseinzol05/Malaya/tree/master/example/word-mover>`_.

-.. include:: load-topics-influencers.rst
+.. include:: load-word-mover-distance.rst
9 changes: 9 additions & 0 deletions docs/Similarity.rst
@@ -0,0 +1,9 @@
+Text Similarity
+==============================
+
+.. note::
+
+This tutorial is available as an IPython notebook
+`here <https://github.com/huseinzol05/Malaya/tree/master/example/similarity>`_.
+
+.. include:: load-similarity.rst
3 changes: 2 additions & 1 deletion docs/index.rst
@@ -31,15 +31,16 @@ Contents:
 Num2word
 Pos
 Sentiment
+Similarity
 Spell
 Stack
 Stemmer
 Subjective
 Summarization
 Topic
-Topics
 Toxic
 Word2vec
+Mover
 Cluster
 Api
 Reference
136 changes: 68 additions & 68 deletions docs/load-emotion.rst
@@ -7,8 +7,8 @@
 .. parsed-literal::
-CPU times: user 10.4 s, sys: 640 ms, total: 11 s
-Wall time: 11 s
+CPU times: user 12 s, sys: 1.41 s, total: 13.4 s
+Wall time: 17.1 s
 .. code:: python
@@ -43,7 +43,7 @@ Load multinomial model
 .. parsed-literal::
 anger
-{'anger': 0.27993946463423486, 'fear': 0.1482931513658756, 'joy': 0.1880009584798728, 'love': 0.21711876657658918, 'sadness': 0.1296730712078804, 'surprise': 0.03697458773554805}
+{'anger': 0.30367763926253094, 'fear': 0.16709964152193366, 'joy': 0.17026521921403184, 'love': 0.18405977732934192, 'sadness': 0.1388341895665479, 'surprise': 0.03606353310561458}
@@ -73,31 +73,31 @@ Load xgb model
 .. parsed-literal::
 love
-{'anger': 0.21755809, 'fear': 0.090371706, 'joy': 0.13347618, 'love': 0.47302967, 'sadness': 0.0770047, 'surprise': 0.008559667}
+{'anger': 0.22918181, 'fear': 0.089252785, 'joy': 0.1318236, 'love': 0.46476611, 'sadness': 0.07200217, 'surprise': 0.012973559}
 .. parsed-literal::
-[{'anger': 0.21755809,
-'fear': 0.090371706,
-'joy': 0.13347618,
-'love': 0.47302967,
-'sadness': 0.0770047,
-'surprise': 0.008559667},
+[{'anger': 0.22918181,
+'fear': 0.089252785,
+'joy': 0.1318236,
+'love': 0.46476611,
+'sadness': 0.07200217,
+'surprise': 0.012973559},
 {'anger': 0.013483193,
 'fear': 0.939588,
 'joy': 0.01674833,
 'love': 0.003220023,
 'sadness': 0.022906518,
 'surprise': 0.0040539484},
-{'anger': 0.09142393,
-'fear': 0.029400537,
-'joy': 0.78257465,
-'love': 0.02881839,
-'sadness': 0.058004435,
-'surprise': 0.009778041},
+{'anger': 0.10506946,
+'fear': 0.025150253,
+'joy': 0.725915,
+'love': 0.05211037,
+'sadness': 0.078554265,
+'surprise': 0.013200594},
 {'anger': 0.11640434,
 'fear': 0.097485565,
 'joy': 0.24893147,
@@ -110,12 +110,12 @@ Load xgb model
 'love': 0.022184724,
 'sadness': 0.41255626,
 'surprise': 0.006135965},
-{'anger': 0.0714585,
-'fear': 0.19790031,
-'joy': 0.037659157,
-'love': 0.0025473926,
-'sadness': 0.00772799,
-'surprise': 0.6827066}]
+{'anger': 0.07513438,
+'fear': 0.2525073,
+'joy': 0.024355419,
+'love': 0.002638406,
+'sadness': 0.0059716892,
+'surprise': 0.6393928}]
@@ -167,27 +167,27 @@ List available deep learning models
 Testing fast-text model
 love
 ['love', 'fear', 'joy', 'love', 'sadness', 'surprise']
-[{'anger': 2.978304e-06, 'fear': 1.8461518e-10, 'joy': 1.0204276e-09, 'love': 0.999997, 'sadness': 1.3693535e-09, 'surprise': 2.6386826e-09}, {'anger': 1.2210384e-18, 'fear': 1.0, 'joy': 1.0015556e-19, 'love': 1.8750202e-24, 'sadness': 6.976661e-21, 'surprise': 3.2600536e-15}, {'anger': 2.47199e-19, 'fear': 2.3032567e-22, 'joy': 1.0, 'love': 5.1478095e-14, 'sadness': 4.464682e-20, 'surprise': 1.588908e-15}, {'anger': 4.1249185e-11, 'fear': 1.7474476e-10, 'joy': 0.00022258118, 'love': 0.9997774, 'sadness': 1.6592432e-11, 'surprise': 4.1854236e-09}, {'anger': 4.3972154e-08, 'fear': 2.1118221e-06, 'joy': 3.4898858e-07, 'love': 4.5489975e-12, 'sadness': 0.9999975, 'surprise': 4.8414757e-09}, {'anger': 1.1130476e-23, 'fear': 0.0003273876, 'joy': 5.694222e-17, 'love': 1.9363045e-25, 'sadness': 1.4252974e-26, 'surprise': 0.99967265}]
+[{'anger': 2.538603e-07, 'fear': 4.1372344e-13, 'joy': 1.0892472e-08, 'love': 0.99999976, 'sadness': 3.8994935e-16, 'surprise': 2.439655e-08}, {'anger': 4.4489467e-24, 'fear': 1.0, 'joy': 1.3903143e-28, 'love': 1.7920514e-33, 'sadness': 1.01771616e-26, 'surprise': 6.799581e-18}, {'anger': 9.583714e-26, 'fear': 1.5029816e-24, 'joy': 1.0, 'love': 3.7527533e-13, 'sadness': 8.348174e-24, 'surprise': 2.080897e-16}, {'anger': 1.7409228e-13, 'fear': 3.2279754e-12, 'joy': 0.0005876841, 'love': 0.9994123, 'sadness': 1.8902605e-11, 'surprise': 9.9256076e-11}, {'anger': 1.2737708e-11, 'fear': 5.882562e-10, 'joy': 9.112171e-13, 'love': 7.7659496e-20, 'sadness': 1.0, 'surprise': 1.6035637e-16}, {'anger': 5.5730725e-37, 'fear': 0.16033638, 'joy': 1.2999706e-30, 'love': 0.0, 'sadness': 0.0, 'surprise': 0.8396636}]
 Testing hierarchical model
-joy
+anger
 ['anger', 'fear', 'joy', 'joy', 'sadness', 'joy']
-[{'anger': 0.39431405, 'fear': 0.13933083, 'joy': 0.17727984, 'love': 0.042310942, 'sadness': 0.22523886, 'surprise': 0.021525377}, {'anger': 0.004958992, 'fear': 0.9853917, 'joy': 0.006676573, 'love': 0.00023657709, 'sadness': 0.0017484307, 'surprise': 0.0009877522}, {'anger': 0.0013627211, 'fear': 0.0017271177, 'joy': 0.986464, 'love': 0.0039458317, 'sadness': 0.0021411367, 'surprise': 0.0043591294}, {'anger': 0.028909639, 'fear': 0.09853578, 'joy': 0.50412154, 'love': 0.26376858, 'sadness': 0.084195614, 'surprise': 0.02046885}, {'anger': 0.022849305, 'fear': 0.011993612, 'joy': 0.008679014, 'love': 0.002472554, 'sadness': 0.9502534, 'surprise': 0.003752149}, {'anger': 0.015510161, 'fear': 0.0571924, 'joy': 0.5819401, 'love': 0.21683867, 'sadness': 0.006425157, 'surprise': 0.12209346}]
+[{'anger': 0.22394963, 'fear': 0.35022292, 'joy': 0.19895941, 'love': 0.013231089, 'sadness': 0.20033234, 'surprise': 0.013304558}, {'anger': 0.0056565125, 'fear': 0.9885886, 'joy': 0.0034398232, 'love': 0.00018917819, 'sadness': 0.0012037805, 'surprise': 0.00092218135}, {'anger': 0.01764421, 'fear': 0.01951682, 'joy': 0.8797468, 'love': 0.041130837, 'sadness': 0.013527576, 'surprise': 0.028433735}, {'anger': 0.028772388, 'fear': 0.07343067, 'joy': 0.48502314, 'love': 0.28668693, 'sadness': 0.10576224, 'surprise': 0.020324599}, {'anger': 0.021873059, 'fear': 0.014633018, 'joy': 0.01073073, 'love': 0.0012993184, 'sadness': 0.94936466, 'surprise': 0.0020992015}, {'anger': 0.020028168, 'fear': 0.17150529, 'joy': 0.3734562, 'love': 0.19241562, 'sadness': 0.008164915, 'surprise': 0.23442967}]
 Testing bahdanau model
 love
-['love', 'fear', 'joy', 'love', 'sadness', 'surprise']
-[{'anger': 0.44805261, 'fear': 0.18378404, 'joy': 0.02516251, 'love': 0.30925235, 'sadness': 0.027497768, 'surprise': 0.0062507084}, {'anger': 0.0010828926, 'fear': 0.9789995, 'joy': 0.0027138714, 'love': 0.00061593985, 'sadness': 0.0048968275, 'surprise': 0.011690898}, {'anger': 0.012288661, 'fear': 0.0025563037, 'joy': 0.85003525, 'love': 0.12451392, 'sadness': 0.0008497203, 'surprise': 0.009756153}, {'anger': 0.02319879, 'fear': 0.031080244, 'joy': 0.14820175, 'love': 0.7294624, 'sadness': 0.021997027, 'surprise': 0.046059813}, {'anger': 0.031083692, 'fear': 0.035790402, 'joy': 0.01741525, 'love': 0.00062268815, 'sadness': 0.9130492, 'surprise': 0.0020387478}, {'anger': 0.00159852, 'fear': 0.34762463, 'joy': 0.04318491, 'love': 0.0028805388, 'sadness': 0.00093575486, 'surprise': 0.6037757}]
+['anger', 'fear', 'joy', 'love', 'sadness', 'surprise']
+[{'anger': 0.53818357, 'fear': 0.14104106, 'joy': 0.010708541, 'love': 0.2570674, 'sadness': 0.047102023, 'surprise': 0.005897305}, {'anger': 0.0005677081, 'fear': 0.9770825, 'joy': 0.005677423, 'love': 0.0007302013, 'sadness': 0.0017472907, 'surprise': 0.014194911}, {'anger': 0.06975506, 'fear': 0.0069800974, 'joy': 0.5717373, 'love': 0.30618504, 'sadness': 0.011454151, 'surprise': 0.033888407}, {'anger': 0.0038130684, 'fear': 0.0053994465, 'joy': 0.10317592, 'love': 0.8656706, 'sadness': 0.0056833136, 'surprise': 0.016257582}, {'anger': 0.01122868, 'fear': 0.019208057, 'joy': 0.0024597098, 'love': 0.0002851458, 'sadness': 0.965973, 'surprise': 0.00084543176}, {'anger': 0.00083102344, 'fear': 0.23240082, 'joy': 0.033536877, 'love': 0.0011026214, 'sadness': 0.00037630452, 'surprise': 0.7317524}]
 Testing luong model
 love
-['love', 'fear', 'joy', 'love', 'sadness', 'fear']
-[{'anger': 0.044591118, 'fear': 0.063305356, 'joy': 0.33247164, 'love': 0.5347649, 'sadness': 0.0068765697, 'surprise': 0.017990304}, {'anger': 0.0064159264, 'fear': 0.9606779, 'joy': 0.012426791, 'love': 0.0013584964, 'sadness': 0.008015306, 'surprise': 0.011105636}, {'anger': 0.0036163705, 'fear': 5.7273093e-05, 'joy': 0.98739016, 'love': 0.0076421387, 'sadness': 0.00028883366, 'surprise': 0.0010052109}, {'anger': 0.017377134, 'fear': 0.0073309895, 'joy': 0.07374035, 'love': 0.3433876, 'sadness': 0.5455663, 'surprise': 0.012597541}, {'anger': 0.0007876828, 'fear': 0.0009606754, 'joy': 9.633098e-05, 'love': 0.00014691186, 'sadness': 0.9978861, 'surprise': 0.00012229013}, {'anger': 0.00045764598, 'fear': 0.37070635, 'joy': 0.0005788357, 'love': 0.00027592952, 'sadness': 0.00033797708, 'surprise': 0.6276433}]
+['joy', 'fear', 'joy', 'sadness', 'sadness', 'surprise']
+[{'anger': 0.057855386, 'fear': 0.040447887, 'joy': 0.29915547, 'love': 0.5720974, 'sadness': 0.00927453, 'surprise': 0.02116932}, {'anger': 0.0063275485, 'fear': 0.9673098, 'joy': 0.0065225014, 'love': 0.0008387138, 'sadness': 0.00706696, 'surprise': 0.011934649}, {'anger': 0.0014677589, 'fear': 0.0020899512, 'joy': 0.88741183, 'love': 0.076111265, 'sadness': 0.0038936164, 'surprise': 0.029025558}, {'anger': 0.013268307, 'fear': 0.0035831807, 'joy': 0.056010414, 'love': 0.21701123, 'sadness': 0.69225526, 'surprise': 0.017871574}, {'anger': 0.0018013288, 'fear': 0.0012173079, 'joy': 5.611221e-05, 'love': 9.00831e-05, 'sadness': 0.9967213, 'surprise': 0.000113809925}, {'anger': 0.00015200193, 'fear': 0.36670414, 'joy': 0.0003732592, 'love': 0.00011813393, 'sadness': 0.000118975, 'surprise': 0.63253355}]
 Testing bidirectional model
-surprise
-['anger', 'anger', 'anger', 'anger', 'anger', 'fear']
-[{'anger': 0.613231, 'fear': 0.21215951, 'joy': 0.00012107872, 'love': 0.007714424, 'sadness': 0.0029091935, 'surprise': 0.16386479}, {'anger': 0.7650685, 'fear': 0.12844206, 'joy': 0.00046135965, 'love': 0.0025065169, 'sadness': 0.012999088, 'surprise': 0.09052232}, {'anger': 0.7017255, 'fear': 0.12622964, 'joy': 0.00019186054, 'love': 0.0041279723, 'sadness': 0.0051922314, 'surprise': 0.16253278}, {'anger': 0.83330584, 'fear': 0.099247426, 'joy': 0.0007255099, 'love': 0.0023077168, 'sadness': 0.016625375, 'surprise': 0.047788195}, {'anger': 0.77445495, 'fear': 0.11811776, 'joy': 0.00019311535, 'love': 0.002333317, 'sadness': 0.004926041, 'surprise': 0.09997472}, {'anger': 0.28467438, 'fear': 0.3107746, 'joy': 0.0009574863, 'love': 0.039786864, 'sadness': 0.0549624, 'surprise': 0.3088443}]
+love
+['fear', 'fear', 'anger', 'joy', 'sadness', 'surprise']
+[{'anger': 0.031539902, 'fear': 0.44634053, 'joy': 0.0022038615, 'love': 0.24390388, 'sadness': 0.00030186496, 'surprise': 0.27570996}, {'anger': 0.0028205896, 'fear': 0.9787958, 'joy': 0.016622344, 'love': 0.00041048063, 'sadness': 0.0004424488, 'surprise': 0.00090834824}, {'anger': 0.4523394, 'fear': 0.32489082, 'joy': 0.04712723, 'love': 0.01679146, 'sadness': 0.039135754, 'surprise': 0.1197153}, {'anger': 0.04196525, 'fear': 0.08604635, 'joy': 0.65291435, 'love': 0.049389884, 'sadness': 0.077201255, 'surprise': 0.09248292}, {'anger': 0.06327597, 'fear': 0.058998022, 'joy': 0.041568566, 'love': 0.002343863, 'sadness': 0.8224733, 'surprise': 0.011340328}, {'anger': 1.5136379e-05, 'fear': 0.002162331, 'joy': 3.5301118e-06, 'love': 0.006482973, 'sadness': 2.4173462e-06, 'surprise': 0.99133366}]
 Testing bert model
 anger
@@ -367,39 +367,39 @@ will try to evolve it.
 .. parsed-literal::
-[{'anger': 0.055561937,
-'fear': 0.034661848,
-'joy': 0.20765074,
-'love': 0.65774184,
-'sadness': 0.0210206,
-'surprise': 0.023363067},
-{'anger': 1.5065236e-05,
-'fear': 0.9998666,
-'joy': 6.3056427e-06,
-'love': 2.9068442e-06,
-'sadness': 3.6798014e-05,
-'surprise': 7.235542e-05},
-{'anger': 0.00097060547,
-'fear': 5.1922354e-05,
-'joy': 0.99052715,
-'love': 0.0024538564,
-'sadness': 0.0005109437,
-'surprise': 0.005485538},
-{'anger': 0.00014133049,
-'fear': 0.0004463539,
-'joy': 0.12486383,
-'love': 0.87307847,
-'sadness': 0.0013382707,
-'surprise': 0.0001317923},
-{'anger': 0.0077239843,
-'fear': 0.014800851,
-'joy': 0.008525367,
-'love': 0.0013007816,
-'sadness': 0.9655128,
-'surprise': 0.0021361646},
-{'anger': 0.0003960413,
-'fear': 0.6634573,
-'joy': 0.0014801685,
-'love': 0.00056572456,
-'sadness': 0.000516784,
-'surprise': 0.33358407}]
+[{'anger': 0.07479232,
+'fear': 0.012134718,
+'joy': 0.034137156,
+'love': 0.85221285,
+'sadness': 0.006336733,
+'surprise': 0.020386234},
+{'anger': 1.6892743e-08,
+'fear': 0.99999964,
+'joy': 6.260633e-08,
+'love': 3.2111713e-10,
+'sadness': 3.542872e-08,
+'surprise': 2.2207877e-07},
+{'anger': 0.00012469916,
+'fear': 9.6892345e-06,
+'joy': 0.9917463,
+'love': 0.006561422,
+'sadness': 0.00040069615,
+'surprise': 0.0011572224},
+{'anger': 5.0021445e-05,
+'fear': 0.0010109642,
+'joy': 0.049688663,
+'love': 0.94577587,
+'sadness': 0.0032941191,
+'surprise': 0.00018034693},
+{'anger': 0.0010146926,
+'fear': 0.00020020001,
+'joy': 5.2909185e-05,
+'love': 2.640257e-06,
+'sadness': 0.99870074,
+'surprise': 2.8823646e-05},
+{'anger': 0.0057854424,
+'fear': 0.8317998,
+'joy': 0.017287944,
+'love': 0.008883897,
+'sadness': 0.0070799366,
+'surprise': 0.12916291}]
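The outputs in docs/load-emotion.rst pair a printed label with a dict of per-class probabilities. The sketch below assumes the label is simply the highest-probability key. The diff does not state this, and some deep-model outputs above do not line up that way, so treat it as a reading aid rather than the library's logic. The dict is the updated xgb output for the first sample, and `top_label` is a hypothetical helper, not part of malaya.

```python
# Updated xgb probabilities for the first sample, copied from the diff above.
xgb_probs = {
    "anger": 0.22918181,
    "fear": 0.089252785,
    "joy": 0.1318236,
    "love": 0.46476611,
    "sadness": 0.07200217,
    "surprise": 0.012973559,
}


def top_label(probs):
    """Return the key with the highest probability (hypothetical helper)."""
    return max(probs, key=probs.get)


label = top_label(xgb_probs)
total = sum(xgb_probs.values())
print(label, round(total, 6))
```

For this sample the arg-max agrees with the label printed in the xgb section of the diff.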
Binary file modified docs/load-emotion_files/load-emotion_14_0.png
Binary file modified docs/load-emotion_files/load-emotion_18_0.png