Releases: mesolitica/malaya
Version 4.0
- Added quantized versions of all Malaya models, reducing inference time by 2x and model size by 4x.
- Retrained constituency parsing, improving accuracy by ~1-2%.
- Added a sentence-level and word-level vectorization interface to all classification models.
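
The 4x size reduction is what one would expect when 32-bit float weights are stored as 8-bit integers. The release notes do not describe the exact quantization scheme Malaya uses, so the following is only a minimal sketch of generic symmetric int8 quantization; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical and not part of Malaya.

```python
from array import array


def quantize_int8(weights):
    """Symmetric linear quantization: map floats to int8 using one shared scale.

    Assumption: a single per-tensor scale; real toolkits may use other schemes.
    """
    max_abs = max(abs(w) for w in weights) or 1.0  # fall back to 1.0 for all-zero input
    scale = max_abs / 127.0
    quantized = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in quantized]


# Hypothetical float32 weights, stored at 4 bytes each.
weights = array("f", [0.5, -1.0, 0.25, 0.0, 0.75, -0.125, 1.0, -0.6])
quantized, scale = quantize_int8(weights)

float_bytes = weights.itemsize * len(weights)
int8_bytes = quantized.itemsize * len(quantized)
size_ratio = float_bytes / int8_bytes

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, dequantize(quantized, scale)))
print(f"size ratio: {size_ratio}, max error: {max_error:.5f}")
```

The trade-off is a small rounding error per weight in exchange for the smaller footprint; whether accuracy is affected depends on the model.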
Version 3.8.1
- Released constituency parsing
Version 3.8
- Improved spelling correction.
- Improved normalizer.
- Improved EN-MS translation; it now supports longer texts and US-style texts.
Version 3.7
- Added translation EN to MS and MS to EN modules.
- Added paraphrase module.
- Added keyword extraction module.
Version 3.4
release 3.4
Version 2.7
- BERT-Bahasa interface available.
- Added BERT-Multilanguage, BERT-Base and BERT-small for emotion analysis.
- Added BERT-Multilanguage, BERT-Base and BERT-small for Named Entity Recognition.
- Added BERT-Multilanguage, BERT-Base and BERT-small for Part-of-Speech tagging.
- Added BERT-Multilanguage and BERT-Base for relevancy analysis.
- Added BERT-Multilanguage, BERT-Base and BERT-small for sentiment analysis.
- Added encoder interface for text similarity, can use skip-thought / BERT / XLNET as encoder model.
- Added tree plot visualization for text similarity.
- Added BERT-Multilanguage, BERT-Base and BERT-small for subjectivity analysis.
- Added encoder interface for text summarization, can use skip-thought / BERT / XLNET as encoder model.
- Added BERT / XLNET interface for topic modeling.
- Added BERT-Multilanguage, BERT-Base and BERT-small for toxicity analysis.
- Removed siamese models for text similarity.
- Removed fast-text-char models, replaced by BERT models.
- Malaya no longer supports a training interface.
- XLNET-Bahasa interface available.
- Sequence models are no longer being improved in Malaya; development has moved to attention models.
Version 2.6
- Added deep siamese network, https://malaya.readthedocs.io/en/latest/Similarity.html#deep-siamese-network.
- Added BERT deep siamese network, https://malaya.readthedocs.io/en/latest/Similarity.html#bert-model
- Added Doc2Vec to calculate semantic similarity, https://malaya.readthedocs.io/en/latest/Similarity.html#calculate-similarity-using-doc2vec
- All extractive summarization now uses the TextRank algorithm for scoring.
- Added Doc2Vec for extractive summarization, https://malaya.readthedocs.io/en/latest/Summarization.html#load-doc2vec-summarization
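
As a rough illustration of TextRank scoring for extractive summarization, the sketch below ranks sentences by running a PageRank-style iteration over a sentence similarity graph. It is not Malaya's implementation: the bag-of-words cosine similarity stands in for whatever vectorizer is actually used (for example Doc2Vec), and the names `textrank_scores` and `summarize`, the damping factor, and the iteration count are assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # Assumption: whitespace tokenization and word counts as sentence vectors.
    vectors = [Counter(s.lower().split()) for s in sentences]
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, top_k=2):
    """Return the top_k highest-scoring sentences in their original order."""
    scores = textrank_scores(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]


docs = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the fish swam in the pond",
    "stocks fell sharply today",
]
print(summarize(docs, top_k=2))
```

A sentence that shares no vocabulary with the rest receives only the baseline score, so it is the last to be selected.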
Version 2.4
- Added relevancy analysis, to assess whether an article or a piece of text is relevant or tends toward fake news, https://malaya.readthedocs.io/en/latest/Relevancy.html
- Added visualization dashboard for emotion analysis, relevancy analysis, sentiment analysis, subjectivity analysis and toxicity analysis. To use it, call the predict_words function and the dashboard will pop up.
- Added neutral class for relevancy analysis, sentiment analysis and subjectivity analysis.
- Use Malaya preprocessing for all deep learning models classification.
Version 1.9
- Fixed some English loading bugs.
- Added clustering visualization, https://malaya.readthedocs.io/en/latest/Cluster.html
- Added text augmentation, https://malaya.readthedocs.io/en/latest/Generator.html
- Normalizer and spelling correction can now detect English words.
Version 1.7
- Added text similarity and released a partial set of related topics, https://malaya.readthedocs.io/en/latest/Similarity.html
- Added word-mover distance interface, https://malaya.readthedocs.io/en/latest/Mover.html
- Added pretrained fast-text based on wikipedia, https://malaya.readthedocs.io/en/latest/Fasttext.html
- Improved sentiment analysis; it is now trained on more than 800k sentences and is more sensitive to social media texts.
- Removed n-grams from all fast-text models to mitigate the curse of dimensionality.
- Removed the sparse limit from all fast-text-char models to improve n-gram sensitivity.
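
To see why dropping n-grams reduces dimensionality, the sketch below counts how many distinct features a tiny corpus produces with and without word bigrams. This assumes the note refers to word-level n-gram features; the release notes do not say so explicitly, and the helper names `word_ngrams` and `feature_vocabulary` and the sample texts are hypothetical.

```python
def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of length n."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_vocabulary(texts, max_n):
    """Collect every distinct n-gram feature for n = 1..max_n."""
    vocab = set()
    for text in texts:
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            vocab.update(word_ngrams(tokens, n))
    return vocab


texts = ["saya suka makan nasi", "saya suka minum teh"]
unigram_features = feature_vocabulary(texts, max_n=1)
bigram_features = feature_vocabulary(texts, max_n=2)
print(len(unigram_features), len(bigram_features))
```

Even on two short sentences the feature space nearly doubles once bigrams are included, and the growth is much steeper on a real corpus.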