This repository is an implementation of "Short Text Clustering with A Deep Multi-Embedded Self-Supervised Model". The implementation is based on DEC-keras and SIFAuto.
conda install --yes --file requirements.txt
We currently release the stackoverflow data. The word2vec embedding comes from STCC. The SBERT embedding was computed by us and is provided in stackoverflow.npy.
We use four datasets: stackoverflow, SearchSnippets, Tweet89 and 20ngnewsshort. Our data, including the different embeddings, will be released.
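To inspect the released SBERT embedding before training, something like the following sketch may help. It assumes that stackoverflow.npy holds a plain 2-D NumPy array with one row per text; this README does not state the layout, and the helper name load_embedding is hypothetical and not part of this repository.

```python
import numpy as np


def load_embedding(path):
    """Load a .npy embedding file and check that it is a 2-D matrix.

    Assumption: the file stores a plain numeric array of shape
    (num_texts, embedding_dim). This helper is illustrative only.
    """
    emb = np.load(path)
    if emb.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {emb.shape}")
    return emb


# Example call (the path is an assumption; adjust it to your checkout):
# emb = load_embedding("data/stackoverflow/stackoverflow.npy")
# print(emb.shape, emb.dtype)
```

If the file turns out to store something other than a plain array, np.load will raise an error under its default settings, which is a useful early signal that the layout assumption is wrong.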
python DMESSM.py --dataset stackoverflow --maxiter 2600 --ae_weights data/stackoverflow/results/ae_weights.hs --save_dir data/stackoverflow/results
We release the complete code!