-
Notifications
You must be signed in to change notification settings - Fork 3
HongyuGong/Abusive-Language-Detection-Categorization
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This is the TensorFlow implementation of the paper ``Supervised Neural Attention Improves Abusive Language Detection and Categorization''. Requirements: 1. Python3 2. Numpy 3. Sklearn 4. TensorFlow 5. [CMUTweetTagger](https://github.com/ianozsvald/ark-tweet-nlp-python). Put CMUTweetTagger.py and CMUTweetTokenizer.py in src/data_util/. 6. Preprocessor (install using ``pip install tweet-preprocessor'') Running instructions: 1. Data preparation (1) Create data folder data/ and dump folder dump/ in the same level of the folder src/. (2) Download [datasets](https://drive.google.com/file/d/149nQCmB6EAflW_22wltZnjmf_EPzg2N7/view?usp=sharing). Put Anonymized_Sentences_Classified.csv and Anonymized_Comments_Categorized.csv in data/ in the same level of src/. (3)Download pre-trained word2vec embedding. (4) Tag with CMUTweetTagger 2. Preprocess datasets ``` python3 data_util/preproc_classification_data.py ''' ``` python data_util/preproc_categorization_data.py ''' Build word and part-of-speech tag vocabulary, and word embeddings ``` python3 data_util/build_vocab_emb.py --wiki_embed_fn [EMBED_FN] ''' [EMBED_FN]: the path of downloaded word2vec embedding. 3. Train abusive language classifier ``` CUDA_VISIBLE_DEVICES=0 python3 abuse_classification/train.py ''' Test abusive language classifier ``` CUDA_VISIBLE_DEVICES=0 python3 abuse_classification/test.py ''' 4. Train abusive language categorizer ``` CUDA_VISIBLE_DEVICES=0 python3 abuse_categorization/train.py ''' Test abusive language categorizer ``` CUDA_VISIBLE_DEVICES=0 python3 abuse_categorization/test.py '''
About
Abusive Language Detection and Categorization
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published