This project detects spam messages on Discord, a social messaging platform.
- Purpose
- Technologies and Libraries
- Data Sources
- Setup and Installation
- Usage
- Results
- Contributors
- Future Work
The purpose of this project is to detect and filter out spam messages on Discord to improve the user experience and maintain the integrity of communication on the platform.
- Python
- Scikit-learn
- Pandas
- Numpy
- Count Vectorizer
- TFIDF Transform
- Multinomial Naive Bayes Algorithm
The dataset contains the spam and ham (non-spam) messages used for spam classification. It is taken from the UCI Machine Learning Repository. The dataset should be in CSV format with at least two columns: message and label.
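As a rough illustration of the expected layout, the sketch below loads a CSV with the message and label columns using Pandas. The inline CSV text, the `load_dataset` helper, and the cleaning steps are assumptions made for this example; they are not taken from the project's scripts.

```python
import io

import pandas as pd

# Stand-in CSV content for illustration; the real file comes from the dataset.
CSV_TEXT = """label,message
ham,See you at lunch
spam,Win a free prize now
ham,Thanks for the help
"""


def load_dataset(source):
    """Read a CSV and keep only the two columns the classifier needs.

    `source` can be a file path or any file-like object.
    """
    df = pd.read_csv(source)
    missing = {"message", "label"} - set(df.columns)
    if missing:
        raise ValueError(f"dataset is missing columns: {sorted(missing)}")
    # Drop incomplete rows and normalise the message text to stripped strings.
    df = df.dropna(subset=["message", "label"])
    df = df.assign(message=df["message"].astype(str).str.strip())
    return df[["message", "label"]].reset_index(drop=True)


df = load_dataset(io.StringIO(CSV_TEXT))
print(df["label"].value_counts())
```

Passing a path such as `load_dataset("your_dataset.csv")` (a hypothetical filename) would work the same way.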
- Clone the repository:
git clone https://github.com/yourusername/discord-spam-detection.git
cd discord-spam-detection
- Create and activate a conda environment:
conda create --name discord-spam-detection python=3.8
conda activate discord-spam-detection
- Install the required libraries:
pip install -r requirements.txt
- Add your Discord bot token to the env file:
SECOND_BOT_TOKEN = 'your-discord-bot-token-here'
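One way the bot script might read this token is sketched below. It assumes the value from the env file has already been loaded into the process environment (for example by the shell or by a dotenv-style loader); how the project actually loads the file is not stated here. The `get_bot_token` helper is hypothetical.

```python
import os

PLACEHOLDER = "your-discord-bot-token-here"


def get_bot_token(env=os.environ):
    """Return the bot token from SECOND_BOT_TOKEN, or raise if it is unset.

    Surrounding quotes are stripped because the env file entry is quoted.
    """
    token = env.get("SECOND_BOT_TOKEN", "").strip().strip("'\"")
    if not token or token == PLACEHOLDER:
        raise RuntimeError("SECOND_BOT_TOKEN is not set to a real token")
    return token


# Demonstration with an explicit mapping instead of the real environment.
demo_token = get_bot_token({"SECOND_BOT_TOKEN": "'abc123'"})
```

Failing early with a clear error keeps a missing or placeholder token from surfacing later as a less obvious login failure.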
- Train the model:
python spam.py
This script fits the Count Vectorizer, TFIDF Transform, and Multinomial Naive Bayes classifier on your dataset and saves the trained model.
- Run the bot:
python discord_bot.py
This script starts the Discord bot, which uses the trained model to predict and filter spam messages in real time.
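The two steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of `spam.py` or `discord_bot.py`: the toy messages, the helper names, the `spam_model.joblib` filename, and the choice of joblib for saving are all assumptions. The message handler is written against any object with a `content` attribute and an awaitable `delete()` method, which is how a Discord message would be expected to behave inside a bot's message event.

```python
import asyncio
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy messages for illustration only; the real project trains on its CSV dataset.
MESSAGES = [
    "win a free prize now", "free money click here", "claim your free prize",
    "see you at lunch", "are we still meeting today", "thanks for the help",
]
LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]


def train_model(messages, labels):
    """Fit count vectorizer -> TF-IDF transform -> Multinomial Naive Bayes."""
    model = Pipeline([
        ("counts", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("nb", MultinomialNB()),
    ])
    model.fit(messages, labels)
    return model


def is_spam(model, text):
    """Return True when the model labels the text as spam."""
    return bool(model.predict([text])[0] == "spam")


async def handle_message(message, model):
    """Delete the message if it is classified as spam; return True if deleted."""
    if is_spam(model, message.content):
        await message.delete()
        return True
    return False


class DemoMessage:
    """Stand-in for a chat message so the handler can run without a bot."""

    def __init__(self, content):
        self.content = content
        self.deleted = False

    async def delete(self):
        self.deleted = True


# Training step: fit, save to disk (hypothetical filename), then reload.
model_path = os.path.join(tempfile.gettempdir(), "spam_model.joblib")
joblib.dump(train_model(MESSAGES, LABELS), model_path)
loaded_model = joblib.load(model_path)

# Bot step: run the handler on a stand-in message.
demo = DemoMessage("claim your free prize now")
was_deleted = asyncio.run(handle_message(demo, loaded_model))
```

Saving the whole pipeline, rather than the classifier alone, keeps the fitted vocabulary and TF-IDF weights together with the model, so the bot can pass raw message text straight to `predict`.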
The performance of the spam detection model can be evaluated using metrics such as accuracy, precision, recall, and F1-score.
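As a small sketch of how these metrics could be computed with Scikit-learn, the example below scores a handful of made-up predictions. The labels are invented purely to show the calls and are not results from this project; treating "spam" as the positive class is an assumption.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up true labels and predictions, for illustration only.
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]


def evaluate(y_true, y_pred, positive="spam"):
    """Compute the four metrics, treating `positive` as the spam class."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, pos_label=positive),
        "recall": recall_score(y_true, y_pred, pos_label=positive),
        "f1": f1_score(y_true, y_pred, pos_label=positive),
    }


metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a spam filter, precision reflects how often a removed message really was spam, while recall reflects how much of the spam was caught, so the two are worth reading together rather than relying on accuracy alone.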
- Enhance the model with more advanced techniques like deep learning.
- Collect and use a larger dataset to improve accuracy.
- Implement a real-time feedback and retraining mechanism.