This project uses natural language processing (NLP) and machine learning to classify social media messages for disaster events.
Objectives: (1) build a command line application that creates a message classifier from an input dataset; and (2) build a web application that visualizes the dataset and predicts the categories of a new input message.
Dataset: The dataset of social media messages for disaster events, labeled with 36 categories, was provided by the Udacity Nanodegree program in partnership with Figure Eight.
- app
  - template
    - master.html: homepage of web app
    - go.html: classification result page of web app
  - run.py: Flask file that runs web app
  - utils.py: customized functions and transformers that support run.py
- data
  - disaster_categories.csv: message categories data to process
  - disaster_messages.csv: messages data to process
  - process_data.py: ETL pipeline to clean data and store data into SQLite database
  - ETL Pipeline Preparation.ipynb: ETL pipeline prepared in a notebook
  - DisasterResponse.db: database of cleaned data
- models
  - train_classifier.py: machine learning pipeline to train a classification model and store the trained model into a pickle file
  - ML Pipeline Preparation.ipynb: machine learning pipeline prepared in a notebook
  - utils.py: customized functions and transformers that support train_classifier.py
  - classifier.pkl: pickle file of trained model
- README.md
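The ETL step listed for process_data.py (clean the data, then store it in a SQLite database) can be sketched roughly as follows. This is not the project's code: it uses only the standard library instead of pandas and sqlalchemy, and it assumes that each categories entry is a string of `name-value` pairs separated by semicolons, that messages and categories share an `id`, and that duplicates are dropped. The function names and the table name `messages` are hypothetical.

```python
import sqlite3


def parse_categories(raw):
    """Turn 'related-1;request-0' into {'related': 1, 'request': 0} (assumed format)."""
    parsed = {}
    for pair in raw.split(";"):
        # rpartition splits on the last hyphen, so names containing hyphens survive.
        name, _, value = pair.rpartition("-")
        parsed[name] = int(value)
    return parsed


def run_etl(messages, categories, db_path, table="messages"):
    """Join the inputs on id, expand the categories, and write one table.

    messages:   iterable of (id, text) tuples
    categories: iterable of (id, raw_category_string) tuples
    Returns the number of rows written.
    """
    labels_by_id = {row_id: parse_categories(raw) for row_id, raw in categories}

    # Keep one row per id, and only ids that have labels (an inner join).
    merged = {}
    for row_id, text in messages:
        if row_id in labels_by_id:
            merged[row_id] = text
    if not merged:
        return 0

    # Assumes every row carries the same set of category names.
    label_names = list(labels_by_id[next(iter(merged))])
    columns = ", ".join('"{}" INTEGER'.format(name) for name in label_names)
    placeholders = ", ".join("?" for _ in range(len(label_names) + 2))

    conn = sqlite3.connect(db_path)
    try:
        conn.execute('DROP TABLE IF EXISTS "{}"'.format(table))
        conn.execute(
            'CREATE TABLE "{}" (id INTEGER PRIMARY KEY, message TEXT, {})'.format(
                table, columns
            )
        )
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders),
            [
                (row_id, text, *(labels_by_id[row_id][name] for name in label_names))
                for row_id, text in merged.items()
            ],
        )
        conn.commit()
    finally:
        conn.close()
    return len(merged)
```

Writing one integer column per category keeps the stored table directly usable as a multi-label training target, which is one plausible reason to expand the category string before saving.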
- Command line application
  Run the following commands in the project's root directory to set up the database and model.
  - To run the ETL pipeline that cleans the data and stores it in the database: `python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db`
  - To run the machine learning pipeline that trains the classifier and saves it: `python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl`
- Web application
  Run the following command in the app directory to launch the web app: `python run.py`.
  Go to http://0.0.0.0:3001/.
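The command line steps above pass their file paths as positional arguments. A minimal sketch of how a script such as process_data.py might read them is shown below, assuming `argparse` is used; the source does not show the actual parsing code, and the names `parse_args`, `messages_path`, `categories_path`, and `database_path` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional paths used by the ETL command."""
    parser = argparse.ArgumentParser(
        description="Clean the message data and store it in a SQLite database."
    )
    parser.add_argument("messages_path", help="CSV file of messages")
    parser.add_argument("categories_path", help="CSV file of message categories")
    parser.add_argument("database_path", help="SQLite database file to write")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)


# Passing the arguments explicitly mirrors the ETL command shown above.
args = parse_args(
    [
        "data/disaster_messages.csv",
        "data/disaster_categories.csv",
        "data/DisasterResponse.db",
    ]
)
print(args.database_path)  # prints data/DisasterResponse.db
```

With this approach, running the script with a missing path prints a usage message and exits, rather than failing later with an index error.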
The code was developed using the Anaconda distribution of Python 3.6. The following dependencies were used:
- pandas
- sqlalchemy
- nltk
- sklearn
- plotly
- spacy (en_core_web_sm and en_core_web_lg)
- flask
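Before running the pipelines, it can help to confirm that the dependencies listed above are importable. The following is a small sketch, not part of the project: `find_missing` is a hypothetical helper, and it assumes the spaCy models were installed as regular Python packages (which is how `python -m spacy download` installs them).

```python
import importlib.util

# Import names of the dependencies listed above ("sklearn" is the import
# name of the scikit-learn package).
REQUIRED_MODULES = ["pandas", "sqlalchemy", "nltk", "sklearn", "plotly", "spacy", "flask"]
# Assumption: the spaCy models are installed as importable packages.
SPACY_MODELS = ["en_core_web_sm", "en_core_web_lg"]


def find_missing(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES + SPACY_MODELS)
if missing:
    print("Missing dependencies: " + ", ".join(missing))
else:
    print("All dependencies are available.")
```

The check only looks the modules up without importing them, so it stays fast even though `en_core_web_lg` is a large model.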