This project is part of the Data Science Nanodegree Program by Udacity. The goal of this project is to analyze disaster data from Figure Eight to build a model for an API that classifies disaster messages.
ETL Pipeline:
- Loads the messages and categories datasets
- Merges the two datasets
- Cleans the data
- Stores it in a SQLite database
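The four ETL steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `process_data.py`: it assumes both datasets share an `id` column and that categories arrive as a single string such as `related-1;water-0`. The column names, the `messages` table name, and the helpers `clean_data` and `save_data` are all hypothetical. Small in-memory frames stand in for the CSV files.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def clean_data(messages, categories):
    """Merge on the assumed `id` column, expand categories, drop duplicates."""
    df = messages.merge(categories, on="id")
    # Split "related-1;water-0" into one column per category.
    expanded = df["categories"].str.split(";", expand=True)
    expanded.columns = [value.split("-")[0] for value in expanded.iloc[0]]
    for column in expanded.columns:
        # Keep only the trailing 0/1 flag and store it as an integer.
        expanded[column] = expanded[column].str.split("-").str[-1].astype(int)
    df = pd.concat([df.drop(columns="categories"), expanded], axis=1)
    return df.drop_duplicates().reset_index(drop=True)


def save_data(df, database_path, table_name="messages"):
    """Write the cleaned frame to a SQLite file (table name is an assumption)."""
    connection = sqlite3.connect(database_path)
    try:
        df.to_sql(table_name, connection, index=False, if_exists="replace")
        connection.commit()
    finally:
        connection.close()


# Toy stand-ins for disaster_messages.csv and disaster_categories.csv.
messages = pd.DataFrame(
    {"id": [1, 2, 2],
     "message": ["We need water", "Roads are blocked", "Roads are blocked"]}
)
categories = pd.DataFrame(
    {"id": [1, 2, 2],
     "categories": ["related-1;water-1", "related-1;water-0", "related-1;water-0"]}
)

clean = clean_data(messages, categories)
print(clean)

# Round trip through a temporary database file to confirm the rows were stored.
with tempfile.TemporaryDirectory() as folder:
    database_path = os.path.join(folder, "example.db")
    save_data(clean, database_path)
    connection = sqlite3.connect(database_path)
    stored_rows = connection.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    connection.close()
```

With real data, the two frames would come from `pd.read_csv` on the CSV paths passed to the script.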
ML Pipeline:
- Loads data from the SQLite database
- Splits the dataset into training and test sets
- Builds a text processing and machine learning pipeline
- Trains and tunes a model using GridSearchCV
- Outputs the results on the test set
- Exports the final model as a pickle file
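As a rough sketch of the ML steps above, the snippet below builds a scikit-learn pipeline, tunes it with `GridSearchCV`, scores it on a held-out split, and pickles the best estimator. The estimator choice (`TfidfVectorizer` feeding a `MultiOutputClassifier` around a random forest), the parameter grid, the toy messages, and the two category names are assumptions for illustration only; the project's `train_classifier.py` may differ, for example by loading data from the SQLite database and using an NLTK-based tokenizer.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy data standing in for the table loaded from SQLite (hypothetical).
category_names = ["water", "shelter"]
messages = [
    "We need drinking water",
    "No clean water in the village",
    "Our house collapsed, we need shelter",
    "Families are sleeping outside without tents",
    "Water and tents are needed urgently",
    "People need water and a place to sleep",
    "The weather is sunny today",
    "Thank you for the update",
]
labels = np.array(
    [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1], [0, 0], [0, 0]]
)

X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, random_state=0
)

# Text processing followed by one classifier per output category.
pipeline = Pipeline(
    [
        ("tfidf", TfidfVectorizer()),
        ("classifier", MultiOutputClassifier(RandomForestClassifier(random_state=0))),
    ]
)

# A deliberately tiny grid; a real search would cover more parameters.
param_grid = {"classifier__estimator__n_estimators": [5, 10]}
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(X_train, y_train)

# Report accuracy per category on the held-out messages.
predictions = search.predict(X_test)
for index, name in enumerate(category_names):
    print(name, accuracy_score(y_test[:, index], predictions[:, index]))

# Serialize the tuned model; pickle.dump with a file opened in "wb" mode
# would write the same bytes to a path such as models/classifier.pkl.
model_bytes = pickle.dumps(search.best_estimator_)
restored = pickle.loads(model_bytes)
```

The toy set is far too small for the scores to mean anything; the point is only the shape of the workflow.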
Flask Web App:
- Visualizes the data using Plotly
- Classifies messages in real time
- app
  - template
    - master.html # main page of web app
    - go.html # classification result page of web app
  - run.py # Flask file that runs the app
- data
  - disaster_categories.csv # data to process
  - disaster_messages.csv # data to process
  - process_data.py
  - DisasterResponse.db # database to save clean data to
- models
  - train_classifier.py
  - classifier.pkl # saved model
- README.md
Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline, which cleans the data and stores it in the database:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline, which trains the classifier and saves it as a pickle file:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
Run the following command in the app's directory to run your web app.
python run.py
Go to http://0.0.0.0:3001/
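For orientation, a Flask app serving a classification route on that port might look roughly like the sketch below. It is not the project's `run.py`: the `/go` route name is inferred from `go.html`, the `query` parameter and the keyword-based `classify` helper are hypothetical stand-ins for loading `classifier.pkl` and calling its `predict` method, and the response is JSON rather than a rendered template so the example stays self-contained.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(message):
    """Hypothetical stand-in for the trained model: flags categories by keyword."""
    text = message.lower()
    return {
        "related": int(bool(text.strip())),
        "water": int("water" in text),
    }


@app.route("/go")
def go():
    # Read the user's message from the query string and classify it.
    query = request.args.get("query", "")
    return jsonify(query=query, classification=classify(query))


def main():
    # Bind to all interfaces on port 3001, matching the URL above.
    app.run(host="0.0.0.0", port=3001)
```

Calling `main()` starts the development server. Because `0.0.0.0` is a bind address, some browsers will only reach the app through `http://localhost:3001/`.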
- Udacity
- Figure Eight
- Stack Overflow
- Flask
- Plotly
- Scikit-learn
- SQLite
- Pandas
- NumPy
- NLTK
- SQLAlchemy
- Joblib
- Matplotlib
- Seaborn
- Pickle
- Jupyter Notebook
- Python
- HTML
- CSS
- JavaScript
- Bootstrap
This project is licensed under the MIT License - see the LICENSE file for details.
An Dinh Ngoc - andythetechnerd03