Udacity Machine Learning Nanodegree Capstone Project
This project is organized into three directories:
- docs - proposal, report, and all associated images
- code - all code, including data_prep and model
- data - both raw and processed data
Begin in the code/data_prep directory. If you checked out all of the raw data, you do not need to download any files. Otherwise, the Jupyter notebooks include cells that download the raw data as needed.
The CVEs are the first data files to be processed. Open the cve notebook and execute the imports cell. You may skip the download cell if you already have all of the required nvdcve-1.0-20xx.json files in data/raw; otherwise, execute it to download them all. Run the rest of the cells to produce data/processed/cves.json.
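To decide whether the download cell can be skipped, it may help to check which feed files are already present. The following is a minimal sketch and is not taken from the notebook: the function name `missing_cve_feeds` is hypothetical, and the caller must supply the range of years, which this README does not specify.

```python
from pathlib import Path


def missing_cve_feeds(raw_dir, years):
    """Return the names of the NVD feed files for `years` absent from raw_dir.

    Hypothetical helper: the file name pattern follows the
    nvdcve-1.0-20xx.json naming mentioned above; the years are an
    assumption left to the caller.
    """
    raw = Path(raw_dir)
    expected = [f"nvdcve-1.0-{year}.json" for year in years]
    # Keep only the names that do not exist as regular files.
    return [name for name in expected if not (raw / name).is_file()]
```

An empty list means every expected file is present and the download cell can be skipped.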
Next, the Metasploit database needs to be processed and appended to the CVEs file. Open the metasploit notebook and execute the imports and files cells as before. Skip the download cell if you already have data/raw/metasploit.json; otherwise, execute it. Execute the rest of the cells to produce data/processed/cves_metasploit.json.
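As a rough illustration of what appending Metasploit data to the CVEs could look like, the sketch below flags each CVE that is referenced by at least one Metasploit module. The record layout is an assumption, not the notebook's actual schema: each CVE is taken to be a dict with an "id" key, and each module a dict with a "references" list of strings such as "CVE-2017-0144". The function name `flag_metasploit` is hypothetical.

```python
def flag_metasploit(cves, modules):
    """Return copies of the CVE records with a boolean "metasploit" field.

    Assumed (hypothetical) layout: cves is a list of {"id": ...} dicts and
    modules is a list of {"references": [...]} dicts.
    """
    # Collect every CVE identifier mentioned by any module.
    referenced = {
        ref.upper()
        for module in modules
        for ref in module.get("references", [])
        if ref.upper().startswith("CVE-")
    }
    # Copy each record so the input list is left unmodified.
    return [dict(cve, metasploit=cve["id"].upper() in referenced) for cve in cves]
```

Building the set of referenced identifiers once keeps the lookup for each CVE at constant time, which matters when the CVE list is large.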
The last data processing step is to one-hot encode the data. Open the encode notebook and execute all of its cells to create data/processed/cves_metasploit_encoded.json.
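For readers unfamiliar with one-hot encoding, here is a small standard-library sketch of the idea: a categorical field is replaced by one 0/1 column per distinct value. This is not the notebook's implementation; the function name `one_hot` and the example field "access" are hypothetical.

```python
def one_hot(records, field):
    """Replace `field` in each record with one 0/1 column per category."""
    # Sort the distinct values so the column order is deterministic.
    categories = sorted({record[field] for record in records})
    encoded = []
    for record in records:
        # Copy every key except the categorical one.
        row = {key: value for key, value in record.items() if key != field}
        for category in categories:
            row[f"{field}_{category}"] = 1 if record[field] == category else 0
        encoded.append(row)
    return encoded


# Hypothetical example records; the real field names may differ.
records = [
    {"id": "a", "access": "NETWORK"},
    {"id": "b", "access": "LOCAL"},
]
encoded = one_hot(records, "access")
```

Each encoded row has exactly one of the new columns set to 1, which lets models that expect numeric input work with categorical features.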
Next, run the models in the code/model directory.
Open the regression notebook and execute all cells. This saves some graphics to docs/img and produces the regression results.
Lastly, load the decisiontree notebook and execute all cells.
This completes the walk-through of the data preparation and model notebooks.