2014-07-22, Scott Hendrickson
A short, basic introduction to the ideas and mathematics behind Naive Bayes classifiers, with a brief example of document classification using the sklearn API.
This session was built using:
- Python 2.7
- IPython 1.2
- sklearn 0.14
The capability of the full sklearn package is pretty mind-blowing; this Notebook aims for the lowest-hanging fruit, because the same framework is used for the advanced use cases. This is certainly one of the strengths of sklearn. Note that these materials do not go into explaining what the various estimators are doing or how the algorithms work. For those discussions, definitely see the other materials in this repository and the official documentation.
The TfIdf vectorizer from sklearn is powerful and convenient, and it is used here with no explanation.
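For a rough feel of what a tf-idf weighting does, here is a minimal pure-Python sketch. It assumes the textbook form `tf * log(N / df)` and whitespace tokenization; the function name `tfidf` and the toy documents are made up for illustration. The sklearn vectorizer applies its own smoothing and normalization, so its numbers will differ from these.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Illustrative only: this is the plain textbook weighting, not the
    exact formula used by sklearn.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({
            term: count * math.log(float(n_docs) / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical toy corpus.
docs = ["the cat sat", "the dog barked", "the cat purred"]
weights = tfidf(docs)
```

A term that appears in every document ("the") gets weight zero, while a term unique to one document ("sat") outweighs one shared by two ("cat"). That is the intuition behind using these weights as classifier features: words that help distinguish documents count for more.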
The Wikipedia article on Naive Bayes Classifiers is pretty solid.
If you want to explore the IPython Notebook without running Python on your own machine, you can also view it at nbviewer.
Enjoy!