Naive Reverend

The Naive Reverend is an HTTP service for Bayesian text classification using a bag-of-words model. Despite their simplicity, bag-of-words models can perform remarkably well on tasks like spam filtering and sentiment detection, and they're fast and easy to implement. Maybe above all, writing one from scratch is a good way to drill Bayes' theorem into your head and to wrestle with some of the subtleties of floating-point math on a computer.
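One of the floating-point subtleties is underflow: multiplying many small word probabilities quickly rounds to zero, so classifiers usually add logarithms instead. The following is a minimal sketch of that idea, not this project's code; the probabilities are made up and `normalize_log_scores` is a hypothetical helper name.

```python
import math

# Hypothetical per-word likelihoods: 400 words, each with probability 0.001.
likelihoods = [1e-3] * 400

# Multiplying directly underflows to 0.0 in 64-bit floats.
product = 1.0
for p in likelihoods:
    product *= p

# Summing logarithms keeps the same information in a representable range.
log_sum = sum(math.log(p) for p in likelihoods)


def normalize_log_scores(log_scores):
    """Turn per-class log scores into probabilities that sum to 1.

    Subtracting the maximum before exponentiating (the log-sum-exp trick)
    avoids underflow when every score is a large negative number.
    """
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {label: math.exp(s - top) / total for label, s in log_scores.items()}


probs = normalize_log_scores({"spam": -2763.0, "ham": -2764.0})
```

Here `product` ends up as exactly `0.0`, while `log_sum` stays finite, and `probs` still ranks "spam" above "ham" even though neither raw probability is representable.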

About the name

In addition to being a statistician and philosopher, Thomas Bayes was a Presbyterian minister. In a bag-of-words classifier, we make the "naive" assumption that all features (in our case, words) are conditionally independent given the class. In other words, the probability of a word occurring in a class is independent of the words around it. Of course, that isn't true, but we can still build pretty good classifiers if we let ourselves make that assumption. We can then evaluate the accuracy of the classifier using a held-out test set.
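Under the independence assumption, the score for a class is its prior times the product of the per-word probabilities, which becomes a sum in log space. The sketch below illustrates that with a toy classifier. It is not this service's implementation: the `BagOfWords` class, whitespace tokenization, and add-one (Laplace) smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class BagOfWords:
    """Toy naive Bayes classifier over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> documents seen
        self.vocab = set()

    def train(self, label, text):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, docs in self.doc_counts.items():
            counts = self.word_counts[label]
            # Add-one smoothing (an assumption here) keeps unseen words
            # from producing log(0).
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(docs / total_docs)  # log prior
            for word in text.lower().split():
                # Each word contributes independently of its neighbors.
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


clf = BagOfWords()
clf.train("spam", "win cash now")
clf.train("spam", "cheap cash prize")
clf.train("ham", "meeting at noon")
clf.train("ham", "lunch at noon tomorrow")
```

With this tiny made-up training set, `clf.classify("cash prize now")` picks "spam" and `clf.classify("noon meeting")` picks "ham". Measuring accuracy on a held-out set amounts to running `classify` on labeled documents the model never saw during training and counting the matches.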

Endpoints

/classify

/train

Store backends

Redis

In-memory

LevelDB

Should I use it?

Probably not. Aside from the fact that it has almost no tests, you can likely get much more accurate classification with a backoff language model using a library like kenlm, berkeleylm, or irstlm. All of these libraries use data structures that have been highly optimized for read performance and space efficiency. But they're significantly more expensive to update and retrain than a model that just increments counts in a key-value store.
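To make the "just incrementing counts" point concrete, here is a rough sketch of training as a series of counter increments. It is an illustration only: `InMemoryStore`, its `incr` method, and the key layout (`docs:`, `word:`, `total:`) are hypothetical and are not taken from this project's store backends.

```python
class InMemoryStore:
    """Dict-backed stand-in for a key-value store with an increment operation."""

    def __init__(self):
        self.data = {}

    def incr(self, key, amount=1):
        self.data[key] = self.data.get(key, 0) + amount
        return self.data[key]

    def get(self, key):
        return self.data.get(key, 0)


def train(store, label, text):
    """Record one labeled document by bumping counters; nothing is rebuilt."""
    store.incr(f"docs:{label}")
    for word in text.lower().split():
        store.incr(f"word:{label}:{word}")
        store.incr(f"total:{label}")


store = InMemoryStore()
train(store, "spam", "cash cash now")
```

Each new training document costs a handful of increments, so the model can be updated while it is serving requests. A store with a native increment command could replace the dict without changing `train`.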
