stockclassifier

Classification: James Thornton and Dylan Hurwitz
Web Scraping: Thomas Thornton

This is a Naive Bayes text classifier. It classifies the Management's Discussion and Analysis (MD&A) section of a company's 10-Q, the quarterly report that public companies file with the SEC.
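For readers new to the technique, here is a minimal sketch of a multinomial Naive Bayes text classifier in Python. It is not the code in BayesClassifier.py. The tokenizer, the add-one (Laplace) smoothing, and the BUY/SELL labels in the usage example are assumptions made for illustration. Each class is scored as log P(class) plus the sum of log P(word | class) over the document's words, and the class with the highest score is predicted.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing; an illustrative sketch."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # label -> number of documents
        self.word_counts = defaultdict(Counter)   # label -> word -> occurrences
        self.vocab = set()
        for doc, label in zip(documents, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_scores(self, document):
        """Return log P(class) + sum of log P(word | class) for every class."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / n_docs)
            for w in tokenize(document):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict(self, document):
        scores = self.log_scores(document)
        return max(scores, key=scores.get)


# Usage with toy data; the labels are hypothetical.
docs = [
    "revenue growth strong demand record profit",
    "losses impairment decline weak demand",
    "strong profit growth",
    "decline losses restructuring",
]
labels = ["BUY", "SELL", "BUY", "SELL"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("record growth and profit"))  # BUY
```

Working in log space avoids the numerical underflow that multiplying many small word probabilities would cause on long MD&A sections.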

To run the code, start the Python shell, import evaluation, and run evaluation.eval("training_set.data") (call it through the module name; a bare eval would invoke Python's built-in function). This splits the training data (10-Qs filed by the companies in the Dow Jones Industrial Average) into five parts and uses five-fold cross-validation to gauge the classifier's accuracy: it trains on four parts, tests on the fifth, and rotates the test segment until each part has served as the test set once. It then generates nine more splits, repeats the procedure on each, and returns the average accuracy over all ten splits. The basic classifier generally reaches about 65% accuracy.
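The evaluation procedure described above (five-fold cross-validation repeated over ten splits) can be sketched as follows. This is a hypothetical re-implementation, not the contents of evaluation.py. The names k_fold_accuracy, repeated_cv and majority_trainer, the convention that a trainer returns a prediction function, and the assumption that examples are (document, label) pairs numbering at least k are all illustrative choices.

```python
import random
from collections import Counter
from statistics import mean


def k_fold_accuracy(examples, train, k=5, seed=None):
    """One round of k-fold CV: shuffle, split into k folds, train on k-1, test on the rest.

    `train` is assumed to take a list of (document, label) pairs and return a
    function mapping a document to a predicted label. Requires len(examples) >= k.
    """
    rng = random.Random(seed)
    data = list(examples)
    rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]  # k roughly equal folds
    accuracies = []
    for i in range(k):
        test = folds[i]
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        predict = train(training)
        correct = sum(1 for doc, label in test if predict(doc) == label)
        accuracies.append(correct / len(test))
    return mean(accuracies)


def repeated_cv(examples, train, k=5, repeats=10, seed=0):
    """Average the k-fold accuracy over `repeats` independently shuffled splits."""
    return mean(k_fold_accuracy(examples, train, k, seed=seed + r) for r in range(repeats))


def majority_trainer(training):
    """Hypothetical baseline trainer: always predict the most common training label."""
    label = Counter(y for _, y in training).most_common(1)[0][0]
    return lambda document: label


# Usage: evaluate the baseline on toy (document, label) pairs.
examples = [("doc %d" % i, "BUY" if i % 3 else "SELL") for i in range(30)]
print(repeated_cv(examples, majority_trainer))
```

Averaging over several shuffles makes the accuracy estimate less dependent on any single way of partitioning the data. A trivial baseline such as majority_trainer is also a useful point of comparison for the reported 65%.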

Note that HOLD is treated somewhat fictitiously: no documents are pre-classified as HOLD. Instead, the classifier is allowed to "not bet" on reports it is unsure of, and the HOLD classification exists for that purpose.
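One way to implement the "not bet" behaviour is to abstain whenever the classifier's confidence falls below a threshold. The sketch below is an assumption, not necessarily how BayesClassifierHold.py works: it converts per-class log scores (such as those a Naive Bayes model produces) into posterior probabilities, returns HOLD when the top posterior is under a hypothetical threshold of 0.75, and reports accuracy only on the documents where the classifier placed a bet, together with the fraction of documents it bet on.

```python
import math


def predict_with_hold(log_scores, threshold=0.75):
    """Return the top class, or "HOLD" if its posterior probability is below `threshold`.

    `log_scores` maps each class to an unnormalized log score (log prior + log likelihood).
    The 0.75 default is a hypothetical value, not one taken from this project.
    """
    top = max(log_scores.values())
    # Subtract the max before exponentiating (log-sum-exp trick) to avoid underflow.
    exps = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exps.values())
    best = max(exps, key=exps.get)
    if exps[best] / total < threshold:
        return "HOLD"
    return best


def accuracy_on_bets(predictions, truths):
    """Accuracy over non-HOLD predictions, plus the fraction of documents bet on (coverage)."""
    bets = [(p, t) for p, t in zip(predictions, truths) if p != "HOLD"]
    if not bets:
        return None, 0.0
    accuracy = sum(p == t for p, t in bets) / len(bets)
    return accuracy, len(bets) / len(predictions)


print(predict_with_hold({"BUY": 0.0, "SELL": -5.0}))  # BUY (posterior about 0.99)
print(predict_with_hold({"BUY": 0.0, "SELL": -0.1}))  # HOLD (posterior about 0.52)
```

Raising the threshold typically trades coverage for accuracy on the reports the classifier does bet on, so the two numbers are best reported together.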

Feel free also to modify the list of tickers at the end of scraper.py and use it to generate new training sets.

Repository files:
BayesClassifier.py
BayesClassifierHold.py
DataReader.py
README.md
evaluation.py
scraper.py
training_set.data

Repository: jamespeterthornton/stockclassifier

Folders and files

Latest commit

History

Repository files navigation

stockclassifier

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages