Skip to content

Latest commit

 

History

History
80 lines (60 loc) · 4.37 KB

README.markdown

File metadata and controls

80 lines (60 loc) · 4.37 KB

Classifier Reborn

Gem Version Build Status

Getting Started

Classifier Reborn is a general classifier module to allow Bayesian and other types of classifications. It is a fork of cardmagic/classifier under more active development. Currently, it has Bayesian Classifier and Latent Semantic Indexer (LSI) implemented.

Here is a quick illustration of the Bayesian classifier.

$ gem install classifier-reborn
$ irb
irb(main):001:0> require 'classifier-reborn'
irb(main):002:0> classifier = ClassifierReborn::Bayes.new 'Ham', 'Spam'
irb(main):003:0> classifier.train "Ham", "Sunday is a holiday. Say no to work on Sunday!"
irb(main):004:0> classifier.train "Spam", "You are the lucky winner! Claim your holiday prize."
irb(main):005:0> classifier.classify "What's the plan for Sunday?"
#=> "Ham"

Now, let's build an LSI, classify some text, and find a cluster of related documents.

irb(main):006:0> lsi = ClassifierReborn::LSI.new
irb(main):007:0> lsi.add_item "This text deals with dogs. Dogs.", :dog
irb(main):008:0> lsi.add_item "This text involves dogs too. Dogs!", :dog
irb(main):009:0> lsi.add_item "This text revolves around cats. Cats.", :cat
irb(main):010:0> lsi.add_item "This text also involves cats. Cats!", :cat
irb(main):011:0> lsi.add_item "This text involves birds. Birds.", :bird
irb(main):012:0> lsi.classify "This text is about dogs!"
#=> :dog
irb(main):013:0> lsi.find_related("This text is around cats!", 2)
#=> ["This text revolves around cats. Cats.", "This text also involves cats. Cats!"]

There is much more that can be done using Bayes and LSI beyond these quick examples. For more information read the following documentation topics.

Notes on JRuby support

gem 'classifier-reborn-jruby', platforms: :java

While experimental, this gem should work on JRuby without any kind of additional changes. Unfortunately, you will not be able to use C bindings to GNU/GSL or similar performance-enhancing native code. Additionally, we do not use fast_stemmer, but rather an implementation of the Porter Stemming algorithm. Stemming will differ between MRI and JRuby, however you may choose to disable stemming and do your own manual preprocessing (or use some other popular Java library).

If you encounter a problem, please submit your issue with [JRuby] in the title.

Code of Conduct

In order to have a more open and welcoming community, Classifier Reborn adheres to the Jekyll code of conduct adapted from the Ruby on Rails code of conduct.

Please adhere to this code of conduct in any interactions you have in the Classifier community. If you encounter someone violating these terms, please let Chase Gilliam know and we will address it as soon as possible.

Authors and Contributors

The Classifier Reborn library is released under the terms of the GNU LGPL-2.1.