A simple-but-useful kNN library for NodeJS that compares JSON objects using Euclidean distance and returns the k closest objects.
Supports normalized, weighted Euclidean distances: attributes can be normalized by their standard deviation.
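The sketch below shows what standardizing by standard deviation does. Each attribute's difference is divided by that attribute's standard deviation across the candidate objects, then squared and weighted. The helper names (`stdev`, `standardizedDistance`) are hypothetical. Several choices are assumptions rather than Alike's documented behavior: the population standard deviation, a default weight of 1, and the zero-spread guard.

```javascript
// Hypothetical helper: population standard deviation of one attribute across objects.
function stdev(objects, attr) {
  const n = objects.length;
  const mean = objects.reduce((sum, o) => sum + o[attr], 0) / n;
  const variance = objects.reduce((sum, o) => sum + (o[attr] - mean) ** 2, 0) / n;
  return Math.sqrt(variance);
}

// Hypothetical: squared distance after scaling each attribute difference by that
// attribute's standard deviation, then applying optional weights.
function standardizedDistance(subject, object, objects, weights = {}) {
  let total = 0;
  for (const attr of Object.keys(subject)) {
    const sd = stdev(objects, attr);
    // Assumption: if an attribute has no spread, leave its difference unscaled.
    const diff = (subject[attr] - object[attr]) / (sd === 0 ? 1 : sd);
    // Assumption: an attribute without a weight counts with weight 1.
    const w = attr in weights ? weights[attr] : 1;
    total += w * diff * diff;
  }
  return total;
}

const objects = [{ a: 0, b: 0 }, { a: 2, b: 200 }];
console.log(stdev(objects, 'a'), stdev(objects, 'b')); // 1 100
console.log(standardizedDistance({ a: 0, b: 0 }, objects[1], objects)); // 8
```

Without the scaling, the same pair would be 2² + 200² = 40004 apart, and `b` would swamp `a`. After scaling, each attribute contributes 4. For clarity, this sketch recomputes the standard deviations on every call; a real implementation would compute them once.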
Features `key` and `filter` options that do the data assembly for you, Lisp style!
`knn(subject, objects, options)` takes:
- `subject`: the vantage point object; every attribute present in this object is treated as a feature
- `objects`: an array of objects that should all have at least the attributes of `subject`
- `options`:
  - `k` (default: unlimited): how many objects to return
  - `standardize` (default: `false`): if `true`, standardizes every attribute by its standard deviation; set this to `true` if your attributes do not share the same scale
  - `weights` (default: `{}`): a hash giving the weight of each attribute
  - `key` (default: none): a key function mapped over `objects`, for when the subject's attributes are nested under a property of each object. For example, if `subject` is `{a: 0}` and `objects` is `[{x: {a: 0}}, {x: {a: 2}}]`, provide `key: function(o) { return o.x }`
  - `filter` (default: none): a filter function that returns `true` for the objects that should be considered. For example, to consider only objects with a non-negative `a`: `filter: function(o) { return o.a >= 0 }`
  - `debug` (default: `false`): if `true`, every returned object gets a `debug` property holding its overall distance from `subject` and the contribution of each attribute. For example, if `subject` is `{a: 0, b: 0}` and an object is `{a: 3, b: 4}`, the returned object will be `{a: 3, b: 4, debug: {distance: 25, details: {a: 9, b: 16}}}` (note that these are squared differences: 3² + 4² = 25)
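The sketch below is a minimal kNN search that makes the options above concrete. It honors `k`, `weights`, `key`, `filter` and `debug`, but it is not Alike's source. `knnSketch` is a hypothetical name, and `standardize` is left out. Its details are assumptions: an attribute with no weight counts with weight 1, each weight multiplies that attribute's squared difference, and `filter` runs on the raw objects before `key` is applied. It does reproduce the `debug` example above.

```javascript
// Hypothetical sketch of the documented options; not Alike's actual implementation.
function knnSketch(subject, objects, options = {}) {
  const { k, weights = {}, key = (o) => o, filter, debug = false } = options;
  const attrs = Object.keys(subject); // every subject attribute is a feature

  // Assumption: `filter` sees the raw objects, before `key` is applied.
  const candidates = filter ? objects.filter(filter) : objects;

  const scored = candidates.map((object) => {
    const features = key(object); // where the feature attributes live
    const details = {};
    let distance = 0;
    for (const attr of attrs) {
      // Assumption: an attribute with no weight counts with weight 1.
      const w = attr in weights ? weights[attr] : 1;
      details[attr] = w * (subject[attr] - features[attr]) ** 2;
      distance += details[attr];
    }
    return { object, distance, details };
  });

  // Closest first; keep everything when k is not given.
  scored.sort((x, y) => x.distance - y.distance);
  const top = k === undefined ? scored : scored.slice(0, k);

  // Attach the debug info to a copy of each object when requested.
  return top.map(({ object, distance, details }) =>
    debug ? { ...object, debug: { distance, details } } : object
  );
}

const example = knnSketch({ a: 0, b: 0 }, [{ a: 3, b: 4 }], { debug: true });
console.log(JSON.stringify(example[0]));
// {"a":3,"b":4,"debug":{"distance":25,"details":{"a":9,"b":16}}}
```

With `key: (o) => o.x` and `k: 1`, the same function picks `{x: {a: 0}}` out of the `key` example above.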
Given John Foo's taste in movies:

| Attribute | Value | Weight |
|---|---|---|
| explosions | 8 | 10% |
| romance | 3 | 30% |
| length | 6 | 5% |
| humour | 5 | 5% |
| pigeons | 10 | 50% |
John Foo would like to rent a movie tonight that most closely matches his movie tastes. He collected a DB of movies with numerical values ranging from 1 to 10 for each of the 5 attributes listed above (don't ask how).
John Foo loves his pigeons. They are the most important attribute to him, so pigeons carry 50% of the weight. He does not like romance and wants to make sure he avoids sappy movies. He also likes mid-length, semi-funny movies with explosions, but he cares much less about those, as long as the movie features peaceful pigeons.
Perfect case for Alike!
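A little arithmetic shows why the pigeons weight dominates. As a sketch, assume each weight multiplies the squared difference on its attribute (Alike's exact formula may differ). A 4-point miss on pigeons then costs 0.5 × 4² = 8. By comparison, 4-point misses on explosions, length and humour together cost only 0.1 × 16 + 0.05 × 16 + 0.05 × 16 = 3.2. The two movies below are hypothetical, and `weightedDistance` is an illustrative helper, not part of Alike.

```javascript
// John Foo's taste and weights, using the values from the table above.
const taste = { explosions: 8, romance: 3, length: 6, humour: 5, pigeons: 10 };
const weights = { explosions: 0.1, romance: 0.3, length: 0.05, humour: 0.05, pigeons: 0.5 };

// Assumed distance: sum of weight * (difference)^2 over the attributes of `subject`.
function weightedDistance(subject, object) {
  return Object.keys(subject).reduce(
    (sum, attr) => sum + weights[attr] * (subject[attr] - object[attr]) ** 2,
    0
  );
}

// Hypothetical movies: A misses only on pigeons; B misses on three minor attributes.
const movieA = { ...taste, pigeons: 6 };
const movieB = { ...taste, explosions: 4, length: 2, humour: 1 };

console.log(weightedDistance(taste, movieA)); // 8
console.log(weightedDistance(taste, movieB)); // about 3.2 (floating point)
```

So the movie that gets the pigeons right is the closer match, even though it misses on three other attributes.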
To install it and add it to your `package.json`:

```
$ npm install alike --save
```
Now you can load up the module and use it like so:
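```javascript
const knn = require('alike');

const options = {
  k: 10, // return the 10 closest movies
  weights: {
    explosions: 0.1,
    romance: 0.3,
    length: 0.05,
    humour: 0.05,
    pigeons: 0.5
  }
};

// John Foo's taste, matching the table above
const movieTaste = {
  explosions: 8,
  romance: 3,
  length: 6,
  humour: 5,
  pigeons: 10
};

// `movies` is your own array of movie objects (see below)
const closest = knn(movieTaste, movies, options);
```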
Where `movies` is an array of objects that have at least those 5 attributes. `knn` returns the 10 movies from the array that are closest to `movieTaste`. Enjoy! :)
Alike is written in CoffeeScript in the `coffee/` folder. You can use `make coffee` to compile it and watch for changes. Unit tests are in the `coffee/test/` folder. You can run the tests with `npm test`, or, if you are developing, use `make watch-test` to re-run them as you TDD. :)
Run the benchmarks with `coffee benchmark/`; they take about 1 minute on a MacBook Air.
The benchmarks are designed to reflect realistically sized data sets. They don't ship with the npm package, to keep things light.
Alike is licensed under the terms of the GNU Lesser General Public License, known as the LGPL.