hu437/dami
dami

Scalable algorithms in data mining.

dami is written in Java. Our goal is to provide algorithms that can handle hundreds of millions of records on a PC with limited memory.

Currently we have:

  • utility: asynchronous vector buffer; simple, high-performance text parser. More tests are needed

  • classification: SGD for logistic regression

  • recommendation: SlopeOne, SVD, RSVD, itemneighborhood-SVD (see movielens_converter.py)

  • significance test: swap randomization

  • graph: PageRank

Future:

  • similarity: simhash

2012/10/22 Release Notes:

  • L1 & L2 logistic regression
  • memory cost estimation
  • simple commandline integration for LR
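
To illustrate what L1- and L2-regularized logistic regression trained with SGD involves, here is a minimal sketch. It is not dami's actual code: the class name `SgdLogisticRegressionSketch`, the sparse example layout (parallel `ids` and `values` arrays) and the hyperparameters are all assumptions made for this example. The L1 step uses a plain soft-threshold after each gradient step, which is one common choice among several.

```java
// Hypothetical sketch, not dami's implementation.
final class SgdLogisticRegressionSketch {
    private final double[] w;          // one weight per feature id (the id is the array index)
    private final double learningRate;
    private final double l1;           // L1 penalty strength (0 disables it)
    private final double l2;           // L2 penalty strength (0 disables it)

    SgdLogisticRegressionSketch(int numFeatures, double learningRate, double l1, double l2) {
        this.w = new double[numFeatures];
        this.learningRate = learningRate;
        this.l1 = l1;
        this.l2 = l2;
    }

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Predicted probability of label 1 for a sparse example. */
    double predict(int[] ids, double[] values) {
        double z = 0.0;
        for (int i = 0; i < ids.length; i++) {
            z += w[ids[i]] * values[i];
        }
        return sigmoid(z);
    }

    /** One SGD step on a single example; label must be 0 or 1. */
    void update(int[] ids, double[] values, int label) {
        double error = predict(ids, values) - label; // derivative of log loss w.r.t. the score
        double shrink = learningRate * l1;
        for (int i = 0; i < ids.length; i++) {
            int id = ids[i];
            double gradient = error * values[i] + l2 * w[id];
            double updated = w[id] - learningRate * gradient;
            // Soft-threshold: pull the weight toward zero by `shrink`, clipping at zero.
            if (updated > shrink) {
                w[id] = updated - shrink;
            } else if (updated < -shrink) {
                w[id] = updated + shrink;
            } else {
                w[id] = 0.0;
            }
        }
    }

    double weight(int id) {
        return w[id];
    }
}
```

Only the weight array lives in memory, so examples can be streamed from disk one at a time and discarded after each `update` call.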

2012/7/22 Release Notes:

  • Asynchronous vector buffer for dataset IO
  • Simple, high-performance text parser (only for digit-related characters)
  • small refactoring.
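
As a rough illustration of why a parser restricted to digit-related characters can be fast, the sketch below reads integers straight out of a byte buffer without creating intermediate `String` objects. `DigitParserSketch` and `parseInts` are hypothetical names, not dami's parser; the sketch handles only an optional minus sign followed by decimal digits and does not check for overflow.

```java
// Hypothetical sketch, not dami's parser.
final class DigitParserSketch {
    /**
     * Parses the integers found in buf[0..length) into out and returns how many were written.
     * Any byte other than '-' or '0'..'9' is treated as a separator.
     * Parsing stops early if out is full. Overflow is not detected.
     */
    static int parseInts(byte[] buf, int length, int[] out) {
        int count = 0;
        int i = 0;
        while (i < length && count < out.length) {
            // Skip separator bytes.
            while (i < length && !isDigit(buf[i]) && buf[i] != '-') {
                i++;
            }
            if (i >= length) {
                break;
            }
            boolean negative = false;
            if (buf[i] == '-') {
                negative = true;
                i++;
            }
            if (i >= length || !isDigit(buf[i])) {
                continue; // a lone '-' with no digits after it is ignored
            }
            int value = 0;
            while (i < length && isDigit(buf[i])) {
                value = value * 10 + (buf[i] - '0');
                i++;
            }
            out[count++] = negative ? -value : value;
        }
        return count;
    }

    private static boolean isDigit(byte b) {
        return b >= '0' && b <= '9';
    }
}
```

Because the caller supplies both the input buffer and the output array, the same two arrays can be reused for every chunk read from a file.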

2012/7/12 Release Notes:

  • code refactoring for recommendation and IO
  • To evaluate RMSE for recommendation, first see movielens_convert.py for converting and/or splitting the MovieLens data, then see CFDataConverter and TestSVD
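
For reference, RMSE over a held-out set of ratings is the square root of the mean squared difference between predicted and actual ratings. The sketch below accumulates it in a single streaming pass; `RmseSketch` and its methods are hypothetical names and are unrelated to how `TestSVD` is written.

```java
// Hypothetical sketch, not part of dami.
final class RmseSketch {
    private double sumSquaredError = 0.0;
    private long count = 0;

    /** Adds one (predicted, actual) rating pair from the test set. */
    void add(double predicted, double actual) {
        double diff = predicted - actual;
        sumSquaredError += diff * diff;
        count++;
    }

    /** Returns sqrt(sum of squared errors / number of pairs). */
    double value() {
        if (count == 0) {
            throw new IllegalStateException("no ratings added");
        }
        return Math.sqrt(sumSquaredError / count);
    }
}
```

Only two numbers are kept in memory, so the test set can be arbitrarily large.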

To achieve computational efficiency and good memory utilization, we have adopted two techniques:

1: Using the "id" as the array index when fetching data.

2: Keeping only the model in memory and storing the data as converted bytes for IO.

So it's highly recommended that you use contiguous ids for the algorithms :)
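
The two techniques can be sketched as follows. This is an illustration under assumptions, not dami's code: the class `IdIndexedModelSketch`, the 8-byte record layout (an `int` item id followed by a `float` rating) and the per-item mean used as the "model" are all made up for the example, and an in-memory `ByteBuffer` stands in for the converted file on disk.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not dami's implementation.
final class IdIndexedModelSketch {
    static final int RECORD_BYTES = Integer.BYTES + Float.BYTES;

    /** Converts (itemId, rating) records to a compact binary form, 8 bytes per record. */
    static ByteBuffer toBytes(int[] itemIds, float[] ratings) {
        ByteBuffer buf = ByteBuffer.allocate(itemIds.length * RECORD_BYTES);
        for (int i = 0; i < itemIds.length; i++) {
            buf.putInt(itemIds[i]);
            buf.putFloat(ratings[i]);
        }
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    /**
     * Streams the records once and keeps only per-item sums and counts in memory.
     * The item id is used directly as the array index, so no hash lookup is needed.
     */
    static double[] itemMeans(ByteBuffer data, int numItems) {
        double[] sum = new double[numItems];
        int[] count = new int[numItems];
        while (data.remaining() >= RECORD_BYTES) {
            int id = data.getInt();
            float rating = data.getFloat();
            sum[id] += rating;
            count[id]++;
        }
        double[] mean = new double[numItems];
        for (int id = 0; id < numItems; id++) {
            mean[id] = count[id] == 0 ? 0.0 : sum[id] / count[id];
        }
        return mean;
    }
}
```

The arrays must be sized to the largest id plus one, so every unused id in a sparse id range still costs memory. That is why contiguous ids matter.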

My Chinese blog : http://blog.csdn.net/lgnlgn
E-mail : gnliang10 [at] 126.com
