This repository has been archived by the owner on Oct 8, 2019. It is now read-only.
Makoto YUI edited this page Jan 7, 2016 · 121 revisions

Welcome to Hivemall, the scalable machine learning library for Hive.


Getting Started

Tips for Effective Hivemall

Advanced topics

General Hive/Hadoop tips

Troubleshooting


Feature Engineering

Feature Transformation

Evaluation

  1. Statistical evaluation of a prediction model

Dataset generation

  1. Classification/Logistic Regression

Binary Classification

a9a binary classification

  1. Logistic Regression
  2. Logistic Regression with Mini-batch Gradient Descent
  3. Iterative training using distributed cache

news20 binary classification

  1. Perceptron, Passive Aggressive
  2. CW, AROW, SCW
  3. AdaGradRDA, AdaGrad, AdaDelta

KDD2010a/b binary classification

  1. PA/CW/AROW/SCW
  2. AROW

Webspam binary classification

  1. PA1, AROW, SCW

Titanic survivor binary classification

  1. RandomForest

Multiclass Classification

news20 multiclass classification

  1. PA
  2. CW, AROW, SCW
  3. Ensemble learning
  4. One-vs-the-rest classifier

Iris dataset multiclass classification

  1. SCW
  2. RandomForest

Regression

E2006 tfidf regression

  1. Passive Aggressive, AROW

KDDCup 2012 track 2 CTR prediction

  1. Logistic Regression, Passive Aggressive
  2. Logistic Regression with Amplifier
  3. AdaGrad, AdaDelta

Recommendation

News20 multiclass related article recommendation

  1. LSH/Minhash

MovieLens movie recommendation

  1. Matrix Factorization
  2. 10-fold Cross Validation (Matrix Factorization)
  3. Factorization Machine

Nearest Neighbor

News20 multiclass similar article search

  1. LSH/Minhash and Brute-Force Search
  2. kNN search using b-Bits Minhash

Anomaly Detection

  1. Outlier Detection using Local Outlier Factor (LOF)

Natural Language Processing

  1. English/Japanese Text Tokenizer