This repository has been archived by the owner on Oct 8, 2019. It is now read-only.
Home
Makoto YUI edited this page Jan 7, 2016
Welcome to Hivemall, the scalable machine learning library for Hive.
- Installation
- Installing Hivemall UDFs as permanent functions
- Input formats for training (Please read this!)
- Using explicit addBias() for better prediction
- Use rand_amplify() for better prediction results
- Building Real-time Prediction by Integrating with an RDBMS
- List of Supported Algorithms for each Hivemall Version
- Ensemble learning for stable prediction
- Iterative training using distributed cache
  - #1 Logistic Regression on a9a dataset
  - #2 Classification using Confidence Weight on news20 binary/multiclass dataset
- Mixing models for better prediction convergence (MIX server) [experimental]
- Run Hivemall on Amazon Elastic MapReduce
- Hivemall on Pig [experimental]
- Hivemall on Spark [experimental]
- Adding rowid for each row
- Hadoop tuning for Hivemall
- Compressing a large training table
- Efficient Top-K query processing on Hive using Hivemall
- OutOfMemoryError in training
- SemanticException Generate Map Join Task Error: Cannot serialize object
- Asterisk argument for UDTF does not work
- The number of mappers is less than input splits in Hadoop 2.x
- Map-side Join causes ClassCastException (LazyBinaryArray cannot be cast to [Ljava.lang.Object;) on Tez
- Vectorize Features
- Quantify non-number features
- Polynomial Feature for Non-Linear Regression/Classification (a.k.a. Feature Pairing)