
Meta Algorithms


Introduction

Meta-algorithms are approaches that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), decrease bias (boosting), or improve predictive force (stacking, also called ensembling). This wiki page covers only bagging and boosting.

Here is a quick comparison of the three approaches:

|                                    | Bagging            | Boosting                                    | Stacking            |
|------------------------------------|--------------------|---------------------------------------------|---------------------|
| Partition of the data into subsets | Random             | Misclassified samples get higher preference | Various             |
| Goal to achieve                    | Minimize variance  | Increase predictive force                   | Both                |
| Methods where this is used         | Random subspace    | Gradient descent                            | Blending            |
| Function to combine single models  | (Weighted) average | Weighted majority vote                      | Logistic regression |

Bagging

Bagging decreases the variance of your prediction by generating additional training data from your original dataset: sampling with replacement (the bootstrap) produces multisets of the same size as the original data. Increasing the training set this way cannot improve the model's predictive force; it only decreases the variance, tuning the prediction more narrowly toward the expected outcome.

If you have a very unreliable model, e.g. a decision tree, bagging makes the result more robust by building several models on resampled versions of the data and combining them. Random Forest, for example, is a bagging algorithm applied to decision trees; it reduces variance and makes the model more stable.
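
As an illustration, here is a minimal bagging sketch using scikit-learn; the dataset, number of estimators, and other settings are arbitrary choices for the example, not part of the original page:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# toy dataset (illustrative only)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# a single decision tree: a high-variance base learner
tree = DecisionTreeClassifier(random_state=0)

# bagging: fit many trees on bootstrap resamples and average their votes
bagged = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                           n_estimators=100, random_state=0)

# random forest: bagging of decision trees plus random feature subsets
forest = RandomForestClassifier(n_estimators=100, random_state=0)

for name, model in [('single tree', tree), ('bagging', bagged), ('random forest', forest)]:
    scores = cross_val_score(model, X, y, cv=5)
    print('%-13s accuracy: %.3f' % (name, scores.mean()))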

Boosting

A two-step approach: first use subsets of the original data to produce a series of averagely performing models, then boost their performance by combining them with a particular cost function (e.g. a majority vote). Unlike bagging, in classical boosting the subset creation is not random but depends on the performance of the previous models: each new subset contains the elements that were misclassified by the previous models.

Boosting reduces variance by combining multiple models (as bagging does), and it also reduces bias by telling each subsequent model which errors the previous models made.

  • AdaBoost: tells subsequent models to punish more heavily the observations that the previous models got wrong
  • Gradient boosting: trains each subsequent model on the residuals (the differences between the predicted and true values), e.g. XGBoost
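
To make the residual-fitting idea concrete, here is a minimal hand-rolled gradient-boosting sketch for squared-error regression; the toy data, learning rate, and tree depth are illustrative assumptions, not part of the original page:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# toy 1-D regression problem (illustrative only)
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=300)

learning_rate = 0.1
pred = np.zeros_like(y)          # start from a constant (zero) prediction
trees = []

for _ in range(100):
    residuals = y - pred                          # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2)     # a weak learner
    tree.fit(X, residuals)                        # fit the next tree on the residuals
    pred += learning_rate * tree.predict(X)       # shrink its correction and add it
    trees.append(tree)

print('final training MSE: %.4f' % np.mean((y - pred) ** 2))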

Discussion of these Ensembles

  • To use these ensembles, your base learner must be weak.

If the base model already overfits the data, there won't be any residuals or errors for the subsequent models to build upon (the residuals will be zero).

  • Gradient boosting is attractive because it is easy to plug in different loss functions, even non-convex ones, as long as their derivatives can be computed (see the custom-objective sketch after this section).

  • Don't mix up random forest and gradient-boosted trees (e.g. XGBoost).

People sometimes confuse random forest and gradient-boosted trees just because both use decision trees, but they are two very different families of ensembles!
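
As a rough illustration of plugging in a loss function, here is a sketch of a custom objective passed to xgb.train; the synthetic data and hyperparameters are assumptions for the example, and the same pattern works for any loss whose first and second derivatives you can compute:

import numpy as np
import xgboost as xgb

# synthetic regression data (illustrative only)
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
dtrain = xgb.DMatrix(X, label=y)

# custom squared-error objective: gradient boosting only needs the first
# derivative (grad) and second derivative (hess) of the loss with respect
# to the current prediction, so any differentiable loss can be supplied
def squared_error_obj(preds, dtrain):
    labels = dtrain.get_label()
    grad = preds - labels              # d/dpred of 0.5 * (pred - label)^2
    hess = np.ones_like(preds)         # second derivative is constant
    return grad, hess

bst = xgb.train({'max_depth': 3, 'eta': 0.1}, dtrain,
                num_boost_round=50, obj=squared_error_obj)
print('training RMSE: %.4f' % np.sqrt(np.mean((bst.predict(dtrain) - y) ** 2)))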

XGBoost

XGBoost is a very fast, scalable implementation of gradient boosting.

Installation

Read the installation guide on the XGBoost GitHub, or follow this quick tutorial for Mac users on Python 3.6:

  1. Obtain gcc-7.x.x with Anaconda:
$ conda install -c anaconda gcc
  2. Clone the XGBoost repository:
$ git clone --recursive https://github.com/dmlc/xgboost
  3. Build XGBoost:
$ cd xgboost; cp make/config.mk ./config.mk; make -j4

Python Package Installation

  1. Install system-wide:
$ cd python-package; sudo python setup.py install
  2. Or, only set the environment variable PYTHONPATH to tell Python where to find the library:
$ export PYTHONPATH=~/xgboost/python-package
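
Either way, a quick check that Python can find the package (assuming the paths above):
$ python -c "import xgboost; print(xgboost.__version__)"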

Quick test

Try the following code from the XGBoost GitHub to test your installation; run it from the xgboost repository root, since the required data files ship with the repo:

import xgboost as xgb

# read in data
dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
dtest = xgb.DMatrix('demo/data/agaricus.txt.test')
# specify parameters via map
param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic' }
num_round = 2
bst = xgb.train(param, dtrain, num_round)
# make prediction
preds = bst.predict(dtest)
print(preds)
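
To sanity-check the output, the predicted probabilities can be compared against the test labels; this small addition is not part of the original snippet:

import numpy as np

# preds are probabilities from the binary:logistic objective; threshold at 0.5
labels = dtest.get_label()
accuracy = np.mean((preds > 0.5) == labels)
print('test accuracy: %.4f' % accuracy)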
