In this homework you will implement AdaBoost.
We will use the HC Temperature data set found on the course’s webpage.
It contains 130 data points. The label (1 and -1) will be the gender,
and the temperature and heartrate define the 2-dimensional point.
The hypothesis class is the set of axis-parallel rectangles for which the inside is positive and
the outside is negative.
Note that a rectangle can be defined by 2, 3 or 4 points.
Write a function called Rectangle, which given a set of labelled and weighted points,
finds a rectangle which minimizes the weighted error on the points,
that is the sum of weights of wrongly placed points.
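One possible way to organize the search is sketched below; it is not the required solution. It relies on the observation that the weighted error of a rectangle equals the total weight of the positive points minus the sum of label*weight over the points inside, so minimizing the error is the same as maximizing that signed sum. The lowercase names rectangle and predict, the tuple layout (x_lo, x_hi, y_lo, y_hi), and the choice to treat boundary points as inside are all assumptions made for this sketch.

```python
from itertools import combinations_with_replacement


def rectangle(points, labels, weights):
    """Return (rect, err) for an axis-parallel rectangle with minimum weighted error.

    rect is (x_lo, x_hi, y_lo, y_hi); points on the boundary count as inside.
    Candidate sides are restricted to coordinates that occur in the data.
    """
    xs = sorted({x for x, _ in points})
    # Labelling every point -1 costs the total weight of the positive points.
    pos_weight = sum(w for y, w in zip(labels, weights) if y == 1)
    best_rect, best_err = None, float("inf")
    for x_lo, x_hi in combinations_with_replacement(xs, 2):
        # Signed weight (label * weight) per distinct y value inside the x strip.
        strip = {}
        for (px, py), y, w in zip(points, labels, weights):
            if x_lo <= px <= x_hi:
                strip[py] = strip.get(py, 0.0) + y * w
        # Kadane-style scan: best contiguous y range ending at each y_hi.
        run, y_lo = 0.0, None
        for y_hi in sorted(strip):
            if run <= 0:
                run, y_lo = strip[y_hi], y_hi
            else:
                run += strip[y_hi]
            err = pos_weight - run
            if err < best_err:
                best_rect, best_err = (x_lo, x_hi, y_lo, y_hi), err
    return best_rect, best_err


def predict(rect, point):
    """Label a point +1 if it lies inside (or on) the rectangle, else -1."""
    x_lo, x_hi, y_lo, y_hi = rect
    inside = x_lo <= point[0] <= x_hi and y_lo <= point[1] <= y_hi
    return 1 if inside else -1
```

Grouping points by y value before the scan keeps points with identical coordinates together, which matters when measurements repeat. A plain brute force over all pairs of x bounds and y bounds gives the same answer and may be easier to verify by hand.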
Now write the AdaBoost algorithm on a training set of size n. The pseudocode is as follows:
Initialize each point weight to be 1/n: D_0(x_i) = 1/n
For each round t = 1, …, r:
1. Use Rectangle to find a rectangle with minimum weighted error ε under the weights D_{t-1},
and call this rectangle h_t
2. Compute the weight of h_t: α_t = 0.5 * ln((1 - ε) / ε)
3. Compute new weights for the points:
i. For an error on point x_i: D_t(x_i) = D_{t-1}(x_i) exp(α_t)
ii. Not an error on point x_i: D_t(x_i) = D_{t-1}(x_i) exp(-α_t)
4. Normalize these weights: D_t(x_i) = D_t(x_i) / ∑_j(D_t(x_j))
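The loop above might be sketched in Python as follows. This is an illustration under stated assumptions, not a reference solution: weak_learner is a hypothetical callable that returns a pair (h, eps), where h maps a point to +1 or -1 and eps is its weighted error (for example, a thin wrapper around your Rectangle function). The hypothesis weight uses the standard AdaBoost choice 0.5 * ln((1 - eps) / eps), and eps is clamped away from 0 and 1 so the logarithm stays finite; the clamp value is an arbitrary choice for this sketch.

```python
import math


def adaboost(points, labels, r, weak_learner):
    """Run r rounds of AdaBoost and return a list of (alpha, h) pairs.

    weak_learner(points, labels, weights) is assumed to return (h, eps):
    a callable hypothesis h(x) -> +1/-1 and its weighted error eps.
    """
    n = len(points)
    weights = [1.0 / n] * n  # D_0(x_i) = 1/n
    ensemble = []
    for _ in range(r):
        h, eps = weak_learner(points, labels, weights)
        # Assumption: clamp eps so a perfect hypothesis does not give infinite alpha.
        eps = min(max(eps, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - eps) / eps)
        # y * h(x) is -1 on an error and +1 otherwise, so this multiplies by
        # exp(alpha) on errors and exp(-alpha) on correctly labelled points.
        weights = [
            w * math.exp(-alpha * y * h(x))
            for x, y, w in zip(points, labels, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]  # normalize to sum to 1
        ensemble.append((alpha, h))
    return ensemble
```

A useful sanity check: after one reweighting, the hypothesis just chosen has weighted error exactly 1/2 under the new weights, so a learner that returns the same hypothesis again would receive weight 0.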
Run the algorithm 100 times for each of r=1,…,8.
For each run, randomly divide the points into 65 training points R and 65 test points T.
Then run AdaBoost on R, and after computing the final hypothesis, find its error on T.
Recall that the final hypothesis on each test point x in T is: H(x) = sign(∑_t α_t h_t(x)), where the sum runs over rounds t = 1, …, r.
Average the error for each r over the 100 runs, and print out this average error for each r.
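The evaluation procedure could be organized along the following lines. This is a rough sketch: the names final_hypothesis, error_rate and average_error are made up for illustration, train stands for any function that takes (points, labels, r) and returns a list of (alpha, h) pairs, the final hypothesis is taken to be the sign of the weighted vote as in standard AdaBoost, and a vote of exactly zero is labelled -1 here as an arbitrary tie-breaking assumption.

```python
import random


def final_hypothesis(ensemble, x):
    """Sign of the alpha-weighted vote of the hypotheses (a zero vote gives -1)."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score > 0 else -1


def error_rate(ensemble, points, labels):
    """Fraction of points that the final hypothesis labels incorrectly."""
    wrong = sum(final_hypothesis(ensemble, x) != y for x, y in zip(points, labels))
    return wrong / len(points)


def average_error(points, labels, r, train, runs=100, seed=0):
    """Average test error over random half/half splits of the data."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(points)
    half = n // 2
    total = 0.0
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        train_idx, test_idx = idx[:half], idx[half:]
        ensemble = train(
            [points[i] for i in train_idx], [labels[i] for i in train_idx], r
        )
        total += error_rate(
            ensemble, [points[i] for i in test_idx], [labels[i] for i in test_idx]
        )
    return total / runs
```

Calling average_error once for each r from 1 to 8 and printing the result gives the numbers asked for. With 130 points, n // 2 gives the 65/65 split.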
Now create a function called Circle, which given a set of labelled points with weights, finds a
circle which minimizes the weighted error on the points.
A circle is defined by two points – the center,
and another point whose distance from the center determines the radius.
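A possible search over such circles is sketched below, under assumptions the text does not spell out: the inside of the circle is taken to be positive and the outside negative (mirroring the rectangle class), points on the boundary count as inside, and the circle is returned as a center plus a squared radius. The names circle and predict_circle are hypothetical.

```python
def circle(points, labels, weights):
    """Return ((center, radius_squared), err) with minimum weighted error.

    Every data point is tried as the center, and every data point's distance
    from that center is tried as the radius.
    """
    # Labelling every point -1 costs the total weight of the positive points.
    pos_weight = sum(w for y, w in zip(labels, weights) if y == 1)
    best, best_err = None, float("inf")
    for cx, cy in points:
        # Sort all points by squared distance from the candidate center.
        by_dist = sorted(
            ((px - cx) ** 2 + (py - cy) ** 2, y * w)
            for (px, py), y, w in zip(points, labels, weights)
        )
        gain = 0.0  # sum of label * weight over the points inside so far
        for i, (d2, signed) in enumerate(by_dist):
            gain += signed
            # Points at equal distance enter the circle together.
            if i + 1 < len(by_dist) and by_dist[i + 1][0] == d2:
                continue
            err = pos_weight - gain
            if err < best_err:
                best, best_err = ((cx, cy), d2), err
    return best, best_err


def predict_circle(circ, point):
    """Label a point +1 if it lies inside (or on) the circle, else -1."""
    (cx, cy), r2 = circ
    d2 = (point[0] - cx) ** 2 + (point[1] - cy) ** 2
    return 1 if d2 <= r2 else -1
```

Working with squared distances avoids square roots and keeps the boundary comparison exact for the point that defines the radius.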
Run AdaBoost again as before, now on circles instead of rectangles, and print out the average error for
each r.