Understanding AdaBoost

The power of learning from your mistakes

R
ML
Author

Changjun Lee

Published

March 29, 2023

Overview

AdaBoost (Adaptive Boosting) is an ensemble learning algorithm that combines multiple “weak” classifiers to create a “strong” classifier. A weak classifier is one that performs only slightly better than random guessing (i.e., its accuracy is just above 50% for binary classification). In contrast, a strong classifier is one that performs well on the classification task.

The basic idea behind AdaBoost is to iteratively train a sequence of weak classifiers, and then combine their predictions using a weighted majority vote to obtain a strong classifier. In each iteration, the algorithm assigns higher weights to the misclassified samples, so that the subsequent classifiers focus more on the difficult samples.

Algorithm Steps

Here are the steps of the AdaBoost algorithm:

  1. Initialize the sample weights \(w_i\) to 1/n, where n is the number of samples.

  2. For t in 1:T, where T is the number of iterations:

  • Train a weak classifier \(h_t(x)\) on the training data using the current weights.

  • Calculate the error rate \(ε_t\) of \(h_t(x)\) on the training data. The error rate is defined as the weighted sum of the misclassified samples:

\[ ε_t = \sum_i w_i \, I(y_i \neq h_t(x_i)) \]

where \(y_i\) is the true class label of sample i, \(h_t(x_i)\) is the predicted class label of \(h_t(x)\) for sample i, and \(I()\) is the indicator function that returns 1 if the argument is true and 0 otherwise.

  • Calculate the weight \(α_t\) of \(h_t(x)\) as \(α_t = \log\left(\frac{1 - ε_t}{ε_t}\right)\). The weight \(α_t\) measures the “importance” of the weak classifier \(h_t(x)\) in the ensemble. The weight is larger for classifiers that perform well (i.e., have a low error rate) and smaller for classifiers that perform poorly (i.e., have a high error rate).
Note: \(ε_t\) must be strictly less than 0.5 to ensure that \(α_t\) is positive.

  • Update the sample weights as \(w_i = w_i \times \exp(-α_t \, y_i \, h_t(x_i))\). The update gives higher weight to the misclassified samples and lower weight to the correctly classified samples; written out case by case, it is equivalent to:

if y_i = h_t(x_i), then w_i = w_i * exp(-α_t)

if y_i ≠ h_t(x_i), then w_i = w_i * exp(α_t)

  • Normalize the weights so that they sum to 1. The normalization ensures that the weights are valid probability distributions.
  3. Combine the weak classifiers using the weighted majority vote rule to obtain the final prediction. The final prediction is given by:

\[ H(x) = \operatorname{sign}\!\left(\sum_t α_t \, h_t(x)\right), \]

where sign() is the sign function that returns -1 for negative values and 1 for positive values.
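
To make these steps concrete, here is a minimal from-scratch sketch in R that follows the recipe above, using one-split rpart trees (decision stumps) as the weak classifiers. The function names adaboost_train() and adaboost_predict() are made up for this illustration, and the labels are assumed to be coded as -1/+1:

library(rpart)

# Minimal AdaBoost sketch: X is a data frame of predictors,
# y is a vector of -1/+1 labels, rounds is the number of boosting rounds (T in the text)
adaboost_train <- function(X, y, rounds = 20) {
  n <- nrow(X)
  w <- rep(1 / n, n)                        # step 1: uniform sample weights
  df <- data.frame(X, y = factor(y))
  stumps <- vector("list", rounds)
  alphas <- numeric(rounds)
  for (t in seq_len(rounds)) {
    # train a weak classifier (a one-split tree) with the current weights
    fit <- rpart(y ~ ., data = df, weights = w,
                 control = rpart.control(maxdepth = 1))
    pred <- ifelse(predict(fit, df, type = "class") == "1", 1, -1)
    eps <- sum(w * (pred != y))             # weighted error rate
    if (eps >= 0.5) break                   # weak-learner assumption violated
    eps <- max(eps, 1e-10)                  # avoid log(Inf) when the stump is perfect
    alpha <- log((1 - eps) / eps)           # classifier weight
    w <- w * exp(-alpha * y * pred)         # up-weight mistakes, down-weight hits
    w <- w / sum(w)                         # normalize to a probability distribution
    stumps[[t]] <- fit
    alphas[t] <- alpha
  }
  list(stumps = stumps, alphas = alphas)
}

adaboost_predict <- function(model, X) {
  score <- rep(0, nrow(X))
  for (t in seq_along(model$stumps)) {
    if (is.null(model$stumps[[t]])) next    # skip rounds that were never run
    pred <- ifelse(predict(model$stumps[[t]], X, type = "class") == "1", 1, -1)
    score <- score + model$alphas[t] * pred
  }
  sign(score)                               # weighted majority vote
}

On an easy problem such as setosa versus the rest (used in the next section), the very first stump is already perfect, so the later rounds contribute little; the weight updates only become interesting when the weak classifiers actually make mistakes.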

Implementing AdaBoost in R

To implement AdaBoost in R, we can use the adabag package, which provides an implementation of the algorithm. Here’s an example of how to use adabag to train an AdaBoost classifier on the iris dataset:

library(adabag)
Loading required package: rpart
Loading required package: caret
Loading required package: ggplot2
Loading required package: lattice
Loading required package: foreach
Loading required package: doParallel
Loading required package: iterators
Loading required package: parallel
# Load the iris dataset
data(iris)

# Convert the species to a binary variable
iris$Species <- as.factor(ifelse(iris$Species == "setosa", 1, 0))

# Split the dataset into training and testing sets
train_idx <- sample(1:nrow(iris), size = 100, replace = FALSE)
train_data <- iris[train_idx, ]
test_data <- iris[-train_idx, ]

# Train an AdaBoost classifier with 50 iterations
ada_model <- boosting(Species ~ ., 
                      data = train_data, 
                      boos = TRUE, 
                      mfinal = 50)


# Make predictions on the testing data
pred <- predict.boosting(ada_model, newdata = test_data)

# Calculate the accuracy of the classifier
acc <- sum(pred$class == test_data$Species) / nrow(test_data)
print(paste0("Accuracy: ", acc))
[1] "Accuracy: 1"

In this example, we first load the adabag package and the iris dataset. We then convert the Species variable to a binary factor (1 for “setosa” and 0 for “versicolor” and “virginica”). We split the dataset into a training set (100 samples) and a testing set (50 samples).

Next, we train an AdaBoost classifier with 50 iterations using the boosting() function from adabag. We specify the formula (Species ~ .) and the training data (train_data), and set boos = TRUE so that each iteration draws a bootstrap sample of the training set using the current observation weights.

We then make predictions on the testing data using the predict.boosting() function, and calculate the accuracy of the classifier by comparing the predicted class labels to the true class labels. Because setosa is linearly separable from the other two species, a test accuracy of 1 is expected here.
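
The object returned by predict.boosting() also carries a confusion matrix and an error rate, which can save computing the accuracy by hand; if the component names below do not match your version of adabag, check ?predict.boosting:

# Confusion matrix and error rate reported by predict.boosting()
pred$confusion
pred$error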

You can modify this example by changing the number of iterations (mfinal) or the dataset to fit your specific needs. Additionally, you can compare AdaBoost’s performance to that of a single decision tree or another classifier such as logistic regression.
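
As a further variation, the base trees can be restricted to decision stumps and the test error can be tracked across boosting rounds. The sketch below assumes the control argument of boosting() (which adabag passes through to rpart.control()) and the errorevol() helper from the same package; treat the exact arguments as something to verify against the package documentation:

# Restrict the weak learners to one-split trees ("stumps")
stump_model <- boosting(Species ~ .,
                        data = train_data,
                        boos = TRUE,
                        mfinal = 50,
                        control = rpart.control(maxdepth = 1))

# Track how the test error evolves as weak classifiers are added
err <- errorevol(stump_model, newdata = test_data)
plot(err$error, type = "l",
     xlab = "Boosting iteration", ylab = "Test error")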

Similarities & Differences with Random Forest

Similarities

  • Both Random Forest and AdaBoost are ensemble learning algorithms that combine multiple “weak” models to create a “strong” model.

  • Both algorithms train each base model on a modified version of the training data (Random Forest uses bootstrap samples, while AdaBoost uses sample weights or weighted bootstrap samples), which helps to improve the generalization performance of the ensemble.

  • Both algorithms are widely used in machine learning and can be applied to a wide range of classification and regression tasks.

Differences

  • Random Forest combines multiple decision trees, each trained on a different subset of the features and samples, and uses a majority vote to make predictions. In contrast, AdaBoost combines multiple weak models, with each model trained on the same dataset but with different weights assigned to the samples.

  • Random Forest places equal weight on all the samples, while AdaBoost assigns higher weights to the misclassified samples in each iteration, so that the subsequent models focus more on the difficult samples.

  • Random Forest uses a simple majority vote to make predictions, while AdaBoost combines the predictions of the weak models using weighted majority vote, with each model weighted by its importance in the ensemble.

  • Random Forest can handle a wide range of datasets and is less sensitive to outliers and noise, while AdaBoost is more sensitive to noisy and unbalanced datasets and may require more data preprocessing.

  • Random Forest typically performs well with large feature sets and high-dimensional data, while AdaBoost may require careful feature selection or dimensionality reduction to prevent overfitting and improve performance.
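
To see the contrast in practice, both methods can be fit on the same train/test split from the earlier example. The snippet below assumes the randomForest package, which is not used elsewhere in this post, and simply compares test accuracy:

library(randomForest)

# Random Forest: bootstrap samples plus random feature subsets, simple majority vote
rf_model <- randomForest(Species ~ ., data = train_data, ntree = 500)
rf_pred  <- predict(rf_model, newdata = test_data)
rf_acc   <- mean(rf_pred == test_data$Species)

# AdaBoost accuracy from the earlier example, for comparison
ada_acc <- sum(pred$class == test_data$Species) / nrow(test_data)
print(paste0("Random Forest: ", rf_acc, " | AdaBoost: ", ada_acc))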

Summary

The AdaBoost algorithm is a powerful ensemble learning algorithm that can improve the performance of weak classifiers by combining their predictions. It works by iteratively training a sequence of weak classifiers and then combining their predictions using a weighted majority vote to obtain a strong classifier. In each iteration, the weight updates and normalization ensure that the subsequent classifiers focus more on the difficult (misclassified) samples, and the final prediction is based on the weighted majority vote rule, which gives more weight to the more accurate weak classifiers.

Overall, AdaBoost is a powerful and widely used algorithm in machine learning, particularly for classification problems. It is relatively simple to implement and can be applied to a wide range of classification tasks, although, as noted above, it can be sensitive to noisy or heavily unbalanced data and may need extra preprocessing in those settings.