The bootstrap was introduced by Bradley Efron in 1979 as a method for assigning measures of accuracy to sample estimates. For example, suppose I only have 20 math test scores from one school. I can calculate the mean and variance of the sample, but how well do they reflect the mean and variance of the population (all the students)? When the population is much larger than the sample, a single estimate tells us little about its own accuracy, and the bootstrap can help with this problem.

The bootstrap works by treating inference of the true probability distribution J, given the original data, as being analogous to inference of the empirical distribution Ĵ, given the resampled data. The accuracy of inferences regarding Ĵ made using the resampled data can be assessed because we know Ĵ. If Ĵ is a reasonable approximation to J, then the quality of inference on J can in turn be inferred.
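As a concrete sketch of this plug-in idea (a minimal illustration of my own, not code from the animation package; the normal data and sample size here are made up):

```r
# Draw an original sample from a "true" distribution J
# (in practice J is unknown; we pretend here for illustration)
set.seed(1)
x <- rnorm(20, mean = 5, sd = 2)

# The empirical distribution J-hat puts mass 1/n on each observed value,
# so sampling from J-hat is just sampling x with replacement
x_star <- sample(x, size = length(x), replace = TRUE)

# Inference about J-hat from x_star mimics inference about J from x
mean(x)       # estimate of the mean of J
mean(x_star)  # one bootstrap replicate of that estimate
```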

The function boot.iid() in the animation package provides an illustration of bootstrapping for i.i.d. data.

This is a naive version of bootstrapping, but it may be useful for novices. In the top plot, the circles denote the original dataset, while the red sunflowers (probably) with leaves denote the points being resampled; the number of leaves indicates how many times each point was resampled, since the bootstrap samples with replacement. The bottom plot shows the distribution of the resampled statistic (the sample mean by default). The whole process illustrates the steps of resampling, computing the statistic, and plotting its distribution based on bootstrapping.

library(animation)
par(mar = c(1.5, 3, 1, 0.1), cex.lab = 0.8, cex.axis = 0.8, mgp = c(2, 
  0.5, 0), tcl = -0.3)
boot.iid()
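The resampling, statistic-computation, and plotting steps that boot.iid() animates can also be done by hand in a few lines of base R. A minimal sketch, bootstrapping the sample mean of some made-up uniform data (the data, seed, and number of replicates are assumptions for illustration):

```r
set.seed(42)
x <- runif(20)  # original data (hypothetical)
B <- 1000       # number of bootstrap replicates

# Resample with replacement and recompute the mean each time
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

# Distribution of the bootstrapped statistic
hist(boot_means, main = "Bootstrap distribution of the sample mean")
sd(boot_means)  # bootstrap estimate of the standard error of the mean
```

The histogram is the static counterpart of the animation's bottom plot, and sd(boot_means) is one common use of that distribution.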


Published

08 May 2013
