One of the things that blew my mind the first time I learned about it is bootstrapping. Although very counterintuitive at first glance, it makes perfect sense to estimate the accuracy of your sample estimates, given that you have a representative sample. In this blog post I explain how you can calculate confidence intervals for any difference in estimate between two samples, using the simpleboot R package.
First we load in the packages. I load in the simpleboot package for performing the two-sample bootstrap and I will use ggplot2 for demonstration.
Next, I generate two samples of data where we can perform the bootstrap on. I also plot their densities to show how these two samples differ from each other.
sample1 <- data.frame(sample = 'first sample', values = rnorm(1200,24,5)) sample2 <- data.frame(sample = 'second sample', values = rnorm(1600,20,5)) samples <- rbind(sample1,sample2) ggplot(samples,aes(x = values, col = sample, fill = sample, group = sample)) + geom_density(alpha=0.2)
Next up, I bootstrap the mean difference of the two samples. I resample (with replacement of course) 2500 times.
bootstrapped <- two.boot(sample1$values, sample2$values, mean, 2500)
The bootstrapped object is a list. The most interesting elements are the initial sample mean (t0), the sample differences (t), the amount of resamples (R), the data, the seed, etc.
In the next part of the code, I turn the vector of sample differences into a data frame so that I can use it in ggplot. I plot the differences as a histogram and add three vertical lines: the initial sample mean and the confidence interval boundaries (p = 0.05).
bootstrapped_mean_diff <- data.frame(bootstrapped$t) colnames(bootstrapped_mean_diff) <- 'mean_diffs' ggplot(bootstrapped_mean_diff, aes(x=mean_diffs)) + geom_histogram(bins=20, alpha=0.8) + geom_vline(xintercept = bootstrapped$t0, size = 1.5) + geom_vline(xintercept = quantile(bootstrapped_mean_diff$mean_diffs, 0.05), size = 1, linetype = 'dashed') + geom_vline(xintercept = quantile(bootstrapped_mean_diff$mean_diffs, 0.95), size = 1, linetype = 'dashed') + xlab('sample mean') + ylab('bootstrapped count')
Given this information we got from bootstrapping our samples, we can reject the null hypothesis that both means are the same. On a p = 0.05 level, we can see that both samples have a different mean.