R7. Topics in bootstrappingΒΆ
This recitation was prepared by Cecelia Andrews with help from Justin Bois. Some modifications were made by Sanjana Kulkarni.
C. elegans egg cross-sectional area dataset download
Phone repair times dataset download
[1]:
# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
cmd = "pip install --upgrade iqplot bebi103 colorcet watermark"
process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = process.communicate()
data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"
else:
data_path = "../data/"
# ------------------------------
import pandas as pd
import numpy as np
import numba
import scipy.stats as st
import bebi103
import iqplot
import timeit
import colorcet
import bokeh.io
import bokeh.plotting
import bokeh.layouts
bokeh.io.output_notebook()
/Users/bois/opt/anaconda3/lib/python3.8/site-packages/arviz/__init__.py:317: UserWarning: Trying to register the cmap 'cet_gray' which already exists.
register_cmap("cet_" + name, cmap=cmap)
/Users/bois/opt/anaconda3/lib/python3.8/site-packages/arviz/__init__.py:317: UserWarning: Trying to register the cmap 'cet_gray_r' which already exists.
register_cmap("cet_" + name, cmap=cmap)
This this recitation, we will cover
Review of bootstrapping concepts from Lesson 16.
BCa Confidence Intervals
Studentized Bootstrap
An excellent reference, if dated, for bootstrapping calculations is Efron and Tibshiraniβs An Introduction to the Bootstrap, 1994 (Amazon).
1. ReviewΒΆ
The plug-in principleΒΆ
This was covered in lesson 12.
Simplified definition: Say we have a sample of repeated measurements, and we know there is a generative distribution for these measurements. But, we donβt know what that generative distribution is. We can estimate a parameter, \(\theta = T(F)\), of the generative distribution by computing the statistical functional using the empirical CDF rather than the actual CDF.
Call the CDF of our generative distribution \(F(x)\). We want to compute a statistical functional, \(T(F)\), of the CDF. \(T(F)\) could be the mean, variance, median, etc. The parameter \(\theta\) is defined by the functional, so \(\theta = T(F)\).
However, we only have a sample from the generative distribution (our egg length measurements). We can compute the ECDF of our measurements, called \(\hat{F}\). The plug-in principle simply states we can estimate our parameter of interest, \(\theta\), by calculating:
\begin{align} \hat{\theta} &= T(\hat{F}) \end{align}
Essentially, the plug-in principle says we can estimate the parameter \(\theta\) by computing the statistical functional on the empirical CDF rather than the actual CDF.
Confidence IntervalsΒΆ
These were covered in lesson 12.
βIf an experiment is repeated over and over again, the estimate I compute for a parameter, \(\hat{\theta}\) , will lie between the bounds of the 95% confidence interval for 95% of the experiments.β This is the frequentist interpretation. The center of the interval is treated as a fixed value (population test statistic), and the bounds are random variables that change based on the simulation.
In Bayesian statistics, the analogous interval is called a credible interval. The center is the random variable, while the bounds are treated as fixed. Therefore, the interpretation is that the population statistic has a 95% chance of lying within a 95% credible interval.
BootstrappingΒΆ
Also covered in Lesson 16.
The basic idea of bootstrapping is to simulate many repetitions of the experiment. It is a type of resampling, in which you draw samples from your sample to generate a bootstrap distribution. For example, if you have a sample of 10 data points, you can samples by randomly choosing from the original set of 10 points, with replacement.
Bootstrapping allows us to get a better plug-in estimate for the statistical functional as if we did the experiment many times. To see the power of bootstrapping in action, letβs consider the example given in Efron and Tibshiraniβs An Introduction to the Bootstrap.
The CSV file law_school_data.csv
, available here, contains the population of measurements (LSAT, GPA) for N=82 American law schools for the entering classes of 1973. LSAT is the average score of the class on the LSAT (national law exam), and GPA is the average undergraduate GPA. In this example, we have a census, so we know the actual values of the summary statistics we may want to compute. Here we load in the data
and view the first 10 observations.
[2]:
df_law = pd.read_csv(os.path.join(data_path, 'law_school_data.csv'))
df_law.head(10)
[2]:
School | LSAT | GPA | |
---|---|---|---|
0 | 1 | 622 | 3.23 |
1 | 2 | 542 | 2.83 |
2 | 3 | 579 | 3.24 |
3 | 4 | 653 | 3.12 |
4 | 5 | 606 | 3.09 |
5 | 6 | 576 | 3.39 |
6 | 7 | 620 | 3.10 |
7 | 8 | 615 | 3.40 |
8 | 9 | 553 | 2.97 |
9 | 10 | 607 | 2.91 |
Now, letβs pretend we are a researcher interested in estimating the mean LSAT and GPA for American law schools. It would be difficult to survey every law school in the USA and collect LSAT and GAP scores of each class, so in practice, researchers collect a sample, a subset of the population, and make inferences about the population. Say we collect a random sample of 15 law schools and want to estimate the population mean from our sample. We will simulate this by drawing a random sample of size n=15:
[3]:
# Set the random seed
rg = np.random.default_rng(seed=3252)
# Draw 15 random samples and make a new dataframe
indices = rg.choice(np.arange(82), replace=False, size=15)
df_sample = df_law.loc[indices]
df_sample
[3]:
School | LSAT | GPA | |
---|---|---|---|
21 | 22 | 614 | 3.19 |
51 | 52 | 580 | 3.07 |
11 | 12 | 596 | 3.24 |
17 | 18 | 646 | 3.47 |
2 | 3 | 579 | 3.24 |
10 | 11 | 558 | 3.11 |
33 | 34 | 591 | 3.02 |
3 | 4 | 653 | 3.12 |
13 | 14 | 581 | 3.22 |
36 | 37 | 615 | 3.37 |
34 | 35 | 578 | 3.03 |
52 | 53 | 594 | 2.96 |
16 | 17 | 599 | 3.23 |
70 | 71 | 570 | 3.01 |
49 | 50 | 555 | 3.00 |
How does this sample compare to the population? Letβs compare the population mean LSAT and GPA to the sample means:
[4]:
# Print means
print(f'Sample mean LSAT: {np.mean(df_sample.LSAT)}')
print(f'Sample mean GPA: {np.mean(df_sample.GPA)}')
print(f'Population mean LSAT: {np.mean(df_law.LSAT)}')
print(f'Population mean GPA: {np.mean(df_law.GPA)}')
Sample mean LSAT: 593.9333333333333
Sample mean GPA: 3.1519999999999997
Population mean LSAT: 597.5487804878048
Population mean GPA: 3.1347560975609756
The accuracy of the mean LSAT and GPA of the small sample will vary based on the sample you draw, but they can be fairly far away from the mean of the population such that the standard error does not capture the actual mean.
Letβs compare the ECDFs as well. To overlay the sample and population ECDFs, we will use manually compute the ECDFs of the samples and populations and overlay them on the population plots. This is easier to do since the sample and population are of different sizes.
[5]:
def compute_ecdf(data):
"""Returns ECDF values of data"""
return np.arange(1, len(data) + 1) / len(data)
# Initialize two bokeh plots using a list comprehension
labels = ["LSAT", "GPA"]
p_LSAT, p_GPA = [
bokeh.plotting.figure(
height=300, width=400, x_axis_label=labels[i], y_axis_label="ECDF"
)
for i in range(2)
]
# Compute and plot the ECDFs for LSAT
p_LSAT.circle(
np.sort(df_law.LSAT), compute_ecdf(df_law.LSAT), legend_label="Population"
)
p_LSAT.circle(
np.sort(df_sample.LSAT),
compute_ecdf(df_sample.LSAT),
legend_label="Sample",
color="darkorange",
size=6,
)
# Compute and plot the ECDFs for GPA
p_GPA.circle(
np.sort(df_law.GPA), compute_ecdf(df_law.GPA), legend_label="Population",
)
p_GPA.circle(
np.sort(df_sample.GPA),
compute_ecdf(df_sample.GPA),
legend_label="Sample",
color="darkorange",
size=6,
)
p_LSAT.legend.location = p_GPA.legend.location = "top_left"
bokeh.io.show(bokeh.layouts.row(p_LSAT, p_GPA))
The ECDFs match up fairly well, but if we zoom in on the 50th percentile we can see the medians may not overlap, and the means are probably off a bit, too. Letβs see if we can compute a confidence interval through bootstrapping. Weβll use the function draw_bs_sample()
to draw bootstrap samples of the same size as our sample (n=15), and compute the means of each to create 1000 replicates. Remember, our bootstrap samples are always the same size as the original sample because we want to
simulate repeating the same experiment.
[6]:
@numba.njit
def draw_bs_sample(data):
"""Draw a bootstrap sample from a 1D data set."""
return np.random.choice(data, size=len(data))
# Number of bootstrap replicates we want to draw
N = 1000
sample_lsat_mean_reps = np.empty(N)
sample_gpa_mean_reps = np.empty(N)
# Turn the sample LSAT and GPA scores into arrays for use with numba
sample_lsat = np.array(df_sample.LSAT)
sample_gpa = np.array(df_sample.GPA)
for i in range(N):
sample_lsat_mean_reps[i] = np.mean(draw_bs_sample(sample_lsat))
sample_gpa_mean_reps[i] = np.mean(draw_bs_sample(sample_gpa))
print(f"Sample mean LSAT: {np.mean(sample_lsat)}")
print(f"Sample mean GPA: {np.mean(sample_gpa)}")
print("\n")
print(f"Mean LSAT 95% C.I.: {np.percentile(sample_lsat_mean_reps, [2.5, 97.5])}")
print(f"Mean GPA 95% C.I.: {np.percentile(sample_gpa_mean_reps, [2.5, 97.5])}")
print("\n")
print(f"Population mean LSAT: {np.mean(df_law.LSAT)}")
print(f"Population mean GPA: {np.mean(df_law.GPA)}")
Sample mean LSAT: 593.9333333333333
Sample mean GPA: 3.1519999999999997
Mean LSAT 95% C.I.: [580.99333333 608.26666667]
Mean GPA 95% C.I.: [3.08596667 3.22136667]
Population mean LSAT: 597.5487804878048
Population mean GPA: 3.1347560975609756
The 95% confidence interval captures the true mean for both the LSAT and GPA data. Letβs compare the ECDFs again. We will create 500 bootstrap sample ECDFs and overlay them over the population ECDF. Note: if 500 samples takes too long to render on your machine, you can change the variable n
to something like 100. You will still see the same overall effect.
[7]:
# Initialize two bokeh plots using a list comprehension
labels = ["LSAT", "GPA"]
p_LSAT, p_GPA = [
bokeh.plotting.figure(
height=300, width=400, x_axis_label=labels[i], y_axis_label="ECDF"
)
for i in range(2)
]
n = 500
# Make n bootstrap samples and plot them
for i in range(n):
lsat = draw_bs_sample(sample_lsat)
gpa = draw_bs_sample(sample_gpa)
# Add semitransparent ECDFs to the plot
p_LSAT.circle(
np.sort(lsat),
compute_ecdf(lsat),
color=colorcet.b_glasbey_category10[0],
alpha=0.05,
)
p_GPA.circle(
np.sort(gpa),
compute_ecdf(gpa),
color=colorcet.b_glasbey_category10[0],
alpha=0.05,
)
p_LSAT.circle(
np.sort(df_law.LSAT),
compute_ecdf(df_law.LSAT),
color=colorcet.b_glasbey_category10[1],
alpha=2,
)
p_GPA.circle(
np.sort(df_law.GPA),
compute_ecdf(df_law.GPA),
color=colorcet.b_glasbey_category10[1],
alpha=2,
)
bokeh.io.show(bokeh.layouts.row(p_LSAT, p_GPA))
What do you notice about the bootstrapped ECDFs? Consider the approximate shape and mean.
The boostrapped samples are a pretty good approximation of the overall population distribution.
Now that weβve shown the power of bootstrapping, letβs explore some more bootstrapping methods! First weβll review pairs bootstrapping, which we learned about in class, then go into some new methods.
Pairs BootstrapΒΆ
Also covered in Lesson 16. See the lesson for a coding example.
We use pairs bootstrapping when we have paired data, and we will often want to compute the correlation. The important thing to remember with pairs bootstrapping is that we draw a random sample of pairs, not 2 random samples of each array. For example, if we have paired data in 2 arrays: \(\mathbf{x} = [x_1, x_2,\ldots, x_n]\) and \(\mathbf{y} = [y_1, y_2,\ldots, y_n]\), we must draw the same indices from both arrays so we end up with paired samples:
\begin{align} &\mathbf{\hat{x}} = [x_i, x_j, x_k,\ldots, x_n], \text{ and}\\[1em] &\mathbf{\hat{y}} = [y_i, y_j, y_k,\ldots, y_n]. \end{align}
Question: Why do we have to keep pairs together to estimate correlation? What result will you likely get for the correlation if your pairs get separated?
2. BCa Confidence IntervalsΒΆ
The BCa confidence interval is the bias-corrected accelerated bootstrap confidence interval. This concept was introduced by Bradley Efron. The BCa interval is designed to automatically create confidence intervals that adjust for underlying higher order effects. In simple terms, it corrects for bias and skew in the distribution of bootstrap estimates.
In this recitation, we will focus on the nonparametric case and show a coding example. If you want to dive into this concept even more, which I highly recommend, take a look at Efron, βBetter Bootstrap Confidence Intervals.β Journal of the American Statistical Association 82, no. 397 (1987): 171-85.
Letβs consider the C. elegans egg data example again. We have a dataset of repeated measurements of the areas C. elegans eggs. We know there is a generative distribution for this data, but we donβt know what it is. We want to compute a confidence interval of the mean. For the general case, weβll call the statistic we want to estimate \(\theta\). As we know, the standard confidence interval is given by:
\begin{align} \theta \in [\hat{\theta} + \hat{\sigma}z^{(\alpha)}, \hat{\theta} + \hat{\sigma}z^{(1 - \alpha)}] \tag{2.1} \end{align}
where \(\hat{\theta}\) is our sample estimate of \(\theta\), \(\hat{\sigma}\) is our estimate of the standard deviation of the distribution of \(\hat{\theta}\), and \(z^{(\alpha)}\) is the \(100 \times \alpha\) percentile point of a standard normal variate.
This standard confidence interval (Equation 2.1) is based on taking the asymptotic normal approximation
\begin{align} (\hat{\theta} - \theta)/\hat{\sigma} \sim N(0, 1) \tag{2.2} \end{align}
In our case, and in many cases, we assume there is a generative distribution, \(g\), for our data. Thus, we can achieve normality and constant standard error by considering the monotone transformation \(\hat{\phi} = g(\hat{\theta})\) and \(\phi = g(\theta)\) instead of \(\hat{\theta}\) and \(\theta\). We assume this transformation is both normalizing and variance stabilizing (defined momentarily). Through this method, we can now compute the confidence interval for \(\phi\) based on:
\begin{align} (\hat{\phi} - \phi)/\tau \sim N(-z_0, 1) \tag{2.3} \end{align}
where \(\tau\) is the constant standard error of \(\hat{\phi}\) and \(z_0\) is the bias constant. To get the confidence interval for \(\theta\), we take the inverse transformation \(\theta = g^{-1}(\phi)\). The great part: all of this is achieved automatically simply through bootstrapping! We do not need to know \(g\) or \(z_0\).
How can we improve on this? Thatβs where BCa comes in. Like we described previously, through bootstrapping alone we are assuming the transformation \(g\) is both normalizing and variance stabilizing. A variance stabilizing transformation is a transformation such that the variance is no longer related to the mean. For some distributions (ex. Binomial), the variance is dependent on the mean. A variance stabilizing transformation removes this dependence. But, since we do not know the transformation, we do not know for sure that it is variance stabilizing. In the BCa confidence interval, we only need to know that \(g\) is normalizing. This makes it more accurate in many cases!
The BCa incorporates some bias constant, \(z_0\), and some acceleration constant, \(a\), such that the BCa confidence interval is based on:
\begin{align} &(\hat{\phi} - \phi)/\tau \sim N(-z_0\sigma_\phi, \sigma_\phi^2),\\[1em] &\sigma_\phi = 1 + a\phi. \tag{2.4} \end{align}
We can determine \(a\) and \(z_0\) by resampling our data. Here, we are considering the nonparametric case. For the nonparametric case, the acceleration constant for the mean is given by
\begin{align} a = \frac{\sum_i (x_i - \bar{x})^3}{6\left(\sum_i (x_i - \bar{x})^2\right)^{3/2}}. \tag{2.5} \end{align}
We can compute \(z_0\) from our bootstrap replicates. The bias-correction constant, \(z_0\), is:
\begin{align} z_0 = \phi^{-1}(\hat{G}(\hat{\theta})) \tag{2.6} \end{align}
From the plug-in principle, we know that for the nonparametric case we can approximate the CDF \(\hat{G}(\hat{\theta})\) as the ECDF of our bootstrap replicates. So, \(\phi^{-1}(\hat{G}(\hat{\theta}))\) is given by the inverse ECDF of our bootstrap replicates.
This may all be a little confusing now, so letβs practice with the worm egg example! I will review how to compute each parameter as we go.
Letβs load in our worm egg area dataset, digitized from Harvey and Orbidansβs 2011 PLoS One paper.
[8]:
df_worms = pd.read_csv(os.path.join(data_path, 'c_elegans_egg_xa.csv'), comment='#')
df_worms.head()
[8]:
food | area (sq. um) | |
---|---|---|
0 | high | 1683 |
1 | high | 2061 |
2 | high | 1792 |
3 | high | 1852 |
4 | high | 2091 |
And we show an ECDF to get a feel for the data.
[9]:
p = iqplot.ecdf(data=df_worms, q="area (sq. um)", cats="food")
bokeh.io.show(p)
It looks like worms that received more food have smaller eggs. Any hypotheses for why that may be the case?
Weβll convert the data weβre interested into arrays, separated by how much food the worms received:
[10]:
fed_worms = df_worms.loc[df_worms['food'] == 'high']['area (sq. um)'].values
hungry_worms = df_worms.loc[df_worms['food'] == 'low']['area (sq. um)'].values
Here we write some helper functions to calculate the parameters for the BCa interval. We will combine them in our main function, compute_bca_ci()
.
[11]:
# Define "statfunction" as the function we want to apply to the replicates.
def compute_jackknife_reps(data, statfunction=np.mean):
"""Returns jackknife resampled replicates for
the given data and statistical function"""
# Set up empty array to store jackknife replicates
jack_reps = np.empty(len(data))
# For each observation in the dataset, compute the statistical function on the sample
# with that observation removed
for i in range(len(data)):
jack_sample = np.delete(data, i)
jack_reps[i] = statfunction(jack_sample)
return jack_reps
def compute_a(jack_reps):
"""Returns the acceleration constant a"""
mean = np.mean(jack_reps)
return (1 / 6) * np.divide(
np.sum(mean - jack_reps) ** 3, (np.sum(mean - jack_reps) ** 2) ** (3 / 2)
)
def bootstrap_replicates(data, n_reps=1000, statfunction=np.mean):
"""Computes n_reps number of bootstrap replicates for
given data and statistical function"""
boot_reps = np.empty(n_reps)
for i in range(n_reps):
boot_reps[i] = statfunction(draw_bs_sample(data))
return boot_reps
def compute_z0(data, boot_reps, statfunction=np.mean):
"""Computes z0 for given data and statistical function"""
s = statfunction(data)
return st.norm.ppf(np.sum(boot_reps < s) / len(boot_reps))
def compute_bca_ci(data, alpha_level, n_reps=1000, statfunction=np.mean):
"""Returns BCa confidence interval for given data at given alpha level"""
# Compute bootstrap and jackknife replicates
boot_reps = bootstrap_replicates(data, n_reps, statfunction)
jack_reps = compute_jackknife_reps(data, statfunction)
# Compute a and z0
a = compute_a(jack_reps)
z0 = compute_z0(data, boot_reps, statfunction)
# Compute confidence interval indices
alphas = np.array([alpha_level / 2.0, 1 - alpha_level / 2.0])
zs = z0 + st.norm.ppf(alphas).reshape(alphas.shape + (1,) * z0.ndim)
avals = st.norm.cdf(z0 + zs / (1 - a * zs))
ints = np.round((len(boot_reps) - 1) * avals)
ints = np.nan_to_num(ints).astype("int")
# Compute confidence interval
boot_reps = np.sort(boot_reps)
ci_low = boot_reps[ints[0]]
ci_high = boot_reps[ints[1]]
return (ci_low, ci_high)
We can now compute the BCa confidence intervals.
[12]:
print(f"Fed worms mean egg area 95% C.I.: {compute_bca_ci(fed_worms, 0.05)}")
print(f"Hungry worms mean egg area 95% C.I.: {compute_bca_ci(hungry_worms, 0.05)}")
Fed worms mean egg area 95% C.I.: (1791.4545454545455, 1869.7954545454545)
Hungry worms mean egg area 95% C.I.: (2068.245614035088, 2162.964912280702)
3. Studentized BootstrapΒΆ
The Studentized bootstrap error aims to reduce coverage error of our resampling. It is a good idea to use Studentized bootstrapping to approximate your statistic of interest when you suspect the distribution of your sample may not be close to the actual population distribution. The Studentized bootstrapping method works because even when this is the case, the distribution of the sample statistic divided by estimated standard error could be close to the actual population statistic divided by the population standard error. This the same principle employed in the Student-t test.
The Studentized bootstrap involves taking inner and outer bootstrap samples. The algorithm is as follows:
Compute \(\hat{\theta}\), the sample test statistic
A large number of times (number of reps β1000), draw a bootstrap sample. Call this the outer sample:
Compute the statistical functional on the sample
Resample your bootstrap sample β50 to 100 times. Call this the inner sample:
Compute the statistical function on the resample
For your inner sample replicates, compute the standard deviation
Compute your statistic of interest from your replicates: t = (outer sample replicates) / (standard deviation of the inner sample replicates)
Compute
s
= standard deviation of your outer sample replicatesMultiply
t
bys
Compute the confidence interval from the \(\alpha/2\) and \((1 - \alpha)/2\) percentiles of t*s
Letβs try this out on a dataset of the repair times of landline phones in New York City. The New York Public Utilities Commission monitors the response time for repairing landline phones in the state, and repair times are dependent on the time of year, type of repair, and geographical location. The data here are for one class of repairs at one time period for a specific Incumbent Local Exchange Carrier, which is a telephone company which once held the regional monopoly on landline service.
This times come from a particular time and region of New York City, so we expect them to not be representative of the population of all phone repair times ever recorded in New York City. First, we read in the data (which I obtained from the online textbook for UC Berkeleyβs Data 100 class found here).
[13]:
df_times = pd.read_csv(os.path.join(data_path, "ilec.csv"))
p = iqplot.ecdf(data=df_times, q="17.5", x_axis_label="Phone Repair Times")
bokeh.io.show(p)
We see that the distribution is heavily positively skewed, making this a good dataset from which to draw studentized boostrap replicates. For computing the confidence intervals of the means, weβll want to convert the data of interest into arrays, and take a look at the sample mean:
[14]:
times = np.array(df_times["17.5"])
print(f"Mean average repair time: {np.mean(times)} hours")
Mean average repair time: 8.406145520144317 hours
Now, weβll employ the Studentized bootstrap to compute the confidence interval of the average landline repair time.
[15]:
def studentized_bootstrap_ci(
data,
statfunction=np.mean,
n_bs_samples=1000,
n_inner_samples=100,
ptiles=(2.5, 97.5),
):
thetas = np.empty(n_bs_samples)
ts = np.empty(n_bs_samples)
# Outer sample (regular bootstrap sample)
for i in range(n_bs_samples):
sample = draw_bs_sample(data)
theta = statfunction(sample)
thetas[i] = theta
# Inner sample
resample_stats = np.empty(n_inner_samples)
for j in range(n_inner_samples):
resample = draw_bs_sample(sample)
resample_stats[j] = statfunction(resample)
# Compute statistic of interest
ts[i] = np.sum(resample_stats - theta) / np.std(resample_stats)
ts = np.sort(ts)
s = np.std(thetas)
p1, p2 = np.percentile(s * ts, ptiles)
ci_high = np.mean(data) - (s * p1)
ci_low = np.mean(data) - (s * p2)
# Return the confidence interval
return (ci_low, ci_high)
To time small processes, I use the timeit
module, which has a method called default_timer()
that initializes a timer in seconds when called. The time is constantly counting up, and you can store the time at any given instant in a variable.
[16]:
# Get the start time before the computation
start = timeit.default_timer()
studentized_ci = studentized_bootstrap_ci(times)
# Get the end time after
end = timeit.default_timer()
print(f"Average Repair Time 95% C.I.: {studentized_ci}")
Average Repair Time 95% C.I.: (6.043505027603063, 10.859360255642642)
We check how long the computation took for 1000 outer bootstrap samples and 100 inner samples. We will also compare this to the time required to get 1000 bootstrap replicates using the percentile method. The studentized method is longer both because of the inner samples and the computations themselves.
[17]:
print(f"Studentized confidence interval took {end-start} seconds to run.")
Studentized confidence interval took 2.3449103959999995 seconds to run.
For comparison, we can compute the non-Studentized confidence intervals.
[18]:
@numba.njit
def draw_bs_reps_mean(data, size=1):
"""Draw boostrap replicates of the mean from 1D data set."""
out = np.empty(size)
for i in range(size):
out[i] = np.mean(draw_bs_sample(data))
return out
# Get the start time before the computation
start = timeit.default_timer()
times_bs = draw_bs_reps_mean(times, size=N)
times_ci = np.percentile(times_bs, [2.5, 97.5])
# Get the end time after
end = timeit.default_timer()
print(f"Average Repair Time 95% C.I.: {times_ci}")
Average Repair Time 95% C.I.: [7.73403217 9.18199218]
[19]:
print(f"Percentile confidence interval took {end-start} seconds to run.")
Percentile confidence interval took 0.23111390500000084 seconds to run.
The non-Studentized confidence interval is smaller, and we can more easily see the difference in the following plot. We also note that the studentized computation takes longer than the percentile bootstrap even with just 1000 bootstrap samples.
[20]:
percentile_dict = {
"estimate": np.mean(times),
"conf_int": times_ci,
"label": "percentile",
}
studentized_dict = {
"estimate": np.mean(times),
"conf_int": studentized_ci,
"label": "studentized",
}
kwargs = {
"title": "95% Confidence Intervals",
"x_axis_label": "Average Phone Repair Time (hours)",
}
p = bebi103.viz.confints(
[percentile_dict, studentized_dict],
palette=colorcet.b_glasbey_category10[0],
**kwargs
)
bokeh.io.show(p)
Studentized confidence intervals are most useful for very small datasets (so less likely to be representative of the population) or datasets that are highly skewed. The extra (inner) sampling attempts to correct for the lack of information with a more rigorous treatment.
The main drawback is that it takes much longer than the percentile bootstrapping method. For datasets that are not skewed or very small, the two methods should give approximately equal confidence intervals. If you like, you can repeat this analysis on the bees and pesticides data in lesson 16. This sample distribution is closer to the population distribution, so the two methods give similar confidence intervals.
Computing environmentΒΆ
[21]:
%load_ext watermark
%watermark -v -p numpy,scipy,numba,bokeh,iqplot,bebi103,colorcet,jupyterlab
Python implementation: CPython
Python version : 3.8.11
IPython version : 7.26.0
numpy : 1.20.3
scipy : 1.6.2
numba : 0.53.1
bokeh : 2.3.3
iqplot : 0.2.3
bebi103 : 0.1.8
colorcet : 2.0.6
jupyterlab: 3.1.7