R7. Topics in bootstrappingΒΆ

This recitation was prepared by Cecelia Andrews with help from Justin Bois. Some modifications were made by Sanjana Kulkarni.

Law school dataset download

C. elegans egg cross-sectional area dataset download

Phone repair times dataset download


[1]:
# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade iqplot bebi103 colorcet watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"
else:
    data_path = "../data/"
# ------------------------------


import pandas as pd
import numpy as np
import numba
import scipy.stats as st
import bebi103

import iqplot
import timeit
import colorcet

import bokeh.io
import bokeh.plotting
import bokeh.layouts
bokeh.io.output_notebook()
/Users/bois/opt/anaconda3/lib/python3.8/site-packages/arviz/__init__.py:317: UserWarning: Trying to register the cmap 'cet_gray' which already exists.
  register_cmap("cet_" + name, cmap=cmap)
/Users/bois/opt/anaconda3/lib/python3.8/site-packages/arviz/__init__.py:317: UserWarning: Trying to register the cmap 'cet_gray_r' which already exists.
  register_cmap("cet_" + name, cmap=cmap)
Loading BokehJS ...

This this recitation, we will cover

  1. Review of bootstrapping concepts from Lesson 16.

  2. BCa Confidence Intervals

  3. Studentized Bootstrap

An excellent reference, if dated, for bootstrapping calculations is Efron and Tibshirani’s An Introduction to the Bootstrap, 1994 (Amazon).

1. ReviewΒΆ

The plug-in principleΒΆ

This was covered in lesson 12.

Simplified definition: Say we have a sample of repeated measurements, and we know there is a generative distribution for these measurements. But, we don’t know what that generative distribution is. We can estimate a parameter, \(\theta = T(F)\), of the generative distribution by computing the statistical functional using the empirical CDF rather than the actual CDF.

Thorough definition:
Say we have a data set of repeated measurements of the lengths of C. elegans eggs. We know there is a generative distribution for these lengths, but we don’t know any parameters of the distribution. How can we estimate a parameter of the distribution?

Call the CDF of our generative distribution \(F(x)\). We want to compute a statistical functional, \(T(F)\), of the CDF. \(T(F)\) could be the mean, variance, median, etc. The parameter \(\theta\) is defined by the functional, so \(\theta = T(F)\).

However, we only have a sample from the generative distribution (our egg length measurements). We can compute the ECDF of our measurements, called \(\hat{F}\). The plug-in principle simply states we can estimate our parameter of interest, \(\theta\), by calculating:

\begin{align} \hat{\theta} &= T(\hat{F}) \end{align}

Essentially, the plug-in principle says we can estimate the parameter \(\theta\) by computing the statistical functional on the empirical CDF rather than the actual CDF.

Confidence IntervalsΒΆ

These were covered in lesson 12.

β€œIf an experiment is repeated over and over again, the estimate I compute for a parameter, \(\hat{\theta}\) , will lie between the bounds of the 95% confidence interval for 95% of the experiments.” This is the frequentist interpretation. The center of the interval is treated as a fixed value (population test statistic), and the bounds are random variables that change based on the simulation.

In Bayesian statistics, the analogous interval is called a credible interval. The center is the random variable, while the bounds are treated as fixed. Therefore, the interpretation is that the population statistic has a 95% chance of lying within a 95% credible interval.

BootstrappingΒΆ

Also covered in Lesson 16.

The basic idea of bootstrapping is to simulate many repetitions of the experiment. It is a type of resampling, in which you draw samples from your sample to generate a bootstrap distribution. For example, if you have a sample of 10 data points, you can samples by randomly choosing from the original set of 10 points, with replacement.

Bootstrapping allows us to get a better plug-in estimate for the statistical functional as if we did the experiment many times. To see the power of bootstrapping in action, let’s consider the example given in Efron and Tibshirani’s An Introduction to the Bootstrap.

The CSV file law_school_data.csv, available here, contains the population of measurements (LSAT, GPA) for N=82 American law schools for the entering classes of 1973. LSAT is the average score of the class on the LSAT (national law exam), and GPA is the average undergraduate GPA. In this example, we have a census, so we know the actual values of the summary statistics we may want to compute. Here we load in the data and view the first 10 observations.

[2]:
df_law = pd.read_csv(os.path.join(data_path, 'law_school_data.csv'))
df_law.head(10)
[2]:
School LSAT GPA
0 1 622 3.23
1 2 542 2.83
2 3 579 3.24
3 4 653 3.12
4 5 606 3.09
5 6 576 3.39
6 7 620 3.10
7 8 615 3.40
8 9 553 2.97
9 10 607 2.91

Now, let’s pretend we are a researcher interested in estimating the mean LSAT and GPA for American law schools. It would be difficult to survey every law school in the USA and collect LSAT and GAP scores of each class, so in practice, researchers collect a sample, a subset of the population, and make inferences about the population. Say we collect a random sample of 15 law schools and want to estimate the population mean from our sample. We will simulate this by drawing a random sample of size n=15:

[3]:
# Set the random seed
rg = np.random.default_rng(seed=3252)

# Draw 15 random samples and make a new dataframe
indices = rg.choice(np.arange(82), replace=False, size=15)
df_sample = df_law.loc[indices]
df_sample
[3]:
School LSAT GPA
21 22 614 3.19
51 52 580 3.07
11 12 596 3.24
17 18 646 3.47
2 3 579 3.24
10 11 558 3.11
33 34 591 3.02
3 4 653 3.12
13 14 581 3.22
36 37 615 3.37
34 35 578 3.03
52 53 594 2.96
16 17 599 3.23
70 71 570 3.01
49 50 555 3.00

How does this sample compare to the population? Let’s compare the population mean LSAT and GPA to the sample means:

[4]:
# Print means
print(f'Sample mean LSAT: {np.mean(df_sample.LSAT)}')
print(f'Sample mean GPA: {np.mean(df_sample.GPA)}')
print(f'Population mean LSAT: {np.mean(df_law.LSAT)}')
print(f'Population mean GPA: {np.mean(df_law.GPA)}')
Sample mean LSAT: 593.9333333333333
Sample mean GPA: 3.1519999999999997
Population mean LSAT: 597.5487804878048
Population mean GPA: 3.1347560975609756

The accuracy of the mean LSAT and GPA of the small sample will vary based on the sample you draw, but they can be fairly far away from the mean of the population such that the standard error does not capture the actual mean.

Let’s compare the ECDFs as well. To overlay the sample and population ECDFs, we will use manually compute the ECDFs of the samples and populations and overlay them on the population plots. This is easier to do since the sample and population are of different sizes.

[5]:
def compute_ecdf(data):
    """Returns ECDF values of data"""
    return np.arange(1, len(data) + 1) / len(data)


# Initialize two bokeh plots using a list comprehension
labels = ["LSAT", "GPA"]
p_LSAT, p_GPA = [
    bokeh.plotting.figure(
        height=300, width=400, x_axis_label=labels[i], y_axis_label="ECDF"
    )
    for i in range(2)
]

# Compute and plot the ECDFs for LSAT
p_LSAT.circle(
    np.sort(df_law.LSAT), compute_ecdf(df_law.LSAT), legend_label="Population"
)

p_LSAT.circle(
    np.sort(df_sample.LSAT),
    compute_ecdf(df_sample.LSAT),
    legend_label="Sample",
    color="darkorange",
    size=6,
)

# Compute and plot the ECDFs for GPA
p_GPA.circle(
    np.sort(df_law.GPA), compute_ecdf(df_law.GPA), legend_label="Population",
)

p_GPA.circle(
    np.sort(df_sample.GPA),
    compute_ecdf(df_sample.GPA),
    legend_label="Sample",
    color="darkorange",
    size=6,
)

p_LSAT.legend.location = p_GPA.legend.location = "top_left"

bokeh.io.show(bokeh.layouts.row(p_LSAT, p_GPA))

The ECDFs match up fairly well, but if we zoom in on the 50th percentile we can see the medians may not overlap, and the means are probably off a bit, too. Let’s see if we can compute a confidence interval through bootstrapping. We’ll use the function draw_bs_sample() to draw bootstrap samples of the same size as our sample (n=15), and compute the means of each to create 1000 replicates. Remember, our bootstrap samples are always the same size as the original sample because we want to simulate repeating the same experiment.

[6]:
@numba.njit
def draw_bs_sample(data):
    """Draw a bootstrap sample from a 1D data set."""
    return np.random.choice(data, size=len(data))


# Number of bootstrap replicates we want to draw
N = 1000

sample_lsat_mean_reps = np.empty(N)
sample_gpa_mean_reps = np.empty(N)

# Turn the sample LSAT and GPA scores into arrays for use with numba
sample_lsat = np.array(df_sample.LSAT)
sample_gpa = np.array(df_sample.GPA)

for i in range(N):
    sample_lsat_mean_reps[i] = np.mean(draw_bs_sample(sample_lsat))
    sample_gpa_mean_reps[i] = np.mean(draw_bs_sample(sample_gpa))

print(f"Sample mean LSAT: {np.mean(sample_lsat)}")
print(f"Sample mean GPA: {np.mean(sample_gpa)}")
print("\n")
print(f"Mean LSAT 95% C.I.: {np.percentile(sample_lsat_mean_reps, [2.5, 97.5])}")
print(f"Mean GPA 95% C.I.: {np.percentile(sample_gpa_mean_reps, [2.5, 97.5])}")
print("\n")
print(f"Population mean LSAT: {np.mean(df_law.LSAT)}")
print(f"Population mean GPA: {np.mean(df_law.GPA)}")
Sample mean LSAT: 593.9333333333333
Sample mean GPA: 3.1519999999999997


Mean LSAT 95% C.I.: [580.99333333 608.26666667]
Mean GPA 95% C.I.: [3.08596667 3.22136667]


Population mean LSAT: 597.5487804878048
Population mean GPA: 3.1347560975609756

The 95% confidence interval captures the true mean for both the LSAT and GPA data. Let’s compare the ECDFs again. We will create 500 bootstrap sample ECDFs and overlay them over the population ECDF. Note: if 500 samples takes too long to render on your machine, you can change the variable n to something like 100. You will still see the same overall effect.

[7]:
# Initialize two bokeh plots using a list comprehension
labels = ["LSAT", "GPA"]
p_LSAT, p_GPA = [
    bokeh.plotting.figure(
        height=300, width=400, x_axis_label=labels[i], y_axis_label="ECDF"
    )
    for i in range(2)
]

n = 500
# Make n bootstrap samples and plot them
for i in range(n):
    lsat = draw_bs_sample(sample_lsat)
    gpa = draw_bs_sample(sample_gpa)
    # Add semitransparent ECDFs to the plot
    p_LSAT.circle(
        np.sort(lsat),
        compute_ecdf(lsat),
        color=colorcet.b_glasbey_category10[0],
        alpha=0.05,
    )
    p_GPA.circle(
        np.sort(gpa),
        compute_ecdf(gpa),
        color=colorcet.b_glasbey_category10[0],
        alpha=0.05,
    )

p_LSAT.circle(
    np.sort(df_law.LSAT),
    compute_ecdf(df_law.LSAT),
    color=colorcet.b_glasbey_category10[1],
    alpha=2,
)
p_GPA.circle(
    np.sort(df_law.GPA),
    compute_ecdf(df_law.GPA),
    color=colorcet.b_glasbey_category10[1],
    alpha=2,
)

bokeh.io.show(bokeh.layouts.row(p_LSAT, p_GPA))

What do you notice about the bootstrapped ECDFs? Consider the approximate shape and mean.

The boostrapped samples are a pretty good approximation of the overall population distribution.

Now that we’ve shown the power of bootstrapping, let’s explore some more bootstrapping methods! First we’ll review pairs bootstrapping, which we learned about in class, then go into some new methods.

Pairs BootstrapΒΆ

Also covered in Lesson 16. See the lesson for a coding example.

We use pairs bootstrapping when we have paired data, and we will often want to compute the correlation. The important thing to remember with pairs bootstrapping is that we draw a random sample of pairs, not 2 random samples of each array. For example, if we have paired data in 2 arrays: \(\mathbf{x} = [x_1, x_2,\ldots, x_n]\) and \(\mathbf{y} = [y_1, y_2,\ldots, y_n]\), we must draw the same indices from both arrays so we end up with paired samples:

\begin{align} &\mathbf{\hat{x}} = [x_i, x_j, x_k,\ldots, x_n], \text{ and}\\[1em] &\mathbf{\hat{y}} = [y_i, y_j, y_k,\ldots, y_n]. \end{align}

Question: Why do we have to keep pairs together to estimate correlation? What result will you likely get for the correlation if your pairs get separated?

2. BCa Confidence IntervalsΒΆ

The BCa confidence interval is the bias-corrected accelerated bootstrap confidence interval. This concept was introduced by Bradley Efron. The BCa interval is designed to automatically create confidence intervals that adjust for underlying higher order effects. In simple terms, it corrects for bias and skew in the distribution of bootstrap estimates.

In this recitation, we will focus on the nonparametric case and show a coding example. If you want to dive into this concept even more, which I highly recommend, take a look at Efron, β€œBetter Bootstrap Confidence Intervals.” Journal of the American Statistical Association 82, no. 397 (1987): 171-85.

Let’s consider the C. elegans egg data example again. We have a dataset of repeated measurements of the areas C. elegans eggs. We know there is a generative distribution for this data, but we don’t know what it is. We want to compute a confidence interval of the mean. For the general case, we’ll call the statistic we want to estimate \(\theta\). As we know, the standard confidence interval is given by:

\begin{align} \theta \in [\hat{\theta} + \hat{\sigma}z^{(\alpha)}, \hat{\theta} + \hat{\sigma}z^{(1 - \alpha)}] \tag{2.1} \end{align}

where \(\hat{\theta}\) is our sample estimate of \(\theta\), \(\hat{\sigma}\) is our estimate of the standard deviation of the distribution of \(\hat{\theta}\), and \(z^{(\alpha)}\) is the \(100 \times \alpha\) percentile point of a standard normal variate.

This standard confidence interval (Equation 2.1) is based on taking the asymptotic normal approximation

\begin{align} (\hat{\theta} - \theta)/\hat{\sigma} \sim N(0, 1) \tag{2.2} \end{align}

In our case, and in many cases, we assume there is a generative distribution, \(g\), for our data. Thus, we can achieve normality and constant standard error by considering the monotone transformation \(\hat{\phi} = g(\hat{\theta})\) and \(\phi = g(\theta)\) instead of \(\hat{\theta}\) and \(\theta\). We assume this transformation is both normalizing and variance stabilizing (defined momentarily). Through this method, we can now compute the confidence interval for \(\phi\) based on:

\begin{align} (\hat{\phi} - \phi)/\tau \sim N(-z_0, 1) \tag{2.3} \end{align}

where \(\tau\) is the constant standard error of \(\hat{\phi}\) and \(z_0\) is the bias constant. To get the confidence interval for \(\theta\), we take the inverse transformation \(\theta = g^{-1}(\phi)\). The great part: all of this is achieved automatically simply through bootstrapping! We do not need to know \(g\) or \(z_0\).

How can we improve on this? That’s where BCa comes in. Like we described previously, through bootstrapping alone we are assuming the transformation \(g\) is both normalizing and variance stabilizing. A variance stabilizing transformation is a transformation such that the variance is no longer related to the mean. For some distributions (ex. Binomial), the variance is dependent on the mean. A variance stabilizing transformation removes this dependence. But, since we do not know the transformation, we do not know for sure that it is variance stabilizing. In the BCa confidence interval, we only need to know that \(g\) is normalizing. This makes it more accurate in many cases!

The BCa incorporates some bias constant, \(z_0\), and some acceleration constant, \(a\), such that the BCa confidence interval is based on:

\begin{align} &(\hat{\phi} - \phi)/\tau \sim N(-z_0\sigma_\phi, \sigma_\phi^2),\\[1em] &\sigma_\phi = 1 + a\phi. \tag{2.4} \end{align}

We can determine \(a\) and \(z_0\) by resampling our data. Here, we are considering the nonparametric case. For the nonparametric case, the acceleration constant for the mean is given by

\begin{align} a = \frac{\sum_i (x_i - \bar{x})^3}{6\left(\sum_i (x_i - \bar{x})^2\right)^{3/2}}. \tag{2.5} \end{align}

We can compute \(z_0\) from our bootstrap replicates. The bias-correction constant, \(z_0\), is:

\begin{align} z_0 = \phi^{-1}(\hat{G}(\hat{\theta})) \tag{2.6} \end{align}

From the plug-in principle, we know that for the nonparametric case we can approximate the CDF \(\hat{G}(\hat{\theta})\) as the ECDF of our bootstrap replicates. So, \(\phi^{-1}(\hat{G}(\hat{\theta}))\) is given by the inverse ECDF of our bootstrap replicates.

This may all be a little confusing now, so let’s practice with the worm egg example! I will review how to compute each parameter as we go.

Let’s load in our worm egg area dataset, digitized from Harvey and Orbidans’s 2011 PLoS One paper.

[8]:
df_worms = pd.read_csv(os.path.join(data_path, 'c_elegans_egg_xa.csv'), comment='#')
df_worms.head()
[8]:
food area (sq. um)
0 high 1683
1 high 2061
2 high 1792
3 high 1852
4 high 2091

And we show an ECDF to get a feel for the data.

[9]:
p = iqplot.ecdf(data=df_worms, q="area (sq. um)", cats="food")

bokeh.io.show(p)

It looks like worms that received more food have smaller eggs. Any hypotheses for why that may be the case?

We’ll convert the data we’re interested into arrays, separated by how much food the worms received:

[10]:
fed_worms = df_worms.loc[df_worms['food'] == 'high']['area (sq. um)'].values
hungry_worms = df_worms.loc[df_worms['food'] == 'low']['area (sq. um)'].values

Here we write some helper functions to calculate the parameters for the BCa interval. We will combine them in our main function, compute_bca_ci().

[11]:
# Define "statfunction" as the function we want to apply to the replicates.


def compute_jackknife_reps(data, statfunction=np.mean):
    """Returns jackknife resampled replicates for
    the given data and statistical function"""
    # Set up empty array to store jackknife replicates
    jack_reps = np.empty(len(data))

    # For each observation in the dataset, compute the statistical function on the sample
    # with that observation removed
    for i in range(len(data)):
        jack_sample = np.delete(data, i)
        jack_reps[i] = statfunction(jack_sample)
    return jack_reps


def compute_a(jack_reps):
    """Returns the acceleration constant a"""
    mean = np.mean(jack_reps)
    return (1 / 6) * np.divide(
        np.sum(mean - jack_reps) ** 3, (np.sum(mean - jack_reps) ** 2) ** (3 / 2)
    )


def bootstrap_replicates(data, n_reps=1000, statfunction=np.mean):
    """Computes n_reps number of bootstrap replicates for
    given data and statistical function"""
    boot_reps = np.empty(n_reps)
    for i in range(n_reps):
        boot_reps[i] = statfunction(draw_bs_sample(data))
    return boot_reps


def compute_z0(data, boot_reps, statfunction=np.mean):
    """Computes z0 for given data and statistical function"""
    s = statfunction(data)
    return st.norm.ppf(np.sum(boot_reps < s) / len(boot_reps))


def compute_bca_ci(data, alpha_level, n_reps=1000, statfunction=np.mean):
    """Returns BCa confidence interval for given data at given alpha level"""
    # Compute bootstrap and jackknife replicates
    boot_reps = bootstrap_replicates(data, n_reps, statfunction)
    jack_reps = compute_jackknife_reps(data, statfunction)

    # Compute a and z0
    a = compute_a(jack_reps)
    z0 = compute_z0(data, boot_reps, statfunction)

    # Compute confidence interval indices
    alphas = np.array([alpha_level / 2.0, 1 - alpha_level / 2.0])
    zs = z0 + st.norm.ppf(alphas).reshape(alphas.shape + (1,) * z0.ndim)
    avals = st.norm.cdf(z0 + zs / (1 - a * zs))
    ints = np.round((len(boot_reps) - 1) * avals)
    ints = np.nan_to_num(ints).astype("int")

    # Compute confidence interval
    boot_reps = np.sort(boot_reps)
    ci_low = boot_reps[ints[0]]
    ci_high = boot_reps[ints[1]]
    return (ci_low, ci_high)

We can now compute the BCa confidence intervals.

[12]:
print(f"Fed worms mean egg area 95% C.I.: {compute_bca_ci(fed_worms, 0.05)}")
print(f"Hungry worms mean egg area 95% C.I.: {compute_bca_ci(hungry_worms, 0.05)}")
Fed worms mean egg area 95% C.I.: (1791.4545454545455, 1869.7954545454545)
Hungry worms mean egg area 95% C.I.: (2068.245614035088, 2162.964912280702)

3. Studentized BootstrapΒΆ

The Studentized bootstrap error aims to reduce coverage error of our resampling. It is a good idea to use Studentized bootstrapping to approximate your statistic of interest when you suspect the distribution of your sample may not be close to the actual population distribution. The Studentized bootstrapping method works because even when this is the case, the distribution of the sample statistic divided by estimated standard error could be close to the actual population statistic divided by the population standard error. This the same principle employed in the Student-t test.

The Studentized bootstrap involves taking inner and outer bootstrap samples. The algorithm is as follows:

  1. Compute \(\hat{\theta}\), the sample test statistic

  2. A large number of times (number of reps β‰ˆ1000), draw a bootstrap sample. Call this the outer sample:

    1. Compute the statistical functional on the sample

    2. Resample your bootstrap sample β‰ˆ50 to 100 times. Call this the inner sample:

      1. Compute the statistical function on the resample

    3. For your inner sample replicates, compute the standard deviation

  3. Compute your statistic of interest from your replicates: t = (outer sample replicates) / (standard deviation of the inner sample replicates)

  4. Compute s = standard deviation of your outer sample replicates

  5. Multiply t by s

  6. Compute the confidence interval from the \(\alpha/2\) and \((1 - \alpha)/2\) percentiles of t*s

Let’s try this out on a dataset of the repair times of landline phones in New York City. The New York Public Utilities Commission monitors the response time for repairing landline phones in the state, and repair times are dependent on the time of year, type of repair, and geographical location. The data here are for one class of repairs at one time period for a specific Incumbent Local Exchange Carrier, which is a telephone company which once held the regional monopoly on landline service.

This times come from a particular time and region of New York City, so we expect them to not be representative of the population of all phone repair times ever recorded in New York City. First, we read in the data (which I obtained from the online textbook for UC Berkeley’s Data 100 class found here).

[13]:
df_times = pd.read_csv(os.path.join(data_path, "ilec.csv"))

p = iqplot.ecdf(data=df_times, q="17.5", x_axis_label="Phone Repair Times")

bokeh.io.show(p)

We see that the distribution is heavily positively skewed, making this a good dataset from which to draw studentized boostrap replicates. For computing the confidence intervals of the means, we’ll want to convert the data of interest into arrays, and take a look at the sample mean:

[14]:
times = np.array(df_times["17.5"])
print(f"Mean average repair time: {np.mean(times)} hours")
Mean average repair time: 8.406145520144317 hours

Now, we’ll employ the Studentized bootstrap to compute the confidence interval of the average landline repair time.

[15]:
def studentized_bootstrap_ci(
    data,
    statfunction=np.mean,
    n_bs_samples=1000,
    n_inner_samples=100,
    ptiles=(2.5, 97.5),
):

    thetas = np.empty(n_bs_samples)
    ts = np.empty(n_bs_samples)

    # Outer sample (regular bootstrap sample)
    for i in range(n_bs_samples):
        sample = draw_bs_sample(data)
        theta = statfunction(sample)
        thetas[i] = theta

        # Inner sample
        resample_stats = np.empty(n_inner_samples)
        for j in range(n_inner_samples):
            resample = draw_bs_sample(sample)
            resample_stats[j] = statfunction(resample)

        # Compute statistic of interest
        ts[i] = np.sum(resample_stats - theta) / np.std(resample_stats)

    ts = np.sort(ts)
    s = np.std(thetas)

    p1, p2 = np.percentile(s * ts, ptiles)
    ci_high = np.mean(data) - (s * p1)
    ci_low = np.mean(data) - (s * p2)

    # Return the confidence interval
    return (ci_low, ci_high)

To time small processes, I use the timeit module, which has a method called default_timer() that initializes a timer in seconds when called. The time is constantly counting up, and you can store the time at any given instant in a variable.

[16]:
# Get the start time before the computation
start = timeit.default_timer()
studentized_ci = studentized_bootstrap_ci(times)

# Get the end time after
end = timeit.default_timer()

print(f"Average Repair Time 95% C.I.: {studentized_ci}")
Average Repair Time 95% C.I.: (6.043505027603063, 10.859360255642642)

We check how long the computation took for 1000 outer bootstrap samples and 100 inner samples. We will also compare this to the time required to get 1000 bootstrap replicates using the percentile method. The studentized method is longer both because of the inner samples and the computations themselves.

[17]:
print(f"Studentized confidence interval took {end-start} seconds to run.")
Studentized confidence interval took 2.3449103959999995 seconds to run.

For comparison, we can compute the non-Studentized confidence intervals.

[18]:
@numba.njit
def draw_bs_reps_mean(data, size=1):
    """Draw boostrap replicates of the mean from 1D data set."""
    out = np.empty(size)
    for i in range(size):
        out[i] = np.mean(draw_bs_sample(data))
    return out

# Get the start time before the computation
start = timeit.default_timer()
times_bs = draw_bs_reps_mean(times, size=N)
times_ci = np.percentile(times_bs, [2.5, 97.5])

# Get the end time after
end = timeit.default_timer()

print(f"Average Repair Time 95% C.I.: {times_ci}")
Average Repair Time 95% C.I.: [7.73403217 9.18199218]
[19]:
print(f"Percentile confidence interval took {end-start} seconds to run.")
Percentile confidence interval took 0.23111390500000084 seconds to run.

The non-Studentized confidence interval is smaller, and we can more easily see the difference in the following plot. We also note that the studentized computation takes longer than the percentile bootstrap even with just 1000 bootstrap samples.

[20]:
percentile_dict = {
    "estimate": np.mean(times),
    "conf_int": times_ci,
    "label": "percentile",
}
studentized_dict = {
    "estimate": np.mean(times),
    "conf_int": studentized_ci,
    "label": "studentized",
}

kwargs = {
    "title": "95% Confidence Intervals",
    "x_axis_label": "Average Phone Repair Time (hours)",
}

p = bebi103.viz.confints(
    [percentile_dict, studentized_dict],
    palette=colorcet.b_glasbey_category10[0],
    **kwargs
)
bokeh.io.show(p)

Studentized confidence intervals are most useful for very small datasets (so less likely to be representative of the population) or datasets that are highly skewed. The extra (inner) sampling attempts to correct for the lack of information with a more rigorous treatment.

The main drawback is that it takes much longer than the percentile bootstrapping method. For datasets that are not skewed or very small, the two methods should give approximately equal confidence intervals. If you like, you can repeat this analysis on the bees and pesticides data in lesson 16. This sample distribution is closer to the population distribution, so the two methods give similar confidence intervals.

Computing environmentΒΆ

[21]:
%load_ext watermark
%watermark -v -p numpy,scipy,numba,bokeh,iqplot,bebi103,colorcet,jupyterlab
Python implementation: CPython
Python version       : 3.8.11
IPython version      : 7.26.0

numpy     : 1.20.3
scipy     : 1.6.2
numba     : 0.53.1
bokeh     : 2.3.3
iqplot    : 0.2.3
bebi103   : 0.1.8
colorcet  : 2.0.6
jupyterlab: 3.1.7