(c) 2016 Justin Bois. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.
This tutorial was generated from a Jupyter notebook. You can download the notebook here.
import numpy as np
import scipy.integrate
import scipy.optimize
import scipy.stats
import pandas as pd
import pymc3
import theano
import emcee
# Import pyplot for plotting
import matplotlib.pyplot as plt
# Import corner for plotting MCMC results
import corner
# BE/Bi 103 utilities
import bebi103
# Some pretty Seaborn settings
import seaborn as sns
rc={'lines.linewidth': 2, 'axes.labelsize': 14, 'axes.titlesize': 14}
sns.set(rc=rc)
# Make Matplotlib plots appear inline
%matplotlib inline
# Import Bokeh modules for interactive plotting
import bokeh.io as bi
import bokeh.plotting as bp
# Set up Bokeh for inline viewing
bi.output_notebook()
To give you an idea of what we would like in your homework responses, I work through a short homework problem in this tutorial. The problem has both a theoretical part and a practical part to demonstrate how we would like each done. (Note, we will not have many theoretical problems in the class, but I just wanted to show an example here for the few we will have.) It is important to show and discuss all reasoning and code that goes into the solution.
A quick note: Don't freak out that you don't understand everything in this problem. You will by the end of the course. This is just to show what kinds of solutions we expect on the homework.
Peter and Rosemary Grant have been working on the Galápagos island of Daphne Major for over forty years. During this time, they have collected lots and lots of data about physiological features of finches. Last year, they published a book with a summary of some of their major results (Grant P. R., Grant B. R., Data from: 40 years of evolution. Darwin's finches on Daphne Major Island, Princeton University Press, 2014). They made their data from the book publicly available via the Dryad Digital Repository.
We will investigate their data on the heritability of beak depth (the distance, top to bottom, of a closed beak) in the ground finch Geospiza fortis. The data set consists of the maternal beak depth, the paternal beak depth, and the mean beak depth of their offspring. You can download the data set, which I got from Dryad and tidied up into a CSV file, here.
a) The narrow sense heritability, denoted $h^2$, of a trait is defined as the ratio of the additive genetic variance in offspring to the total phenotypic variance in the parent. The additive genetic variance of a population describes the variability in a trait resulting from the sum of all effects of the alleles. In other words, $h^2$ is the ratio of the parent-offspring covariance in the trait value to the variance in the parents.
\begin{align} h^2 = \frac{\sigma_{po}^2}{\sigma_p^2}, \end{align}where the subscripts $o$ and $p$ indicate offspring and parents, respectively.
In practice, $h^2$ is computed by plotting the average beak depth of the offspring against the average beak depth of the two parents, and then performing a linear regression, with $h^2$ being given by the slope. Show that this gives $h^2$ as we have defined it.
Note: in this analysis, we are neglecting confounding issues and just assuming the environment is unchanging and that the mean trait values we are using for the parents and the offspring are indicative of the differences.
b) Plot the mean offspring beak depth versus the mean parental beak depth, including a linear regression. Report the regression parameters with error bars.
a) We are first asked to prove that $h^2$ is the slope of a linear regression of offspring beak depth against parental beak depth. We are actually seeking to prove something more general: the slope of a linear regression between two variables, say $x$ and $y$, is the ratio of the covariance to the variance in the $x$-variable. To prove this, we set up the linear regression, as in class, by writing the posterior probability distribution. Beginning with Bayes's theorem,
\begin{align} P(m, b, \sigma \mid D) = \frac{P(D\mid m, b, \sigma)\,P(m, b, \sigma)}{P(D)}. \end{align}We assume the data follow $y = mx + b$ and that the errors are Gaussian distributed, all with the same error bar, $\sigma$. This gives a likelihood of
\begin{align} P(D\mid m, b, \sigma) = \prod_{i\in D} \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\left\{-\frac{(y_i - m x_i - b)^2}{2\sigma^2}\right\}. \end{align}We assume a Jeffreys prior for $\sigma$, and an uninformative prior for the linear regression parameters $m$ and $b$ as discussed in class.
\begin{align} P(m, b, \sigma) \propto \frac{1}{\sigma(1 + m^2)^{\frac{3}{2}}}. \end{align}The log posterior is then
\begin{align} \ln P(m, b, \sigma \mid D) \propto -(n + 1) \ln \sigma - \frac{3}{2}\,\ln\left(1 + m^2\right) - \frac{1}{2\sigma^2}\sum_{i\in D} \left(y_i - m x_i - b\right)^2, \end{align}where $n = |D|$.
To find the most probable slope and intercept, we differentiate the log posterior and find where the gradient is zero.
\begin{align} \frac{\partial}{\partial m}\,\ln P(m, b, \sigma \mid D) &= -\frac{3m}{1+m^2} + \frac{1}{\sigma^2}\sum_{i\in D} \left( y_i - m x_i - b\right)x_i = 0, \\[1em] \frac{\partial}{\partial b}\,\ln P(m, b, \sigma \mid D) &= \frac{1}{\sigma^2}\sum_{i\in D} \left( y_i - m x_i - b\right) = 0. \end{align}We can multiply each equation by $\sigma^2/n$ to get
\begin{align} -\frac{3m\sigma^2}{n\left(1+m^2\right)} + \langle xy \rangle - m\langle x^2 \rangle - b\langle x \rangle &= 0,\\[1em] \frac{1}{n}\sum_{i\in D} \left( y_i - m x_i - b\right) = \langle y\rangle - m\langle x\rangle - b &= 0. \end{align}We solve the bottom equation for $b$,
\begin{align} b = \langle y \rangle - m \langle x \rangle, \end{align}which we then insert into the top equation, giving
\begin{align} -\frac{3m\sigma^2}{n\left(1+m^2\right)} + \langle xy \rangle - m\langle x^2 \rangle - \left(\langle y \rangle - m \langle x \rangle\right)\langle x \rangle &= 0. \end{align}This can be rearranged to give
\begin{align} m = \frac{\langle xy \rangle - \langle x \rangle\langle y\rangle}{\langle x^2\rangle - \langle x \rangle^2} -\frac{3m\sigma^2}{n\left(1+m^2\right)\left(\langle x^2\rangle - \langle x\rangle^2\right)}. \end{align}This is a cubic equation in $m$, for which an analytical solution exists, but it is messy. Instead, we note that if $n$ is large and the variance in $x$ is not too small, the second term is negligible, which is equivalent to assuming a uniform prior on $m$. This gives
\begin{align} m \approx \frac{\langle xy\rangle - \langle x\rangle \langle y \rangle}{\langle x^2\rangle - \langle x \rangle^2}. \end{align}We recognize the numerator as the sample covariance between $x$ and $y$ and the denominator as the sample variance of $x$. Thus, we have proven what we set out to prove.
Interestingly, we found that if we want to compute $h^2$ by linear regression, we should assume a uniform prior on the slope.
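Though not required for the solution, we can quickly convince ourselves of this result numerically. The short check below is my own addition; the line parameters and noise level are made up. It generates noisy points from a known line and compares the covariance-over-variance slope to an ordinary least-squares fit from np.polyfit; the two should agree.
# Quick numerical check of the covariance/variance result (illustrative
# only; the slope, intercept, and noise level here are made up)
np.random.seed(42)
x_check = np.random.uniform(0, 10, 200)
y_check = 1.5 * x_check + 2.0 + np.random.normal(0, 1.0, size=200)

# Slope from the ratio of sample covariance to sample variance
m_cov = np.cov(x_check, y_check)[0, 1] / np.var(x_check, ddof=1)

# Slope from an ordinary least-squares fit
m_lsq = np.polyfit(x_check, y_check, 1)[0]

# The two estimates should match
print(m_cov, m_lsq)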
b) We start by loading in the data set and adjusting column headings for convenience.
# Read data
df = pd.read_csv('data/grant_and_grant_2014.csv')
# Rename columns
df.columns = ('offspring_bd', 'male_bd', 'female_bd')
We will need the mean parental beak depth for our regression, so we create a new column in the DataFrame
that has that.
# Make a new column with the mean of the male and female parent columns
df['parent_bd'] = (df.male_bd + df.female_bd) / 2
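As a quick sanity check (optional), we can look at the first few rows to confirm the new column is what we expect.
# Look at the first few rows to check the new column
df.head()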
We'll first plot the data to see what we are dealing with. We make the plot interactive with Bokeh to enable zooming. (Note: In your homeworks, you can use Matplotlib; I'm just using Bokeh here for show, and because I like it.)
p = bp.figure(width=500, height=400, x_axis_label='parental beak depth (mm)',
y_axis_label='offspring beak depth (mm)',
tools='pan,wheel_zoom,box_zoom,resize,reset')
p.circle(df['parent_bd'], df['offspring_bd'])
bi.show(p)
By eye, we see a correlation between parental and offspring beak depth. To find the most probable parameter values for the slope and intercept, we can be cute and use the result derived above and just compute the sample covariance matrix.
# Compute covariance matrix
cov = df.cov()
# Compute the regression line
h_2 = cov['parent_bd']['offspring_bd'] / cov['parent_bd']['parent_bd']
intercept = df.offspring_bd.mean() - h_2 * df.parent_bd.mean()
# Print the results to the screen
print("""
heritability = {0:.2f}
intercept = {1:.2f} mm
""".format(h_2, intercept))
To get error bars and to do a more complete analysis, we will sample the posterior distribution using Markov chain Monte Carlo. We will use the emcee package, so we only need to define a function for the log of the unnormalized posterior; emcee does not need the evidence, $P(D)$. The posterior is the product of Gaussian likelihoods for the data points with uniform priors on the slope and intercept and a Jeffreys prior on $\sigma$.
def log_prior(p, pmin, pmax):
"""
Log of prior for linear regression with unknown constant
error bars on measurements.
Parameters
----------
p : array_like, shape (3,)
Array of parameters `slope`, `intercept`, and `sigma`
pmin : array_like, shape (3,)
Array of minimum values for parameters `slope`,
`intercept`, and `sigma`
pmax : array_like, shape (3,)
Array of maximum values for parameters `slope`,
`intercept`, and `sigma`
Returns
-------
output : float
Log of prior probability with uniform priors on slope and
intercept and Jeffreys prior on `sigma`.
"""
if np.any(p < pmin) or np.any(p > pmax):
return -np.inf
return -np.sum(np.log(pmax[:2] - pmin[:2])) \
- np.log(p[-1] * np.log(pmax[-1] / pmin[-1]))
def log_likelihood(p, x, y):
"""
Log of likelihood for linear regression with unknown constant
error bars on measurements.
Parameters
----------
p : array_like, shape (3,)
Array of parameters `slope`, `intercept`, and `sigma`
x : array_like
Array of x-values
y : array_like
Array of y-values
Returns
-------
output : float
Log of Gaussian likelihood.
"""
# Unpack parameters
slope, intercept, sigma = p
return -len(x) * np.log(sigma) \
- np.sum((y - slope * x - intercept)**2) / 2.0 / sigma**2
def log_posterior(p, pmin, pmax, x, y):
"""
Log of posterior for linear regression with unknown constant
error bars on measurements.
Parameters
----------
p : array_like, shape (3,)
Array of parameters `slope`, `intercept`, and `sigma`
pmin : array_like, shape (3,)
Array of minimum values for parameters `slope`,
`intercept`, and `sigma`
pmax : array_like, shape (3,)
Array of maximum values for parameters `slope`,
`intercept`, and `sigma`
x : array_like
Array of x-values
y : array_like
Array of y-values
Returns
-------
output : float
Log of posterior probability with Jeffreys prior on `sigma`.
"""
lp = log_prior(p, pmin, pmax)
if lp == -np.inf:
return lp
return lp + log_likelihood(p, x, y)
Now we set the maximum and minimum values for the parameters.
# Min and max slope, intercept, sigma
pmin = np.array([-100, -100, 1e-6])
pmax = np.array([100, 100, 1e6])
Now that we have defined these functions, we can proceed to set up the parameters for the MCMC calculation.
# Specify parameters for emcee
n_dim = 3 # number of parameters in the model
n_walkers = 10 # number of MCMC walkers
n_burn = 1000 # "burn-in" period to let chains stabilize
n_steps = 5000 # number of MCMC steps to take after burn-in
# Seed random number generator for reproducibility
np.random.seed(42)
# Generate random starting points for walkers.
# p0[i,j] is the starting point for walker i along variable j.
p0 = np.empty((n_walkers, n_dim))
p0[:,0] = np.random.uniform(-1.0, 1.0, n_walkers) # slope
p0[:,1] = np.random.uniform(-10.0, 10.0, n_walkers) # intercept
p0[:,2] = np.random.exponential(1.0, n_walkers) # sigma
We can now run the sampler!
# Arguments passed to log posterior
args = (pmin, pmax, df.parent_bd.values, df.offspring_bd.values)
# Set up the EnsembleSampler instance
sampler = emcee.EnsembleSampler(n_walkers, n_dim, log_posterior, args=args)
# Do burn-in
pos, prob, state = sampler.run_mcmc(p0, n_burn, storechain=False)
# Sample again, starting from end burn-in state
_ = sampler.run_mcmc(pos, n_steps)
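Before summarizing the results, it is useful to take a quick diagnostic glance at the sampler (this check is my addition, not part of the original solution). The mean acceptance fraction across walkers should be moderate; values around 0.2–0.5 are generally considered healthy for emcee.
# Quick diagnostic: mean acceptance fraction across walkers
print('Mean acceptance fraction:', np.mean(sampler.acceptance_fraction))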
Now that we have the MCMC traces, we can compute the mode/HPD. We'll use the hpd()
function given in class.
# Compute HPD credible regions
herit_hpd = bebi103.hpd(sampler.flatchain[:,0], 0.95)
herit_mode = sampler.flatchain[:,0][np.argmax(sampler.flatlnprobability)]
intercept_mode = sampler.flatchain[:,1][np.argmax(sampler.flatlnprobability)]
# Print the results
print("""
heritability 95% HPD with mode
------------------------------
{:.2f} {:.2f} {:.2f}
""".format(herit_hpd[0], herit_mode, herit_hpd[1]))
The mode is, as expected, the same result we got with our calculation based on the covariance of the data. We can use our sample trace to compute the 95% credible region on the linear regression line.
# Extract slopes and intercepts
slope, intercept = sampler.flatchain[:,0], sampler.flatchain[:,1]
# Get sorted data
x = np.sort(df.parent_bd.values)
# Get array of possible values of regression line
reg_vals = np.outer(slope, x) + np.outer(intercept, np.ones_like(x))
# Compute upper and lower bound for 95% credible region for each datum
lower, upper = np.percentile(reg_vals, (2.5, 97.5), axis=0)
# Make the most probable regression line for plotting
x_most_prob = np.array([x.min(), x.max()])
y_most_prob = herit_mode * x_most_prob + intercept_mode
# Build plot
p = bp.figure(width=500, height=400, x_axis_label='parental beak depth (mm)',
y_axis_label='offspring beak depth (mm)',
tools='pan,wheel_zoom,box_zoom,resize,reset')
p.patch(np.concatenate((x, x[::-1])), np.concatenate((upper, lower[::-1])),
color='lightgray')
p.line(x_most_prob, y_most_prob, line_width=4, color='gray')
p.circle(df['parent_bd'], df['offspring_bd'])
bi.show(p)
And that should do it for the homework solution. Notice that all logic was clearly presented, all code was explained, and the graphics were clearly presented. I chose to use Bokeh for interactive graphics because there were enough data points that it is worth zooming in and out.
For your edification, I provide some auxiliary material for this problem. This would be welcomed in a homework submission, but is not necessary.
For fun, let's repeat our MCMC analysis using PyMC3 and its fast No-U-Turn Sampler (NUTS). This sampler works well even for very large problems (ours is small, with only three parameters).
PyMC3 is flexible in that you can rapidly define your models with built-in functions, especially if you are using standard probability distributions. It is fast because it uses Theano to do the sampling with compiled code. The use of Theano does make PyMC3 a little bit more difficult to use than emcee, since you are restricted to Theano's built-in operators (though there are some ways around that).
Anyhow, the code below does MCMC sampling of the posterior for the linear regression using PyMC3.
# Make custom Jeffreys prior, returns log prior
def jeffreys(value):
return -theano.tensor.log(value)
# Instantiate the model
basic_model = pymc3.Model()
with basic_model:
# Priors for unknown model parameters
sigma = pymc3.DensityDist('sigma', jeffreys, testval=1)
slope = pymc3.Uniform('slope', lower=-1, upper=1, testval=0.5)
intercept = pymc3.Uniform('intercept', lower=-10, upper=10, testval=0)
# Expected value of outcome
mu = slope * df.parent_bd + intercept
# Likelihood (sampling distribution) of observations
y_obs = pymc3.Normal('y_obs', mu=mu, sd=sigma, observed=df.offspring_bd)
# obtain starting values via MAP
start = pymc3.find_MAP(fmin=scipy.optimize.fmin_powell)
# instantiate sampler
step = pymc3.NUTS()
# draw 1000 posterior samples for burn-in
trace = pymc3.sample(1000, step, progressbar=False, njobs=None)
# Start next run of 5000 samples at the last sampled position.
step = pymc3.NUTS(scaling=trace[-1])
trace = pymc3.sample(5000, step, start=trace[-1], njobs=None)
We can use the pymc3.summary
function to print out the results.
# Print summary of MCMC results
pymc3.summary(trace, ['slope', 'intercept', 'sigma'])
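We can also eyeball convergence of the chains with PyMC3's built-in trace plots (an optional check, not required for the homework).
# Plot the traces and marginal distributions to check convergence
pymc3.traceplot(trace, ['slope', 'intercept', 'sigma'])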
We see that our mean heritability is 0.72, with an HPD that is the same as we got with emcee.
We may also think of the experimental data as being sampled out of a two-dimensional distribution. If we approximate this distribution as Gaussian, we can make a plot of the distribution with the sample points. This is a nice way of looking at the data and visualizing the heritability (indicated by the covariance of the 2D Gaussian).
# Set up array of mu's and covariance matrix (already calc'd)
mu = np.array([df.parent_bd.mean(), df.offspring_bd.mean()])
cov_mat = np.array(
[[cov.parent_bd['parent_bd'], cov.parent_bd['offspring_bd']],
[cov.parent_bd['offspring_bd'], cov.offspring_bd['offspring_bd']]])
# Set up sampling for plotting
x = np.linspace(7, 12, 100)
y = np.linspace(7, 12, 100)
xx, yy = np.meshgrid(x, y)
pos = np.empty(xx.shape + (2,))
pos[:, :, 0] = xx
pos[:, :, 1] = yy
# Instantiate multivariate normal instance to get the PDF for plotting
rv = scipy.stats.multivariate_normal(mu, cov_mat)
# Generate the plot
plt.contourf(x, y, rv.pdf(pos), cmap=plt.cm.Blues, alpha=0.5)
plt.contour(x, y, rv.pdf(pos), colors='k')
plt.plot(df.parent_bd, df.offspring_bd, 'k.', markersize=4)
plt.xlabel('parental beak depth (mm)')
plt.ylabel('offspring beak depth (mm)')
# Set the aspect ratio to be one-to-one for apples-to-apples comparison
plt.gca().set_aspect('equal')
In light of the above plot, we may wish to take a different approach. Suppose instead that we assume the offspring and parental beak depths are drawn from a two-dimensional Gaussian distribution, and we wish to estimate the parameters of this distribution. The parameters are the mean and standard deviation of the parental beak depth, $\mu_p$ and $\sigma_p$, the mean and standard deviation of the offspring beak depth, $\mu_o$ and $\sigma_o$, and the square root of the covariance between the two, $\sigma_{po}$. Couched in terms of Bayes's theorem, we wish to compute the posterior distribution
\begin{align} P(\mu_p, \mu_o, \sigma_p, \sigma_o, \sigma_{po} \mid D) = \frac{P(D \mid \mu_p, \mu_o, \sigma_p, \sigma_o, \sigma_{po})\,P(\mu_p, \mu_o, \sigma_p, \sigma_o, \sigma_{po})}{P(D)}. \end{align}We take our likelihood to be a bivariate Gaussian of independent measurements.
\begin{align} P(D \mid \mu_p, \mu_o, \sigma_p, \sigma_o, \sigma_{po}) = \prod_{i\in D} \frac{1}{2\pi \sqrt{\det (\sigma^2)}}\, \exp\left\{-\frac{1}{2}\,\mathbf{x}_i^\mathsf{T}\cdot \sigma^{-2} \cdot \mathbf{x}_i\right\}, \end{align}where
\begin{align} \mathbf{x}_i &= \begin{pmatrix} p_i - \mu_p \\ o_i - \mu_o \end{pmatrix}, \\[1em] \text{and } \sigma^2 &= \begin{pmatrix} \sigma_p^2 & \sigma_{po}^2 \\ \sigma_{po}^2 & \sigma_o^2 \end{pmatrix}. \end{align}Here, $p_i$ and $o_i$ are respectively the parental and offspring beak depth for sample $i$. For the prior, we assume Jeffreys priors on all standard deviations and uniform priors on $\mu_o$ and $\mu_p$. We can code this posterior up to use with emcee.
For the purposes of coding up the bivariate Gaussian, it is useful to write the log likelihood by explicitly inverting the covariance matrix and evaluating the sum of quadratic forms in the exponential.
\begin{align} &\ln P(D \mid \mu_p, \mu_o, \sigma_p, \sigma_o, \sigma_{po}) = \text{constant} - \frac{n}{2}\,\ln \left(\sigma_p^2 \sigma_o^2 - \sigma_{po}^4\right) \\ &\;\;\;\;-\frac{n}{2\left(\sigma_p^2 \sigma_o^2 - \sigma_{po}^4\right)}\left[\left\langle (p - \mu_p)^2\right \rangle\sigma_o^2 + \left\langle (o - \mu_o)^2\right \rangle\sigma_p^2 - 2 \left\langle (o - \mu_o)(p - \mu_p)\right \rangle \sigma_{po}^2\right]. \end{align}
def log_posterior_bigauss(p, parent_bd, offspring_bd):
"""
Log posterior for model of beak depth with a bivariate Gaussian.
"""
# Disallow negative parameter values (all are impossible)
if np.any(p <= 0):
return -np.inf
# Unpack parameters
mu_p, mu_o, sigma_p, sigma_o, sigma_po = p
# Compute determinant of covariance matrix
det = (sigma_p * sigma_o)**2 - sigma_po**4
# Matrix must be positive definite
if det <= 0:
return -np.inf
# Compute mean squared deviations and the mean cross-deviation
p_var = ((parent_bd - mu_p)**2).mean()
o_var = ((offspring_bd - mu_o)**2).mean()
covar = ((parent_bd - mu_p) * (offspring_bd - mu_o)).mean()
# Number of data points
n = len(parent_bd)
# Compute log prior
log_prior = -np.log(sigma_p * sigma_o * sigma_po)
# Compute log likelihood (dropping the constant -n ln(2π))
log_like = -n / 2 * np.log(det) - n / 2 / det * (p_var * sigma_o**2
+ o_var * sigma_p**2
- 2 * covar * sigma_po**2)
return log_like + log_prior
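As a sanity check on the hand-coded expression (my own addition), we can compare it with scipy.stats.multivariate_normal, which computes the fully normalized log PDF. After subtracting the log prior from our log posterior, the two should differ only by the constant $-n\ln 2\pi$ that we dropped. The test parameter values below are arbitrary.
# Arbitrary test parameter values: mu_p, mu_o, sigma_p, sigma_o, sigma_po
p_test = np.array([9.5, 9.5, 0.5, 0.5, 0.4])

# Covariance matrix implied by the test parameters
cov_test = np.array([[p_test[2]**2, p_test[4]**2],
                     [p_test[4]**2, p_test[3]**2]])

# Fully normalized log likelihood from scipy.stats
rv_test = scipy.stats.multivariate_normal(p_test[:2], cov_test)
log_like_scipy = rv_test.logpdf(df[['parent_bd', 'offspring_bd']].values).sum()

# Hand-coded log posterior minus the log prior gives our log likelihood
log_prior_test = -np.log(p_test[2] * p_test[3] * p_test[4])
log_like_hand = log_posterior_bigauss(
        p_test, df.parent_bd.values, df.offspring_bd.values) - log_prior_test

# The difference should be the dropped constant, n * ln(2π)
n = len(df)
print(np.isclose(log_like_hand - n * np.log(2 * np.pi), log_like_scipy))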
With the log posterior in hand, we can set up our MCMC calculation.
# Specify parameters for emcee
n_dim = 5 # number of parameters in the model
n_walkers = 10 # number of MCMC walkers
n_burn = 5000 # "burn-in" period to let chains stabilize
n_steps = 5000 # number of MCMC steps to take after burn-in
# Seed random number generator for reproducibility
np.random.seed(42)
# Generate random starting points for walkers.
p0 = np.empty((n_walkers, n_dim))
p0[:,0] = np.random.uniform(9, 10, n_walkers) # mu_p
p0[:,1] = np.random.uniform(9, 10, n_walkers) # mu_o
p0[:,2] = np.random.uniform(0.3, 0.5, n_walkers) # sigma_p
p0[:,3] = np.random.uniform(0.3, 0.5, n_walkers) # sigma_o
p0[:,4] = np.random.uniform(0.3, 0.5, n_walkers) # sigma_po
# Set up the EnsembleSampler instance
sampler_bigauss = emcee.EnsembleSampler(n_walkers, n_dim, log_posterior_bigauss,
args=(df.parent_bd, df.offspring_bd))
# Do burn-in
pos, prob, state = sampler_bigauss.run_mcmc(p0, n_burn, storechain=False)
# Sample again, starting from end burn-in state
_ = sampler_bigauss.run_mcmc(pos, n_steps)
We can extract the most probable values and 95% HPDs from the trace.
# Get index of parameter values with highest posterior probability
ind = np.argmax(sampler_bigauss.flatlnprobability)
# Get modes and hpds
modes = tuple([sampler_bigauss.flatchain[ind, i] for i in range(5)])
hpds = np.array([bebi103.hpd(sampler_bigauss.flatchain[:,i], 0.95)
for i in range(5)])
fmt = tuple(hpds[:,0]) + modes + tuple(hpds[:,1])
print("""
95% HPDs with modes (all units mm)
----------------------------------
µ_p : [{0:.2f} {5:.2f} {10:.2f}]
µ_o : [{1:.2f} {6:.2f} {11:.2f}]
σ_p : [{2:.2f} {7:.2f} {12:.2f}]
σ_o : [{3:.2f} {8:.2f} {13:.2f}]
σ_po : [{4:.2f} {9:.2f} {14:.2f}]
""".format(*fmt))
We are really after $P(h^2\mid D)$. We can easily compute this from the posterior sample. First, we'll compute the mode and HPD. For comparison to a Gaussian distribution later, we will also compute the mean and standard deviation of the distribution.
# Construct samples of heritability from sample of covariance matrix
herit_samples = sampler_bigauss.flatchain[:,4]**2 \
/ sampler_bigauss.flatchain[:,2]**2
# Compute the mode and HPD
mode = herit_samples[np.argmax(sampler_bigauss.flatlnprobability)]
herit_hpd = bebi103.hpd(herit_samples, 0.95)
# Print the mode and HPD to the screen
print("""
Heritability mode and 95% HPD
-----------------------------
{0:.2f} {1:.2f} {2:.2f}
""".format(herit_hpd[0], mode, herit_hpd[1]))
# Compute the mean and standard deviation
herit_mean = herit_samples.mean()
herit_std = herit_samples.std()
# Print to the screen
print("""
Heritability mean and standard deviation
----------------------------------------
{0:.2f} ± {1:.2f}
""".format(herit_mean, herit_std))
To plot the distribution $P(h^2\mid D)$, we will perform kernel density estimation with automatic bandwidth selection to avoid binning effects. We also plot the Gaussian distribution with the same mean and variance for comparison.
# Compute kernel
kde = scipy.stats.gaussian_kde(herit_samples, bw_method='scott')
# Evaluate kernel
h2 = np.linspace(0.5, 0.9, 500)
p = kde.evaluate(h2)
# Compute Gaussian
rv = scipy.stats.norm(herit_mean, herit_std)
p_gauss = rv.pdf(h2)
# Plot distributions
plt.plot(h2, p)
plt.plot(h2, p_gauss, '--')
plt.margins(0.02)
plt.xlabel('$h^2$', fontsize=18)
plt.ylabel(r'$P(h^2 \!\mid D)$')
plt.legend(('MCMC', 'approx. Gaussian'), loc='upper left')
The distribution is approximately Gaussian, just a little sharper.
Note that while the most probable $h^2$ is the same as we achieved via linear regression, the 95% HPD region is tighter. These are two different means of estimating the parameters; the models are fundamentally different. In the linear regression, we assume that the parental beak depths are measured without error and that the error in the measured beak depth of the offspring is Gaussian distributed, with the true beak depths falling exactly on a line. We proved analytically that finding the most probable heritability is the same as performing a regression to get the slope of a line. But the posterior distribution we are calculating there is not really what we want, because this model is kind of absurd: there is no reason to expect that the offspring beak depths would be exactly linearly related to those of the parents.
In the present model, we just assume that we are taking samples of beak depths that are Gaussian distributed and possibly correlated. We make no distinction between measurement errors and the variability throughout the population. If the variability throughout the population is much bigger than the measurement errors, the measurement error can be neglected. This is much more satisfying to me; it makes no pretense that the points should lie along a line.