Homework 6.1 [SOLO]: Marginalization by sampling (30 pts)


Recall from the introductory lecture on probability that marginalization is the process by which a variable is removed from a joint distribution. If \(P(\{A_i\}, B)\) is the joint probability of event \(B\) and some set of “A” events indexed by \(i\), then we obtain the probability of event \(B\) by marginalizing over the A events. This amounts to summing.

\begin{align} P(B) = \sum_i P(\{A_i\}, B). \end{align}

This also works for probability density functions. If \(f(x_1, x_2)\) is the joint probability density function for \(x_1\) and \(x_2\), then

\begin{align} f(x_1) = \int\mathrm{d}x_2\,f(x_1, x_2). \end{align}

Imagine we have three variables, \(x_1\), \(x_2\), and \(x_3\), that are distributed according to a trivariate Normal distribution,

\begin{align} x_1, x_2, x_3 \sim \text{Norm}(\boldsymbol{\mu}, \mathsf{\Sigma}), \end{align}

where \(\boldsymbol{\mu} = (\mu_1, \mu_2, \mu_3)^\mathsf{T}\) is the trivariate location parameter and

\begin{align} \mathsf{\Sigma} = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{12} & \Sigma_{22} & \Sigma_{23} \\ \Sigma_{13} & \Sigma_{23} & \Sigma_{33} \end{pmatrix} \end{align}

is the symmetric positive definite scale parameter, called a covariance matrix.

If we want to know the joint distribution of \(x_1\) and \(x_3\), we can do it the “hard” way by directly computing the integral

\begin{align} f(x_1, x_3) = \int\mathrm{d}x_2\,f(x_1, x_2, x_3). \end{align}

You can perform this integration if you like (but you don’t have to). You would find that

\begin{align} x_1, x_3 \sim \text{Norm}\left(\begin{pmatrix}\mu_1 \\ \mu_3\end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{13} \\ \Sigma_{13} & \Sigma_{33} \end{pmatrix}\right). \end{align}

The marginal distribution is Normal again, just with the marginalized variable removed. This holds for the multivariate Normal distribution, but it is not true of joint distributions in general. Similarly, if we want the distribution of \(x_1\) alone, we marginalize out both \(x_2\) and \(x_3\).

\begin{align} f(x_1) = \int\mathrm{d}x_2\int\mathrm{d}x_3\,f(x_1, x_2, x_3). \end{align}

The result is

\begin{align} x_1 \sim \text{Norm}(\mu_1, \Sigma_{11}), \end{align}

though we usually write this in terms of a standard deviation rather than a variance, where \(\sigma_1^2 = \Sigma_{11}\).

It takes some mathematical grunge to arrive at these results, but we can get a similar result by sampling. Specifically, if you can get samples out of a multivariate distribution, you get samples out of a marginalized distribution by simply ignoring the samples of the variables you are marginalizing out. We will use this fact over and over (and over and over) again in the next term of the class. You will demonstrate it to yourself for the case of the trivariate Normal in this problem.
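For concreteness, here is a minimal sketch of this idea in Python, assuming the samples from a joint distribution are stored as a NumPy array with one column per variable. (The parameter values below are placeholders for illustration only, not those of this problem.)

```python
import numpy as np

# Placeholder joint distribution: samples stored as an (n_samples, 3) array
# with columns corresponding to x1, x2, and x3.
rng = np.random.default_rng(3252)
samples = rng.multivariate_normal(mean=[0, 0, 0], cov=np.eye(3), size=1000)

# Samples out of the marginal distribution of (x1, x3): ignore the x2 column.
samples_x1_x3 = samples[:, [0, 2]]

# Samples out of the marginal distribution of x1 alone: keep only the x1 column.
samples_x1 = samples[:, 0]
```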

a) Draw 2,000 samples for \(\mathbf{x} = (x_1, x_2, x_3)\) out of a trivariate Normal distribution. You can refer to the Distribution Explorer for direction on how to do that. For your choice of parameters, use

\begin{align} &\boldsymbol{\mu} = (10, 15, 25)^\mathsf{T},\\[1em] &\mathsf{\Sigma} = \begin{pmatrix} 6 & -7 & -5 \\ -7 & 13 & 11 \\ -5 & 11 & 10 \end{pmatrix}. \end{align}
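If it is helpful to get started, one possible way to draw the samples is sketched below, assuming NumPy's random number generator. (This is just a sketch, not the required approach; the Distribution Explorer may point you toward a different interface.)

```python
import numpy as np

# Parameters given in the problem statement
mu = np.array([10.0, 15.0, 25.0])
Sigma = np.array(
    [
        [ 6.0,  -7.0,  -5.0],
        [-7.0,  13.0,  11.0],
        [-5.0,  11.0,  10.0],
    ]
)

# Draw 2,000 samples out of the trivariate Normal. The result has shape
# (2000, 3), with columns corresponding to x1, x2, and x3.
rng = np.random.default_rng()
samples = rng.multivariate_normal(mu, Sigma, size=2000)
```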

b) Plot the samples of \(x_1\) as an ECDF. Overlay the CDF of the theoretical distribution for \(f(x_1)\). Do they match?
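As a sketch of one way to make the comparison (assuming the `samples`, `mu`, and `Sigma` variables from the sketch in part a, and using SciPy and Matplotlib; your preferred plotting package works just as well): the ECDF comes from sorting the \(x_1\) samples, and the theoretical CDF is that of \(\text{Norm}(\mu_1, \sqrt{\Sigma_{11}})\).

```python
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

# ECDF of the x1 samples: sorted values vs. cumulative fraction of samples
x1 = np.sort(samples[:, 0])
ecdf = np.arange(1, len(x1) + 1) / len(x1)

# Theoretical marginal CDF: x1 ~ Norm(mu_1, sigma_1) with sigma_1 = sqrt(Sigma_11)
x_theor = np.linspace(x1.min(), x1.max(), 400)
cdf_theor = scipy.stats.norm.cdf(x_theor, loc=mu[0], scale=np.sqrt(Sigma[0, 0]))

plt.plot(x1, ecdf, marker=".", linestyle="none", label="ECDF of samples")
plt.plot(x_theor, cdf_theor, label="theoretical CDF")
plt.xlabel("$x_1$")
plt.ylabel("CDF")
plt.legend()
plt.show()
```

If marginalization by sampling works as claimed above, the ECDF should lie on top of the theoretical CDF, up to sampling noise.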