Choosing likelihoods

In our example of model building for measurements of C. elegans egg lengths, we chose a Normal likelihood for the egg lengths. We did so because the story of the repeated measurements matched that of the Normal distribution via the central limit theorem. When a measurement is the summed result of many independent processes, none of which has an enormous variance, the values of the measurement are approximately Normally distributed.
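The central limit theorem story can be checked with a quick simulation. The sketch below is only an illustration under stated assumptions: each simulated "measurement" is the sum of 200 small, independent contributions, and each contribution is drawn from a skewed Exponential distribution. The number of contributions, their distribution, and the seed are arbitrary choices and have nothing to do with the egg length data. If the sums are approximately Normal, about 68% of them should fall within one standard deviation of the mean and about 95% within two.

```python
import random
import statistics

rng = random.Random(3252)  # arbitrary seed, for reproducibility only


def simulate_measurement(n_processes=200):
    """One measurement as the sum of many small, independent contributions.

    Each contribution is Exponentially distributed, which is skewed and
    decidedly non-Normal. This choice is an assumption for illustration.
    """
    return sum(rng.expovariate(1.0) for _ in range(n_processes))


measurements = [simulate_measurement() for _ in range(5000)]

mean = statistics.fmean(measurements)
std = statistics.pstdev(measurements)

# For a Normal distribution, these fractions are about 0.683 and 0.954.
n = len(measurements)
frac_within_1sd = sum(abs(x - mean) < std for x in measurements) / n
frac_within_2sd = sum(abs(x - mean) < 2 * std for x in measurements) / n

print(f"mean = {mean:.2f}, std = {std:.2f}")
print(f"fraction within 1 std: {frac_within_1sd:.3f}")
print(f"fraction within 2 std: {frac_within_2sd:.3f}")
```

Even though no single contribution is Normal, the sums should come out close to Normal, which is the sense in which the story matches the distribution.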

This method of choosing a likelihood amounts to story matching. The idea is that we describe the data generation process with a story. We then find a distribution that describes the outcome of the story.

For example, one might perform a single-molecule experiment measuring individual binding events of a ligand to a receptor and record the time between binding events. A possible model story for this is that the binding events are all independent and without memory; that is, their timing does not depend on previous binding events. This means that we can model binding events as a Poisson process, and we are interested in the time between arrivals (binding events) of that Poisson process. This story matches the story of the Exponential distribution, so we would use it for our likelihood.
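The match between the memoryless story and the Exponential distribution can also be checked by simulation. The sketch below makes several assumptions for illustration: time is chopped into small steps of length dt, every step has the same small probability rate * dt of containing a binding event regardless of history, and the values of rate, dt, and the seed are hypothetical rather than taken from any experiment. If the story matches, the mean waiting time should be close to 1 / rate, and the fraction of waiting times longer than t should be close to exp(-rate * t).

```python
import math
import random

rng = random.Random(8675309)  # arbitrary seed, for reproducibility only

rate = 2.0           # assumed binding rate, in events per unit time
dt = 0.001           # small time step
p_event = rate * dt  # chance of a binding event in any single step
n_steps = 2_000_000

# Memoryless story: every step has the same small chance of an event,
# no matter what happened in earlier steps.
event_steps = [i for i in range(n_steps) if rng.random() < p_event]

# Times between successive binding events
waiting_times = [(b - a) * dt for a, b in zip(event_steps, event_steps[1:])]

mean_wait = sum(waiting_times) / len(waiting_times)

# Compare the empirical survival function, P(T > t), with the
# Exponential one, exp(-rate * t), at t = 1 / rate.
t = 1.0 / rate
empirical_sf = sum(w > t for w in waiting_times) / len(waiting_times)
exponential_sf = math.exp(-rate * t)

print(f"mean waiting time: {mean_wait:.3f} (Exponential predicts {1 / rate:.3f})")
print(f"P(T > {t}): {empirical_sf:.3f} (Exponential predicts {exponential_sf:.3f})")
```

Note that the simulation never draws from an Exponential distribution directly. The Exponential behavior emerges from the story itself, which is the point of story matching.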

The procedure of story matching is an important part of Bayesian modeling. In a great many cases, there exists a well-known, named distribution that matches the story you are using to model the generation of your data. In cases where no such distribution exists, or you do not know about it, you need to derive a PDF or PMF matching your story, which can be challenging. It is therefore well worth the time investment to learn about distributions that can be useful in modeling. You should read the contents of the Distribution Explorer to familiarize yourself with named distributions. This will greatly facilitate choosing likelihoods.