{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Homework 2.1: Exponential conjugate prior (35 pts)\n", "\n", "**[Data set download](http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?tool=portal&sendto=on&log$=seqview&db=nuccore&dopt=fasta&val=CP042865.1)**\n", "\n", "*This problem was motivated by discussion in section 2.10 of [Holmes and Huber's book](http://web.stanford.edu/class/bios221/book/).*\n", "\n", "
"]}, {"cell_type": "markdown", "metadata": {}, "source": ["**a)** Show that the conjugate distribution for the Exponential distribution is the Gamma distribution. How are the parameters of the Gamma updated from the prior to the posterior? (*The course policies may be ambiguous here. To clarify, you are welcome to look at [Wikipedia's table of conjugate priors](https://en.wikipedia.org/wiki/Conjugate_prior) to check your answer, but you should not look up the actual proof that the Gamma distribution is conjugate to the Exponential.*)\n", "\n", "**b)** Download the sequence of the chromosomal DNA of *E coli* strain ATCC BAA-196 [here](http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?tool=portal&sendto=on&log$=seqview&db=nuccore&dopt=fasta&val=CP042865.1) in FASTA format. This strain is resistant to multiple drugs and is used in studies of antibiotic resistance. The sequence was published in [this paper](https://doi.org/10.1128/MRA.01118-19). Read the sequence in as a single string. Recall from [Homework 2.2 of last term](http://bebi103.caltech.edu.s3-website-us-east-1.amazonaws.com/2019a/content/homework/hw2/hw2.2.html) that you wrote a function to read in a sequence from a FASTA file containing a single record. You can use the function from the solutions to that homework, shown below, if you wish."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": ["def read_fasta_single_record(filename):\n", " \"\"\"Read a sequence in from a FASTA file containing a single sequence.\n", " \n", " We assume that the first line of the file is the descriptor and all\n", " subsequent lines are sequence. \n", " \"\"\"\n", " with open(filename, 'r') as f:\n", " # Read in descriptor\n", " descriptor = f.readline().rstrip()\n", "\n", " # Read in sequence, stripping the whitespace from each line\n", " seq = ''\n", " line = f.readline().rstrip()\n", " while line != '':\n", " seq += line\n", " line = f.readline().rstrip()\n", " \n", " return descriptor, seq\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["**c)** Find the index in the sequence of all [Shine-Delgarno motifs](https://en.wikipedia.org/wiki/Shine-Dalgarno_sequence), which is an important motif in initiation of protein synthesis. The Shine-Delgarno sequence is `AGGAGGT`. To find the sequences, you can repurpose the function your wrote in [Homework 2.2 of last term](http://bebi103.caltech.edu.s3-website-us-east-1.amazonaws.com/2019a/content/homework/hw2/hw2.2.html) to find recognition sites for restriction enzymes. Below is a function from the homework solutions."]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": ["import re\n", "\n", "\n", "def restriction_sites_with_re(seq, recog_seq):\n", " \"\"\"Find the indices of all restriction sites in a sequence.\"\"\"\n", " sites = []\n", " for site in re.finditer(recog_seq, seq):\n", " sites.append(site.start())\n", " \n", " return sites"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Store the number of bases between each occurrence of the motif as an array.\n", "\n", "**d)** We will model the distance between each motif as Exponentially distributed. Explain why this may be a reasonable model.\n", "\n", "**e)** Make an exploratory plot of the ECDF of inter-motif distances. Does it look Exponential?\n", "\n", "**f)** Use an Exponential likelihood and a Gamma prior distribution to compute and plot the posterior distribution of the rate parameter (the inverse of the characteristic distance between motifs) of the Exponential distribution."]}, {"cell_type": "markdown", "metadata": {}, "source": ["
"]}], "metadata": {"anaconda-cloud": {}, "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6"}}, "nbformat": 4, "nbformat_minor": 4}