Lesson 1a: Where we will soon be

For the first two weeks of the course, you will be learning some basics of computing with Python. As a result, we are not getting our hands on data right away. Sometimes, when learning the basic skills you need in order to do the tasks you really want to do (like data visualization, statistical inference, image processing, and all the other goodies of this course), you can get frustrated. To help alleviate that possible frustration, I want to give a quick demonstration of what you will be able to do in a few weeks, after working through some Python basics and learning the mechanics of data frame manipulation.

Check out the abstract of this paper by Ohnishi, et al., published in Nature Cell Biology in 2014. I reproduce the abstract here.

It is now recognized that extensive expression heterogeneities among cells precede the emergence of lineages in the early mammalian embryo. To establish a map of pluripotent epiblast (EPI) versus primitive endoderm (PrE) lineage segregation within the inner cell mass (ICM) of the mouse blastocyst, we characterized the gene expression profiles of individual ICM cells. Clustering analysis of the transcriptomes of 66 cells demonstrated that initially they are non-distinguishable. Early in the segregation, lineage-specific marker expression exhibited no apparent correlation, and a hierarchical relationship was established only in the late blastocyst. Fgf4 exhibited a bimodal expression at the earliest stage analysed, and in its absence, the differentiation of PrE and EPI was halted, indicating that Fgf4 drives, and is required for, ICM lineage segregation. These data lead us to propose a model where stochastic cell-to-cell expression heterogeneity followed by signal reinforcement underlies ICM lineage segregation by antagonistically separating equivalent cells.

So, they determined that Fgf4 expression precedes differentiation of stem cell lineages. In their study, they performed microarray analysis to investigate expression levels of genes in 101 different cells harvested from wild type and Fgf4\(^{-/-}\) mouse blastocysts at various times in embryonic development. They deposited the microarray data in the ArrayExpress database here. The data sets are large, consisting of signal detected in over 45,000 probe oligos.

After the next few weeks, you will be able to take that data set, extract the pertinent data, and get a plot like the one below, which shows Fgf4 levels detected in cells of different types, with different genetic backgrounds, at different stages of development.

from IPython.display import HTML

[Interactive Bokeh plot: Fgf4 levels by cell type, genetic background, and stage of development]

The left-most (blue) plot shows that even at embryonic day 3.25 of development, before the EPI and PrE cell lines have differentiated, there are two populations of cells with different Fgf4 levels.
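To give a flavor of what "extracting the pertinent data" can look like, here is a minimal sketch using a Pandas data frame. It is not the analysis that produced the plot above. The column names (`probe`, `gene`, `cell`, `stage`, `signal`), the probe and cell labels, and all of the numbers are hypothetical and made up for illustration; they are not taken from the ArrayExpress data set. The sketch only assumes that the measurements have been arranged with one row per probe per cell.

```python
import pandas as pd

# Hypothetical toy data. The column names, labels, and values are made up
# for illustration and are NOT taken from the ArrayExpress data set.
df = pd.DataFrame(
    {
        "probe": ["probe_a", "probe_b", "probe_a", "probe_b", "probe_a", "probe_b"],
        "gene": ["Fgf4", "Gata6", "Fgf4", "Gata6", "Fgf4", "Gata6"],
        "cell": ["cell_1", "cell_1", "cell_2", "cell_2", "cell_3", "cell_3"],
        "stage": ["E3.25", "E3.25", "E3.25", "E3.25", "E4.5", "E4.5"],
        "signal": [2.0, 5.0, 8.0, 6.0, 9.0, 7.0],
    }
)

# Keep only the rows for the gene of interest, and only the columns we need
df_fgf4 = df.loc[df["gene"] == "Fgf4", ["cell", "stage", "signal"]]

# Count the cells and average the signal at each developmental stage
summary = df_fgf4.groupby("stage")["signal"].agg(["count", "mean"])

print(summary)
```

The same pattern of selecting rows with a Boolean condition and then grouping by a category carries over to much larger data frames, which is why we will spend time on these mechanics.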

So, I urge you to be patient. You will soon be a ninja, even with large data sets. We need to lay some groundwork first.

A note to Pythonistas

Some of the students in the class are already quite proficient programmers, and some are proficient in Python, the language of instruction of this course. You may find the lessons of these first two weeks of the course to be review. I still encourage you to read carefully and to help your teammates for whom this material is new. This will help further cement these foundational skills for you.

Computing environment

%load_ext watermark
%watermark -v -p jupyterlab
CPython 3.7.4
IPython 7.1.1

jupyterlab 1.1.4