Where we will soon be
For the first week of the course, you will be learning some basics of computing with Python. As a result, we are not getting our hands on data right away. Sometimes, when learning the basic skills you need in order to do the tasks you really want to do (like data visualization, statistical inference, image processing, all the goodies of this course), you can get frustrated. To help alleviate that possible frustration, I want to give a quick demonstration of what you will be able to do in a few weeks, after working through some Python basics and learning the mechanics of data frame manipulation.
Check out the abstract of this paper by Ohnishi et al., published in Nature Cell Biology in 2014, which I reproduce here.
It is now recognized that extensive expression heterogeneities among cells precede the emergence of lineages in the early mammalian embryo. To establish a map of pluripotent epiblast (EPI) versus primitive endoderm (PrE) lineage segregation within the inner cell mass (ICM) of the mouse blastocyst, we characterized the gene expression profiles of individual ICM cells. Clustering analysis of the transcriptomes of 66 cells demonstrated that initially they are non-distinguishable. Early in the segregation, lineage-specific marker expression exhibited no apparent correlation, and a hierarchical relationship was established only in the late blastocyst. Fgf4 exhibited a bimodal expression at the earliest stage analysed, and in its absence, the differentiation of PrE and EPI was halted, indicating that Fgf4 drives, and is required for, ICM lineage segregation. These data lead us to propose a model where stochastic cell-to-cell expression heterogeneity followed by signal reinforcement underlies ICM lineage segregation by antagonistically separating equivalent cells.
So, they determined that Fgf4 expression precedes differentiation of stem cell lineages. In their study, they performed microarray analysis to investigate expression levels of genes in 101 different cells harvested from wild type and Fgf4⁻/⁻ mouse blastocysts at various times in embryonic development. They deposited the microarray data in the ArrayExpress database here. The data sets are large(ish), consisting of signal detected in over 45,000 probe oligos.
After the next few weeks, you will be able to take that data set, extract the pertinent data, and get a plot like the one below, which shows Fgf4 levels detected in cells of different types with different genetic backgrounds at different stages of development.
The left-most (blue) plot shows that even at embryonic day 3.25 of development, before the EPI and PrE cell lines have differentiated, there are two populations of cells with different Fgf4 levels.
So, I urge you to be patient. You will soon be a ninja with data sets. We need to lay some groundwork first.
A note to those new to programming in Python
Lesson 3 (all 12 parts of it!) is by far the longest lesson in the course. This first lesson is meant to get you up to speed with some of the main ideas in Python programming so that the rest of the class proceeds more smoothly. It may take you a while to read and work through the lesson. Please be patient. Also, do not worry if you do not have full mastery of everything. The concepts will be reinforced during the course.
A note to Pythonistas and bootcampers
Some of the students in the class are already quite proficient programmers, and some are proficient in the language of instruction of this course, Python. Some of you have taken the Introduction to Programming in the Biological Sciences Bootcamp with me. You may find the lessons of this first week of the course to be review. Former bootcampers will even find some text identical to what we did in the bootcamp. I encourage you to still read carefully and to help teammates for whom this material is new. Doing so will help further cement these foundational skills for you.