# BE/Bi 103 a: Introduction to Data Analysis in the Biological Sciences

Modern biology is a quantitative science, and biological scientists need to be equipped with tools to analyze quantitative data. This course takes a hands-on approach to developing these tools. Together, we will analyze real data. We will learn how to organize, preserve, and share data sets, create informative interactive graphical displays of data, process images to extract actionable data, and perform basic resampling-based statistical inferences.

Importantly, biological data is often “messy” and there is no one right way to perform an analysis or make a plot. As we work with data, we will discuss various approaches to get a feel for the art of biological data analysis.

The sequel to this course goes deeper into statistical modeling, mostly from a Bayesian perspective. This course is foundational for that sequel and for further study of biological data analysis.

If you are enrolled in the course, please read the Course policies below. We will not go over them in detail in class, and it is your responsibility to understand them.

## Useful links

- Ed (used for course communications)
- Homework solutions (password protected)

## People

Instructor

Justin Bois (bois at caltech dot edu)

TAs

- Victoria Chen (`vichen AT caltech DOT edu`)
- Rosita Fu (`rfu AT caltech DOT edu`)
- Nastya Grebin (`agrebin AT caltech DOT edu`)
- Matteo Guareschi (`mmguar AT caltech DOT edu`)
- Rashi Jeeda (`rjeeda AT caltech DOT edu`)
- David Larios (`dalarios AT caltech DOT edu`)

- 0. Preparing computing resources for the course
- 1. The cycle of science
- 2. Version control with Git
- 3. Introduction to Python
- E1. To be completed after lesson 3
- 4. Style
- 5. Test-driven development
- 6. Exploratory data analysis, part 1
- E2. To be completed after lesson 6
- 7. Exploratory data analysis, part 2
- E3. To be completed after lesson 7
- 8. Data file formats
- 9. Data storage and sharing
- 10. Data wrangling
- E4. To be completed after lesson 10
- 11. Intro to probability
- E5. To be completed after lesson 11
- 12. Overplotting
- 13. Dashboards
- 14. Plug-in estimates and confidence intervals
- 15. Random number generation
- 16. Probability distributions
- E6. To be completed after lesson 16
- 17. Null hypothesis significance testing
- 18. Nonparametric inference with hacker stats
- E7. To be completed after lesson 18
- 19. Parametric inference
- 20. Maximum likelihood estimation
- E8. To be completed after lesson 20
- 21. Model assessment
- 22. Regression
- E9. To be completed after lesson 22
- 23. Reproducible workflows
- 24. The paper of the future
- 25. Mixture models
- 26. Implementation of model assessment
- E10. To be completed after lesson 26
- 27. Statistical watchouts

- R1. The command line
- R2. Git/GitHub tips and traps
- R3. Time series and data smoothing
- R4. Manipulating data frames
- R5. Probability review
- R6. Intro to image processing
- R7. Topics in bootstrapping
- R8. Review of maximum likelihood estimation
- R9. Wild and residual bootstrap
- R10. Packaging and package management

- 0. Configuring your team
- 1. Practice with Python
- 2. Practice with NumPy and plotting
- 3. Exploratory data analysis I
- 4. Exploratory data analysis II
- 5. Dashboards
- 6. Random number generation and probability distributions
- 7. Nonparametric hacker stats
- 8. Parametric inference
- 9. Maximum likelihood estimation
- 10. Model comparison
- 11. Course feedback