Meetings
Lectures are held weekly on Wednesday mornings from 10:00 to 10:50 AM in 100 Broad. On Mondays, you attend one of the two lab sessions, either 1-4 PM or 7-10 PM, both in 200 Broad. Bring your laptop and charger. You should always attend the same lab session (either 1 PM or 7 PM) unless you have a conflict and have let the course instructors know.
Lab sessions
With the exception of the first lab session and a few sessions that include some lecture material, the lab sessions are spent working on the week's homework, which always includes working with real data sets with your teammates. You are expected to be working diligently during this time, and it is a golden opportunity to do so. The course staff will be there to help you.
Prior to each lab session, you must go through the tutorials listed on the course website for the week. These will give you the skills you need to work on the week's homework problems. To verify that you completed the tutorials ahead of time, each of you must individually commit a small exercise to the repository in the GitHub group. Place this in the tutorial exercises/ directory in a file named t##_your_name.ipynb, where ## is the number of the tutorial.
The BE/Bi 103 GitHub group
A BE/Bi 103 GitHub group is set up for the class. You will be part of the group through your GitHub account. All homeworks and tutorial exercises are submitted by pushing to the GitHub repository.
Homework
With the exception of Thanksgiving week, there are weekly homework assignments. These consist almost entirely of working up real data sets.
Data analysis is almost always a collaborative effort in both research and industry. Therefore, you will be assigned to teams of three (possibly with a couple of teams of four, depending on course enrollment). You will submit your homework as a team. The following homework policies apply.
- Each homework has a defined due date and time. Your team must push and tag your completed homework in your team's homework repository before this time.
- The commit containing your final submission must be tagged with "hw##_submission", where ## is the two-digit homework assignment number (e.g., 04). Failure to properly tag your homework will result in a 5% deduction from your point total.
- Each homework problem must be committed as a single Jupyter notebook. The file names must be hw#.#.ipynb. For example, homework problem 3.2 is in the file hw3.2.ipynb.
- All code you wrote to do your assignment must be included in the notebook. Code from imported packages that you did not write (e.g., modules distributed by the class instructors) need not be displayed in the notebook. We will run the code in your notebook; all code must run to get credit.
- Since we are running your code to check it, you must have the following path structure for the data sets used in the homework. There must be a directory data/ in the root of your team's repository. This directory contains any data files downloaded for the homework, with the file names unaltered. If the data were distributed in a ZIP file, the file must be unzipped in the data/ directory with the file names left unaltered. So, the path to the data/ directory from your homework/ directory is ../data/.
- All of your results must be clearly explained and all graphics clearly presented and embedded in the Jupyter notebook.
- Any mathematics in your homework must render clearly and properly with MathJax. This essentially means that your equations must be written in correct LaTeX.
- Where appropriate, you need to give detailed discussion of analysis choices you have made. As an example, you may choose to model error in measurement with a Cauchy distribution instead of Gaussian. You need to justify that choice.
- To help you construct your assignments (and as good practice in your own workflows in general), you should follow these guidelines.
  - Each code cell should do only one task or define only one simple function.
  - Do not have any adjacent code cells. Instead, put explanatory text between code cells describing what is happening in them. This text should explain not just the code, but also your reasoning for doing the calculation.
  - Show all equations. For example, write down the mathematical expression for the log likelihood before coding it up.
  - Use Markdown headers to delineate sections of your notebook. In this class, this means at least using headers to delineate parts of the problem.
- There is seldom a single right way to analyze a set of data. You are encouraged to try different approaches to analysis. If you perform an analysis and find problems with it, clearly write what the problems are and how they came about. Even if your analysis does not completely work, but you demonstrate that you thought carefully about it and understand its difficulties, you will get nearly full credit.
- You should also include attribution in your homework submission: who on the team did what. While different people on the team may do different parts of the homework, I encourage you to work together on all parts of the homework. At the very least, you personally must understand all of the steps taken in the homework solutions and be able to repeat them by yourself.
- Throughout the term, your team will have six "grace days" for late homeworks. For example, your team can submit homework 1 two days late, homework 6 three days late, and homework 8 one day late. After that, no more late homeworks will be accepted.
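The naming and layout rules above lend themselves to a quick check before tagging. The following is a minimal sketch of such a check, not a tool distributed with the course. The function name `check_submission` and the two regular expressions are hypothetical. It assumes that notebooks sit directly in the homework/ directory, that the tag number is exactly two digits, and that problem numbers contain only digits.

```python
import re
from pathlib import Path

# Patterns derived from the naming rules in the homework policies.
NOTEBOOK_RE = re.compile(r"^hw\d+\.\d+\.ipynb$")   # e.g. hw3.2.ipynb
TAG_RE = re.compile(r"^hw\d{2}_submission$")       # e.g. hw04_submission


def check_submission(repo_root, tag, homework_dir="homework"):
    """Return a list of problems found; an empty list means none were found."""
    repo_root = Path(repo_root)
    problems = []

    # The submission tag must follow the hw##_submission pattern.
    if not TAG_RE.match(tag):
        problems.append(f"tag {tag!r} does not look like hw##_submission")

    # data/ must sit in the repository root so ../data/ resolves from homework/.
    if not (repo_root / "data").is_dir():
        problems.append("missing data/ directory in the repository root")

    # Assumption: notebooks are stored directly in homework/, not in subfolders.
    hw_dir = repo_root / homework_dir
    notebooks = sorted(hw_dir.glob("*.ipynb")) if hw_dir.is_dir() else []
    if not notebooks:
        problems.append(f"no notebooks found in {homework_dir}/")
    for nb in notebooks:
        if not NOTEBOOK_RE.match(nb.name):
            problems.append(f"notebook {nb.name!r} does not look like hw#.#.ipynb")

    return problems
```

Calling `check_submission(".", "hw04_submission")` from the repository root returns a list of strings describing anything that does not match the rules. It only inspects names and directories; it does not run the notebooks or confirm that the tag exists in Git.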
Grading
80% of your grade is determined from homework. Everyone on your team will get the same grade on the homework.
20% of your grade is determined from submission of your tutorial exercises and participation in the lab sessions. You are expected to give your full attention and to work together with the course instructors and fellow students as we go through the tutorials.
Collaboration policy and Honor Code
Some of the data we will use in this course is unpublished, generously given to us by researchers both from Caltech and from other institutions. They have given us their data in good faith that it will be used only in this class. It is therefore imperative that you do not disseminate these data sets anywhere outside of this class.
Since the homework is done in assigned teams, you obviously should collaborate heavily with the other members of your team. You are free to discuss the homework with other teams, including via Piazza, but the work you submit must be the work of your own team.
You may not consult solutions of homework problems from previous editions of this course.
You are free to consult references, literature, websites, blogs, etc., outside of the materials presented in class (the obvious exception being homework solutions from previous editions of this course). In fact, you are encouraged to do so. If you do, you must properly cite the sources in your homework. Be warned: doing homework by Google fishing will not work! The problems are too open-ended and the techniques are too varied.
Excused absences/extensions
Under certain circumstances, missed lab or lecture sessions will be excused and extensions given on the homework without costing grace days. The reasons for the excuses or extensions must be compelling, such as health or family issues. They must be requested from the course instructor.
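The interaction between grace days and excused extensions can be sketched as a small tally. This is only an illustration of the arithmetic, assuming lateness is counted in whole days. The function name `grace_days_remaining` and its arguments are hypothetical and not part of any course tooling.

```python
GRACE_DAYS_PER_TERM = 6  # from the homework policies


def grace_days_remaining(days_late_by_homework, excused=()):
    """Return the grace days left after the given late submissions.

    days_late_by_homework maps a homework number to the number of days late.
    Homework numbers listed in `excused` received an extension and therefore
    do not consume grace days.
    """
    used = sum(
        days
        for hw, days in days_late_by_homework.items()
        if hw not in excused
    )
    if used > GRACE_DAYS_PER_TERM:
        raise ValueError(
            f"{used} late days used, but only {GRACE_DAYS_PER_TERM} are allowed"
        )
    return GRACE_DAYS_PER_TERM - used
```

With the example from the homework policies, `grace_days_remaining({1: 2, 6: 3, 8: 1})` returns 0, so no further late submissions would be accepted. If homework 6 had been granted an extension, `grace_days_remaining({1: 2, 6: 3, 8: 1}, excused=(6,))` returns 3.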
Course communications
You are free to contact the course staff at any time, but we encourage you to use the class Piazza page for questions about course topics and homework. Most of our mass communication with you will be through Piazza, so be sure to set your Piazza account to send you email alerts.