(c) 2018 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.
This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.
This tutorial was generated from a Jupyter notebook. You can download the notebook here.
In this lesson, you will set up a Python computing environment for scientific computing. In addition, you will set up a GitHub account, which you will use to collaborate on and submit all exercises of the course.
There are two main ways people set up Python for scientific computing.
In this class, we will use Anaconda, with its associated package manager, conda. It is pretty much the de facto package manager/distribution for scientific use.
Python is currently in version 3.7 (as of September 13, 2018). Python 3.x is not backwards compatible with Python 2.x. Many scientific packages were written in Python 2.x and have been very slow to update to Python 3. However, Python 3 is Python's present and future, so all packages eventually need to work in Python 3. Today, most important scientific packages work in Python 3. All of the packages we will use do, so we will use Python 3 in this course.
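If you are unsure which major version of Python you are running, you can ask the interpreter itself. The following is a minimal sketch; the helper name is_python3 is made up for illustration and is not part of the course materials.

```python
import sys

def is_python3(version_info=sys.version_info):
    """Return True if the given version info describes a Python 3.x interpreter."""
    return version_info[0] == 3

# Report the version of the interpreter running this code
print("Running Python", sys.version.split()[0])
print("Python 3 interpreter:", is_python3())
```

If the last line reports False, you are running Python 2 and should set up a Python 3 environment before continuing.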
For those of you who are already using Anaconda with Python 2, you can create a Python 3 environment.
If your machine is a Mac, you will need to install XCode, which you can get through the App Store, before installing Anaconda. Once you install XCode, you need to launch it in order to have everything set up properly. It will take a while to launch, and when it has launched, you can close it, and you won't need it again for the rest of the course. Important components under the hood are set up by installing and launching XCode.
We will be using JupyterLab. It is browser-based, and Chrome, Firefox, and Safari are supported. Internet Explorer is not. Therefore, if you are a Windows user, you need to be sure you have either Chrome or Firefox installed.
Git is installed on Macs with XCode. For Windows users, you need to install Git. You can do this by following the instructions here.
Mac users: Before installing Anaconda, be sure you have XCode installed.
Downloading and installing Anaconda is simple.
That's it! After you do that, you will have a functioning Python distribution.
After installing the Anaconda distribution, you should be able to launch the Anaconda Navigator. If you're using macOS, this is available in your Applications menu. If you are using Windows, you can do this from the Start menu. Launch Anaconda Navigator.
You should see an option to launch JupyterLab. When you do that, a new browser window or tab will open with JupyterLab running. Within the JupyterLab window, you will have the option to launch a notebook, a console, a terminal, or a text editor. We will use notebooks heavily in the course, and will also use text editors. We will use the terminal for package management. You are also free to use any text editor you like (I recommend VS Code or Sublime Text) or the native terminal on your machine instead of the ones built into JupyterLab.
If you choose to use Jupyter, for the updating and installation of necessary packages, click on Terminal to launch a terminal. You will get a terminal window (probably black) with a prompt. We refer to this text interface in the terminal as the command line. You will use this to install the requisite packages.
The conda package manager
conda is a package manager for keeping all of your packages up-to-date. It has plenty of functionality beyond our basic usage in class, which you can learn more about by reading the docs. We will primarily be using conda to install and update packages.
conda works from the command line. Now that you know how to get a command line prompt, you can start using conda. The first thing we'll do is update conda itself. To do this, enter the following on the command line:
conda update conda
If conda is out of date and needs to be updated, you will be prompted to perform the update. Just type y, and the update will proceed.
Now that conda is updated, we'll use it to see what packages are installed. Type the following on the command line:
conda list
This gives a list of all packages and their versions that are installed. Now, we'll update all packages, so type the following on the command line:
conda update --all
You will be prompted to perform all of the updates. There may even be some downgrades. This happens when there are package conflicts where one package requires an earlier version of another. conda is very smart and figures all of this out for you, so you can almost always say "yes" (or "y") to conda when it prompts you.
You will also need to install some packages that are not included in the default Anaconda distribution, namely PyStan, Altair, Altair-catplot, IPython Vega, Watermark, and node.js. You will also need to install the BE/Bi 103 module. To install these packages, enter the following commands, in succession, on the command line.
conda install nodejs
conda install -c conda-forge altair vega
pip install altair-catplot watermark bebi103
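After the installs finish, you may want to confirm that Python can find the new packages. The sketch below uses the standard library's importlib.util.find_spec to look for each module without importing it. The helper missing_modules is hypothetical, and the import names in course_modules are assumptions about how each package is imported; adjust them if a package uses a different import name.

```python
import importlib.util

def missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

# Import names assumed for the packages installed above
course_modules = ["altair", "vega", "altair_catplot", "watermark", "bebi103"]

missing = missing_modules(course_modules)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All listed modules were found.")
```

Any module reported as not found can be reinstalled with the corresponding conda or pip command above.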
We will be using Altair for most of our plotting. By default, Altair only exports graphics as PNG and HTML (which is really all you need, at least for sharing plots and for the paper of the future, which is not a PDF and is interactive). However, many of us are still publishing the paper of the present, which is typically a PDF, and we want vector graphics for our plots. To enable Altair to publish vector graphics, you will need to install the Google Chrome web browser and ChromeDriver. To install Chrome, simply download it and follow the on-screen instructions. You do not need to make it your default browser if you do not want to. To install ChromeDriver, download the most recent ZIPped binary (choose the zip file that matches your operating system). Unzip it, and save the binary to a directory in your PATH. If you're using macOS or Linux, you can do the following on the command line, assuming you saved the unzipped file in the Downloads/ folder in your home directory.
mkdir -p /usr/local/bin
mv ~/Downloads/chromedriver /usr/local/bin/
Again, this installation is not necessary, but will allow you to export SVGs from Altair.
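To check whether ChromeDriver ended up in a directory on your PATH, you can ask Python to search for it. This is a minimal sketch using shutil.which from the standard library; it assumes the binary is named chromedriver, as in the mv command above.

```python
import shutil

# Search the directories on PATH for an executable named "chromedriver"
chromedriver_path = shutil.which("chromedriver")

if chromedriver_path is None:
    print("chromedriver was not found on your PATH.")
else:
    print("chromedriver found at", chromedriver_path)
```

The same check works for any other command line tool, for example shutil.which("git").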
Finally, we need to configure JupyterLab to work with Bokeh, which we will use to visualize images.
jupyter labextension install --no-build jupyterlab_bokeh
jupyter labextension install --no-build @pyviz/jupyterlab_pyviz
jupyter labextension install --no-build @jupyter-widgets/jupyterlab-manager
You may also wish to install a spell-checker (this one isn't necessary). I suspect this spell-checker will either be improved or replaced in the future, but it is all that is currently available (as of June 7, 2018).
jupyter labextension install --no-build @ijmbarr/jupyterlab_spellchecker
After installing all of these extensions, you can rebuild JupyterLab.
jupyter lab build
If you're using a terminal in JupyterLab, close your JupyterLab session and relaunch it after you have completed the build. As before, after JupyterLab has launched, open a new terminal window so that you can proceed with setting up Git.
We will make extensive use of Git during the course. We will use GitHub to host the repositories. You need to set up a GitHub account and get yourself acquainted with the basics of Git. To do this, see this tutorial from my Intro to Programming Bootcamp.
Once you have a GitHub account, send an email to bois at caltech dot edu with your account ID to get access to the BE/Bi 103 Group on GitHub. Within this group, you will form a team. Your team consists of your partners for homework submission.
We'll now run a quick test to make sure things are working properly. We will make a quick plot that requires some of the scientific libraries we will use in the course.
Use the JupyterLab launcher (you can get a new launcher by clicking on the + icon on the left pane of your JupyterLab window) to launch a notebook. In the first cell (the box next to the In [ ]: prompt), paste the code below. To run the code, press Shift+Enter while the cursor is active inside the cell. You should see a plot that looks like the one below. If you do, you have a functioning Python environment for scientific computing!
import numpy as np
import pandas as pd
import altair as alt

# Generate plotting values
t = np.linspace(0, 2*np.pi, 200)
x = 16 * np.sin(t)**3
y = 13 * np.cos(t) - 5 * np.cos(2*t) - 2 * np.cos(3*t) - np.cos(4*t)

# Build a data frame for plotting
df = pd.DataFrame({'x': x,
                   'y': y,
                   't': t})
df_text = pd.DataFrame({'x': [0],
                        'y': [0]})

# Make a plot
heart = alt.Chart(df
    ).mark_line(
        color='red'
    ).encode(
        x='x:Q',
        y='y:Q',
        order='t')

text = alt.Chart(df_text
    ).mark_text(
        text='BE/Bi 103',
        align='center',
        baseline='bottom',
        size=30
    ).encode(
        x='x:Q',
        y='y:Q')

(heart + text).interactive()
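If the plot does not appear, it can help to check the curve itself separately from the plotting libraries. The sketch below evaluates the same parametric equations at two values of t using only the standard library. The function name heart_point is made up for illustration.

```python
import math

def heart_point(t):
    """Return (x, y) on the heart curve for parameter t, as in the test plot."""
    x = 16 * math.sin(t)**3
    y = (13 * math.cos(t) - 5 * math.cos(2*t)
         - 2 * math.cos(3*t) - math.cos(4*t))
    return x, y

# Top notch of the heart (t = 0) and bottom tip (t = pi)
print(heart_point(0))
print(heart_point(math.pi))
```

The two points should be approximately (0, 5) and (0, -17), since at t = 0 we have y = 13 - 5 - 2 - 1 and at t = pi we have y = -13 - 5 + 2 - 1.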
%load_ext watermark
%watermark -v -p numpy,pandas,altair,jupyterlab
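The watermark magic above only works inside IPython or Jupyter. If you want a similar report from a plain Python script, the following is a rough sketch using the standard library. It assumes Python 3.8 or later, where importlib.metadata is available, and the helper installed_version is hypothetical. It is not how watermark itself is implemented.

```python
import platform
from importlib import metadata  # standard library in Python 3.8 and later

def installed_version(distribution_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None

# Report the Python version and the versions of the packages listed above
print("Python", platform.python_version())
for name in ["numpy", "pandas", "altair", "jupyterlab"]:
    version = installed_version(name)
    print(name, version if version is not None else "not installed")
```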