Tutorial 0a: Configuring your computer to use Python for scientific computing

(c) 2018 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

In this lesson, you will set up a Python computing environment for scientific computing. In addition, you will set up a GitHub account, which you will use to collaborate on and submit all exercises of the course.

There are two main ways people set up Python for scientific computing.

  1. By downloading and installing package by package with tools like pip.
  2. By downloading and installing a Python distribution that contains binaries of many of the scientific packages needed. The major such distributions are Anaconda and Enthought Canopy. Both contain IDEs.

In this class, we will use Anaconda, with its associated package manager, conda. It is pretty much the de facto package manager/distribution for scientific use.

Python 2 vs Python 3

Python is currently in version 3.7 (as of September 13, 2018). Python 3.x is not backwards compatible with Python 2.x. Many scientific packages were written in Python 2.x and have been very slow to update to Python 3. However, Python 3 is Python's present and future, so all packages eventually need to work in Python 3. Today, most important scientific packages work in Python 3. All of the packages we will use do, so we will use Python 3 in this course.

For those of you who are already using Anaconda with Python 2, you can create a Python 3 environment.
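
If you are unsure which major version a given Python interpreter is, you can ask the interpreter itself. The following is a minimal sketch using only the standard library; the helper name is_python3 is made up for this illustration and is not part of any package used in the course.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the version info describes Python 3 or later.

    is_python3 is a hypothetical helper written for this tutorial.
    """
    return version_info[0] >= 3


# Report the version of the interpreter running this code
print("Running Python", ".".join(str(n) for n in sys.version_info[:3]))
print("Python 3 or later:", is_python3())
```

If the last line reports False, you are running a Python 2 interpreter and should create or activate a Python 3 environment before proceeding.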

A special note to Mac users

If your machine is a Mac, you will need to install Xcode, which you can get through the App Store, before installing Anaconda. Once you install Xcode, you need to launch it in order to have everything set up properly. It will take a while to launch. Once it has launched, you can close it, and you won't need it again for the rest of the course. Installing and launching Xcode sets up important components under the hood.

Windows users: Install Git and Chrome or Firefox

We will be using JupyterLab. It is browser-based, and Chrome, Firefox, and Safari are supported. Internet Explorer is not. Therefore, if you are a Windows user, you need to be sure you have either Chrome or Firefox installed.

Git is installed on Macs with Xcode. Windows users need to install Git separately. You can do this by following the instructions here.

Downloading and installing Anaconda

Mac users: Before installing Anaconda, be sure you have Xcode installed.

Downloading and installing Anaconda is simple.

  1. Go to the Anaconda homepage and download the graphical installer.
  2. If you can, install Python 3.7. If it is not available, you can install Python 3.6. As of September 27, 2018, only Anaconda with Python 3.6 is available. Anaconda with Python 3.7 should be available in September 2018, as per this announcement. Nonetheless, you should not wait to install, since the course is starting.
  3. You will be prompted for your email address, which you should provide. You may wish to use your Caltech email address because educational users get some of the non-free goodies in Anaconda.
  4. Follow the on-screen instructions for installation. While doing so, be sure that Anaconda is installed in your home directory, not in root.

That's it! After you do that, you will have a functioning Python distribution.

Launching JupyterLab and a terminal

After installing the Anaconda distribution, you should be able to launch the Anaconda Navigator. If you're using macOS, this is available in your Applications menu. If you are using Windows, you can do this from the Start menu. Launch Anaconda Navigator.

You should see an option to launch JupyterLab. When you do that, a new browser window or tab will open with JupyterLab running. Within the JupyterLab window, you will have the option to launch a notebook, a console, a terminal, or a text editor. We will use notebooks heavily in the course, and will also use text editors. We will use the terminal for package management. You are nonetheless free to use any text editor you like (I recommend VS Code or Sublime Text) or the native terminal on your machine.

If you choose to use JupyterLab for updating and installing the necessary packages, click on Terminal to launch a terminal. You will get a terminal window (probably black) with a prompt. We refer to this text interface in the terminal as the command line. You will use this to install the requisite packages.

The conda package manager

conda is a package manager for keeping all of your packages up-to-date. It has plenty of functionality beyond our basic usage in class, which you can learn more about by reading the docs. We will primarily be using conda to install and update packages.

conda works from the command line. Now that you know how to get a command line prompt, you can start using conda. The first thing we'll do is update conda itself. To do this, enter the following on the command line:

conda update conda

If conda is out of date and needs to be updated, you will be prompted to perform the update. Just type y, and the update will proceed.

Now that conda is updated, we'll use it to see what packages are installed. Type the following on the command line:

conda list

This gives a list of all packages and their versions that are installed. Now, we'll update all packages, so type the following on the command line:

conda update --all

You will be prompted to perform all of the updates. There may even be some downgrades. This happens when there are package conflicts where one package requires an earlier version of another. conda is very smart and figures all of this out for you, so you can almost always say "yes" (or "y") to conda when it prompts you.

You will also need to install some packages that are not included in the default Anaconda distribution, namely PyStan, Altair, Altair-catplot, IPython Vega, Watermark, and node.js. You will also need to install the BE/Bi 103 module. To install these packages, enter the following commands, in succession, on the command line.

conda install nodejs
conda install -c conda-forge altair vega
pip install altair-catplot watermark bebi103
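
Once the installations finish, you may want to confirm that Python can actually find the new packages. Below is a minimal sketch using the standard library's importlib.util.find_spec. The helper missing_modules is made up for this illustration, and the import names in the list (for example altair_catplot for the altair-catplot package) are assumptions about how each package is imported; adjust them if your installation differs.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that Python cannot locate.

    missing_modules is a hypothetical helper written for this tutorial.
    It only looks the modules up; it does not import them.
    """
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Assumed import names for the packages installed above
required = [
    "numpy",
    "pandas",
    "altair",
    "vega",
    "altair_catplot",
    "watermark",
    "bebi103",
]

missing = missing_modules(required)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All required modules were found.")
```

Any name reported as not found points to an installation step that should be repeated.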

Optional installations

We will be using Altair for most of our plotting. By default, Altair only exports graphics as PNG and HTML (which is really all you need, at least for sharing plots and for the paper of the future, which is not a PDF and is interactive). However, many of us are still publishing the paper of the present, which is typically a PDF, and we want vector graphics for our plots.

To enable Altair to publish vector graphics, you will need to install the Google Chrome web browser and ChromeDriver. To install Chrome, simply download it and follow the on-screen instructions. You do not need to make it your default browser if you do not want to.

To install ChromeDriver, download the most recent zipped binary (choose the zip file that matches your operating system). Unzip it, and save the binary to a directory in your PATH. If you're using macOS or Linux, you can do the following on the command line, assuming you saved the unzipped file in the Downloads/ folder in your home directory.

mkdir -p /usr/local/bin
mv ~/Downloads/chromedriver /usr/local/bin/

Again, this installation is not necessary, but will allow you to export SVGs from Altair.
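
If you did install ChromeDriver, you can check that the binary is reachable through your PATH. This is a minimal sketch using shutil.which from the standard library; the helper name find_executable is made up for this illustration.

```python
import shutil


def find_executable(name):
    """Return the full path to an executable on the PATH, or None.

    find_executable is a hypothetical helper written for this tutorial.
    """
    return shutil.which(name)


path = find_executable("chromedriver")
if path is None:
    print("chromedriver was not found on your PATH.")
else:
    print("chromedriver found at", path)
```

If the binary is not found, make sure the directory you moved it to is listed in your PATH.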

Configuring JupyterLab

Finally, we need to configure JupyterLab to work with Bokeh, which we will use to visualize images.

jupyter labextension install --no-build jupyterlab_bokeh
jupyter labextension install --no-build @pyviz/jupyterlab_pyviz
jupyter labextension install --no-build @jupyter-widgets/jupyterlab-manager

You may also wish to install a spell-checker (this one isn't necessary). I suspect this spell-checker will either be improved or replaced in the future, but it is all that is currently available (as of June 7, 2018).

jupyter labextension install --no-build @ijmbarr/jupyterlab_spellchecker

After installing all of these extensions, you can rebuild JupyterLab.

jupyter lab build

If you're using a terminal in JupyterLab, close your JupyterLab session and relaunch it after you have completed the build. As before, after JupyterLab has launched, open a new terminal window so that you can proceed with setting up Git.

Usage of Git/GitHub

We will make extensive use of Git during the course. We will use GitHub to host the repositories. You need to set up a GitHub account and get yourself acquainted with the basics of Git. To do this, see this tutorial from my Intro to Programming Bootcamp.

Once you have a GitHub account, send an email to bois at caltech dot edu with your account ID to get access to the BE/Bi 103 Group on GitHub. Within this group, you will form a team. Your team consists of your partners for homework submission.

Checking your distribution

We'll now run a quick test to make sure things are working properly. We will make a quick plot that requires some of the scientific libraries we will use in the course.

Use the JupyterLab launcher (you can get a new launcher by clicking on the + icon on the left pane of your JupyterLab window) to launch a notebook. In the first cell (the box next to the In [ ]: prompt), paste the code below. To run the code, press Shift+Enter while the cursor is active inside the cell. You should see a plot that looks like the one below. If you do, you have a functioning Python environment for scientific computing!

In [1]:
import numpy as np
import pandas as pd
import altair as alt

# Generate plotting values
t = np.linspace(0, 2*np.pi, 200)
x = 16 * np.sin(t)**3
y = 13 * np.cos(t) - 5 * np.cos(2*t) - 2 * np.cos(3*t) - np.cos(4*t)

# Build a data frame for plotting
df = pd.DataFrame({'x': x,
                   'y': y,
                   't': t})

df_text = pd.DataFrame({'x': [0],
                        'y': [0]})

# Make a plot
heart = alt.Chart(df
        ).mark_line(
            color='red'
        ).encode(
            x='x:Q',
            y='y:Q',
            order='t')

text = alt.Chart(df_text
        ).mark_text(
            text='BE/Bi 103',
            align='center',
            baseline='bottom',
            size=30
        ).encode(
            x='x:Q',
            y='y:Q')

(heart + text).interactive()
Out[1]:

Computing environment

In [2]:
%load_ext watermark
In [3]:
%watermark -v -p numpy,pandas,altair,jupyterlab
CPython 3.7.0
IPython 6.5.0

numpy 1.15.1
pandas 0.23.4
altair 2.2.2
jupyterlab 0.34.9
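
If the watermark extension is not available, a similar report can be put together by hand. The following is a rough sketch, not a replacement for watermark: it imports each package and reads its __version__ attribute, which most (but not all) packages define. The helper package_version is made up for this illustration.

```python
import importlib
import sys


def package_version(name):
    """Return a package's version string, or None if it cannot be imported.

    package_version is a hypothetical helper written for this tutorial.
    Packages that do not define __version__ are reported as "unknown".
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# Report the interpreter version, then each package of interest
print("Python", ".".join(str(n) for n in sys.version_info[:3]))
for name in ["numpy", "pandas", "altair", "jupyterlab"]:
    version = package_version(name)
    print(name, version if version is not None else "not installed")
```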