Auxiliary tutorial 5a: Setting Up Jupyter Notebooks In The Amazon Cloud

(c) 2017 Porfirio Quintero-Cadenas. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

If your computer is struggling with your MCMC calculations, you may want to get your hands on a more powerful computer. Amazon Web Services (AWS) provides a service that allows you to use their computers on demand, and this brief tutorial will help you get your Jupyter Notebooks up and running in their computer cloud. With the setup that we recommend in this tutorial, the cost is around $0.2 per hour of usage.
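To get a feel for what that hourly rate means for a given workload, the arithmetic can be sketched in a few lines of Python. This is only a rough estimate: the function name estimate_cost is ours, the default rate is the approximate $0.20 per hour quoted above, and actual AWS billing (storage, data transfer, current prices) may differ.

```python
def estimate_cost(hours, rate_per_hour=0.20):
    """Rough compute cost in dollars for running the server for `hours` hours.

    The default rate is the approximate figure quoted in this tutorial;
    it is an assumption, not a current AWS price.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * rate_per_hour


# Example: an 8-hour MCMC run versus a server left on for a 30-day month
print(f"8 hours: ${estimate_cost(8):.2f}")
print(f"30 days: ${estimate_cost(30 * 24):.2f}")
```

The second figure is the reason to stop the server whenever you are not using it, as described later in this tutorial.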

1. Create an Amazon Web Services account

The first thing to do is create an AWS account by clicking on the link on the upper right corner of this page. Once you have your account you can go back to it by clicking on Sign In To Console, which should now be in the upper right corner of the same site.

2. Set up your Server

Now you can select the kind of computer you need. You will be using Amazon's Elastic Compute Cloud (EC2) service.

  • In the menu located in the upper left part of your console, click on Services → EC2. This is the EC2 dashboard.

  • Within this service, there are two main things that you need to choose, essentially the software and the hardware. You will do this by creating an instance, which you can do by clicking on the Launch Instance button under Create Instance.

  • For your Amazon Machine Image, or your software configuration, select Amazon Linux AMI (HVM / 64-bit).

  • For your Instance Type, or hardware configuration, select c4.xlarge. This will be around $0.2 per hour of usage. You can see more details about this here.

  • On the lower right corner, click on Next: Configure Instance Details. Leave the default options, and click on Next: Add Storage.

  • Change the size to 30 GiB. The volume type should be the default, General Purpose SSD (GP2). Click on Next: Add Tags. You do not need to change anything here. Click on Next: Configure Security Group.

  • With the Create new security group option checked, click on the Add Rule button in the lower part. Fill in as follows:

Type          Protocol    Port Range    Source
Custom TCP    TCP         8888          Anywhere

This configuration should enable you to connect to your Jupyter Notebooks from any IP address.

  • On the lower right corner, click on Review and Launch. Again, on the lower right corner, click on Launch.

  • In the popup window, select Create a new key pair, name it bebi103_keypair, and click on Download Key Pair.

  • On your computer, move this file to your bebi103 directory.

  • Finally, click on Launch Instance.

You have a server up and running!

Click on View Instances to see your server details. You will come back to this window later to retrieve your IP address.

3. Connect to your server

Now that you have a server, you can connect to it from your computer using the ssh protocol. The procedure is different for Windows and macOS/Linux users.

Windows

You will need to download and install an ssh client such as PuTTY.


1. Generate PuTTY keypair

First you have to convert the keypair you generated earlier to something compatible with PuTTY.

  • Open PuTTYgen by clicking on the Start menu → All Programs → PuTTY → PuTTYgen.

  • Under Type of key to generate, choose RSA.

  • Click on Load, select the option to display files of all types, and open the bebi103_keypair.pem file.

  • Click on Save private key and select yes to save your key.

  • Name your key bebi103_keypair again and save.


2. SSH to server

Now you can connect to the server using ssh. To find your server, you need its IP address, which you can find using your AWS console in your last opened web browser window (see below). You can also go back to this window by clicking on Instances in the left panel of your EC2 dashboard.

  • To get your server's IP address, click on your BeBi103 instance.

  • On the lower part, in the upper right of the Description tab, you should see IPv4 Public IP. Copy this address; it should be something like 13.58.35.204. The IP address will change when you restart your server, so make sure to retrieve it from here the next time you need it.

  • Start PuTTY by clicking on the Start menu → All Programs → PuTTY → PuTTY. In the host name, input:

    ec2-user@Your.IP.Address

  • Select SSH as your connection type and port 22.

  • In the Category pane, go to Connection → SSH → Auth.

  • Click on Browse and open the .ppk key pair file you generated earlier.

  • To save this information for the future, click on Session in the Category tree, enter a name for the session in Saved Sessions, and then click Save.

  • Finally, click Open and yes to start the PuTTY session.

You should get a welcome message with Amazon Linux AMI on it and a terminal prompt similar to git-bash.

Mac/Linux

1. Get your keypair

  • Open a terminal window. Navigate to your bebi103 directory, where you saved the bebi103_keypair file:

    cd path_to_bebi103directory

  • Change permissions on your key file to make sure that it isn't publicly viewable:

    chmod 400 bebi103_keypair.pem.txt

  • Finally, edit your bash profile to make your computer aware of your keypair:

    echo ssh-add -K $PWD/bebi103_keypair.pem.txt >> ~/.bash_profile
    source ~/.bash_profile
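If ssh later refuses the key because its permissions are too open, it can help to check what chmod 400 actually did. The following is a minimal sketch using only the Python standard library; the helper name key_is_private is hypothetical, and it simply tests that neither the group nor other users have any permission on the file.

```python
import os
import stat


def key_is_private(path):
    """Return True if only the file's owner has any permission on `path`.

    After `chmod 400`, the mode is 0o400: read-only for the owner and
    nothing for group or others.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # S_IRWXG and S_IRWXO cover every group and other permission bit
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

Run from your bebi103 directory, key_is_private('bebi103_keypair.pem.txt') should return True once the chmod command above has been applied.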


2. SSH to server

Now you can connect to the server using ssh. To find your server, you need its IP address, which you can find using your AWS console in your last opened web browser window (see below). You can also go back to this window by clicking on Instances in the left panel of your EC2 dashboard.

  • To get your server's IP address, click on your BeBi103 instance.

  • On the lower part, in the upper right of the Description tab, you should see IPv4 Public IP. Copy this address; it should be something like 13.58.35.204. The IP address will change when you restart your server, so make sure to retrieve it from here the next time you need it.

  • Go back to the terminal and connect to your server using your username (ec2-user is the generic name for the software configuration we selected) and your server's current IP address:

    ssh ec2-user@Your.IP.Address

  • You should get a welcome message with Amazon Linux AMI on it, and your terminal should now show something like:

    [ec2-user@ip-172-31-24-56 ~]$

If this is the case, you are connected to your server! You can type logout whenever you want to disconnect. Any time you want to connect again, all you need to do is fetch your server's IP address, open a terminal window, and run the ssh command above; your computer will remember your keypair.

4. Install git, the bebi103 repository, and everything else

Enter the following while connected to the server via ssh:

sudo yum -y install git

Clone your own GitHub repository:

git clone https://github.com/bebi103/XX-bebi103.git

Make sure to replace 'XX' with your team number.

We put together a shell script that will configure the server for you, including downloading and installing a working Python distribution with the packages that you will need, as well as the datasets used in this class. All you need to do is clone the auxiliary repository and run the script:

git clone https://github.com/pquinter/aws103.git 
bash ~/aws103/bebi103_AWSconfig.sh

Be patient while everything is downloaded and installed. You will be asked to complete your certificate details, and then to create a password for your jupyter notebooks. You will use this password to login to your notebooks from your web browser, so make sure you remember it.

Finally, source your .bashrc file:

source ~/.bashrc

You are all set with the configuration. The data for the class are in XX-bebi103/data.
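If you want to confirm that the Python distribution on the server has what you need before starting a long calculation, a quick check along these lines may help. It is only a sketch: the helper name missing_packages is hypothetical, it expects top-level package names, and the names you pass in are up to you, since the exact list installed by the configuration script is not reproduced here.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names in `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, missing_packages(['numpy', 'pandas']) returns an empty list if both packages can be imported (these two names are assumptions; substitute the packages you actually use).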

5. Access your Jupyter Notebooks from your web browser

Your Jupyter Notebooks should now be available through the internet. To access them, you first have to start Jupyter. Go into your homework directory and launch a jupyter notebook:

cd ~/*-bebi103/homework
jupyter notebook

With Jupyter running, you should be able to see your notebook from your own computer. Open a web browser and enter the IP address of your server followed by :8888, which indicates the port we set up for Jupyter:

https://Your.IP.address:8888

If you see a security warning, bypass it. In Chrome, click on Advanced and proceed to your site and type in your password. You should see your usual notebook interface, but all the notebooks you create on this window will be running on Amazon's computers.
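If the page does not load at all, it can be useful to check from your own computer whether port 8888 on the server is reachable, which separates a browser problem from Jupyter not running or a missing security group rule. The following is a minimal sketch using the standard library; the function name port_is_open and the three-second timeout are our own choices.

```python
import socket


def port_is_open(host, port=8888, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names
        return False
```

Calling port_is_open('Your.IP.Address') with your server's current IP address returns False if Jupyter is not running or if the port is blocked.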

Managing your server time

You will be charged for the time your server is running. For this reason, you should prototype your code on your own machine first, and then run on AWS. Also, when you are not using your server, you can temporarily stop it so that you don't get charged. To do this, do the following.

  • Go to your EC2 dashboard and click on Instances on the left panel.

  • Click on Actions (if you have more than one instance, right click on the instance you want to stop).

  • Select Instance State → Stop. Do not select Terminate, as this will delete your configuration.

When you need to use it again, you can restart it using the same procedure, but instead select Instance State → Start. The server keeps running whether or not you are connected to it, but Jupyter must be running on it for you to access your notebooks. For example, you can connect to the server, start Jupyter, log out from the server, and keep using Jupyter through your web browser.

We have added a handy function to the auxiliary repository aws103 that you can use to be notified by email when your calculations are done, so that you can stop the server. We also created a default gmail account for this.

To use it, simply import it:

from aws103 import send_notification

and add the following line in your python script, after your calculation code:

send_notification('your_email_address', 'MCMCpower')

Replace the first argument with your email address (e.g. john@gmail.com) and leave the second argument as is (it is the password of the default sender account). You will receive an email notifying you once that line is run.
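If your calculation raises an error, a notification line placed after it will never run, and the server will keep accruing charges while you wait. One way around this is to wrap the calculation so that the notification is sent either way. This is a sketch under stated assumptions: run_and_notify is a hypothetical helper of ours, not part of aws103, and it relies only on send_notification behaving as described above.

```python
def run_and_notify(calculation, notify):
    """Run `calculation()` and call `notify()` afterwards, even on failure.

    Both arguments are callables that take no arguments. The return value
    of `calculation` is passed through; any exception it raises propagates
    after the notification has been sent.
    """
    try:
        return calculation()
    finally:
        notify()


# Hypothetical usage on the server, where my_mcmc_function is your own code:
#
#   from aws103 import send_notification
#   trace = run_and_notify(
#       my_mcmc_function,
#       lambda: send_notification('your_email_address', 'MCMCpower'),
#   )
```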

Transferring files to and from your server

After your setup, transferring files to and from the server should be easy. With your server's IP address in hand, you can use SCP/PSCP from your computer, without being connected to the server.

Windows

You can use the PuTTY Secure Copy Client, similar to SCP on unix. For instance, to transfer a file from your computer to your data directory on the server, enter the following:

pscp -i C:\path_to_bebi103_keypair.ppk C:\path\file_name ec2-user@Your.IP.Address:/home/ec2-user/XX-bebi103/data/file_name

Make sure to replace path_to_bebi103_keypair.ppk with the path to the RSA file generated earlier using PuTTY.

To copy directories, just add the recursive option -r:

pscp -r -i C:\path_to_bebi103_keypair.ppk C:\path\file_name ec2-user@Your.IP.Address:/home/ec2-user/path_to_dir

To transfer a file from the server to your computer, you just invert the arguments:

pscp -i C:\path_to_bebi103_keypair.ppk ec2-user@Your.IP.Address:/home/ec2-user/XX-bebi103/data/file_name C:\path\file_name

Mac/Linux

Open a new terminal window. To transfer files from your computer to your data directory on the server, enter the following:

scp path_to_file ec2-user@Your.IP.Address:/home/ec2-user/XX-bebi103/data/file_name

You can also transfer directories by using the recursive option:

scp -r path_to_directory ec2-user@Your.IP.Address:/home/ec2-user/path_to_dir

To transfer a file from the server to your computer, you just invert the arguments. For example, to transfer your whole bebi103 directory:

scp -r ec2-user@Your.IP.Address:/home/ec2-user/XX-bebi103 path_to_dir
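Because the server's IP address changes every time you restart it, you may find it convenient to assemble these scp commands in a small Python script on your own computer. The helpers below are a sketch with hypothetical names (remote_path, scp_command, transfer); they assume scp is on your path and that your keypair has been registered as described in the Mac/Linux instructions above.

```python
import subprocess


def remote_path(ip_address, path, user="ec2-user"):
    """Format a location on the server as user@ip:path."""
    return f"{user}@{ip_address}:{path}"


def scp_command(source, destination, recursive=False):
    """Build the argument list for an scp call like the ones above."""
    command = ["scp"]
    if recursive:
        command.append("-r")
    command += [source, destination]
    return command


def transfer(source, destination, recursive=False):
    """Run scp and raise CalledProcessError if the transfer fails."""
    subprocess.run(scp_command(source, destination, recursive), check=True)
```

For example, transfer(remote_path('13.58.35.204', '/home/ec2-user/XX-bebi103'), 'path_to_dir', recursive=True) corresponds to the last command above, with the example IP address filled in.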

Addendum

A couple of tips that will help you work with AWS:

  • To keep Jupyter running even if you lose your internet connection or you close your web browser, you can use the no hangup command when you start Jupyter Notebooks:
    nohup jupyter notebook
    Jupyter will stop once you stop your instance; you will then have to start it again by connecting to your server via ssh and typing the command above. Note that all the notebooks that you open will continue running. To access objects that your notebooks produce (e.g., MCMC traces), you will need to save them to disk.


  • You will want to save your MCMC traces or other python objects to disk after long calculations. A convenient way to do this is to use pickle, which can convert a python object into a byte stream, and read from a byte stream into a python object. For example, to save the string stored in the variable my_variable to the file test_file.pkl in my current directory:

import pickle
my_variable = 'hello world!'

with open('./test_file.pkl', 'wb') as f:
    pickle.dump(my_variable, f)

To read the contents of test_file.pkl back into my_variable (this can have any name of course):

with open('./test_file.pkl', 'rb') as f:
    my_variable = pickle.load(f)

You could, for instance, let your server do all the heavy MCMC lifting, pickle the MCMC traces, and then use scp/pscp to transfer the files to your computer for further analysis. For example, to pickle a trace and a model, you could do:

with open('./trace_and_model.pkl', 'wb') as f:
    pickle.dump(trace, f)
    pickle.dump(model, f)

Then, to load them from the pkl file after copying it to your computer with scp or pscp, you can do the following. The objects must be loaded in the same order in which they were dumped:

with open('./trace_and_model.pkl', 'rb') as f:
    trace = pickle.load(f)
    model = pickle.load(f)
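Dumping several objects into one file works, but you have to remember the order in which they were written. An alternative sketch is to pickle a single dictionary so that each object is retrieved by name. The function names save_results and load_results are hypothetical, and whether a given trace or model object can be pickled depends on the library that produced it.

```python
import pickle


def save_results(path, **objects):
    """Pickle any number of named objects into one file as a single dict."""
    with open(path, "wb") as f:
        pickle.dump(objects, f)


def load_results(path):
    """Load the dict written by save_results.

    Unpickling can execute arbitrary code, so only load files you trust.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

With these, save_results('./trace_and_model.pkl', trace=trace, model=model) on the server and results = load_results('./trace_and_model.pkl') on your computer give you results['trace'] and results['model'].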