Running Jupyter notebooks with AWS


In the previous part of this lesson on setting up AWS, we had instructions on setting up your AWS account and getting running with EC2. Once you have your account set up with keypairs and everything else, you can follow the instructions below to run your instances. Some of the instructions are repeated from the previous part of this lesson on setting up AWS, but we leave them here so this page can serve as a convenient reference. (Students have explicitly asked that we do this in the past!)

1. Get your instance running

Now that you already have your account set up, all you need to do is:

  1. Go to the AWS Console either via your account (postdocs) or via AWS Educate (students). If you are using AWS Educate, log in to AWS Educate and then choose My Classrooms from the top menu. Select our class (“Statistical Inference in the Biological Sciences”) and click Go to Classroom. Proceed to Vocareum, and once logged in there, click on the AWS Console button.

  2. Select EC2 from the Services pulldown menu at the top of your screen.

  3. After selecting EC2, you will see the EC2 Dashboard on the left pane. Under Instances there, click Instances. Alternatively, you can also click the Running Instances link under the Resources main heading at the top of the page.

  4. Right click on the name of the instance you wish to start and go to Instance State → Start.

It may take a little while for your instance to get going. When the Instance State says running and the Status Checks are complete, your instance is ready for you to get working.
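
If you prefer the command line to the web console, the steps above can be sketched with the AWS CLI. This is only a sketch under assumptions this lesson does not cover: that the aws command is installed and configured with credentials for your account (which may not be the case under AWS Educate) and that you know your instance ID. The function name start_instance and the instance ID in the usage comment are hypothetical.

```shell
# Hypothetical helper: start an instance, wait for it, and print its public IP.
# Assumes the AWS CLI is installed and configured; that is not part of this lesson.
start_instance() {
    local id="$1"
    if [ -z "$id" ]; then
        echo "usage: start_instance <instance-id>" >&2
        return 1
    fi
    # Ask EC2 to start the instance (same as Instance State -> Start).
    aws ec2 start-instances --instance-ids "$id" > /dev/null || return 1
    # Block until the status checks pass.
    aws ec2 wait instance-status-ok --instance-ids "$id" || return 1
    # Print the instance's current public IPv4 address.
    aws ec2 describe-instances --instance-ids "$id" \
        --query 'Reservations[0].Instances[0].PublicIpAddress' \
        --output text
}

# Example (the instance ID is a placeholder, not a real instance):
# start_instance i-0123456789abcdef0
```

Unless you have attached an Elastic IP, the public IP usually changes each time you stop and start an instance, so it is worth looking it up again every session.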

2. Connect to your instance

Now that your instance is launched, you can connect to it using your computer and the ssh protocol. The instructions work for Windows, macOS, or Linux, assuming you have a terminal running bash. In Windows, this is accomplished using GitBash. For macOS, use Terminal.

  1. Open a new GitBash (Windows) or Terminal (macOS) window.

  2. SSH into your instance in the terminal. To do this, click on your instance on the Instances page in the Management Console. Information about your instance, including the IPv4 Public IP, will appear at the bottom of the webpage. The IP will look something like 54.92.67.113. Copy this. In what follows, I refer to this as <IPv4 Public IP>. Replacing the keypair name below with yours, SSH into your instance by doing

    ssh -i "~/key_pairs/bebi103_aws_keypair.pem" ec2-user@<IPv4 Public IP>

  3. (optional, may only work for macOS) To avoid having to use -i "~/key_pairs/bebi103_aws_keypair.pem" each time, some of you may have added your keypair to your bash profile by doing

    echo ssh-add -K $PWD/bebi103_aws_keypair.pem >> ~/.bash_profile; source ~/.bash_profile

You are now connected to your instance!
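
If you connect often, the steps above can be wrapped in a small shell function. The following is a minimal sketch, not part of the course setup: the function name ec2_ssh and the KEY variable are made up for illustration, and the key path and user name are taken from the example above. It also sets the key file to be readable only by you, since ssh refuses to use a private key that other users can read.

```shell
# Hypothetical helper; adjust KEY to wherever your key pair file lives.
KEY="${KEY:-$HOME/key_pairs/bebi103_aws_keypair.pem}"

# Usage: ec2_ssh <IPv4 Public IP> [extra ssh options...]
ec2_ssh() {
    if [ -z "$1" ]; then
        echo "usage: ec2_ssh <IPv4 Public IP> [ssh options]" >&2
        return 1
    fi
    local ip="$1"
    shift
    if [ ! -f "$KEY" ]; then
        echo "Key file not found: $KEY" >&2
        return 1
    fi
    # ssh rejects private keys with permissions that are too open.
    chmod 400 "$KEY"
    # Any remaining arguments are passed through to ssh.
    ssh -i "$KEY" "$@" "ec2-user@${ip}"
}

# Example, using the sample IP from the text:
# ec2_ssh 54.92.67.113
```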

3. Launch JupyterLab

When you launch JupyterLab, you want to use `screen <https://en.wikipedia.org/wiki/GNU_Screen>`__. By running screen, your JupyterLab session will not get interrupted if you disconnect from your instance. So, on the command line in your instance:

  1. Execute the following:

    screen

  2. Launch JupyterLab by executing

    jupyter lab --no-browser

    on the command line. This will launch JupyterLab. It will output a URL for you to open JupyterLab in your browser. Don’t use it yet, though.

  3. Open up another GitBash or Terminal window and execute the following, which sets up a socket in order to use JupyterLab through a browser on your machine.

    ssh -i "~/key_pairs/bebi103_aws_keypair.pem" -L 8000:localhost:8888 ec2-user@<IPv4 Public IP>

This socket (an SSH tunnel that forwards a local port) connects port 8888 on your EC2 instance to port 8000 on your local machine. You can change these numbers as necessary. For example, the URL you got when you launched JupyterLab may list the port as localhost:8889, in which case you need to substitute 8889 for 8888 in your ssh command. You may also want a different local port, e.g., 8001, if you already have a JupyterLab instance running on port 8000. In what follows, I will use port numbers 8000 and 8888, which you will probably use 90% of the time, but you can make changes as you see fit.

  4. Now you can paste the URL given when you launched JupyterLab on your EC2 instance into your browser, but replace the port number 8888 in the URL with 8000.

You will now have JupyterLab up and running!
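
The port bookkeeping above is easy to get wrong, so here is a minimal sketch that does it for you. The variable and function names (REMOTE_PORT, LOCAL_PORT, tunnel_command, to_local_url) are hypothetical, and the token in the example URL is made up. The first function only prints the ssh command so you can inspect it before running it; the second rewrites the URL that JupyterLab printed on the instance.

```shell
# Ports as used in the text; change them to match your setup.
REMOTE_PORT=8888   # port JupyterLab reports on the EC2 instance
LOCAL_PORT=8000    # port you forward to on your own machine

# Print the ssh command that forwards LOCAL_PORT to REMOTE_PORT.
# $1: the instance's IPv4 Public IP; $2: path to your key pair file.
tunnel_command() {
    printf 'ssh -i "%s" -L %s:localhost:%s ec2-user@%s\n' \
        "$2" "$LOCAL_PORT" "$REMOTE_PORT" "$1"
}

# Rewrite the URL printed by JupyterLab so it uses the local port.
# $1: the URL exactly as printed on the instance.
to_local_url() {
    printf '%s\n' "$1" | sed "s|:${REMOTE_PORT}/|:${LOCAL_PORT}/|"
}

# Example with a made-up token:
# to_local_url "http://localhost:8888/lab?token=abc123"
```

With the default ports, the example prints http://localhost:8000/lab?token=abc123, which is what you paste into your browser.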

4. If you get detached

If you lose your internet connection, you can reconnect to your session, with JupyterLab running, by reattaching your screen. Execute screen -r on the command line after SSH-ing back in to your EC2 instance to do this.

You can see what screens are active by doing screen -ls on the command line. You can also detach the current screen by using screen -d.
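
If you end up with several screens, giving the session a name makes it easier to find again. The following is a minimal sketch, assuming you are happy to name your session (the default name jupyter and the function name jupyter_screen are made up for illustration). It reattaches to the named session if it exists and starts it otherwise.

```shell
# Hypothetical helper: reattach to a named screen session, or create it.
# Usage: jupyter_screen [session name]
jupyter_screen() {
    local name="${1:-jupyter}"
    # screen -ls lists sessions as "<pid>.<name>"; look for our name.
    if screen -ls | grep -q "[0-9]\.${name}[[:space:]]"; then
        # -d -r detaches the session elsewhere if needed, then reattaches.
        screen -d -r "$name"
    else
        # -S starts a new session with the given name.
        screen -S "$name"
    fi
}

# Example: run this on the instance before launching JupyterLab,
# and again after SSH-ing back in following a dropped connection.
# jupyter_screen
```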

5. Copying files between AWS and your local machine

As you work on notebooks and create new files you want to save, you may want to move them to your local machine. If you are working on a notebook or .stan file, the best option is to use git and commit and push those files to your repository directly from the command line on your EC2 instance.

Some files, though, such as MCMC results or intermediate data processing results, are not meant to be under version control. For these files, you can use scp.

  1. Open another GitBash or Terminal window on your local machine.

  2. You can copy files from the EC2 instance to your computer as follows.

    scp -i "~/key_pairs/bebi103_aws_keypair.pem" ec2-user@<IPv4 Public IP>:~/my_file.csv ./

  3. Similarly, you can upload files to your EC2 instance as follows (in this example to the home directory in your instance).

    scp -i "~/key_pairs/bebi103_aws_keypair.pem" my_file.txt ec2-user@<IPv4 Public IP>:~/
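
If you copy files often, the two commands above can be wrapped so that you only type the IP and the paths. This is a minimal sketch; the function names ec2_download and ec2_upload and the KEY variable are hypothetical, and the key path and user name follow the examples above. The -r flag is added so that directories can be copied as well as single files.

```shell
# Hypothetical helpers; adjust KEY to wherever your key pair file lives.
KEY="${KEY:-$HOME/key_pairs/bebi103_aws_keypair.pem}"

# Copy a file or directory from the instance to your machine.
# Usage: ec2_download <IPv4 Public IP> <remote path> [local destination]
ec2_download() {
    local ip="$1" remote="$2" dest="${3:-.}"
    # Remote paths are relative to the home directory on the instance.
    scp -r -i "$KEY" "ec2-user@${ip}:${remote}" "$dest"
}

# Copy a file or directory from your machine to the instance.
# Usage: ec2_upload <IPv4 Public IP> <local path> [remote destination]
ec2_upload() {
    local ip="$1" src="$2" remote="${3:-.}"
    # The default destination "." is the home directory on the instance.
    scp -r -i "$KEY" "$src" "ec2-user@${ip}:${remote}"
}

# Examples, using the sample IP and file names from the text:
# ec2_download 54.92.67.113 my_file.csv
# ec2_upload 54.92.67.113 my_file.txt
```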

6. Exiting

  1. Shut down your notebook in the browser.

  2. If necessary, in the terminal window used to launch JupyterLab, you can shut down JupyterLab by pressing Ctrl-c.

  3. After Jupyter is terminated, you should detach your screen by doing screen -d.

  4. For good measure, you should also quit your screen by doing screen -X quit.

  5. STOP YOUR INSTANCE ON AWS. To do this, go back to the Instances page on your EC2 console. Right click your instance, and navigate to Instance State → Stop. Do not terminate your instance unless you really want to. Terminating an instance will get rid of any changes you made to it.

7. Seriously. Stop your instances if you are not using them.

If your instance is not stopped and you leave it running, you will get charged. You can rack up a massive bill with idle, but running, instances. You should stop your instances whenever you are not using them and watch them stop all the way. It is a minor pain to wait for them to spin up again, but forgetting about a running instance will cause more pain than that to your pocketbook.
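
As a safety net, you can check for forgotten instances from the command line. This is only a sketch under assumptions this lesson does not cover: that the AWS CLI is installed and configured with credentials for your account (which may not be the case under AWS Educate). The function names running_instances and stop_instance and the instance ID in the usage comment are hypothetical. Note that the listing only covers the region your CLI is configured for.

```shell
# Hypothetical helpers; assume the AWS CLI is installed and configured.

# Print the IDs of instances that are currently running (and billing).
running_instances() {
    aws ec2 describe-instances \
        --filters "Name=instance-state-name,Values=running" \
        --query 'Reservations[].Instances[].InstanceId' \
        --output text
}

# Stop one instance and wait until it has stopped all the way.
# Usage: stop_instance <instance-id>
stop_instance() {
    local id="$1"
    if [ -z "$id" ]; then
        echo "usage: stop_instance <instance-id>" >&2
        return 1
    fi
    # stop-instances stops the instance; it does not terminate it.
    aws ec2 stop-instances --instance-ids "$id" > /dev/null || return 1
    # Block until EC2 reports the instance as stopped.
    aws ec2 wait instance-stopped --instance-ids "$id"
}

# Example (the instance ID is a placeholder, not a real instance):
# running_instances
# stop_instance i-0123456789abcdef0
```

If running_instances prints nothing, no instances are running in that region.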