Lesson 8 exercises

Data set download

[1]:

# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade iqplot bebi103 watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"
else:
    data_path = "../data/"
# ------------------------------

import pandas as pd

Exercise 8.1

In these lesson exercises, we will again work with a subset of the Palmer penguins data set. I will load it and take a look.

[2]:

df = pd.read_csv(os.path.join(data_path, "penguins_subset.csv"), header=[0, 1])

df.head()

[2]:

	Gentoo				Adelie				Chinstrap
	bill_depth_mm	bill_length_mm	flipper_length_mm	body_mass_g	bill_depth_mm	bill_length_mm	flipper_length_mm	body_mass_g	bill_depth_mm	bill_length_mm	flipper_length_mm	body_mass_g
0	16.3	48.4	220.0	5400.0	18.5	36.8	193.0	3500.0	18.3	47.6	195.0	3850.0
1	15.8	46.3	215.0	5050.0	16.9	37.0	185.0	3000.0	16.7	42.5	187.0	3350.0
2	14.2	47.5	209.0	4600.0	19.5	42.0	200.0	4050.0	16.6	40.9	187.0	3200.0
3	15.7	48.7	208.0	5350.0	18.3	42.7	196.0	4075.0	20.0	52.8	205.0	4550.0
4	14.1	48.7	210.0	4450.0	18.0	35.7	202.0	3550.0	18.7	45.4	188.0	3525.0
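To see what `header=[0, 1]` does in the `read_csv` call, here is a minimal sketch on a tiny in-memory CSV. It assumes, based on the two header rows in the output above, that penguins_subset.csv stores the species in its first row and the measurement name in its second. The values are the first two Gentoo and Adelie bill measurements shown above, and the names `csv_text` and `toy` are illustrative.

```python
import io

import pandas as pd

# A tiny CSV with (assumed) the same two-row header layout as
# penguins_subset.csv: row 1 holds the species, row 2 the measurement.
csv_text = """Gentoo,Gentoo,Adelie,Adelie
bill_depth_mm,bill_length_mm,bill_depth_mm,bill_length_mm
16.3,48.4,18.5,36.8
15.8,46.3,16.9,37.0
"""

# header=[0, 1] combines the first two rows into a two-level
# MultiIndex for the columns; the data rows get the default 0, 1, ... index.
toy = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

print(toy.columns.nlevels)  # 2
# A full (species, measurement) tuple selects a single column.
print(toy[("Adelie", "bill_length_mm")])
# Selecting only the outer level returns all of that species' columns.
print(toy["Gentoo"])
```

Because the species is part of the column labels rather than a column of its own, this frame is not yet tidy.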

Explain in words what each of the following code cells does as we work toward tidying this data frame.

[3]:

df.columns.names = ['species', None]

[4]:

df = df.stack(level='species')

[5]:

df = df.reset_index(level='species')

[6]:

df = df.reset_index(drop=True)
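To check your explanations, here is a minimal sketch that applies the same four steps to a small hand-built frame with the same two-level column layout and prints intermediate results. The variable names (`wide`, `step2`, `step3`, `tidy`) are illustrative, and the values are the first two Gentoo and Adelie bill measurements from the output above. In newer pandas releases, `stack` may emit a FutureWarning about its implementation, and the row order of the result may differ between versions; the tidy content is the same.

```python
import pandas as pd

# Toy wide-format frame laid out like df: species in the outer column
# level, measurement in the inner level.
columns = pd.MultiIndex.from_tuples(
    [
        ("Gentoo", "bill_depth_mm"),
        ("Gentoo", "bill_length_mm"),
        ("Adelie", "bill_depth_mm"),
        ("Adelie", "bill_length_mm"),
    ]
)
wide = pd.DataFrame(
    [[16.3, 48.4, 18.5, 36.8], [15.8, 46.3, 16.9, 37.0]], columns=columns
)

# Cell [3]: name the outer column level "species"; the inner level stays unnamed.
wide.columns.names = ["species", None]

# Cell [4]: move the "species" column level into the row index, so each
# original row becomes one row per species (2 rows x 2 species -> 4 rows).
step2 = wide.stack(level="species")
print(step2.index.names)  # the original unnamed row level, then "species"

# Cell [5]: move the "species" index level out into an ordinary column.
step3 = step2.reset_index(level="species")

# Cell [6]: drop the leftover original row labels (which repeat, one per
# species) and renumber the rows 0, 1, 2, ...
tidy = step3.reset_index(drop=True)
print(tidy)
```

In the result, each row is one penguin observation, `species` is a column, and each measurement has its own column.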

Exercise 8.2

What is the difference between merging and concatenating data frames?
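As a minimal sketch of the distinction, the toy frames below use body masses from the output above and each species' scientific name; the frame and column names (`gentoo`, `adelie`, `both`, `sci_names`, `scientific_name`) are illustrative. Concatenation stacks frames along an axis, whereas a merge matches rows by the values in a key column, as in a database join.

```python
import pandas as pd

# Two toy frames with the same columns, one per species.
gentoo = pd.DataFrame({"species": ["Gentoo", "Gentoo"], "body_mass_g": [5400.0, 5050.0]})
adelie = pd.DataFrame({"species": ["Adelie", "Adelie"], "body_mass_g": [3500.0, 3000.0]})

# Concatenating glues the frames end to end along an axis (rows by default).
# No rows are matched by value; ignore_index=True just renumbers the result.
both = pd.concat([gentoo, adelie], ignore_index=True)

# A lookup table keyed by species.
sci_names = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo"],
        "scientific_name": ["Pygoscelis adeliae", "Pygoscelis papua"],
    }
)

# Merging matches rows of the two frames by the values in the shared key
# column "species" (an inner join by default), adding sci_names' columns.
merged = pd.merge(both, sci_names, on="species")

print(both)
print(merged)
```

With row-wise concatenation, the row counts of the inputs simply add. With a merge, the rows of the output are determined by which key values match; here every species in `both` has exactly one match in `sci_names`, so the result keeps four rows and gains a column.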

Exercise 8.3

Write down any questions or points of confusion that you have.

Computing environment

[7]:

%load_ext watermark
%watermark -v -p pandas,jupyterlab

CPython 3.8.5
IPython 7.18.1

pandas 1.1.3
jupyterlab 2.2.6