Lesson 8 exercises
[1]:
# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade iqplot bebi103 watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"
else:
    data_path = "../data/"
# ------------------------------
import pandas as pd
Exercise 8.1
In this exercise, we will again work with a subset of the Palmer penguins data set. I will load it and view it now.
[2]:
df = pd.read_csv(os.path.join(data_path, "penguins_subset.csv"), header=[0, 1])
df.head()
[2]:
| | Gentoo | Gentoo | Gentoo | Gentoo | Adelie | Adelie | Adelie | Adelie | Chinstrap | Chinstrap | Chinstrap | Chinstrap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | bill_depth_mm | bill_length_mm | flipper_length_mm | body_mass_g | bill_depth_mm | bill_length_mm | flipper_length_mm | body_mass_g | bill_depth_mm | bill_length_mm | flipper_length_mm | body_mass_g |
| 0 | 16.3 | 48.4 | 220.0 | 5400.0 | 18.5 | 36.8 | 193.0 | 3500.0 | 18.3 | 47.6 | 195.0 | 3850.0 |
| 1 | 15.8 | 46.3 | 215.0 | 5050.0 | 16.9 | 37.0 | 185.0 | 3000.0 | 16.7 | 42.5 | 187.0 | 3350.0 |
| 2 | 14.2 | 47.5 | 209.0 | 4600.0 | 19.5 | 42.0 | 200.0 | 4050.0 | 16.6 | 40.9 | 187.0 | 3200.0 |
| 3 | 15.7 | 48.7 | 208.0 | 5350.0 | 18.3 | 42.7 | 196.0 | 4075.0 | 20.0 | 52.8 | 205.0 | 4550.0 |
| 4 | 14.1 | 48.7 | 210.0 | 4450.0 | 18.0 | 35.7 | 202.0 | 3550.0 | 18.7 | 45.4 | 188.0 | 3525.0 |
Explain in words what each of the following code cells does as we work toward tidying this data frame.
[3]:
df.columns.names = ['species', None]
[4]:
df = df.stack(level='species')
[5]:
df = df.reset_index(level='species')
[6]:
df = df.reset_index(drop=True)
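One way to work through these cells is to run the same four steps on a much smaller data frame and look at the result after each one. The sketch below does that with hypothetical toy data (the species labels `A` and `B`, the measurements `x` and `y`, and the variable names `toy`, `stacked`, `with_species`, and `tidy` are all made up for illustration and are not part of the penguin data set).

```python
import pandas as pd

# Hypothetical toy data: two "species" (A, B), each with two measurements (x, y).
# Tuple keys give the data frame two-level (MultiIndex) columns,
# mimicking the layout of the penguin data frame above.
toy = pd.DataFrame(
    {
        ("A", "x"): [1.0, 2.0],
        ("A", "y"): [3.0, 4.0],
        ("B", "x"): [5.0, 6.0],
        ("B", "y"): [7.0, 8.0],
    }
)

# The same four steps as in cells [3] through [6], with each
# intermediate result kept in its own variable for inspection.
toy.columns.names = ["species", None]
stacked = toy.stack(level="species")
with_species = stacked.reset_index(level="species")
tidy = with_species.reset_index(drop=True)

# Print each stage to compare the index and columns before and after.
for name, frame in [
    ("toy", toy),
    ("stacked", stacked),
    ("with_species", with_species),
    ("tidy", tidy),
]:
    print(f"--- {name} ---")
    print(frame, end="\n\n")
```

Comparing the row index and the column labels between consecutive stages should make it easier to put each step into words.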
Exercise 8.2
What is the difference between merging and concatenating data frames?
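To have something concrete to reason about, it may help to apply both operations to small data frames and compare the shapes and columns of the results. The following is only a sketch with hypothetical toy data (`df_a`, `df_b`, `df_c`, and the columns `id`, `mass`, and `species` are invented for illustration).

```python
import pandas as pd

# Hypothetical toy data frames
df_a = pd.DataFrame({"id": [1, 2], "mass": [10.0, 20.0]})
df_b = pd.DataFrame({"id": [2, 3], "mass": [30.0, 40.0]})
df_c = pd.DataFrame({"id": [1, 2], "species": ["A", "B"]})

# Concatenate two data frames that share the same columns
concatenated = pd.concat([df_a, df_b], ignore_index=True)

# Merge two data frames on a shared key column
merged = pd.merge(df_a, df_c, on="id")

print(concatenated)
print(merged)
```

Try changing the `id` values in `df_c` so that they no longer all match those in `df_a`, and see which of the two results changes.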
Exercise 8.3
Write down any questions or points of confusion that you have.
Computing environment
[7]:
%load_ext watermark
%watermark -v -p pandas,jupyterlab
CPython 3.8.5
IPython 7.18.1
pandas 1.1.3
jupyterlab 2.2.6