E4. To be completed after lesson 10

Data set download


[ ]:
import os
import pandas as pd

Exercise 4.1

In this exercise, we will again work with a subset of the Palmer penguins data set. Let's load it and take a look.

[17]:
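# Assumed location: data_path should be the directory where penguins_subset.csv was saved
data_path = "data"
# header=[0, 1]: the first two rows of the file form a two-level column header
df = pd.read_csv(os.path.join(data_path, "penguins_subset.csv"), header=[0, 1])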

df.head()
[17]:
   Gentoo                                                         Adelie                                                         Chinstrap
   bill_depth_mm  bill_length_mm  flipper_length_mm  body_mass_g  bill_depth_mm  bill_length_mm  flipper_length_mm  body_mass_g  bill_depth_mm  bill_length_mm  flipper_length_mm  body_mass_g
0           16.3            48.4              220.0       5400.0           18.5            36.8              193.0       3500.0           18.3            47.6              195.0       3850.0
1           15.8            46.3              215.0       5050.0           16.9            37.0              185.0       3000.0           16.7            42.5              187.0       3350.0
2           14.2            47.5              209.0       4600.0           19.5            42.0              200.0       4050.0           16.6            40.9              187.0       3200.0
3           15.7            48.7              208.0       5350.0           18.3            42.7              196.0       4075.0           20.0            52.8              205.0       4550.0
4           14.1            48.7              210.0       4450.0           18.0            35.7              202.0       3550.0           18.7            45.4              188.0       3525.0
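The `header=[0, 1]` argument is what produces the two rows of column labels above. A minimal sketch, using a tiny made-up CSV string in place of `penguins_subset.csv` (two species and two quantities only, with values copied from the first two rows shown above), shows that pandas builds a two-level `MultiIndex` for the columns and that a single column is selected with a `(species, quantity)` tuple.

```python
import io

import pandas as pd

# Stand-in for penguins_subset.csv (assumed layout): the first header row holds
# the species, the second header row holds the measured quantity.
csv_text = """Gentoo,Gentoo,Adelie,Adelie
bill_depth_mm,body_mass_g,bill_depth_mm,body_mass_g
16.3,5400.0,18.5,3500.0
15.8,5050.0,16.9,3000.0
"""

# header=[0, 1] turns the first two lines into a two-level column MultiIndex;
# the remaining lines are read as data rows.
toy = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

print(toy.columns.nlevels)
print(toy[("Gentoo", "body_mass_g")].tolist())
```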

Explain in words what each of the following code cells does as we work toward tidying this data frame.

[19]:
df.columns.names = ['species', 'quantity']

df.head()
[19]:
species   Gentoo                                                         Adelie                                                         Chinstrap
quantity  bill_depth_mm  bill_length_mm  flipper_length_mm  body_mass_g  bill_depth_mm  bill_length_mm  flipper_length_mm  body_mass_g  bill_depth_mm  bill_length_mm  flipper_length_mm  body_mass_g
0                  16.3            48.4              220.0       5400.0           18.5            36.8              193.0       3500.0           18.3            47.6              195.0       3850.0
1                  15.8            46.3              215.0       5050.0           16.9            37.0              185.0       3000.0           16.7            42.5              187.0       3350.0
2                  14.2            47.5              209.0       4600.0           19.5            42.0              200.0       4050.0           16.6            40.9              187.0       3200.0
3                  15.7            48.7              208.0       5350.0           18.3            42.7              196.0       4075.0           20.0            52.8              205.0       4550.0
4                  14.1            48.7              210.0       4450.0           18.0            35.7              202.0       3550.0           18.7            45.4              188.0       3525.0
[20]:
df = df.stack(level='species')

df.head()
[20]:
quantity     bill_depth_mm  bill_length_mm  body_mass_g  flipper_length_mm
  species
0 Adelie              18.5            36.8       3500.0              193.0
  Chinstrap           18.3            47.6       3850.0              195.0
  Gentoo              16.3            48.4       5400.0              220.0
1 Adelie              16.9            37.0       3000.0              185.0
  Chinstrap           16.7            42.5       3350.0              187.0
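To see what `stack` does in isolation, here is a small sketch on a toy wide frame (values copied from the first two rows above, trimmed to two species and two quantities). It assumes the column levels are already named `species` and `quantity`, as after the `columns.names` cell. Depending on your pandas version, `stack` may emit a `FutureWarning` about a new implementation, and the order of the species rows may differ from the sorted order shown above, so the checks look values up by label rather than by position.

```python
import pandas as pd

# Toy wide frame: one column per (species, quantity) pair.
columns = pd.MultiIndex.from_tuples(
    [
        ("Gentoo", "bill_depth_mm"),
        ("Gentoo", "body_mass_g"),
        ("Adelie", "bill_depth_mm"),
        ("Adelie", "body_mass_g"),
    ],
    names=["species", "quantity"],
)
wide = pd.DataFrame(
    [[16.3, 5400.0, 18.5, 3500.0], [15.8, 5050.0, 16.9, 3000.0]],
    columns=columns,
)

# stack(level="species") moves the species column level into the row index,
# so each original row becomes one row per species.
long = wide.stack(level="species")

print(long)
```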
[21]:
df = df.reset_index(level='species')

df.head()
[21]:
quantity    species  bill_depth_mm  bill_length_mm  body_mass_g  flipper_length_mm
0            Adelie           18.5            36.8       3500.0              193.0
0         Chinstrap           18.3            47.6       3850.0              195.0
0            Gentoo           16.3            48.4       5400.0              220.0
1            Adelie           16.9            37.0       3000.0              185.0
1         Chinstrap           16.7            42.5       3350.0              187.0
[22]:
df = df.reset_index(drop=True)

df.head()
[22]:
quantity    species  bill_depth_mm  bill_length_mm  body_mass_g  flipper_length_mm
0            Adelie           18.5            36.8       3500.0              193.0
1         Chinstrap           18.3            47.6       3850.0              195.0
2            Gentoo           16.3            48.4       5400.0              220.0
3            Adelie           16.9            37.0       3000.0              185.0
4         Chinstrap           16.7            42.5       3350.0              187.0
[23]:
df.columns.name = None

df.head()
[23]:
     species  bill_depth_mm  bill_length_mm  body_mass_g  flipper_length_mm
0     Adelie           18.5            36.8       3500.0              193.0
1  Chinstrap           18.3            47.6       3850.0              195.0
2     Gentoo           16.3            48.4       5400.0              220.0
3     Adelie           16.9            37.0       3000.0              185.0
4  Chinstrap           16.7            42.5       3350.0              187.0
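For comparison, the same sequence of steps can be written as a single method chain. This is a sketch, not the lesson's own code: the helper name `tidy_species_columns` is hypothetical, and it uses `rename_axis` in place of assigning to `.columns.names` and `.columns.name`, so the input frame is left unmodified. It is demonstrated on a small made-up frame rather than the full data set.

```python
import pandas as pd


def tidy_species_columns(wide: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tidy a frame whose columns are a two-level
    MultiIndex of (species, quantity), following the cells above."""
    return (
        wide.rename_axis(columns=["species", "quantity"])  # name the column levels
        .stack(level="species")                            # species level -> row index
        .reset_index(level="species")                      # species index level -> column
        .reset_index(drop=True)                            # discard the old row numbers
        .rename_axis(columns=None)                         # drop the leftover 'quantity' label
    )


# Toy input with unnamed column levels, as read by pd.read_csv(..., header=[0, 1]).
wide = pd.DataFrame(
    [[16.3, 5400.0, 18.5, 3500.0], [15.8, 5050.0, 16.9, 3000.0]],
    columns=pd.MultiIndex.from_tuples(
        [
            ("Gentoo", "bill_depth_mm"),
            ("Gentoo", "body_mass_g"),
            ("Adelie", "bill_depth_mm"),
            ("Adelie", "body_mass_g"),
        ]
    ),
)
tidy_toy = tidy_species_columns(wide)
print(tidy_toy)
```

Returning a new frame rather than mutating the input makes it safe to rerun the cell, which is one reason method chains are popular for tidying steps.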

Exercise 4.2

What is the difference between merging and concatenating data frames?
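One way to explore this question is to try both operations on small frames. In the sketch below, the frames are made up for illustration (body masses copied from the output above), and the `sci_names` lookup table is an addition that is not part of the penguin subset used here. `pd.concat` glues frames together along an axis, here appending rows, while `DataFrame.merge` matches rows of two frames on a shared key column, like a database join.

```python
import pandas as pd

# Two made-up frames with the same columns.
gentoo = pd.DataFrame({"species": ["Gentoo", "Gentoo"], "body_mass_g": [5400.0, 5050.0]})
adelie = pd.DataFrame({"species": ["Adelie", "Adelie"], "body_mass_g": [3500.0, 3000.0]})

# Concatenating: put the frames end to end along the rows (axis=0 by default);
# ignore_index=True renumbers the rows 0..n-1.
stacked = pd.concat([gentoo, adelie], ignore_index=True)

# A small lookup table keyed on species (illustrative addition).
sci_names = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo"],
        "scientific_name": ["Pygoscelis adeliae", "Pygoscelis papua"],
    }
)

# Merging: line up rows of the two frames by matching values in the shared
# 'species' column; how="left" keeps every row of `stacked`.
merged = stacked.merge(sci_names, on="species", how="left")

print(stacked)
print(merged)
```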

Exercise 4.3

Write down any questions or points of confusion that you have.

Computing environment

[ ]:
%load_ext watermark
%watermark -v -p pandas,jupyterlab