E4. To be completed after lesson 10
[ ]:
import os
import pandas as pd
Exercise 4.1
In this lesson exercise, we again work with a subset of the Palmer penguins data set. We load it and view its first few rows now.
[17]:
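# Assumes data_path is already defined; header=[0, 1] reads the first two rows as a two-level column index
df = pd.read_csv(os.path.join(data_path, "penguins_subset.csv"), header=[0, 1])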
df.head()
[17]:
| | Gentoo | | | | Adelie | | | | Chinstrap | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
| | bill_depth_mm | bill_length_mm | flipper_length_mm | body_mass_g | bill_depth_mm | bill_length_mm | flipper_length_mm | body_mass_g | bill_depth_mm | bill_length_mm | flipper_length_mm | body_mass_g |
0 | 16.3 | 48.4 | 220.0 | 5400.0 | 18.5 | 36.8 | 193.0 | 3500.0 | 18.3 | 47.6 | 195.0 | 3850.0 |
1 | 15.8 | 46.3 | 215.0 | 5050.0 | 16.9 | 37.0 | 185.0 | 3000.0 | 16.7 | 42.5 | 187.0 | 3350.0 |
2 | 14.2 | 47.5 | 209.0 | 4600.0 | 19.5 | 42.0 | 200.0 | 4050.0 | 16.6 | 40.9 | 187.0 | 3200.0 |
3 | 15.7 | 48.7 | 208.0 | 5350.0 | 18.3 | 42.7 | 196.0 | 4075.0 | 20.0 | 52.8 | 205.0 | 4550.0 |
4 | 14.1 | 48.7 | 210.0 | 4450.0 | 18.0 | 35.7 | 202.0 | 3550.0 | 18.7 | 45.4 | 188.0 | 3525.0 |
Explain in words what each of the following code cells does as we work toward tidying this data frame.
[19]:
df.columns.names = ['species', 'quantity']
df.head()
[19]:
species | Gentoo | | | | Adelie | | | | Chinstrap | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
quantity | bill_depth_mm | bill_length_mm | flipper_length_mm | body_mass_g | bill_depth_mm | bill_length_mm | flipper_length_mm | body_mass_g | bill_depth_mm | bill_length_mm | flipper_length_mm | body_mass_g |
0 | 16.3 | 48.4 | 220.0 | 5400.0 | 18.5 | 36.8 | 193.0 | 3500.0 | 18.3 | 47.6 | 195.0 | 3850.0 |
1 | 15.8 | 46.3 | 215.0 | 5050.0 | 16.9 | 37.0 | 185.0 | 3000.0 | 16.7 | 42.5 | 187.0 | 3350.0 |
2 | 14.2 | 47.5 | 209.0 | 4600.0 | 19.5 | 42.0 | 200.0 | 4050.0 | 16.6 | 40.9 | 187.0 | 3200.0 |
3 | 15.7 | 48.7 | 208.0 | 5350.0 | 18.3 | 42.7 | 196.0 | 4075.0 | 20.0 | 52.8 | 205.0 | 4550.0 |
4 | 14.1 | 48.7 | 210.0 | 4450.0 | 18.0 | 35.7 | 202.0 | 3550.0 | 18.7 | 45.4 | 188.0 | 3525.0 |
[20]:
df = df.stack(level='species')
df.head()
[20]:
| | quantity | bill_depth_mm | bill_length_mm | body_mass_g | flipper_length_mm |
---|---|---|---|---|---|
| | species | | | | |
0 | Adelie | 18.5 | 36.8 | 3500.0 | 193.0 |
| | Chinstrap | 18.3 | 47.6 | 3850.0 | 195.0 |
| | Gentoo | 16.3 | 48.4 | 5400.0 | 220.0 |
1 | Adelie | 16.9 | 37.0 | 3000.0 | 185.0 |
| | Chinstrap | 16.7 | 42.5 | 3350.0 | 187.0 |
[21]:
df = df.reset_index(level='species')
df.head()
[21]:
quantity | species | bill_depth_mm | bill_length_mm | body_mass_g | flipper_length_mm |
---|---|---|---|---|---|
0 | Adelie | 18.5 | 36.8 | 3500.0 | 193.0 |
0 | Chinstrap | 18.3 | 47.6 | 3850.0 | 195.0 |
0 | Gentoo | 16.3 | 48.4 | 5400.0 | 220.0 |
1 | Adelie | 16.9 | 37.0 | 3000.0 | 185.0 |
1 | Chinstrap | 16.7 | 42.5 | 3350.0 | 187.0 |
[22]:
df = df.reset_index(drop=True)
df.head()
[22]:
quantity | species | bill_depth_mm | bill_length_mm | body_mass_g | flipper_length_mm |
---|---|---|---|---|---|
0 | Adelie | 18.5 | 36.8 | 3500.0 | 193.0 |
1 | Chinstrap | 18.3 | 47.6 | 3850.0 | 195.0 |
2 | Gentoo | 16.3 | 48.4 | 5400.0 | 220.0 |
3 | Adelie | 16.9 | 37.0 | 3000.0 | 185.0 |
4 | Chinstrap | 16.7 | 42.5 | 3350.0 | 187.0 |
[23]:
df.columns.name = None
df.head()
[23]:
| | species | bill_depth_mm | bill_length_mm | body_mass_g | flipper_length_mm |
---|---|---|---|---|---|
0 | Adelie | 18.5 | 36.8 | 3500.0 | 193.0 |
1 | Chinstrap | 18.3 | 47.6 | 3850.0 | 195.0 |
2 | Gentoo | 16.3 | 48.4 | 5400.0 | 220.0 |
3 | Adelie | 16.9 | 37.0 | 3000.0 | 185.0 |
4 | Chinstrap | 16.7 | 42.5 | 3350.0 | 187.0 |
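To experiment with the cells above without the data file, the same sequence of operations can be run on a small stand-in data frame. This is only a sketch: the name `toy` and its values are made up for illustration (two species, two quantities, two rows), and the row order after `stack` may differ between pandas versions, so the checks below do not rely on it.

```python
import pandas as pd

# Hypothetical toy data with the same two-level column layout as the penguin subset
columns = pd.MultiIndex.from_product(
    [["Gentoo", "Adelie"], ["bill_depth_mm", "body_mass_g"]]
)
toy = pd.DataFrame(
    [[16.3, 5400.0, 18.5, 3500.0], [15.8, 5050.0, 16.9, 3000.0]],
    columns=columns,
)

# Name the two levels of the column index
toy.columns.names = ["species", "quantity"]

# Move the species level from the columns into the row index
tidy = toy.stack(level="species")

# Turn the species index level into an ordinary column
tidy = tidy.reset_index(level="species")

# Replace the leftover (repeated) row labels with a fresh 0, 1, 2, ... index
tidy = tidy.reset_index(drop=True)

# Remove the leftover name of the column index
tidy.columns.name = None

print(tidy)
```

Printing the intermediate value of `tidy` after each step shows which part of the layout that step changes.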
Exercise 4.2
What is the difference between merging and concatenating data frames?
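As a starting point for thinking about this question, the following sketch applies `pd.concat` and `pd.merge` to the same pair of small data frames. The frames `left` and `right`, their `id` column, and their values are hypothetical and not part of the penguin data set.

```python
import pandas as pd

# Hypothetical toy frames that share an "id" column
left = pd.DataFrame({"id": [1, 2, 3], "species": ["Adelie", "Gentoo", "Chinstrap"]})
right = pd.DataFrame({"id": [2, 3, 4], "body_mass_g": [5400.0, 3850.0, 3500.0]})

# Concatenating places the rows of one frame after the other;
# columns missing from a frame are filled with NaN
stacked = pd.concat([left, right], ignore_index=True)

# Merging matches rows by the values in a key column
# (by default an inner join, so only ids present in both frames remain)
merged = pd.merge(left, right, on="id")

print(stacked)
print(merged)
```

Comparing the shapes of `stacked` and `merged` is one way to see how the two operations treat rows and columns differently.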
Exercise 4.3
Write down any questions or points of confusion that you have.
Computing environment
[ ]:
%load_ext watermark
%watermark -v -p pandas,jupyterlab