E4. To be completed after lesson 10

[ ]:

import os

import pandas as pd

Exercise 4.1

In this exercise, we will again work with a subset of the Palmer penguins data set. I will load it and view it now.

[17]:

# data_path is assumed to hold the path to the directory containing the data files
df = pd.read_csv(os.path.join(data_path, "penguins_subset.csv"), header=[0, 1])

df.head()

[17]:
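To see what `header=[0, 1]` does without needing the data file, here is a minimal sketch that reads a small CSV from a string. The CSV text and the name `df_demo` are made up for illustration (the numbers are borrowed from the first two rows shown further down), and the layout of the real `penguins_subset.csv` is an assumption.

```python
import io

import pandas as pd

# Hypothetical CSV text whose first two rows are both header rows
csv_text = (
    "Gentoo,Gentoo,Adelie,Adelie\n"
    "bill_depth_mm,body_mass_g,bill_depth_mm,body_mass_g\n"
    "16.3,5400.0,18.5,3500.0\n"
    "15.8,5050.0,16.9,3000.0\n"
)

# header=[0, 1] asks pandas to build two-level column labels from rows 0 and 1
df_demo = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Each column is now addressed by a (species, quantity) pair
print(df_demo.columns.nlevels)
print(df_demo[("Gentoo", "body_mass_g")].tolist())
```

With two header rows, each column label is a tuple, which is why the cells below give the two levels names before reshaping.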

Explain in words what each of the following code cells does as we work toward tidying this data frame.

[19]:

df.columns.names = ['species', 'quantity']

df.head()

[19]:

species	Gentoo				Adelie				Chinstrap
quantity	bill_depth_mm	bill_length_mm	flipper_length_mm	body_mass_g	bill_depth_mm	bill_length_mm	flipper_length_mm	body_mass_g	bill_depth_mm	bill_length_mm	flipper_length_mm	body_mass_g
0	16.3	48.4	220.0	5400.0	18.5	36.8	193.0	3500.0	18.3	47.6	195.0	3850.0
1	15.8	46.3	215.0	5050.0	16.9	37.0	185.0	3000.0	16.7	42.5	187.0	3350.0
2	14.2	47.5	209.0	4600.0	19.5	42.0	200.0	4050.0	16.6	40.9	187.0	3200.0
3	15.7	48.7	208.0	5350.0	18.3	42.7	196.0	4075.0	20.0	52.8	205.0	4550.0
4	14.1	48.7	210.0	4450.0	18.0	35.7	202.0	3550.0	18.7	45.4	188.0	3525.0

[20]:

df = df.stack(level='species')

df.head()

[20]:

	quantity	bill_depth_mm	bill_length_mm	body_mass_g	flipper_length_mm
	species
0	Adelie	18.5	36.8	3500.0	193.0
	Chinstrap	18.3	47.6	3850.0	195.0
	Gentoo	16.3	48.4	5400.0	220.0
1	Adelie	16.9	37.0	3000.0	185.0
1	Chinstrap	16.7	42.5	3350.0	187.0

[21]:

df = df.reset_index(level='species')

df.head()

[21]:

quantity	species	bill_depth_mm	bill_length_mm	body_mass_g	flipper_length_mm
0	Adelie	18.5	36.8	3500.0	193.0
0	Chinstrap	18.3	47.6	3850.0	195.0
0	Gentoo	16.3	48.4	5400.0	220.0
1	Adelie	16.9	37.0	3000.0	185.0
1	Chinstrap	16.7	42.5	3350.0	187.0

[22]:

df = df.reset_index(drop=True)

df.head()

[22]:

quantity	species	bill_depth_mm	bill_length_mm	body_mass_g	flipper_length_mm
0	Adelie	18.5	36.8	3500.0	193.0
1	Chinstrap	18.3	47.6	3850.0	195.0
2	Gentoo	16.3	48.4	5400.0	220.0
3	Adelie	16.9	37.0	3000.0	185.0
4	Chinstrap	16.7	42.5	3350.0	187.0

[23]:

df.columns.name = None

df.head()

[23]:

	species	bill_depth_mm	bill_length_mm	body_mass_g	flipper_length_mm
0	Adelie	18.5	36.8	3500.0	193.0
1	Chinstrap	18.3	47.6	3850.0	195.0
2	Gentoo	16.3	48.4	5400.0	220.0
3	Adelie	16.9	37.0	3000.0	185.0
4	Chinstrap	16.7	42.5	3350.0	187.0
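If you want to experiment with these steps without the data file, the sketch below repeats them on a small stand-in data frame. The names `wide` and `tidy` are hypothetical, and the frame holds only two species and two quantities, with values copied from the first two rows of the output above. The row order after stacking may differ between pandas versions, so do not rely on it.

```python
import pandas as pd

# Small stand-in for the penguin data, with two-level column labels
columns = pd.MultiIndex.from_product(
    [["Gentoo", "Adelie"], ["bill_depth_mm", "body_mass_g"]],
    names=["species", "quantity"],
)
wide = pd.DataFrame(
    [[16.3, 5400.0, 18.5, 3500.0], [15.8, 5050.0, 16.9, 3000.0]],
    columns=columns,
)

# Move the 'species' column level into the row index
tidy = wide.stack(level="species")

# Turn the 'species' index level into an ordinary column
tidy = tidy.reset_index(level="species")

# Replace the repeated row labels with a fresh integer index
tidy = tidy.reset_index(drop=True)

# Remove the leftover 'quantity' name from the column index
tidy.columns.name = None

print(tidy)
```

Printing `tidy` after each step, as the cells above do with `df.head()`, is a good way to check your written explanations.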

What is the difference between merging and concatenating data frames?
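As a starting point for thinking about this question, here is a minimal sketch that applies both operations to tiny data frames. All of the frames, column names, and values are hypothetical and made up for illustration only.

```python
import pandas as pd

# Hypothetical data frames, made up for illustration only
df_a = pd.DataFrame({"id": [1, 2], "mass": [10.0, 20.0]})
df_b = pd.DataFrame({"id": [3, 4], "mass": [30.0, 40.0]})
df_meta = pd.DataFrame({"id": [1, 2, 3], "site": ["x", "y", "z"]})

# pd.concat places df_b's rows below df_a's rows; both share the same columns
stacked = pd.concat([df_a, df_b], ignore_index=True)

# merge pairs up rows from the two frames by matching values in the 'id' column
merged = stacked.merge(df_meta, on="id", how="inner")

print(stacked)
print(merged)
```

Compare the shapes and columns of `stacked` and `merged` with those of the input frames when writing your answer.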

Write down any questions or points of confusion that you have.

[ ]:

%load_ext watermark
%watermark -v -p pandas,jupyterlab