{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 3 exercises\n", "\n", "[Data set download](https://s3.amazonaws.com/bebi103.caltech.edu/data/anderson-fisher-iris.csv)\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 3.1\n", "\n", "The [Anderson-Fisher iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set) is a classic data set used in statistical and machine learning applications. Edgar Anderson carefully measured the lengths and widths of the petals and sepals of 50 irises in each of three species: *I. setosa*, *I. versicolor*, and *I. virginica*. Ronald Fisher then used this data set to distinguish the three species from each other.\n", "\n", "**a)** Load the data set, which you can download [here](https://s3.amazonaws.com/bebi103.caltech.edu/data/anderson-fisher-iris.csv), into a Pandas `DataFrame` called `df`. Be sure to inspect the structure of the file before loading it. The file has two header rows, so you will need to use the `header=[0,1]` kwarg of `pd.read_csv()` to load the data set properly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**b)** Take a look at `df`. Is it tidy? Why or why not?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**c)** Perform the following operations on the `DataFrame` you loaded in part **a)** to generate a new `DataFrame`, `df_tidy`. You do not need to worry about what these operations do (that is the topic of next week); just do them to answer this question: Is the resulting data frame `df_tidy` tidy? Why or why not?" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "df_tidy = (\n", "    df.stack(level=0)\n", "    .sort_index(level=1)\n", "    .reset_index(level=1)\n", "    .rename(columns={'level_1': 'species'})\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**d)** Using `df_tidy`, slice out all of the sepal lengths for *I. versicolor*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 3.2\n", "\n", "**a)** Make a scatter plot of sepal width versus petal length with the glyphs colored by species." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**b)** Make a plot comparing the petal widths of the three species. Explain why you chose that type of plot." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 3.3\n", "\n", "Write down any questions or points of confusion that you have." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }