{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploratory data analysis\n", "\n", "In 1977, John Tukey, one of the great statisticians and mathematicians of all time, published a book entitled *Exploratory Data Analysis*. In it, he laid out general principles for how researchers should handle their first encounters with their data, before any formal statistical inference. Most of us spend a lot of time doing exploratory data analysis, or EDA, without really knowing it. For the most part, EDA involves exploring a data set graphically.\n", "\n", "We start off with a few wise words from John Tukey himself." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Useful EDA advice from John Tukey\n", "\n", "- \"Exploratory data analysis can never be the whole story, but nothing else can serve as a foundation stone—as the first step.\"\n", "\n", "
\n", "\n", "- \"In exploratory data analysis there can be no substitute for flexibility; for adapting what is calculated—and what we hope plotted—both to the needs of the situation and the clues that the data have already provided.\"\n", "\n", "
\n", "\n", "- \"There is no excuse for failing to plot and look.\"\n", "\n", "
\n", "\n", "- \"There is often no substitute for the detective's microscope - - or for the enlarging graphs.\"\n", "\n", "
\n", "\n", "- \"Graphs force us to note the unexpected; nothing could be more important.\"\n", "\n", "
\n", "\n", "- \"'Exploratory data analysis' is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as those we believe to be there.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The tools of EDA\n", "\n", "Being able to load a data set and quickly start exploring it graphically enables you to _think_ about your data set instead of being mired in the mechanics of producing a plot. In the notebooks that follow in this lesson, we will learn how to use Python-based tools for EDA. In particular, we will learn how to use [Pandas](https://pandas.pydata.org) to keep the data set organized and accessible, and [Bokeh](https://docs.bokeh.org/en/latest/) and [HoloViews](https://holoviews.org/) to make interactive graphics.\n", "\n", "Along the way, we will learn key concepts of data organization and display. Importantly, we will learn about **tidy data**, **split-apply-combine**, and how to **plot all of your data**.\n", "\n", "Before we set off down this path, though, we need to learn a bit about [NumPy](http://numpy.org/) and [SciPy](http://scipy.org/), which form the foundation upon which many of these tools are built." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }