{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Let's make a package!\n", "\n", "*This recitation was written by Patrick Almhjell.*\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, where do we start?\n", "\n", "Let's revisit the code to parse my data and make a plot from [before](sharing_code.ipynb) and think about how we could start writing a package based on it.\n", "\n", "I know what you may be thinking: \"Patrick, you're dumb, that's only three functions. It's certainly not enough for a package.\"\n", "\n", "And, perhaps you're right. But a package is designed to be managed, improved, expanded. You will almost never make a package and be done with it in a single commit and push. So, setting the foundation for a package with a few key functions and an idea of how it will expand is just as important as setting up a meaty package.\n", "\n", "So, let's do it!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up the repo on GitHub\n", "\n", "First thing's first: we need to set up a repository on GitHub so it's under version control and (eventually) distributable. It's very easy.\n", "\n", "
\n", " \n", "![Repo set-up on GitHub](figs/set_up_repo.png)\n", " \n", "
\n", "\n", "The automatic addition of a `.gitignore` file is really nice. Here are some of the contents of the default Python one:\n", "\n", "```\n", "...\n", "\n", "# Distribution / packaging\n", ".Python\n", "build/\n", "develop-eggs/\n", "dist/\n", "downloads/\n", "eggs/\n", ".eggs/\n", "lib/\n", "lib64/\n", "parts/\n", "sdist/\n", "var/\n", "wheels/\n", "*.egg-info/\n", ".installed.cfg\n", "*.egg\n", "MANIFEST\n", "\n", "...\n", "\n", "# Unit test / coverage reports\n", "htmlcov/\n", ".tox/\n", ".coverage\n", ".coverage.*\n", ".cache\n", "nosetests.xml\n", "coverage.xml\n", "*.cover\n", ".hypothesis/\n", ".pytest_cache/\n", "\n", "...\n", "\n", "# Jupyter Notebook\n", ".ipynb_checkpoints\n", "\n", "...\n", "```\n", "\n", "You certainly don't need _all_ of this, but much of it is useful, especially when working with Jupyter (it ignores `.ipynb_checkpoints`) and after you build the package (it ignores `build/`, `wheels/`, `*.egg-info/`, etc.).\n", "\n", "After doing this, you'll have a repo on GitHub that you can clone onto your machine and start working with.\n", "\n", "
\n", " \n", "![Initialized repo on GitHub](figs/initial_repo.png)\n", " \n", "
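To get a feel for what the patterns in the default `.gitignore` shown earlier in this section actually catch, here is a rough sketch using `fnmatch` from the standard library. It only approximates git's matching rules (no negation, no anchored patterns), and `PATTERNS` and `is_ignored` are names made up for this illustration; they are not part of git or of tinypkg.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A handful of patterns copied from the default Python .gitignore.
PATTERNS = ['build/', 'dist/', '*.egg-info/', '*.egg', '.ipynb_checkpoints']


def is_ignored(path, patterns=PATTERNS):
    # Rough approximation: test each path component against each pattern.
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith('/'):
            # Trailing slash: the pattern only applies to directories,
            # so skip the final component (assumed here to be a file name).
            candidates = parts[:-1]
            pattern = pattern[:-1]
        else:
            candidates = parts
        if any(fnmatch(part, pattern) for part in candidates):
            return True
    return False


for path in ['build/lib/tinypkg/viz.py',
             'tinypkg.egg-info/PKG-INFO',
             '.ipynb_checkpoints/demo-checkpoint.ipynb',
             'tinypkg/viz.py']:
    print(path, is_ignored(path))
```

The first three paths are build or Jupyter by-products, so they should come back as ignored, while the actual module in `tinypkg/` should not. For the real answer on your own machine, `git status` is the authority.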
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding our code\n", "\n", "As we said in the last section, your package will always follow this format:\n", "\n", "    /package_name\n", "        /package_name\n", "            __init__.py\n", "            module_1.py\n", "            module_2.py\n", "            ...\n", "        ...\n", "        README.md\n", "    \n", "So, really all we're missing is the coding guts of our package... which we already have! We just have to put it into modules and then place them in another directory called `tinypkg` within our repo.\n", "\n", "Now, let's think about how we want to do this, keeping in mind what we discussed before.\n", "\n", "We have three functions. Two of them perform some sort of general utility, and one of them makes a visualization. To me, this seems like it would be well-organized as follows:\n", "\n", "    /tinypkg\n", "        /tinypkg\n", "            __init__.py\n", "            general_utils.py    <------ contains check_df_col() and check_replicates()\n", "            viz.py              <------ contains plot_timecourse()\n", "            ...\n", "        ...\n", "    \n", "If you read my code, you might disagree. Since `check_replicates()` at its core is performing data analysis and processing, it might not be a general utility. So perhaps this is better:\n", "\n", "    /tinypkg\n", "        /tinypkg\n", "            __init__.py\n", "            general_utils.py    <------ contains check_df_col()\n", "            analysis.py         <------ contains check_replicates()\n", "            viz.py              <------ contains plot_timecourse()\n", "            ...\n", "        ...\n", "    \n", "It's really up to you. At this point it's a bit silly to have three functions in three separate files. But I think we can all agree that these don't belong in the same module.\n", "\n", "And, furthermore, this isn't about now. This is about the future, and helping out ***Future You!*** (And your labmates.) When you have more utilities, analysis functions, or plotting functions (which you certainly will, in practice), *you know exactly where to put them* and how they will interact.\n", "\n", "We'll set this up in the original two-module way. 
Make these changes in the repo you just cloned, then `git add`, `git commit`, and `git push` them. The repo on GitHub will then look like this:\n", "\n", "
\n", " \n", "![Initial code](figs/initial_code.png)\n", " \n", "
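The two-module layout described above can also be scripted rather than created by hand. This is only a sketch: `make_skeleton` is a hypothetical helper name, the files it creates are empty placeholders, and it works in a temporary directory so it won't touch your real clone.

```python
import tempfile
from pathlib import Path


def make_skeleton(root, pkg_name, modules):
    # The outer directory is the repo; the inner one is the importable package.
    repo = Path(root) / pkg_name
    pkg = repo / pkg_name
    pkg.mkdir(parents=True, exist_ok=True)

    # Empty placeholder files; the real code gets pasted in afterwards.
    (repo / 'README.md').touch()
    (pkg / '__init__.py').touch()
    for module in modules:
        (pkg / (module + '.py')).touch()

    return repo


with tempfile.TemporaryDirectory() as tmp:
    repo = make_skeleton(tmp, 'tinypkg', ['general_utils', 'viz'])
    # Collect everything that was created, relative to the repo root.
    created = sorted(p.relative_to(repo).as_posix() for p in repo.rglob('*'))
    for path in created:
        print(path)
```

Switching to the three-module organization would just mean passing `['general_utils', 'analysis', 'viz']` instead.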
\n", "\n", "Here's a look at the modules themselves.\n", "\n", "
\n", "\n", "`general_utils.py`\n", "\n", "```python\n", "import numpy as np\n", "import pandas as pd\n", "\n", "\n", "def check_df_col(df, column, name=None):\n", "    ...  # function body omitted here\n", "\n", "\n", "def check_replicates(df, variable, value, grouping):\n", "    ...  # function body omitted here\n", "```\n", "
\n", "\n", "\n", "\n", "`viz.py`\n", "\n", "```python\n", "import numpy as np\n", "import pandas as pd\n", "\n", "import holoviews as hv\n", "hv.extension('bokeh')\n", "\n", "import tinypkg.general_utils as utils\n", "\n", "\n", "def plot_timecourse(df, variable, value, condition=None, split=None, sort=None, cmap=None, show_all=False,\n", "                    show_points='default', legend=False, height=350, width=500, additional_opts={}):\n", "# ...\n", "    \n", "    # Check columns\n", "    utils.check_df_col(df, variable, name='variable')\n", "    utils.check_df_col(df, value, name='value')\n", "    utils.check_df_col(df, condition, name='condition')\n", "    utils.check_df_col(df, split, name='split')\n", "    utils.check_df_col(df, sort, name='sort')\n", "\n", "# ...\n", "\n", "    # Check for replicates; aggregate df\n", "    groups = [grouping for grouping in (condition, split) if grouping is not None]\n", "    if groups == []:\n", "        groups = None\n", "    replicates, df = utils.check_replicates(df, variable, value, groups)\n", "```\n", "\n", "
\n", "\n", "Notice here the import of `general_utils.py` as `utils`, which means we call our `check_...()` functions with `utils.check...()`. This is nice, because we are very explicit about where we are getting our functions.\n", "\n", "And last but not least, we need to add our `__init__.py` file:\n", "\n", "```python\n", "from .general_utils import *\n", "from .viz import *\n", "\n", "# ...\n", "```\n", "\n", "You can look through this package on GitHub: [https://github.com/palmhjell/tinypkg](https://github.com/palmhjell/tinypkg). It has additional contents not shown above, which you'll learn about in the next sections." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing environment" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPython 3.7.4\n", "IPython 7.8.0\n", "\n", "jupyterlab 1.1.4\n" ] } ], "source": [ "%load_ext watermark\n", "\n", "%watermark -v -p jupyterlab" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }