{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Package basics\n", "\n", "*This recitation was written by Patrick Almhjell.*\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Packages were explained to you in [Lesson 2](packages_and_modules.ipynb), so you should have a general idea of how they work. But we'll go over the important things again.\n", "\n", "You should think of a package as a collection of modules with some instructions for how they interact:\n", "\n", "- Some of these instructions are set in a file called `__init__.py`.\n", "- Each module should contain python objects (functions and classes\\*, mainly) that are related, and make sense being together.\n", " - Modules _should_ interact with one another intuitively and productively. So, you can and should import modules within other modules.\n", " - Modules _should not_ mix actual code that performs fundamentally different tasks within the same module.\n", "\n", "(\\*We won't be discussing classes in this course, but feel free to reach out if you want to learn about them!)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Package architecture\n", "Personally, I've developed a preference for having instrument-centric data wrangling and processing modules. 
So, my package that I use and share with my lab looks something like this:\n", "\n", "    Package Component             Description\n", "    ---------------------         ----------------------------------\n", "    /arnoldLab_utils              <------ this is the root directory\n", "        /arnoldLab_utils          <------ this is where the guts of the package are kept\n", "            __init__.py           <------ this specifies what/how things are imported into the 'namespace'; see below\n", "            tecan.py              <------ submodule for Tecan plate reader data\n", "            LCMS.py               <------ submodule for Agilent LCMS data\n", "            screening_utils.py    <------ this helps work up screening data from either Tecan or LCMS\n", "            viz.py                <------ my favorite module; helps me visualize my data in many ways\n", "        /tests                    <------ tests, to be used with pytest\n", "        /templates                <------ useful template files; e.g., to map conditions to a 96-well plate\n", "        setup.py                  <------ helps install the package\n", "        README.md                 <------ gives details on the package, how to install/contribute, etc.\n", "\n", "Here, `tecan.py` contains code for wrangling data from our Tecan plate reader (you'll be doing this in an upcoming problem set!), and `LCMS.py` contains code for wrangling data from our Agilent LCMS.\n", "\n", "Most of the time, I'm screening enzyme variants for activity. So, I have a module called `screening_utils.py` that interacts with these other modules when I'm doing that, allowing me to specify which wells of a 96-well plate are controls, perform background subtraction or normalization (after validating that I _should_ be doing subtraction or normalization), etc.\n", "\n", "However, I'm not always using the Tecan or LCMS for screening, so they don't _have_ to interact with `screening_utils.py`. Other times I might be doing a BCA assay (for protein quantification) on the Tecan or looking at single analytical reactions on the LCMS. So they also provide that functionality. 
But it all starts with working up the data and getting it into a usable format.\n", "\n", "To drive this home, I'll present an excellent quote from [Griffin Chure](https://gchure.github.io) when we were discussing this:\n", "\n", "
\n", "\n", "**Code you write should be separated by what it does.**\n", " \n", "
\n", "\n", "In that vein, my module `viz.py` contains functions for making informative plots from data that I collect on a daily basis, usually (but not only) data from the Tecan or LCMS. Visualization is not mixed with the processing or analysis.\n", "\n", "Finally, I have the usual `setup.py`, `README.md`, and `__init__.py` files as well as a `tests/` directory (more on these soon). You'll also see a `templates/` directory, which is where I keep basic templates that can help streamline some functions. This is another nice thing about a package: you can keep anything in the root directory (or almost anywhere, really) that might be essential or helpful for the user, such as templates, documents, example data, etc." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `__init__.py` file and namespaces\n", "As described in [Lesson 2](packages_and_modules.ipynb), the `__init__.py` file provides instructions about how the modules are imported.\n", "\n", "Mine looks something like this:\n", "\n", "```python\n", "from .tecan import *\n", "from .LCMS import *\n", "from .viz import *\n", "\n", "__author__ = 'Patrick Almhjell'\n", "__email__ = 'palmhjell@caltech.edu'\n", "__version__ = '0.0.1'\n", "```\n", " \n", "The other modules are handled within those three import statements, so that's really all I need. This is a pretty common import style.\n", "\n", "What this means is that when I run `import arnoldLab_utils as ut` in my Python session, any given function within `tecan.py`, `LCMS.py`, or `viz.py` is accessible to me with `ut.function()`.\n", "\n", "In other words, these functions are available within the ***namespace*** of `ut`. Namespaces help keep Python objects separate, which is a very good thing." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A quick aside on namespaces:\n", "\n", "Say I decide to make a function called `slice()`, which could be used to slice a dataset at a given value and only give me the entries above it (my \"hits\", as they're called when we're doing screening). This seems innocuous enough. However, you may notice that `slice()` is a built-in in Python:\n", "\n", "```python\n", "slice(0, 3)  # the built-in: describes the index range 0, 1, 2\n", "```\n", "\n", "So, if we imported a function called `slice()` directly into the global namespace, it would shadow the built-in, which would no longer be reachable by that name. This is *not* something you want to do.\n", "\n", "Namespaces solve this issue, because we access our function as `ut.slice()` rather than importing it into the global namespace. (Though, generally you should try not to conflict with built-ins.)\n", "\n", "So I'll issue a warning here:\n", "\n", "
\n", "\n", "**Don't import a module's contents into the global namespace (`from module import *`) unless you are really sure you will not get a name clash.** (And even then, be careful.)\n", " \n", "
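To see a real clash of this kind, here is a minimal sketch that uses the standard library's `math` module instead of `arnoldLab_utils`, so it runs anywhere. A star import from `math` brings in `math.pow`, which shadows the built-in `pow`. The `session` dictionary is only a stand-in for a fresh global namespace, and the variable names are made up for illustration.

```python
import math

# Stand-in for a fresh global namespace (an assumption for this sketch),
# so the star import does not alter the current session.
session = {}
exec('from math import *', session)

# The star import bound math.pow under the name pow in that namespace,
# hiding the built-in pow there.
shadowed = session['pow'] is math.pow

# Importing the module under its own name keeps both functions reachable.
builtin_result = pow(2, 3)      # built-in pow on ints returns an int
math_result = math.pow(2, 3)    # math.pow always returns a float

print(shadowed, builtin_result, math_result)  # True 8 8.0
```

Because `math.pow` always returns a float and does not accept the three-argument form of the built-in, code that relied on the built-in `pow` could change behavior or fail after the star import.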
\n", "\n", "An alternative `__init__.py` import statement might look something like this:\n", "\n", "```python\n", "from . import tecan\n", "from . import LCMS\n", "from . import viz\n", "```\n", " \n", "where you then access a function in `tecan.py` with `ut.tecan.function()`.\n", "\n", "[You can find more on \\_\\_init\\_\\_.py and package architecture here.](https://towardsdatascience.com/whats-init-for-me-d70a312da583)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing environment" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPython 3.7.4\n", "IPython 7.8.0\n", "\n", "jupyterlab 1.1.4\n" ] } ], "source": [ "%load_ext watermark\n", "\n", "%watermark -v -p jupyterlab" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }