Navigating Packages

This lesson was developed by Rosita Fu, based on work by Patrick Almhjell.


Before diving into how packages are made, I think it’s important to see how the user end of things motivates the developer. We’ll briefly talk about how to navigate packages, their modules, and their methods, and how soothing docstrings are to users who are new to your package.

  • In case there’s any confusion, a module is a file that ends with .py. This file contains classes, functions, variables, and other objects.

  • A method is just another name for a function that belongs to a particular object, in this case your module and package.

  • A package contains several related modules, all grouped together under one name. Within those modules are your methods. Some packages you’ve used extensively include Numpy, Scipy, Pandas, and Bokeh. You can find third-party packages on PyPI and install them using pip, the recursive acronym for “pip installs packages.”
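
To make the vocabulary concrete, here is the package/module/function relationship spelled out with Numpy, one of the packages mentioned above; the particular attributes are just familiar examples.

import numpy

numpy                 # the package
numpy.random          # a submodule bundled inside the package (itself organized as a subpackage)
numpy.random.choice   # a function that lives in that submodule

# Third-party packages like Numpy live on PyPI and are installed from the
# terminal with, e.g., `pip install numpy`.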

To access modules in a package, we have to import the name of the package. As your computer reads the import statement, the interpreter stores its contents in memory.

  • If you plan on updating your package, your interpreter will not know about those changes, so you’ll have to restart the kernel.

  • If your package is not in Python’s Standard Library, you’ll have to pip install pkg_name so your machine knows what pkg_name is.

  • When developing modules, this restarting can get annoying. You can use the %autoreload magic function, provided by the autoreload extension, to get around it.
# Automatically reload modules before executing each cell
%load_ext autoreload
%autoreload 2

import pkg_name
  • Your import statements conventionally go at the top of your file, just after any module comments and docstrings, and before module globals and constants. For the sake of this discussion, things will be a little out of order. But generally, imports should be grouped in the following order, with a blank line between each group (a sketch follows the list):

    1. standard library imports

    2. related third party imports

    3. local application/library specific imports
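
Here is a sketch of what that grouping might look like at the top of a module. The particular packages are just examples, and the local import is hypothetical.

# 1. Standard library imports
import os
import sys

# 2. Related third-party imports
import numpy as np
import pandas as pd

# 3. Local application/library-specific imports
# import my_local_module   # hypothetical local module; uncomment if you have one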

I’ll be using my own package chromatose for reference, since it has only two modules, palettes.py and viz.py, and structurally contains the bare minimum for a functioning package. The simplicity makes it a little easier to navigate than giant libraries like Numpy. You can pip install chromatose in the terminal, or just follow along.

[2]:
import chromatose
Loading BokehJS ...

Chromatose is a long word. Let’s use an alias instead with the keyword as.

[3]:
import chromatose as ct

I generally only use aliases when I am absolutely not in a typing mood, for very large libraries like Numpy or Pandas where I wouldn’t recognize np or pd as anything else, or when I don’t anticipate wanting the alias as a name for other variables. ct is often used as a name for a count, so it might be a bit fragile as an alias, but it is used here mainly for demonstrative purposes.

To see what’s inside, we can use the help() function or place ?? after the name.

[4]:
ct??
Type:        module
String form: <module 'chromatose' from '/Users/bois/opt/anaconda3/lib/python3.8/site-packages/chromatose/__init__.py'>
File:        ~/opt/anaconda3/lib/python3.8/site-packages/chromatose/__init__.py
Source:
from .palettes import *
from .viz import *

__author__ = "Rosita Fu"
__version__ = "0.0.2"
__license__ = "MIT"
__email__ = "rfu@caltech.edu"
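
If you prefer plain text to the rich ?? display, the help() function mentioned above gives similar information; for example (output omitted here):

help(ct)            # summarizes the package: its contents (palettes, viz), data, version, and file location
help(ct.palettes)   # the same idea, for a single module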

To use the functions and retrieve the variables inside the modules, we use dot syntax. If a function or variable is within the namespace of the package, you can access it directly with either pkg.function() or pkg.module.function(). This is nice, because we are very explicit about where we are getting our functions.

  • For example, in Numpy, you have to call np.random.choice() and cannot simply call np.choice(); the choice function is thus not in the namespace of numpy itself. When making a package, there are ways to control what the user can call directly, via different import statements in your __init__.py file, but we’ll talk about this later.

  • As a side note, sometimes people import modules with

from module_name import *

Although you can do the same with packages, I find it hectic and discourage it. Essentially, the statement unpacks all of that code and plops it directly into your file, instead of keeping it bundled inside module_name. All the variable names defined within module_name are then exposed in your namespace, where they can be overwritten and can themselves overwrite built-in functions. Obviously, if you import modules this way, you can access the variables and functions without dot syntax, but again, this is a chaotic way to live.

  • As an example, if there were a function called slice() in module_name.py and you imported the module as above, you would no longer be using Python’s built-in slice() function.

In general, DON’T import modules into the global namespace unless you are absolutely positively sure you/users will not get a name clash.
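
Here is a minimal sketch of the clash described above. There is no real module_name here; defining slice() by hand stands in for what a wildcard import would dump into your namespace.

import builtins

def slice(seq, n):
    """Stand-in for a slice() function that a wildcard import might pull in."""
    return seq[:n]

print(slice([1, 2, 3, 4], 2))   # [1, 2]: the new function shadows the built-in
print(builtins.slice(1, 4))     # slice(1, 4, None): the built-in is still reachable, but only via builtins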

Inside palettes.py are a bunch of variables that are small lists of hex values, the names of which can be found in the README, or again, using help() or ??.

[5]:
ct.pepo
[5]:
['#29937b', '#044032', '#902a42', '#3f0914', '#e4607c']
[6]:
ct.palettes.pepo
[6]:
['#29937b', '#044032', '#902a42', '#3f0914', '#e4607c']

As you can see, they both reference the same object, but note that pepo is not a file in our package; it is actually a variable inside the palettes.py module. Dot syntax is very powerful, and Python uses it to retrieve variables, functions, and modules alike.
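
You can check this directly: both names point to the very same list object, and dot syntax hands back modules and variables alike.

print(ct.pepo is ct.palettes.pepo)   # True: the same list, reached two different ways
print(type(ct.palettes))             # <class 'module'>
print(type(ct.pepo))                 # <class 'list'>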

The __xxx__ variables are typically informative strings that you can access as well. The most useful one I have found when navigating other packages is the __version__ string.

[7]:
import pandas as pd
pd.__version__
[7]:
'1.1.3'
[8]:
ct.__version__
[8]:
'0.0.2'

Similarly, the functions inside viz.py, like palplot, can be accessed either with ct.viz.palplot() or ct.palplot(). You can view the docstring with ?. Docstrings inform the user of input types, expected returns, additional kwargs, and the function’s overall reason for existence.

[9]:
ct.viz.palplot?
Signature:
ct.viz.palplot(
    palette,
    plot='all',
    bg_color='white',
    alpha=1.0,
    shuffle=False,
    scatter_kwargs=None,
)
Docstring:
Displays palette via bokeh. Hover for hex/rgb value.
Arguments
---------
palette : list of hex strings or rgb tuples or HTML names (any combination)
plot :
    'swatch' for squares,
    'pie' for wedges (adjacency comparison),
    'points' for some points,
    'line' for some lines,
    'scatter' for a scatterplot,
    'all' for all (with dropdown menu for lines/points/scatter)
bg_color : background fill color,
    valid name hex or rgb
alpha : alpha of entire palette,
    fraction btw 0.0 and 1.0
shuffle : shuffles palette, boolean,
scatter_kwargs : dicitonary, 'click_policy' is boolean,
    if True, legend is on plot and can click/hide
    if False, legend is off plot, no overlap
File:      ~/opt/anaconda3/lib/python3.8/site-packages/chromatose/viz.py
Type:      function

As the docstring states, the palette argument expects a list of colors in the form of hex values, RGB tuples, or HTML names (in any combination), and the function displays an interactive Bokeh plot for the user.

[10]:
ct.palplot(ct.pepo)
[10]:
(Interactive Bokeh plot of the pepo palette displays here.)

Now let’s figure out how it actually works!


Last updated on Dec 02, 2021.

© 2021 Justin Bois and BE/Bi 103 a course staff. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.


