Publication of packages


GitHub

When making a repo for your package, the repo name should match the pkg_name from before.

/pkg_name               # repo-name is at this level
  /pkg_name
    __init__.py
    module1.py
    module2.py
    module3.py
    ...
  setup.py
  README.md
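For reference, a minimal setup.py for a package laid out like this might look something like the following sketch (the name, version, author, and dependency list are placeholders, not taken from any real package):

```python
# A minimal setup.py sketch; every field value here is a placeholder.
from setuptools import setup, find_packages

setup(
    name="pkg_name",
    version="0.0.1",
    author="Your Name",
    description="A short description of the package.",
    packages=find_packages(),  # discovers the inner pkg_name/ directory
    install_requires=["numpy"],  # third-party dependencies, if any
)
```

pip reads this file when you install the package in the next step.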

Installation in Development Mode

Once your basic package architecture is built, you can install it locally using pip. Before installing, make sure you are in the directory immediately above your package. If my package is in ~/bebi103a/pkg_name/, I would cd ~/bebi103a/ and then do the following on the command line:

pip install -e pkg_name

The -e flag is important: it tells pip that this is a local, editable package. Your package is now accessible on your machine whenever you run the Python interpreter! Note that you use the -e flag when installing your own local package that is not yet on PyPI.
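As a quick sanity check, you can ask Python where a package is being loaded from; for an editable install, the reported path points back into your source tree rather than into site-packages. Since pkg_name here is hypothetical, the sketch below uses the standard-library json package just to show the mechanics:

```python
import importlib.util

# Find where Python will load a package from. For a package installed
# with `pip install -e`, this path lands inside your cloned source tree
# instead of site-packages. Demonstrated with the stdlib json package.
spec = importlib.util.find_spec("json")
print(spec.origin)  # e.g. .../lib/python3.x/json/__init__.py
```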

For most of your own packages, this setup is ideal. As you're making changes, you can test them in Jupyter notebooks by importing the package in a fresh cell.

Live Editing

As a small example of how you’ll be editing your packages, I’ll pull a toy package I made last night. We’ll edit the version numbers locally and see how the Jupyter interface can streamline the process. You can follow along by cloning the repo octopus:

git clone https://github.com/atisor73/octopus

We’ll first cd to the directory immediately above, and run

pip install -e octopus

Alternatively, you could cd into the cloned octopus directory itself and run

pip install -e .

Now you have a package! Let’s import it into this notebook with autoreload so that we don’t have to keep restarting our kernel every time we make a change.

[1]:
%load_ext autoreload
%autoreload 2

import octopus as op
[2]:
op.__version__
[2]:
'0.0.1'

Cool.

Let’s open up the __init__.py file in octopus/octopus/ and change the string in the __version__ variable.

[3]:
op.__version__
[3]:
'0.0.2'

:O Would ya look at that. No need to reinstall, reimport, or anything. The autoreload extension and the --editable pip install allow dynamic editing. This is very helpful, and it's how I would recommend making changes. Once things look OK, you can commit and push.

In case it all went to hell and we want to get back to the original state, we can just run:

git checkout main

Now let’s check the version number again.

[5]:
op.__version__
[5]:
'0.0.1'

Creating Stable Releases

As you improve your package, you may find that you want to create checkpoints that are stable and functional. While “version control” is technically handled by git every time you make a commit, sometimes it can be hard to traverse these commits to find the most representative version of your package that you want.

You can instead create releases in GitHub that are stable versions of your code. If you know your code worked with a given release, you might check out that old release and use that for your analysis.

Releases are easy, and you know how to do them since this is how you turn in your homework. I created a release of my initial package, with the tag v0.0.1. You should follow a consistent form of tagging and naming your releases to make them easy to follow.

Collaboration

Sharing your code is quite simple: just have someone clone your repo and install it with:

pip install -e .

Alternatively, if you don’t expect them to be making changes to the package, a more static build can be accomplished with:

python setup.py install

from the root directory.

But let’s assume we want people to collaborate on our package. There are many useful modes of collaboration. Since these are all primarily accomplished through GitHub, more details on each can be found in Recitation 2.

Raising Issues

Perhaps the most productive and simple-for-all-parties approach to collaborating on a package is to have people use it and raise issues on GitHub. In this case, you are the primary contributor to your package, but others can list bugs, enhancement ideas, or even code for enhancements without having to navigate your code themselves.

Issues have been amazing for me, as I always have a record of what needs to be done and can complete them when I think up a fix and have time to implement it. It’s a much more robust method than someone emailing me and mentioning in passing that there’s an issue. I will, 100% of the time, forget. Seriously.

Forking

Forking allows one to make a copy of the repository in their own GitHub, clone it to their machine, and then submit pull requests to the original repo when they have an enhancement. The pull request would be looked over by you (the owner of the repo) and then merged. The contributing user does not need to be an explicit collaborator on the repository.

Adding collaborators directly

If you trust your labmates, or anyone using the package, you can add them as collaborators on the project. This is done in the Settings > Collaborators tab on GitHub. Collaborators then have many of the same editing privileges as you, making it easier for them to make and push changes.

In addition to making and pushing changes directly to the repo (on the “main” branch), one might make new branches. Branching fills a similar function to forking, but without making a full copy of the repo for the contributing user. Branches are very flexible: multiple branches can be made for different projects and folded back into the project when finished.

Test-Driven Development

We can make some testing modules in a new directory called tests/. I’ve added the module test_octopus_functions.py with the following contents:

import numpy as np
import octopus.ink

def test_octopi():
    assert np.isclose(float(octopus.ink.octopi()), 3.1415926)
[50]:
!pytest -v ~/octopus
============================= test session starts ==============================
platform darwin -- Python 3.7.7, pytest-6.1.1, py-1.9.0, pluggy-0.13.1 -- /Users/rosita/anaconda3/bin/python
cachedir: .pytest_cache
rootdir: /Users/rosita/octopus
plugins: dash-1.12.0
collected 1 item

../../../octopus/tests/test_octopus_functions.py::test_octopi PASSED     [100%]

============================== 1 passed in 0.29s ===============================

Great! Obviously with more involved packages, you’ll want to separate test modules correspondingly, and be more principled in the naming.
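As a sketch of what a more principled layout might look like, a hypothetical test module could use pytest.approx for float comparisons and pytest.mark.parametrize to run one test over several inputs; the helper function and expected values below are made up for illustration:

```python
import numpy as np
import pytest

# Hypothetical stand-in for a package function under test.
def octopi():
    return np.pi

def test_octopi_value():
    # pytest.approx handles floating-point tolerance cleanly.
    assert octopi() == pytest.approx(3.1415926, abs=1e-6)

@pytest.mark.parametrize("n, expected", [(0, 0.0), (1, np.pi), (2, 2 * np.pi)])
def test_octopi_multiples(n, expected):
    assert n * octopi() == pytest.approx(expected)
```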

Publishing to PyPI

You might want to add your package to PyPI so anyone can pip install it, without having to clone the repo. If a package is new and not already installed on a user’s local machine, pip will look through PyPI and retrieve files from there. Thus the goal is to upload our package to PyPI! The way I learned to do this was actually through a blog post written by Jeff Hale. I’ve reproduced some of the relevant steps below.

  1. After installing Twine in an activated virtual environment, I installed my dependencies with:

pip install -r requirements.txt
  2. Then to create your package files, run the following:

python setup.py sdist bdist_wheel

This will create new folders like dist and build. Inside dist, there is a .whl file, which is your wheel, and a .tar.gz file, which is the source archive (this is kinda like the files you download from a tagged release on GitHub). Generally, pip will install packages as wheels whenever it can; the process is faster, but if pip struggles with installing the wheel, it will fall back on the source archive.

  3. After you have these valuable files, you should create a TestPyPI account here, as well as a PyPI account. These are two separate accounts!

  4. Use Twine to securely publish your package to TestPyPI with the following command (no modifications are necessary):

twine upload --repository-url https://test.pypi.org/legacy/ dist/*

Enter your username and password. If there are any errors (sometimes you’ll have typos in your text files and whatnot), bump the version number in setup.py and delete the old build artifacts: the build, dist, and egg-info folders. You can rebuild with step 2 and re-upload with step 4. Version numbers on TestPyPI are meaningless; you’re the only one who will see them. Just don’t forget to change it back to the original version once you do get everything to work.

  5. After uploading to TestPyPI, I would deactivate the current activated environment and start fresh in a new one. To see if you can successfully import your package, install it with:

pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple package_name
  • The --index-url flag tells pip to download from TestPyPI instead of PyPI; the --extra-index-url flag lets pip fall back to the real PyPI for dependencies that are not on TestPyPI

If everything works as it should…

  6. Push to PyPI! Here’s the code: (Make sure to change your version number back to the one you want.)

twine upload dist/*
  7. You can now push to GitHub. Exclude (delete) any virtual environments. The .gitignore file will keep build artifacts from being indexed. Hurrah!

Final thoughts

There’s only so much you can learn about packages from reading about them. I highly encourage you to find utilities you’re excited about, something you really truly see yourself using, and go through the process of turning it into a package. From then on, those functions can be utilized in your notebooks with a single import and will always be under git’s robust version control.

Of course, there is always more to discuss. Two things worth briefly mentioning are testing with Travis CI and the Cookiecutter package, which conveniently provides templates for setting up packages, though it includes many more features beyond what you might include for a small package for specific lab purposes.

Finally, as you might guess, octopus is not a real package that I will be using or maintaining. So, feel free to experiment on it (try forking, pull requests, etc.) and see what happens! The good news: you can’t permanently ruin it, because (as you well know) it’s under version control. How convenient.