Improving and collaborating

This recitation was written by Patrick Almhjell.


So far we’ve only looked at getting a basic package off the ground. Now we’re going to look at some useful approaches and features that will help you improve your package and distribute it to others.

Note: This notebook is designed to be run as an exercise.

Clone tinypkg with:

git clone https://github.com/palmhjell/tinypkg.git

Then navigate to that directory and run:

pip install -e .

and then make the changes where specified by:

To do:

Editing is (super) easy

Transitioning from working in Jupyter notebooks with self-contained code to working with modules and packages can be scary. But testing edits to your package is actually really easy to do in Jupyter. If set up properly (as we are going to do below), you can make changes directly in your package and test the results in real time in a notebook.

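First, load the autoreload extension and turn it on with the %autoreload 2 magic, which reloads modules whose source files have changed before each cell runs.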

[1]:
%load_ext autoreload
%autoreload 2

import tinypkg
[2]:
# Check out the version
tinypkg.__version__
[2]:
'0.0.2'

Right. Version 0.0.2, because that’s what’s currently up on the repo.

Now, let’s make a change to __init__.py locally. I’m going to open it in Jupyter’s text editor and increase the version to 0.0.3.

To do: Change __version__ to ‘0.0.3’ in __init__.py in tinypkg/tinypkg/.

[3]:
# Now test if it worked
tinypkg.__version__
[3]:
'0.0.3'

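Look at that! No need to reinstall, reimport, or anything. The autoreload extension and the --editable pip install together allow dynamic editing: the editable install makes Python import tinypkg from your cloned directory, and autoreload picks up changes to those files. This is very helpful, and it is how I recommend making changes.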
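Under the hood, this is the same idea as calling importlib.reload yourself: re-run a module's source so the objects in memory reflect the edited file. The sketch below uses a throwaway module in a temporary directory (the name demo_pkg is hypothetical) to show that editing a file changes nothing until the module is reloaded. It is a simplified illustration, not how the autoreload extension is implemented; autoreload also tries to patch functions and classes you already hold references to.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Don't write .pyc files: a quick edit that keeps the file the same size
# could otherwise be masked by stale bytecode (pyc freshness is checked
# against the source file's mtime and size).
sys.dont_write_bytecode = True

# A throwaway module standing in for your package ("demo_pkg" is hypothetical).
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "demo_pkg.py"
module_file.write_text("__version__ = '0.0.2'\n")
sys.path.insert(0, str(tmp_dir))

demo_pkg = importlib.import_module("demo_pkg")
before = demo_pkg.__version__

# Edit the file on disk, as you would in Jupyter's text editor.
module_file.write_text("__version__ = '0.0.3'\n")

# The module object already in memory still holds the old value...
stale = demo_pkg.__version__

# ...until its source is re-executed; reload updates the same module object.
importlib.reload(demo_pkg)
after = demo_pkg.__version__

print(before, stale, after)
```

%autoreload 2 performs this reload step for you before each cell, and the editable install is what ensures the module being reloaded is the file you are actually editing.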

Once things seem good, you can commit and push.

Creating stable releases

As you improve your package, you may find that you want to create checkpoints that are stable and functional. While ‘version control’ is technically handled by git every time you make a commit, it can be hard to dig back through those commits to find the version of your package that best represents a working state.

You can instead create releases in GitHub that are stable versions of your code. If you know your code worked with a given release, you might check out that old release and use it for your analysis.

Releases are easy, and you know how to do them since this is how you turn in your homework. I created a release of my initial package, with the tag v0.0.1. You should follow a consistent form of tagging and naming your releases to make them easy to follow. Look into semantic versioning.

[Image: Initial release]
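Semantic versioning numbers releases as MAJOR.MINOR.PATCH: bump PATCH for backward-compatible bug fixes, MINOR for backward-compatible new features, and MAJOR for incompatible changes, resetting the lower fields to zero. Below is a minimal sketch of parsing and bumping tags in the v0.0.1 style used here; parse_tag and bump are hypothetical helpers, and pre-release suffixes such as -rc.1 are deliberately not handled. It also shows why numeric ordering matters: git tag -l lists tags in plain lexical order by default, so v0.0.10 would appear before v0.0.2 (git tag -l --sort=version:refname sorts them as versions instead).

```python
import re

# Matches tags such as "v0.0.1" or "1.2.3". Pre-release and build suffixes
# (e.g. "1.0.0-rc.1") are not handled in this sketch.
SEMVER_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Return (major, minor, patch) as integers for a tag like 'v0.0.1'."""
    match = SEMVER_TAG.match(tag)
    if match is None:
        raise ValueError(f"Not a MAJOR.MINOR.PATCH tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def bump(tag, part):
    """Return the next tag after `tag`, bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = parse_tag(tag)
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown part: {part!r}")


tags = ["v0.0.2", "v0.0.10", "v0.0.1", "v0.1.0"]
print(sorted(tags))                 # lexical order: v0.0.10 lands before v0.0.2
print(sorted(tags, key=parse_tag))  # numeric (version) order
print(bump("v0.0.2", "patch"), bump("v0.0.2", "minor"), bump("v0.0.2", "major"))
```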

Let’s check it out with:

git checkout tags/v0.0.1

(Note: from the command line, you can list all tags with git tag -l.)

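To do: run git checkout tags/v0.0.1 in tinypkg/. (If your 0.0.3 edit to __init__.py is still uncommitted, git will refuse to switch because checking out the tag would overwrite it; discard the edit first with git checkout -- tinypkg/__init__.py, or set it aside with git stash.)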

Now check the version.

[4]:
tinypkg.__version__
[4]:
'0.0.1'

Ta-da! We are now running that version, because git checkout set us back there.

We can get back to the original state by running:

git checkout master

To do: run git checkout master in tinypkg/.

[5]:
tinypkg.__version__
[5]:
'0.0.2'

Collaboration

Sharing your code is quite simple: just have someone clone your repo and install it with:

pip install -e .

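Alternatively, if you don’t expect them to be making changes to the package, a static (non-editable) install can be done with:

python setup.py install

from the root directory. (Newer versions of setuptools deprecate this command; running pip install . from the same directory performs the equivalent static install.)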
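If you are unsure which kind of install a given Python environment is using, one rough check is where the package is imported from: a static install copies the package into a site-packages (or, on Debian-based system Pythons, dist-packages) directory, while an editable install imports it straight from the cloned repository. The helper below is a hypothetical sketch of that heuristic; it assumes a regular package whose module has a __file__ attribute.

```python
import importlib
from pathlib import Path

# Directory names that installers copy packages into
# ("dist-packages" is the Debian/Ubuntu system-Python variant).
INSTALL_DIRS = {"site-packages", "dist-packages"}


def install_location(module_name):
    """Return (resolved path of the module's file, whether it sits in an install dir).

    A static install copies the package into site-packages; an editable
    install leaves it where you cloned it, so the path points into the repo.
    """
    module = importlib.import_module(module_name)
    module_path = Path(module.__file__).resolve()
    in_install_dir = any(part in INSTALL_DIRS for part in module_path.parts)
    return module_path, in_install_dir


# Usage, e.g. in a notebook after installing tinypkg:
#   path, copied = install_location("tinypkg")
#   print(path, "(static copy)" if copied else "(not in site-packages, e.g. an editable clone)")
```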

But let’s assume we want people to collaborate on our package. There are many useful types of collaboration. Since these are all primarily accomplished through GitHub, more details on each of these can be found in Recitation 2.

Raising issues

Perhaps the most productive and simple-for-all-parties approach to collaborating on a package is to have people use it and raise issues on GitHub. In this case, you remain the primary contributor to your package, but others can report bugs, suggest enhancements, or even propose code for them without having to navigate your code themselves.

Issues have been amazing for me, as I always have a record of what needs to be done and can complete them when I think up a fix and have time to implement it. It’s a much more robust method than someone emailing me and mentioning in passing that there’s an issue. I will, 100% of the time, forget. Seriously.

Here’s what it looks like for my lab package currently:

[Image: GitHub Issues for my lab package]

(As an aside, I don’t really recommend private repos, and I would like to make mine public soon. However, most of my code is developed on data from the lab that might not be published or okay to disclose yet, so I exercise caution here. Do what you think is best.)

Forking

Forking allows someone to copy the repository into their own GitHub account, clone it to their machine, and then submit pull requests to the original repo when they have an enhancement. You (the owner of the repo) would then review the pull request and merge it. The contributing user does not need to be an explicit collaborator on the repository.

Adding collaborators directly

If you trust your labmates, or anyone using the package, you can add them as collaborators on the project. This is done in the Settings > Collaborators tab on GitHub. Collaborators then have many of the same editing privileges as you, making it easier for them to make and push changes.

In addition to making and pushing changes directly to the repo (on the “master” branch), one might make new branches. Branching fills a similar function to forking, but without making a full copy of the repo for the contributing user. Branches are very flexible: multiple branches can be made for different projects and folded back into the project when finished.

That’s it!

You now have the skills to build, improve, and distribute a package. Useful functions can then be brought into your notebooks with a single import and are always under git’s robust version control.

Of course, there is always more to discuss. Two things worth briefly mentioning are adding required packages to setup.py and testing with Travis CI. It is also worth checking out the Cookiecutter package, which conveniently provides templates for setting up packages, though its templates include many more features than you might need for a small package for specific lab purposes. Additionally, you might want to add your package to PyPI so anyone can pip install it.
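Required packages go in the install_requires argument of setup(), and pip installs them along with your package. A common companion pattern, sketched below, is to read __version__ out of __init__.py as text instead of importing the package, so the version is defined in one place and setup.py doesn’t need the dependencies installed yet. The dependency names and the helpers parse_version and read_version are hypothetical placeholders, and this is not necessarily how tinypkg’s own setup.py is written.

```python
import re
from pathlib import Path

# Hypothetical runtime dependencies; list whatever your package actually imports.
REQUIRED_PACKAGES = ["numpy", "pandas"]

# Matches a line like: __version__ = '0.0.3'  (single or double quotes)
VERSION_LINE = re.compile(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", re.MULTILINE)


def parse_version(init_source):
    """Extract the __version__ string from the text of an __init__.py."""
    match = VERSION_LINE.search(init_source)
    if match is None:
        raise RuntimeError("No __version__ assignment found")
    return match.group(1)


def read_version(init_path):
    """Read __version__ from a file without importing the package."""
    return parse_version(Path(init_path).read_text())


# The setup() call in setup.py would then look something like:
#   from setuptools import setup, find_packages
#   setup(
#       name="tinypkg",
#       version=read_version("tinypkg/__init__.py"),
#       packages=find_packages(),
#       install_requires=REQUIRED_PACKAGES,
#   )
```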

Finally, as you might guess, tinypkg is not a real package that I will be using or maintaining. So, feel free to experiment on it (try forking, pull requests, etc.) and see what happens! The good news: you can’t permanently ruin it, because (as you well know) it’s under version control. How convenient.

Computing environment

[3]:
%load_ext watermark

%watermark -v -p jupyterlab,tinypkg
CPython 3.7.4
IPython 7.8.0

jupyterlab 1.1.4
tinypkg 0.0.2
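watermark is a third-party IPython extension. If you only need the basics, a rough stdlib-only stand-in is sketched below (it needs Python 3.8+ for importlib.metadata, and environment_report is a hypothetical helper name). One caveat: importlib.metadata reports the version recorded when a package was installed, so with an editable install it can lag behind a __version__ you have since edited in __init__.py until you reinstall.

```python
import platform
from importlib import metadata


def environment_report(packages):
    """Return {name: version} for the interpreter and the given distributions."""
    report = {platform.python_implementation(): platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for name, version in environment_report(["jupyterlab", "tinypkg"]).items():
    print(name, version)
```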