Tips & Tricks for using Jupyter Notebooks
In the past semester at Olin, I've been using a LOT of Jupyter Notebooks to work with Python code interactively. As a tool, Jupyter Notebooks are great! That is, unless you need to manage or organize them with git... or collaborate on one with a classmate... or want to access the code you wrote outside of the browser. What follows is a collection of tips I've assembled to make living around, and in, Jupyter Notebooks less painful. This is likely heavily skewed towards the needs of Olin students, but I will try to make it generally applicable too.
# Finding Differences
When working with a partner or by myself, git always seems to ruin the Jupyter Notebook party. Because Jupyter saves notebooks as JSON files rather than plain source code, looking at the differences between files and versions is incredibly annoying. For regular codebases that aren't wrapped in JSON, lots of tools exist for solving this problem (Meld, DiffMerge, etc.). Luckily, someone has made a tool that brings the typical diff/merge interface to notebooks!
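To see why a plain text diff struggles here, the sketch below builds two versions of a tiny notebook by hand. The cell contents and the `make_notebook` and `changed_lines` helpers are made up for illustration, and the dict only approximates what a real `.ipynb` file contains. The code in both versions is identical; only the execution count and stored output differ, as happens when a cell is simply re-run. A line-based diff of the JSON still reports changes.

```python
import difflib
import json


def make_notebook(execution_count, output_text):
    """Build a minimal notebook-like dict with one code cell (illustrative only)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [output_text]}
                ],
                "source": ["import random\n", "print(random.random())"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


def changed_lines(old, new):
    """Return the added/removed lines of a unified diff between two JSON dumps."""
    old_text = json.dumps(old, indent=1, sort_keys=True).splitlines()
    new_text = json.dumps(new, indent=1, sort_keys=True).splitlines()
    diff = difflib.unified_diff(old_text, new_text, lineterm="")
    return [
        line
        for line in diff
        # Keep real additions/removals, skip the "---"/"+++" file headers.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


first_run = make_notebook(1, "0.42\n")
second_run = make_notebook(2, "0.87\n")

json_changes = changed_lines(first_run, second_run)
same_code = first_run["cells"][0]["source"] == second_run["cells"][0]["source"]

print(f"code identical: {same_code}")
print(f"changed JSON lines: {len(json_changes)}")
```

Even though nobody touched the code, the saved file changed, which is exactly the kind of noise that makes notebook diffs in git hard to read.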
In comes nbdime! This tool combines a few utilities into one pip package. We can install it by running the following in the terminal:
pip install --upgrade nbdime
The first tool I want to highlight that comes with the nbdime package is nbshow. This command gives us a quick idea of what is in a notebook without the hassle of starting the Jupyter server and web client. It is used like cat:
nbshow <a notebook file>
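For a rough idea of what such a summary involves, here is a minimal sketch that prints one line per cell from a notebook's JSON. This is not how nbshow is implemented; `summarize_notebook` and the `example` dict are hypothetical stand-ins, and a real notebook would be loaded from disk with `json.load`.

```python
import json


def summarize_notebook(notebook):
    """Return one text line per cell: its index, type, and first source line."""
    lines = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # The notebook format allows source to be a string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        first_line = source.splitlines()[0] if source.strip() else "(empty)"
        lines.append(f"[{index}] {cell.get('cell_type', 'unknown')}: {first_line}")
    return lines


# A hand-written stand-in for a notebook loaded with json.load(...).
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis\n", "Some notes."]},
        {"cell_type": "code", "source": "x = 1\nprint(x)", "outputs": []},
        {"cell_type": "code", "source": [], "outputs": []},
    ]
}

summary = summarize_notebook(example)
for line in summary:
    print(line)
```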
Once we have an idea of what notebooks we’re working with, we can use nbdiff to compare two of them quickly without wading through JSON. We pass it the two files to compare:
nbdiff <notebook1> <notebook2>
It then shows a diff in the terminal where we ran the command. The differences appear as we would expect: sorted by cell order, with deletions and additions highlighted.
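The idea of diffing by cell rather than by JSON line can be sketched with Python's standard `difflib`. This is only an illustration of the concept, not nbdime's algorithm; the `cell_sources` and `diff_cells` helpers and the two sample notebooks are assumptions made up for this example.

```python
import difflib


def cell_sources(notebook):
    """Collect each cell's source as a single string, ignoring outputs and metadata."""
    return ["".join(cell["source"]) for cell in notebook["cells"]]


def diff_cells(old, new):
    """Describe cell-level changes between two notebook-like dicts."""
    old_cells, new_cells = cell_sources(old), cell_sources(new)
    matcher = difflib.SequenceMatcher(a=old_cells, b=new_cells, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged cells are not reported
        changes.append((tag, old_cells[i1:i2], new_cells[j1:j2]))
    return changes


old_nb = {
    "cells": [
        {"cell_type": "code", "source": ["import math\n"]},
        {"cell_type": "code", "source": ["r = 2\n"]},
        {"cell_type": "code", "source": ["print(math.pi * r ** 2)\n"]},
    ]
}
new_nb = {
    "cells": [
        {"cell_type": "code", "source": ["import math\n"]},
        {"cell_type": "code", "source": ["r = 3\n"]},
        {"cell_type": "code", "source": ["print(math.pi * r ** 2)\n"]},
        {"cell_type": "code", "source": ["print('done')\n"]},
    ]
}

cell_changes = diff_cells(old_nb, new_nb)
for tag, before, after in cell_changes:
    print(tag, before, after)
```

Because only the cell sources are compared, changes to execution counts or stored outputs never show up as differences in this sketch.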
Now we know what we’ve changed between two notebooks, but a terminal diff can't show us rendered output. To start a web server that renders a similar diff in the browser, we can run:
nbdiff-web <notebook1> <notebook2>
This way images and figures are rendered, and the formatting is preserved!
Now we are ready for action. Let’s solve a conflict with:
nbmerge file_1.ipynb file_2.ipynb file_3.ipynb > merged.ipynb
Where:
- file_1.ipynb is the common parent (base) of the two files we want to merge
- file_2.ipynb contains our local changes to the base
- file_3.ipynb contains the other (remote) changes we want to merge in
Note that this method requires a common base notebook to reference. The command-line tool will try to resolve conflicts automatically and write any remaining conflicts to metadata in the final merged file. However, we can specify how the tool handles conflicts with the --merge-strategy parameter, which has the following options:
- inline (default option): merges notebooks and just marks conflicts in the merged notebook, requiring us to fix the final notebook by hand after running
- use-base: merges notebooks by using the base notebook value when there is a conflict.
- use-local: merges notebooks by using the local notebook value when there is a conflict.
- use-remote: merges notebooks by using the remote notebook value when there is a conflict (Can you see a pattern here?).
- union: attempts to include local AND remote values without marking as a conflict. This one can act strangely and I don’t recommend using it.
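To make the difference between these strategies concrete, here is a simplified sketch of a three-way merge on a single value. It is not nbdime's implementation: `merge_value` is a hypothetical helper, the conflict markers are invented for illustration, and the union strategy is left out.

```python
def merge_value(base, local, remote, strategy="inline"):
    """Three-way merge of one value; returns (merged_value, conflicted)."""
    if local == remote:
        return local, False   # both sides agree (or neither changed)
    if local == base:
        return remote, False  # only the remote side changed
    if remote == base:
        return local, False   # only the local side changed
    # Both sides changed the value in different ways: a real conflict.
    if strategy == "use-base":
        return base, True
    if strategy == "use-local":
        return local, True
    if strategy == "use-remote":
        return remote, True
    if strategy == "inline":
        # Keep both versions and mark them so a human can fix it afterwards.
        marked = f"<<<<<<< local\n{local}\n=======\n{remote}\n>>>>>>> remote"
        return marked, True
    raise ValueError(f"unknown strategy: {strategy}")


base, local, remote = "r = 1", "r = 2", "r = 3"
for strategy in ("use-base", "use-local", "use-remote", "inline"):
    merged, conflicted = merge_value(base, local, remote, strategy)
    print(f"{strategy}: conflicted={conflicted}")
    print(merged)
```

The first three branches are why most merges need no strategy at all: a strategy only matters when both sides changed the same thing differently.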
If we want to make merge conflict decisions by hand, I suggest trying:
nbmerge-web file_1.ipynb file_2.ipynb file_3.ipynb --out merged.ipynb
This works just like the previous command, but starts a web server that gives us an easy way to choose which notebook to prefer in each conflict.
# ReviewNB GitHub Extension (Tentative to Recommend)
If we really want to live in GitHub, this GitHub extension might help us with that. It is also a diff tool, but it works inside the GitHub web app, rendering notebook views in GitHub PRs and diffs. It is installed on a per-repo basis.
The only issue is that this tool isn’t open source, and is currently using a temporary beta pricing structure. Right now, it is free to use. Eventually, upon completion, it will be free for open source repositories and have a fee for use with private repos.
# Terminal Access
Sometimes we might want to access the variables we defined in a Jupyter notebook from an IPython terminal. Luckily, that is pretty easy to do by first opening a notebook and then running:
jupyter console --existing
This gives us an interactive IPython terminal connected to the most recently started kernel, ready for use.
# Collaboration Without Git
If we aren't so keen on using git to collaborate, Colaboratory by Google might work well. It provides a Google-Docs-like experience for editing Jupyter notebooks that is entirely hosted by Google. We can upload libraries, run terminal commands to install packages, upload images, and more. https://colab.research.google.com/notebooks/welcome.ipynb provides a getting started guide that goes into detail.
My friend Sam Daitzman wrote a handy script to help set up Colaboratory notebooks with commonly used libraries in the Olin modeling class "Modsim": https://github.com/sdaitzman/ModSimPyColab.
# Make it Prettier!
If the aesthetic of the Jupyter Notebooks client isn’t pretty enough, we can theme it with https://github.com/dunovank/jupyter-themes! Jupyter-themes is a tool that allows us to pick from a number of different themes to style our Jupyter client. Here are some basic steps to get started:
- Install with
pip install jupyterthemes
- List the available themes with
jt -l
(More examples on the GitHub page.)
- Choose a theme with
jt -t <the name of the theme>
- Reset to the default theme with
jt -r
There are a lot more options available, so I suggest going to the GitHub page to see more detailed documentation.
# Use a Better GUI!
Typically though, for actual work, I prefer JupyterLab. It provides more options, tabs, and an IDE-like interface for using notebooks. I believe that it is already installed if you have Jupyter installed, but if not, you can install it with:
conda install -c conda-forge jupyterlab
You can then run it with:
jupyter lab
It opens any regular Jupyter notebook in addition to regular code files.
# A Jupyter Alternative
At one point in my interactive code journey, I grew weary of Jupyter Notebooks themselves and took a look for alternatives. Here’s the one I found which works pretty well:
Hydrogen is an installable package for the text editor Atom. It isn’t quite the same as Jupyter: instead of writing input and output into notebooks, it uses regular code files. We can run code interactively, and the outputs pop up inline and are essentially temporary. It is important to note that it still uses Jupyter as the server, with Hydrogen acting as the client.
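As a small illustration, a regular Python file can be split into runnable chunks with `# %%` comment markers, which Hydrogen treats as cell boundaries. The file below is a made-up example with invented values; it also runs top to bottom as an ordinary script.

```python
# %% Load some data (made-up values for illustration)
import statistics

measurements = [2.1, 2.4, 1.9, 2.2]

# %% Summarize it; running only this cell reuses `measurements` from the kernel
mean_value = statistics.mean(measurements)
spread = statistics.stdev(measurements)

# %% Show the result
print(f"mean={mean_value:.2f}, stdev={spread:.2f}")
```

Since the markers are just comments, the file stays plain text and diffs cleanly in git, which sidesteps most of the problems from the first section.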
So go forth and learn more about Python and Data Science!!