Data display

In 1977, John Tukey, one of the most prominent statisticians and mathematicians in history, published a book entitled Exploratory Data Analysis. In it, he laid out general principles on how researchers should handle their first encounters with their data, before formal statistical inference. Most of us spend a lot of time doing exploratory data analysis, or EDA, without really knowing it. EDA mostly involves graphical exploration of a data set.

We start off with a few wise words from John Tukey himself, chosen from that brilliant book.

Clearly data display, or plotting, is central to exploratory data analysis.

The Python visualization landscape

Let us start by looking at some of the many plotting packages available in Python. In a talk at PyCon in 2017, Jake VanderPlas, one of the authors of Altair, gave an overview of the Python visualization landscape. That landscape is depicted below, taken from a visualization of it by Nicolas Rougier. (It is from 2017, so it is dated and definitely not complete; for example, it is missing Panel and domain-specific plotting packages like napari and Folium.)

Python visualization landscape.
Figure 1: A somewhat dated, but reasonably complete, picture of the landscape of data visualization packages in Python.

The landscape is divided into three main pods based on the low-level renderer of the graphics: JavaScript, Matplotlib, and OpenGL (though Matplotlib is higher-level than JavaScript and OpenGL). We will not discuss packages based on OpenGL. Packages that use JavaScript for rendering are particularly well suited for interactivity in browsers. Interactivity and portability (accomplished by rendering in browsers) are key features of modern plotting libraries, so we will use JavaScript-based plotting in the workshop (as I do in my own work).
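To make the point about browser rendering concrete, here is a minimal sketch using Bokeh. The data values, the plot title, and the choice of tools are made up for illustration, and the sketch assumes a reasonably recent Bokeh installation. It builds a small scatter plot and converts it to a standalone HTML document that any browser can display.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Made-up illustrative data (not from any real experiment)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 4.2, 8.8, 16.1]

# Build a figure with a few interactive tools (pan, zoom, reset)
p = figure(
    title="Illustrative scatter plot",
    x_axis_label="x",
    y_axis_label="y",
    tools="pan,wheel_zoom,reset",
)
p.scatter(x, y, size=8)

# Generate a standalone HTML document as a string. The page loads the
# BokehJS JavaScript library from a CDN, and the browser does the rendering.
html = file_html(p, CDN, "Illustrative scatter plot")
```

Writing the `html` string to a file and opening that file in a browser should give a plot you can pan and zoom. Because the output is an ordinary web page, it can be shared with anyone who has a browser, whether or not they have Python installed.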

Though we will be using Bokeh (and a little bit of HoloViews/Datashader), for a neuroscientist working in Python, it is important to take note of the following.

  • Matplotlib is by far the most widely used plotting package in Python. It was even developed by a neuroscientist! Seaborn is also widely used as a higher-level statistical plotting package that has Matplotlib as its backend (and was also developed by a neuroscientist!). We choose to use Bokeh because it is effective.
  • There are many neuroscience-specific packages that have plotting modules, like MNE, Nilearn, and SpikeInterface. Here, we are focusing on general tools. If you have mastery of lower-level plotting software, you can get much more out of domain-specific packages. You are also free to make the visualizations you want, not just those that happen to be available.