import os, glob
import numpy as np
import bebi103
import cmdstanpy
import arviz as az
import bokeh.plotting
import bokeh.io
bokeh.io.output_notebook()
schools_data = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],
}

schools_code = """
data {
  int<lower=0> J;            // number of schools
  vector[J] y;               // estimated treatment effects
  vector<lower=0>[J] sigma;  // s.e. of effect estimates
}

parameters {
  real mu;
  real<lower=0> tau;
  vector[J] eta;
}

transformed parameters {
  vector[J] theta = mu + tau * eta;
}

model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
"""

with open("schools_code.stan", "w") as f:
    f.write(schools_code)

with bebi103.stan.disable_logging():
    sm = cmdstanpy.CmdStanModel(stan_file="schools_code.stan")
    samples = sm.sample(data=schools_data, output_dir="./", show_progress=False)

samples = az.from_cmdstanpy(samples)

# Clean up
bebi103.stan.clean_cmdstan()
for fname in glob.glob("schools_code*"):
    os.remove(fname)

# Make a plot of samples
p = bokeh.plotting.figure(
    frame_height=250, frame_width=250, x_axis_label="μ", y_axis_label="τ"
)
p.scatter(
    np.ravel(samples.posterior["mu"]),
    np.ravel(samples.posterior["tau"]),
    alpha=0.1,
)
bokeh.io.show(p)
Appendix B — Configuring your computer to use Python for scientific computing
B.1 Why Python?
There are plenty of programming languages that are widely used in data science and in scientific computing more generally. Some of these, in addition to Python, are Matlab/Octave, Mathematica, R, Julia, Java, JavaScript, Rust, and C++.
I have chosen to use Python. While I believe language wars are counterproductive and welcome anyone to port the code we use to any language of their choice, I nonetheless feel I should explain this choice.
Python is a flexible programming language that is widely used in many applications. This is in contrast to more domain-specific languages like R and Julia. It is easily extendable, which is in many ways responsible for its breadth of use. We find that there is a decent Python-based tool for many applications we can dream up, certainly in data science. However, the Python-based tool is often not the very best for the particular task at hand, but it is almost always pretty good. Thus, knowing Python is like having a Swiss Army knife; you can wield it to effectively accomplish myriad tasks. Finally, we also find that it has a shallow learning curve with most students.
Perhaps most importantly, specifically for neuroscience applications, is that Python is widely used in machine learning and AI. The development of packages like TensorFlow, PyTorch, JAX, Keras, and scikit-learn have led to very widespread adoption of Python.
B.2 Jupyter notebooks
The materials of this course are constructed from Jupyter notebooks. To quote Jupyter’s documentation,
Jupyter Notebook and its flexible interface extends the notebook beyond code to visualization, multimedia, collaboration, and more. In addition to running your code, it stores code and output, together with markdown notes, in an editable document called a notebook.
This allows for executable documents that have code, but also richly formatted text and graphics, enabling the reader to interact with the material as they read it.
Specifically, notebooks consist of cells, where each cell contains either executable Python code or formatted text.
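Under the hood, a notebook (.ipynb) file is just JSON: a dictionary with a "cells" list, where each cell carries a cell_type of "markdown" or "code". As a rough sketch of that structure (a minimal skeleton, not a complete notebook as JupyterLab would write it):

```python
import json

# A minimal notebook skeleton: one markdown (text) cell and one code cell.
# (nbformat 4 is the current notebook JSON schema.)
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A text cell\n"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('a code cell')\n"],
        },
    ],
}

# Serializing this dict gives text that could be saved as a .ipynb file
nb_json = json.dumps(nb, indent=1)
print([cell["cell_type"] for cell in nb["cells"]])  # → ['markdown', 'code']
```

This is only to demystify the file format; in practice you never write this JSON by hand, since JupyterLab and VSCode manage it for you.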
While you read the materials, you can read the HTML-rendered versions of the notebooks. To execute (and even edit!) code in the notebooks, you will need to run them. There are many options available to run Jupyter notebooks. Here are a few we have found useful.
- JupyterLab: This is a browser-based interface to Jupyter notebooks and more (including a terminal application, text editor, file manager, etc.). As of March 2025, Chrome, Firefox, Safari, and Edge are supported.
- VSCode: This is an excellent source code editor that supports Jupyter notebooks. Be sure to read the documentation on how to use Jupyter notebooks in VSCode. This may be an especially good option for Windows users.
- Google Colab: Google offers this service to run notebooks in the cloud on their machines. There are a few caveats, though. First, not all packages and updates are available in Colab. Furthermore, not all interactivity that will work natively in Jupyter notebooks works with Colab. If a notebook sits idle for too long, you will be disconnected from Colab. Finally, there is a limit to resources that are available for free, and as of March 2025, that limit is unpublished and can vary. All of the notebooks in the HTML rendering of this book have an “Open in Colab” button at the upper right that allows you to launch the notebook in Colab. This is a quick-and-easy way to execute the book’s contents.
For our work in this programming bootcamp, I encourage you to use either JupyterLab in the browser or VSCode, with Colab as a backup if you’re having trouble.
B.3 Marimo
Marimo offers a very nice notebook interface that is a departure from Jupyter notebooks in its structure. The biggest departure is that Marimo notebooks are specifically for Python, as opposed to being language agnostic like Jupyter. As a result, Marimo notebooks can offer many features not seen in Jupyter notebooks (without add-ons). The two most compelling, at least to me, are
- Marimo notebooks are simple .py files, which allows for easier version control and simple execution as scripts.
- Marimo notebooks are reactive, meaning that the ordering of the cells is irrelevant and the notebook runs all cells that need to be rerun as a result of a change of the value of a variable in any given cell.
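The reactive idea can be illustrated with a toy dependency graph: when a variable changes, every cell that (directly or indirectly) reads it is rerun, regardless of where the cells sit in the notebook. The sketch below is only an illustration of the concept, not Marimo's actual API; the cell names and structure are made up.

```python
# Toy model of reactive execution: each "cell" declares which variables
# it defines and which it reads. Changing a variable marks every
# downstream cell for rerunning, independent of cell order.
cells = {
    "cell_a": {"defines": {"x"}, "reads": set()},
    "cell_b": {"defines": {"y"}, "reads": {"x"}},
    "cell_c": {"defines": {"z"}, "reads": {"y"}},
    "cell_d": {"defines": set(), "reads": {"x", "z"}},
}


def cells_to_rerun(changed_var):
    """Return the set of cells that must rerun after changed_var changes."""
    dirty_vars = {changed_var}
    rerun = set()
    # Propagate through the dependency graph until a fixed point is reached
    changed = True
    while changed:
        changed = False
        for name, cell in cells.items():
            if name not in rerun and cell["reads"] & dirty_vars:
                rerun.add(name)
                dirty_vars |= cell["defines"]
                changed = True
    return rerun


print(sorted(cells_to_rerun("x")))  # → ['cell_b', 'cell_c', 'cell_d']
```

Changing x reruns every cell downstream of it, while changing z reruns only cell_d; this is the bookkeeping that Marimo does for you automatically.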
In the course, we will use Jupyter notebooks, but you are welcome to play with Marimo notebooks. Upon completing the installation instructions in this notebook, Marimo will be installed.
B.4 Ensuring you have a C++ toolchain
We will be using Stan for some of our modeling. Stan provides a probabilistic programming language. Programs written in this language, called Stan programs, are translated into C++ by the Stan parser, and the resulting C++ code is then compiled. As you will see throughout the class, there are many advantages to this approach.
There are many interfaces for Stan, including the two most widely used, RStan and PyStan, which are R and Python interfaces, respectively. We will use a simpler interface, CmdStanPy, which has several advantages that will become apparent when you start using it.
Whichever interface you use needs to have Stan installed and functional, which means you have to have an installed C++ toolchain. Installation and compilation can be tricky and varies from operating system to operating system. The instructions below are not guaranteed to work; you may have to do some troubleshooting on your own. Note that you can use Google Colab (or other cloud computing resources) for computing as well, so you do not need to worry if you have trouble installing Stan locally.
You can read the CmdStanPy documentation about setting up the necessary tooling for your operating system. The long and short of it is that you do not need to do anything if you are using Windows. Likewise, if you are using Linux, a suitable C++ toolchain is typically preinstalled. For macOS, you need to install the Xcode command line tools by running the following on the command line.
xcode-select --install
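If you want to check from Python whether a C++ compiler is visible on your PATH, a quick sketch along these lines works (the candidate executable names cover common toolchains, but names vary by platform; the helper function here is purely illustrative):

```python
import shutil


def find_cxx_compiler(candidates=("clang++", "g++", "c++")):
    """Return (name, full path) of the first C++ compiler found on PATH, else None.

    clang++ is typical on macOS, g++ on Linux and MinGW-based Windows setups.
    shutil.which returns the executable's full path, or None if not found.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None


result = find_cxx_compiler()
if result is None:
    print("No C++ compiler found on PATH; see the instructions above.")
else:
    print(f"Found {result[0]} at {result[1]}")
```

This is only a sanity check; a compiler on the PATH does not guarantee that Stan will compile, but its absence means the toolchain setup above still needs to be done.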
B.5 Installing Python tools
Prior to embarking on your journey into data analysis, you need to have a functioning Python distribution installed on your computer. We will use Pixi, a relatively new package manager that I have found very effective. Importantly, Pixi manages packages in a project-based way. That is, for each project, you use Pixi to create and manage the packages needed for that project. Our “project” here is our course!
Step 1: Install Pixi. To install Pixi, you need access to the command line. For macOS users, hit Command-space, type in “terminal,” and open the Terminal app. In Windows, open PowerShell by opening the Start Menu, typing “PowerShell” in the search bar, and selecting “Windows PowerShell.” I assume you know how to get access to the command line if you are using Linux.
On the command line, do the following.
macOS or Linux
curl -fsSL https://pixi.sh/install.sh | sh
Windows
powershell -ExecutionPolicy ByPass -c "irm -useb https://pixi.sh/install.ps1 | iex"
Step 2: Create a directory for your work in the course. You might want to name the directory wis_stats/, which is what I have named it. You can do this either with the command line or with your graphical file management program (e.g., Finder for macOS).
Step 3: Navigate to the directory you created on the command line. For example, if the directory is wis_stats/ in your home directory and you are in your home directory, you can do
cd wis_stats
on the command line.
Step 4: Download the requisite Pixi files: pixi.toml and pixi.lock. These files need to be stored in the directory you created in step 2. You may download them by right-clicking those links, or by doing the following on the command line.
macOS or Linux
curl -fsSL -O https://raw.githubusercontent.com/wis-stats/wis-stats.github.io/refs/heads/main/pixi.toml
curl -fsSL -O https://raw.githubusercontent.com/wis-stats/wis-stats.github.io/refs/heads/main/pixi.lock
Windows
irm -useb https://raw.githubusercontent.com/wis-stats/wis-stats.github.io/refs/heads/main/pixi.toml -OutFile pixi.toml
irm -useb https://raw.githubusercontent.com/wis-stats/wis-stats.github.io/refs/heads/main/pixi.lock -OutFile pixi.lock
Step 5: Install CmdStan. Do the following on the command line (it may take a while to execute).
pixi run install_cmdstan
Step 6: Install the environment. Do the following on the command line.
pixi install
Step 7: To be able to use all of the packages, you need to invoke a Pixi shell. To do so, execute the following on the command line.
pixi shell
You are now good to go! After you are done working, to exit the Pixi shell, hit Control-D.
For doing work for this class, you will need to cd into the directory you created in step 2 and execute pixi shell every time you open a new terminal (or PowerShell) window.
B.6 Launching JupyterLab
Once you have invoked a Pixi shell, you can launch JupyterLab via your operating system’s terminal program (Terminal on macOS and PowerShell on Windows). To do so, enter the following on the command line (after having run pixi shell).
jupyter lab
You will have an instance of JupyterLab running in your default browser. If you want to specify the browser, you can, for example, type
jupyter lab --browser=firefox
on the command line.
Alternatively, if you are using VSCode, you can use its menu system to open .ipynb files. Make sure you select the Python kernel corresponding to your environment. You can read the documentation here. Hint: You may need to restart VSCode after doing the above installations so it is aware of your Pixi environment.
B.7 Checking your distribution
Let’s now run a quick test to make sure things are working properly. We will make a quick plot that requires some of the scientific libraries we will use.
Launch a Jupyter notebook in JupyterLab. In the first cell (the box next to the [ ]: prompt), paste in the test code given at the beginning of this appendix. To run the code, press Shift+Enter while the cursor is active inside the cell. You should see a scatter plot of the posterior samples. If you do, you have a functioning Python environment for scientific computing!
Computing environment
%load_ext watermark
%watermark -v -p numpy,cmdstanpy,arviz,bebi103,bokeh,jupyterlab
print("CmdStan : {0:d}.{1:d}".format(*cmdstanpy.cmdstan_version()))
Python implementation: CPython
Python version : 3.13.5
IPython version : 9.4.0
numpy : 2.2.6
cmdstanpy : 1.2.5
arviz : 0.22.0
bebi103 : 0.1.28
bokeh : 3.7.3
jupyterlab: 4.4.5
CmdStan : 2.36