72 Funnel of hell

Code

# Colab setup ------------------
import os, shutil, sys, subprocess, urllib.request
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade iqplot colorcet bebi103 arviz cmdstanpy watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    from cmdstanpy.install_cmdstan import latest_version
    cmdstan_version = latest_version()
    cmdstan_url = f"https://github.com/stan-dev/cmdstan/releases/download/v{cmdstan_version}/"
    fname = f"colab-cmdstan-{cmdstan_version}.tgz"
    urllib.request.urlretrieve(cmdstan_url + fname, fname)
    shutil.unpack_archive(fname)
    os.environ["CMDSTAN"] = f"./cmdstan-{cmdstan_version}"
    data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"
else:
    data_path = "../data/"
# ------------------------------

import numpy as np

import bebi103

import bokeh.io
bokeh.io.output_notebook()

Loading BokehJS ...

In his 2003 paper, Radford Neal pointed out that sampling out of relatively simply distributions using MCMC can be challenging. He proposed the following model, a kind of distribution that Thomas Wiecki has referred to as the “funnel of hell,” as a demonstration.

\[\begin{align} & v \sim \text{Norm}(0, 3),\\[1em] & \theta \sim \text{Norm}(0, \mathrm{e}^{v/2}), \end{align} \tag{72.1}\]

That is, \(v\) is Normally distribution with mean zero and variance 9, and \(\theta\) is Normally distributed with mean zero and variance \(\mathrm{e}^v\). The joint distribution is then

\[\begin{align} P(\theta, v) = P(\theta\mid v) \,P(v) = \frac{\mathrm{e}^{-v/2}}{6\pi}\,\exp\left[-\frac{1}{2}\left(\frac{v^2}{9} + \frac{\theta^2}{\mathrm{e}^v}\right)\right] \end{align} \tag{72.2}\]

We can compute the PDF analytically, so let’s make a plot of it so we know what we’re sampling out of.

theta = np.linspace(-4, 4, 400)
v = np.linspace(-15, 5, 400)

THETA, V = np.meshgrid(theta, v)
P = np.exp(-V / 2) / 6 / np.pi * np.exp(-(V ** 2 / 9 + THETA ** 2 / np.exp(V)) / 2)

# Show it by hacking contour to show image, but no contours
bokeh.io.show(
    bebi103.viz.contour(
        THETA,
        V,
        P,
        overlaid=True,
        line_kwargs=dict(alpha=0),
        x_axis_label="θ",
        y_axis_label="v",
    )
)

This probability density function is funnel shaped, named “the Funnel of Hell” by Thomas Wiecki.

Note that much of the probability density lies deep in the needle, which is a region of high curvature.

This simple distribution allows for independent sampling without MCMC. First, let’s generate some of these samples so we can see what effective sampling should look like.

# Sample out of distribution
np.random.seed(3252)
v = np.random.normal(0, 3, size=4000)
theta = np.random.normal(0, np.exp(v / 2))

p = bokeh.plotting.figure(
    height=400, width=450, x_range=[-100, 100], x_axis_label="θ", y_axis_label="v"
)
p.scatter(theta, v, alpha=0.3, color="#66c2a5", legend_label="indep. samples")
p.legend.location = "bottom_left"
bokeh.io.show(p)

Your task is to code up a Stan model for this distribution and use MCMC to get samples out of it. Then, overlay those samples on the plot of the independent samples. What do you observe? (There will be obvious problems with the sampling, and we will address these in coming lessons.)