54  GLMs applied to neurons and aggression

Open in Google Colab | Download notebook


import numpy as np
import scipy.io

import sklearn.linear_model

import cmdstanpy
import arviz as az

import bebi103

import bokeh.io
import bokeh.plotting
bokeh.io.output_notebook()
Loading BokehJS ...

In this lesson, I will demonstrate the use of a GLM in practice. Specifically, we will perform a logistic regression using the data from Remedios, et al. that we have already encountered in ?sec-matlab-files.

Now that we are modeling these data in earnest, it is important to note some facts from the paper about the data set.

We have already loaded and visualized the data set in ?sec-matlab-files. I will expose the code to load in the data set, but hide the visualization code.

data = scipy.io.loadmat('../data/hypothalamus_calcium_imaging_remedios_et_al.mat')
neural_data = data['neural_data'].transpose()  # Row is time index, column is neuron
attack_vector = data['attack_vector'].flatten()
sex_vector = data['sex_vector'].flatten()

# Time points at 30 Hz sampling
t = np.arange(len(attack_vector)) / 30
Code
p_attack = bokeh.plotting.figure(
    frame_height=50,
    frame_width=600,
    x_axis_label=None,
    y_axis_label='attack',
    y_range=['no', 'yes'],
    toolbar_location=None,
)
attack_color = ['tomato' if a else 'gray' for a in attack_vector]
p_attack.scatter(t, ['yes' if a else 'no' for a in attack_vector], size=2, color=attack_color)
p_attack.xaxis.major_label_text_font_size = '0pt'

p_sex = bokeh.plotting.figure(
    frame_height=50,
    frame_width=600,
    x_axis_label=None,
    y_axis_label='sex',
    y_range=['m', 'f'],
    toolbar_location=None,
)
sex_color = ['orchid' if s else 'dodgerblue' for s in sex_vector]
p_sex.scatter(t, ['f' if s else 'm' for s in sex_vector], size=2, color=sex_color)
p_sex.xaxis.major_label_text_font_size = '0pt'

p_neuron = bebi103.image.imshow(
    neural_data.transpose(),
    frame_height=150,
    frame_width=600,
    x_axis_label="time (s)",
    y_axis_label="neuron",
    interpixel_distance=1/30,
)

p_neuron.yaxis.major_label_text_font_size = '0pt'

p_attack.x_range = p_neuron.x_range
p_sex.x_range = p_neuron.x_range

bokeh.io.show(bokeh.layouts.column(p_sex, p_attack, p_neuron))

Clearly, the neuronal activity and attack behavior changes when presented with a male. We could model the neuronal activity based on sex presentation, but that is not our task here. Here, we will model how neuronal activity affects attack behavior.

54.1 Centering and scaling

As we approach our analysis, it is wise to first check that the data are as we think. If the neural data are fluorescence difference relative to the mean, then the mean signal (averaged over all time points) of each neuron should be zero. Let’s check.

neural_data.mean(axis=0)
array([-0.50595541,  2.80241833,  0.65463783, -0.05233957,  1.91911147,
        1.05586614,  0.36410999,  0.87441479,  0.43653782,  2.80454759,
        1.8669907 , -0.91001794, -2.39086245,  2.36254721,  0.60409699,
        0.54337706,  3.85461084, -1.23488834, -0.23093228,  1.96520683,
        0.20655417, -1.55486544,  1.59377787,  0.76639594,  1.07599103,
        1.67454872, -2.15025706, -1.59622908, -1.09768275,  2.07154601,
        0.23631976,  1.47605528,  0.56212051, -0.97385873,  1.10629566,
        0.57570062, -0.79088113, -0.97003822,  0.73574308,  1.58055879,
        2.15372034,  0.80003251, -0.4833639 , -0.40458862,  0.59728615,
        0.28368438, -0.9700867 , -0.38844875,  0.62099988, -0.23944945,
       -0.44615113, -0.7285588 ,  1.06474938, -0.31719077, -0.30342351,
       -1.3040946 ,  0.64354608,  0.58876965,  0.19892946, -0.37661813,
       -0.82218469,  0.98216878,  0.60068407,  0.62474366, -0.80829488,
        0.47448697,  0.35872685,  0.360741  , -0.45829286,  0.67356388,
        0.24463406,  0.01289178, -1.00545236, -0.28845844, -0.68860872,
        1.19482826,  0.82365939, -0.80095742,  0.79133022, -1.36316494,
       -0.01224409,  0.88449665, -0.51000284,  0.19967641,  0.4952818 ,
        0.31396237, -1.31430482, -0.48578166, -1.70865534,  0.1779136 ,
        0.01354794, -0.1001097 ,  0.63450004, -0.64862253, -0.62575994,
       -0.35120753,  1.24988798, -0.00664137, -0.10881931, -0.39084366,
        0.572822  ,  0.06283101, -1.74370732,  0.52895888,  1.2437877 ,
        0.31988937, -0.87772531, -1.12787452,  0.14820647,  0.77293784,
       -0.36212311,  0.02396753, -0.20294646,  0.37525496,  0.61183772])

Clearly, this is not the case. The methods section of the paper discusses some post processing after segmentation, but it is not clear how that was done. We will therefore center and scale the data, centering by the mean of all measurements and scaling by the standard deviation of all measurements.

# Centered-and-scaled neuronal data
X = (neural_data - neural_data.mean()) / neural_data.std()

54.2 Logistic regression model

Now, we will proceed with modeling.

Our goal here is to model how neural activity leads to attack behavior. We will model the probability of a mouse being in attack mode based only on instantaneous neuronal activity. That is, we will not explicitly model any temporal behavior.

Let \(\theta\) be the probability of a mouse being in attack mode. Let \(\mathbf{x}\) be a vector where each entry is a quantitation of the activity of a neuron; each entry corresponds to a separate neuron. So, \(\theta = \theta(\mathbf{x})\). We denote by \(f(\mathbf{x})\) the log odds ratio of being in attack mode,

\[\begin{align} f(\mathbf{x}) = \ln\left(\frac{\theta(\mathbf{x})}{1 - \theta(\mathbf{x})}\right) \equiv \mathrm{logit}(\theta(\mathbf{x})), \end{align} \]

where the odds ratio is \(\theta / (1 - \theta)\), and varies from zero to infinity. The log odds ratio is referred to as a logit function and its value can take any real value. We can invert this function to get

\[\begin{align} \theta(\mathbf{x}) = \mathrm{logit}^{-1}(f(\mathbf{x})) = \frac{1}{1 + \mathrm{e}^{-f(\mathbf{x})}}. \end{align} \]

We are left to model the log odds ratio as a function of neuronal activity. As is often a useful strategy when we do not have a specific theoretical in mind, we assume \(f(\mathbf{x})\) is continuous with continuous derivatives and may be approximated as a Taylor series about \(\mathbf{x} = 0\),

\[\begin{align} f(\mathbf{x}) = \alpha + \boldsymbol{\beta}\cdot \mathbf{x} + \frac{1}{2}\,\mathbf{x}^\mathsf{T}\cdot\mathsf{B}\cdot \mathbf{x} + \ldots , \end{align} \]

where \(\alpha\) is scalar valued, \(\beta\) is vector valued, \(\mathsf{B}\) is matrix valued, etc. We will truncate the Taylor expansion to first order such that

\[\begin{align} f(\mathbf{x}) \approx \alpha + \boldsymbol{\beta}\cdot \mathbf{x}. \end{align} \]

Note that by neglecting the higher order terms, we have neglected all interdependence of the neurons. This is in general the case of GLMs. Since the output is a linear function of the inputs \(\mathbf{x}\), any interdependence of the entries in \(\mathbf{x}\) is not modeled. Thus, our theoretical model for the probability of a mouse being in attack mode is

\[\begin{align} \theta(\mathbf{x}) = \frac{1}{1 + \mathrm{e}^{\alpha + \boldsymbol{\beta}\cdot \mathbf{x}}}. \end{align} \]

Say at time \(t_i\), a mouse has neuronal activity \(\mathbf{x}_i\) and probability \(\theta_i\) of being in attack mode. Let \(a_i\) be one if the mouse is observed to be in attack mode at time point \(t_i\) and zero if the mouse is not in attack mode. Then, we have

\[\begin{align} a_i \sim \text{Bernoulli}(\theta_i)\;\forall i. \end{align} \]

We will take the centered-and-scaled relative fluorescence of a neuron in calcium imaging as a quantitation of its activity. We neglect all time dependence, and assume that each \(\theta\) is i.i.d. and each \(a_i\) is also i.i.d. This completes the specification of the likelihood,

\[\begin{align} &\theta_i = \mathrm{logit}^{-1}\left(\alpha + \boldsymbol{\beta}\cdot \mathbf{x}_i\right)\;\forall i,\\[1em] &a_i \sim \text{Bernoulli}(\theta_i)\;\forall i. \end{align} \]

Next, we need to specify the priors for \(\alpha\) and \(\boldsymbol{\beta}\). We have centered and scaled the data, so we do not expect big variation in the orders of magnitude of the parameters. However, I am not really sure what their scale should be a priori. I will therefore choose a zero-mean Normal distribution with an inferred scale parameter, which I shall call \(\tau\). I will then choose a weakly informative HalfNormal prior for \(\tau\). Note that this choice of priors is sometimes referred to as a regularization, specifically ridge regularization. In this context, \(\tau\) is usually expressed as \(\lambda \equiv 1/\tau^2\), with \(\lambda\) being referred to as the sparcity parameter. Our complete model is then

\[\begin{align} &\tau \sim \text{Norm}(0, 100),\\[1em] &\alpha \sim \text{Norm}(0, \tau),\\[1em] &\beta_i \sim \text{Norm}(0, \tau) \; \forall i,\\[1em] &\theta_i = \mathrm{logit}^{-1}\left(\alpha + \boldsymbol{\beta}\cdot \mathbf{x}_i\right)\;\forall i,\\[1em] &a_i \sim \text{Bernoulli}(\theta_i)\;\forall i. \end{align} \]

We can code this up in a Stan model. We could code our model as follows.

data {
  int N_neurons;
  int N_time_points;
  array[N_time_points] int a;
  matrix[N_time_points, N_neurons] X;
}


parameters {
  real alpha;
  vector[N_neurons] beta_;

  // Inverse sparcity parameter
  real<lower=0> tau;
}


transformed parameters {
  // Sparcity parameter as typically defined
  real lambda = 1.0 / tau^2;

  // Probability of being in attack state
  vector[N_time_points] theta = inv_logit(alpha + X * beta_);
}


model {
  alpha ~ normal(0, tau);
  beta_ ~ normal(0, tau);
  a ~ bernoulli(theta);
}

Fortunately, though, Stan has allows for a Bernoulli distribution with a logit-based generalized linear model, which is a fancy phrase for what we have just done. So, we use the following Stan code.

data {
  int N_neurons;
  int N_time_points;
  array[N_time_points] int a;
  matrix[N_time_points, N_neurons] X;
}


parameters {
  real alpha;
  vector[N_neurons] beta_;

  // Inverse sparcity parameter
  real<lower=0> tau;
}


transformed parameters {
  // Sparcity parameter as typically defined
  real lambda = 1.0 / tau^2;
}


model {
  tau ~ normal(0, 100);
  alpha ~ normal(0, tau);
  beta_ ~ normal(0, tau);
  a ~ bernoulli_logit_glm(X, alpha, beta_);
}

Let’s compile and sample! From experience with this model, I know that I need to increase the maximum tree depth for the no U-turn recursion beyond the default of 10.

# Data dictionary to send to Stan
data = dict(
    a=attack_vector,
    X=X,
    N_time_points=X.shape[0],
    N_neurons=X.shape[1],
)

# Compile
with bebi103.stan.disable_logging():
    sm = cmdstanpy.CmdStanModel(stan_file='logistic_ridge.stan')

# For convenience, if I have already saved the samples, keep them, otherwise sample
try:
    samples = az.from_netcdf('full.nc')
except:
    with bebi103.stan.disable_logging():
        samples = az.from_cmdstanpy(sm.sample(data=data, max_treedepth=15))
    samples.to_netcdf('full.nc')

As usual, we inspect the health of the sampler by checking diagnostics.

bebi103.stan.check_all_diagnostics(samples, max_treedepth=15)
Effective sample size looks reasonable for all parameters.

Rhat looks reasonable for all parameters.

0 of 4000 (0.0%) iterations ended with a divergence.

0 of 4000 (0.0%) iterations saturated the maximum tree depth of 15.

E-BFMI indicated no pathological behavior.
0

Looks good!

Now, we can investigate the results. To start with, let’s make a corner plot of the sparcity parameter, the parameter \(\alpha\), and the first few \(\beta\)’s. I find it easier to reason about the sparcity parameter in terms of \(\tau\) and not \(\lambda\), so we will use that.

bokeh.io.show(
    bebi103.viz.corner(samples, parameters=["tau", "alpha"] + [f"beta_[{i}]" for i in range(4)])
)

The sparcity parameter is about 10, allowing for variation in parameter values. Nonetheless, it is of interest that the data must be quite informative, as, for example, \(\alpha\) is far outside its prescribed prior range.

Let us now plot the 95% credible intervals of all of the \(\beta\)’s. We will first sort them by the median posterior value for each of visualization.

# Compute credible interval
cred_int_beta = samples.posterior.beta_.quantile(
    q=[0.0275, 0.5, 0.975], dim=["chain", "draw"]
).values.transpose()

neuron_inds = cred_int_beta[:, 1].argsort()

summaries = [
    dict(label=str(i), estimate=c[1], conf_int=[c[0], c[2]])
    for i, c in zip(neuron_inds, cred_int_beta[neuron_inds, :])
]

p_cred_ints = bebi103.viz.confints(
    summaries, x_axis_label="β", y_axis_label="neuron", frame_height=1000
)
bokeh.io.show(p_cred_ints)

Note that the credible interval for a parameter that is uninformed by the data (essentially the middle 95% of a standard normal) is centered at zero and goes roughly from -1.96 to 1.96. Neurons 51 and 87 show roughly this. Interestingly, a great many of the contributions to the attack behavior of specific neurons are in fact informed by the data. Note that the parameter \(\alpha\) is small, about \(-50\), which means that the non-attack state is the “ground state” of quiescent neurons.

Let us now do a posterior predictive check to see if this model and predict when a mouse will be in an attack state. To do so, we compute \(\theta\) for each time point for each set of parameter values. We then compute a credible region for \(\theta\) and compare to the observed attack times.

def inv_logit_glm(X, alpha, beta):
    return 1 / (1 + np.exp(-alpha - np.dot(X, beta)))


# Compute theta
theta_full = np.array(
    [
        inv_logit_glm(X, alpha, beta)
        for alpha, beta in zip(
            samples.posterior.alpha.stack({"sample": ("chain", "draw")}).values,
            samples.posterior.beta_.stack(
                {"sample": ("chain", "draw")}
            ).values.transpose(),
        )
    ]
)

# Confidence region
theta_full = np.percentile(theta_full, [2.5, 50, 97.5], axis=0)

# Make plot
p_pred = bokeh.plotting.figure(
    frame_height=150,
    frame_width=600,
    x_axis_label="time (s)",
    y_axis_label="attack prob",
)
p_pred.scatter(t, attack_vector, size=2, color=attack_color)
bebi103.viz.fill_between(
    t,
    theta_full[0, :],
    t,
    theta_full[2, :],
    patch_kwargs=dict(alpha=0.3),
    show_line=False,
    p=p_pred,
)
p_pred.line(t, theta_full[1, :])

bokeh.io.show(p_pred)
/var/folders/8h/qwnxpqcx6vldhxr71n1582d00000gn/T/ipykernel_51710/548793961.py:2: RuntimeWarning: overflow encountered in exp
  return 1 / (1 + np.exp(-alpha - np.dot(X, beta)))

To properly view the plot, it is important to zoom in at regions of interest. This looks quite good. There are almost no false positives for attack, and real attack events happen when \(\theta\) is substantial.

54.3 Cross validation

The posterior distribution encodes all of our knowledge about the parameters. Nonetheless, it is useful to perform a cross validation. We will therefore refit the model omitting half of the data points (omitted data points chosen at random, since we are neglecting time-dependence) and then check the posterior distribution of the parameter values we get and predictive checks against the entire data set.

First, let’s make a training set, which is half of the data.

# Boolean array of data points to include
rng = np.random.default_rng(seed=3252)
inds = np.array([True]*(len(t) // 2 + len(t) % 2) + [False] * (len(t) // 2))
rng.shuffle(inds)

# Split into test and train
X_train = X[inds, :]
X_test = X[~inds, :]
a_train = attack_vector[inds]
a_test = attack_vector[~inds]

Now, we can get samples from the model using only the training data set.

data_train = dict(
    a=a_train,
    X=X_train,
    N_time_points=X_train.shape[0],
    N_neurons=X_train.shape[1],
)

# For convenience, if I have already saved the samples, keep them, otherwise sample
try:
    samples_train = az.from_netcdf('train.nc')
except:
    with bebi103.stan.disable_logging():
        samples_train = az.from_cmdstanpy(
            sm.sample(data=data_train, max_treedepth=15, iter_sampling=2000, thin=2)
        )
    samples_train.to_netcdf('train.nc')

We of course perform the usual diagnostic checks.

bebi103.stan.check_all_diagnostics(samples_train, max_treedepth=15)
Effective sample size looks reasonable for all parameters.

Rhat looks reasonable for all parameters.

0 of 4000 (0.0%) iterations ended with a divergence.

0 of 4000 (0.0%) iterations saturated the maximum tree depth of 15.

E-BFMI indicated no pathological behavior.
0

Looks good! Again, we’ll make a corner plot with the sparcity parameter, \(\alpha\), and the first few \(\beta\)’s.

bokeh.io.show(
    bebi103.viz.corner(samples_train, parameters=["tau", "alpha"] + [f"beta_[{i}]" for i in range(4)])
)

Interestingly, the sparcity parameter is larger (\(\tau\) is smaller), and \(\alpha\) is smaller as well. This could be due to fewer data failing to overwhelm the prior to be informative. As a graphical display, let us compare the credible intervals for \(\alpha\) from the full data set and from the training set.

summaries = [
    dict(label='full', estimate=float(samples.posterior.alpha.median()),
    conf_int=samples.posterior.alpha.quantile([0.0275, 0.975]).values),
    dict(label='train', estimate=float(samples_train.posterior.alpha.median()),
    conf_int=samples_train.posterior.alpha.quantile([0.0275, 0.975]).values),
]

bokeh.io.show(bebi103.viz.confints(summaries, x_axis_label='α'))

Now, we will overlay the credible intervals from the model fit with the training data with that of the entire data set.

# Compute credible intervals
cred_int_beta_train = samples_train.posterior.beta_.quantile(
    q=[0.0275, 0.5, 0.975], dim=["chain", "draw"]
).values.transpose()

# Overlay with full model
neuron_labels = [str(i) for i in neuron_inds]
p_cred_ints.segment(
    x0=cred_int_beta_train[neuron_inds, 0],
    y0=neuron_labels,
    x1=cred_int_beta_train[neuron_inds, 2],
    y1=neuron_labels,
    color="orange",
    line_width=3,
    alpha=0.5,
)
p_cred_ints.scatter(
    x=cred_int_beta_train[neuron_inds, 1],
    y=neuron_labels,
    color="orange",
    size=5,
    alpha=0.5,
)

bokeh.io.show(p_cred_ints)

We do see some systematic variation. For inference based on the training set, \(|\beta|\) tends to be smaller than on values inferred from the entire set. This is consistent with \(\alpha\) also being smaller in magnitude.

Perhaps the predictions of \(\theta\) based on the inference based on the training set will be different than those based on the full data set. Let’s take a look.

# Compute theta
theta_train = np.array(
    [
        inv_logit_glm(X, alpha, beta)
        for alpha, beta in zip(
            samples_train.posterior.alpha.stack({"sample": ("chain", "draw")}).values,
            samples_train.posterior.beta_.stack(
                {"sample": ("chain", "draw")}
            ).values.transpose(),
        )
    ]
)

# Confidence region
theta_train = np.percentile(theta_train, [2.5, 50, 97.5], axis=0)

# Add to plot
bebi103.viz.fill_between(
    t,
    theta_train[0, :],
    t,
    theta_train[2, :],
    patch_kwargs=dict(alpha=0.3, color='orange'),
    show_line=False,
    p=p_pred,
)
p_pred.line(t, theta_train[1, :], color='orange')

bokeh.io.show(p_pred)

The posterior inferred on only half of the data is still predictive, but with larger credible intervals on the probability of attack.

The fact that the model is still predictive with only half of the data points and with different values of the parameters and the wide credible intervals implies that the parameters of the model are not identifiable, at least not quantitatively, but their relative values are.

We can try plotting the credible intervals of the ratio of \(\beta/|\alpha|\) instead; checking the relative scale of \(\beta\) and \(\alpha\).

# Compute credible interval of beta / alpha
samples.posterior['beta_alpha'] = samples.posterior.beta_ / np.abs(samples.posterior.alpha)
cred_int_beta_alpha = samples.posterior.beta_alpha.quantile(
    q=[0.0275, 0.5, 0.975], dim=["chain", "draw"]
).values.transpose()

summaries = [
    dict(label=str(i), estimate=c[1], conf_int=[c[0], c[2]])
    for i, c in zip(neuron_inds, cred_int_beta_alpha[neuron_inds, :])
]

p_cred_ints_ratio = bebi103.viz.confints(
    summaries, x_axis_label="β / |α|", y_axis_label="neuron", frame_height=1000
)

# Add results based on training
samples_train.posterior['beta_alpha'] = samples_train.posterior.beta_ / np.abs(samples_train.posterior.alpha)
cred_int_beta_alpha_train = samples_train.posterior.beta_alpha.quantile(
    q=[0.0275, 0.5, 0.975], dim=["chain", "draw"]
).values.transpose()

# Overlay with full model
neuron_labels = [str(i) for i in neuron_inds]
p_cred_ints_ratio.segment(
    x0=cred_int_beta_alpha_train[neuron_inds, 0],
    y0=neuron_labels,
    x1=cred_int_beta_alpha_train[neuron_inds, 2],
    y1=neuron_labels,
    color="orange",
    line_width=3,
    alpha=0.5,
)
p_cred_ints_ratio.scatter(
    x=cred_int_beta_alpha_train[neuron_inds, 1],
    y=neuron_labels,
    color="orange",
    size=5,
    alpha=0.5,
)

bokeh.io.show(p_cred_ints_ratio)

Now we see more consistency in the parameters. The parameters inferred from the training set all lie within the credible intervals inferred from the full data set, but with wider credible intervals themselves. Perhaps, then, is is only \(\boldsymbol{\beta}\) relative to background \(\alpha\) that can be inferred.

54.4 LASSO regularization

Let us now consider another set of priors for \(\alpha\) and \(\beta\) instead of the Normal priors. We will instead use a Laplace prior, otherwise known as a double exponential prior. With this choice of prior, the GLM is referred to as LASSO-regularized logistic regression. The model is then

data {
  int N_neurons;
  int N_time_points;
  array[N_time_points] int a;
  matrix[N_time_points, N_neurons] X;
}


parameters {
  real alpha;
  vector[N_neurons] beta_;

  // Inverse sparcity parameter
  real<lower=0> tau;
}


transformed parameters {
  // Sparcity parameter as typically defined
  real lambda = 1.0 / tau^2;
}


model {
  tau ~ normal(0, 100);
  alpha ~ double_exponential(0, tau);
  beta_ ~ double_exponential(0, tau);
  a ~ bernoulli_logit_glm(X, alpha, beta_);
}

Let’s grab samples from this one.

with bebi103.stan.disable_logging():
    sm_lasso = cmdstanpy.CmdStanModel(stan_file="logistic_lasso.stan")

try:
    samples_lasso = az.from_netcdf("lasso.nc")
except:
    with bebi103.stan.disable_logging():
        samples_lasso = az.from_cmdstanpy(
            sm_lasso.sample(
                data=data,
                max_treedepth=15,
                iter_warmup=2000,
                iter_sampling=2000,
                thin=2,
            )
        )
    samples_lasso.to_netcdf('lasso.nc')

We’ll do the same analysis. Check diagnostics, make a corner plot, plot credible intervals, and check prediction.

bebi103.stan.check_all_diagnostics(samples_lasso, max_treedepth=15)
Effective sample size looks reasonable for all parameters.

Rhat looks reasonable for all parameters.

0 of 4000 (0.0%) iterations ended with a divergence.

0 of 4000 (0.0%) iterations saturated the maximum tree depth of 15.

E-BFMI indicated no pathological behavior.
0

Now, the corner plot.

bokeh.io.show(
    bebi103.viz.corner(samples_lasso, parameters=["tau", "alpha"] + [f"beta_[{i}]" for i in range(4)])
)

We see a similar value of \(\tau\) as we did the ridge regularization. The parameter \(\alpha\) is much smaller though.

Let’s check out the \(\beta\) parameters. We’ll overlay them on our credible intervals from ridge regularization, with the results from LASSO regularization in red.

# Compute credible intervals
cred_int_beta_lasso = samples_lasso.posterior.beta_.quantile(
    q=[0.0275, 0.5, 0.975], dim=["chain", "draw"]
).values.transpose()

# Overlay with full model
neuron_labels = [str(i) for i in neuron_inds]
p_cred_ints.segment(
    x0=cred_int_beta_lasso[neuron_inds, 0],
    y0=neuron_labels,
    x1=cred_int_beta_lasso[neuron_inds, 2],
    y1=neuron_labels,
    color="tomato",
    line_width=3,
    alpha=0.5,
)
p_cred_ints.scatter(
    x=cred_int_beta_lasso[neuron_inds, 1],
    y=neuron_labels,
    color="tomato",
    size=5,
    alpha=0.5,
)

bokeh.io.show(p_cred_ints)

The LASSO results follow the same trend, but with some broader confidence intervals and with some neurons having extreme values.

Finally, let’s check prediction.

# Compute theta
theta_lasso = np.array(
    [
        inv_logit_glm(X, alpha, beta)
        for alpha, beta in zip(
            samples_lasso.posterior.alpha.stack({"sample": ("chain", "draw")}).values,
            samples_lasso.posterior.beta_.stack(
                {"sample": ("chain", "draw")}
            ).values.transpose(),
        )
    ]
)

# Confidence region
theta_lasso = np.percentile(theta_lasso, [2.5, 50, 97.5], axis=0)

# Add to plot
bebi103.viz.fill_between(
    t,
    theta_lasso[0, :],
    t,
    theta_lasso[2, :],
    patch_kwargs=dict(alpha=0.3, color='tomato'),
    show_line=False,
    p=p_pred,
)
p_pred.line(t, theta_lasso[1, :], color='tomato')

bokeh.io.show(p_pred)
/var/folders/8h/qwnxpqcx6vldhxr71n1582d00000gn/T/ipykernel_51710/548793961.py:2: RuntimeWarning: overflow encountered in exp
  return 1 / (1 + np.exp(-alpha - np.dot(X, beta)))

The predictions from the model with LASSO regularization are very similar to those with ridge regularization, which is not surprising since most of the parameter values are similar for the two models.

As a final check, let’s perform parameter inference with LASSO regularization using only the training data.

try:
    samples_lasso_train = az.from_netcdf("train_lasso.nc")
except:
    with bebi103.stan.disable_logging():
        sm_lasso = cmdstanpy.CmdStanModel(stan_file="logistic_lasso.stan")
        samples_lasso_train = az.from_cmdstanpy(
            sm_lasso.sample(
                data=data_train,
                max_treedepth=15,
                iter_warmup=2000,
                iter_sampling=2000,
                thin=2,
            )
        )
    samples_lasso_train.to_netcdf('train_lasso.nc')

And, we’ll proceed with the usual procedures.

bebi103.stan.check_all_diagnostics(samples_lasso_train, max_treedepth=15)
Effective sample size looks reasonable for all parameters.

Rhat looks reasonable for all parameters.

0 of 4000 (0.0%) iterations ended with a divergence.

0 of 4000 (0.0%) iterations saturated the maximum tree depth of 15.

E-BFMI indicated no pathological behavior.
0
bokeh.io.show(
    bebi103.viz.corner(samples_lasso_train, parameters=["tau", "alpha"] + [f"beta_[{i}]" for i in range(4)])
)

We have a smaller \(\tau\) and also a smaller \(\alpha\) compared to when we used the full data set. We also saw this with the analysis with ridge regularization.

# Compute credible intervals
cred_int_beta_lasso_train = samples_lasso_train.posterior.beta_.quantile(
    q=[0.0275, 0.5, 0.975], dim=["chain", "draw"]
).values.transpose()

# Overlay with full model
neuron_labels = [str(i) for i in neuron_inds]
p_cred_ints.segment(
    x0=cred_int_beta_lasso_train[neuron_inds, 0],
    y0=neuron_labels,
    x1=cred_int_beta_lasso_train[neuron_inds, 2],
    y1=neuron_labels,
    color="purple",
    line_width=3,
    alpha=0.5,
)
p_cred_ints.scatter(
    x=cred_int_beta_lasso_train[neuron_inds, 1],
    y=neuron_labels,
    color="purple",
    size=5,
    alpha=0.5,
)

bokeh.io.show(p_cred_ints)
# Compute theta
theta_lasso_train = np.array(
    [
        inv_logit_glm(X, alpha, beta)
        for alpha, beta in zip(
            samples_lasso_train.posterior.alpha.stack({"sample": ("chain", "draw")}).values,
            samples_lasso_train.posterior.beta_.stack(
                {"sample": ("chain", "draw")}
            ).values.transpose(),
        )
    ]
)

# Confidence region
theta_lasso_train = np.percentile(theta_lasso_train, [2.5, 50, 97.5], axis=0)

# Add to plot
bebi103.viz.fill_between(
    t,
    theta_lasso_train[0, :],
    t,
    theta_lasso_train[2, :],
    patch_kwargs=dict(alpha=0.3, color='purple'),
    show_line=False,
    p=p_pred,
)
p_pred.line(t, theta_lasso_train[1, :], color='purple')

bokeh.io.show(p_pred)

Apparently, with the performance of the LASSO regularization with the training set is better than that of the ridge regularization.

54.5 GLM with scikit-learn

Scikit-learn offers a convenient interface into a subset of GLMs via its sklearn.linear_model module. You can find the MAP parameter values for \(\alpha\) and \(\boldsymbol{\beta}\). Importantly, though, for some GLMs, only ridge regularization is available. For all functions, the sparcity parameter must be specified (with a default value of 1). That is, the sparcity parameter is not inferred, but conferred.

Let’s put it to use! We first instantiate a logistic regression instance. We will use ridge regularization, which we specify with the penalty='l2' keyword argument (specifying an L2 norm regularizer, which is equivalent to a Normal prior on the \(\alpha\) and \(\boldsymbol{\beta}\) values). We will also allow for more iterations when fitting, as I found was necessary for this particular case, using the max_iter keyword argument.

logreg = sklearn.linear_model.LogisticRegression(penalty='l2', max_iter=1000)

As usual, to get the MAP estimates of the parameters, we use the fit() method.

logreg.fit(X, attack_vector)
/Users/bois/miniconda3/envs/datasai/lib/python3.12/site-packages/sklearn/linear_model/_linear_loss.py:200: RuntimeWarning: divide by zero encountered in matmul
  raw_prediction = X @ weights + intercept
/Users/bois/miniconda3/envs/datasai/lib/python3.12/site-packages/sklearn/linear_model/_linear_loss.py:200: RuntimeWarning: overflow encountered in matmul
  raw_prediction = X @ weights + intercept
/Users/bois/miniconda3/envs/datasai/lib/python3.12/site-packages/sklearn/linear_model/_linear_loss.py:200: RuntimeWarning: invalid value encountered in matmul
  raw_prediction = X @ weights + intercept
/Users/bois/miniconda3/envs/datasai/lib/python3.12/site-packages/sklearn/linear_model/_linear_loss.py:330: RuntimeWarning: divide by zero encountered in matmul
  grad[:n_features] = X.T @ grad_pointwise + l2_reg_strength * weights
/Users/bois/miniconda3/envs/datasai/lib/python3.12/site-packages/sklearn/linear_model/_linear_loss.py:330: RuntimeWarning: overflow encountered in matmul
  grad[:n_features] = X.T @ grad_pointwise + l2_reg_strength * weights
/Users/bois/miniconda3/envs/datasai/lib/python3.12/site-packages/sklearn/linear_model/_linear_loss.py:330: RuntimeWarning: invalid value encountered in matmul
  grad[:n_features] = X.T @ grad_pointwise + l2_reg_strength * weights
LogisticRegression(max_iter=1000)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

We can access the MAP value of \(\alpha\) via the logreg.intercept_ attribute and the MAP value of \(\boldsymbol{\beta}\) via the logreg.coef_ attribute. Conveniently, we can access the predicted attack probability using logreg.predict_proba(X).

To check how well we did with our regression, we will add the attack probability to the above plot, this time with a single green line.

p_pred.line(t, logreg.predict_proba(X)[:, 1], color='green')

bokeh.io.show(p_pred)
/Users/bois/miniconda3/envs/datasai/lib/python3.12/site-packages/sklearn/utils/extmath.py:203: RuntimeWarning: divide by zero encountered in matmul
  ret = a @ b
/Users/bois/miniconda3/envs/datasai/lib/python3.12/site-packages/sklearn/utils/extmath.py:203: RuntimeWarning: overflow encountered in matmul
  ret = a @ b
/Users/bois/miniconda3/envs/datasai/lib/python3.12/site-packages/sklearn/utils/extmath.py:203: RuntimeWarning: invalid value encountered in matmul
  ret = a @ b

We unsurprisingly have good predictions; the data are informative!

As a final now, the table below shows which classes in the sklearn.linear_model module offer which named GLMs.

GLM name scikit-learn access
Linear regression LinearRegression, Ridge, Lasso
Logistic regression LogisticRegression
Multinomial logistic regression LogisticRegression (number of categories inferred based on input to fit())
Binomial regression
Multinomial regression
Poisson regression PoissonRegressor
Gamma regression GammaRegressor (only uses exponential inverse link function)

54.6 Computing environment

%load_ext watermark
%watermark -v -p numpy,scipy,sklearn,cmdstanpy,arviz,bebi103,bokeh,jupyterlab
print("cmdstan   :", bebi103.stan.cmdstan_version())
Python implementation: CPython
Python version       : 3.12.11
IPython version      : 9.1.0

numpy     : 2.1.3
scipy     : 1.15.3
sklearn   : 1.6.1
cmdstanpy : 1.2.5
arviz     : 0.21.0
bebi103   : 0.1.27
bokeh     : 3.6.2
jupyterlab: 4.3.7

cmdstan   : 2.36.0