import numpy as np
import polars as pl

import scipy.optimize
import scipy.special
import scipy.stats as st

import statsmodels.tools.numdiff as smnd

import sklearn.mixture

import cmdstanpy
import arviz as az

import iqplot
import bebi103

import colorcet

import bokeh.io
import bokeh.plotting

bokeh.io.output_notebook()
As an example of an application of the EM algorithm to a Gaussian mixture model, we can use a now-classic data set featuring the waiting time and durations of eruptions of the Old Faithful geyser in Yellowstone National Park. The data set consists of a set of measurements of eruptions in 1985 and was published by Azzalini and Bowman in 1990. Let us first take a look at the data.
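As a minimal sketch of loading and plotting the data, assume the measurements are stored in a CSV file (the file name old_faithful.csv here is hypothetical) with columns duration-minutes and waiting-minutes.

# Load the eruption measurements (hypothetical file name)
df = pl.read_csv('old_faithful.csv')

# Scatter plot of waiting time versus eruption duration
p = bokeh.plotting.figure(
    frame_width=300,
    frame_height=300,
    x_axis_label='eruption duration (min)',
    y_axis_label='eruption waiting time (min)',
)
p.scatter(x=df['duration-minutes'], y=df['waiting-minutes'])

bokeh.io.show(p)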
To build our model, we assume there are two bivariate Gaussians in our mixture. We will assume uniform priors for the mixing coefficients and for the correlation matrix. We choose reasonable priors for the remaining parameters. Our model is
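Reading the prior parameters off of the Stan program below, with subscripts \(d\) and \(w\) denoting eruption duration and waiting time and with \(\mathbf{y}_i = (d_i, w_i)^\mathsf{T}\), the model may be written as

\begin{align}
&\boldsymbol{\pi} \sim \text{Dirichlet}(1, 1),\\[0.5em]
&\mu_{d,k} \sim \text{Norm}(5, 2), \quad \mu_{w,k} \sim \text{Norm}(60, 10), \quad k \in \{1, 2\},\\[0.5em]
&\sigma_{d,k} \sim \text{HalfNorm}(0, 5), \quad \sigma_{w,k} \sim \text{HalfNorm}(0, 20),\\[0.5em]
&\mathsf{\Omega}_k \sim \text{LKJ}(1),\\[0.5em]
&\mathsf{\Sigma}_k = \text{diag}(\boldsymbol{\sigma}_k)\,\mathsf{\Omega}_k\,\text{diag}(\boldsymbol{\sigma}_k),\\[0.5em]
&\mathbf{y}_i \sim \pi_1\,\text{MultiNorm}(\boldsymbol{\mu}_1, \mathsf{\Sigma}_1) + \pi_2\,\text{MultiNorm}(\boldsymbol{\mu}_2, \mathsf{\Sigma}_2)\;\;\forall i,
\end{align}

where the Dirichlet(1, 1) prior on the mixing coefficients and the LKJ prior with \(\eta = 1\) on the correlation matrix are the uniform priors mentioned above.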
To avoid a label switching nonidentifiability, we stipulate that \(\mu_{d,1} < \mu_{d,2}\).
40.0.1 Full MCMC of the generative model
We start by applying the gold standard: performing full MCMC to sample out of the posterior distribution. Because HMC (Stan’s algorithm) can only sample continuous variables, we must sample out of the marginal posterior in which the \(\mathbf{z}_i\)’s have been marginalized out. This is not problematic, as we can always compute the responsibilities for each sample. The Stan code below is written for general Gaussian mixture models, but it serves our purposes here.
data {
  int<lower=2> K;        // Number of mixture components (at least 2)
  int<lower=1> d;        // Dimensionality of observations
  int<lower=1> N;        // Number of multidimensional data points
  array[N] vector[d] y;  // Measurements
}


parameters {
  // Mixing coefficients
  simplex[K] pi_;

  // The mu values for the first dimension are ordered to
  // prevent label switching nonidentifiability.
  ordered[K] mu_1;
  array[K] vector[d-1] mu_2_to_K;  // Rest of mu's

  // Cholesky decomposition of correlation matrix
  array[K] cholesky_factor_corr[d] L_Omega;

  // Sqrt of diagonal of covariance matrix
  array[K] vector<lower=0>[d] sigma;
}


transformed parameters {
  // mu's as a vector. Have to hand-build because we keep the first
  // entry in mu ordered to prevent label-switching nonidentifiability.
  array[K] vector[d] mu;
  for (k in 1:K) {
    mu[k][1] = mu_1[k];
    // for (j in 2:d) {
    //   mu[k][j] = mu_2_to_K[k][j-1];
    // }
    mu[k][2:d] = mu_2_to_K[k];
  }

  // Cholesky factor for covariance matrix
  array[K] cholesky_factor_cov[d] L_Sigma;
  for (k in 1:K) {
    L_Sigma[k] = diag_pre_multiply(sigma[k], L_Omega[k]);
  }
}


model {
  // Begin priors -- edit for specific use case
  // Prior for mixing coefficients
  pi_ ~ dirichlet(rep_vector(1.0, K));

  for (k in 1:K) {
    // For Old Faithful analysis, d = 2; explicit priors on mu's.
    mu[k][1] ~ normal(5.0, 2.0);
    mu[k][2] ~ normal(60.0, 10.0);

    // Sqrt of diagonal of covariance matrix
    // For Old Faithful analysis, d = 2; explicit priors.
    sigma[k][1] ~ normal(0.0, 5.0);
    sigma[k][2] ~ normal(0.0, 20.0);

    // Use LKJ priors with eta = 1 for correlation matrix
    L_Omega[k] ~ lkj_corr_cholesky(1.0);
  }
  // End priors

  // Likelihood for GMM
  for (i in 1:N) {
    // Use log-sum-exp trick to compute likelihood of mixture model
    array[K] real log_pdf_components;
    for (k in 1:K) {
      log_pdf_components[k] = log(pi_[k])
        + multi_normal_cholesky_lpdf(y[i] | mu[k], L_Sigma[k]);
    }

    target += log_sum_exp(log_pdf_components);
  }
}


generated quantities {
  // Compute responsibilities
  array[N, K] real responsibilities;
  for (i in 1:N) {
    array[K] real log_responsibilities;

    // Compute log of the numerator for the responsibilities
    for (k in 1:K) {
      log_responsibilities[k] = log(pi_[k])
        + multi_normal_cholesky_lpdf(y[i] | mu[k], L_Sigma[k]);
    }

    // Normalization factor is the sum of all of the numerators
    real log_normalization_factor = -log_sum_exp(log_responsibilities);

    // Exponentiate log responsibilities to get responsibilities
    for (k in 1:K) {
      responsibilities[i, k] = exp(
        log_responsibilities[k] + log_normalization_factor
      );
    }
  }

  // Covariance matrices
  array[K] cov_matrix[d] Sigma;
  for (k in 1:K) {
    Sigma[k] = multiply_lower_tri_self_transpose(L_Sigma[k]);
  }
}
Let’s put this to use!
# Extract data set from data frame
y = df.select(pl.col('duration-minutes', 'waiting-minutes')).to_numpy()

# Prepare data for input, including
# number of components in mixture and number of dimensions.
data = dict(
    K=2,
    d=2,
    N=len(df),
    y=y,
)

# Compile and sample!
with bebi103.stan.disable_logging():
    sm = cmdstanpy.CmdStanModel(stan_file='old_faithful_gmm.stan')
    samples = sm.sample(data=data)

# Get a convenient ArviZ object
samples = az.from_cmdstanpy(samples)

# As always, check diagnostics
bebi103.stan.check_all_diagnostics(samples, parameters=['mu', 'L_Sigma', 'pi_'])
Effective sample size looks reasonable for all parameters.
Rhat looks reasonable for all parameters.
0 of 4000 (0.0%) iterations ended with a divergence.
0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.
0
Looks great! We can check a corner plot to see where the centers of the two Gaussians of the mixture model are.
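A sketch of how such a corner plot can be made with bebi103.viz.corner() follows; the flattened parameter names (e.g., 'mu[0,0]') are an assumption about how the ArviZ-converted samples are labeled.

# Corner plot of the centers of the two Gaussian components
# (parameter names assume ArviZ/Stan-style flattened indexing)
bokeh.io.show(
    bebi103.viz.corner(
        samples,
        parameters=['mu[0,0]', 'mu[0,1]', 'mu[1,0]', 'mu[1,1]'],
    )
)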
This looks about right, given what we expected from our exploratory plot of the data. Conveniently, the corner plot displays credible intervals for the parameters describing the centers of the bivariate Gaussians in the mixture. We can explicitly calculate these from the samples.
# Credible intervals on all mu's
for component in [0, 1]:
    for dim, descriptor in zip([0, 1], ['duration (min)', 'waiting (min)']):
        print(f"95% cred int for mean {descriptor} for component {component}: ", end='')
        print(
            np.percentile(
                samples.posterior.mu.sel(dict(mu_dim_0=component, mu_dim_1=dim)).values,
                [2.5, 97.5],
            )
        )
95% cred int for mean duration (min) for component 0: [1.98143675 2.09175075]
95% cred int for mean waiting (min) for component 0: [53.302945 55.70961 ]
95% cred int for mean duration (min) for component 1: [4.2240095 4.35365475]
95% cred int for mean waiting (min) for component 1: [78.9693875 80.8362875]
We can further explore the results by making a plot of the data overlaid with a contour plot of the mixture of Gaussians given by the expectation of the parameter values. We can then color the data points according to the responsibilities. We start by computing expectations of the responsibilities.
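Since the responsibilities were computed in the generated quantities block of the Stan program, their expectations can be obtained by averaging over the MCMC samples. A short sketch (the variable name gamma is our choice):

# Expected responsibilities: average over chains and draws
gamma = samples.posterior.responsibilities.mean(dim=('chain', 'draw')).values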
To color the data points according to which component of the mixture model they came from, we can convert the quantitative responsibilities to colors using the convenient bebi103.viz.q_to_color() function that converts quantitative information to colors.
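A sketch of that conversion, under the assumption that bebi103.viz.q_to_color() accepts an array of quantitative values and a color palette and returns one color per data point (the palette choice here is arbitrary):

# Color each point by the expected responsibility of the second component
colors = bebi103.viz.q_to_color(gamma[:, 1], colorcet.coolwarm)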
Finally, we can make the plot, first plotting the contours and then overlaying the appropriately shaded data.
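Here is a sketch of the contour computation, using posterior means of the parameters and a hand-chosen grid covering the data to define X, Y, and Z:

# Posterior means of the parameters of the mixture
pi_mean = samples.posterior.pi_.mean(dim=('chain', 'draw')).values
mu_mean = samples.posterior.mu.mean(dim=('chain', 'draw')).values
Sigma_mean = samples.posterior.Sigma.mean(dim=('chain', 'draw')).values

# Grid of points covering the data (bounds chosen by hand)
X, Y = np.meshgrid(np.linspace(1, 6, 200), np.linspace(40, 100, 200))
pos = np.dstack((X, Y))

# Mixture PDF evaluated on the grid
Z = np.zeros_like(X)
for k in range(2):
    Z += pi_mean[k] * st.multivariate_normal.pdf(pos, mu_mean[k], Sigma_mean[k])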
p = bebi103.viz.contour(
    X,
    Y,
    Z,
    frame_width=300,
    frame_height=300,
    x_axis_label='eruption duration (min)',
    y_axis_label='eruption waiting time (min)',
)

# Add data
p.scatter(x=df['duration-minutes'], y=df['waiting-minutes'], color=colors)

bokeh.io.show(p)
This is a reasonably effective visualization that is commonly employed for data sets of this type analyzed with a Gaussian mixture model. I would not use this visualization alone; corner plots and/or credible intervals for the parameters are important as well.
40.0.2 MAP estimation by direct optimization
We now turn to parameter estimation by finding the maximum a posteriori (MAP) estimate. The optimization methods we have discussed so far are difficult to apply to mixture models in general. Instead of coding up the log prior, log likelihood, etc., in Python and using scipy.optimize, we will use Stan’s optimization routine (by default an L-BFGS solver) to attempt to find the MAP.
# Perform optimization, cheating by giving starting points at mu's from MCMC
map_stan = sm.optimize(
    data=data, inits=dict(mu=[[2.0, 53.0], [4.25, 79.0]])
).optimized_params_dict

# Check out mu values
print('mu_1d = ', map_stan['mu[1,1]'], 'min')
print('mu_2d = ', map_stan['mu[2,1]'], 'min')
print('mu_1w = ', map_stan['mu[1,2]'], 'min')
print('mu_2w = ', map_stan['mu[2,2]'], 'min')
22:26:28 - cmdstanpy - INFO - Chain [1] start processing
22:26:28 - cmdstanpy - INFO - Chain [1] done processing
mu_1d = 3.48583 min
mu_2d = 3.48583 min
mu_1w = 59.9959 min
mu_2w = 70.8621 min
We clearly have not converged to the correct positions that we saw from doing MCMC. We show this calculation only to demonstrate the challenge of naively performing optimization for a Gaussian mixture model. We now turn to using EM to find the MAP.
40.0.3 MAP estimation using EM
To estimate the MAP using EM, we need a function to compute the responsibilities for the E-step. This is accomplished by directly coding up Equation 39.5.
def responsibilities(y, pi, mu, Sigma):
    """Compute responsibilities for a multivariate Gaussian mixture
    model containing K components.

    Parameters
    ----------
    y : Numpy array, shape (N, d)
        The measurements.
    pi : Numpy array, shape (K,)
        The mixing coefficients.
    mu : Numpy array, shape (K, d)
        mu[k, :] is a vector giving the center of Gaussian component k.
    Sigma : Numpy array, shape (K, d, d)
        Sigma[k, :, :] is the covariance matrix of Gaussian component k.

    Returns
    -------
    gamma : Numpy array, shape (N, K)
        gamma[i, k] is the responsibility of component k for data
        point i.
    """
    # Problem dimensions
    K = len(pi)
    N = len(y)

    # Values of log numerator of responsibilities
    log_num = np.empty((N, K))
    for k in range(K):
        log_num[:, k] = np.log(pi[k]) + st.multivariate_normal.logpdf(y, mu[k], Sigma[k])

    # Negative log of the denominator (normalization) of responsibilities
    log_denom = -scipy.special.logsumexp(log_num, axis=1)

    return np.exp(log_num + log_denom[:, np.newaxis])
Next, we need to code up an optimization routine to find the maximum of the parameter-dependent part of the surrogate function given the responsibilities. First, we code up the surrogate function using Equation 39.6. As we have done before when we have constraints to satisfy (specifically \(\sum_k \pi _k = 1\)), it is convenient to use Powell’s method for optimizing. As such, we need to make sure the surrogate function evaluates to \(-\infty\) when constraints on parameters are not met.
def surrogate(y, pi, mu, Sigma, gamma, log_prior):
    """Parameter-dependent part of the surrogate function Q(θ, φ) for
    Gaussian mixture models.

    Parameters
    ----------
    y : Numpy array, shape (N, d)
        The measurements.
    pi : Numpy array, shape (K,)
        The mixing coefficients.
    mu : Numpy array, shape (K, d)
        mu[k, :] is a vector giving the center of Gaussian component k.
    Sigma : Numpy array, shape (K, d, d)
        Sigma[k, :, :] is the covariance matrix of Gaussian component k.
    gamma : Numpy array, shape (N, K)
        gamma[i, k] is the responsibility of component k for data
        point i.
    log_prior : function
        Function with call signature log_prior(pi, mu, Sigma) that
        returns the scalar log prior.

    Returns
    -------
    Q : float
        The parameter-dependent part of the surrogate function for a
        Gaussian mixture model.
    """
    # Number of components
    K = len(pi)

    # Make sure pi is between zero and one
    # We already enforce it is a simplex when we unpack params
    if (pi < 0).any() or (pi > 1).any():
        return -np.inf

    # No need to check the covariance matrices because we use
    # Cholesky decomposition in the numerical routines

    # Prior
    Q = log_prior(pi, mu, Sigma)
    if Q == -np.inf:
        return Q

    # Sum of gamma's over the data
    gamma_sum = np.sum(gamma, axis=0)

    # Mixing coefficients
    Q += np.dot(gamma_sum, np.log(pi))

    # Terms from Gaussians
    for k in range(K):
        Q += np.dot(gamma[:, k], st.multivariate_normal.logpdf(y, mu[k], Sigma[k]))

    return Q
Since we will be using numerical optimization routines, we need to store all of the parameters in an array. We will therefore need functions to pack and unpack this array. We also will use the Cholesky decomposition of the covariance matrices. If \(d\) is the dimensionality of the data, then the covariance matrix is \(d\times d\), and there are a total of \(d(d+1)/2\) nonzero entries in the Cholesky decomposition.
def pack_params(pi, mu, Sigma):
    """Pack parameters into a single 1D Numpy array.

    Parameters
    ----------
    pi : Numpy array, shape (K,)
        Mixing coefficients
    mu : Numpy array, shape (K, d)
        Centers of Gaussian components
    Sigma : Numpy array, shape (K, d, d)
        Covariance matrices for Gaussian components

    Returns
    -------
    output : Numpy array, shape (K + K*d + K*d*(d+1)//2 - 1,)
        A 1D array containing parameter values. The first K-1 entries
        are the mixing coefficients. The Kth mixing coefficient is
        omitted because it must be equal to 1 - sum(other pi's). The
        next K * d entries are the values of mu. The remaining entries
        are the Cholesky decompositions of the covariance matrices.
    """
    # Problem dimensions
    K, d = mu.shape

    # Cholesky decompositions of the Sigma's
    L = np.empty_like(Sigma)
    for k in range(K):
        L[k] = np.linalg.cholesky(Sigma[k])

    # Params array holds K-1 pi values, K*d mu values,
    # and K * d * (d + 1) / 2 values of Cholesky decomps
    params = np.empty((K - 1) + (K * d) + (K * d * (d + 1) // 2))

    # Omit the last entry in pi, since that is not manipulated by solver
    params[: K - 1] = pi[: K - 1]

    # Plop in the mu's
    params[K - 1 : K * (1 + d) - 1] = np.ravel(mu)

    # Put in nonzero terms of Cholesky decomposition
    params[K * (1 + d) - 1 :] = np.concatenate(
        [L_k[np.tril_indices(d)] for L_k in L]
    )

    return params


def unpack_params(params, K, d):
    """Unpack parameters from a single 1D Numpy array.

    Parameters
    ----------
    params : Numpy array, shape (K + K*d + K*d*(d+1)//2 - 1,)
        A 1D array containing parameter values. The first K-1 entries
        are the mixing coefficients. The Kth mixing coefficient is
        omitted because it must be equal to 1 - sum(other pi's). The
        next K * d entries are the values of mu. The remaining entries
        are the Cholesky decompositions of the covariance matrices.
    K : int
        Number of components in the mixture model
    d : int
        Dimensionality of the data

    Returns
    -------
    pi : Numpy array, shape (K,)
        Mixing coefficients
    mu : Numpy array, shape (K, d)
        Centers of Gaussian components
    Sigma : Numpy array, shape (K, d, d)
        Covariance matrices for Gaussian components
    """
    # Give pi containing all entries
    pi = np.concatenate((params[:K-1], [1.0 - np.sum(params[:K-1])]))

    # Pull out mu's
    mu = params[K - 1 : K * (1 + d) - 1].reshape((K, d))

    # Grab Cholesky decompositions
    L = np.zeros((K, d, d))
    i = K * (1 + d) - 1
    for k in range(K):
        L[k][np.tril_indices(d)] = params[
            i + k * d * (d + 1) // 2 : i + (k + 1) * d * (d + 1) // 2
        ]

    # Convert Cholesky decompositions to covariance matrices
    Sigma = np.empty((K, d, d))
    for k in range(K):
        Sigma[k] = np.dot(L[k], L[k].transpose())

    return pi, mu, Sigma
Now that we have an array of parameters, we can code up an objective function for our solver.
def negative_Q(params, y, gamma, log_prior):
    """Negative of parameter-dependent part of the surrogate function
    Q(θ, φ) for Gaussian mixture models.

    Parameters
    ----------
    params : Numpy array, shape (K + K*d + K*d*(d+1)//2 - 1,)
        Packed parameters, as produced by `pack_params()`.
    y : Numpy array, shape (N, d)
        The measurements.
    gamma : Numpy array, shape (N, K)
        gamma[i, k] is the responsibility of component k for data
        point i.
    log_prior : function
        Function with call signature log_prior(pi, mu, Sigma) that
        returns the scalar log prior.

    Returns
    -------
    negative_Q : float
        The parameter-dependent part of the surrogate function for a
        Gaussian mixture model multiplied by -1.
    """
    return -surrogate(
        y, *unpack_params(params, gamma.shape[1], y.shape[1]), gamma, log_prior
    )
To perform the M step, we first use the analytical results for uniform priors outlined in Section 39.2.3. If we do indeed have uniform priors, then the job is done. If not, we use these parameter values as an initial guess for a numerical solver. For simplicity, we will use Powell’s method, with the objective function returning \(-\infty\) for out-of-bounds parameter values (such as parameter sets in which a mixing coefficient \(\pi_k\) falls outside of \([0, 1]\)).
def m_step_no_prior(y, pi, mu, Sigma, gamma):
    """Perform the M step of an EM algorithm for a GMM where the model
    has uniform priors.

    Parameters
    ----------
    y : Numpy array, shape (N, d)
        The measurements.
    pi : Numpy array, shape (K,)
        The mixing coefficients.
    mu : Numpy array, shape (K, d)
        mu[k, :] is a vector giving the center of Gaussian component k.
    Sigma : Numpy array, shape (K, d, d)
        Sigma[k, :, :] is the covariance matrix of Gaussian component k.
    gamma : Numpy array, shape (N, K)
        gamma[i, k] is the responsibility of component k for data
        point i.

    Returns
    -------
    pi : Numpy array, shape (K,)
        Maximizing mixing coefficients.
    mu : Numpy array, shape (K, d)
        Maximizing mu where mu[k, :] is a vector giving the center of
        Gaussian component k.
    Sigma : Numpy array, shape (K, d, d)
        Maximizing Sigma where Sigma[k, :, :] is the covariance matrix
        of Gaussian component k.
    """
    N = len(y)
    K = len(pi)
    gamma_sum = np.sum(gamma, axis=0)

    # Update mu values
    for k in range(K):
        mu[k] = np.dot(gamma[:, k], y) / gamma_sum[k]

    # Update Sigma values
    for k in range(K):
        Sigma[k] = np.zeros_like(Sigma[k])
        for i in range(N):
            Sigma[k] += gamma[i, k] * np.outer(y[i] - mu[k], y[i] - mu[k])
        Sigma[k] /= gamma_sum[k]

    # Update pi values
    pi = gamma_sum / N

    return pi, mu, Sigma


def m_step(y, pi, mu, Sigma, gamma, log_prior):
    """Perform the M step of an EM algorithm for a GMM.

    Parameters
    ----------
    y : Numpy array, shape (N, d)
        The measurements.
    pi : Numpy array, shape (K,)
        The mixing coefficients.
    mu : Numpy array, shape (K, d)
        mu[k, :] is a vector giving the center of Gaussian component k.
    Sigma : Numpy array, shape (K, d, d)
        Sigma[k, :, :] is the covariance matrix of Gaussian component k.
    gamma : Numpy array, shape (N, K)
        gamma[i, k] is the responsibility of component k for data
        point i.
    log_prior : function
        Function with call signature log_prior(pi, mu, Sigma) that
        returns the scalar log prior.

    Returns
    -------
    pi : Numpy array, shape (K,)
        Maximizing mixing coefficients.
    mu : Numpy array, shape (K, d)
        Maximizing mu where mu[k, :] is a vector giving the center of
        Gaussian component k.
    Sigma : Numpy array, shape (K, d, d)
        Maximizing Sigma where Sigma[k, :, :] is the covariance matrix
        of Gaussian component k.
    """
    # Initial guess of parameters comes from optimization without prior
    pi, mu, Sigma = m_step_no_prior(y, pi, mu, Sigma, gamma)

    # If there is no prior, the job is done
    if log_prior is None:
        return pi, mu, Sigma
    else:
        res = scipy.optimize.minimize(
            negative_Q,
            pack_params(pi, mu, Sigma),
            args=(y, gamma, log_prior),
            method='powell',
        )

        return unpack_params(res.x, len(pi), y.shape[1])
To assess convergence of the EM algorithm, we will need to check how the log posterior changes from iteration to iteration. The log prior function is supplied as an argument, but the log likelihood function is the same for all GMMs. We code that up now.
def gmm_log_likelihood(y, pi, mu, Sigma):
    """Log likelihood function for a GMM.

    Parameters
    ----------
    y : Numpy array, shape (N, d)
        The measurements.
    pi : Numpy array, shape (K,)
        The mixing coefficients.
    mu : Numpy array, shape (K, d)
        mu[k, :] is a vector giving the center of Gaussian component k.
    Sigma : Numpy array, shape (K, d, d)
        Sigma[k, :, :] is the covariance matrix of Gaussian component k.

    Returns
    -------
    output : float
        The log likelihood.
    """
    N = len(y)
    K = len(pi)

    log_lik = 0.0
    log_likes = np.empty(K)
    for i in range(N):
        for k in range(K):
            log_likes[k] = np.log(pi[k]) + st.multivariate_normal.logpdf(y[i], mu[k], Sigma[k])
        log_lik += scipy.special.logsumexp(log_likes)

    return log_lik
We finally have all the pieces to code up the complete EM algorithm!
def em(y, pi, mu, Sigma, log_prior=None, max_iter=100, tol=1e-6):
    """Find the MAP parameters of a GMM using the EM algorithm.

    Parameters
    ----------
    y : Numpy array, shape (N, d)
        The measurements.
    pi : Numpy array, shape (K,)
        Initial guess of the mixing coefficients.
    mu : Numpy array, shape (K, d)
        Initial guess of the centers of the Gaussian components.
        mu[k, :] is a vector giving the center of Gaussian component k.
    Sigma : Numpy array, shape (K, d, d)
        Initial guess of the covariance matrices of the Gaussian
        components. Sigma[k, :, :] is the covariance matrix of Gaussian
        component k.
    log_prior : function or None, default None
        Function with call signature log_prior(pi, mu, Sigma) that
        returns the scalar log prior. If None, uniform priors on all
        parameters are assumed.
    max_iter : int, default 100
        Maximum number of iterations.
    tol : float, default 1e-6
        Tolerance for convergence. If the log posterior changes by less
        than `tol` from one iteration to the next, convergence is
        achieved.

    Returns
    -------
    pi : Numpy array, shape (K,)
        MAP mixing coefficients.
    mu : Numpy array, shape (K, d)
        MAP mu where mu[k, :] is a vector giving the center of Gaussian
        component k.
    Sigma : Numpy array, shape (K, d, d)
        MAP Sigma where Sigma[k, :, :] is the covariance matrix of
        Gaussian component k.
    """
    log_post = -np.inf

    for i in range(max_iter):
        # E-step
        gamma = responsibilities(y, pi, mu, Sigma)

        # M-step
        pi, mu, Sigma = m_step(y, pi, mu, Sigma, gamma, log_prior)

        # Check for convergence
        log_post_new = gmm_log_likelihood(y, pi, mu, Sigma)
        if log_prior is not None:
            log_post_new += log_prior(pi, mu, Sigma)

        if log_post_new - log_post < tol:
            break
        else:
            log_post = log_post_new
    else:
        # Loop ran to completion without breaking: no convergence
        raise RuntimeError(
            f'Failed to converge in {max_iter} steps with tolerance {tol}.'
        )

    return pi, mu, Sigma
Let’s put our EM algorithm to the test with the Old Faithful data! First, we code up the log prior.
def log_prior_old_faithful(pi, mu, Sigma):
    # No need to check if pi_k is out-of-bounds; checked in surrogate function
    # No need to add rho or pi terms, since they're uniform

    # Break nonidentifiability by enforcing the first entry of mu is sorted
    if mu[0, 0] > mu[1, 0]:
        return -np.inf

    lp = st.norm.logpdf(mu[0, 0], 5, 5) + st.norm.logpdf(mu[0, 1], 60, 10)
    lp += st.norm.logpdf(mu[1, 0], 5, 5) + st.norm.logpdf(mu[1, 1], 60, 10)
    lp += st.halfnorm.logpdf(Sigma[0, 0, 0], 0, 5) + st.halfnorm.logpdf(Sigma[0, 1, 1], 0, 10)
    lp += st.halfnorm.logpdf(Sigma[1, 0, 0], 0, 5) + st.halfnorm.logpdf(Sigma[1, 1, 1], 0, 10)

    return lp
We now supply initial guesses and let the EM algorithm get to work!
# Initial guesses
pi = np.array([0.5, 0.5])
mu = np.array([[1.0, 50.0], [3.0, 50.0]])
Sigma = np.array([[[2, 0.5], [0.5, 7]], [[2, 0.6], [0.6, 8]]])

# MAP!
pi, mu, Sigma = em(y, pi, mu, Sigma, log_prior=log_prior_old_faithful)

# Responsibilities are useful to ascribe data points
# to components of mixture
gamma = responsibilities(y, pi, mu, Sigma)
/Users/bois/miniconda3/envs/datasai/lib/python3.12/site-packages/scipy/optimize/_optimize.py:2502: RuntimeWarning: invalid value encountered in scalar multiply
tmp2 = (x - v) * (fx - fw)
Now that we have our MAP, we can compute an approximate posterior, as usual by making an assumption of local normality. To do this, we need to code up a log posterior function that we can differentiate.
def log_posterior_old_faithful(params, y, K, log_prior):
    """Log posterior of a GMM."""
    # Unpack parameters
    d = y.shape[1]
    pi, mu, Sigma = unpack_params(params, K, d)

    # Log prior
    log_post = log_prior(pi, mu, Sigma)

    # Log likelihood
    log_post += gmm_log_likelihood(y, pi, mu, Sigma)

    return log_post
We now numerically compute the Hessian of the log posterior to get the covariance matrix for the approximate posterior.
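A sketch of that calculation using the numerical differentiation tools from statsmodels (imported above as smnd), taking the covariance matrix of the approximate Normal posterior to be the negative inverse Hessian evaluated at the MAP:

# Pack the MAP parameter values into a single array
params_map = pack_params(pi, mu, Sigma)

# Numerically compute the Hessian of the log posterior at the MAP
hes = smnd.approx_hess(
    params_map, log_posterior_old_faithful, args=(y, 2, log_prior_old_faithful)
)

# Covariance matrix of the Normal approximation of the posterior
posterior_cov = -np.linalg.inv(hes)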
Focusing on the centers of the components of the GMM, we can report credible intervals.
print("""Most probable parameters in units of minutes--------------------------------------------Component 1: mean duration = {0:.2f} ± {1:.2f} mean waiting = {2:.2f} ± {3:.2f}Component 2: mean duration = {4:.2f} ± {5:.2f} mean waiting = {6:.2f} ± {7:.2f}""".format( mu[0, 0],1.96* np.sqrt(posterior_cov[1, 1]), mu[0, 1],1.96* np.sqrt(posterior_cov[2, 2]), mu[1, 0],1.96* np.sqrt(posterior_cov[3, 3]), mu[1, 1],1.96* np.sqrt(posterior_cov[4, 4]), ))
Most probable parameters in units of minutes
--------------------------------------------
Component 1:
mean duration = 2.04 ± 0.05
mean waiting = 54.50 ± 1.07
Component 2:
mean duration = 4.29 ± 0.06
mean waiting = 79.94 ± 0.84
40.0.4 MAP estimation with scikit-learn
Finally, scikit-learn provides an off-the-shelf implementation of the EM algorithm for Gaussian mixture models (with uniform priors) in its GaussianMixture class. We start by instantiating a GMM with the appropriate number \(K\) of components in the mixture. For the present data set, \(K = 2\).
gmm = sklearn.mixture.GaussianMixture(2)
Next, we “fit” the model using our data set y.
gmm.fit(y)
GaussianMixture(n_components=2)
We can pull the MAP (again, it’s a MAP for a model with uniform priors) using the appropriate attributes. For example, our array mu above, in which the first row is \(\boldsymbol{\mu}\) for the first component and the second row is \(\boldsymbol{\mu}\) for the second component, is accessed using the means_ attribute of the GaussianMixture instance.
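For the fit above:

# MAP values of the centers of the two components
gmm.means_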
This is quite similar to the result we got with the full prior, which is not surprising given we had plenty of data and the model seems to be a good descriptor of the generative process. Note, though, that the order of the components may be different, since scikit-learn does not handle the label-switching nonidentifiability.
Similarly, we can get the values of \(\mathsf{\Sigma}\) using the covariances_ attribute.
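For the fit above:

# MAP values of the covariance matrices of the two components
gmm.covariances_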