Principal component analysis and related models
Neuroscience abounds with models involving latent parameters. A latent parameter is a parameter or random variable that cannot be observed directly. Latent parameters are central to both encoding and decoding problems, which are at the heart of neuroscience.
As a reminder, an encoding problem addresses how stimuli or other inputs produce a neuronal response. That is, in an encoding problem, we seek to infer neuronal activity given an observed stimulus or behavior. By contrast, in a decoding problem, we seek to infer what stimulus produced an observed neuronal activity, or what behavior will result from it. In both cases, latent variables link stimuli and behaviors to neuronal activity, and vice versa.
As a simple example, we can imagine observing, e.g., by calcium imaging, the activity of many neurons in the brain. There may be only a few simple “causes” of this observed activity.1 In the following lessons, we will see that factor analysis and the associated models of principal component analysis and probabilistic principal component analysis provide a set of formal models in which lower-dimensional latent variables may be inferred from high-dimensional observational data. For this reason, these models are sometimes referred to as models for dimensionality reduction.
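To make this concrete, here is a minimal sketch, assuming a simulated setting not taken from the text: high-dimensional “neural activity” is generated from a few latent causes through a random loading matrix, and PCA (computed via the singular value decomposition of the centered data) recovers a low-dimensional representation. The dimensions, noise level, and variable names are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials, n_neurons, n_latents = 500, 50, 3

# Latent causes Z and a loading matrix W mapping latents to neurons
# (both chosen arbitrarily for illustration).
Z = rng.normal(size=(n_trials, n_latents))
W = rng.normal(size=(n_latents, n_neurons))

# Observed activity: low-rank signal plus isotropic noise.
X = Z @ W + 0.1 * rng.normal(size=(n_trials, n_neurons))

# PCA: center the data, then take the top right singular vectors
# as the principal directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:n_latents]        # principal directions, shape (3, 50)
scores = Xc @ components.T         # low-dimensional projection, shape (500, 3)

# Fraction of variance captured by the top three components;
# it should be close to 1 because the signal is rank three.
var_explained = (S[:n_latents] ** 2).sum() / (S ** 2).sum()
```

Because the simulated data are genuinely low rank plus small noise, nearly all of the variance concentrates in the first `n_latents` principal components; this is the sense in which PCA performs dimensionality reduction.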
I am being intentional with the use of the word “cause” here. Neuroscientists use the word somewhat liberally to mean parameters on which observed data are conditioned; see, e.g., Chapter 10 of Dayan and Abbott. This is different from the more restrictive definitions of cause used in causal inference; see Judea Pearl’s book.↩︎