84 Heavy-tailed distributions and outliers revisited
In ?exr-heavy-tails-outliers, you modeled zebrafish activity with a Normal model and with a Student-t model, which allows for heavier tails. In that exercise, you used Markov chain Monte Carlo. Here, you will repeat the problem, but use optimization to get parameter estimates.
Because I am having trouble getting references to exercises working, I am repeating much of the text from ?exr-heavy-tails-outliers below.
David Prober’s lab studies sleep using zebrafish as a model organism. In a paper by Gandhi and coworkers, his lab studied the effect of a deletion in the gene coding for arylalkylamine N-acetyltransferase (aanat), a key enzyme in the rhythmic production of melatonin. Melatonin is a hormone responsible for regulation of circadian rhythms, and it is often taken as a drug to treat sleep disorders. The goal of this study was to investigate the effects of aanat deletion on sleep patterns in 5+ day old zebrafish larvae.
In one of the analyses, the authors compared the average activity of multiple fish over the course of night 6 of their lives. Here, activity is defined as the number of seconds per ten minutes that the fish is moving. You can download the results of the data processing here: https://s3.amazonaws.com/bebi103.caltech.edu/data/gandhi_et_al_night_six_activity.csv.
In performing exploratory data analysis, you will see that there are some clear outliers in activity. For at least two of these outliers, domain experts have told me that there were developmental problems with the fish.
a) Construct a model assuming that the activity of fish for each respective genotype is Normally distributed. Obtain MAP estimates for the mean activity (parameter \(\mu\)) for each genotype with credible intervals.
b) Normal models tend to fail when there are outliers because the tails of the Normal distribution are very light, so a few extreme measurements can strongly pull the estimates. If you have an experiment that you suspect may have major deviations, it is often useful to choose a generative model with heavier tails. To that end, build a model in which the activity follows a distribution with heavier tails, such as the Student-t distribution. Obtain MAP estimates for the parameter \(\mu\) with credible intervals. Comment on how the inference changes using this model. It will help to make graphical displays of the data with the inference overlaid.
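As one possible starting point for parts (a) and (b), here is a minimal sketch of MAP estimation by optimization, with an approximate credible interval for \(\mu\) obtained by treating the posterior as Normal near its maximum (a Laplace approximation). Everything specific in it is an assumption made for illustration: the priors, the helper names (`log_prior_loc_scale`, `numerical_hessian`, `map_with_interval`), the hand-rolled finite-difference Hessian, and the synthetic data, which merely stand in for one genotype's activity values.

```python
import numpy as np
import scipy.optimize
import scipy.stats as st


def log_prior_loc_scale(mu, sigma):
    """Placeholder priors (assumptions): mu ~ Normal(100, 100), sigma ~ HalfNormal(100)."""
    return st.norm.logpdf(mu, 100.0, 100.0) + st.halfnorm.logpdf(sigma, 0.0, 100.0)


def log_post_normal(params, y):
    """Unnormalized log posterior for the Normal model; params = (mu, sigma)."""
    mu, sigma = params
    return log_prior_loc_scale(mu, sigma) + np.sum(st.norm.logpdf(y, mu, sigma))


def log_post_student_t(params, y):
    """Unnormalized log posterior for the Student-t model; params = (mu, sigma, nu).

    The Gamma(shape 2, scale 10) prior on nu is an assumption.
    """
    mu, sigma, nu = params
    log_prior = log_prior_loc_scale(mu, sigma) + st.gamma.logpdf(nu, 2.0, scale=10.0)
    return log_prior + np.sum(st.t.logpdf(y, nu, mu, sigma))


def numerical_hessian(f, x, rel_step=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    h = rel_step * np.maximum(np.abs(x), 1.0)
    n = x.size
    hess = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            di = np.zeros(n)
            dj = np.zeros(n)
            di[i] = h[i]
            dj[j] = h[j]
            hess[i, j] = (
                f(x + di + dj) - f(x + di - dj) - f(x - di + dj) + f(x - di - dj)
            ) / (4.0 * h[i] * h[j])
    return hess


def map_with_interval(log_post, x0, bounds, y):
    """Return MAP parameters and an approximate 95% credible interval for params[0]."""

    def neg_log_post(p):
        return -log_post(p, y)

    res = scipy.optimize.minimize(neg_log_post, x0, method="L-BFGS-B", bounds=bounds)
    # Laplace approximation: covariance ~ inverse Hessian of -log posterior at the MAP.
    cov = np.linalg.inv(numerical_hessian(neg_log_post, res.x))
    half_width = 1.96 * np.sqrt(cov[0, 0])
    return res.x, (res.x[0] - half_width, res.x[0] + half_width)


# Synthetic stand-in data (NOT the zebrafish measurements): a bulk plus three outliers.
rng = np.random.default_rng(0)
y_fake = np.concatenate([rng.normal(50.0, 5.0, size=40), [150.0, 160.0, 170.0]])

positive = (1e-6, None)  # keep sigma and nu strictly positive
x0 = [np.median(y_fake), np.std(y_fake)]
map_norm, ci_norm = map_with_interval(
    log_post_normal, x0, [(None, None), positive], y_fake
)
map_t, ci_t = map_with_interval(
    log_post_student_t, x0 + [5.0], [(None, None), positive, positive], y_fake
)

print(f"Normal:    mu = {map_norm[0]:.2f}, 95% CI = [{ci_norm[0]:.2f}, {ci_norm[1]:.2f}]")
print(f"Student-t: mu = {map_t[0]:.2f}, 95% CI = [{ci_t[0]:.2f}, {ci_t[1]:.2f}]")
```

To use this on the real measurements, you would load the CSV file linked above (for example with `pandas.read_csv`), split the activity values by genotype, and call `map_with_interval` once per genotype and model. The column names are not given in this exercise, so inspect the file first. You should also choose priors that you can justify, and check that the optimizer reports success before trusting the interval.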