Sunday, June 11, 2017

Posterior distribution of predictions in multiple linear regression

In a previous post I showed how to compute posterior predictions from multiple linear regression inside JAGS. In this post I show how to do it after JAGS, in R.

There are trade-offs doing it in JAGS or afterwards in R. If you do it in JAGS, then you know that the model you're using to generate predictions is exactly the model you're using to describe the data, because you only specify the model once. If you do it after JAGS in R, then you have to code the model all over again in R, and face the peril of making mistakes because JAGS and R parameterize distributions differently (e.g., in dnorm and dt are parameterized differently in JAGS than in R). On the other hand, if you generate the predictions in JAGS, then you have to specify all the to-be-probed x values in advance, along with the data to be fit. If instead you generate predictions in R (after JAGS), then you can re-use the already-generated MCMC chain for to-be-probed x values as much as you like, without having to run MCMC all over again. In this post, we face the peril of specifying the model in R.

Please see the previous post for background information about this specific application to predicting SAT scores from spending per student and percentage of students taking the exam. To run the R code below, please open the high-level script Jags-Ymet-XmetMulti-Mrobust-Example.R. Run the script "out of the box" so it uses the SAT data (Guber1999data.csv) and the two predictors mentioned in the previous sentence. Then, at the end of the script, append the following R code:

# Remind self of the predictors:
show( xName )
# Specify probe values of the predictors. Columns must match predictors actually
# used, in order! Specify as many probe rows as you like:
xProbe = matrix( c( 4.0 , 10 ,   # Spend , PrcntTake
                    9.0 , 10 ,   # Spend , PrcntTake
                    4.0 , 90 ,   # Spend , PrcntTake
                    9.0 , 10 ) , # Spend , PrcntTake
                 ncol=length(xName) , byrow=TRUE )
# Convert mcmc coda object to matrix:
mcmcMat = as.matrix( mcmcCoda )
# Show column names of mcmcMat. Notice they include beta0, beta[1], beta[2],
# ..., which can be extracted using grep("^beta",colnames(mcmcMat)):
colnames( mcmcMat )
# Now, go through rows of xProbe matrix and display predicted mu and y:
for ( xProbeIdx in 1:nrow(xProbe) ) {
  # Predicted mu is b0+b1*x1+b2*x2+... at every step in chain:
  muPred = ( mcmcMat[,grep("^beta",colnames(mcmcMat))]
             %*% matrix(c(1,xProbe[xProbeIdx,]),ncol=1) )
  # Predicted y is is pred mu plus t-distrib noise at every step in chain:
  yPred = muPred + mcmcMat[,"sigma"] * rt(nrow(mcmcMat),df=mcmcMat[,"nu"])
  # Make plot of posterior distribution of prediction. Notice that plotPost()
  # also returns numerical values to console.
  par( mar=c(4,1,1,1) , mgp=c(2.0,0.7,0) , xpd=NA )
  text(0,0, adj=c(0.5,0.7),
       paste0(xName,"=",xProbe[xProbeIdx,],collapse="\n"), cex=1.25 )
  xLim = quantile( yPred , probs=c(0.005,0.995) )
  muPredInfo = plotPost( muPred ,xlab="Predicted mu" , xlim=xLim ,
                 main="" )
  show( muPredInfo )
  yPredInfo = plotPost( yPred , xlab="Predicted y" , xlim=xLim ,
                main="" )
  show( yPredInfo )

The code produces output plots such as the following (along with numerical values displayed on the command line):

N.B.: The plots only display the first three significant digits; see the numerical output at the command line for more digits.


Reminders of some recent posts:

Monday, May 29, 2017

New version of BEST package (Bayesian estimation of two groups)

Mike Meredith has updated the BEST package again! 

CHANGES in v.0.5.0 (2017-05-28)

  * 'BESTmcmc' uses 'rjags' directly, instead of 'jagsUI' wrappers. This resolves 'set.seed' issues, but values returned will not be the same as with previous versions.

  * Function 'hdi' removed; imports HDInterval::hdi instead.

Mike created and updates the BEST package entirely on his own. He has implemented some wonderful functionality and protections in the BEST package that are not in the original R code I created to accompany the BEST article. Thanks Mike!

Reminders of some recent posts:

Thursday, May 25, 2017

Looking for great teachers of Bayesian statistics

I'm looking for great teachers of Bayesian statistics!
  • Have you taught a course in Bayesian statistics? 
  • Have you used DBDA2E
  • Are you an enthusiastic teacher with very high evaluations?
  • Do you have an abiding interest in a range of applied statistical methods?
If so, please send me an email ( Even if you think I should already know about you, please send me an email now so I can be sure to put you on the list. If you know of someone else who has those qualities -- maybe a colleague, or maybe a teacher in a class you've taken -- please tell me about that person.


P.S. This applies to instructors anywhere on the planet; just need to be fluent in English for teaching.

Thursday, May 4, 2017

Bent Coins for a Twisted Mind

Ben Motz voluntarily sat through an entire semester of my Bayesian stats course. In karmic retribution, after the conclusion of the course he surprised me with a set of biased coins that he designed and created with the amazing craftsmanship of Jesse Goode. Obverse and reverse:

Heads: In Bayes Rule We Trust!

Tails: Wagging at a (posterior) beta distribution
parameterized by mode and concentration.
(And notice the half-folded ears.)

Ben was intrigued by claims [see footnote 2, p. 73, of DBDA2E] that normal coins when flipped cannot be biased (unlike normal coins when spun), but bent coins when flipped can be biased. Ever the empiricist, he decided to conduct an experiment using progressively bent coins (while manifestly expressing his teacher evaluation at the same time).

A set of progressively bent coins!

Each coin was flipped 100 times, letting it land on a soft mat. The results are shown below:
Data from flipping each coin 100 times. Prior was beta(1,1).

Clearly the most acutely bent coins do not come up heads half the time. One paradoxical thing I like about a bent coin is that the less you can see of its face, the more its face comes up!

To preserve the apparatus of this classic experiment for posterity, and especially to give me something for show-and-tell at the old Bayesians' home, Jesse built a beautiful display box:

Protected by plexiglass from the thronging crowds of onlookers.

How did they manufacture these coins? It was quite a process. Starting with discs of metal, Jesse powder coated and baked them to get a smooth and secure coating. Then he used a computerized laser to burn off areas of the coating to reveal the shiny metal as background to the design. Finally, they used psychic telekenesis to bend the coins. (Ben assured me, however, that he withheld psychokenesis when flipping the coins.)

I've gotta admit this made my day, and it makes me smile and laugh even as I type this. I hope it gives you a smile too! Huge thanks to Ben Motz and Jesse Goode.

Sunday, April 16, 2017

Wednesday, April 12, 2017

New article: Bayesian for newcomers

Just published: Bayesian data analysis for newcomers.

Abstract: This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation. Bayesian ideas already match your intuitions from everyday reasoning and from traditional data analysis. Simple examples of Bayesian data analysis are presented, that illustrate how the information delivered by a Bayesian analysis can be directly interpreted. Bayesian approaches to null-value assessment are discussed. The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data.

Published version at
Accepted manuscript available at
View only: .

Saturday, April 8, 2017

Trade-off of between-group and within-group variance (and implosive shrinkage)

Background: Consider data that would traditionally be analyzed as single-factor ANOVA; that is, a continuous metric predicted variable, \(y\), and a nominal predictor, "Group." In particular, consider the data plotted as red dots here:

A Bayesian approach easily allows a hierarchical model in which both the within-group and between-group variance are estimated. The hierarchical structure imposes shrinkage on the estimated group means. All of this explained in Chapter 19 of DBDA2E

The data above come from Exercise 19.1, which is designed to illustrate "implosive shrinkage." Because there are only a few data points in each group, and there are lots of groups with little variance from baseline, a reasonable description of the data merely collapses the group means close to baseline while expanding the estimate of within-group variance. 

The purpose of the present post is to show the trade-off in the posterior distribution of the estimated within-group variance and between-group variance, while also providing another view of implosive shrinkage.

After running the data through the generic hierarchical model in Exercise 19.1, we can look at the posterior distribution of the within-group and between-group standard deviations. The code below produces the following plot:

mcmcMat = as.matrix( mcmcCoda )
plot( mcmcMat[,"aSigma"] , mcmcMat[,"ySigma"] , 
      xlab="Between-Group Sigma" , ylab="Within-Group Sigma" , 
      col="skyblue" , cex.lab=1.5 , 
      main="Trade-Off of Variances (and implosive shrinkage)" )
abline( lm( mcmcMat[,"ySigma"] ~ mcmcMat[,"aSigma"] ) )

Notice in the plot that as the between-group standard deviation gets larger, the within-group standard deviation gets smaller. Notice also the implosive shrinkage: The estimate of between-group standard deviation is "piled up" against zero.