Wednesday, August 20, 2014

Converting combination of random variables to hierarchical form for JAGS (BUGS, Stan, etc.)

An emailer asks:

Hi, John. Long-time listener, first-time caller... I have a model that says X is a function of three (independent) random variables:
X ~ normal(mu, sigma) / uniform(a,a+b) - beta(v,w)
and I also have N random samples of X. Can I use JAGS to estimate mu, sigma, a, b, v and w? Or is this problem outside its scope? Not sure if it's even possible.

My reply:

In other words, you want

x <- r / u - b
r ~ normal( rMean , rSD )
u ~ uniform( uLow , uHigh )
b ~ beta( bA , bB )

but the problem is that JAGS won't accept that sort of model specification.

One solution is to change to an equivalent hierarchical formalization:

x ~ normal( rMean/u-b , rSD/u )
u ~ uniform( uLow , uHigh )
b ~ beta( bA , bB )

This hierarchical form is directly implementable in JAGS. But be careful that in JAGS the normal is parameterized by precision, not SD, so you'll need to write x ~ dnorm( rMean/u-b , 1/(rSD/u)^2 ).

To demonstrate that the two forms are equivalent, here is a Monte Carlo sample generated in R from each formalization:

# Specify parameter values and sample size:
rMean=7 ; rSD=2
uLow=3 ; uHigh=5
bA=4 ; bB=3
N = 50000

# Sample x values your original way:
b = rbeta(N, bA,bB)
u = runif(N, uLow,uHigh)
r = rnorm(N, rMean,rSD)
x1 = r/u-b

# Sample x values a new way:
x2 = rep(0,N)
for ( i in 1:N ) {
  b = rbeta(1, bA,bB)
  u = runif(1, uLow,uHigh)
  x2[i] = rnorm(1, rMean/u-b , rSD/u )

# Plot results:
text( mean(x1) , 0 , "+" , cex=2 )
text( mean(x2) , 0 , "+" , cex=2 )

The result is shown below, where you can see that the original and new distributions are identical:

Friday, August 15, 2014

How to use MCMC posterior as prior for future data

An emailer writes:

Dear Prof. Kruschke,
Hello. My name is ... and I am ... . I'm trying to apply Bayesian theorem in developing a model of ... . I used your code to estimate posterior distribution without any trouble. Here is my question. Would you kindly let me know how to use the estimated posterior distribution as the prior of another Bayesian update? ...  Any comment would be greatly appreciated. Thank you very much for your time. Regards,
 My reply:

Thanks very much for your interest in the book.

Because MCMC methods require a mathematically specified prior, but generate a Monte Carlo sample of the posterior, you need to either (a) find a reasonable mathematical summary of the MCMC posterior to use as the mathematical prior for the next batch of data or (b) concatenate the previous data with the next batch of data and analyze them together.

Option (b) gives you the mathematically correct answer, but at the cost of a large data set which might slow down the MCMC process, and it demands that the previous and next batch of data are structured the same way and analyzed with the same model.

For option (a), there is no uniquely correct approach. For each parameter, examine its marginal MCMC posterior distribution and summarize it with a reasonable mathematical function. For example, if the parameter is metric on positive values, maybe a gamma distribution would mimic the MCMC distribution well. Use a mathematical distribution with the same central tendency and standard deviation as the MCMC posterior. A peril of mimicking each marginal distribution separately is that you lose all information about the correlation of parameters in the posterior. If the correlation is important for capturing the prior information you want, then you can use a multivariate prior for two or more parameters simultaneously (e.g., dmnorm in JAGS).

Hope this helps.

P.S. (Added an hour or so after the text above.) A similar thing was done in the book with pseudo-priors. See the middle paragraph of p. 251: "The shape and rate constants for the gamma pseudopriors were set so that the gamma distributions had the same mean and standard deviation as the posterior sample of kappa values. For example, consider the parameter kappa0. In the initial run without pseudopriors, a reasonable posterior sample was obtained for model index 2 (i.e., when kappa0 is actually constrained by the data, as shown in the bottom panel of Figure 10.3). That posterior sample had a mean m and a standard deviation s. The conversion from mean and standard deviation to shape and rate parameters was explained in the caption of Figure 9.8. The conversion yields shape and rate values specified in lines 55 to 56 of the code."

Tuesday, August 12, 2014

Stopping and testing intentions in p values

An emailer asks:

I am really interested in Bayesian analysis, but I don't get the issue of sampling intention being so important in frequentist t-tests; if you have 60 values you have 60 values surely - why does your intention matter? The computer does not know what I intended to do; or have I missed the point entirely!?

That is exactly the point -- intuitively (for natural Bayesians) the stopping and testing intentions should not matter (for some things), but for p values the stopping and testing intention is of the essence, sine qua non. In lieu of reading the BEST article, see this video, starting at 7:15 minutes:

One other quick question if I may; how does the BEST package differ from the BayesFactor package?

An appendix of the BEST article explains, but you can instead see this video, starting at 2:35 minutes:

Those video excerpts might make the most sense if you just start with the first and watch them through, all the way...

Friday, August 8, 2014

Prior for normality (df) parameter in t distribution

A routine way to describe outliers in metric data is with a heavy-tailed t distribution instead of with a normal distribution. The heaviness of the tails is governed by a normality parameter, ν, also called the df parameter. What is a reasonable prior for ν? One answer: A shifted exponential distribution, which is what I used in the "BEST" article and software. But the book (on pp. 432-433) seems to use a different and obscure prior distribution. The purpose of this post is to show that the book's prior on the normality parameter is, in fact, a shifted exponential.

The book's prior on ν, there called tdf, is
  tdf <- 1 - tdfGain*log(1-udf)
  udf ~ dunif(0,1)
where tdfGain is a constant. In other words, tdf is a strange logarithmic transformation of udf, and udf has a uniform prior. Well, that's the same thing as a shifted exponential on tdf, with tdfGain being its mean.

To prove that claim, we could do some math, as suggested by the method in Section 23.4 (p. 630) regarding reparameterization. But instead I'll show a large Monte Carlo sample of tdf from the book, and superimpose a shifted exponential density, and you can see that they are identical:

In each of the above graphs, the title indicates the value of tdfGain, labelled as nuGain. A large number of values, udf, were randomly generated from a uniform distribution, and then those values were transformed to tdf (=ν) values using the formula tdf <- 1 - tdfGain*log(1-udf). The histogram plots the resulting tdf (=ν) values. The title of each graph also indicates the mean of Monte Carlo sample, which should be nuGain+1 except for randomly sampling noise. Superimposed on the histogram is a exponential density with mean set to nuGain, and shifted 1 unit to the right (because ν must be > 1). Thus, you can see that the book's prior for tdf really is a shifted exponential, as used in the BEST article.

Appendix: Here is the code for generating the graphs.
for ( nuGain in c(3,30,100) ) {
  # Random sample of nu from book prior:
  nu = 1 - nuGain*log( 1 - runif(50000) )
  # Plot histogram:
  xMax = 5*nuGain
  histInfo = hist( nu , probability=TRUE ,
                   col="skyblue" , border="white" ,
                   xlab=bquote(nu) , xlim=c(0,xMax) ,
                   cex.lab=1.5 , cex.main = 1.5 ,
                   breaks=c(0,seq(1,xMax,length=30),1E10) ,
                   main=bquote( list( nuGain==.(nuGain) ,
                                      mean(nu)==.(round(mean(nu),2)) ) ) )
  # Superimpose shifted exponential curve with same mean-1:
  x = seq( 1 , xMax , length=201 )   # Make x-axis comb.
  lines( x , dexp( x-1 , 1/(nuGain) ) , lwd=2 )
  text( mean(nu) , dexp( mean(nu)-1 , 1/(nuGain ) ) ,
        bquote("curve: dexp( "*nu*"-1 , mean="*.(nuGain)*" )") ,
        adj=c(-0.1,-0.1) , cex=1.5 )

Tuesday, April 1, 2014

Potpourri 2: more misc emails regarding doing Bayesian data analysis

More miscellaneous communications with readers. (I have omitted all the salutations and pleasantries to save space here, but most correspondents do begin and end their messages with introductions and salutations that I truly appreciate.) Again, apologies to those I don't reply to -- too many!

  • Adapt ANOVA-style models for mixed between- within- designs? 
  • glm() in R, with a poem 
  • Metropolis algorithm
  • I have some data, what should I do?
  • Could you look over my conference paper?
  • Time series models?
  • How to write up a Bayesian analysis
  • Trying to adapt programs, getting errors
  • Modifying FilconJags.R and FilconJagsPower.R for two groups instead of four groups
  • Sequential updating (including more data) when using MCMC
  • How to take the difference between two parameter distributions
  • What book should be studied after Doing Bayesian Data Analysis?
  • Exercise 13.4

March 9: Adapt ANOVA-style models for mixed between- within- designs?
My data set comprises two populations: patients and control subjects.
All subjects were repeatedly assessed - albeit in different conditions, because the treatment manipulation of the patients is not applicable for controls.
While our main interest pertains the patients' treatment conditions (4 levels), I'd like to compare them with the "natural variation" observed in the control group, who just repeated our experiments twice.
Thus abstractly, the patient group is split by a within-factor of 4 levels P1-P4, and the controls are split by a 2-level within factor, C1 and C2. What's the best method to compare these, assuming normal distributed response variables?

Currently I use one-way BANOVAs, extended by a subject factor to account for repeated measures. But with these models (4 and 2 levels, respectively), I think I can only model the two population groups separately in different models, which hinders direct comparisons between parameter estimates.

My questions in particular:
1) Can/should I extend the BANOVA model to include both populations?
2) Or is it possible to compare the parameters of different BANOVA models? If so, how?
3) Does this kind of design have a specific name? This might be handy to know to search for modeling examples.
Thank you very much in advance!
I talked to a colleague, who was of the opinion that I can directly compare posteriors, provided that they are in the same units and play the same "role" in the model.
In my example, it would be admissible to compare the b0 parameters, i.e. the BANOVA baselines from different models. Using that comparison, I would at least be able to state whether the groups are different from each other on average.

Specifically, the comparison has to be directly computed from the MCMC chains. For each pair of values, randomly sampled from the chains, I would compute the difference, and thus build up a distribution of differences. This then may be inspected with the usual methods, e.g. using its 95% HDI.

While this sounds correct to me, I'd like to know some more details. How many samples should/can I draw from the chains? Should I inspect the autocorrelation? Any help and/or pointers to literature are greatly appreciated!

Regarding the model structure, yes, you should be able modify the ANOVA-style models to suit the structure of your data. This is not to say it is necessarily trivial or obvious how to do that, especially when trying to impose the sum-to-zero constraints (if you choose to do so). For example, see this blog post about a split-plot design, which is related to the sort of design you have:

Regarding computing differences: At every step in the MCMC chain, you can compute differences or ratios or other transformations of parameters in whatever way is meaningful. At each step in the chain, the parameters are *jointly* representative of the posterior distribution.

If you have split your data into two parts that are modeled separately, then the parameter values are, by definition, independent. You can then set the two chains side-by-side and take differences between them as if they were in a joint space.

March 11: glm() in R, with a poem
n R question that I cannot get to the bottom of. I was hoping that you might be able to shed some light on it, as everyone I ask is drawing a blank. 

It is rather simple actually, I have two factors, and I want to do a logistic regression against a (0,1) target using glm in R. 

I have a 0,1 target and then use some explanatory variables which are factors with levels e.g. ... When I try and perform the logistic regression it refuses to use the first level of each factor, as you can see from the screen cap jk1.png attached to this e-mail. ...  Is this something you have experience of? Am I overlooking something trivial?! Apologies if so! 
As your question is about glm() in R, and not about Bayesian analysis per se, I will defer to the R experts. You might try posting your question on or related groups.
Best wishes.
In my desperation
To elucidate the facts
I disturbed you unnecessarily
When I should have consulted stacks.

You're welcome, it's a superb book, and has helped me a great deal. Kind regards

March 1: Metropolis algorithm
I am currently doing my statistics thesis on ... which requires quite a great knowledge of Bayesian Theory especially MCMC methods. However I have some minor problems regarding the Metropolis algorithm:

Any distribution can be attributed to the proposal density function?

Also, the initial values given to the parameters (i.e. the first step of the Metropolis algorithm) are random or there must be some thinking behind the choice of values?

Thank you very much
If you are programming your own Metropolis sampler, you should study the general Metropolis-Hastings algorithm and learn about methods for "tuning" the proposal distribution during an adaptation phase. A great thing about JAGS/BUGS is that they do it for you.

The initial values can be anywhere that has non-zero prior and non-zero likelihood (actually, even that is okay - the real constraint is that the prior times likelihood needs to be defined at the initial values), but initial values near the mode of the posterior make burn-in a lot faster. In either case, you need to monitor convergence to be sure that the chains have lost any bias from their starting positions.

Good luck with your work.

March 14: I have some data, what should I do?
I'm contacting you to ask for your advise on the following problem.

I have 500 samples, each sample being ... [extended description omitted].
Using Bayesian analysis I want to determine the error rate of ... .
Below is an exemplary of the data set. [data omitted] I have never used Bayesian Analysis before so I dont really know how to proceed. I'm thinking that Bayesian analysis is suitable to answer the research question.
-The idea is to use a non informative prior, so the sample can be positive or negative.
-Not sure what distribution to use (binomial, bernoulli, multinomial,). not sure.
-The rest, I do not know.

I look forward to your reply.
Thanks very much.
Thanks for you interest in Bayesian analysis. I hope you do choose to pursue it.

The first thing you need to analyze your data is a descriptive model. The goal of the analysis is to estimate parameter values that are meaningful descriptions of the data. Once you have a model, Bayesian analysis provides a distribution of credible values of the parameters, given the data.

As your field of research is not my specialization, I'm not in a position to prescribe the model you should use. But, whatever it is, I am confident that Bayesian analysis is a richly informative method for analyzing your data.

Best wishes,

 March 16: Could you look over my conference paper?
... I will be presenting a paper at the annual meeting of the ... Association. My paper is based on your BEST software, and I was wondering if you would be willing to take a look at it to point out any major errors I may have committed and save me from potential embarrassment.
Thanks for thinking of me and showing me a draft of your paper. [And I meant that sincerely!] Unfortunately I don't have time to study its 20 pages. If you have a specific question about some topic, however, I might be able to provide a quick answer. Best of luck with your presentation!

March 16: Time series models?
Do you think it is possible to apply Bayesian data methods to time series? What I would hope to avoid is trying to splice GARCH methodology with the Bayesian approach. GARCH is lumbering and clumsy enough in itself. For high frequency financial data, it must surely be possible to assume distribution stability, until evidence begins to emerge that the distribution has broken down, and needs to be reformulated? I would greatly value your advice.
I have not ventured deeply into time-series models, so I have no personal endorsements of particular resources. But Bayesian methods can (in principle) be applied to any model that has parameters, even so-called "non-parametric" models (which have up to an infinite number of parameters). For example, there is a 2011 book titled "Bayesian time series models" that collects recent advanced applications. Although it's not a beginner's tutorial, it might provide useful pointers.

Feb. 25: How to write up a Bayesian analysis
Sir, thanks for Bayesian Estimation Supersedes the t-Test, I can make the plots, sorry but how to analyse the plots and write a paper. Will you please put up a model paper with such analyses. [Here] we have been brought up on a strict diet of ANOVA-RBD/CRD, and t tests with CV values (Analysis of Variance - Randomized Block Design / Completely Randomized design). How can I do these analyses / similar in Bayesian methods,
Reply from Mike Meredith:

I can't point you to papers describing a Bayesian analysis of ANOVA-type models, but I will indicate sources with advice and pass this on to John Kruschke who may have more references to hand.

John has a blog post on the topic at This has a reproduction of the first part of section (23.1) on "Reporting a Bayesian analysis" from Kruschke, J.K. (2011) Doing Bayesian data analysis: a tutorial with R and BUGS Elsevier, Amsterdam etc.

The main points are summarised in Kruschke, J.K., Aguinis, H., & Joo, H. (2012) The time has come: Bayesian methods for data analysis in the organizational sciences. Organizational Research Methods, 15, 722-752 in the section "Recommendations and Illustration of How to Report Bayesian Analysis Results", where they also give an example of writing up a regression analysis.

Also Kruschke, J.K. (2013) Bayesian estimation supersedes the t test. *Journal of Experimental Psychology: General*, 142, 573-603 has a section on "Reporting the results of a Bayesian analysis". You would need to cite that paper as the description of the methods for robust Bayesian analysis.

There are links to the two Kruschke papers at

In my own field (wildlife ecology) we can rarely do experiments, so hypothesis testing is scarcely relevant anyway. The focus is on estimation (of density, abundance, occupancy, survival, etc) or on modelling. In the latter context, information theoretic approaches (using Akaike's Information Criterion, AIC) have been the norm, and moving from there to a Bayesian approach is relatively painless. So I can't point you to papers reporting a Bayesian analysis of ANOVA-type data; Marc Kery's book, An Introduction to WinBUGS for Ecologists (Academic Press, 2010), shows how to do this, but not how to write it up.

April 1: Trying to adapt programs, getting errors
I have been looking through your textbook Doing Bayesian Data Analysis and specifically have been looking at how to do the Bayesian ANOVA in Chapter 18. I have been running into many problems, and I was wondering if you would have time to take a quick look over my code to see what's going wrong. I have no familiarity with R prior to this book and no experience programming, so I am not sure where my errors are. Generally speaking, the problem is that objects and functions are not being found. I have attached a the .csv of my imaginary data set and a .txt of the code I am running in RStudio in case you are able to take a look (I don't mean to presume you will, it's just quicker to already have the materials attached). I do load the .csv into RStudio before running the code. I suppose my core problem here is that I don't know how to take the code from the book using the examples there and apply the code to my own data sets for personal use. Thank you for your time,
Thanks very much for your interest in the book. To have any chance to help you, I need to know what error messages are being shown. Could you please run your script and then copy and paste the R console, with error messages, into an email? Especially important is to identify the very first error that occurs.
Then I run the code, which I have attempted (unsuccessfully) to adapt from what is in the textbook for a Bayesian ANOVA.

After running the code, this is what I get. It seems like it's failing to create several objects that it needs to perform the analysis, or is not finding the functions it needs to do it. I'm not really sure where the problem begins. I know in programming code a single error can lead to other errors, but this is my first exposure to R programming and the code in the book is for specific practice examples and I'm just not sure if I am adapting it to my pretend data set properly (which I'm most likely not given the errors).

 I also get several windows where graphs are supposed to be, but nothing is in them. I colored the errors red to make them easier to find.


> modelCheck( "model.txt" )
Error: could not find function "modelCheck"
+     fnroot = paste( fnroot , dataSource , sep="" )
+     datarecord = read.table( "model.txt", header=T ,
+                              colClasses=c("factor","numeric") )
Error: unexpected symbol in:
"    NxLvl = length(unique(datarecord$Group))
    contrastList = list(1v2"
A few quick thoughts:

* If you're just interested in comparing two groups, definitely look here:
It provides you with complete programs for that case.

* I now strongly recommend NOT using BUGS, and instead using JAGS. JAGS is much like BUGS but works better. See these posts:

* Your first error,
> modelCheck( "model.txt" )
Error: could not find function "modelCheck"
is that R cannot find BRugs. I think you need to first say
or whatever (I've forgotten since it's been so long that I've used BRugs).

* Your second error is that datarecord() is reading in model.txt, not data.
Hope that helps.

March 25: Modifying FilconJags.R and FilconJagsPower.R for two groups instead of four groups
The R code assumes the existence of four conditions (consistent with the ongoing example), but the data file has data for only two, so the code fails when it attempts to access the non-existent data. I can remove the code for handling the two conditions not in the data, but I assume that the intent was to have all four in the Rdata file. Is there a version of FilconJagsMuKappa.Rdata that has the expected four conditions?
If I understand your question correctly, what you are trying to do is apply the FilconJags.R and FilconJagsPower.R programs to data with only 2 groups instead of 4 groups. I think the programs should run without complaint for just 2 groups, although there won't be much benefit (i.e., shrinkage) from the hierarchical structure when there are only 2 groups.

Try this:
In FilconJags.R, include only the first 2 groups. Find the data section and insert the lines shown here:

# For each subject, specify the condition s/he was in,
# the number of trials s/he experienced, and the number correct.
# (These lines intentionally exceed the margins so that they don't take up
# excessive space on the printed page.)
cond = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4)
N = c(64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64)
z = c(45,63,58,64,58,63,51,60,59,47,63,61,60,51,59,45,61,59,60,58,63,56,63,64,64,60,64,62,49,64,64,58,64,52,64,64,64,62,64,61,59,59,55,62,51,58,55,54,59,57,58,60,54,42,59,57,59,53,53,42,59,57,29,36,51,64,60,54,54,38,61,60,61,60,62,55,38,43,58,60,44,44,32,56,43,36,38,48,32,40,40,34,45,42,41,32,48,36,29,37,53,55,50,47,46,44,50,56,58,42,58,54,57,54,51,49,52,51,49,51,46,46,42,49,46,56,42,53,55,51,55,49,53,55,40,46,56,47,54,54,42,34,35,41,48,46,39,55,30,49,27,51,41,36,45,41,53,32,43,33)

# Include only the first two groups:
includeIndex = ( cond <= 2 )
cond = cond[includeIndex]
N = N[includeIndex]
z = z[includeIndex]

nSubj = length(cond)
nCond = length(unique(cond))
Then source FilconJags.R. This will save the MCMC chain for the 2-group model. It will also try to produce graphs, but because the graphs assume four groups, the program will balk. But the posterior MCMC is saved before that happens.

FilconJagsPower.R will now run as is, except for two things. First, its graphics assume 4 groups. Therefore you can make this change:

    # Make plots for first 10 simulated experiments:
    #if ( expIdx <= 10 ) { plotRes = T } else { plotRes = F }
And, its "goals" are defined in terms of 4 groups, so you can make this change:

# Goal is filtration vs condensation 95% HDI exceeds zero:
#goal1Ach = ( HDIofMCMC( (mu[1,]+mu[2,])/2 - (mu[3,]+mu[4,])/2 )[1] > 0.0 )
# Goal is filtration1 vs filtration2 95% HDI exceeds zero:
goal2Ach = ( HDIofMCMC( mu[1,]-mu[2,] )[1] > 0.0 )
# Goal is condensation1 vs condensation2 95% HDI has width less than 0.2:
#HDIlim = HDIofMCMC( mu[3,]-mu[4,] )
#HDIwidth = HDIlim[2] - HDIlim[1]
#goal3Ach = ( HDIwidth < 0.2 )
#goalAchieved = c( goal1Ach , goal2Ach , goal3Ach )
goalAchieved = c( goal2Ach )

I hope that steers you toward achieving your goals!

April 1: Sequential updating (including more data) when using MCMC
I'm sorry to bother you, but I have a question for which I did a lot of researches, but I couldn't get any result. Basically, I implemented a Hierarchal Bayesian model in R and it works fine. However, what I will like to do now is to apply the model on a trial by trial basis (instead of applying it on the whole set of trials) and get the parameters estimations after each trial to use them both as "prior" parameters on the next trial and to track the evolution of these parameters. I do not want to use sequential methods, I just want to apply the model on a trial-by-trial methods. Do you think you can give some idea?
If you are using MCMC methods, and you really want to use the posterior from one run as the prior for the next, it is not directly possible. This is because the posterior distribution from the first run will usually be complex (e.g., curved and correlated in parameter space, with possibly skewed tails and so on) and not possible to be exactly expressed mathematically in the prior specification. (The beauty of using conjugate priors is that the posterior has the same mathematical form as the prior, so incremental updating is elegantly easy. For complex models, there are no [known] conjugate priors.)
Some people attempt to get around that problem by approximating the posterior of the first run with some simpler mathematical form, and using that mathematical re-description as the prior for the next run. But this method is only as good (or bad) as the approximation, and I wouldn't recommend it unless you have really cogent reasons to think that the approximation is excellent.
If you are merely illustrating how the posterior changes with incremental updating, then you can simply do the data analysis using incremental data sets. Do an analysis with {x1,...xN}, then an analysis using the original prior with {x1,...,xN,xN+1}, then an analysis using the original prior with {x1,...xN,xN+1,xN+2}, and so on.

March 28: How to take the difference between two parameter distributions
one thing that has me perplexed right now is how you take the difference between two posterior distributions to see if there is a change. One very common application to us is to have similar data on a specific metric before and after a ... change. Naturally, the intent of our analysts is to understand if there has been a difference after that ... change. Our first step is to calculate the posteriors of the two datasets (before and after). Then I assumed that the posteriors are a random variable so they are compared by forming the convolution of the two distributions. However, if you just visualize two normal distributions (the posteriors) where one has a mean near zero and the other is shifted (say, to 5), the convolution will still cover zero. So although these two datasets are clearly different and the software change has had an impact, the HDI will still straddle zero and the conclusion will be that there was no change. Is this method of using convolution how you did your poster comparisons in the textbook? If not, I would very much like to know the methods for taking this difference like you did with respect to recall and classical music in your text (chapter 12). Thanks very much for you time,
At every step in the MCMC chain, the parameter values at that step are jointly representative of the multidimensional posterior distribution. Therefore the difference of relevant parameter values at each step is a representative difference.

I think the first example of this in the book occurs at the top of p. 176.

Because posterior distributions can have strange shapes, it is important to compute the difference at each step, maintaining the joint parameter values. See Fig. 12.3, p. 300.

March 27: What book should be studied after Doing Bayesian Data Analysis?
What do you recommend as a good second book [after Doing Bayesian Data Analysis]? I’m a long time R user and my problem space is in engineering.
For a follow-up book, I'd recommend
Ntzoufras (2009) Bayesian modeling using WinBUGS
Gelman et al. (2014) Bayesian data analysis 3rd ed.
You might also try
Lee & Wagenmakers (2014) Bayesian cognitive modeling.

None of those is specifically engineering, but Ntzoufras and Lee & Wagenmakers give lots of BUGS/JAGS examples and Gelman et al. is a great authoritative resource.

March 27: Exercise 13.4
Exercise 13.4 refers to Kruscke1996CSbugs.R; I don’t see it (or the Jags version) in the collection of R programs and data. The associated data file is there (Kruschke1996CSdatsum.Rdata), however. Is it not there on purpose (i.e., the student is expected to write the code from scratch)?
The program for that exercise was deliberately excluded from the posted programs because it was supposed to be a programming exercise, not merely "click the 'source' button and watch what happens." I should not have mentioned the program name in the exercise description, but I did because that was how I got my word processing software (LaTeX) to include code snippets. The program I have for that exercise is old and not prepared for release, so I would prefer not to circulate it. I apologize that this is not the answer you were hoping for!

Saturday, March 22, 2014

Potpourri of recent inquires about doing Bayesian data analysis

I get a lot of email asking about the book and the blog. Too often, I say to myself, "That's interesting. I'll have to think about it and reply later." And then later never comes. My apologies to everyone to whom I have not yet replied. Below is a sample of a few of the messages I've received in the last few days, actually with replies.

  • Initializing chains for multi-factor ANOVA-style models.
  • Errors in BUGS (another reason to use JAGS).
  • Getting the decimal places of the HDI limits that don't appear in the graphs.
  • Effect size measure in ANOVA-style log-linear models.
  • Variable selection in multiple regression.

March 22, 2014:
Dear Prof. Kruschke,

by now I've opened and closed your contact form many a times, and have been successfully able to tackle each problem I had at the time using your book. Now, though, I am stumped. I am calculating a Poisson Exponential "ANOVA" for 4 nominal predictors and the code for initializing the chains troubles me. For that matter, my question is regarding any and all situations in which there are more than two nominal predictors present. The code to initialize the linear predictor sum is this:

linpred = as.vector( outer( a1 , a2 , "+" ) + a0 )

I understand the concept of the outer() function for two variables, but could you please explain how you would deal with adding a3 and a4 into this mix, assuming the rest of the code has been updated accordingly?

Kind regards,


Thanks very much for your interest in the book.

One approach is to try not to initialize manually, letting JAGS initialize for you. You have to check that the chains have burned in well and converged. But if it works, problem solved.

But if you really want to manually initialize, you have a few choices. One option is not to bother with the interaction terms. That is, initialize the main effects with lines like
a0 = mean( theData$y )
a1 = aggregate( theData$y , list( theData$x1 ) , mean )[,2] - a0
a2 = aggregate( theData$y , list( theData$x2 ) , mean )[,2] - a0

but leave the interaction terms all initialized at zero.

If you really, really want to initialize the interactions, you can use the outer() function to build higher dimensional arrays. To get help, type
at R's command line. As an example, consider this:

> a=1:2
> b=3:5
> c=6:9
> outer(a,b,"+")
     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    5    6    7
> outer(c,outer(a,b,"+"),"+")
, , 1

     [,1] [,2]
[1,]   10   11
[2,]   11   12
[3,]   12   13
[4,]   13   14

, , 2

     [,1] [,2]
[1,]   11   12
[2,]   12   13
[3,]   13   14
[4,]   14   15

, , 3

     [,1] [,2]
[1,]   12   13
[2,]   13   14
[3,]   14   15
[4,]   15   16

Hope that helps!

March 20, 2014:
Hi John,

First of all I would like to mention that I am really enjoying reading the book.

I have studied a bit of Bayesian statistics in a university course and I find really interesting to see real world applications of MCMC.

I have been doing the exercises okay but I have come across a stumbling block in chapter Nine.

On exercise 9.2 when I changed the data parameters as per exercise solutions I do get the following error message. These errors appear when I run the FilconBrugs.R script in its original form as well

'Initializing chain 1:
initial values loaded and chain initialized but another chain contain uninitialized variables
Initializing chain 2:
initial values loaded and chain initialized but another chain contain uninitialized variables
Initializing chain 3:
model is initialized,

After this error then I encounter the following one:

"update error for node <theta[4]> algorithm conjugate beta updater error argument two  has too small a value"

Basically the model does not produce any posterior. If you could let me know what the error could be due to it would be much appreciated. I'm using R studio and I haven't come across any errors in the rest of exercises and code that I have ran so far.

Kind Regards,


Thanks very much for your interest in the book.

First off, I strongly recommend using JAGS instead of BUGS. I have repeatedly found that BUGS yields troubles when JAGS works fine. Here is the link to a blog post about using JAGS instead of BUGS:

Both of the errors you encounter are from BUGS. The statements about intializing the chains are not errors, but merely status reports. It's simply going through the chains and initializing them, but letting you know that after chain 1 there are still 2 more chains to go, and after chain 1 there is still another chain to go.

The bit about "update error for node..." is a genuine error. As I recall, I do not get that sort of error in JAGS. There are work-arounds in BUGS, by truncating the distribution or the values of higher-level parameters. But give JAGS a try and see if that solves the problem.

Thanks again.

March 20, 2014:
Good afternoon Mr. Kruschke,

I have a  questions about two-way ANOVA from your book. On page 522, you run an example using salary data and you mention that the means and HDI limits are displayed with only three significant digits, but more precise values can be obtained directly from the program.  How do I get those more precise values?

Thank you for help,

Thanks very much for your interest in the book.

The 22-Jun-2012 version of plotPost.R (see programs at ) produces a summary of the posterior as its returned value. For example, if you use the R statement

postInfo = plotPost( ... )

then postInfo has the mean, median, mode, HDI limits, etc.

Don't forget that even though multiple decimal points are there, they are not necessarily meaningful. The values are estimates from a random MCMC sample from the posterior, so they are noisy!

Thanks again.

March 19, 2014:
Dear Prof. Krushke, I'm working with ch 22 of DBDA to analyse contingency tables. I'm curious to know where to look for a measure of effect size, both for the whole table and for individual interactions. Any tips you have would be much appreciated!


Thanks for your interest in the book.

My short answer is that I don’t have an adequate answer. There are many relevant measures of effect size in the frequentist literature, of course, but we seek something for hierarchical Bayesian log-linear models. A measure of effect size is supposed to be an aggregated difference (i.e., the “effect”) standardized relative to some indicator of variance or noise in the data. Moreover, in a Bayesian setting, the effect size has a posterior distribution; it’s not just a point estimate. An example of posterior effect size for a difference between groups is given here:
One possibility for describing effect size in hierarchical Bayesian ANOVA-style models (including log-linear models) is a ratio of estimated variance parameters. For example, the main effect of factor A might have an “effect size” given by sigma_A divided by sigma_y, where sigma_A is the scale parameter of the distribution of the factor-A deflections, and sigma_y is the scale parameter of the within-cell noise distribution. But that’s just off the top of my head.
Thanks again,

March 18, 2014:
John, I really enjoyed reading your post [] as, similar to you, I feel that when it comes to real data there is a lot of fine tuning that is usually inadequately addressed in papers/books.

I have a problem where the posterior probability of delta=1 is always above 0.5 (even when the respective variable should clearly not be selected). I was wondering whether you made progress with your issues and perhaps can share some light on mine.



Thanks for the comment.

I’m not sure how you know that the “respective variable should clearly not be selected.” If you already know that, does that mean that you have prior knowledge that is not being expressed in the prior distribution of the model?

My impression of variable selection is that it’s a bit of an art, in the sense that the prior matters! But that’s true of Bayesian model selection in general. Real prior knowledge should be implemented if possible, and, in the case of variable selection, shrinkage from hierarchical structure should be implemented if appropriate.

Friday, March 14, 2014

Bayesian estimation and precision as the goal for data collection (expanded)

Precision as the goal for data collection, talk at U.C. Irvine, March 14, 2014.

Part 1: Rejecting null is not enough, also need estimate and precision. Bayesian estimation supersedes confidence intervals and "the new statistics".

Part 2: Two Bayesian ways to assess a null value. Highest density interval with region of practical equivalence. Bayesian model comparison and Bayes factor.

Part 3: Biased estimation in sequential testing and optional stopping.

Part 4: Monte Carlo study of biased estimation in sequential testing and optional stopping.

Parts 3 and 4 are elaborations of this previous blog post.