Tuesday, April 19, 2016

Bayesian estimation of log-normal parameters - Update

Using the log-normal density can be confusing because it's parameterized in terms of the mean and precision of the log-scale data, not the original-scale data. Thus, if your data, y, are nicely described by a log-normal distribution, the estimated mean and precision are for log(y), not y. To get the corresponding parameters for the original-scale data, you have to transform the estimated parameters. And the prior on the mean and precision have to be appropriate for log(y), as opposed to y. This post shows an example of a JAGS program (with rjags) for estimating the parameters of log-normal distributed data.
 
This post updates the R+JAGS code from a previous post. Please see that previous post for more background. The previous post had code appropriate for the 1st edition of the book; this post updates a few details for use with the 2nd edition of the book. In particular, the new code shows new MCMC diagnostic plots and uses a more appropriate prior on meanOfLogY. Here are some of its output graphs; R code follows below.


Here is the full R script:

#------------------------------------------------------------------------------

# Jags-Ymet-Xnom1grp-MlogNormal-Script.R
# April 19, 2016. John K. Kruschke.
# Requires auxiliary R scripts that accompany the book,
# Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:
# A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.

# graphics.off()
# rm(list=ls(all=TRUE))

source("DBDA2E-utilities.R")
fileNameRoot="Jags-Ymet-Xnom1grp-MlogNormal-Script"
graphFileType="png"

#------------------------------------------------------------------------------
# THE DATA.

# Generate some random data from known parameter values:
set.seed(47405)
trueLogM = 5.0 # true mean of log(y)
trueLogSD = 0.5 # true sd of log(y)
y = rnorm( n=125 )  # normal distribution of log-scale values
LogY = (y-mean(y))/sd(y)*trueLogSD + trueLogM
y = exp(LogY) # y is exponentiated values of log(y)

# OR JUST PUT IN YOUR ACTUAL DATA HERE:
# y = ...

# Package the data for shipping to JAGS:
dataList = list(
  y = y ,
  N = length(y) ,
  meanOfLogY = mean(log(y)) ,
  sdOfLogY = sd(log(y))
)

#------------------------------------------------------------------------------
# THE MODEL.

modelstring = "
  model {
    for( i in 1 : N ) {
      y[i] ~ dlnorm( muOfLogY , 1/sigmaOfLogY^2 )
    }
  sigmaOfLogY ~ dunif( 0.001*sdOfLogY , 1000*sdOfLogY )

  muOfLogY ~ dnorm( meanOfLogY , 1/(10*sdOfLogY)^2 ) # updated 8/16/2017
  muOfY <- exp(muOfLogY+sigmaOfLogY^2/2)
  modeOfY <- exp(muOfLogY-sigmaOfLogY^2)
  sigmaOfY <- sqrt(exp(2*muOfLogY+sigmaOfLogY^2)*(exp(sigmaOfLogY^2)-1))
}
" # close quote for modelstring
writeLines(modelstring,con="model.txt")

#------------------------------------------------------------------------------
# INTIALIZE THE CHAINS.
# Let JAGS do it

#------------------------------------------------------------------------------
# RUN THE CHAINS

require(rjags)

parameters = c("muOfLogY" , "sigmaOfLogY" , "muOfY" , "modeOfY" , "sigmaOfY" )
adaptSteps = 1000         # Number of steps to "tune" the samplers.
burnInSteps = 1000        # Number of steps to "burn-in" the samplers.
nChains = 3               # Number of chains to run.
numSavedSteps=20000       # Total number of steps in chains to save.
thinSteps=1               # Number of steps to "thin" (1=keep every step).
nPerChain = ceiling( ( numSavedSteps * thinSteps ) / nChains ) # Steps per chain.
# Create, initialize, and adapt the model:
jagsModel = jags.model( "model.txt" , data=dataList ,
                        n.chains=nChains , n.adapt=adaptSteps )
# Burn-in:
cat( "Burning in the MCMC chain...\n" )
update( jagsModel , n.iter=burnInSteps )
# The saved MCMC chain:
cat( "Sampling final MCMC chain...\n" )
mcmcCoda = coda.samples( jagsModel , variable.names=parameters ,
                            n.iter=nPerChain , thin=thinSteps )

#------------------------------------------------------------------------------
# EXAMINE THE RESULTS

# Display diagnostics of chain, for specified parameters:
parameterNames = varnames(mcmcCoda) # get all parameter names
for ( parName in parameterNames ) {
  diagMCMC( codaObject=mcmcCoda , parName=parName ,
            saveName=fileNameRoot , saveType=graphFileType )
}


# Convert coda-object codaSamples to matrix object for easier handling.
mcmcChain = as.matrix( mcmcCoda )
chainLength = NROW(mcmcChain)

openGraph(width=10,height=6)
layout(matrix(1:6,nrow=2,byrow=TRUE))
# posterior predictive
hist( dataList$y , xlab="y" , main="Data w. Post. Pred." , breaks=30 ,
      col="pink" , border="white" , prob=TRUE , cex.lab=1.5)
pltIdx = floor(seq(1,chainLength,length=20))
xComb = seq( min(dataList$y) , max(dataList$y) , length=501 )
for ( chnIdx in pltIdx ) {
  lines( xComb ,
         dlnorm( xComb, mcmcChain[chnIdx,"muOfLogY"], mcmcChain[chnIdx,"sigmaOfLogY"] ),
         col="skyblue" )
}
# param's of log(y)
postInfo = plotPost( mcmcChain[,"muOfLogY"] , xlab="mu of log(y)" )
postInfo = plotPost( mcmcChain[,"sigmaOfLogY"] , xlab="sigma of log(y)" )
# param's of y
postInfo = plotPost( mcmcChain[,"modeOfY"] , xlab="mode of y" )
postInfo = plotPost( mcmcChain[,"muOfY"] , xlab="mu of y" )
postInfo = plotPost( mcmcChain[,"sigmaOfY"] , xlab="sigma of y" , cenTend="mode")

saveGraph(file=fileNameRoot,type=graphFileType)

#-------------------------------------------------------------------------------

13 comments:

  1. Hi John,
    Thank you very much for this code. It is helping me to estimate a log-normal distribution of some real data I want to analyze. Once you have run your code on some data Y, how can you ask the predicted posterior distribution about the probability that the data are above a certain value?
    Kind regards

    ReplyDelete
    Replies
    1. If I understand your question, do it as follows. First, call the value you're wondering about v. Then, at every step in the MCMC chain, compute the cumulative probability distribution in the lognormal distribution above v. That gives you a posterior distribution on the predicted probability that y>v. The cumulative probability above v is 1-Phi((log(v)-mu)/sigma), for mu and sigma on the log(y) scale.

      Delete
  2. I can do it like this (I want to calculate prob x>30 for two data-sets):
    grid <- seq(0,100,.1)
    plot(grid,dlnorm(grid,0.231,1.44),type="l",xlab="Compound (microgr/m3)",ylab="f(x)", ylim=c(0,0.05), lwd=2)
    lines(grid,dlnorm(grid,1.95,1.58),col='red', lwd=3)
    abline(v=30, col='blue', lwd=2)
    legend("topright",c("July-2006 Density","January-2017 Density"),lty=1,col=1:2)

    JUL <- plnorm(30, meanlog = 0.231, sdlog = 1.44, lower.tail = FALSE, log.p = FALSE)
    JAN <- plnorm(30, meanlog = 1.95, sdlog = 1.58, lower.tail = FALSE, log.p = FALSE)

    JAN/JUL

    But, is there a different way to do it (asking all posterior draws)?

    ReplyDelete
    Replies
    1. I forgot to add that I obained the meanlog and sdlog values from your code: I used meanlog= mu of log(y) and sdlog= sigma of log(y).

      Delete
  3. hai. i want to ask a question. if my data or likelihood was normal and my prior are believe to be log normal, can i use this r code?

    ReplyDelete
    Replies
    1. No. But you could easily use code from DBDA2E (Ch 16) with a log-normal prior on mu.

      Delete
  4. I must commend that this code has been helpful in carrying out my project analysis. But, when i try running the code it gives me error in jags.model and also error in muOfLogY. Please, what could have been wrong?

    ReplyDelete
    Replies
    1. I copied and pasted the R script from the blog post into RStudio and it ran without problem, so I do not know why it does not work for you. Conceivably, there is a font mismatch from your web browser to RStudio, so make sure that special symbols such as "~" are appearing properly in RStudio.

      Delete
    2. I did the same thing but the issue i have now is that it pops up an error message that 'unnamed variable in data list' on running the line jagsModel = jags.model( "model.txt" , data=dataList ,
      n.chains=nChains , n.adapt=adaptSteps). Kindly help on possible solutions to the problem

      Delete
  5. Thank you very much!
    I have a question. If I want to estimate the difference between two conditions, is it appropriate to do the comparison with the transformed data in the original-scale? Or should I stick with the log scale?

    Best regards

    ReplyDelete
  6. Marine Science StudentApril 5, 2019 at 1:06 PM

    Hi John, thanks for all your hard work explaining this material in a way that is comprehensible. I was curious about your choice for the prior on the Mu of Log Y as being normally distributed. You do state that "the prior on the mean and precision have to be appropriate for log(y), as opposed to y" but it is not clear why the normal distribution would be an appropriate selection for this prior. I am curious why the normal distribution would be more appropriate for this parameter vs. a uniform distribution (or other distributions)? I'd love to better understand when you have a minute. Thank you!

    ReplyDelete
  7. John,

    I stumbled on this post and it was incredibly useful. Working my way through your book. I'm no poet but hopefully you'll accept a silly limerick as thanks.

    There was an old guy named Chuck,
    With Bayes Rule he pressed his luck.
    Thank gawd for John's book,
    It was worth a look!
    The posterior saved was Chuck's

    ReplyDelete
    Replies
    1. 😄 Fabulous! Thanks for the limerick! I hope the book serves you well. Thanks again.

      Delete