Monday, March 23, 2015

The impact of outliers on the arithmetic mean (or, do people like this book?)

Consider these ratings of a target item (1 to 5 stars):

[Figure: bar graph of rating frequencies -- two 1-star ratings, two 4-star ratings, and eight 5-star ratings, with the mean (4.17) annotated.]

Based on these ratings, what is your impression of the item? Kinda so-so? Maybe look elsewhere? That's the power of outliers on the arithmetic mean: A few outliers can really pull a mean away from the bulk of the responses. It takes a ton of ratings at the mode to counteract only a few outliers.

These are real data, of course, namely from the ratings of DBDA2E. The 1-star ratings have comments that clearly state that they are not rating the content of the book, but they are 1-star ratings nonetheless, and they have a lot of impact on the mean. If you think the mode needs bulking up, you know what to do! :-) And if you have had issues like those the 1-star raters reported, please let me know so we can attempt to rectify them. (By the way, go here for a link to a discount on the book.)

In general, how can we analyze data that have outliers? One way is to describe the data with a heavy-tailed distribution, which DBDA2E explains extensively in Chapters 16 and 17 (and ordinal data analysis is treated in Chapter 23).
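To see the pull of the outliers concretely, here is a minimal R sketch, using the same frequencies as in the plot code below, comparing the mean with the median (which is robust to the outliers):

```r
# The 12 ratings implied by the frequencies:
# two 1-star, two 4-star, eight 5-star.
ratings = c( rep(1,2) , rep(4,2) , rep(5,8) )
mean(ratings)    # 4.17 -- pulled down by the two 1-star outliers
median(ratings)  # 5    -- matches the bulk of the responses
```

Two ratings out of twelve are enough to drag the mean almost a full star below the median.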

BTW, here's the R code I used for making the graph:

x = c(1,2,3,4,5)   # star values
y = c(2,0,0,2,8)   # frequency of each star value
# Thick vertical bars via type="h" with a wide line width:
plot( x , y , type="h" , lwd=70 , lend=1 , col="gold" ,
      xlab="Stars" , ylab="Frequency" , main="Ratings" ,
      xlim=c(0.5,5.5) , ylim=c(0,9) , cex.lab=1.5 , cex.main=1.5 )
# Annotate the frequency-weighted mean at the top of the plot:
text( sum(x*y)/sum(y) , max(y) ,
      bquote(mean==.(round(sum(x*y)/sum(y),2))) , adj=c(1,1) , cex=1.5 )

Wednesday, February 25, 2015

"Is the call to abandon p-values the red herring of the replicability crisis?"

In an opinion article [here] titled "Is the call to abandon p-values the red herring of the replicability crisis?", Victoria Savalei and Elizabeth Dunn concluded, "at present we lack empirical evidence that encouraging researchers to abandon p-values will fundamentally change the credibility and replicability of psychological research in practice. In the face of crisis, researchers should return to their core, shared value by demanding rigorous empirical evidence before instituting major changes."

I posted a comment which said in part, "people have been promoting a transition away from null hypothesis significance testing to Bayesian methods for decades, long before the recent replicability crisis made headlines. The main reasons to switch to Bayesian have little directly to do with the replicability crisis." Moreover, "It is important for readers not to think that Bayesian analysis merely amounts to using Bayes factors for hypothesis testing instead of using p values for hypothesis testing. In fact, the larger part of Bayesian analysis is a rich framework for estimating the magnitudes of parameters (such as effect size) and their uncertainties. Bayesian methods are also rich tools for meta-analysis and cumulative analysis. Therefore, Bayesian methods achieve all the goals of the New Statistics (Cumming, 2014) but without using p values and confidence intervals."

See the full article and comment at the link above.

Monday, February 23, 2015

Journal bans null hypothesis significance tests

In a recent editorial [here], the journal Basic and Applied Social Psychology has banned the null hypothesis significance testing procedure (NHSTP). "We hope and anticipate that banning the NHSTP will have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP thinking thereby eliminating an important obstacle to creative thinking. The NHSTP has dominated psychology for decades; we hope that by instituting the first NHSTP ban, we demonstrate that psychology does not need the crutch of the NHSTP, and that other journals follow suit."

In a short bit about Bayesian analysis, the editorial says, "The usual problem with Bayesian procedures is that they depend on some sort of Laplacian assumption to generate numbers where none exist." I think here the editors are too focused on Bayesian hypothesis tests instead of on the much broader application of Bayesian methods to parameter estimation. For example, in the 750 pages of DBDA2E, I never mention the Laplacian assumption because the procedures do not depend on it. Despite their narrow view of Bayesian methods, I am encouraged by this bold move, which might help dislodge NHST.

Sunday, February 8, 2015

I've got variable Y that I want to predict from variables X1, X2, etc. What should I do?

For questions like yours -- "I've got variable Y that I want to predict from variables X1, X2, etc. What should I do?" -- the best answer is usually informed by background knowledge of the domain. Generic models, like multiple linear regression, don't always provide the most meaningful answer.

For example, suppose you're trying to predict the amount of fencing (Y) you'll need for rectangular lots of length X1 and width X2. Then a linear regression would serve you well. Why? Because we know (from background knowledge) that perimeter is a linear function of length and width: Y = 2*X1 + 2*X2.

But suppose you're trying to predict how much grass seed you'll need for the same lot. Then you'd want a model that includes the product of X1 and X2, because X1*X2 is the area of the lot.
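To make these two cases concrete, here is a small R sketch with made-up lot dimensions; a plain linear model recovers the fencing relation, while the grass-seed relation needs the product term:

```r
X1 = c(10, 20, 30, 40)   # lot lengths (hypothetical)
X2 = c( 5, 25, 15, 35)   # lot widths  (hypothetical)
fence = 2*X1 + 2*X2      # perimeter: linear in X1 and X2
area  = X1 * X2          # grass seed scales with area: the product X1*X2
# A linear model recovers the fencing relation exactly...
coef( lm( fence ~ X1 + X2 ) )         # intercept 0, slopes 2 and 2
# ...but the area relation needs the X1:X2 product term:
coef( lm( area ~ X1 + X2 + X1:X2 ) )  # the X1:X2 coefficient is 1
```

The point is not the particular numbers but the structure: the right model form comes from knowing what Y physically is, not from defaulting to additive terms.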

As another example, suppose you're trying to predict the installed length of a piece of pipe (Y) as a function of the date (X). You know that pipe expands and contracts as some function of temperature. And you also know that temperature cycles sinusoidally (across the seasons of a year) as a function of date. So, to predict pipe length as function of date, you'd use some trend that incorporates the expansion function on top of a sinusoidal function of date.
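A minimal R sketch of such a trend (all parameter names and default values here are hypothetical, just to show the structure):

```r
# Hypothetical pipe-length model: linear thermal expansion applied to a
# sinusoidal annual temperature cycle. All parameter values are made up.
pipeLength = function( day , L0=10 , alpha=1.2e-5 , Tmean=15 , Tamp=12 ) {
  temp = Tmean + Tamp * sin( 2*pi * day / 365.25 )  # seasonal temperature
  L0 * ( 1 + alpha * temp )                          # linear expansion
}
# In a Bayesian analysis, L0, alpha, Tmean, and Tamp would be parameters
# estimated from the (day, length) data.
pipeLength( c(0, 91, 182) )  # predicted lengths at a few days of the year
```

The composed function (expansion on top of a sinusoid) is the "trend" in the regression, and its parameters are what the data inform.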

Whatever model you end up wanting, it can probably be implemented in JAGS (or BUGS or Stan). That's one of the beauties of the Bayesian approach with its general-purpose MCMC software.

Monday, January 26, 2015

Institutionalized publication thresholds, p values, and XKCD

XKCD today is about p values (see image at right). I think that what XKCD is pointing out is not so much a problem with p values as with strongly institutionalized publication thresholds and the ritual of mindless statistics, as Gigerenzer would say. The same problem could arise with strongly institutionalized publication thresholds for Bayes factors, or even for HDI-and-ROPEs. One thing that's nice about the HDI-and-ROPE approach is that it's explicitly about magnitude and uncertainty, to help nudge thinking away from mindless decision thresholds.

(Thank you to Kevin J. McCann for pointing me to XKCD today.)

P.S. added 30-January-2015: Gigerenzer has a new article, extending the one linked above to Bayes factors: Surrogate Science: The Idol of a Universal Method for Scientific Inference. Gerd Gigerenzer and Julian N. Marewski. Journal of Management, Vol. 41, No. 2, February 2015, 421–440.
In this article, we make three points.
1. There is no universal method of scientific inference but, rather, a toolbox of useful statistical methods. In the absence of a universal method, its followers worship surrogate idols, such as significant p values. The inevitable gap between the ideal and its surrogate is bridged with delusions—for instance, that a p value of 1% indicates a 99% chance of replication. These mistaken beliefs do much harm: among others, by promoting irreproducible results.
2. If the proclaimed “Bayesian revolution” were to take place, the danger is that the idol of a universal method might survive in a new guise, proclaiming that all uncertainty can be reduced to subjective probabilities. And the automatic calculation of significance levels could be revived by similar routines for Bayes factors. That would turn the revolution into a re-volution— back to square one.
These first two points are not “philosophical” but have very practical consequences, because
3. Statistical methods are not simply applied to a discipline; they change the discipline itself, and vice versa. In the social sciences, statistical tools have changed the nature of research, making inference its major concern and degrading replication, the minimization of measurement error, and other core values to secondary importance.