Thursday, 25 August 2011

Using RExcel for Likelihood Ratios

The likelihood ratio (Law of Likelihood) provides a measure of the relative strength of evidence for competing models/hypotheses given the observed data. Richard Royall has championed this approach, e.g. this paper. A paper by Scott Glover & Peter Dixon (2004) nicely illustrated the use of likelihood ratios in regression and ANOVA. Likelihood ratios give much more flexibility than the usual null hypothesis testing approach. They allow you to ask very specific questions about your data.

Likelihood ratios near 1 indicate that insufficient data have been collected, while a ratio of 4 represents "weak" evidence, 8 is "fairly strong", 32 is "strong". There is no cut-off value like the .05 used in null hypothesis significance testing.

The likelihood ratio:

λ = L(model 1 | data) / L(model 2 | data)

Since the likelihood for a model is inversely related to its residual variance (the G&D paper gives the proof), we get the following:

λ = [(1 - R²₂) / (1 - R²₁)]^(N/2)

where R² is the amount of variance explained by each model and N is the total number of observations.

The likelihood ratio (LR) can also be obtained from the Akaike information criterion (AIC) that R produces for model comparison:

AIC = -2ln(L) + 2k

where L is the maximised likelihood and k is the number of model parameters.

The likelihood ratio is then simply:

λ = exp((AIC2 - AIC1)/2)

The AIC method takes into account model complexity (number of parameters) while the R² method doesn't. However, both methods require a correction for small samples. Note that at least 2 of the formulae given in the G&D paper for this correction contain simple errors. For the corrected LR using R² see the G&D paper, and for AIC see http://en.wikipedia.org/wiki/Akaike_criterion under AICc.
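
Both routes are easy to compute in R with two nested lm fits. Here is a minimal sketch with invented data (not the workbook's example):

set.seed(42)
x <- 1:20                                    # invented predictor
y <- 2 + 0.5*x + 0.05*x^2 + rnorm(20)        # invented response with curvature
m1 <- lm(y ~ x)                              # linear model
m2 <- lm(y ~ x + I(x^2))                     # quadratic model
N  <- length(y)

# LR from R-squared (no penalty for the quadratic's extra parameter)
r2.1 <- summary(m1)$r.squared
r2.2 <- summary(m2)$r.squared
lambda.r2 <- ((1 - r2.1)/(1 - r2.2))^(N/2)   # LR favouring the quadratic

# LR from AIC (penalises the extra parameter)
lambda.aic <- exp((AIC(m1) - AIC(m2))/2)     # LR favouring the quadratic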


Excel and RExcel Workbook (click here to download)
The Excel workbook here has 3 different sheets, each of which incorporates calls to R through RExcel. All cells are locked except those intended to be altered manually, but sheets can be unprotected (from the Review menu) without a password. Select Automatic under Calculation Options so that any changes are automatically re-calculated.

Regression models

The first sheet illustrates the example from the G&D paper comparing a linear and a quadratic regression fit to the same data - see the plots below.

[Plots: linear and quadratic fits to the data]

The data are similar, but not identical, to those used in the paper. The values in the X & Y data columns can be changed, which results in immediate recalculation of all the LRs, the plots & fits etc. (this is the joy of using RExcel).

2-way ANOVA

The next sheet uses some data that I use in teaching 2-way ANOVA, the so-called "Beergoggles" example (the more drunk males are, the more attractive they find others). The sheet calculates the LRs for specific hypotheses. The first hypothesis compares an interactive model (where we assume there is an interaction) with an additive model (main effects only). The LR of 123 is strongly in favour of the interactive model.
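
For readers without the workbook, the same comparison can be sketched directly in R. The data frame below is an invented stand-in for the Beergoggles data (the real scores are not reproduced here):

set.seed(7)
bg <- expand.grid(alcohol = c("none", "2 pints", "4 pints"),
                  gender  = c("female", "male"))
bg <- bg[rep(1:6, each = 8), ]                # 8 invented scores per cell
bg$attract <- rnorm(48, mean = 6, sd = 1.5)   # invented attractiveness ratings

m.add <- lm(attract ~ alcohol + gender, data = bg)   # additive: main effects only
m.int <- lm(attract ~ alcohol * gender, data = bg)   # adds the interaction
lambda <- exp((AIC(m.add) - AIC(m.int))/2)           # LR favouring the interaction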

The second hypothesis compares a specific contrast. Contrasts are easily calculated using:

SS_contrast = (Σᵢ cᵢX̅ᵢ)² / Σᵢ (cᵢ² / nᵢ)

where nᵢ is the number of observations in the ith condition, X̅ᵢ is its mean, and cᵢ is the contrast coefficient for that condition. In the sheet the contrast consisting of the drunk male group (8.36) vs the remaining 5 conditions is compared with the full model (main effects + interaction). The LR value of 3.76 here represents weak evidence that the contrast is a better model than the full model. (Note: none of the values in this sheet can be altered; all cells are locked.)
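
The contrast sum of squares is a one-liner in R. A minimal sketch: only the drunk-male mean (8.36) comes from the sheet; the other means and the cell size are invented for illustration:

xbar <- c(8.36, 5.0, 5.5, 4.5, 6.0, 6.5)   # cell means (only 8.36 is real)
n    <- rep(8, 6)                          # invented cell sizes
cc   <- c(5, -1, -1, -1, -1, -1)           # drunk males vs the other 5 conditions
SS.contrast <- sum(cc * xbar)^2 / sum(cc^2 / n)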

t test and specified effect

The final sheet shows a t test where the means and SDs for the two groups can be chosen, and random normally distributed data are generated in the data column (BP - diastolic blood pressure in hypertensive patients). Pressing F9 generates new random data, and new statistics are automatically re-calculated. R calculates the p value in the ANOVA summary table, and the LR is calculated for the model that includes a difference between conditions against the null model. The first λ is the LR in favour of the difference model, while its inverse below it is the LR in favour of the null model. A small p value will produce a small value for the latter.

Below this, λC is the LR for a specific value of the drug effect (e.g. 86 mmHg) versus the null model. This value might be the expected drug effect or the minimal drug effect of clinical importance. Note that the two models have the same number of parameters (2: the mean and variance).
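
The same calculations can be sketched in plain R with invented data. The specified effect here is a hypothetical 4 mmHg drop; fixing it with an offset means the constrained model estimates no extra parameter:

set.seed(1)
bp  <- c(rnorm(20, mean = 90, sd = 8),     # control group (invented values)
         rnorm(20, mean = 86, sd = 8))     # drug group (invented values)
grp <- rep(c("control", "drug"), each = 20)

m0 <- lm(bp ~ 1)                           # null model: one common mean
mD <- lm(bp ~ grp)                         # difference model: one extra parameter
lambda <- exp((AIC(m0) - AIC(mD))/2)       # LR in favour of a difference
1/lambda                                   # LR in favour of the null

# LR for the specified effect versus the null (equal parameter counts)
mC <- lm(bp ~ 1 + offset(-4 * (grp == "drug")))
lambdaC <- exp(as.numeric(logLik(mC)) - as.numeric(logLik(m0)))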

To the right of the sheet is shown the minimum Bayes factor and a range of posterior probabilities for the null according to prior odds. This will be explained in a future post.



Conclusion
The use of LRs is a flexible approach to assessing the strength of evidence for one model over another. It is not subject to the same difficulties encountered by p values in null hypothesis significance testing, such as multiple testing, stopping rules, and planned vs unplanned tests. Finally, the LRs from different studies can be combined by simply multiplying them together: two independent studies with LRs of 4 and 8, for example, give a combined LR of 32.


References:
Glover, S., & Dixon, P. (2004) Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11(5): 791-806. doi:10.3758/BF03196706




Wednesday, 27 July 2011

Which side are you on?

Problems with orthodox statistics
Zoltan Dienes has written an interesting and stimulating paper in support of Bayesian inference over orthodox Neyman-Pearson/Fisherian inference. The paper is based upon his 2008 book (reviewed by Baguley & Kaye (2010)). In the book Dienes favoured the likelihood approach, but in this paper he appears to have moved much further towards supporting the Bayesian approach. Dienes writes well, and it's worth reading the book, which fleshes out issues only briefly mentioned in the paper.

Concern about over-reliance on p values has long been voiced by advocates of confidence intervals and effect sizes (see for example Ziliak & McCloskey (2008) and Meehl (1978)). Dienes attempts to show that orthodox inference is inconsistent and flawed, and that we have no choice but to do things differently. The issues run deep and are mired in a long history of controversy.

In his paper Dienes gives 3 research scenarios as follows, and I quote:

1. Stopping rule
You have run the 20 subjects you planned and have obtained a p value of .08. Despite predicting a difference, you know this won’t be convincing to any editor and run 20 more subjects. SPSS now gives a p of .01. Would you:
a) Submit the study with all 40 participants and report an overall p of .01?
b) Regard the study as nonsignificant at the 5% level and stop pursuing the effect in question, as each individual 20-subject study had a p of .08?
c) Use a method of evaluating evidence that is not sensitive to your intentions concerning when you planned to stop collecting subjects, and base conclusions on all the data?


2. Planned versus post hoc
After collecting data in a three-way design, you find an unexpected partial two-way interaction, specifically you obtain a two-way interaction (p = .03) for just the males and not the females. After talking to some colleagues and reading the literature, you realize there is a neat way of accounting for these results: Certain theories can be used to predict the interaction for the males but they say nothing about females. Would you:
a) Write up the introduction based on the theories leading to a planned contrast for the males, which is then significant?
b) Treat the partial two-way as nonsignificant, as the three-way interaction was not significant, and the partial interaction won’t survive corrections for post hoc testing?
c) Determine how strong the evidence of the partial two-way interaction is for the theory you put together to explain it, with no regard to whether you happen to think of the theory before seeing the data or afterwards, as all sorts of arbitrary factors could influence when you thought of a theory?


3. Multiple testing
You explore five possible ways of inducing subliminal perception as measured with priming. Each method interferes with vision in a different way. The test for each method has a power of 80% for a 5% significance level to detect the size of priming produced by conscious perception. Of these methods, the results for four are nonsignificant and one, Continuous Flash Suppression, is significant, p = .03, with a priming effect numerically similar in size to that found with conscious perception. Would you:
a) Report the test as p = .03 and conclude there is subliminal perception for this method?
b) Note that all tests are nonsignificant when a Bonferroni-corrected significance value of .05/5 is used, and conclude that subliminal perception does not exist by any of these methods?
c) Regard the strength of evidence provided by these data for subliminal perception produced by Continuous Flash Suppression to be the same regardless of whether or not four other rather different methods were tested?

Most active researchers will have personally encountered these scenarios. In each of the 3 scenarios, choosing option a) is the temptation that many researchers succumb to, and it is wrong. Choosing b) is what you should do if you adopt a disciplined Neyman-Pearson approach (as Dienes says, "maybe your conscience told you so"). Finally, choosing c) appears to be the most desirable, and Dienes argues it can only be implemented using a Bayesian approach.

The Likelihood
It is claimed that the orthodox answers (b) violate the likelihood principle and hence the axioms of probability. Well, what is the likelihood principle? It is: "All the information relevant to inference contained in data is provided by the likelihood."

The likelihood ratio, which represents the relative evidence for one theory over another, has been elevated to the "law of likelihood", a term coined by Hacking (1965) (a difficult book, not recommended). Basically, this states that the obtained data most strongly support the theory that most strongly predicted them. The strong likelihood principle can be applied to sequential experiments (e.g. Armitage et al. (2002), p. 615).

The likelihood is a relative probability, and differs from the p value. The p value is the probability of obtaining the data, or more extreme data, given a (null) hypothesis and a decision procedure. It is an area under the probability curve. In contrast, the likelihood is the height of the probability curve at the point representing the obtained data. The strength of evidence (the likelihood) can be separated from the probability of obtaining the evidence (the relative costs of the two types of error, α and β, in the Neyman-Pearson formulation). The p value in a statistical test combines and confuses the strength of evidence with the probability of obtaining that evidence - which is why we run into problems in the 3 scenarios above.

Bayesian approach using the Bayes factor
Bayesian methods aim to determine the probability of a theory given the data. The posterior odds equal the likelihood ratio multiplied by the prior odds (the odds of the theory before the data are collected). Rather than calculate the posterior odds, Dienes recommends calculating the Bayes factor. This is the ratio of likelihoods for the theory and the null (how much the data support your theory versus the null). According to Jeffreys (1961), a Bayes factor greater than 3 or less than 1/3 indicates substantial evidence for or against a theory (a Bayes factor of unity represents no evidence either way and usually indicates low sensitivity - i.e. not enough data). With "moderate numbers" a Bayes factor of 3 corresponds roughly to the conventional 5% significance level. The effect size and the shape of the prior distribution (typically uniform or normal) need to be considered in order to calculate the Bayes factor. Explicitly using effect sizes forces the researcher to think about and engage with the data in their research domain. As Dienes (2011) says: "People will think more carefully about theoretical mechanisms so as to link experiments to past research to predict relevant effect sizes. Results sections will become focused, concise, and more persuasive."

Extra data can be added at any time, for example when the Bayes factor is weak (close to 1). The probability of obtaining a misleading (wrong-direction) Bayes factor decreases as more data are collected. This is not true for p values or for the probability of making a Type I error: when the null hypothesis is true, p values are not driven in any direction as sample size increases (the Type I error rate remains at 5%), while the Bayes factor is driven towards 0.

Conclusion
Dienes has written a persuasive paper. The recommended Bayesian approach should be explored and tried out by researchers, and is easier than previous suggestions (e.g. Berger, 2003). It will be interesting to see how successful authors are at getting it past journal reviewers. The Bayesian approach is intuitively appealing: it is much easier to think about a ratio of likelihoods for competing theories than it is to appreciate the probability of obtaining extreme data assuming the null hypothesis is true. One problem is that Bayes factor calculations currently appear awkward and complicated. Hopefully this will change in time and encourage researchers to think more about their data and the theories they support.

Addendum
Another interesting approach is suggested by Goodman (1999), in which the minimum Bayes factor for the null hypothesis is calculated directly from z, t or χ². This can be compared with the conventional p value, but more than that, it tells us the maximum decrease in the prior probability of the null hypothesis that the data can produce.
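
For a z statistic Goodman's minimum Bayes factor is simply exp(-z²/2). In R, with a made-up z value:

z <- 1.96                 # hypothetical z (two-sided p ≈ .05)
minBF <- exp(-z^2/2)      # Goodman's minimum Bayes factor for the null
minBF                     # ≈ 0.15: the null's odds fall at most ~6.8-fold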

How to calculate a simple Bayes factor
Matlab code is provided at Dienes' website and has been translated into R by Baguley & Kaye (2010), also available from Kaye's website. I reproduce it here formatted (and also because copying directly from the Baguley & Kaye (2010) paper didn't work for me). First, here is an Excel file containing 3 worksheets:
1) VBA macro with graphic: enter values into the yellow cells and click the Calculate button. The graphic shows the position of the obtained mean, the null likelihood, the theory probability distribution (not taking into account the obtained data), and the product of the theory distribution and the data likelihood. This latter product does not represent a probability density as such (but is plotted anyway, and looks like a (diminutive) posterior probability distribution). The product values are integrated to provide the theory likelihood, p(D | Theory) - thanks to Zoltan Dienes for clarifying these issues (hoping I've got it right this time!)
2) RExcel, where values for statistics & parameters can be entered in the yellow cells and outputs are automatically re-calculated by Excel (have it on the Automatic calculation option). Follow the instructions in the green cells.
3) a sheet illustrating Goodman's minimum Bayes Factor.

All sheets are locked but can easily be unlocked without entering a password.



In the RExcel worksheet: after installing R and RExcel (RAndFriends is the easiest way), you need to do the following:
1. Start R from the RExcel menu under Add-Ins
2. Select the blue cell, remove the ' (apostrophe), then press F2 and then Enter (this runs the function)
3. Select the 3 blue cells, remove the ' in the formula bar, then press Ctrl + Shift + Enter


Alternatively, the following code can be run directly in R:

Bf <- function(sd, obtained, uniform, lower = 0, upper = 1,
               meanoftheory = 0, sdtheory = 1, tail = 2)
{
  # Test data can be found starting at p. 100 of Dienes (2008);
  # adapted from Baguley & Kaye (2010)
  area <- 0
  if (identical(uniform, 1)) {
    # Uniform prior for the theory: numerically integrate the likelihood
    # of the obtained mean over theta in [lower, upper]
    theta <- lower
    range <- upper - lower
    incr <- range/2000
    for (A in -1000:1000) {
      theta <- theta + incr
      dist_theta <- 1/range
      height <- dist_theta * dnorm(obtained, theta, sd)
      area <- area + height * incr
    }
  } else {
    # Normal prior for the theory (meanoftheory, sdtheory), integrated
    # over +/- 5 standard deviations
    theta <- meanoftheory - 5 * sdtheory
    incr <- sdtheory/200
    for (A in -1000:1000) {
      theta <- theta + incr
      dist_theta <- dnorm(theta, meanoftheory, sdtheory)
      if (identical(tail, 1)) {
        # One-tailed prior: no mass below zero, doubled above
        if (theta <= 0) {
          dist_theta <- 0
        } else {
          dist_theta <- dist_theta * 2
        }
      }
      height <- dist_theta * dnorm(obtained, theta, sd)
      area <- area + height * incr
    }
  }
  LikelihoodTheory <- area
  Likelihoodnull <- dnorm(obtained, 0, sd)
  BayesFactor <- LikelihoodTheory/Likelihoodnull
  list("LikelihoodTheory" = LikelihoodTheory,
       "Likelihoodnull" = Likelihoodnull,
       "BayesFactor" = BayesFactor)
}

Then the following commands can be issued:

Bf(1.09, -2.8, 1, lower=-600, upper=0)
giving:
$LikelihoodTheory = 0.00166435
$Likelihoodnull = 0.01350761
$BayesFactor = 0.1232157

Bf(1.09, -2.8, 0, meanoftheory=0, sdtheory=2, tail=1)
giving:
$LikelihoodTheory = 0.001955743
$Likelihoodnull = 0.01350761
$BayesFactor = 0.1447882

Bf(1.09, -2.8, 0, meanoftheory=0, sdtheory=2, tail=2)
giving:
$LikelihoodTheory = 0.08227421
$Likelihoodnull = 0.01350761
$BayesFactor = 6.090951

References:

Armitage, P., Berry, G. & Matthews, J.N.S. (2002) Statistical Methods in Medical Research. Oxford: Blackwell Science.

Baguley, T. & Kaye, D. (2010) Book review of Dienes' book, British Journal of Mathematical and Statistical Psychology, 63: 695–698. Available here: http://bit.ly/kq7iXh and R code available at: http://www.danny-kaye.co.uk/Docs/Dienes_functions.txt

Berger, J.O. (2003) Could Fisher, Jeffreys and Neyman Have Agreed on Testing? Statistical Science, 18(1): 1-32.

Dienes, Z. (2008) Understanding psychology as a science: An introduction to scientific and statistical inference, Palgrave Macmillan, Paperback, ISBN: 978-0-230-54231-0

Dienes, Z. (2011) Bayesian Versus Orthodox Statistics: Which Side Are You On? Perspectives on Psychological Science, 6(3): 274–290.

Goodman, S.N. (1999) Toward Evidence-Based Medical Statistics. 2: The Bayes Factor, Annals of Internal Medicine, 130(12):1005-13

Hacking, I. (1965) Logic of statistical inference, Cambridge University Press.

Jeffreys, H. (1961) The Theory of Probability, Oxford University Press.

Meehl, P. E. (1978) Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4): 806-834.

Ziliak, S.T., & McCloskey, D.N. (2008) The cult of statistical significance: How the standard error cost us jobs, justice and lives, Ann Arbor: University of Michigan Press.

Sunday, 6 March 2011

Matrix plot with confidence intervals for r values

The Iris data.
I like this way of plotting multivariate data: histograms of the variables are shown on the diagonal, the lower panels show the bivariate scatterplots with LOESS curves, and the upper panels display the correlations with their 95% confidence intervals.
[Figure: scatterplot matrix of the iris data, produced by the code below]

Here is the R code:

## put histograms on the diagonal
panel.hist <- function(x, ...)
{
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(usr[1:2], 0, 1.5))
    h <- hist(x, plot = FALSE)
    breaks <- h$breaks; nB <- length(breaks)
    y <- h$counts; y <- y/max(y)
    rect(breaks[-nB], 0, breaks[-1], y, col = "lavender", ...)
}

## put correlations & 95% CIs on the upper panels
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    r <- cor(x, y, use = "complete.obs")
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    prefix <- "r = "
    rc <- cor.test(x, y)
    rci <- rc$conf.int
    txt2 <- format(c(rci, 0.123456789), digits = digits)[1]
    txt3 <- format(c(rci, 0.123456789), digits = digits)[2]
    prefix2 <- "\nCI = "
    txt <- paste(prefix, txt, prefix2, txt2, ", ", txt3, sep = "")
    if (missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = 1)
}

pairs(iris[1:4], lower.panel = panel.smooth, cex = .8, pch = 21, bg = "steelblue",
      diag.panel = panel.hist, cex.labels = 1.2, font.labels = 2,
      upper.panel = panel.cor)

Wednesday, 16 February 2011

This is the R code used to produce the Motion Chart:

install.packages('googleVis')
library(RJSONIO)
library(googleVis)
M <- gvisMotionChart(Smoking, idvar="Disorder", timevar="Age", options=list(state='{\"showTrails\":false};'))
plot(M)
cat(M$html$chart, file="tmp.html")    #this produces html for pasting into blog
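
Note that the gvisMotionChart call above assumes a data frame named Smoking already exists in the workspace; its full contents are tabulated at the end of this post. A minimal sketch of its structure (first three rows only; the Age values appear to carry a leading "20" so that the chart has a usable time axis):

Smoking <- data.frame(
  Disorder   = c("Heart Disease", "Heart Disease", "Stroke"),
  Age        = c(2035, 2040, 2035),       # age with a leading "20" (see table)
  Type       = rep("Vascular disease", 3),
  N_per_1000 = c(1, 4, 1),
  Ratio      = c(1.000, 4.000, 1.000)
)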




The table below is taken from Figure 4 of Woloshin et al (2008)

Grey shading in the original means fewer than 1 death per 1000 women; those cells are shown as dashes below and were given ratios of 1. All ratios are >= 0.75. Values are deaths per 1000 women. (Heart Disease and Stroke fall under Vascular Disease; Lung through Cervical are Cancers; Pneumonia, Flu and AIDS under Infection; COPD under Lung Disease; the last two columns are Accidents and All Combined.)

Age  Smoking status  Heart  Stroke  Lung  Breast  Colon  Ovarian  Cervical  Pneumonia  Flu  AIDS  COPD  Accidents  All
35   Never smoker        1       -     -       1      -        -         -          -    -     1     -          2   14
35   Smoker              1       1     1       1      -        -         -          -    -     1     -          2   14
40   Never smoker        1       -     -       2      1        -         -          -    -     1     -          2   19
40   Smoker              4       2     4       2      -        -         -          -    -     1     1          2   27
45   Never smoker        2       1     1       3      1        1         -          -    -     1     -          2   25
45   Smoker              9       3     7       3      1        1         -          1    -     1     2          2   45
50   Never smoker        4       1     1       4      1        1         -          -    -     -     -          2   37
50   Smoker             13       5    14       4      1        1         -          1    -     -     4          2   69
55   Never smoker        8       2     2       6      2        2         1          1    -     -     1          2   55
55   Smoker             20       6    26       5      2        2         1          1    -     -     9          2  110
60   Never smoker       14       4     3       7      3        3         1          1    -     -     2          2   84
60   Smoker             31       8    41       6      3        3         1          2    -     -    18          2  167
65   Never smoker       25       7     5       8      5        4         1          2    -     -     3          3  131
65   Smoker             45      15    55       7      5        3         1          4    -     -    31          3  241
70   Never smoker       46      14     7       9      7        4         1          4    -     -     5          4  207
70   Smoker             66      25    61       8      6        4         1          7    -     -    44          4  335
75   Never smoker       86      30     7      10     10        5         1          8    -     -     6          7  335
75   Smoker             99      34    58      10      9        4         -         14    -     -    61          7  463

Here is the Smoking data set, based on the table above and used by the R code:

Disorder         Age   Type              N_per_1000  Ratio
Heart Disease    2035  Vascular disease           1   1.000
Heart Disease    2040  Vascular disease           4   4.000
Heart Disease    2045  Vascular disease           9   4.500
Heart Disease    2050  Vascular disease          13   3.250
Heart Disease    2055  Vascular disease          20   2.500
Heart Disease    2060  Vascular disease          31   2.214
Heart Disease    2065  Vascular disease          45   1.800
Heart Disease    2070  Vascular disease          66   1.435
Heart Disease    2075  Vascular disease          99   1.151
Stroke           2035  Vascular disease           1   1.000
Stroke           2040  Vascular disease           2   1.000
Stroke           2045  Vascular disease           3   3.000
Stroke           2050  Vascular disease           5   5.000
Stroke           2055  Vascular disease           6   3.000
Stroke           2060  Vascular disease           8   2.000
Stroke           2065  Vascular disease          15   2.143
Stroke           2070  Vascular disease          25   1.786
Stroke           2075  Vascular disease          34   1.133
Lung Cancer      2035  Cancer                     1   1.000
Lung Cancer      2040  Cancer                     4   1.000
Lung Cancer      2045  Cancer                     7   7.000
Lung Cancer      2050  Cancer                    14  14.000
Lung Cancer      2055  Cancer                    26  13.000
Lung Cancer      2060  Cancer                    41  13.667
Lung Cancer      2065  Cancer                    55  11.000
Lung Cancer      2070  Cancer                    61   8.714
Lung Cancer      2075  Cancer                    58   8.286
Breast Cancer    2035  Cancer                     1   1.000
Breast Cancer    2040  Cancer                     2   1.000
Breast Cancer    2045  Cancer                     3   1.000
Breast Cancer    2050  Cancer                     4   1.000
Breast Cancer    2055  Cancer                     5   0.833
Breast Cancer    2060  Cancer                     6   0.857
Breast Cancer    2065  Cancer                     7   0.875
Breast Cancer    2070  Cancer                     8   0.889
Breast Cancer    2075  Cancer                    10   1.000
Colon Cancer     2035  Cancer                     0   1.000
Colon Cancer     2040  Cancer                     0   1.000
Colon Cancer     2045  Cancer                     1   1.000
Colon Cancer     2050  Cancer                     1   1.000
Colon Cancer     2055  Cancer                     2   1.000
Colon Cancer     2060  Cancer                     3   1.000
Colon Cancer     2065  Cancer                     5   1.000
Colon Cancer     2070  Cancer                     6   0.857
Colon Cancer     2075  Cancer                     9   0.900
Ovarian Cancer   2035  Cancer                     0   1.000
Ovarian Cancer   2040  Cancer                     0   1.000
Ovarian Cancer   2045  Cancer                     1   1.000
Ovarian Cancer   2050  Cancer                     1   1.000
Ovarian Cancer   2055  Cancer                     2   1.000
Ovarian Cancer   2060  Cancer                     3   1.000
Ovarian Cancer   2065  Cancer                     3   0.750
Ovarian Cancer   2070  Cancer                     4   1.000
Ovarian Cancer   2075  Cancer                     4   0.800
Cervical Cancer  2035  Cancer                     0   1.000
Cervical Cancer  2040  Cancer                     0   1.000
Cervical Cancer  2045  Cancer                     0   1.000
Cervical Cancer  2050  Cancer                     0   1.000
Cervical Cancer  2055  Cancer                     1   1.000
Cervical Cancer  2060  Cancer                     1   1.000
Cervical Cancer  2065  Cancer                     1   1.000
Cervical Cancer  2070  Cancer                     1   1.000
Cervical Cancer  2075  Cancer                     0   1.000
Pneumonia        2035  Infection                  0   1.000
Pneumonia        2040  Infection                  0   1.000
Pneumonia        2045  Infection                  1   1.000
Pneumonia        2050  Infection                  1   1.000
Pneumonia        2055  Infection                  1   1.000
Pneumonia        2060  Infection                  2   2.000
Pneumonia        2065  Infection                  4   2.000
Pneumonia        2070  Infection                  7   1.750
Pneumonia        2075  Infection                 14   1.750
AIDS             2035  Infection                  1   1.000
AIDS             2040  Infection                  1   1.000
AIDS             2045  Infection                  1   1.000
AIDS             2050  Infection                  0   1.000
AIDS             2055  Infection                  0   1.000
AIDS             2060  Infection                  0   1.000
AIDS             2065  Infection                  0   1.000
AIDS             2070  Infection                  0   1.000
AIDS             2075  Infection                  0   1.000
COPD             2035  Lung disease               0   1.000
COPD             2040  Lung disease               1   1.000
COPD             2045  Lung disease               2   1.000
COPD             2050  Lung disease               4   1.000
COPD             2055  Lung disease               9   9.000
COPD             2060  Lung disease              18   9.000
COPD             2065  Lung disease              31  10.333
COPD             2070  Lung disease              44   8.800
COPD             2075  Lung disease              61  10.167
Accidents        2035  Accidents                  2   1.000
Accidents        2040  Accidents                  2   1.000
Accidents        2045  Accidents                  2   1.000
Accidents        2050  Accidents                  2   1.000
Accidents        2055  Accidents                  2   1.000
Accidents        2060  Accidents                  2   1.000
Accidents        2065  Accidents                  3   1.000
Accidents        2070  Accidents                  4   1.000
Accidents        2075  Accidents                  7   1.000