The likelihood ratio (following the Law of Likelihood) provides a measure of the relative strength of evidence for competing models/hypotheses given the observed data. Richard Royall has championed this approach. A paper by Scott Glover & Peter Dixon (2004) nicely illustrates the use of likelihood ratios in regression and ANOVA. Likelihood ratios give much more flexibility than the usual null hypothesis testing approach, allowing you to ask very specific questions about your data.
Likelihood ratios near 1 indicate that insufficient data have been collected, while a ratio of 4 represents "weak" evidence, 8 is "fairly strong", 32 is "strong". There is no cut-off value like the .05 used in null hypothesis significance testing.
The likelihood ratio for two competing models is simply the ratio of their likelihoods given the data:

λ = p(data | model 1) / p(data | model 2)

Since the likelihood for a model is inversely related to its residual sum of squares (the G&D paper gives the proof), we get the following:

λ = [(1 − R²₂) / (1 − R²₁)]^(N/2)

where R² is the amount of variance explained by each model and N is the total number of observations.
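As a quick sketch of the R²-based calculation (the R² values and N below are hypothetical, not taken from the workbook):

```python
# Illustrative only: hypothetical R-squared values, not from the workbook.
def lr_from_r2(r2_model1, r2_model2, n):
    """LR in favour of model 1: ((1 - R2_2) / (1 - R2_1)) ** (N / 2)."""
    return ((1 - r2_model2) / (1 - r2_model1)) ** (n / 2)

# Model 1 explains 80% of the variance, model 2 explains 70%, with N = 20.
lam = lr_from_r2(0.80, 0.70, 20)
print(round(lam, 2))  # prints 57.67 - strong evidence for model 1
```

Note how quickly the LR grows with N: the same R² difference with twice the observations would square the ratio.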
The likelihood ratio (LR) can also be obtained from the Akaike information criterion (AIC), which R reports for model comparison:

AIC = -2ln(L) + 2k

where L is the maximised likelihood of the model and k is the number of parameters.
The likelihood ratio is then simply available from the two AIC values:

λ = exp((AIC2 - AIC1)/2)

The AIC method takes model complexity (the number of parameters) into account, while the R² method does not. However, both methods require a correction for small samples. Note that at least 2 of the formulae given in the G&D paper for this correction contain simple errors. For the corrected LR using R² see the G&D paper, and for AIC see http://en.wikipedia.org/wiki/Akaike_criterion under AICc.
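A small sketch of the AIC route (the AIC values below are made up; the AICc formula is the standard small-sample correction):

```python
import math

def lr_from_aic(aic1, aic2):
    """LR in favour of model 1: lambda = exp((AIC2 - AIC1) / 2)."""
    return math.exp((aic2 - aic1) / 2)

def aicc(aic, n, k):
    """Small-sample corrected AIC (AICc): AIC + 2k(k + 1) / (n - k - 1)."""
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical fits: model 1 has AIC 100, model 2 has AIC 104.
print(round(lr_from_aic(100.0, 104.0), 2))  # prints 7.39
```

For small n the AICc penalty matters: with n = 20 and k = 3 it adds 1.5 to the raw AIC, which can noticeably shrink the LR for the more complex model.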
Excel and RExcel Workbook
The Excel workbook has 3 different sheets, each of which incorporates calls to R through RExcel. All cells are locked except those intended to be altered manually, but they can be unprotected (from the Review menu) without a password. Select Automatic under Calculation Options so that any changes are automatically recalculated.
Regression models
The first sheet illustrates the example from the G&D paper comparing a linear with a quadratic regression fit to the same data - see the plots below.
The data are similar, but not identical, to those used in the paper. The values in the X & Y data columns can be changed, which results in immediate recalculation of all the LRs, plots, fits, etc. (this is the joy of using RExcel).
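The mechanics of the linear-vs-quadratic comparison can be sketched as follows (toy curved data, not the values from the paper or workbook):

```python
# Fit polynomials by least squares (normal equations + Gaussian elimination)
# and compare linear vs quadratic via the SSE-ratio form of the LR.

def polyfit_sse(x, y, degree):
    """Least-squares polynomial fit; returns the residual sum of squares."""
    m = degree + 1
    # Normal equations A @ coef = b for the basis 1, x, ..., x^degree.
    A = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for r in reversed(range(m)):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, m))) / A[r][r]
    return sum((yi - sum(co * xi ** p for p, co in enumerate(coef))) ** 2
               for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 2.1, 3.5, 5.4, 8.1, 11.0, 15.2, 19.6]  # mildly curved toy data
n = len(x)
sse_lin, sse_quad = polyfit_sse(x, y, 1), polyfit_sse(x, y, 2)
lr_quad_vs_lin = (sse_lin / sse_quad) ** (n / 2)
print(lr_quad_vs_lin > 1)  # prints True: the quadratic fits the curved data better
```

Because the models are nested, the quadratic SSE can never exceed the linear SSE, so this raw LR should be tempered by the complexity correction (AICc) discussed above.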
2-way ANOVA
The next sheet uses some data that I use in teaching 2-way ANOVA, the so-called "Beergoggles" example (the more drunk males are, the more attractive they find others). The sheet calculates the LRs for specific hypotheses. The first hypothesis compares an interactive model (where we assume there is an interaction) with an additive model (main effects only). The LR of 123 strongly favours the interactive model.
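The interactive-vs-additive comparison can be sketched on a simplified 2x2 balanced design (made-up scores, not the "Beergoggles" data, which has more conditions):

```python
# Compare an interactive model (separate cell means) with an additive model
# (main effects only) via the SSE-ratio LR, in a balanced 2x2 design.

def mean(v):
    return sum(v) / len(v)

cells = {  # (factor A level, factor B level) -> scores (illustrative)
    ("sober", "male"):   [3.0, 3.4, 2.8, 3.1],
    ("sober", "female"): [3.2, 3.5, 3.0, 3.3],
    ("drunk", "male"):   [6.0, 6.4, 5.8, 6.2],  # only this cell jumps: interaction
    ("drunk", "female"): [3.3, 3.6, 3.1, 3.2],
}
n_cell = 4
scores = [s for v in cells.values() for s in v]
grand = mean(scores)
a_means = {a: mean([s for (ai, _), v in cells.items() if ai == a for s in v])
           for a in ("sober", "drunk")}
b_means = {b: mean([s for (_, bi), v in cells.items() if bi == b for s in v])
           for b in ("male", "female")}

# Full (interactive) model predicts each cell's own mean.
sse_full = sum((s - mean(v)) ** 2 for v in cells.values() for s in v)
# In a balanced design, the additive model's SSE equals the full SSE plus
# the interaction sum of squares.
sse_add = sse_full + sum(
    n_cell * (mean(v) - a_means[a] - b_means[b] + grand) ** 2
    for (a, b), v in cells.items()
)
N = len(scores)
lr_interactive = (sse_add / sse_full) ** (N / 2)
print(lr_interactive > 1)  # prints True: evidence favours the interactive model
```

The balanced-design shortcut (additive SSE = full SSE + interaction SS) avoids a separate least-squares fit of the additive model; for unbalanced designs that fit would be needed explicitly.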
The second hypothesis compares a specific contrast. The contrast sum of squares is easily calculated using:

SScontrast = (Σ cᵢX̅ᵢ)² / Σ (cᵢ²/nᵢ)

where nᵢ is the number of observations in the ith condition, X̅ᵢ is its mean and cᵢ is its contrast coefficient. In the sheet, the contrast consisting of the drunk male group (8.36) vs the remaining 5 conditions is compared with the full model (main effects + interaction). The LR value of 3.76 here represents weak evidence that the contrast is a better model than the full model. (Note that none of the values in this sheet can be altered; all cells are locked.)
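The contrast formula is straightforward to apply. In this sketch the drunk-male mean (8.36) comes from the text, but the other five means and the cell sizes are made up:

```python
# Contrast sum of squares: SS = (sum c_i * mean_i)^2 / sum(c_i^2 / n_i).
means = [8.36, 5.0, 5.2, 4.9, 5.1, 5.3]  # condition means (drunk males first; rest assumed)
ns = [8, 8, 8, 8, 8, 8]                   # observations per condition (assumed)
c = [5, -1, -1, -1, -1, -1]               # drunk males vs the remaining five

psi = sum(ci * mi for ci, mi in zip(c, means))           # contrast value
ss_contrast = psi ** 2 / sum(ci ** 2 / ni for ci, ni in zip(c, ns))
print(round(ss_contrast, 2))  # prints 70.85
```

Note the coefficients sum to zero, as contrast coefficients must.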
t test and specified effect
The final sheet shows a t test where the means and SDs for the two groups can be chosen and random normally distributed data are generated in the data column (BP - diastolic blood pressure in hypertensive patients). Pressing F9 generates new random data, and the statistics are automatically recalculated. R calculates the p value in the ANOVA summary table, and the LR is calculated for the model that includes a difference between conditions against the null model. The first λC is the LR in favour of the difference model, while its inverse below it is the LR in favour of the null model. A small p value will produce a small value for the latter.
Below this, the λC is for a specific value of the drug effect (e.g. 86 mmHg) versus the null model. This value might be the expected drug effect or the minimal drug effect of clinical importance. Note that the two models have the same number of parameters (2: the mean and variance).
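A sketch of this specified-effect comparison: one model fixes the mean at the clinically meaningful value (86 mmHg), the other at an assumed null value (90 mmHg here), so both estimate only the variance and have equal complexity. The BP readings below are made up:

```python
# LR for a specified effect (mean fixed at 86 mmHg) vs a null value (90 mmHg).
bp = [84.2, 87.5, 85.1, 88.0, 83.9, 86.4, 85.8, 87.1]  # illustrative readings

def sse(xs, mu):
    """Residual sum of squares around a fixed mean mu."""
    return sum((x - mu) ** 2 for x in xs)

n = len(bp)
lam_specified = (sse(bp, 90.0) / sse(bp, 86.0)) ** (n / 2)
print(lam_specified > 1)  # prints True: data centred near 86, so that model wins
```

Because neither mean is estimated from the data, no complexity correction is needed for this comparison.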
To the right of the sheet is shown the minimum Bayes factor and a range of posterior probabilities for the null according to prior odds. This will be explained in a future post.
Conclusion
The use of LRs is a flexible approach to assessing the strength of evidence for one model over another. It is not subject to the same difficulties encountered by p values in null hypothesis significance testing, such as multiple testing, stopping rules, and planned vs unplanned tests. Finally, the LRs from different studies can be combined by simply multiplying them together.
References:
Glover, S., & Dixon, P. (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11(5), 791-806. doi:10.3758/BF03196706