empirical.Rd
Function to simulate empirically-derived null distributions of various methods for combining \(p\)-values using pseudo replicates.
empirical(R, method, side = 2, size = 10000, batchsize, ...)
a \(k \times k\) symmetric matrix that contains the correlations among the test statistics.
character string to specify for which method to simulate the null distribution (either "fisher"
, "stouffer"
, "invchisq"
, "binomtest"
, "bonferroni"
, or "tippett"
).
scalar to specify the sidedness of the \(p\)-values that are used to simulate the null distribution (2, by default, for two-sided tests; 1 for one-sided tests).
size of the empirically-derived null distribution that should be generated.
optional scalar to specify the batch size for generating the null distribution. When unspecified (the default), this is done in a single batch.
other arguments.
This function simulates the null distribution of a particular method for combining \(p\)-values when the test statistics that generate the \(p\)-values to be combined can be assumed to follow a multivariate normal distribution and a matrix is available that reflects the correlations among the test statistics (which is specified via the R
argument). In this case, test statistics are repeatedly simulated from a multivariate normal distribution under the joint null hypothesis, converted into one- or two-sided \(p\)-values (depending on the side
argument), and the chosen method is applied. Repeating this process size
times yields the null distribution.
If batchsize
is unspecified, the null distribution is simulated in a single batch, which requires temporarily storing a matrix with dimensions [size,k]
. When size*k
is large, allocating the memory for this matrix might not be possible. Instead, one can specify a batchsize
value, in which case a matrix with dimensions [batchsize,k]
is repeatedly simulated until the desired size of the null distribution has been obtained.
A vector of combined \(p\)-values as simulated under the joint null hypothesis for a given method.
The R
matrix must be positive definite. If it is not, the function uses nearPD
to find the nearest positive definite matrix (Higham, 2002) before simulating the null distribution.
Cinar, O. & Viechtbauer, W. (2022). The poolr package for combining independent and dependent p values. Journal of Statistical Software, 101(1), 1–42. https://doi.org/10.18637/jss.v101.i01
Higham, N. J. (2002). Computing the nearest correlation matrix: A problem from finance. IMA Journal of Numerical Analysis, 22(3), 329–343.
# create an example correlation matrix with constant positive correlations
R <- matrix(0.6, nrow = 10, ncol = 10)
diag(R) <- 1
# generate null distribution for Fisher's method (setting the seed for reproducibility)
set.seed(1234)
psim <- empirical(R, method = "fisher")
# Fisher's method is liberal in this scenario (i.e., its actual Type I error
# rate is around .14 instead of the nominal significance level of .05)
mean(psim <= .05)
#> [1] 0.142
# estimate the actual Type I error rate of the other methods in this scenario
psim <- empirical(R, method = "stouffer")
mean(psim <= .05)
#> [1] 0.1734
psim <- empirical(R, method = "invchisq")
mean(psim <= .05)
#> [1] 0.1369
psim <- empirical(R, method = "binomtest")
mean(psim <= .05)
#> [1] 0.0584
psim <- empirical(R, method = "bonferroni")
mean(psim <= .05)
#> [1] 0.0327
psim <- empirical(R, method = "tippett")
mean(psim <= .05)
#> [1] 0.0346
# Stouffer's and the inverse chi-square method also have clearly inflated
# Type I error rates and the binomial test just barely. As expected, the
# Bonferroni method is overly conservative and so is Tippett's method.