Simulate Empirically-Derived Null Distributions

Function to simulate empirically-derived null distributions of various methods for combining \(p\)-values using pseudo replicates.

empirical(R, method, side = 2, size = 10000, batchsize, ...)

Arguments

R: a \(k \times k\) symmetric matrix that contains the correlations among the test statistics.
method: character string to specify for which method to simulate the null distribution (either "fisher", "stouffer", "invchisq", "binomtest", "bonferroni", or "tippett").
side: scalar to specify the sidedness of the \(p\)-values that are used to simulate the null distribution (2, by default, for two-sided tests; 1 for one-sided tests).
size: size of the empirically-derived null distribution that should be generated.
batchsize: optional scalar to specify the batch size for generating the null distribution. When unspecified (the default), this is done in a single batch.
...: other arguments.

Details

This function simulates the null distribution of a particular method for combining \(p\)-values when the test statistics that generate the \(p\)-values to be combined can be assumed to follow a multivariate normal distribution and a matrix is available that reflects the correlations among the test statistics (which is specified via the R argument). In this case, test statistics are repeatedly simulated from a multivariate normal distribution under the joint null hypothesis, converted into one- or two-sided \(p\)-values (depending on the side argument), and the chosen method is applied. Repeating this process size times yields the null distribution.

If batchsize is unspecified, the null distribution is simulated in a single batch, which requires temporarily storing a matrix with dimensions [size,k]. When size*k is large, allocating the memory for this matrix might not be possible. Instead, one can specify a batchsize value, in which case a matrix with dimensions [batchsize,k] is repeatedly simulated until the desired size of the null distribution has been obtained.

Value

A vector of combined \(p\)-values as simulated under the joint null hypothesis for a given method.

Note

The R matrix must be positive definite. If it is not, the function uses nearPD to find the nearest positive definite matrix (Higham, 2002) before simulating the null distribution.

Author

Ozan Cinar ozancinar86@gmail.com
Wolfgang Viechtbauer wvb@wvbauer.com

References

Cinar, O. & Viechtbauer, W. (2022). The poolr package for combining independent and dependent p values. Journal of Statistical Software, 101(1), 1–42. https://doi.org/10.18637/jss.v101.i01

Higham, N. J. (2002). Computing the nearest correlation matrix: A problem from finance. IMA Journal of Numerical Analysis, 22(3), 329–343. https://doi.org/10.1093/imanum/22.3.329

Examples

# create an example correlation matrix with constant positive correlations
R <- matrix(0.6, nrow = 10, ncol = 10)
diag(R) <- 1

# generate null distribution for Fisher's method (setting the seed for reproducibility)
set.seed(1234)
psim <- empirical(R, method = "fisher")

# Fisher's method is liberal in this scenario (i.e., its actual Type I error
# rate is around .14 instead of the nominal significance level of .05)
mean(psim <= .05)
#> [1] 0.142

# estimate the actual Type I error rate of the other methods in this scenario
psim <- empirical(R, method = "stouffer")
mean(psim <= .05)
#> [1] 0.1734
psim <- empirical(R, method = "invchisq")
mean(psim <= .05)
#> [1] 0.1369
psim <- empirical(R, method = "binomtest")
mean(psim <= .05)
#> [1] 0.0584
psim <- empirical(R, method = "bonferroni")
mean(psim <= .05)
#> [1] 0.0327
psim <- empirical(R, method = "tippett")
mean(psim <= .05)
#> [1] 0.0346

# Stouffer's and the inverse chi-square method also have clearly inflated
# Type I error rates and the binomial test just barely. As expected, the
# Bonferroni method is overly conservative and so is Tippett's method.