Function to carry out Stouffer's method.

stouffer(p, adjust = "none", R, m,
         size = 10000, threshold, side = 2, batchsize, nearpd = TRUE, ...)

Arguments

p

vector of length \(k\) with the (one- or two-sided) p-values to be combined.

adjust

character string to specify an adjustment method to account for dependence. The default is "none", in which case no adjustment is applied. Methods "nyholt", "liji", "gao", or "galwey" are adjustments based on an estimate of the effective number of tests (see meff). Adjustment method "empirical" uses an empirically-derived null distribution using pseudo replicates. Finally, method "generalized" uses a generalization of Stouffer's method based on multivariate theory. See ‘Details’.

R

a \(k \times k\) symmetric matrix that reflects the dependence structure among the tests. Must be specified if adjust is set to something other than "none". See ‘Details’.

m

optional scalar (between 1 and \(k\)) to manually specify the effective number of tests (instead of estimating it via one of the methods described above).

size

size of the empirically-derived null distribution. Can also be a numeric vector of sizes, in which case a stepwise algorithm is used. This (and the following arguments) are only relevant when adjust = "empirical".

threshold

numeric vector to specify the significance thresholds for the stepwise algorithm (only relevant when size is a vector).

side

scalar to specify the sidedness of the \(p\)-values that are used to simulate the null distribution (2, by default, for two-sided tests; 1 for one-sided tests).

batchsize

optional scalar to specify the batch size for generating the null distribution. When unspecified (the default), this is done in a single batch.

nearpd

logical indicating if a negative definite R matrix should be turned into the nearest positive definite matrix (only relevant when adjust = "empirical" or adjust = "generalized").

...

other arguments.

Details

Stouffer's Method

By default (i.e., when adjust = "none"), the function applies Stouffer's method to the \(p\)-values (Stouffer et al., 1949). Letting \(p_1, p_2, \ldots, p_k\) denote the individual (one- or two-sided) \(p\)-values of the \(k\) hypothesis tests to be combined, the test statistic is then computed with \[z = \sum_{i = 1}^k z_i / \sqrt{k}\] where \(z_i = \Phi^{-1}(1 - p_i)\) and \(\Phi^{-1}(\cdot)\) denotes the inverse of the cumulative distribution function of a standard normal distribution. Under the joint null hypothesis, the test statistic follows a standard normal distibution which is used to compute the combined \(p\)-value.

Stouffer's method assumes that the \(p\)-values to be combined are independent. If this is not the case, the method can either be conservative (not reject often enough) or liberal (reject too often), depending on the dependence structure among the tests. In this case, one can adjust the method to account for such dependence (to bring the Type I error rate closer to some desired nominal significance level).

Adjustment Based on the Effective Number of Tests

When adjust is set to "nyholt", "liji", "gao" or "galwey", Stouffer's method is adjusted based on an estimate of the effective number of tests (see meff for details on these methods for estimating the effective number of tests). In this case, argument R needs to be set to a matrix that reflects the dependence structure among the tests.

There is no general solution for constructing such a matrix, as this depends on the type of test that generated the \(p\)-values and the sidedness of these tests. If the \(p\)-values are obtained from tests whose test statistics can be assumed to follow a multivariate normal distribution and a matrix is available that reflects the correlations among the test statistics, then the mvnconv function can be used to convert this correlation matrix into the correlations among the (one- or two-sided) \(p\)-values, which can then be passed to the R argument. See ‘Examples’.

Once the effective number of tests, \(m\), is estimated based on R using one of the four methods described above, the test statistic of Stouffer's method can be modified with \[\tilde{z} = \sqrt{\frac{m}{k}} \times z\] which is then assumed to follow a standard normal distibution.

Alternatively, one can also directly specify the effective number of tests via the m argument (e.g., if some other method not implemented in the poolr package is used to estimate the effective number of tests). Argument R is then irrelevant and doesn't need to be specified.

Adjustment Based on an Empirically-Derived Null Distribution

When adjust = "empirical", the combined \(p\)-value is computed based on an empirically-derived null distribution using pseudo replicates (using the empirical function). This is appropriate if the test statistics that generated the \(p\)-values to be combined can be assumed to follow a multivariate normal distribution and a matrix is available that reflects the correlations among the test statistics (which is specified via the R argument). In this case, test statistics are repeatedly simulated from a multivariate normal distribution under the joint null hypothesis, converted into one- or two-sided \(p\)-values (depending on the side argument), and Stouffer's method is applied. Repeating this process size times yields a null distribution based on which the combined \(p\)-value can be computed, or more precisely, estimated, since repeated applications of this method will yield (slightly) different results. To obtain a stable estimate of the combined \(p\)-value, size should be set to a large value (the default is 10000, but this can be increased for a more precise estimate). If we consider the combined \(p\)-value an estimate of the ‘true’ combined \(p\)-value that would be obtained for a null distribution of infinite size, we can also construct a 95% (pseudo) confidence interval based on a binomial distribution.

If batchsize is unspecified, the null distribution is simulated in a single batch, which requires temporarily storing a matrix with dimensions [size,k]. When size*k is large, allocating the memory for this matrix might not be possible. Instead, one can specify a batchsize value, in which case a matrix with dimensions [batchsize,k] is repeatedly simulated until the desired size of the null distribution has been obtained.

One can also specify a vector for the size argument, in which case one must also specify a corresponding vector for the threshold argument. In that case, a stepwise algorithm is used that proceeds as follows. For j = 1, ..., length(size),

  1. estimate the combined \(p\)-value based on size[j]

  2. if the combined \(p\)-value is \(\ge\) than threshold[j], stop (and report the combined \(p\)-value), otherwise go back to 1.

By setting size to increasing values (e.g., size = c(1000, 10000, 100000)) and threshold to decreasing values (e.g., threshold = c(.10, .01, 0)), one can quickly obtain a fairly accurate estimate of the combined \(p\)-value if it is far from significant (e.g., \(\ge\) .10), but hone in on a more accurate estimate for a combined \(p\)-value that is closer to 0. Note that the last value of threshold should be 0 (and is forced to be inside of the function), so that the algorithm is guaranteed to terminate (hence, one can also leave out the last value of threshold, so threshold = c(.10, .01) would also work in the example above). One can also specify a single threshold (which is replicated as often as necessary depending on the length of size).

Adjustment Based on Multivariate Theory

When adjust = "generalized", Stouffer's method is computed based on a multivariate normal distribution that accounts for the dependence among the tests, assuming that the test statistics that generated the \(p\)-values follow a multivariate normal distribution. In that case, R needs to be set equal to a matrix that contains the covariances among the \(z_i\) values. If a matrix is available that reflects the correlations among the test statistics, this can be converted into the required covariance matrix using the mvnconv function. See ‘Examples’.

This generalization of Stouffer's method is sometimes called Strube's method, based on Strube (1986), although the paper only describes the method for combining one-sided \(p\)-values. Both one- and two-sided versions of Strube's method are implemented in poolr, but caution must be exercised when applying it to two-sided \(p\)-values (even if the test statistics follow a multivariate normal distribution, \([z_1, z_2, \ldots, z_k]\) is then not multivariate normal, but this is implicitly assumed by the method).

Value

An object of class "poolr". The object is a list containing the following components:

p

combined \(p\)-value.

ci

confidence interval for the combined \(p\)-value (only when adjust = "empirical"; otherwise NULL).

k

number of \(p\)-values that were combined.

m

estimate of the effective number of tests (only when adjust is one of "nyholt", "liji", "gao" or "galwey"; otherwise NULL).

adjust

chosen adjustment method.

statistic

value of the (adjusted) test statistic.

fun

name of calling function.

Note

The methods underlying adjust = "empirical" and adjust = "generalized" assume that the test statistics that generated the \(p\)-values to be combined follow a multivariate normal distribution. Hence, the matrix specified via R must be positive definite. If it is not and nearpd = TRUE, it will be turned into one (based on Higham, 2002, and a slightly simplified version of nearPD from the Matrix package).

Author

Ozan Cinar ozancinar86@gmail.com
Wolfgang Viechtbauer wvb@wvbauer.com

References

Cinar, O. & Viechtbauer, W. (2022). The poolr package for combining independent and dependent p values. Journal of Statistical Software, 101(1), 1–42. https://doi.org/10.18637/jss.v101.i01

Higham, N. J. (2002). Computing the nearest correlation matrix: A problem from finance. IMA Journal of Numerical Analysis, 22(3), 329–343.

Stouffer, S. A., Suchman, E. A., DeVinney, L. C., Star, S. A., & Williams, R. M., Jr. (1949). The American Soldier: Adjustment During Army Life (Vol. 1). Princeton, NJ: Princeton University Press.

Strube, M. J. (1985). Combining and comparing significance levels from nonindependent hypothesis tests. Psychological Bulletin, 97(2), 334–341.

Examples

# copy p-values and LD correlation matrix into p and r
# (see help(grid2ip) for details on these data)
p <- grid2ip.p
r <- grid2ip.ld

# apply Stouffer's method
stouffer(p)
#> combined p-values with:      Stouffer's method
#> number of p-values combined: 23 
#> test statistic:              5.915 ~ N(0,1) 
#> adjustment:                  none 
#> combined p-value:            1.655433e-09 

# use mvnconv() to convert the LD correlation matrix into a matrix with the
# correlations among the (two-sided) p-values assuming that the test
# statistics follow a multivariate normal distribution with correlation
# matrix r (note: 'side = 2' by default in mvnconv())
mvnconv(r, target = "p", cov2cor = TRUE)[1:5,1:5] # show only rows/columns 1-5
#>            [,1]       [,2]        [,3]        [,4]        [,5]
#> [1,] 1.00000000 0.02160864 0.022809124 0.010804322 0.097238896
#> [2,] 0.02160864 1.00000000 0.013205282 0.000000000 0.042016807
#> [3,] 0.02280912 0.01320528 1.000000000 0.006002401 0.006002401
#> [4,] 0.01080432 0.00000000 0.006002401 1.000000000 0.000000000
#> [5,] 0.09723890 0.04201681 0.006002401 0.000000000 1.000000000

# adjustment based on estimates of the effective number of tests
stouffer(p, adjust = "nyholt", R = mvnconv(r, target = "p", cov2cor = TRUE))
#> combined p-values with:      Stouffer's method
#> number of p-values combined: 23 
#> test statistic:              5.785 ~ N(0,1) 
#> adjustment:                  effective number of tests (m = 22; Nyholt, 2004) 
#> combined p-value:            3.62e-09 
stouffer(p, adjust = "liji",   R = mvnconv(r, target = "p", cov2cor = TRUE))
#> combined p-values with:      Stouffer's method
#> number of p-values combined: 23 
#> test statistic:              5.652 ~ N(0,1) 
#> adjustment:                  effective number of tests (m = 21; Li & Ji, 2005) 
#> combined p-value:            7.91e-09 
stouffer(p, adjust = "gao",    R = mvnconv(r, target = "p", cov2cor = TRUE))
#> combined p-values with:      Stouffer's method
#> number of p-values combined: 23 
#> test statistic:              5.915 ~ N(0,1) 
#> adjustment:                  effective number of tests (m = 23; Gao, 2008) 
#> combined p-value:            1.66e-09 
stouffer(p, adjust = "galwey", R = mvnconv(r, target = "p", cov2cor = TRUE))
#> combined p-values with:      Stouffer's method
#> number of p-values combined: 23 
#> test statistic:              5.516 ~ N(0,1) 
#> adjustment:                  effective number of tests (m = 20; Galwey, 2009) 
#> combined p-value:            1.73e-08 

# setting argument 'm' manually
stouffer(p, m = 12)
#> combined p-values with:      Stouffer's method
#> number of p-values combined: 23 
#> test statistic:              4.273 ~ N(0,1) 
#> adjustment:                  effective number of tests (m = 12; user-defined) 
#> combined p-value:            9.65e-06 

# adjustment based on an empirically-derived null distribution (setting the
# seed for reproducibility)
set.seed(1234)
stouffer(p, adjust = "empirical", R = r)
#> combined p-values with:      Stouffer's method
#> number of p-values combined: 23 
#> test statistic:              5.915 ~ N(0,1) 
#> adjustment:                  empirical distribution (size = 10000) 
#> combined p-value:            5e-04 (95% CI: 0.000162, 0.00117) 

# generate the empirical distribution in batches of size 100
stouffer(p, adjust = "empirical", R = r, batchsize = 100)
#> combined p-values with:      Stouffer's method
#> number of p-values combined: 23 
#> test statistic:              5.915 ~ N(0,1) 
#> adjustment:                  empirical distribution (size = 10000) 
#> combined p-value:            0.0012 (95% CI: 0.00062, 0.0021) 

# using the stepwise algorithm
stouffer(p, adjust = "empirical", R = r, size = c(1000, 10000, 100000), threshold = c(.10, .01))
#> combined p-values with:      Stouffer's method
#> number of p-values combined: 23 
#> test statistic:              5.915 ~ N(0,1) 
#> adjustment:                  empirical distribution (size = 100000) 
#> combined p-value:            0.00074 (95% CI: 0.000581, 0.000929) 

# use mvnconv() to convert the LD correlation matrix into a matrix with the
# covariances among the (two-sided) 'z_i' values assuming that the
# test statistics follow a multivariate normal distribution with correlation
# matrix r (note: 'side = 2' by default in mvnconv())
mvnconv(r, target = "z")[1:5,1:5] # show only rows/columns 1-5
#>        [,1]   [,2]   [,3]   [,4]   [,5]
#> [1,] 1.0000 0.0243 0.0256 0.0117 0.1073
#> [2,] 0.0243 1.0000 0.0146 0.0001 0.0473
#> [3,] 0.0256 0.0146 1.0000 0.0065 0.0068
#> [4,] 0.0117 0.0001 0.0065 1.0000 0.0003
#> [5,] 0.1073 0.0473 0.0068 0.0003 1.0000

# adjustment based on generalized method
stouffer(p, adjust = "generalized", R = mvnconv(r, target = "z"))
#> combined p-values with:      Stouffer's method
#> number of p-values combined: 23 
#> test statistic:              3.69 ~ N(0,1) 
#> adjustment:                  Strube's method 
#> combined p-value:            0.000112 

# when using mvnconv() inside stouffer() with adjust = "generalized", the
# 'target' argument is automatically set and doesn't need to be specified
stouffer(p, adjust = "generalized", R = mvnconv(r))
#> combined p-values with:      Stouffer's method
#> number of p-values combined: 23 
#> test statistic:              3.69 ~ N(0,1) 
#> adjustment:                  Strube's method 
#> combined p-value:            0.000112