Estimate the Effective Number of Tests

Estimate the effective number of tests based on the correlation structure among the tests.

meff(R, eigen, method, ...)

Arguments

R: a \(k \times k\) symmetric matrix that reflects the correlation structure among the tests.
eigen: optional vector to directly supply the eigenvalues to the function (instead of computing them from the matrix given via R).
method: character string to specify the method to be used to estimate the effective number of tests (one of "nyholt", "liji", "gao", "galwey", or "chen"). See ‘Details’.
...: other arguments (e.g., C for method = "gao" and method = "chen"; see ‘Note’).

Details

The function estimates the effective number of tests based on one of five different methods. All methods except the one by Chen & Liu (2011) work with the eigenvalues of the \(R\) matrix supplied via the R argument (or with the eigenvalues passed directly via the eigen argument). Let \(\lambda_i\) denote the \(i\)th eigenvalue of this matrix (with \(i = 1, \ldots, k\)), sorted in decreasing order. The effective number of tests (\(m\)) is then estimated as follows.

Method by Nyholt (2004)

\[m = 1 + (k - 1) \left(1 - \frac{\mbox{Var}(\lambda)}{k}\right)\] where \(\mbox{Var}(\lambda)\) is the observed sample variance of the \(k\) eigenvalues.
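As a rough illustration of the formula above, the following base R sketch applies it to a hypothetical vector of eigenvalues (the values in ev are made up for this example and are not taken from any dataset). It is not the package's implementation, only a direct transcription of the equation followed by the rounding described in the Note.

```r
# hypothetical eigenvalues of a 4 x 4 correlation matrix (they sum to k = 4)
ev <- c(2.5, 0.75, 0.5, 0.25)
k <- length(ev)

# Nyholt (2004): m = 1 + (k - 1) * (1 - Var(lambda) / k),
# where var() is the usual sample variance (denominator k - 1)
m.raw <- 1 + (k - 1) * (1 - var(ev) / k)

# round down to the nearest integer
m <- floor(m.raw)

m.raw
#> [1] 3.21875
m
#> [1] 3
```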

Method by Li & Ji (2005)

\[m = \sum_{i = 1}^k f(|\lambda_i|)\] where \(f(x) = I(x \ge 1) + (x - \lfloor x \rfloor)\) and \(\lfloor \cdot \rfloor\) is the floor function.
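The function \(f\) can be written out directly in base R. The sketch below uses a hypothetical eigenvalue vector (ev is an assumption made for illustration) and is meant only to show how each eigenvalue contributes to the sum; it is not the package's own code.

```r
# hypothetical eigenvalues of a 4 x 4 correlation matrix
ev <- c(2.5, 0.75, 0.5, 0.25)

# f(x) = I(x >= 1) + (x - floor(x)): an indicator for x >= 1
# plus the fractional part of x
f <- function(x) as.numeric(x >= 1) + (x - floor(x))

# contribution of each (absolute) eigenvalue
contrib <- f(abs(ev))
contrib
#> [1] 1.50 0.75 0.50 0.25

# sum the contributions and round down
m <- floor(sum(contrib))
m
#> [1] 3
```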

Method by Gao et al. (2008)

\[m = \min(x) \; \mbox{such that} \; \frac{\sum_{i = 1}^x \lambda_{i}}{\sum_{i = 1}^k \lambda_{i}} > C\] where \(C\) is a pre-defined parameter which is set to 0.995 by default, but can be adjusted (see ‘Note’).
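One way to read this criterion is as a cumulative proportion of the eigenvalue sum. The sketch below uses a hypothetical helper, gao_sketch(), and made-up eigenvalues (both are assumptions for illustration, not part of the package) to find the smallest number of leading eigenvalues whose share exceeds \(C\).

```r
# hypothetical eigenvalues, sorted in decreasing order
ev <- sort(c(2.5, 0.75, 0.5, 0.25), decreasing = TRUE)

# hypothetical helper: smallest x such that the first x eigenvalues
# account for more than a proportion C of the total
gao_sketch <- function(ev, C = 0.995) {
   prop <- cumsum(ev) / sum(ev)
   which(prop > C)[1]
}

# cumulative proportions for reference
cumsum(ev) / sum(ev)
#> [1] 0.6250 0.8125 0.9375 1.0000

gao_sketch(ev)
#> [1] 4
gao_sketch(ev, C = 0.9)
#> [1] 3
```

Lowering \(C\) can only keep the estimate the same or reduce it, since fewer eigenvalues are needed to pass a lower threshold.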

Method by Galwey (2009)

\[m = \frac{\left(\sum_{i = 1}^k \sqrt{\lambda_i'}\right)^2}{\sum_{i = 1}^k \lambda_i'}\] where \(\lambda_i' = \max[0, \lambda_i]\).
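The truncation at zero matters only when some eigenvalues are negative. The following sketch uses a hypothetical eigenvalue vector with one negative entry (the values are assumptions chosen for illustration) to show the truncation and the resulting ratio; it is a transcription of the formula, not the package's implementation.

```r
# hypothetical eigenvalues, one of which is negative
ev <- c(2.5, 1, 0.75, -0.25)

# lambda' = max(0, lambda): negative eigenvalues are set to 0
ev.pos <- pmax(ev, 0)
ev.pos
#> [1] 2.50 1.00 0.75 0.00

# Galwey (2009): squared sum of square roots over the plain sum
m.raw <- sum(sqrt(ev.pos))^2 / sum(ev.pos)

# round down to the nearest integer
m <- floor(m.raw)
m
#> [1] 2
```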

Method by Chen & Liu (2011)

\[m = \sum_{i = 1}^k \frac{1}{R_i}\] where \(R_i = \sum_{j = 1}^k |r_{ij}|^C\) for \(i = 1, \ldots, k\) and \(r_{ij}\) denotes the element in the \(R\) matrix in row \(i\) and column \(j\). By default, the value of \(C\) is set to 7, but can be adjusted (see ‘Note’).
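Since this method works on the matrix entries rather than on the eigenvalues, a small worked case may help. The sketch below builds a hypothetical \(4 \times 4\) matrix with all off-diagonal correlations equal to 0.5 (an assumption made purely for illustration) and applies the formula with \(C = 7\); it is not the package's own code.

```r
# hypothetical 4 x 4 correlation matrix with constant correlation 0.5
k <- 4
R <- matrix(0.5, nrow = k, ncol = k)
diag(R) <- 1

# exponent from the formula (7 by default according to the text above)
C <- 7

# R_i = sum_j |r_ij|^C; here each row gives 1 + 3 * 0.5^7
Ri <- rowSums(abs(R)^C)
Ri
#> [1] 1.023438 1.023438 1.023438 1.023438

# m = sum_i 1 / R_i, rounded down to the nearest integer
m.raw <- sum(1 / Ri)
m <- floor(m.raw)
m
#> [1] 3
```

With moderate correlations, raising the absolute values to the 7th power makes the off-diagonal terms nearly vanish, so each \(R_i\) stays close to 1 and the unrounded estimate stays close to \(k\).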

Note: For all methods that can yield a non-integer estimate (all but the method by Gao et al., 2008), the resulting estimate \(m\) is rounded down to the nearest integer.

Specifying the R Matrix

The \(R\) matrix should reflect the dependence structure among the tests. There is no general solution for how such a matrix should be constructed, as this depends on the type of test and on the sidedness of the tests. One option is to use the correlations among the related but changing elements across the analyses/tests, or a function thereof, as a proxy for the dependence structure. For example, when conducting \(k\) analyses with the same dependent variable and \(k\) different independent variables, the correlations among the independent variables could serve as such a proxy. Analogously, if analyses are conducted for \(k\) dependent variables with the same set of independent variables, the correlations among the dependent variables could be used instead.
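As a minimal sketch of the first scenario, the code below simulates \(k = 5\) correlated independent variables and computes their correlation matrix with cor(). The data, the variable names, and the way the correlation is induced are all assumptions made for illustration; with real data, the matrix of independent variables would simply replace X.

```r
set.seed(1234)

n <- 200   # hypothetical number of observations
k <- 5     # hypothetical number of independent variables / tests

# simulate k variables that share a common component, so that they
# are positively correlated with each other
shared <- rnorm(n)
X <- sapply(1:k, function(j) 0.6 * shared + 0.8 * rnorm(n))
colnames(X) <- paste0("x", 1:k)

# k x k symmetric correlation matrix among the independent variables
R <- cor(X)
dim(R)
#> [1] 5 5

# this matrix could then serve as the R argument, for example
# meff(R, method = "galwey")
```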

If the test statistics of the tests of interest can be assumed to follow a multivariate normal distribution and a matrix is available that reflects the correlations among the test statistics (which might be approximated by the correlations among the interchanging independent or dependent variables), then the mvnconv function can be used to convert this correlation matrix into the correlations among the (one- or two-sided) \(p\)-values. The resulting matrix can then be passed to the R argument. See ‘Examples’.

Non-Positive Semi-Definite R

Depending on how \(R\) was constructed, this matrix may not be positive semi-definite, in which case some of its eigenvalues are negative. All of the methods given above can still be carried out in this case. Alternatively, an algorithm that finds the nearest positive (semi-)definite matrix (e.g., Higham, 2002) can be applied before passing the matrix to the function (see nearPD from the Matrix package for a corresponding implementation).
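The following sketch shows one way such a check and repair might look. The \(3 \times 3\) matrix is a hypothetical example constructed to have a negative eigenvalue, and the use of corr = TRUE (to keep a unit diagonal) is a choice made here for illustration, not a recommendation taken from the package.

```r
library(Matrix)

# hypothetical symmetric matrix with unit diagonal that is not
# positive semi-definite (its eigenvalues are 1.9, 1.9, and -0.8)
R <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

# check for negative eigenvalues
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
min(ev) < 0
#> [1] TRUE

# find the nearest correlation matrix and convert it back to a
# regular base R matrix
R.pd <- as.matrix(nearPD(R, corr = TRUE)$mat)

# the smallest eigenvalue should now be (numerically) non-negative
ev.pd <- eigen(R.pd, symmetric = TRUE, only.values = TRUE)$values
min(ev.pd)
```

The repaired matrix R.pd could then be passed to the function in place of R.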

Value

A scalar giving the estimate of the effective number of tests.

Note

For method = "gao", C = 0.995 by default, but a different value of C can be passed to the function via ... (e.g., meff(R, method = "gao", C = 0.95)). For method = "chen", C = 7 by default, but a different value of C can be passed to the function via ... (e.g., meff(R, method = "chen", C = 6)).

Author

Ozan Cinar ozancinar86@gmail.com
Wolfgang Viechtbauer wvb@wvbauer.com

References

Chen, Z. X., & Liu, Q. Z. (2011). A new approach to account for the correlations among single nucleotide polymorphisms in genome-wide association studies. Human Heredity, 72(1), 1–9. https://doi.org/10.1159/000330135

Cinar, O., & Viechtbauer, W. (2022). The poolr package for combining independent and dependent p values. Journal of Statistical Software, 101(1), 1–42. https://doi.org/10.18637/jss.v101.i01

Gao, X., Starmer, J., & Martin, E. R. (2008). A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genetic Epidemiology, 32(4), 361–369. https://doi.org/10.1002/gepi.20310

Galwey, N. W. (2009). A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests. Genetic Epidemiology, 33(7), 559–568. https://doi.org/10.1002/gepi.20408

Higham, N. J. (2002). Computing the nearest correlation matrix: A problem from finance. IMA Journal of Numerical Analysis, 22(3), 329–343. https://doi.org/10.1093/imanum/22.3.329

Li, J., & Ji, L. (2005). Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity, 95(3), 221–227. https://doi.org/10.1038/sj.hdy.6800717

Nyholt, D. R. (2004). A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. American Journal of Human Genetics, 74(4), 765–769. https://doi.org/10.1086/383251

Examples

# copy LD correlation matrix into r (see help(grid2ip) for details on these data)
r <- grid2ip.ld

# estimate the effective number of tests based on the LD correlation matrix
meff(r, method = "nyholt")
#> [1] 20
meff(r, method = "liji")
#> [1] 15
meff(r, method = "gao")
#> [1] 18
meff(r, method = "galwey")
#> [1] 13
meff(r, method = "chen")
#> [1] 18

# use mvnconv() to convert the LD correlation matrix into a matrix with the
# correlations among the (two-sided) p-values assuming that the test
# statistics follow a multivariate normal distribution with correlation
# matrix r (note: 'side = 2' by default in mvnconv())
mvnconv(r, target = "p", cov2cor = TRUE)[1:5,1:5] # show only rows/columns 1-5
#>            [,1]       [,2]        [,3]        [,4]        [,5]
#> [1,] 1.00000000 0.02160864 0.022809124 0.010804322 0.097238896
#> [2,] 0.02160864 1.00000000 0.013205282 0.000000000 0.042016807
#> [3,] 0.02280912 0.01320528 1.000000000 0.006002401 0.006002401
#> [4,] 0.01080432 0.00000000 0.006002401 1.000000000 0.000000000
#> [5,] 0.09723890 0.04201681 0.006002401 0.000000000 1.000000000

# use this matrix instead for estimating the effective number of tests
meff(mvnconv(r, target = "p", cov2cor = TRUE), method = "nyholt")
#> [1] 22
meff(mvnconv(r, target = "p", cov2cor = TRUE), method = "liji")
#> [1] 21
meff(mvnconv(r, target = "p", cov2cor = TRUE), method = "gao")
#> [1] 23
meff(mvnconv(r, target = "p", cov2cor = TRUE), method = "galwey")
#> [1] 20
meff(mvnconv(r, target = "p", cov2cor = TRUE), method = "chen")
#> [1] 22