meff.Rd
Estimate the effective number of tests.
meff(R, eigen, method, ...)
R: a \(k \times k\) symmetric matrix that reflects the correlation structure among the tests.
eigen: optional vector to directly supply the eigenvalues to the function (instead of computing them from the matrix given via R).
method: character string to specify the method to be used to estimate the effective number of tests (either "nyholt", "liji", "gao", or "galwey"). See ‘Details’.
...: other arguments.
The function estimates the effective number of tests based on one of four different methods. All methods work by extracting the eigenvalues from the \(R\) matrix supplied via the R argument (or from the eigenvalues directly passed via the eigen argument). Letting \(\lambda_i\) denote the \(i\)th eigenvalue of this matrix (with \(i = 1, \ldots, k\)) in decreasing order, the effective number of tests (\(m\)) is estimated as follows.
Method by Nyholt (2004)
\[m = 1 + (k - 1) \left(1 - \frac{\mbox{Var}(\lambda)}{k}\right)\] where \(\mbox{Var}(\lambda)\) is the observed sample variance of the \(k\) eigenvalues.
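As a minimal base-R sketch of this formula (not the package's internal code), the estimate can be computed by hand from a vector of eigenvalues. The vector `lambda` is made up for illustration; it corresponds to a 4 x 4 correlation matrix whose off-diagonal elements are all 0.5.

```r
# Hypothetical eigenvalues (equicorrelated 4 x 4 matrix with rho = 0.5)
lambda <- c(2.5, 0.5, 0.5, 0.5)
k <- length(lambda)

# Nyholt (2004): m = 1 + (k - 1) * (1 - Var(lambda) / k),
# where var() is the sample variance (denominator k - 1)
m_nyholt <- 1 + (k - 1) * (1 - var(lambda) / k)

m_nyholt         # 3.25
floor(m_nyholt)  # 3 (estimate after rounding down)
```

With uncorrelated tests all eigenvalues equal 1, the variance is 0, and the formula returns \(m = k\).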
Method by Li & Ji (2005)
\[m = \sum_{i = 1}^k f(|\lambda_i|)\] where \(f(x) = I(x \ge 1) + (x - \lfloor x \rfloor)\) and \(\lfloor \cdot \rfloor\) is the floor function.
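The function \(f\) can be written out directly in base R. The following is only an illustrative sketch using a made-up eigenvalue vector `lambda`; the helper name `f` simply mirrors the notation above.

```r
# Hypothetical eigenvalues (equicorrelated 4 x 4 matrix with rho = 0.5)
lambda <- c(2.5, 0.5, 0.5, 0.5)

# f(x) = I(x >= 1) + (x - floor(x)): each eigenvalue contributes one
# full test if it is at least 1, plus its fractional part
f <- function(x) as.numeric(x >= 1) + (x - floor(x))

# Li & Ji (2005): sum f() over the absolute eigenvalues
m_liji <- sum(f(abs(lambda)))
m_liji  # 1.5 + 0.5 + 0.5 + 0.5 = 3
```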
Method by Gao et al. (2008)
\[m = \min(x) \; \mbox{such that} \; \frac{\sum_{i = 1}^x \lambda_{i}}{\sum_{i = 1}^k \lambda_{i}} > C\] where \(C\) is a pre-defined parameter which is set to 0.995 by default.
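The role of \(C\) is easiest to see in a small sketch. The helper `gao_sketch()` below is hypothetical (it is not part of the package) and assumes that all eigenvalues are non-negative; `lambda` is again a made-up example.

```r
# Hypothetical helper: smallest x such that the first x eigenvalues
# account for more than a proportion C of the sum of all eigenvalues
gao_sketch <- function(lambda, C = 0.995) {
  lambda <- sort(lambda, decreasing = TRUE)
  prop <- cumsum(lambda) / sum(lambda)  # cumulative proportion explained
  which(prop > C)[1]
}

lambda <- c(2.5, 0.5, 0.5, 0.5)  # cumulative proportions: 0.625, 0.75, 0.875, 1

gao_sketch(lambda)           # 4 (only all four eigenvalues exceed 0.995)
gao_sketch(lambda, C = 0.8)  # 3 (0.875 is the first proportion above 0.8)
```

Lowering \(C\) can only decrease the estimate (or leave it unchanged), so the default of 0.995 is the more conservative choice.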
Method by Galwey (2009)
\[m = \frac{\left(\sum_{i = 1}^k \sqrt{\lambda_i'}\right)^2}{\sum_{i = 1}^k \lambda_i'}\] where \(\lambda_i' = \max[0, \lambda_i]\).
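A short base-R sketch of this formula, using a made-up eigenvalue vector that includes one negative value to show the effect of the truncation at zero (this is an illustration, not the package's internal code):

```r
# Hypothetical eigenvalues; the last one is negative
lambda <- c(2.25, 1, 1, -0.25)

# lambda' = max(0, lambda): negative eigenvalues are set to zero
lambda_pos <- pmax(0, lambda)

# Galwey (2009): squared sum of square roots divided by the sum
m_galwey <- sum(sqrt(lambda_pos))^2 / sum(lambda_pos)

m_galwey         # (1.5 + 1 + 1 + 0)^2 / 4.25 = 12.25 / 4.25, about 2.88
floor(m_galwey)  # 2 (estimate after rounding down)
```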
Note: For all methods that can yield a non-integer estimate (all but the method by Gao et al., 2008), the resulting estimate \(m\) is rounded down to the nearest integer.
Specifying the R Matrix
The \(R\) matrix should reflect the dependence structure among the tests. There is no general solution for how such a matrix should be constructed, as this depends on the type of test and on whether the tests are one- or two-sided. One option is to use the correlations among the elements that are related but change across the analyses/tests, or a function thereof, as a proxy for the dependence structure. For example, when conducting \(k\) analyses with the same dependent variable and \(k\) different independent variables, the correlations among the independent variables could serve as such a proxy. Analogously, if analyses are conducted for \(k\) dependent variables with the same set of independent variables, the correlations among the dependent variables could be used instead.
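To illustrate the first scenario, the following sketch simulates \(k = 5\) correlated independent variables and uses their correlation matrix as the proxy. The data, the seed, and the variable names are all invented for the example.

```r
set.seed(1234)
n <- 200  # number of observations (arbitrary)
k <- 5    # number of independent variables, i.e., number of tests

# Simulate k predictors that share a common component z,
# which makes them positively correlated
z <- rnorm(n)
X <- sapply(1:k, function(j) 0.6 * z + rnorm(n))

# Correlations among the independent variables as a proxy for R
R <- cor(X)
round(R, 2)

# Eigenvalues on which all four methods are based
lambda <- eigen(R, symmetric = TRUE)$values
lambda
```

The matrix `R` (or the vector `lambda`, via the eigen argument) could then be passed to the function.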
If the tests of interest have test statistics that can be assumed to follow a multivariate normal distribution and a matrix is available that reflects the correlations among the test statistics (which might be approximated by the correlations among the interchanging independent or dependent variables), then the mvnconv function can be used to convert this correlation matrix into the correlations among the (one- or two-sided) \(p\)-values, which in turn can then be passed to the R argument. See ‘Examples’.
Not Positive Semi-Definite R
Depending on the way \(R\) was constructed, it may happen that this matrix is not positive semi-definite, leading to negative eigenvalues. The methods given above can all still be carried out in this case. However, another possibility is to handle such a case by using an algorithm that finds the nearest positive (semi-)definite matrix (e.g., Higham, 2002) before passing this matrix to the function (see nearPD from the Matrix package for a corresponding implementation).
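A minimal sketch of this approach, assuming the Matrix package is installed. The 3 x 3 matrix is made up so that it has a negative eigenvalue; `corr = TRUE` asks nearPD for a result that is again a correlation matrix (unit diagonal).

```r
library(Matrix)

# Hypothetical "correlation" matrix that is not positive semi-definite
R <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

# The smallest eigenvalue is negative
eigen(R, symmetric = TRUE)$values

# Find the nearest positive definite correlation matrix
fit <- nearPD(R, corr = TRUE)
R_pd <- as.matrix(fit$mat)  # convert back to a base R matrix

# The eigenvalues of the adjusted matrix are no longer negative
eigen(R_pd, symmetric = TRUE)$values
```

The adjusted matrix `R_pd` can then be supplied via the R argument in place of the original matrix.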
A scalar giving the estimate of the effective number of tests.
For method = "gao", C = 0.995 by default, but a different value of C can be passed to the function via ... (e.g., meff(R, method = "gao", C = 0.95)).
Cinar, O. & Viechtbauer, W. (2022). The poolr package for combining independent and dependent p values. Journal of Statistical Software, 101(1), 1–42. https://doi.org/10.18637/jss.v101.i01
Galwey, N. W. (2009). A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests. Genetic Epidemiology, 33(7), 559–568.
Gao, X., Starmer, J., & Martin, E. R. (2008). A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genetic Epidemiology, 32(4), 361–369.
Higham, N. J. (2002). Computing the nearest correlation matrix: A problem from finance. IMA Journal of Numerical Analysis, 22(3), 329–343.
Li, J., & Ji, L. (2005). Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity, 95(3), 221–227.
Nyholt, D. R. (2004). A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. American Journal of Human Genetics, 74(4), 765–769.
# copy LD correlation matrix into r (see help(grid2ip) for details on these data)
r <- grid2ip.ld
# estimate the effective number of tests based on the LD correlation matrix
meff(r, method = "nyholt")
#> [1] 20
meff(r, method = "liji")
#> [1] 15
meff(r, method = "gao")
#> [1] 18
meff(r, method = "galwey")
#> [1] 13
# use mvnconv() to convert the LD correlation matrix into a matrix with the
# correlations among the (two-sided) p-values assuming that the test
# statistics follow a multivariate normal distribution with correlation
# matrix r (note: 'side = 2' by default in mvnconv())
mvnconv(r, target = "p", cov2cor = TRUE)[1:5,1:5] # show only rows/columns 1-5
#>            [,1]       [,2]        [,3]        [,4]        [,5]
#> [1,] 1.00000000 0.02160864 0.022809124 0.010804322 0.097238896
#> [2,] 0.02160864 1.00000000 0.013205282 0.000000000 0.042016807
#> [3,] 0.02280912 0.01320528 1.000000000 0.006002401 0.006002401
#> [4,] 0.01080432 0.00000000 0.006002401 1.000000000 0.000000000
#> [5,] 0.09723890 0.04201681 0.006002401 0.000000000 1.000000000
# use this matrix instead for estimating the effective number of tests
meff(mvnconv(r, target = "p", cov2cor = TRUE), method = "nyholt")
#> [1] 22
meff(mvnconv(r, target = "p", cov2cor = TRUE), method = "liji")
#> [1] 21
meff(mvnconv(r, target = "p", cov2cor = TRUE), method = "gao")
#> [1] 23
meff(mvnconv(r, target = "p", cov2cor = TRUE), method = "galwey")
#> [1] 20