When working with a large matrix \(M\), we can distinguish between algorithms that require access to arbitrary elements of \(M\) and algorithms that only require the ability to multiply by \(M\). The latter are called *matrix-free* algorithms; they work with \(M\) as a linear operator rather than with its representation as a matrix.

The advantage of matrix-free algorithms is that for specific \(M\) there may be much faster ways to compute \(Mx\) than by matrix multiplication. As one extreme example, consider the centering operator \(x\mapsto x-\bar x\) on a length-\(n\) vector, which can be computed in linear time from its definition but would take time quadratic in \(n\) if the operator were represented as multiplication by an \(n\times n\) matrix. As another, consider multiplying by a diagonal matrix: the matrix can be represented in linear space and the multiplication performed in linear time by just ignoring all the zero off-diagonal elements.
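To make the centering contrast concrete, here is a small R sketch (illustrative, not from the package) comparing the two representations of the same operator:

```
n <- 5
x <- rnorm(n)

## Matrix-free: O(n), straight from the definition x - mean(x)
center_free <- function(x) x - mean(x)

## Explicit operator: O(n^2) multiplication by the n x n centering matrix
C <- diag(n) - matrix(1/n, n, n)
center_dense <- function(x) as.vector(C %*% x)

all.equal(center_free(x), center_dense(x))  # TRUE
```

Both functions compute the same vector; only the cost of obtaining it differs.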

One important application of `bigQF` is the SKAT test. This involves the eigenvalues of a matrix that is the product of a sparse matrix and a projection matrix. Multiplying by the sparse component is fast for essentially the same reasons that multiplying by a diagonal matrix is fast. Multiplying by the projection component is fast for essentially the same reasons that centering is fast.

Both the stochastic SVD and Lanczos-type algorithms have matrix-free implementations, and the package provides an object-oriented mechanism to use these implementations to compute the distribution of a quadratic form. The `ssvd` function also accepts these objects.

An object of class `matrixfree` is a list with the following components:

- `mult`: function to multiply by \(M\)
- `tmult`: function to multiply by \(M^T\)
- `trace`: numeric, the trace of \(M^TM\) (needed for `pQF` but not for `ssvd`)
- `ncol`: integer, the number of columns of \(M\)
- `nrow`: integer, the number of rows of \(M\)

As a simple example, suppose `M` is a sparse matrix stored using the Matrix package. We can define (see `sparse.matrixfree`)

```
rval <- list(
  mult  = function(X) M %*% X,
  tmult = function(X) crossprod(M, X),
  trace = sum(M^2),
  ncol  = ncol(M),
  nrow  = nrow(M)
)
class(rval) <- "matrixfree"
```
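For concreteness, this construction can be wrapped in a function that captures `M` in a closure. A hedged sketch (the constructor name `as.matrixfree` here is illustrative; the package's own `sparse.matrixfree` may differ in detail):

```
library(Matrix)  # sparse-matrix classes and fast %*%, crossprod methods

## Illustrative constructor: M is captured by the closures in mult and tmult
as.matrixfree <- function(M) {
  rval <- list(
    mult  = function(X) M %*% X,
    tmult = function(X) crossprod(M, X),
    trace = sum(M^2),
    ncol  = ncol(M),
    nrow  = nrow(M)
  )
  class(rval) <- "matrixfree"
  rval
}

M  <- sparseMatrix(i = c(1, 2, 3), j = c(1, 3, 2), x = c(2, -1, 4), dims = c(3, 3))
op <- as.matrixfree(M)
op$mult(c(1, 1, 1))   # same result as M %*% c(1, 1, 1)
```

Because `trace`, `ncol`, and `nrow` are plain values rather than functions, they are evaluated once when the object is built.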

The computations for `trace`, `ncol`, and `nrow` are done at the time the object is constructed. The `mult` and `tmult` functions will be efficient because they use the sparse-matrix algorithms in the Matrix package.

The `SKAT.matrixfree` objects have a more complicated implementation. The matrix is of the form \(M=\Pi G W/\sqrt{2}\), where \(W\) is a diagonal matrix of weights, \(G\) is a sparse genotype matrix, and \(\Pi\) is the projection matrix onto the residual space of a linear regression model. Since `M` is not sparse, it is not sufficient just to use the sparse-matrix code of the previous example. Instead we specify the multiplication function as

```
function(X) {
  base::qr.resid(qr, as.matrix(spG %*% X))/sqrt(2)
}
```

where `spG` is a sparse Matrix object containing \(GW\) and `qr` is the QR decomposition of the design matrix from the linear model. The transpose multiplication function is

```
function(X) {
  crossprod(spG, qr.resid(qr, X))/sqrt(2)
}
```

Multiplying by the sparse component takes time proportional to the number of non-zero entries of \(GW\) and the projection takes time proportional to \(np^2\) where \(n\) is the number of observations and \(p\) is the number of predictors in the linear model. When the genotype matrix is sparse and \(p^2\ll n\), the matrix-free algorithm will be fast.
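Putting the pieces together, here is a hedged sketch of assembling such an object by hand on simulated data (all names and the weight handling are illustrative; `SKAT.matrixfree` in the package is the real implementation):

```
library(Matrix)

## Simulated setup: n observations, p predictors, m genotypes
n <- 100; p <- 3; m <- 20
Z   <- cbind(1, matrix(rnorm(n * (p - 1)), n))   # design matrix of the linear model
qr  <- qr(Z)                                     # QR decomposition used by qr.resid
spG <- Matrix(matrix(rbinom(n * m, 2, 0.05), n, m), sparse = TRUE)  # stands in for GW

skat_op <- list(
  mult  = function(X) qr.resid(qr, as.matrix(spG %*% X)) / sqrt(2),
  tmult = function(X) crossprod(spG, qr.resid(qr, X)) / sqrt(2),
  ## trace of M^T M is ||Pi G W||_F^2 / 2: project each column once, sum the squares
  trace = sum(qr.resid(qr, as.matrix(spG))^2) / 2,
  ncol  = ncol(spG),
  nrow  = nrow(spG)
)
class(skat_op) <- "matrixfree"

## mult lands in the residual space of Z, so Z^T (M x) is numerically zero
x <- rnorm(m)
max(abs(crossprod(Z, skat_op$mult(x))))
```

Note that `mult` never forms \(\Pi\) or the dense product \(\Pi G W\); it applies the sparse multiply and the QR-based projection in sequence.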