Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch, 2019) using block coordinate descent (alternating non-negative least squares, ANLS) to return factorized \(H\), \(W\), and \(V\) matrices. The objective function is stated as
$$\arg\min_{H\ge0,W\ge0,V\ge0}\sum_{i}^{d}||E_i-(W+V_i)Hi||^2_F+ \lambda\sum_{i}^{d}||V_iH_i||_F^2$$
where \(E_i\) is the input non-negative matrix of the i'th dataset, \(d\) is the total number of datasets. \(E_i\) is of size \(m \times n_i\) for \(m\) variable genes and \(n_i\) cells, \(H_i\) is of size \(n_i \times k\), \(V_i\) is of size \(m \times k\), and \(W\) is of size \(m \times k\).
The factorization produces a shared \(W\) matrix (genes by k), and for each dataset, an \(H\) matrix (k by cells) and a \(V\) matrix (genes by k). The \(H\) matrices represent the cell factor loadings. \(W\) is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The \(V\) matrices represent the dataset-specific components of the metagenes.
This function adopts highly optimized fast and memory efficient
implementation extended from Planc (Kannan, 2016). Pre-installation of
extension package RcppPlanc
is required. The underlying algorithm
adopts the identical ANLS strategy as optimizeALS
in the old
version of LIGER.
Usage
runINMF(object, k = 20, lambda = 5, ...)
# S3 method for liger
runINMF(
object,
k = 20,
lambda = 5,
nIteration = 30,
nRandomStarts = 1,
HInit = NULL,
WInit = NULL,
VInit = NULL,
seed = 1,
nCores = 2L,
verbose = getOption("ligerVerbose", TRUE),
...
)
# S3 method for Seurat
runINMF(
object,
k = 20,
lambda = 5,
datasetVar = "orig.ident",
layer = "ligerScaleData",
assay = NULL,
reduction = "inmf",
nIteration = 30,
nRandomStarts = 1,
HInit = NULL,
WInit = NULL,
VInit = NULL,
seed = 1,
nCores = 2L,
verbose = getOption("ligerVerbose", TRUE),
...
)
Arguments
- object
A liger object or a Seurat object with non-negative scaled data of variable features (Done with
scaleNotCenter
).- k
Inner dimension of factorization (number of factors). Generally, a higher
k
will be needed for datasets with more sub-structure. Default20
.- lambda
Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as
lambda
increases). Default5
.- ...
Arguments passed to methods.
- nIteration
Total number of block coordinate descent iterations to perform. Default
30
.- nRandomStarts
Number of restarts to perform (iNMF objective function is non-convex, so taking the best objective from multiple successive initialization is recommended). For easier reproducibility, this increments the random seed by 1 for each consecutive restart, so future factorization of the same dataset can be run with one rep if necessary. Default
1
.- HInit
Initial values to use for \(H\) matrices. A list object where each element is the initial \(H\) matrix of each dataset. Default
NULL
.- WInit
Initial values to use for \(W\) matrix. A matrix object. Default
NULL
.- VInit
Initial values to use for \(V\) matrices. A list object where each element is the initial \(V\) matrix of each dataset. Default
NULL
.- seed
Random seed to allow reproducible results. Default
1
.- nCores
The number of parallel tasks to speed up the computation. Default
2L
. Only supported for platform with OpenMP support.- verbose
Logical. Whether to show information of the progress. Default
getOption("ligerVerbose")
orTRUE
if users have not set.- datasetVar
Metadata variable name that stores the dataset source annotation. Default
"orig.ident"
.- layer
For Seurat>=4.9.9, the name of layer to retrieve input non-negative scaled data. Default
"ligerScaleData"
. For older Seurat, always retrieve fromscale.data
slot.- assay
Name of assay to use. Default
NULL
uses current active assay.- reduction
Name of the reduction to store result. Also used as the feature key. Default
"inmf"
.
Value
liger method - Returns updated input liger object
A list of all \(H\) matrices can be accessed with
getMatrix(object, "H")
A list of all \(V\) matrices can be accessed with
getMatrix(object, "V")
The \(W\) matrix can be accessed with
getMatrix(object, "W")
Seurat method - Returns updated input Seurat object
\(H\) matrices for all datasets will be concatenated and transposed (all cells by k), and form a DimReduc object in the
reductions
slot named by argumentreduction
.\(W\) matrix will be presented as
feature.loadings
in the same DimReduc object.\(V\) matrices, an objective error value and the dataset variable used for the factorization is currently stored in
misc
slot of the same DimReduc object.
Difference from optimizeALS()
In the old version implementation, we compute the objective error at the end
of each iteration, and then compares if the algorithm is reaching a
convergence, using an argument thresh
. Now, since the computation of
objective error is indeed expensive, we canceled this feature and directly
runs a default of 30 (nIteration
) iterations, which empirically leads
to a convergence most of the time. Given that the new version is highly
optimized, running this many iteration should be acceptable.
References
Joshua D. Welch and et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
Examples
pbmc <- normalize(pbmc)
#> ℹ Normalizing datasets "ctrl"
#> ℹ Normalizing datasets "stim"
#> ✔ Normalizing datasets "stim" ... done
#>
#> ℹ Normalizing datasets "ctrl"
#> ✔ Normalizing datasets "ctrl" ... done
#>
pbmc <- selectGenes(pbmc)
#> ℹ Selecting variable features for dataset "ctrl"
#> ✔ ... 168 features selected out of 249 shared features.
#> ℹ Selecting variable features for dataset "stim"
#> ✔ ... 166 features selected out of 249 shared features.
#> ✔ Finally 173 shared variable features are selected.
pbmc <- scaleNotCenter(pbmc)
#> ℹ Scaling dataset "ctrl"
#> ✔ Scaling dataset "ctrl" ... done
#>
#> ℹ Scaling dataset "stim"
#> ✔ Scaling dataset "stim" ... done
#>
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
pbmc <- runINMF(pbmc)
}