Please turn to runOnlineINMF or runIntegration.
Perform online integrative non-negative matrix factorization (iNMF) to represent multiple single-cell datasets in terms of H, W, and V matrices. The function optimizes the iNMF objective function using online learning (non-negative least squares for the H matrices, hierarchical alternating least squares for the W and V matrices), with the number of factors set by k. Online learning is supported in three scenarios: (1) fully observed datasets; (2) iterative refinement using continually arriving datasets; and (3) projection of new datasets without updating the existing factorization. All three scenarios require a fixed amount of memory, independent of the number of cells.
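For reference, the iNMF objective mentioned above is usually written as follows. The symbol E_i is notation assumed here (it does not appear on this page) for the scaled expression matrix of dataset i, arranged as cells by genes; H_i, W, and V_i are the factor matrices, and lambda is the regularization parameter described under Arguments.

$$
\min_{W \ge 0,\; V_i \ge 0,\; H_i \ge 0} \;
\sum_{i} \left\| E_i - H_i \left( W + V_i \right) \right\|_F^2
\;+\; \lambda \sum_{i} \left\| H_i V_i \right\|_F^2
$$

The first term measures reconstruction error; the second term penalizes the dataset-specific contribution H_i V_i, which is why larger values of lambda push the factorization toward the shared metagenes in W.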
For each dataset, this factorization produces an H matrix (cells by k), a V matrix (k by genes), and a shared W matrix (k by genes). The H matrices represent the cell factor loadings. W is identical among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.
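The shapes described above can be checked with a small base R sketch. It does not call rliger: the toy dimensions, the simulated data E, and the helper inmf_objective are hypothetical and exist only to illustrate how H, W, and V combine. The objective used is the commonly stated iNMF form (squared reconstruction error plus lambda times the squared norm of H times V).

```r
# Toy dimensions (hypothetical): two datasets, 6 genes, k = 3 factors.
set.seed(123)
k <- 3
n_genes <- 6
n_cells <- c(d1 = 5, d2 = 4)

# Non-negative random matrices with the shapes described above.
W <- matrix(runif(k * n_genes), nrow = k, ncol = n_genes)              # shared, k by genes
V <- lapply(n_cells, function(n) matrix(runif(k * n_genes), k, n_genes)) # per dataset, k by genes
H <- lapply(n_cells, function(n) matrix(runif(n * k), n, k))           # per dataset, cells by k

# Each dataset is approximated by H_i %*% (W + V_i), a cells-by-genes matrix.
recon <- Map(function(h, v) h %*% (W + v), H, V)

# Simulated "observed" data: the reconstruction plus a little non-negative noise.
E <- lapply(recon, function(r) r + matrix(runif(length(r), 0, 0.01), nrow(r), ncol(r)))

# Hypothetical helper: evaluates the iNMF objective for given factors.
inmf_objective <- function(E, H, W, V, lambda = 5) {
  fit <- sum(mapply(function(e, h, v) sum((e - h %*% (W + v))^2), E, H, V))
  pen <- sum(mapply(function(h, v) sum((h %*% v)^2), H, V))
  fit + lambda * pen
}

obj <- inmf_objective(E, H, W, V)
```

Here recon$d1 is a 5 by 6 matrix and recon$d2 is a 4 by 6 matrix, matching the cells-by-genes layout of each dataset, while W is the only factor shared between them.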
Arguments
- object
liger object with data stored in HDF5 files. Should normalize, select genes, and scale before calling.
- X_new
List of new datasets for scenario 2 or scenario 3. Each list element should be the name of an HDF5 file.
- projection
Perform data integration by shared metagene (W) projection (scenario 3). (default FALSE)
- W.init
Optional initialization for W (default NULL).
- V.init
Optional initialization for V (default NULL).
- H.init
Optional initialization for H (default NULL).
- A.init
Optional initialization for A (default NULL).
- B.init
Optional initialization for B (default NULL).
- k
Inner dimension of the factorization, i.e., the number of metagenes (default 20). A value in the range 20-50 works well for most analyses.
- lambda
Regularization parameter (default 5.0). Larger values penalize dataset-specific effects more strongly (i.e., alignment should increase as lambda increases). We recommend always using the default value, except possibly for analyses with relatively small differences between datasets (biological replicates, male/female comparisons, etc.), in which case a lower value such as 1.0 may improve reconstruction quality.
- max.epochs
Maximum number of epochs (complete passes through the data). (default 5)
- miniBatch_max_iters
Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of W and V (default 1). Changing this parameter is not recommended.
- miniBatch_size
Total number of cells in each minibatch (default 5000). This is a reasonable default, but a smaller value such as 1000 may be necessary for analyzing very small datasets. In general, minibatch size should be no larger than the number of cells in the smallest dataset.
- h5_chunk_size
Chunk size of the input HDF5 files (default 1000). The chunk size should be no larger than the minibatch size.
- seed
Random seed to allow reproducible results (default 123).
- verbose
Print progress bar and messages (default TRUE).
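The size guidance given for miniBatch_size and h5_chunk_size can be turned into a quick pre-flight check. The sketch below is base R only; check_online_sizes is a hypothetical helper, not an rliger function, and n_cells is assumed to be a vector holding the number of cells in each dataset.

```r
# Hypothetical helper, not part of rliger: checks the two size rules stated above.
# n_cells: number of cells in each dataset (assumed to be known by the caller).
check_online_sizes <- function(n_cells, miniBatch_size = 5000, h5_chunk_size = 1000) {
  problems <- character(0)

  # Rule 1: the minibatch should be no larger than the smallest dataset.
  if (miniBatch_size > min(n_cells)) {
    problems <- c(problems, sprintf(
      "miniBatch_size (%d) exceeds the smallest dataset (%d cells)",
      as.integer(miniBatch_size), as.integer(min(n_cells))
    ))
  }

  # Rule 2: the HDF5 chunk size should be no larger than the minibatch size.
  if (h5_chunk_size > miniBatch_size) {
    problems <- c(problems, sprintf(
      "h5_chunk_size (%d) exceeds miniBatch_size (%d)",
      as.integer(h5_chunk_size), as.integer(miniBatch_size)
    ))
  }

  problems  # empty character vector when both rules hold
}

# Example: with the defaults, a 3000-cell dataset violates rule 1.
issues <- check_online_sizes(c(3000, 12000))
```

An empty result means both rules hold. With a very small dataset, lowering miniBatch_size (for example to 1000) and keeping h5_chunk_size at or below it satisfies both checks.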