Perform factorization for new data

Uses an efficient updating strategy that takes advantage of the information in the existing factorization. Assumes that the variable features are present in the new datasets. Two modes are supported (controlled by merge):

merge = TRUE: Append new data to the existing datasets specified by useDatasets. The existing \(V\) matrices of the target datasets are used directly as initialization, and new \(H\) matrices for the merged datasets are initialized accordingly.
merge = FALSE: Add the new data as new datasets. Their initial \(V\) matrices are copied from the datasets specified by useDatasets, and new \(H\) matrices are initialized accordingly.

Usage

optimizeNewData(
  object,
  dataNew,
  useDatasets,
  merge = TRUE,
  lambda = NULL,
  nIteration = 30,
  seed = 1,
  verbose = getOption("ligerVerbose"),
  new.data = dataNew,
  which.datasets = useDatasets,
  add.to.existing = merge,
  max.iters = nIteration,
  thresh = NULL
)

Arguments

object: A liger object. Should have integrative factorization performed (e.g. with runINMF) in advance.
dataNew: Named list of raw count matrices, genes by cells.
useDatasets: Selection of datasets to append new data to if merge = TRUE, or the datasets to inherit \(V\) matrices from and initialize the optimization when merge = FALSE. Should match the length and order of dataNew.
merge: Logical, whether to add the new data to existing datasets or to treat them as totally new datasets (i.e. calculate new \(V\) matrices). Default TRUE.
lambda: Numeric regularization parameter. Default NULL, which uses the lambda value from the latest factorization.
nIteration: Number of block coordinate descent iterations to perform. Default 30.
seed: Random seed to allow reproducible results. Default 1. Used by runINMF factorization.
verbose: Logical. Whether to show progress information. Default getOption("ligerVerbose"), which is TRUE if the user has not set it.
new.data, which.datasets, add.to.existing, max.iters: These arguments are now replaced by others and will be removed in the future. Please see usage for replacement.
thresh: Deprecated. New implementation of iNMF does not require a threshold for convergence detection. Setting a large enough nIteration will bring it to convergence.

Value

object with the W slot updated with the new \(W\) matrix, and the H and V slots of each ligerDataset object in the datasets slot updated with the new dataset-specific \(H\) and \(V\) matrices, respectively.

Examples

pbmc <- normalize(pbmc)
#> ℹ Normalizing datasets "ctrl"
#> ✔ Normalizing datasets "ctrl" ... done
#> 
#> ℹ Normalizing datasets "stim"
#> ✔ Normalizing datasets "stim" ... done
#> 
pbmc <- selectGenes(pbmc)
#> ℹ Selecting variable features for dataset "ctrl"
#> ✔ ... 168 features selected out of 249 shared features.
#> ℹ Selecting variable features for dataset "stim"
#> ✔ ... 166 features selected out of 249 shared features.
#> ✔ Finally 173 shared variable features are selected.
pbmc <- scaleNotCenter(pbmc)
#> ℹ Scaling dataset "ctrl"
#> ✔ Scaling dataset "ctrl" ... done
#> 
#> ℹ Scaling dataset "stim"
#> ✔ Scaling dataset "stim" ... done
#> 
# Only running a few iterations for fast examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    # Create fake new data by increasing all non-zero count in "ctrl" by 1,
    # and make unique cell identifiers
    ctrl2 <- rawData(dataset(pbmc, "ctrl"))
    ctrl2@x <- ctrl2@x + 1
    colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
    pbmcNew <- optimizeNewData(pbmc, dataNew = list(ctrl2 = ctrl2),
                               useDatasets = "ctrl", nIteration = 2)
}
#> Fri Oct 25 15:12:33 2024 ... Initializing with new data merged to existing datasets...
#> ℹ Updated QC variables: "nUMI" and "nGene"
#> ℹ Normalizing datasets "ctrl"
#> ✔ Normalizing datasets "ctrl" ... done
#> 
#> ℹ Scaling dataset "ctrl"
#> ✔ Scaling dataset "ctrl" ... done
#>
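
The example above uses the default merge = TRUE. A minimal sketch of the other mode, merge = FALSE, might look like the following. It repeats the preprocessing so that it can be run on its own, and it assumes that the bundled pbmc example object is available after library(rliger), that names() on a liger object returns the dataset names, and that getMatrix() accepts slot = "V" with a dataset argument. The object name pbmcSeparate is made up for illustration. No output is shown because this sketch was not run here.

```r
library(rliger)

# `pbmc` is assumed to be the small example liger object shipped with rliger
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)

# Only running a few iterations for fast examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)

    # Same fake new data as in the example above
    ctrl2 <- rawData(dataset(pbmc, "ctrl"))
    ctrl2@x <- ctrl2@x + 1
    colnames(ctrl2) <- paste0(colnames(ctrl2), 2)

    # merge = FALSE: "ctrl2" is kept as its own dataset; its V matrix is
    # initialized from the V matrix of "ctrl" and then optimized
    pbmcSeparate <- optimizeNewData(pbmc, dataNew = list(ctrl2 = ctrl2),
                                    useDatasets = "ctrl", merge = FALSE,
                                    nIteration = 2)

    # The new dataset should now be listed alongside the original ones
    print(names(pbmcSeparate))

    # Inspect the dimensions of the V matrix learned for the new dataset
    print(dim(getMatrix(pbmcSeparate, "V", dataset = "ctrl2")))
}
```

With merge = FALSE the original "ctrl" dataset is left as a separate dataset, which may be preferable when the new cells come from a different batch and should keep their own dataset-specific \(V\) matrix.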
