Uses an efficient strategy for updating that takes advantage of the information in the existing factorization. Assumes that the variable features are present in the new datasets. Two modes are supported (controlled by merge):
- merge = TRUE: append the new data to the existing datasets specified by useDatasets. The existing \(V\) matrices of the target datasets are used directly as initialization, and new \(H\) matrices for the merged datasets are initialized accordingly.
- merge = FALSE: set the new data as new datasets. Their initial \(V\) matrices are copied from the datasets specified by useDatasets, and new \(H\) matrices are initialized accordingly.
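The Examples section only exercises the default merge = TRUE mode. The following is a minimal sketch of the other mode, where the new data becomes its own dataset whose \(V\) is initialized from "ctrl". It assumes that rliger's bundled pbmc example object is available after library(rliger) and that RcppPlanc is installed. The fake ctrl2 data is built the same way as in the Examples, and the iteration counts are kept tiny for speed, not for convergence.

```r
# Sketch, assuming rliger (>= 2.0) with its lazy-loaded `pbmc` example
# object, and RcppPlanc available for runINMF().
library(rliger)

if (requireNamespace("RcppPlanc", quietly = TRUE)) {
  pbmc <- normalize(pbmc)
  pbmc <- selectGenes(pbmc)
  pbmc <- scaleNotCenter(pbmc)
  pbmc <- runINMF(pbmc, k = 20, nIteration = 2)

  # Fake new data: shift every non-zero count of "ctrl" by 1 and give the
  # cells unique identifiers
  ctrl2 <- rawData(dataset(pbmc, "ctrl"))
  ctrl2@x <- ctrl2@x + 1
  colnames(ctrl2) <- paste0(colnames(ctrl2), 2)

  # merge = FALSE: "ctrl2" is added as a separate dataset; its initial V is
  # copied from the V of "ctrl" (the dataset named in useDatasets)
  pbmcNew <- optimizeNewData(pbmc, dataNew = list(ctrl2 = ctrl2),
                             useDatasets = "ctrl", merge = FALSE,
                             nIteration = 2, verbose = FALSE)

  # The new dataset is listed alongside the original ones
  print(names(pbmcNew@datasets))
}
```

With merge = FALSE the original "ctrl" dataset is left as its own dataset, so downstream steps see three datasets rather than two.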
Usage
optimizeNewData(
object,
dataNew,
useDatasets,
merge = TRUE,
lambda = NULL,
nIteration = 30,
seed = 1,
verbose = getOption("ligerVerbose"),
new.data = dataNew,
which.datasets = useDatasets,
add.to.existing = merge,
max.iters = nIteration,
thresh = NULL
)
Arguments
- object
A liger object. Integrative factorization (e.g. runINMF) should have been performed in advance.
- dataNew
Named list of raw count matrices, genes by cells.
- useDatasets
Selection of datasets to append the new data to if merge = TRUE, or the datasets to inherit \(V\) matrices from to initialize the optimization when merge = FALSE. Should match the length and order of dataNew.
- merge
Logical, whether to add the new data to existing datasets or treat it as entirely new datasets (i.e. calculate new \(V\) matrices). Default TRUE.
- lambda
Numeric regularization parameter. By default NULL, which uses the lambda value from the latest factorization.
- nIteration
Number of block coordinate descent iterations to perform. Default 30.
- seed
Random seed to allow reproducible results. Default 1. Used by the runINMF factorization.
- verbose
Logical. Whether to show progress information. Default getOption("ligerVerbose"), which is TRUE if the user has not set it.
- new.data, which.datasets, add.to.existing, max.iters
These arguments have been replaced by dataNew, useDatasets, merge and nIteration, respectively, and will be removed in the future. Please see Usage for the replacements.
- thresh
Deprecated. The new implementation of iNMF does not require a threshold for convergence detection. Setting a large enough nIteration will bring it to convergence.
Value
object, with its W slot updated with the new \(W\) matrix, and the H and V slots of each ligerDataset object in the datasets slot updated with the new dataset-specific \(H\) and \(V\) matrices, respectively.
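The updated matrices described above can be pulled out of the returned object through its slots. The sketch below does this after the default merge = TRUE call. It assumes, as in the Examples, that rliger's pbmc example object is available and that RcppPlanc is installed. Since the orientation of each matrix (genes by factors versus factors by genes, for example) is not stated here, the code only prints the dimensions rather than relying on a particular layout.

```r
# Sketch, assuming rliger (>= 2.0) with its lazy-loaded `pbmc` example
# object, and RcppPlanc available for runINMF().
library(rliger)

if (requireNamespace("RcppPlanc", quietly = TRUE)) {
  pbmc <- normalize(pbmc)
  pbmc <- selectGenes(pbmc)
  pbmc <- scaleNotCenter(pbmc)
  pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
  # Number of cells in "ctrl" before new data is appended
  nCtrl <- ncol(rawData(dataset(pbmc, "ctrl")))

  # Fake new data, built as in the Examples
  ctrl2 <- rawData(dataset(pbmc, "ctrl"))
  ctrl2@x <- ctrl2@x + 1
  colnames(ctrl2) <- paste0(colnames(ctrl2), 2)

  # merge = TRUE (default): the cells of ctrl2 are appended to "ctrl"
  pbmcNew <- optimizeNewData(pbmc, dataNew = list(ctrl2 = ctrl2),
                             useDatasets = "ctrl", nIteration = 2,
                             verbose = FALSE)

  # Shared W lives in the liger object itself
  W <- pbmcNew@W
  # Dataset-specific H and V live in each ligerDataset
  ctrlNew <- dataset(pbmcNew, "ctrl")
  H <- ctrlNew@H
  V <- ctrlNew@V

  print(dim(W))
  print(dim(H))  # should cover both the original and the appended cells
  print(dim(V))
}
```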
Examples
library(rliger)
pbmc <- normalize(pbmc)
#> ℹ Normalizing datasets "ctrl"
#> ℹ Normalizing datasets "stim"
#> ✔ Normalizing datasets "stim" ... done
#>
#> ℹ Normalizing datasets "ctrl"
#> ✔ Normalizing datasets "ctrl" ... done
#>
pbmc <- selectGenes(pbmc)
#> ℹ Selecting variable features for dataset "ctrl"
#> ✔ ... 168 features selected out of 249 shared features.
#> ℹ Selecting variable features for dataset "stim"
#> ✔ ... 166 features selected out of 249 shared features.
#> ✔ Finally 173 shared variable features are selected.
pbmc <- scaleNotCenter(pbmc)
#> ℹ Scaling dataset "ctrl"
#> ✔ Scaling dataset "ctrl" ... done
#>
#> ℹ Scaling dataset "stim"
#> ✔ Scaling dataset "stim" ... done
#>
# Only running a few iterations for fast examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
  pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
  # Create fake new data by increasing all non-zero counts in "ctrl" by 1,
  # and make unique cell identifiers
  ctrl2 <- rawData(dataset(pbmc, "ctrl"))
  ctrl2@x <- ctrl2@x + 1
  colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
  pbmcNew <- optimizeNewData(pbmc, dataNew = list(ctrl2 = ctrl2),
                             useDatasets = "ctrl", nIteration = 2)
}
#> Fri Oct 25 15:12:33 2024 ... Initializing with new data merged to existing datasets...
#> ℹ Updated QC variables: "nUMI" and "nGene"
#> ℹ Normalizing datasets "ctrl"
#> ✔ Normalizing datasets "ctrl" ... done
#>
#> ℹ Scaling dataset "ctrl"
#> ✔ Scaling dataset "ctrl" ... done
#>