This metric quantifies how much the factorization and alignment distort the geometry of the original datasets. The greater the agreement, the less the geometry is distorted. It is calculated by performing dimensionality reduction on the original and integrated (factorized only, or factorized plus aligned) datasets, and measuring the similarity between the k nearest neighbors of each cell in the original and integrated datasets. The Jaccard index is used to quantify similarity, and the final metric is the average across all cells.
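As a sketch of the description above, the metric can be written as an average of per-cell Jaccard indices. The symbols are notation assumed here for illustration, not taken from the package: \(n\) is the number of cells, and \(N_k^{\mathrm{orig}}(i)\) and \(N_k^{\mathrm{int}}(i)\) are the sets of k nearest neighbors of cell \(i\) in the original and integrated representations. An unweighted mean over cells is also assumed.

\[
\mathrm{agreement} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| N_k^{\mathrm{orig}}(i) \cap N_k^{\mathrm{int}}(i) \right|}{\left| N_k^{\mathrm{orig}}(i) \cup N_k^{\mathrm{int}}(i) \right|}
\]

Each term lies between 0 (no shared neighbors) and 1 (identical neighbor sets), so the average does as well.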
Note that for most datasets, the greater the chosen nNeighbors, the greater the agreement in general. Although agreement can theoretically approach 1, in practice it is usually no higher than 0.2-0.3.
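The idea can be illustrated with a minimal base R sketch. It is not the implementation used by calcAgreement: the helpers knnIndex and jaccardAgreement are hypothetical names introduced here, neighbors are found by brute-force Euclidean distance, and two small random matrices stand in for the reduced original data and the integrated factor loadings.

```r
# Hypothetical helper (not part of rliger): for each row (cell) of `mat`,
# return the indices of its k nearest neighbors by Euclidean distance,
# excluding the cell itself.
knnIndex <- function(mat, k) {
  d <- as.matrix(dist(mat))
  diag(d) <- Inf  # a cell is never its own neighbor
  lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)])
}

# Hypothetical helper: mean per-cell Jaccard index between the neighbor
# sets found in two representations of the same cells (rows must match).
jaccardAgreement <- function(original, integrated, k) {
  nnOrig <- knnIndex(original, k)
  nnInt <- knnIndex(integrated, k)
  jaccard <- mapply(function(a, b) {
    length(intersect(a, b)) / length(union(a, b))
  }, nnOrig, nnInt)
  mean(jaccard)
}

# Toy data (assumption): 60 cells in 5 dimensions, and a noisy copy
# standing in for the integrated representation.
set.seed(1)
original <- matrix(rnorm(60 * 5), nrow = 60)
integrated <- original + matrix(rnorm(60 * 5, sd = 0.5), nrow = 60)

# A small neighborhood is sensitive to the added noise.
jaccardAgreement(original, integrated, k = 5)

# With k = n - 1 every other cell is a neighbor in both representations,
# so the neighbor sets coincide and the agreement is exactly 1.
jaccardAgreement(original, integrated, k = 59)

# An undistorted representation also gives an agreement of 1.
jaccardAgreement(original, original, k = 5)
```

The k = n - 1 case shows why larger neighborhoods tend to raise the score: as k grows, the two neighbor sets are drawn from the same limited pool of cells and necessarily overlap more.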
Usage
calcAgreement(
  object,
  ndims = 40,
  nNeighbors = 15,
  useRaw = FALSE,
  byDataset = FALSE,
  seed = 1,
  dr.method = NULL,
  k = nNeighbors,
  use.aligned = NULL,
  rand.seed = seed,
  by.dataset = byDataset
)
Arguments
- object
  liger object. Should call alignFactors before calling.
- ndims
  Number of factors to produce in NMF. Default 40.
- nNeighbors
  Number of nearest neighbors to use in calculating Jaccard index. Default 15.
- useRaw
  Whether to evaluate just factorized \(H\) matrices instead of using aligned \(H.norm\) matrix. Default FALSE uses aligned matrix.
- byDataset
  Whether to return agreement calculated for each dataset instead of the average for all datasets. Default FALSE.
- seed
  Random seed to allow reproducible results. Default 1.
- dr.method
- k, rand.seed, by.dataset
- use.aligned
Value
A numeric vector of agreement metric values: a single value if byDataset = FALSE, or one value per dataset otherwise.
Examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- pbmc %>%
        normalize %>%
        selectGenes %>%
        scaleNotCenter %>%
        runINMF %>%
        alignFactors
    calcAgreement(pbmc)
}
#> ℹ Normalizing datasets "ctrl"
#> ℹ Normalizing datasets "stim"
#> ✔ Normalizing datasets "stim" ... done
#>
#> ℹ Normalizing datasets "ctrl"
#> ✔ Normalizing datasets "ctrl" ... done
#>
#> ℹ Selecting variable features for dataset "ctrl"
#> ✔ ... 168 features selected out of 249 shared features.
#> ℹ Selecting variable features for dataset "stim"
#> ✔ ... 166 features selected out of 249 shared features.
#> ✔ Finally 173 shared variable features are selected.
#> ℹ Scaling dataset "ctrl"
#> ✔ Scaling dataset "ctrl" ... done
#>
#> ℹ Scaling dataset "stim"
#> ✔ Scaling dataset "stim" ... done
#>
#> ℹ Using largest dataset of recommended type as reference: "ctrl" with 300 cells
#> [1] 0.3723238