This metric quantifies how well aligned two or more datasets are. We randomly downsample all datasets to have as many cells as the smallest one. We then construct a nearest-neighbor graph and count, for each cell, how many of its neighbors come from the same dataset. We average this count across all cells, compare it to the value expected for perfectly mixed datasets, and rescale the result so that it nominally ranges from 0 to 1. Note that in practice the alignment can occasionally exceed 1, which happens when cells have fewer same-dataset neighbors than expected under perfect mixing.
Usage
calcAlignment(
  object,
  clustersUse = NULL,
  clusterVar = NULL,
  nNeighbors = NULL,
  cellIdx = NULL,
  cellComp = NULL,
  resultBy = c("all", "dataset", "cell"),
  seed = 1,
  k = nNeighbors,
  rand.seed = seed,
  cells.use = cellIdx,
  cells.comp = cellComp,
  clusters.use = clustersUse,
  by.cell = NULL,
  by.dataset = NULL
)
Arguments
- object
A liger object, with alignFactors already run.
- clustersUse
The clusters to consider for calculating the alignment. Should be a vector of existing levels in clusterVar. Default NULL. See Details.
- clusterVar
The name of one variable in cellMeta(object). Default NULL uses default clusters.
- nNeighbors
Number of neighbors to use in calculating alignment. Default NULL uses floor(0.01*ncol(object)), with a lower bound of 10 in all cases except where the total number of sampled cells is less than 10.
- cellIdx, cellComp
Character, logical or numeric index that can subscript cells. Default NULL. See Details.
- resultBy
Select from "all", "dataset" or "cell". The level at which the mean alignment should be calculated. Default "all".
- seed
Random seed to allow reproducible results. Default 1.
- k, rand.seed, cells.use, cells.comp, clusters.use
- by.cell, by.dataset
Details
The alignment score is calculated as $$1 - \frac{\bar{x} - \frac{k}{N}}{k - \frac{k}{N}}$$ where \(\bar{x}\) is the average number of a cell's neighbors that belong to the same dataset as that cell, \(N\) is the number of datasets, and \(k\) is the number of neighbors in the KNN graph.
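The procedure and formula above can be sketched in base R. This is only an illustrative sketch, not the rliger implementation: the function name alignmentSketch, its arguments, and the brute-force Euclidean neighbor search are all assumptions made for this example.

```r
# Illustrative sketch only; NOT the rliger implementation.
# `emb`: numeric matrix, one row per cell (e.g. aligned factor loadings).
# `dataset`: vector giving the dataset of origin for each row of `emb`.
alignmentSketch <- function(emb, dataset, k = 10, seed = 1) {
    set.seed(seed)
    dataset <- droplevels(as.factor(dataset))
    nDatasets <- nlevels(dataset)
    # Downsample every dataset to the size of the smallest one
    minCells <- min(table(dataset))
    keep <- unlist(lapply(split(seq_along(dataset), dataset), function(idx) {
        idx[sample.int(length(idx), minCells)]
    }))
    emb <- emb[keep, , drop = FALSE]
    dataset <- dataset[keep]
    # Brute-force k nearest neighbors from the full distance matrix
    d <- as.matrix(dist(emb))
    diag(d) <- Inf  # a cell is not counted as its own neighbor
    sameCount <- vapply(seq_len(nrow(d)), function(i) {
        nn <- order(d[i, ])[seq_len(k)]
        sum(dataset[nn] == dataset[i])
    }, numeric(1))
    # x-bar: average number of same-dataset neighbors per cell
    xBar <- mean(sameCount)
    1 - (xBar - k / nDatasets) / (k - k / nDatasets)
}

# Two simulated datasets drawn from the same distribution: score close to 1
set.seed(42)
emb <- matrix(rnorm(200 * 5), ncol = 5)
dataset <- rep(c("a", "b"), each = 100)
alignmentSketch(emb, dataset, k = 10)

# Shifting one dataset far away makes every neighbor same-dataset: score is 0
emb[dataset == "b", ] <- emb[dataset == "b", ] + 100
alignmentSketch(emb, dataset, k = 10)
```

In the fully separated case \(\bar{x} = k\), so the fraction equals 1 and the score is 0. When \(\bar{x}\) falls below \(k/N\), the fraction is negative and the score exceeds 1.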
The cells to be measured can be selected in several ways, each representing a different scenario:
- By default, all cells are considered and the alignment across all datasets is calculated.
- Select clustersUse from clusterVar to use cells from the clusters of interest. This measures the alignment across all covered datasets within the specified clusters.
- Specify only cellIdx for flexible selection. This measures the alignment across all covered datasets within the specified cells. A non-NULL cellIdx takes precedence over clustersUse.
- Specify cellIdx and cellComp at the same time, so that the original dataset source is ignored and the cells specified by each argument are treated as if they came from one dataset each. This measures the alignment between the cells specified by the two arguments. cellComp can contain cells already specified in cellIdx.
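The selection scenarios above might be invoked as follows. This is a hedged sketch that uses only the arguments documented on this page, together with the bundled pbmc object and the preprocessing from the Examples section. The particular index vectors (every other cell, first half versus second half) are arbitrary choices for illustration and carry no biological meaning.

```r
library(rliger)

if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    # Same preprocessing as in the Examples section
    pbmc <- pbmc %>%
        normalize %>%
        selectGenes %>%
        scaleNotCenter %>%
        runINMF %>%
        alignFactors

    # Default: all cells, alignment across all datasets
    calcAlignment(pbmc)

    # Same cells, but with the mean alignment reported per dataset
    calcAlignment(pbmc, resultBy = "dataset")

    # cellIdx only: alignment across datasets within a chosen subset of cells
    # (every other cell is an arbitrary illustrative subset)
    subsetIdx <- seq(1, ncol(pbmc), by = 2)
    calcAlignment(pbmc, cellIdx = subsetIdx)

    # cellIdx and cellComp together: the two groups are compared with each
    # other and the original dataset labels are ignored
    half <- floor(ncol(pbmc) / 2)
    calcAlignment(
        pbmc,
        cellIdx = seq_len(half),
        cellComp = seq(half + 1, ncol(pbmc))
    )
}
```

Selecting by cluster works the same way through clustersUse and clusterVar, given a cluster variable that exists in cellMeta(object).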
Examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- pbmc %>%
        normalize %>%
        selectGenes %>%
        scaleNotCenter %>%
        runINMF %>%
        alignFactors
    calcAlignment(pbmc)
}
#> ℹ Normalizing datasets "ctrl"
#> ℹ Normalizing datasets "stim"
#> ✔ Normalizing datasets "stim" ... done
#>
#> ℹ Normalizing datasets "ctrl"
#> ✔ Normalizing datasets "ctrl" ... done
#>
#> ℹ Selecting variable features for dataset "ctrl"
#> ✔ ... 168 features selected out of 249 shared features.
#> ℹ Selecting variable features for dataset "stim"
#> ✔ ... 166 features selected out of 249 shared features.
#> ✔ Finally 173 shared variable features are selected.
#> ℹ Scaling dataset "ctrl"
#> ✔ Scaling dataset "ctrl" ... done
#>
#> ℹ Scaling dataset "stim"
#> ✔ Scaling dataset "stim" ... done
#>
#> ℹ Using largest dataset of recommended type as reference: "ctrl" with 300 cells
#> [1] 0.8996667