Calculate alignment metric after integration

This metric quantifies how well aligned two or more datasets are. We randomly downsample all datasets to the size of the smallest one, construct a nearest-neighbor graph, and count, for each cell, how many of its neighbors come from the same dataset. We then average this count across all cells, compare it with the value expected for perfectly mixed datasets, and scale the result to range from 0 to 1. Note that in practice, alignment can occasionally be greater than 1.
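The procedure above can be sketched in base R. This is a minimal illustration under stated assumptions, not rliger's implementation: the function name `alignmentSketch`, the plain numeric matrix `emb` (cells in rows) standing in for the aligned factor loadings, exact Euclidean neighbors computed with `dist()`, and excluding each cell from its own neighbor list are all assumptions.

```r
# Hypothetical sketch of the alignment metric (not rliger's code).
# Assumptions: `emb` is a cells x factors numeric matrix, neighbors are
# exact Euclidean kNN from dist(), and a cell is not its own neighbor.
alignmentSketch <- function(emb, datasets, k = 10, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  datasets <- droplevels(as.factor(datasets))
  # Downsample every dataset to the size of the smallest one
  nMin <- min(table(datasets))
  keep <- unlist(lapply(levels(datasets), function(d) {
    idx <- which(datasets == d)
    idx[sample.int(length(idx), nMin)]
  }))
  stopifnot(k < length(keep))
  emb <- emb[keep, , drop = FALSE]
  labs <- datasets[keep]
  # Exact k nearest neighbors from the full distance matrix
  dm <- as.matrix(dist(emb))
  diag(dm) <- Inf  # exclude each cell from its own neighbor list
  sameCount <- vapply(seq_len(nrow(dm)), function(i) {
    nn <- order(dm[i, ])[seq_len(k)]
    sum(labs[nn] == labs[i])  # neighbors from the cell's own dataset
  }, integer(1))
  # Compare the mean count with k/N (perfect mixing) and rescale
  N <- nlevels(labs)
  xbar <- mean(sameCount)
  1 - (xbar - k / N) / (k - k / N)
}
```

With two datasets placed far apart, every neighbor comes from the same dataset and the sketch returns 0. With two datasets interleaved along a line, cells have fewer same-dataset neighbors than the k/N expected under random mixing, so the score exceeds 1, which is one way a value above 1 can arise.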

Usage

calcAlignment(
  object,
  clustersUse = NULL,
  clusterVar = NULL,
  nNeighbors = NULL,
  cellIdx = NULL,
  cellComp = NULL,
  resultBy = c("all", "dataset", "cell"),
  seed = 1,
  k = nNeighbors,
  rand.seed = seed,
  cells.use = cellIdx,
  cells.comp = cellComp,
  clusters.use = clustersUse,
  by.cell = NULL,
  by.dataset = NULL
)

Arguments

object: A liger object, with alignFactors already run.
clustersUse: The clusters to consider for calculating the alignment. Should be a vector of existing levels in clusterVar. Default NULL. See Details.
clusterVar: The name of one variable in cellMeta(object). Default NULL uses default clusters.
nNeighbors: Number of neighbors to use in calculating alignment. Default NULL uses floor(0.01*ncol(object)), with a lower bound of 10 in all cases except where the total number of sampled cells is less than 10.
cellIdx, cellComp: Character, logical or numeric index that can subset cells. Default NULL. See Details.
resultBy: Select from "all", "dataset" or "cell": the level at which the mean alignment is calculated. Default "all".
seed: Random seed to allow reproducible results. Default 1.
k, rand.seed, cells.use, cells.comp, clusters.use: Deprecated. See Usage for their replacements.
by.cell, by.dataset: Deprecated. Use resultBy instead.
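The three levels offered by resultBy can be illustrated with a hypothetical helper, `aggregateAlignment`, that takes per-cell counts of same-dataset neighbors. It assumes each level applies the formula in Details to the mean count at that level; how rliger aggregates internally is not stated on this page.

```r
# Hypothetical helper (not rliger's API): aggregate per-cell counts of
# same-dataset neighbors at the "all", "dataset" or "cell" level.
# Assumption: each level applies the Details formula to the mean count.
aggregateAlignment <- function(sameCount, datasets, k,
                               resultBy = c("all", "dataset", "cell")) {
  resultBy <- match.arg(resultBy)
  N <- length(unique(datasets))
  # Rescale a mean same-dataset count so perfect mixing maps to 1
  score <- function(x) 1 - (x - k / N) / (k - k / N)
  switch(resultBy,
    all = score(mean(sameCount)),
    dataset = vapply(split(sameCount, datasets),
                     function(x) score(mean(x)), numeric(1)),
    cell = score(sameCount)
  )
}

aggregateAlignment(c(10, 10, 5, 5), c("a", "a", "b", "b"), k = 10)
```

Because the formula is affine in the mean count, averaging the per-cell scores gives the same value as the "all" level.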

Value

The alignment metric.

Details

$\bar{x}$ is the average number of neighbors that belong to a cell's own dataset, $N$ is the number of datasets, and $k$ is the number of neighbors in the KNN graph. The alignment is $$1 - \frac{\bar{x} - \frac{k}{N}}{k - \frac{k}{N}}$$
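As a worked example with illustrative numbers (not taken from any dataset), take $k = 20$ neighbors and $N = 2$ datasets, so perfect mixing gives $\frac{k}{N} = 10$ same-dataset neighbors on average:

$$\bar{x} = 12:\quad 1 - \frac{12 - 10}{20 - 10} = 0.8$$
$$\bar{x} = 8:\quad 1 - \frac{8 - 10}{20 - 10} = 1.2$$

At the extremes, $\bar{x} = \frac{k}{N}$ gives 1 and $\bar{x} = k$ gives 0, while $\bar{x} < \frac{k}{N}$, where cells have fewer same-dataset neighbors than random mixing would produce, yields a value above 1.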

Cells to be measured can be selected in several ways, each representing a different scenario:

By default, all cells are considered and the alignment across all datasets will be calculated.
Select clustersUse from clusterVar to use cells from the clusters of interest. This measures the alignment across all covered datasets within the specified clusters.
Specify only cellIdx for flexible selection. This measures the alignment across all covered datasets within the specified cells. A non-NULL cellIdx takes precedence over clustersUse.
Specify cellIdx and cellComp together. The original dataset labels are then ignored, and the cells specified by each argument are treated as one dataset each. This measures the alignment between the two sets of cells. cellComp may contain cells already specified in cellIdx.
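The selection rules above can be expressed as a small decision function. This is a hypothetical base-R sketch, not rliger's code: the name `selectAlignmentGroups`, the named-vector inputs `datasetOf` and `clusterOf`, the restriction to character cell IDs, and keeping a cell listed in both cellIdx and cellComp once per group are all assumptions.

```r
# Hypothetical sketch (not rliger's API) of the cell-selection rules.
# `datasetOf` and `clusterOf` are named character vectors whose names
# are cell IDs; only character cell IDs are handled here.
# Returns the group label used for alignment, named by cell ID.
selectAlignmentGroups <- function(datasetOf, clusterOf,
                                  clustersUse = NULL,
                                  cellIdx = NULL, cellComp = NULL) {
  if (!is.null(cellIdx) && !is.null(cellComp)) {
    # Original datasets ignored: each argument defines one group.
    # Assumption: a cell listed in both keeps one entry per group.
    return(c(setNames(rep("cellIdx", length(cellIdx)), cellIdx),
             setNames(rep("cellComp", length(cellComp)), cellComp)))
  }
  if (!is.null(cellIdx)) {
    # A non-NULL cellIdx takes precedence over clustersUse
    return(datasetOf[cellIdx])
  }
  if (!is.null(clustersUse)) {
    # Keep cells from the requested clusters, grouped by dataset
    keep <- names(clusterOf)[clusterOf %in% clustersUse]
    return(datasetOf[keep])
  }
  datasetOf  # default: all cells, grouped by their dataset
}
```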

Examples

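if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    # Preprocess, factorize with iNMF, then align the factors
    pbmc <- pbmc %>%
        normalize %>%
        selectGenes %>%
        scaleNotCenter %>%
        runINMF %>%
        alignFactors
    # Measure how well the "ctrl" and "stim" datasets are aligned
    calcAlignment(pbmc)
}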
#> ℹ Normalizing datasets "ctrl"
#> ℹ Normalizing datasets "stim"
#> ✔ Normalizing datasets "stim" ... done
#> 
#> ℹ Normalizing datasets "ctrl"
#> ✔ Normalizing datasets "ctrl" ... done
#> 
#> ℹ Selecting variable features for dataset "ctrl"
#> ✔ ... 168 features selected out of 249 shared features.
#> ℹ Selecting variable features for dataset "stim"
#> ✔ ... 166 features selected out of 249 shared features.
#> ✔ Finally 173 shared variable features are selected.
#> ℹ Scaling dataset "ctrl"
#> ✔ Scaling dataset "ctrl" ... done
#> 
#> ℹ Scaling dataset "stim"
#> ✔ Scaling dataset "stim" ... done
#> 
#> ℹ Using largest dataset of recommended type as reference: "ctrl" with 300 cells
#> [1] 0.8996667