Quantile Align (Normalize) Factor Loadings

This process builds a shared factor neighborhood graph to jointly cluster cells, then quantile normalizes corresponding clusters.

The first step, building the shared factor neighborhood graph, is performed in SNF(), and produces a graph representation where edge weights between cells (across all datasets) correspond to their similarity in the shared factor neighborhood space. An important parameter here is nNeighbors, the number of neighbors used to build the shared factor space.

Next, quantile alignment is performed for each dataset, factor, and cluster by stretching or compressing each dataset's quantiles to better match those of the reference dataset.
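
The stretching/compressing idea can be illustrated with a minimal base R sketch. This is not the package's implementation: alignQuantiles is a hypothetical helper written only for illustration, and it assumes a simple piecewise-linear map between matching quantiles of one vector of factor loadings and a reference vector.

```r
# Hypothetical helper (not part of rliger): map the values of `x` onto the
# distribution of `ref` by matching a fixed number of quantiles.
alignQuantiles <- function(x, ref, quantiles = 50) {
  probs <- seq(0, 1, length.out = quantiles + 1)
  qx <- quantile(x, probs = probs, names = FALSE)
  qref <- quantile(ref, probs = probs, names = FALSE)
  # Piecewise-linear interpolation from the quantiles of `x` to those of `ref`.
  # Quantiles are already sorted, so ties can be treated as "ordered".
  approx(x = qx, y = qref, xout = x, ties = "ordered", rule = 2)$y
}

# Simulated loadings for one factor within one cluster (assumed data)
set.seed(1)
ref <- rexp(500, rate = 1)    # loadings in the reference dataset
x <- rexp(300, rate = 0.5)    # loadings in the dataset being aligned
aligned <- alignQuantiles(x, ref)

# Compare a few quantiles before and after alignment
round(rbind(
  reference = quantile(ref, c(0.25, 0.5, 0.75)),
  before = quantile(x, c(0.25, 0.5, 0.75)),
  after = quantile(aligned, c(0.25, 0.5, 0.75))
), 3)
```

The mapping is monotone, so the rank order of cells within the dataset is preserved while the overall distribution is moved toward that of the reference.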

Usage

quantileNorm(object, ...)

# S3 method for liger
quantileNorm(
  object,
  quantiles = 50,
  reference = NULL,
  minCells = 20,
  nNeighbors = 20,
  useDims = NULL,
  center = FALSE,
  maxSample = 1000,
  eps = 0.9,
  refineKNN = TRUE,
  clusterName = "quantileNorm_cluster",
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

# S3 method for Seurat
quantileNorm(
  object,
  reduction = "inmf",
  quantiles = 50,
  reference = NULL,
  minCells = 20,
  nNeighbors = 20,
  useDims = NULL,
  center = FALSE,
  maxSample = 1000,
  eps = 0.9,
  refineKNN = TRUE,
  clusterName = "quantileNorm_cluster",
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Arguments

object: A liger or Seurat object with valid factorization result available (i.e. runIntegration performed in advance).
...: Arguments passed to other S3 methods of this function.
quantiles: Number of quantiles to use for quantile normalization. Default 50.
reference: Character, numeric or logical selection of one dataset, out of all available datasets in object, to use as a "reference" for quantile normalization. Default NULL tries to find the RNA dataset with the largest number of cells; if no RNA dataset is available, the globally largest dataset is used.
minCells: Minimum number of cells to consider a cluster shared across datasets. Default 20.
nNeighbors: Number of nearest neighbors for within-dataset knn graph. Default 20.
useDims: Indices of factors to use for shared nearest factor determination. Default NULL uses all factors.
center: Whether to center the data when scaling factors. Could be useful for less sparse modalities like methylation data. Default FALSE.
maxSample: Maximum number of cells used for quantile normalization of each cluster and factor. Default 1000.
eps: The error bound of the nearest neighbor search. Lower values give more accurate nearest neighbor graphs but take much longer to compute. Default 0.9.
refineKNN: Whether to increase robustness of cluster assignments using KNN graph. Default TRUE.
clusterName: Variable name that will store the clustering result in metadata of a liger object or a Seurat object. Default "quantileNorm_cluster".
seed: Random seed to allow reproducible results. Default 1.
verbose: Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if the option has not been set.
reduction: Name of the reduction where LIGER integration result is stored. Default "inmf".
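
To make the roles of nNeighbors and refineKNN more concrete, below is a rough base R sketch of KNN-based label refinement. It rests on assumptions and is not the package's code: refineByKNN is a hypothetical helper, the initial label of each cell is assumed to be its factor with the largest loading, and an exact distance matrix stands in for the approximate nearest neighbor search controlled by eps.

```r
# Hypothetical helper (not part of rliger): label each cell by its top factor,
# then replace the label with the majority label among its k nearest neighbors.
refineByKNN <- function(H, k = 3) {
  # Initial label: index of the factor with the largest loading in each cell
  initial <- max.col(H, ties.method = "first")
  d <- as.matrix(dist(H))
  diag(d) <- Inf  # a cell is not its own neighbor
  refined <- vapply(seq_len(nrow(H)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    votes <- table(initial[nn])
    as.integer(names(votes)[which.max(votes)])
  }, integer(1))
  list(initial = initial, refined = refined)
}

# Toy cell-by-factor loading matrix (assumed data): 7 cells, 3 factors
H <- rbind(
  c(1.0, 0.0, 0.80),
  c(0.9, 0.0, 0.80),
  c(1.0, 0.0, 0.90),
  c(0.9, 0.0, 0.95),  # top factor is 3, but the cell sits among factor-1 cells
  c(0.0, 1.0, 0.00),
  c(0.0, 0.9, 0.10),
  c(0.1, 1.0, 0.00)
)
res <- refineByKNN(H, k = 3)
res$initial
res$refined
```

In this toy input the fourth cell is initially labeled by factor 3, and the vote of its three nearest neighbors moves it to the factor-1 group. A larger k smooths labels more aggressively.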

Value

Updated input object

liger method
- Update the H.norm slot with the aligned cell factor loading, ready for running graph-based community detection clustering or dimensionality reduction for visualization.
- Update the cellMeta slot with a cluster assignment based on cell factor loading
Seurat method
- Update the reductions slot with a new DimReduc object containing the aligned cell factor loading.
- Update the metadata with a cluster assignment based on cell factor loading

Examples

pbmc <- quantileNorm(pbmcPlot)
#> ℹ Using largest dataset of recommended type as reference: "ctrl" with 300 cells