General QC for liger object: runGeneralQC (rliger)

Calculate, for each cell, the number of UMIs, the number of detected features, and the percentage of expression coming from feature subsets (e.g. mitochondrial, ribosomal and hemoglobin genes).

Usage

runGeneralQC(
  object,
  organism,
  features = NULL,
  pattern = NULL,
  overwrite = FALSE,
  useDatasets = NULL,
  chunkSize = 1000,
  verbose = getOption("ligerVerbose", TRUE),
  mito = NULL,
  ribo = NULL,
  hemo = NULL
)

Arguments

object: A liger object with rawData available in each embedded ligerDataset.
organism: The organism of the dataset, used to identify the mitochondrial, ribosomal and hemoglobin genes. Available options are "mouse", "human", "zebrafish", "rat" and "drosophila". Set to NULL to disable the mito, ribo and hemo calculation.
features: Feature names defining the feature subsets for which the expression percentage is calculated. A vector for a single subset, or a named list for multiple subsets. Default NULL.
pattern: Regular expression patterns matching the feature subsets for which the expression percentage is calculated. A vector for a single subset, or a named list for multiple subsets. Default NULL.
overwrite: Whether to overwrite existing QC metric variables. The default FALSE does not update existing results. Use TRUE to update all of them, or a character vector to specify which ones to update. See Details.
useDatasets: A character vector of names, or a numeric or logical vector of indices, of the datasets to include in QC. Default NULL performs QC on all datasets.
chunkSize: Integer number of cells to include in a chunk when working on HDF5-based datasets. Default 1000.
verbose: Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if users have not set it.
mito, ribo, hemo: Ignored. The percentages of mitochondrial, ribosomal and hemoglobin gene counts are now always computed.
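To give a sense of what chunkSize controls, here is a minimal base R sketch of chunk-wise processing. It does not use rliger or HDF5: the in-memory matrix `counts` is a hypothetical stand-in for on-disk data, and the loop only illustrates the general idea of visiting a fixed number of cells at a time, not how rliger implements it.

```r
# Stand-in for an on-disk genes-by-cells matrix (hypothetical data).
set.seed(1)
counts <- matrix(rpois(5 * 2500, lambda = 2), nrow = 5)

chunkSize <- 1000
nCells <- ncol(counts)
nUMI <- numeric(nCells)

# Visit cells in consecutive chunks of at most `chunkSize` columns.
starts <- seq(1, nCells, by = chunkSize)
for (s in starts) {
  e <- min(s + chunkSize - 1, nCells)
  # With on-disk data, only this slice would need to be held in memory.
  chunk <- counts[, s:e, drop = FALSE]
  nUMI[s:e] <- colSums(chunk)
}
```

A smaller chunk size lowers peak memory use at the cost of more iterations; the per-cell results are the same either way.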

Value

The input object with cellMeta(object) updated as requested. See Details for more information.

Details

This function by default calculates:

nUMI - The column sum of the raw data matrix per cell. Represents the total number of UMIs per cell if given raw counts.
nGene - Number of detected features per cell
mito - Percentage of mitochondrial gene expression per cell
ribo - Percentage of ribosomal gene expression per cell
hemo - Percentage of hemoglobin gene expression per cell
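As a rough illustration of what these metrics measure, the following base R sketch computes the same kinds of quantities on a toy count matrix. It does not call rliger: the matrix `counts`, the gene names, and the rule of detecting mitochondrial genes by an "mt-" prefix are all assumptions made for the example, whereas the real function selects genes according to organism.

```r
# Toy genes-by-cells raw count matrix (hypothetical data, not from rliger).
counts <- matrix(
  c(5, 0, 3,
    0, 0, 2,
    1, 4, 0,
    4, 0, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("mt-Nd1", "Actb", "mt-Co1", "Gapdh"),
                  c("cell1", "cell2", "cell3"))
)

# nUMI: column sum of the raw matrix, i.e. total counts per cell.
nUMI <- colSums(counts)

# nGene: number of features with a non-zero count in each cell.
nGene <- colSums(counts > 0)

# Percentage of each cell's counts that come from a feature subset.
# Assumption: mitochondrial genes are identified here by an "mt-" prefix.
isMito <- grepl("^mt-", rownames(counts))
mito <- colSums(counts[isMito, , drop = FALSE]) / nUMI * 100

qc <- data.frame(nUMI = nUMI, nGene = nGene, mito = mito)
print(qc)
```

The ribo and hemo percentages follow the same pattern with a different set of rows selected.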

Users can also calculate the expression percentage of their own feature subsets, specified either as feature names with argument features or as regular expression patterns matching the genes of interest with argument pattern. If a character vector is given to features, a QC metric variable named "featureSubset_name" will be computed. If a named list of multiple subsets is given, the list names will be used as the variable names. Likewise, if a single pattern is given to pattern, a QC metric variable named "featureSubset_pattern" will be computed, and if a named list of multiple patterns is given, the list names will be used as the variable names. Avoid duplicated QC metric names, both between these two arguments and with the default five listed above.
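The difference between selecting a subset by name and by pattern can be sketched in base R as follows. This is not rliger code: `pctSubset`, the toy matrix and the subset names `housekeeping`, `riboLike` and `hemoLike` are hypothetical, and the sketch only shows how the names of a list can carry over to the names of the resulting metrics.

```r
# Toy genes-by-cells raw count matrix (hypothetical data).
counts <- matrix(
  c(2, 0,
    3, 1,
    0, 4,
    5, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("RPS3", "RPL5", "HBB", "ACTB"), c("cell1", "cell2"))
)

# Hypothetical helper: percentage of each cell's counts found in `rows`.
pctSubset <- function(mat, rows) {
  colSums(mat[rows, , drop = FALSE]) / colSums(mat) * 100
}

# A named list of explicit feature names, as one might pass to `features`.
featureSets <- list(housekeeping = c("ACTB"))
# A named list of regular expressions, as one might pass to `pattern`.
patternSets <- list(riboLike = "^RP[SL]", hemoLike = "^HB")

byFeature <- lapply(featureSets, function(f) {
  pctSubset(counts, intersect(f, rownames(counts)))
})
byPattern <- lapply(patternSets, function(p) {
  pctSubset(counts, grep(p, rownames(counts), value = TRUE))
})

# The list names become the metric (column) names.
customQC <- as.data.frame(c(byFeature, byPattern))
print(customQC)
```

With rliger itself, the corresponding named lists would be supplied through the features and pattern arguments of runGeneralQC().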

This function runs automatically when each liger object is created, to capture the raw state of the data. Argument overwrite is set to FALSE by default to avoid mistakenly updating existing metrics after the object has been filtered. Users can still update all newly calculated metrics (including the default five) by setting overwrite = TRUE, or only some of them by providing a character vector naming the metrics to update. Overwriting applies only to the datasets selected with useDatasets.
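The three forms of overwrite can be pictured with a small base R sketch. `updateMetrics` is a hypothetical helper written for this illustration, not part of rliger, and it assumes that metrics which do not exist yet are always added regardless of overwrite.

```r
# Hypothetical helper mimicking the three documented forms of `overwrite`.
updateMetrics <- function(existing, new, overwrite = FALSE) {
  if (isTRUE(overwrite)) {
    toUpdate <- names(new)                        # update everything
  } else if (is.character(overwrite)) {
    toUpdate <- intersect(overwrite, names(new))  # update only named metrics
  } else {
    toUpdate <- character(0)                      # keep all existing values
  }
  # Assumption: metrics that do not exist yet are always added.
  toAdd <- setdiff(names(new), names(existing))
  for (v in union(toUpdate, toAdd)) existing[[v]] <- new[[v]]
  existing
}

# Metrics recorded at creation time vs. metrics recomputed after filtering.
old <- data.frame(nUMI = c(100, 200), mito = c(5, 8))
fresh <- data.frame(nUMI = c(90, 180), mito = c(4, 7), ribo = c(20, 25))

kept <- updateMetrics(old, fresh)                        # overwrite = FALSE
updAll <- updateMetrics(old, fresh, overwrite = TRUE)    # update all
updSome <- updateMetrics(old, fresh, overwrite = "mito") # update only "mito"
```

In this sketch `kept` retains the original nUMI and mito values, `updAll` takes every recomputed value, and `updSome` replaces only mito.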

Examples

pbmc <- runGeneralQC(pbmc, "human", overwrite = TRUE)
#> ! No human mitochondrial gene found in the union of dataset "ctrl" and "stim"
#> ℹ calculating QC for dataset "ctrl"
#> ℹ Updated QC variables: "nUMI", "nGene", "mito", "ribo", and "hemo"
#> ✔ calculating QC for dataset "ctrl" ... done
#> 
#> ℹ calculating QC for dataset "stim"
#> ℹ Updated QC variables: "nUMI", "nGene", "mito", "ribo", and "hemo"
#> ✔ calculating QC for dataset "stim" ... done
#>