Calculate the number of UMIs, the number of detected features, and the expression percentage of feature subsets (e.g. mito, ribo and hemo) per cell.
Usage
runGeneralQC(
  object,
  organism,
  features = NULL,
  pattern = NULL,
  overwrite = FALSE,
  useDatasets = NULL,
  chunkSize = 1000,
  verbose = getOption("ligerVerbose", TRUE),
  mito = NULL,
  ribo = NULL,
  hemo = NULL
)
Arguments
- object
A liger object with rawData available in each embedded ligerDataset.
- organism
Specify the organism of the dataset to identify the mitochondrial, ribosomal and hemoglobin genes. Available options are "mouse", "human", "zebrafish", "rat" and "drosophila". Set NULL to disable mito, ribo and hemo calculation.
- features
Feature names that make up the feature subsets whose expression percentage should be calculated. A vector for a single subset, or a named list for multiple subsets. Default NULL.
- pattern
Regex patterns that match the features of the subsets whose expression percentage should be calculated. A vector for a single subset, or a named list for multiple subsets. Default NULL.
- overwrite
Whether to overwrite existing QC metric variables. The default FALSE does not update existing results. Use TRUE to update all of them, or a character vector to specify which ones to update. See Details.
- useDatasets
A character vector of dataset names, or a numeric or logical vector of dataset indices, selecting the datasets to be included for QC. The default NULL performs QC on all datasets.
- chunkSize
Integer number of cells to include in a chunk when working on HDF5-based datasets. Default 1000.
- verbose
Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if users have not set it.
- mito, ribo, hemo
These arguments are now ignored. The percentages of mitochondrial, ribosomal and hemoglobin gene counts are always computed (unless organism is set to NULL).
Value
The updated object, with cellMeta(object) updated as intended by users. See Details for more information.
Details
This function by default calculates:
nUMI - The column sum of the raw data matrix per cell. Represents the total number of UMIs per cell if given raw counts.
nGene - Number of detected features per cell
mito - Percentage of mitochondrial gene expression per cell
ribo - Percentage of ribosomal gene expression per cell
hemo - Percentage of hemoglobin gene expression per cell
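As a rough illustration of what these metrics mean, the following base R sketch computes them on a tiny made-up count matrix. It is not the package's implementation: the matrix values and the helper pctSubset are hypothetical, and the real function works on the rawData of each dataset.

```r
# Toy raw count matrix: rows are features (genes), columns are cells.
# All values are made up for illustration.
counts <- matrix(
  c(5, 0, 2,
    0, 3, 0,
    1, 1, 0,
    4, 0, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "RPS6", "HBB", "CD3D"),
                  c("cell1", "cell2", "cell3"))
)

# nUMI: total counts per cell (column sums of the raw matrix)
nUMI <- colSums(counts)

# nGene: number of features with at least one count in each cell
nGene <- colSums(counts > 0)

# Hypothetical helper: percentage of a cell's counts that come from a subset
pctSubset <- function(mat, subset) {
  100 * colSums(mat[subset, , drop = FALSE]) / colSums(mat)
}

# Percentage of counts from the (toy) mitochondrial gene
mito <- pctSubset(counts, "MT-CO1")
```

The same percentage formula applies to any feature subset, whether it is the ribosomal or hemoglobin genes or a user-defined set.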
Users can also specify their own feature subsets with argument features, or regular expression patterns that match genes of interest with argument pattern, to calculate the expression percentage. If a character vector is given to features, a QC metric variable named "featureSubset_name" will be computed. If a named list of multiple subsets is given, the names will be used as the variable names. If a single pattern is given to pattern, a QC metric variable named "featureSubset_pattern" will be computed. If a named list of multiple patterns is given, the names will be used as the variable names. Duplicated QC metric names between these two arguments and the default five listed above should be avoided.
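To make the subset selection concrete, here is a minimal base R sketch of how regex patterns pick out feature subsets and how the names of a list carry over. The feature names and the list names mitoLike and hemoLike are made up for illustration, and grep stands in for whatever matching the function performs internally.

```r
# Made-up feature names for illustration
featureNames <- c("MT-CO1", "MT-ND1", "RPS6", "RPL13", "HBB", "CD3D")

# A single regex pattern selects one feature subset
riboLike <- grep("^RP[SL]", featureNames, value = TRUE)

# A named list of patterns selects several subsets at once.
# The list names are chosen so they do not clash with the default
# metric names (nUMI, nGene, mito, ribo, hemo).
patterns <- list(mitoLike = "^MT-", hemoLike = "^HB[AB]")
subsets <- lapply(patterns, grep, x = featureNames, value = TRUE)

# The list could then be passed on using the documented argument, e.g.
#   runGeneralQC(object, organism = "human", pattern = patterns)
```

Checking the matched features this way before running QC is a cheap way to confirm that a pattern selects the intended genes.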
This function runs automatically when each liger object is created, to capture the raw status. Argument overwrite is set to FALSE by default to avoid mistakenly updating existing metrics after filtering the object. Users can still opt to update all newly calculated metrics (including the default five) by setting overwrite = TRUE, or only some of the newly calculated ones by providing a character vector of the names of the metrics to update. Intended overwriting only happens to datasets selected with useDatasets.
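The overwrite rule described above can be pictured with a small base R sketch. The helper metricsToWrite is hypothetical and not part of the package. It also assumes that metrics not yet present are always written, which the text above does not state explicitly.

```r
# Hypothetical helper illustrating the overwrite rule; not the package's code.
# computed:  names of the metrics calculated in this run
# existing:  names of the metrics already stored for the cells
# overwrite: FALSE, TRUE, or a character vector of metric names
metricsToWrite <- function(computed, existing, overwrite = FALSE) {
  # Assumption: metrics that do not exist yet are always written
  isNew <- !computed %in% existing
  if (isTRUE(overwrite)) {
    return(computed)
  }
  if (is.character(overwrite)) {
    return(computed[isNew | computed %in% overwrite])
  }
  computed[isNew]
}

computed <- c("nUMI", "nGene", "mito", "ribo", "hemo", "myGenes")
existing <- c("nUMI", "nGene", "mito", "ribo", "hemo")

keepDefault <- metricsToWrite(computed, existing)                    # "myGenes"
updateAll   <- metricsToWrite(computed, existing, TRUE)              # all six
updateSome  <- metricsToWrite(computed, existing, c("mito", "ribo")) # "mito" "ribo" "myGenes"
```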
Examples
pbmc <- runGeneralQC(pbmc, "human", overwrite = TRUE)
#> ! No human mitochondrial gene found in the union of dataset "ctrl" and "stim"
#> ℹ calculating QC for dataset "ctrl"
#> ℹ Updated QC variables: "nUMI", "nGene", "mito", "ribo", and "hemo"
#> ℹ calculating QC for dataset "ctrl"
#> ✔ calculating QC for dataset "ctrl" ... done
#>
#> ℹ calculating QC for dataset "stim"
#> ℹ Updated QC variables: "nUMI", "nGene", "mito", "ribo", and "hemo"
#> ℹ calculating QC for dataset "stim"
#> ✔ calculating QC for dataset "stim" ... done
#>