Two methods are supported: "pseudoBulk"
and
"wilcoxon"
. Pseudo-bulk method aggregates cells basing on biological
replicates and calls bulk RNAseq DE methods, DESeq2 wald test, while
Wilcoxon rank sum test is performed on single-cell level.
runPairwiseDEG()
is generally used for flexibly comparing two specific
groups of cells, while runMarkerDEG()
is used for a one-vs-rest marker
test strategy.
While using pseudo-bulk method, it is generally recommended that you have these variables available in your object:
The cell type or cluster labeling. This can be obtained from prior study or computed with
runCluster
The biological replicate labeling, most of the time the
"dataset"
variable automatically generated when the liger object is created. Users may use other variables if a "dataset" is merged from multiple replicates.The condition labeling that reflects the study design, such as the treatment or disease status for each sample/dataset.
Please see below for detailed scenarios.
Usage
runPairwiseDEG(
object,
groupTest,
groupCtrl,
variable1 = NULL,
variable2 = NULL,
splitBy = NULL,
method = c("pseudoBulk", "wilcoxon"),
usePeak = FALSE,
useReplicate = "dataset",
nPsdRep = NULL,
minCellPerRep = 3,
printDiagnostic = FALSE,
chunk = NULL,
seed = 1,
verbose = getOption("ligerVerbose", TRUE)
)
runMarkerDEG(
object,
conditionBy = NULL,
splitBy = NULL,
method = c("pseudoBulk", "wilcoxon"),
useDatasets = NULL,
usePeak = FALSE,
useReplicate = "dataset",
nPsdRep = NULL,
minCellPerRep = 3,
printDiagnostic = FALSE,
chunk = NULL,
seed = 1,
verbose = getOption("ligerVerbose", TRUE)
)
runWilcoxon(
object,
data.use = NULL,
compare.method = c("clusters", "datasets")
)
Arguments
- object
A liger object, with normalized data available
- groupTest, groupCtrl, variable1, variable2
Condition specification. See
?runPairwiseDEG
section Pairwise DEG Scenarios for detail.- splitBy
Name(s) of the variable(s) in
cellMeta
to split the comparison. See Details. DefaultNULL
.- method
DEG test method to use. Choose from
"pseudoBulk"
or"wilcoxon"
. Default"pseudoBulk"
- usePeak
Logical. Whether to use peak count instead of gene count. Only supported when ATAC datasets are involved. Default
FALSE
.- useReplicate
cellMeta
variable of biological replicate annotation. Only used withmethod = "pseudoBulk"
. Default"dataset"
.- nPsdRep
Number of pseudo-replicates to create. Only used when
method = "pseudoBulk"
. DefaultNULL
. See Details.- minCellPerRep
Numeric, will not make pseudo-bulk for replicate with less than this number of cells. Default
3
.- printDiagnostic
Logical. Whether to show more detail when
verbose = TRUE
. DefaultFALSE
.- chunk
Number of features to process at a time during Wilcoxon test. Useful when memory is limited. Default
NULL
will process all features at once.- seed
Random seed to use for pseudo-replicate generation. Default
1
.- verbose
Logical. Whether to show information of the progress. Default
getOption("ligerVerbose")
orTRUE
if users have not set.- conditionBy
cellMeta
variable(s). Marker detection will be performed for each level of this variable. Multiple variables will be combined. DefaultNULL
uses default cluster.- useDatasets
Datasets to perform marker detection within. Default
NULL
will use all datasets.- data.use
Same as
useDatasets
.- compare.method
Choose from
"clusters"
(default) or"datasets"
."clusters"
compares each cluster against all other cells, while"datasets"
run within each cluster and compare each dataset against all other datasets.
Value
A data.frame with DEG information with the all or some of the following fields:
- feature
Gene names
- group
Test group name. Multiple tests might be present for each function call. This is the main variable to distinguish the tests. For a pairwise test, a row with a certain group name represents the test result between the this group against the other control group; When split by a variable, it would be presented in "split.group" format, meaning the stats is by comparing the group in the split level against the control group in the same split level. When running marker detection without splitting, a row with group "a" represents the stats of the gene in group "a" against all other cells. When running split marker detection, the group name would be in "split.group" format, meaning the stats is by comparing the group in the split level against all other cells in the same split level.
- logFC
Log fold change
- pval
P-value
- padj
Adjusted p-value
- avgExpr
Mean expression in the test group indicated by the "group" field. Only available for wilcoxon tests.
- statistic
Wilcoxon rank-sum test statistic. Only available for wilcoxon tests.
- auc
Area under the ROC curve. Only available for wilcoxon tests.
- pct_in
Percentage of cells in the test group, indicated by the "group" field, that express the feature. Only available for wilcoxon tests.
- pct_out
Percentage of cells in the control group or other cells, as explained for the "group" field, that express the feature. Only available for wilcoxon tests.
Using Wilcoxon rank-sum test
Wilcoxon rank-sum test works for each gene and is based on the rank of the expression in each cell. LIGER provides dataset integration but does not "correct" the expression values. Projects with strong batch effects or integrate drastically different modalities should be cautious when using this method.
Comparing difference between/across cell types
Most of times, people would want to know what cell types are for each cluster
after clustering. This can be done with a marker detection method that test
each cluster against all the other cells. This can be done with a command
like runMarkerDEG(object, conditionBy = "cluster_var")
. When using
default pseudo-bulk method, users should additionaly determine the
pseudo-bulk setup parameters. If the real biological replicate variable is
available, it should be supplied to argument useReplicate
, otherwise,
pseudo-replicates should be created. See "Pseudo-Replicate" section for more.
Compare between conditions
It is frequently needed to identify the difference between conditions. Users
can simply set conditionBy = "condition_var"
. However, most of time,
such comparisons should be ideally done in a per-cluster manner. This can be
done by setting splitBy = "cluster_var"
. This will run a loop for each
cluster, and within the group of cells, compare each condition against all
other cells in the cluster.
In the scenario when users only need to compare two conditions for each
cluster, running runPairwiseDEG(object, groupTest = "condition1",
groupCtrl = "condition2", variable1 = "condition_var",
splitBy = "cluster_var")
would address the need.
For both use case, if pseudo-bulk (default) method is used, users should determine the pseudo-bulk setup parameters as mentioned in the previous section.
Detailed runMarkerDEG
usage
Marker detection is performed in a one vs. rest manner. The grouping of such
condition is specified by conditionBy
, which should be a column name
in cellMeta
. When splitBy
is specified as another variable
name in cellMeta
, the marker detection will be iteratively done for
within each level of splitBy
variable.
For example, when conditionBy = "celltype"
and splitBy = NULL
,
marker detection will be performed by comparing all cells of "celltype_i"
against all other cells, and etc. This is analogous to the old version when
running runWilcoxon(method = "cluster")
.
When conditionBy = "gender"
and splitBy = "leiden_cluster"
,
marker detection will be performed by comparing "gender_i" cells from "cluster_j"
against other cells from "cluster_j", and etc. This is analogous to the old
version when running runWilcoxon(method = "dataset")
.
Detailed runPairwiseDEG
usage
Users can select classes of cells from a variable in cellMeta
.
variable1
and variable2
are used to specify a column in
cellMeta
, and groupTest
and groupCtrl
are used to specify
existing classes from variable1
and variable2
, respectively.
When variable2
is missing, groupCtrl
will be considered from
variable1
.
For example, when variable1 = "celltype"
and variable2 = NULL
,
groupTest
and groupCtrl
should be valid cell types in
object$celltype
.
When variable1
is "celltype" and variable2
is "gender",
groupTest
should be a valid cell type from object$celltype
and
groupCtrl
should be a valid class from object$gender
.
When both variable1
and variable2
are missing, groupTest
and groupCtrl
should be valid index of cells in object
.
Pseudo-Replicate
Pseudo-replicate assignment is a technique to complement the lack of real
biological replicates when using pseudo-bulk DE methods. LIGER's pseudo-bulk
method generally requires that each comparison group has at least 3
replicates each composed of at least 3 cells, in order to ensure the
statistic power. When less than 3 real replicates are found for a comparison,
the default setting (nPsdRep = NULL
) splits each into 3
pseudo-replicates, otherwise no pseudo-replicates are automatically
generated. When nPsdRep
is given a number, LIGER will always go
through each comparison group and split each real replicate into the given
number of pseudo-replicates.
Examples
# \donttest{
pbmc$leiden_cluster <- pbmcPlot$leiden_cluster
# Identify cluster markers
degStats1 <- runMarkerDEG(pbmc, conditionBy = "leiden_cluster")
#> DESeq2 Wald test ■■■■■■■■■ 25% | ETA: 4s
#> DESeq2 Wald test ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 88% | ETA: 1s
#> ! Ignoring replicates (size in bracket) with too few cells: "7.ctrl.rep1 (2)"
#> DESeq2 Wald test ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 88% | ETA: 1s
#> ℹ Consider decrease minCellPerRep to exclude less replicates or/and lower nPsdRep to generate larger pseudo-replicates.
#> DESeq2 Wald test ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 88% | ETA: 1s
#> DESeq2 Wald test ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s
# Compare "stim" data against "ctrl" data within each cluster
degStats3 <- runPairwiseDEG(pbmc, groupTest = "stim", groupCtrl = "ctrl",
variable1 = "dataset",
splitBy = "leiden_cluster",
minCellPerRep = 4)
#> ℹ Running DEG within: "0"
#> ℹ Calling pairwise DESeq2 Wald test
#> ✔ Calling pairwise DESeq2 Wald test ... done
#>
#> ℹ Running DEG within: "1"
#> ℹ Calling pairwise DESeq2 Wald test
#> -- note: fitType='parametric', but the dispersion trend was not well captured by the
#> function: y = a/x + b, and a local regression fit was automatically substituted.
#> specify fitType='local' or 'mean' to avoid this message next time.
#> ✔ Calling pairwise DESeq2 Wald test ... done
#>
#> ℹ Running DEG within: "2"
#> ℹ Calling pairwise DESeq2 Wald test
#> ✔ Calling pairwise DESeq2 Wald test ... done
#>
#> ℹ Running DEG within: "3"
#> ℹ Calling pairwise DESeq2 Wald test
#> ✔ Calling pairwise DESeq2 Wald test ... done
#>
#> ℹ Running DEG within: "4"
#> ℹ Calling pairwise DESeq2 Wald test
#> ✔ Calling pairwise DESeq2 Wald test ... done
#>
#> ℹ Running DEG within: "5"
#> ℹ Calling pairwise DESeq2 Wald test
#> ✔ Calling pairwise DESeq2 Wald test ... done
#>
#> ℹ Running DEG within: "6"
#> ℹ Calling pairwise DESeq2 Wald test
#> ✔ Calling pairwise DESeq2 Wald test ... done
#>
#> ℹ Running DEG within: "7"
#> ℹ Calling pairwise DESeq2 Wald test
#> ! Ignoring replicates (size in bracket) with too few cells: "others.ctrl.rep1 (2)", "others.ctrl.rep2 (3)", and "others.ctrl.rep3 (3)"
#> ℹ Calling pairwise DESeq2 Wald test
#> ℹ Consider decrease minCellPerRep to exclude less replicates or/and lower nPsdRep to generate larger pseudo-replicates.
#> ℹ Calling pairwise DESeq2 Wald test
#> ✖ Error when computing on "7.stim": Too few replicates with more than 4 cells (`minCellPerRep`) for condition "others".
#> ℹ Calling pairwise DESeq2 Wald test
#> ! Empty result returned for this test.
#> ℹ Calling pairwise DESeq2 Wald test
#> ✖ Calling pairwise DESeq2 Wald test ... failed
#>
# }