Pick top differentially presented features for similarity calculation
Source: R/selectTopFeatures.R
selectTopFeatures.Rd
Performs a Wilcoxon rank-sum test on the input matrix. clusterVar and vertices
together define the groups of cells that serve as terminals of the simplex;
this function tests each of these groups against the rest of the cells. The
U statistic (statistic), p-value (pval) and adjusted p-value (padj), together
with average presence in group (avgExpr), log fold-change (logFC), AUC (auc),
percentage in group (pct_in) and percentage out of group (pct_out), are
calculated. Set returnStats = TRUE to return the full statistics table.
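The one-vs-rest testing described above can be sketched in base R. This is a minimal illustration and not the package's implementation: it relies on stats::wilcox.test, the toy matrix and the helper name oneVsRest are hypothetical, the "Other" cluster label is invented, and the Benjamini-Hochberg adjustment is an assumption because the adjustment method is not stated here.

```r
# Toy count matrix: 3 features x 12 cells, one observation per column.
mat <- rbind(
  gene1 = c(9, 8, 7, 9,  1, 2, 0, 1,  2, 1, 0, 3),
  gene2 = c(1, 0, 2, 1,  8, 9, 7, 8,  1, 2, 0, 1),
  gene3 = c(3, 4, 2, 5,  4, 3, 5, 2,  3, 5, 4, 2)
)
colnames(mat) <- paste0("cell", seq_len(ncol(mat)))
clusterVar <- factor(rep(c("OS", "RE", "Other"), each = 4))

# Hypothetical helper: test one group against all remaining cells.
oneVsRest <- function(mat, clusterVar, group) {
  inGroup <- clusterVar == group
  rows <- lapply(rownames(mat), function(f) {
    # Ties make the exact p-value unavailable; that warning is silenced here.
    wt <- suppressWarnings(wilcox.test(mat[f, inGroup], mat[f, !inGroup]))
    u <- unname(wt$statistic)
    data.frame(
      group = group,
      feature = f,
      statistic = u,
      pval = wt$p.value,
      # AUC equals U divided by the number of in-group/out-of-group pairs.
      auc = u / (sum(inGroup) * sum(!inGroup))
    )
  })
  res <- do.call(rbind, rows)
  res$padj <- p.adjust(res$pval, method = "BH")  # assumed adjustment method
  res
}

res <- oneVsRest(mat, clusterVar, "OS")
print(res)
```

In this toy data every "OS" value of gene1 exceeds every value outside the group, so its AUC is 1. Repeating the call for each terminal group would give one block of rows per group.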
Top features are selected by sorting primarily on adjusted p-value, and secondarily on log fold-change, after filtering for up-regulated features.
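The selection rule can be illustrated on a small hand-made statistics table. This is a sketch under assumptions rather than the function's actual code: the column names group and feature, the strict comparison logFC > lfcThresh, and the helper name pickTop are all hypothetical; only padj and logFC are named in the description above.

```r
# Hypothetical statistics table for two terminal groups.
stats <- data.frame(
  group   = rep(c("OS", "RE"), each = 4),
  feature = rep(c("g1", "g2", "g3", "g4"), times = 2),
  logFC   = c(1.2, 0.05, 0.8, 2.0,  -0.5, 1.5, 0.05, 0.9),
  padj    = c(0.01, 0.001, 0.01, 0.20,  0.001, 0.02, 0.02, 0.5)
)

# Hypothetical helper reproducing the described ordering.
pickTop <- function(stats, nTop = 2, lfcThresh = 0.1) {
  # 1. Keep up-regulated features only.
  up <- stats[stats$logFC > lfcThresh, ]
  # 2. Sort by adjusted p-value (ascending), then log fold-change (descending).
  up <- up[order(up$group, up$padj, -up$logFC), ]
  # 3. Take the first nTop features of each group.
  picked <- lapply(split(up, up$group), function(d) head(d$feature, nTop))
  unlist(picked, use.names = FALSE)
}

top <- pickTop(stats)
print(top)
```

For "OS", g2 has the smallest adjusted p-value but is dropped by the log fold-change filter, and g4 has the largest log fold-change but loses to g1 and g3 on adjusted p-value. A group with fewer than nTop features passing the filter contributes fewer names, which is one way the result can end up shorter than the number of groups times nTop.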
Usage
selectTopFeatures(x, clusterVar, vertices, ...)
# S3 method for default
selectTopFeatures(
x,
clusterVar,
vertices,
nTop = 30,
processed = FALSE,
lfcThresh = 0.1,
returnStats = FALSE,
...
)
# S3 method for Seurat
selectTopFeatures(
x,
clusterVar = NULL,
vertices,
assay = NULL,
layer = "counts",
processed = FALSE,
...
)
# S3 method for SingleCellExperiment
selectTopFeatures(
x,
clusterVar = NULL,
vertices,
assay.type = "counts",
processed = FALSE,
...
)
Arguments
- x
Dense or sparse matrix, one observation per column. Preferably a raw count matrix. Alternatively, a Seurat object or a SingleCellExperiment object.
- clusterVar
A vector/factor assigning the cluster variable to each column of the matrix object. For the "Seurat" method, NULL (default) for Idents(x), or a variable name in the meta.data slot. For the "SingleCellExperiment" method, NULL (default) for colLabels(x), or a variable name in the colData slot.
- vertices
Vector of cluster names that will be used for plotting, or a named list in which each element groups clusters into one terminal vertex. There must not be any overlap between groups.
- ...
Arguments passed to methods.
- nTop
Number of top differentially presented features per terminal. Default 30.
- processed
Logical. Whether the input matrix is already processed. TRUE will bypass internal preprocessing, and the input matrix will be used directly for the rank-sum calculation. Default FALSE; raw count input is recommended.
- lfcThresh
Threshold on log fold-change to identify up-regulated features. Default 0.1.
- returnStats
Logical. Whether to return the full statistics table rather than the selected genes. Default FALSE.
- assay
Assay name of the Seurat object to be used. Default NULL.
- layer
For the "Seurat" method, which layer of the assay to use. Default "counts".
- assay.type
Assay name of the SingleCellExperiment object to be used. Default "counts".
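To make the named-list form of vertices concrete, the following base R sketch maps cluster labels to terminal groups and checks the no-overlap requirement. It only illustrates the idea and is not the package's internal code; the cluster labels "A" to "E" and the terminal names "V1" to "V3" are hypothetical.

```r
# Hypothetical per-cell cluster labels.
clusterVar <- factor(c("A", "B", "C", "D", "A", "C", "E"))

# Named list: each element is one terminal vertex made of one or more clusters.
vertices <- list(V1 = c("A", "B"), V2 = "C", V3 = "D")

# A cluster may belong to at most one terminal.
members <- unlist(vertices, use.names = FALSE)
stopifnot(anyDuplicated(members) == 0)

# Look up the terminal of every cell; cells from unlisted clusters get NA.
lookup <- setNames(rep(names(vertices), lengths(vertices)), members)
terminal <- unname(lookup[as.character(clusterVar)])
print(terminal)
```

Cells from cluster "E" map to NA because that cluster is not assigned to any terminal in this sketch.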
Value
When returnStats = FALSE (default), a character vector of at most
length(unique(vertices))*nTop feature names. When returnStats = TRUE, a
data.frame of Wilcoxon rank-sum test statistics.
Examples
selectTopFeatures(rnaRaw, rnaCluster, c("OS", "RE"))
#> Selected 30 features for "OS".
#> Selected 30 features for "RE".
#> [1] "Insc" "Pacsin1" "Pcdh7" "Lipc"
#> [5] "Col1a2" "Col1a1" "Col22a1" "Cpz"
#> [9] "Slc36a2" "Cdo1" "Kcnk1" "Ano1"
#> [13] "Mlip" "Col13a1" "Robo2" "Cadm1"
#> [17] "Ifitm5" "Tmem119" "Serpinf1" "Sema3b"
#> [21] "Col24a1" "Shc2" "Kazald1" "Entpd3"
#> [25] "RP23-457J22.1" "Enpp6" "Creb3l3" "Wisp1"
#> [29] "Elmo1" "Mmp16" "Agt" "Rarres2"
#> [33] "Kng1" "Mgst1" "Cxcl14" "Plpp3"
#> [37] "Adipoq" "Kitl" "Gpm6b" "Wisp2"
#> [41] "Vcam1" "Serping1" "Lpl" "Vegfc"
#> [45] "Col4a2" "Col4a1" "Kng2" "Fbn1"
#> [49] "Pdgfrb" "Grem1" "Cxcl12" "Cyp1b1"
#> [53] "Dpep1" "Fbln5" "Lepr" "Igfbp5"
#> [57] "Hp" "Fstl1" "Esm1" "Tgfbr3"
# \donttest{
# Seurat example
library(Seurat)
srt <- CreateSeuratObject(rnaRaw)
#> Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Idents(srt) <- rnaCluster
gene <- selectTopFeatures(srt, vertices = c("OS", "RE"))
#> Selected 30 features for "OS".
#> Selected 30 features for "RE".
# }
# \donttest{
# SingleCellExperiment example
library(SingleCellExperiment)
sce <- SingleCellExperiment(assays = list(counts = rnaRaw))
colLabels(sce) <- rnaCluster
gene <- selectTopFeatures(sce, vertices = c("OS", "RE"))
#> Selected 30 features for "OS".
#> Selected 30 features for "RE".
# }