Pick top differentially presented features for similarity calculation

Performs a Wilcoxon rank-sum test on the input matrix. clusterVar and vertices together define the groups of cells that serve as terminals of the simplex, and this function tests each of these groups against the rest of the cells. For each feature and group, it calculates the U statistic (statistic), p-value (pval), adjusted p-value (padj), average presence in the group (avgExpr), log fold-change (logFC), AUC (auc), and the percentage of cells in and out of the group in which the feature is present (pct_in and pct_out). Set returnStats = TRUE to return the full statistics table.
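The per-group test can be illustrated for a single feature with base R's wilcox.test(). This is a minimal sketch under stated assumptions, not the package's implementation: the helper oneVsRestStats() is hypothetical, logFC is assumed to be a difference of group means on an already log-transformed scale, and padj is omitted because it would be computed across all features (for example with p.adjust(pvals, method = "BH"); the adjustment method the package actually uses is not stated here).

```r
# Hypothetical helper, NOT the package's implementation: one-vs-rest
# statistics for a single feature (one value per cell) and one group.
oneVsRestStats <- function(values, labels, group) {
  inGroup <- labels == group
  xIn <- values[inGroup]
  xOut <- values[!inGroup]
  # Normal approximation; for wilcox.test(x, y), W is the Mann-Whitney U of x
  test <- wilcox.test(xIn, xOut, exact = FALSE)
  u <- unname(test$statistic)
  data.frame(
    group = group,
    statistic = u,
    pval = test$p.value,
    avgExpr = mean(xIn),
    # Assumption: values are already log-transformed, so a difference of
    # means serves as the log fold-change
    logFC = mean(xIn) - mean(xOut),
    # AUC = U / (n_in * n_out); ties contribute one half via mid-ranks
    auc = u / (length(xIn) * length(xOut)),
    # Percentage of cells in / out of the group with a non-zero value
    pct_in = 100 * mean(xIn > 0),
    pct_out = 100 * mean(xOut > 0),
    stringsAsFactors = FALSE
  )
}

# Toy data: three cells in group "A", three in the rest
oneVsRestStats(c(5, 6, 7, 1, 2, 3), c("A", "A", "A", "B", "B", "B"), "A")
# statistic = 9, auc = 1, logFC = 4
```

Because every in-group value exceeds every out-of-group value here, U reaches its maximum n_in * n_out = 9 and the AUC is 1.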

After filtering for up-regulated features (those whose log fold-change passes lfcThresh), top features are selected by sorting primarily on adjusted p-value (ascending) and secondarily on log fold-change (descending).
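The selection rule can be sketched in base R as follows. The helper pickTopFeatures() and the column names feature and group are assumptions for illustration (only padj and logFC are named by this page), and the sketch assumes that "up-regulated" means logFC strictly greater than lfcThresh.

```r
# Hypothetical helper: pick the top features for one group from a statistics
# table with (assumed) columns feature, group, padj and logFC.
pickTopFeatures <- function(stats, group, nTop = 30, lfcThresh = 0.1) {
  # Keep up-regulated features of this group (assumed strict threshold)
  sub <- stats[stats$group == group & stats$logFC > lfcThresh, , drop = FALSE]
  # Primary key: ascending padj; secondary key: descending logFC
  sub <- sub[order(sub$padj, -sub$logFC), , drop = FALSE]
  head(sub$feature, nTop)
}

stats <- data.frame(
  feature = c("g1", "g2", "g3", "g4"),
  group = "OS",
  padj = c(0.01, 0.01, 0.001, 0.5),
  logFC = c(0.5, 2.0, 0.05, 1.0),
  stringsAsFactors = FALSE
)
pickTopFeatures(stats, "OS", nTop = 2)
# returns c("g2", "g1")
```

Here g3 has the smallest padj but is dropped by the logFC filter, and the tie on padj between g1 and g2 is broken in favor of the larger logFC.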

Usage

selectTopFeatures(x, clusterVar, vertices, ...)

# Default S3 method
selectTopFeatures(
  x,
  clusterVar,
  vertices,
  nTop = 30,
  processed = FALSE,
  lfcThresh = 0.1,
  returnStats = FALSE,
  ...
)

# S3 method for class 'Seurat'
selectTopFeatures(
  x,
  clusterVar = NULL,
  vertices,
  assay = NULL,
  layer = "counts",
  processed = FALSE,
  ...
)

# S3 method for class 'SingleCellExperiment'
selectTopFeatures(
  x,
  clusterVar = NULL,
  vertices,
  assay.type = "counts",
  processed = FALSE,
  ...
)

Arguments

x: Dense or sparse matrix with one observation per column, preferably a raw count matrix. Alternatively, a Seurat object or a SingleCellExperiment object.
clusterVar: A vector or factor assigning a cluster label to each column of the matrix object. For the "Seurat" method, NULL (default) uses Idents(x); otherwise, a variable name in the meta.data slot. For the "SingleCellExperiment" method, NULL (default) uses colLabels(x); otherwise, a variable name in the colData slot.
vertices: Vector of cluster names that will be used as the terminal vertices for plotting. Alternatively, a named list in which each element groups one or more clusters into a single terminal vertex. Groups must not overlap.
...: Arguments passed to methods.
nTop: Number of top differentially presented features per terminal. Default 30.
processed: Logical. Whether the input matrix is already processed. TRUE bypasses internal preprocessing, and the input matrix is used directly for the rank-sum calculation. Default FALSE; raw count input is recommended.
lfcThresh: Threshold on log fold-change to identify up-regulated features. Default 0.1.
returnStats: Logical. Whether to return the full statistics table rather than the selected features. Default FALSE.
assay: Assay name of the Seurat object to be used. Default NULL.
layer: For the "Seurat" method, the layer of the assay to use. Default "counts".
assay.type: Assay name of the SingleCellExperiment object to be used. Default "counts".
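For vertices, the grouping of clusters into terminal vertices can be illustrated with a hypothetical helper, clusterToVertex(), which is not part of the package. It treats a plain vector of cluster names as one vertex per cluster, rejects overlapping groups, and labels cells whose cluster belongs to no group as NA; how the package itself represents those cells internally is not described here.

```r
# Hypothetical helper, not part of the package: map each cell's cluster label
# to the terminal vertex it belongs to.
clusterToVertex <- function(clusters, vertices) {
  # A plain vector of cluster names means one vertex per cluster
  if (!is.list(vertices)) {
    vertices <- setNames(as.list(vertices), vertices)
  }
  members <- unlist(vertices, use.names = FALSE)
  # Groups must not overlap
  if (anyDuplicated(members) > 0) {
    stop("A cluster appears in more than one vertex group.")
  }
  vertexOf <- rep(names(vertices), lengths(vertices))
  # Cells whose cluster belongs to no group get NA
  vertexOf[match(as.character(clusters), members)]
}

clusterToVertex(c("a1", "a2", "b1", "c1", "a1"),
                list(A = c("a1", "a2"), B = "b1"))
# returns c("A", "A", "B", NA, "A")
```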

Value

When returnStats = FALSE (default), a character vector of at most length(unique(vertices))*nTop feature names. When returnStats = TRUE, a data.frame of Wilcoxon rank-sum test statistics.
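As a worked check of this bound, assuming each vertex contributes at most nTop names: the first example below uses two vertices with the default nTop = 30, so

```latex
\lvert \text{result} \rvert \le \lvert \text{unique}(\text{vertices}) \rvert \times n_{\mathrm{Top}} = 2 \times 30 = 60
```

which agrees with the 60 feature names it prints. A vertex with fewer than nTop features passing lfcThresh would presumably contribute fewer names, as reported by its "Selected ... features" message.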

Examples

selectTopFeatures(rnaRaw, rnaCluster, c("OS", "RE"))
#> Selected 30 features for "OS".
#> Selected 30 features for "RE".
#>  [1] "Insc"          "Pacsin1"       "Pcdh7"         "Lipc"         
#>  [5] "Col1a2"        "Col1a1"        "Col22a1"       "Cpz"          
#>  [9] "Slc36a2"       "Cdo1"          "Kcnk1"         "Ano1"         
#> [13] "Mlip"          "Col13a1"       "Robo2"         "Cadm1"        
#> [17] "Ifitm5"        "Tmem119"       "Serpinf1"      "Sema3b"       
#> [21] "Col24a1"       "Shc2"          "Kazald1"       "Entpd3"       
#> [25] "RP23-457J22.1" "Enpp6"         "Creb3l3"       "Wisp1"        
#> [29] "Elmo1"         "Mmp16"         "Agt"           "Rarres2"      
#> [33] "Kng1"          "Mgst1"         "Cxcl14"        "Plpp3"        
#> [37] "Adipoq"        "Kitl"          "Gpm6b"         "Wisp2"        
#> [41] "Vcam1"         "Serping1"      "Lpl"           "Vegfc"        
#> [45] "Col4a2"        "Col4a1"        "Kng2"          "Fbn1"         
#> [49] "Pdgfrb"        "Grem1"         "Cxcl12"        "Cyp1b1"       
#> [53] "Dpep1"         "Fbln5"         "Lepr"          "Igfbp5"       
#> [57] "Hp"            "Fstl1"         "Esm1"          "Tgfbr3"       
# \donttest{
# Seurat example
library(Seurat)
srt <- CreateSeuratObject(rnaRaw)
#> Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Idents(srt) <- rnaCluster
gene <- selectTopFeatures(srt, vertices = c("OS", "RE"))
#> Selected 30 features for "OS".
#> Selected 30 features for "RE".
# }
# \donttest{
# SingleCellExperiment example
library(SingleCellExperiment)
sce <- SingleCellExperiment(assays = list(counts = rnaRaw))
colLabels(sce) <- rnaCluster
gene <- selectTopFeatures(sce, vertices = c("OS", "RE"))
#> Selected 30 features for "OS".
#> Selected 30 features for "RE".
# }