Load in data from 10X — read10X • rliger

Enables easy loading of sparse data matrices provided by 10X genomics.

read10X works generally for 10X cellranger pipelines including: CellRanger < 3.0 & >= 3.0 and CellRanger-ARC.

read10XRNA invokes read10X and takes the "Gene Expression" out, so that the result can directly be used to construct a liger object. See Examples for demonstration.

read10XATAC works for both cellRanger-ARC and cellRanger-ATAC pipelines but needs user arguments for correct recognition. Similarly, the returned value can directly be used for constructing a liger object.

Usage

read10X(
  path,
  sampleNames = NULL,
  addPrefix = FALSE,
  useFiltered = NULL,
  reference = NULL,
  geneCol = 2,
  cellCol = 1,
  returnList = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  sample.dirs = path,
  sample.names = sampleNames,
  use.filtered = useFiltered,
  data.type = NULL,
  merge = NULL,
  num.cells = NULL,
  min.umis = NULL
)

read10XRNA(
  path,
  sampleNames = NULL,
  addPrefix = FALSE,
  useFiltered = NULL,
  reference = NULL,
  returnList = FALSE,
  ...
)

read10XATAC(
  path,
  sampleNames = NULL,
  addPrefix = FALSE,
  useFiltered = NULL,
  pipeline = c("atac", "arc"),
  arcFeatureType = "Peaks",
  returnList = FALSE,
  geneCol = 2,
  cellCol = 1,
  verbose = getOption("ligerVerbose", TRUE)
)

Arguments

path: (A.) A Directory containing the matrix.mtx, genes.tsv (or features.tsv), and barcodes.tsv files provided by 10X. A vector, a named vector, a list or a named list can be given in order to load several data directories. (B.) The 10X root directory where subdirectories of per-sample output folders can be found. Sample names will by default take the name of the vector, list or subfolders.
sampleNames: A vector of names to override the detected or set sample names for what is given to path. Default NULL. If no name detected at all and multiple samples are given, will name them by numbers.
addPrefix: Logical, whether to add sample names as a prefix to the barcodes. Default FALSE.
useFiltered: Logical, if path is given as case B, whether to use the filtered feature barcode matrix instead of raw (unfiltered). Default TRUE.
reference: In case of specifying a CellRanger<3 root folder to path, import the matrix from the output using which reference. Only needed when multiple references present. Default NULL.
geneCol: Specify which column of genes.tsv or features.tsv to use for gene names. Default 2.
cellCol: Specify which column of barcodes.tsv to use for cell names. Default 1.
returnList: Logical, whether to still return a structured list instead of a single matrix object, in the case where only one sample and only one feature type can be found. Otherwise will always return a list. Default FALSE.
verbose: Logical. Whether to show information of the progress. Default getOption("ligerVerbose") or TRUE if users have not set.
sample.dirs, sample.names, use.filtered: These arguments are renamed and will be deprecated in the future. Please see usage for corresponding arguments.
data.type, merge, num.cells, min.umis: These arguments are defuncted because the functionality can/should be fulfilled with other functions.
...: Arguments passed to read10X
pipeline: Which cellRanger pipeline type to find the ATAC data. Choose "atac" to read the peak matrix from cellranger-atac pipeline output folder(s), or "arc" to split the ATAC feature subset out from the multiomic cellranger-arc pipeline output folder(s). Default "atac".
arcFeatureType: When pipeline = "arc", which feature type is for the ATAC data of interests. Default "Peaks". Other possible feature types can be "Chromatin Accessibility". Error message will show available options if argument specification cannot be found.

Value

When only one sample is given or detected, and only one feature type is detected or using CellRanger < 3.0, and returnList = FALSE, a sparse matrix object (dgCMatrix class) will be returned.
When using read10XRNA or read10XATAC, which are modality specific, returns a list named by samples, and each element is the corresponding sparse matrix object (dgCMatrix class).
read10X generally returns a list named by samples. Each sample element will be another list named by feature types even if only one feature type is detected (or using CellRanger < 3.0) for data structure consistency. The feature type "Gene Expression" always comes as the first type if available.

Examples

if (FALSE) {
# For output from CellRanger < 3.0
dir <- 'path/to/data/directory'
list.files(dir) # Should show barcodes.tsv, genes.tsv, and matrix.mtx
mat <- read10X(dir)
class(mat) # Should show dgCMatrix

# For root directory from CellRanger < 3.0
dir <- 'path/to/root'
list.dirs(dir) # Should show sample names
matList <- read10X(dir)
names(matList) # Should show the sample names
class(matList[[1]][["Gene Expression"]]) # Should show dgCMatrix

# For output from CellRanger >= 3.0 with multiple data types
dir <- 'path/to/data/directory'
list.files(dir) # Should show barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz
matList <- read10X(dir, sampleNames = "tissue1")
names(matList) # Shoud show "tissue1"
names(matList$tissue1) # Should show feature types, e.g. "Gene Expression" and etc.

# For root directory from CellRanger >= 3.0 with multiple data types
dir <- 'path/to/root'
list.dirs(dir) # Should show sample names, e.g. "rep1", "rep2", "rep3"
matList <- read10X(dir)
names(matList) # Should show the sample names: "rep1", "rep2", "rep3"
names(matList$rep1) # Should show the avalable feature types for rep1
}
if (FALSE) {
# For creating LIGER object from root directory of CellRanger >= 3.0
dir <- 'path/to/root'
list.dirs(dir) # Should show sample names, e.g. "rep1", "rep2", "rep3"
matList <- read10XRNA(dir)
names(matList) # Should show the sample names: "rep1", "rep2", "rep3"
sapply(matList, class) # Should show matrix class all are "dgCMatrix"
lig <- createLigerObject(matList)
}