Create on-disk ligerDataset Object — createH5LigerDataset • rliger

For convenience, the default formatType = "10x" directly fits the structure of cellranger output. formatType = "anndata" works for current AnnData H5AD file specification (see Details). If a customized H5 file structure is presented, any of the rawData, indicesName, indptrName, genesName, barcodesName should be specified accordingly to override the formatType preset.

DO make a copy of the H5AD files because rliger functions write to the files and they will not be able to be read back to Python. This will be fixed in the future.

Usage

createH5LigerDataset(
  h5file,
  formatType = "10x",
  rawData = NULL,
  normData = NULL,
  scaleData = NULL,
  barcodesName = NULL,
  genesName = NULL,
  indicesName = NULL,
  indptrName = NULL,
  anndataX = "X",
  modal = c("default", "rna", "atac", "spatial", "meth"),
  featureMeta = NULL,
  ...
)

Arguments

h5file: Filename of an H5 file
formatType: Select preset of H5 file structure. Default "10X". Alternatively, we also support "anndata" for H5AD files.
rawData, indicesName, indptrName: The path in a H5 file for the raw sparse matrix data. These three types of data stands for the x, i, and p slots of a dgCMatrix-class object. Default NULL uses formatType preset.
normData: The path in a H5 file for the "x" vector of the normalized sparse matrix. Default NULL.
scaleData: The path in a H5 file for the Group that contains the sparse matrix constructing information for the scaled data. Default NULL.
genesName, barcodesName: The path in a H5 file for the gene names and cell barcodes. Default NULL uses formatType preset.
anndataX: The HDF5 path to the raw count data in an H5AD file. See Details. Default "X".
modal: Name of modality for this dataset. Currently options of "default", "rna", "atac", "spatial" and "meth" are supported. Default "default".
featureMeta: Data frame for feature metadata. Default NULL.
...: Additional slot data. See ligerDataset for detail. Given values will be directly placed at corresponding slots.

Value

H5-based ligerDataset object

Details

For H5AD file written from an AnnData object, we allow using formatType = "anndata" for the function to infer the proper structure. However, while a typical AnnData-based analysis tends to in-place update the adata.X attribute and there is no standard/forced convention for where the raw count data, as needed from LIGER, is stored. Therefore, we expose argument anndataX for specifying this information. The default value "X" looks for adata.X. If the raw data is stored in a layer, e.g. adata.layers['count'], then anndataX = "layers/count". If it is stored to adata.raw.X, then anndataX = "raw/X". If your AnnData object does not have the raw count retained, you will have to go back to the Python work flow to have it inserted at desired object space and re-write the H5AD file, or just go from upstream source files with which the AnnData was originally created.

Examples

h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = tempPath)
#> [1] TRUE
ld <- createH5LigerDataset(tempPath)
#> Found more than one class "H5D" in cache; using the first, from namespace 'rliger'
#> Also defined by ‘hdf5r’
#> Found more than one class "H5D" in cache; using the first, from namespace 'rliger'
#> Also defined by ‘hdf5r’
#> Found more than one class "H5D" in cache; using the first, from namespace 'rliger'
#> Also defined by ‘hdf5r’
#> Found more than one class "H5D" in cache; using the first, from namespace 'rliger'
#> Also defined by ‘hdf5r’