Download CELLxGENE Datasets and Return SeuratObject.

Usage

ParseCELLxGENE(
  meta = NULL,
  file.ext = c("rds", "h5ad"),
  out.folder = NULL,
  timeout = 3600,
  quiet = FALSE,
  parallel = TRUE,
  use.cores = NULL,
  return.seu = FALSE,
  merge = TRUE,
  use.census = FALSE,
  census.version = "stable",
  organism = NULL,
  obs.value.filter = NULL,
  obs.keys = NULL,
  include.genes = NULL,
  obsm.layers = FALSE,
  ...
)

Arguments

meta

Metadata used for the download. It can come from ExtractCELLxGENEMeta and should contain the dataset_id, rds_id/h5ad_id (depending on file.ext) and name columns. Can be skipped when use.census is TRUE. Default: NULL.
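The column requirement above can be checked before starting a long download. The following is a minimal sketch, not part of the package: check_meta_columns is a hypothetical helper, and the toy metadata uses made-up identifiers.

```r
# Hypothetical helper (not part of the package): report which of the columns
# listed in the documentation are missing from `meta` for the chosen extensions.
check_meta_columns <- function(meta, file.ext = c("rds", "h5ad")) {
  required <- c("dataset_id", "name", paste0(file.ext, "_id"))
  setdiff(required, colnames(meta))
}

# Toy metadata with made-up identifiers, for illustration only.
toy.meta <- data.frame(
  dataset_id = "dataset-1",
  rds_id = "file-1",
  name = "toy dataset",
  stringsAsFactors = FALSE
)

# Nothing is missing when only rds files are requested.
check_meta_columns(toy.meta, file.ext = "rds")
# The h5ad_id column is reported as missing when h5ad is also requested.
check_meta_columns(toy.meta, file.ext = c("rds", "h5ad"))
```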

file.ext

The file extensions to download. When NULL, both "rds" and "h5ad" are used. Default: c("rds", "h5ad").

out.folder

The output folder. Default: NULL (current working directory).

timeout

Maximum request time. Default: 3600.

quiet

Logical value, whether to suppress the download progress output. Default: FALSE (progress is shown).

parallel

Logical value, whether to download in parallel. Default: TRUE. When the "libcurl" method is available for download.file, downloads already run in parallel by default, so parallel can be set to FALSE.

use.cores

The number of cores to use. Default: NULL (the minimum of length(download.urls) and parallel::detectCores()).
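The default described above can be sketched as follows. This is an illustration of the stated rule, not the package's internal code; the URLs are placeholders, and na.rm = TRUE is an added safeguard because parallel::detectCores() can return NA on some platforms.

```r
# Placeholder URLs, for illustration only.
download.urls <- c("https://example.org/a.rds", "https://example.org/b.rds")

use.cores <- NULL
if (is.null(use.cores)) {
  # Never use more workers than there are files to download.
  use.cores <- min(length(download.urls), parallel::detectCores(), na.rm = TRUE)
}
use.cores
```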

return.seu

Logical value, whether to load the downloaded datasets into Seurat. Only takes effect when "rds" is in file.ext and all datasets are downloaded successfully. Default: FALSE.

merge

Logical value, whether to merge the list of Seurat objects when there are multiple rds files. Used when return.seu is TRUE. Default: TRUE.

use.census

Logical value, whether to use CZ CELLxGENE Census to download and subset datasets. Default: FALSE.

census.version

The version of the Census, e.g., "2024-05-13", "latest" or "stable". Default: "stable".

organism

Organism name, in lower case with spaces replaced by '_'. Default: NULL (human).
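The naming rule above can be applied to the organism labels used elsewhere in the metadata, such as "Homo sapiens". A minimal sketch, where to_census_organism is a hypothetical helper and not part of the package:

```r
# Hypothetical helper: lower-case the name and replace spaces with underscores.
to_census_organism <- function(x) {
  tolower(gsub(" ", "_", x, fixed = TRUE))
}

to_census_organism("Homo sapiens")  # "homo_sapiens"
```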

obs.value.filter

Filter expression for cell's metadata, e.g., cell_type == 'B cell' & tissue_general == 'lung' & disease == 'COVID-19'. Default: NULL.

obs.keys

Columns to fetch from the cell metadata, e.g., c("cell_type", "tissue_general", "disease", "sex"). Default: NULL.

include.genes

Genes to include, e.g., include.genes = c('ENSG00000161798', 'ENSG00000188229'). This corresponds to a var_value_filter on feature_id. Default: NULL.

obsm.layers

Names of the arrays in obsm to add as cell embeddings, e.g., c("scvi", "geneformer"). Default: FALSE (do not load any dimensional reductions).

...

Parameters for get_seurat, used when use.census is TRUE.

Value

A data frame containing the failed datasets, a SeuratObject (when return.seu is TRUE and "rds" is in file.ext), or NULL (when return.seu is FALSE or "rds" is not in file.ext).

References

https://gist.github.com/ivirshup/f1a1603db69de3888eacb4bdb6a9317a

Examples

if (FALSE) {
# all available datasets
all.cellxgene.datasets <- ShowCELLxGENEDatasets()
# human 10x v2 and v3 datasets
human.10x.cellxgene.meta <- ExtractCELLxGENEMeta(
  all.samples.df = all.cellxgene.datasets,
  assay = c("10x 3' v2", "10x 3' v3"),
  organism = "Homo sapiens"
)
# download, need to provide the output folder
ParseCELLxGENE(meta = human.10x.cellxgene.meta, out.folder = "/path/to/output")
}
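The Census-related arguments can be combined in a similar way. The sketch below uses only the arguments documented on this page, with the example values from their descriptions; the variable names are illustrative, and the call is wrapped in if (FALSE) because it needs network access and the Census dependencies. The exact return value in Census mode is not specified here, so the result should be inspected before further use.

```r
# Build the cell metadata filter from individual conditions.
conditions <- c(
  "cell_type == 'B cell'",
  "tissue_general == 'lung'",
  "disease == 'COVID-19'"
)
obs.filter <- paste(conditions, collapse = " & ")

if (FALSE) {
  # query the Census instead of downloading whole datasets (meta is not needed)
  lung.b.cells <- ParseCELLxGENE(
    use.census = TRUE,
    census.version = "stable",
    organism = "homo_sapiens",
    obs.value.filter = obs.filter,
    obs.keys = c("cell_type", "tissue_general", "disease", "sex"),
    include.genes = c("ENSG00000161798", "ENSG00000188229")
  )
}
```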