Download CELLxGENE Datasets and Return SeuratObject.
Usage
ParseCELLxGENE(
meta = NULL,
file.ext = c("rds", "h5ad"),
out.folder = NULL,
timeout = 3600,
quiet = FALSE,
parallel = TRUE,
use.cores = NULL,
return.seu = FALSE,
merge = TRUE,
use.census = FALSE,
census.version = "stable",
organism = NULL,
obs.value.filter = NULL,
obs.keys = NULL,
include.genes = NULL,
obsm.layers = FALSE,
...
)
Arguments
- meta
Metadata used for downloading, e.g., the output of ExtractCELLxGENEMeta. It should contain the dataset_id, rds_id/h5ad_id (depending on file.ext) and name columns. Not used when use.census is TRUE. Default: NULL.
- file.ext
The valid file extension for download. When NULL, use "rds" and "h5ad". Default: c("rds", "h5ad").
- out.folder
The output folder. Default: NULL (current working directory).
- timeout
Maximum request time. Default: 3600.
- quiet
Logical value, whether to show downloading progress. Default: FALSE (show).
- parallel
Logical value, whether to download in parallel. Default: TRUE. When "libcurl" is available for download.file, downloads run in parallel by default (parallel can be FALSE).
- use.cores
The number of cores to use. Default: NULL (the minimum of length(download.urls), the number of extracted download URLs, and parallel::detectCores()).
- return.seu
Logical value, whether to load the downloaded datasets into Seurat. Only applies when rds is in file.ext and all datasets are downloaded successfully. Default: FALSE.
- merge
Logical value, whether to merge the list of Seurat objects when there are multiple rds files. Used when return.seu is TRUE. Default: TRUE.
- use.census
Logical value, whether to use CZ CELLxGENE Census to download and subset datasets. Default: FALSE.
- census.version
The version of the Census, e.g., "2024-05-13", "latest" or "stable". Default: "stable".
- organism
Organism name, in lower case with spaces replaced by '_'. Default: NULL (human).
- obs.value.filter
Filter expression for cell's metadata, e.g., cell_type == 'B cell' & tissue_general == 'lung' & disease == 'COVID-19'. Default: NULL.
- obs.keys
Columns to fetch for the cell's metadata, e.g., c("cell_type", "tissue_general", "disease", "sex"). Default: NULL.
- include.genes
Genes to include, e.g., include.genes = c('ENSG00000161798', 'ENSG00000188229'), equivalent to a var_value_filter on feature_id. Default: NULL.
- obsm.layers
Names of arrays in obsm to add as the cell embeddings, e.g., c("scvi", "geneformer"). Default: FALSE (suppress loading of any dimensional reductions).
- ...
Parameters for get_seurat, used when use.census is TRUE.
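The column requirements for meta and the default for use.cores described above can be checked ahead of a download. The sketch below is a minimal illustration in base R: check_meta_columns is a hypothetical helper (not part of the package), the toy metadata values are made up, and download.urls is a placeholder vector standing in for the URLs the function extracts internally.

```r
# Hypothetical helper (not part of the package): list the columns that the
# `meta` description asks for but that are absent from a data frame.
check_meta_columns <- function(meta, file.ext = c("rds", "h5ad")) {
  required <- c("dataset_id", "name", paste0(file.ext, "_id"))
  setdiff(required, colnames(meta))
}

# Toy metadata with made-up values, for illustration only.
toy.meta <- data.frame(
  dataset_id = c("d1", "d2"),
  rds_id = c("r1", "r2"),
  name = c("Dataset one", "Dataset two"),
  stringsAsFactors = FALSE
)

# Only rds requested: nothing is missing.
missing.rds <- check_meta_columns(toy.meta, file.ext = "rds")
# Both extensions requested: the h5ad_id column is missing.
missing.both <- check_meta_columns(toy.meta, file.ext = c("rds", "h5ad"))

# Default described for use.cores: the smaller of the number of download URLs
# and the detected core count. `download.urls` is a placeholder; na.rm guards
# against detectCores() returning NA on platforms where it cannot be determined.
download.urls <- c("url-1", "url-2")
default.cores <- min(length(download.urls), parallel::detectCores(), na.rm = TRUE)
```

Running such a check first may surface a missing h5ad_id or rds_id column before any request is made.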
Value
A data frame of the datasets that failed to download, a SeuratObject (when return.seu is TRUE and rds is in file.ext) or NULL (when return.seu is FALSE or rds is not in file.ext).
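Because the return value can take three shapes, calling code may want to branch on it. The following is a minimal sketch in base R, assuming only what the Value section states; describe_result is a hypothetical helper and the failed data frame is made-up toy data, not real output.

```r
# Hypothetical helper (not part of the package): summarise the three kinds of
# value that the Value section describes.
describe_result <- function(res) {
  if (is.null(res)) {
    "nothing returned"
  } else if (is.data.frame(res)) {
    sprintf("%d dataset(s) failed to download", nrow(res))
  } else {
    "object(s) returned for further analysis"
  }
}

# Toy stand-in for a data frame of failed datasets (made-up values).
failed <- data.frame(dataset_id = "d1", name = "Dataset one")

msg.null <- describe_result(NULL)
msg.failed <- describe_result(failed)
```

A data frame of failures could, for example, be passed back as meta to retry only those datasets, though whether that works unchanged depends on the columns it carries.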
Examples
if (FALSE) {
  # all available datasets
  all.cellxgene.datasets <- ShowCELLxGENEDatasets()
  # human 10x v2 and v3 datasets
  human.10x.cellxgene.meta <- ExtractCELLxGENEMeta(
    all.samples.df = all.cellxgene.datasets,
    assay = c("10x 3' v2", "10x 3' v3"),
    organism = "Homo sapiens"
  )
  # download, need to provide the output folder
  ParseCELLxGENE(meta = human.10x.cellxgene.meta, out.folder = "/path/to/output")
}
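The Census-related arguments can be combined in the same way. The sketch below builds the obs.value.filter expression and the organism string from the examples given in the argument descriptions, then shows a call guarded by if (FALSE), as in the example above, since it needs the package attached and network access. The argument values are illustrative and the exact shape of the returned object is not asserted here.

```r
# Build the metadata filter from individual conditions (values taken from the
# obs.value.filter description).
conditions <- c(
  "cell_type == 'B cell'",
  "tissue_general == 'lung'",
  "disease == 'COVID-19'"
)
obs.filter <- paste(conditions, collapse = " & ")

# Organism in lower case with spaces replaced by '_'.
census.organism <- gsub(" ", "_", tolower("Homo sapiens"))

if (FALSE) {
  # Not run: requires the package providing ParseCELLxGENE and network access.
  census.res <- ParseCELLxGENE(
    use.census = TRUE,
    census.version = "stable",
    organism = census.organism,
    obs.value.filter = obs.filter,
    obs.keys = c("cell_type", "tissue_general", "disease", "sex"),
    include.genes = c("ENSG00000161798", "ENSG00000188229")
  )
}
```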