Introduction
scfetch is designed to help users download and prepare single-cell datasets from public resources. It can be used to:
- Download fastq files from GEO/SRA and format them to the standard naming style recognized by 10x software (e.g. CellRanger).
- Download bam files from GEO/SRA, including original 10x-generated bam files (with custom tags) and normal bam files, and convert bam files to fastq files.
- Download scRNA-seq matrices and annotation (e.g. cell type) information from GEO, PanglaoDB and UCSC Cell Browser, and load the downloaded matrices into Seurat.
- Download processed objects from Zenodo, CELLxGENE and Human Cell Atlas.
- Convert between widely used single-cell object formats (SeuratObject, AnnData, SingleCellExperiment, CellDataSet/cell_data_set and loom).
Installation
You can install the development version of scfetch from GitHub with:
# install.packages("devtools")
devtools::install_github("showteeth/scfetch")
For data structure conversion, scfetch requires several Python packages, which you can install with:
# install python packages
conda install -c bioconda loompy anndata
# or
pip install anndata loompy
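After installing, it can be useful to confirm that both packages are visible to the Python interpreter you intend to use. The following is a minimal sketch, not part of scfetch: `check_py_module` and the `PYTHON_BIN` variable are hypothetical names introduced here, and the sketch assumes `python3` on your PATH is the interpreter that was used for the installation above (override `PYTHON_BIN` if you installed into a conda environment with a different interpreter).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of scfetch): report whether a Python module
# can be located by the chosen interpreter. It does not import the module,
# so it only confirms that the package is installed and discoverable.
PYTHON_BIN="${PYTHON_BIN:-python3}"  # assumption: python3 on PATH is the right interpreter

check_py_module() {
  local module="$1"
  if "$PYTHON_BIN" -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)" "$module" 2>/dev/null; then
    echo "$module: ok"
  else
    echo "$module: missing"
    return 1
  fi
}

# Check the two packages named in the installation step above.
for module in anndata loompy; do
  check_py_module "$module" || true
done
```

If either line reports `missing`, rerun the conda or pip command above in the same environment that `PYTHON_BIN` points to.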
Usage
Function list
Type | Function | Usage |
---|---|---|
Download and format fastq | ExtractRun | Extract runs with GEO accession number or GSM number |
Download and format fastq | DownloadSRA | Download sra files |
Download and format fastq | SplitSRA | Split sra files to fastq files and format to 10x standard style |
Download and convert bam | DownloadBam | Download bam (support 10x original bam) |
Download and convert bam | Bam2Fastq | Convert bam files to fastq files |
Download matrix and load to Seurat | ExtractGEOMeta | Extract sample metadata from GEO |
Download matrix and load to Seurat | ParseGEO | Download matrix from GEO and load to Seurat |
Download matrix and load to Seurat | ExtractPanglaoDBMeta | Extract sample metadata from PanglaoDB |
Download matrix and load to Seurat | ExtractPanglaoDBComposition | Extract cell type composition of PanglaoDB datasets |
Download matrix and load to Seurat | ParsePanglaoDB | Download matrix from PanglaoDB and load to Seurat |
Download matrix and load to Seurat | ShowCBDatasets | Show all available datasets in UCSC Cell Browser |
Download matrix and load to Seurat | ExtractCBDatasets | Extract UCSC Cell Browser datasets with attributes |
Download matrix and load to Seurat | ExtractCBComposition | Extract cell type composition of UCSC Cell Browser datasets |
Download matrix and load to Seurat | ParseCBDatasets | Download UCSC Cell Browser datasets and load to Seurat |
Download objects | ExtractZenodoMeta | Extract sample metadata from Zenodo with DOIs |
Download objects | ParseZenodo | Download rds/rdata/h5ad/loom from Zenodo with DOIs |
Download objects | ShowCELLxGENEDatasets | Show all available datasets in CELLxGENE |
Download objects | ExtractCELLxGENEMeta | Extract metadata of CELLxGENE datasets with attributes |
Download objects | ParseCELLxGENE | Download rds/h5ad from CELLxGENE |
Download objects | ShowHCAProjects | Show all available projects in Human Cell Atlas |
Download objects | ExtractHCAMeta | Extract metadata of Human Cell Atlas projects with attributes |
Download objects | ParseHCA | Download rds/rdata/h5/h5ad/loom from Human Cell Atlas |
Convert between different single-cell objects | ExportSeurat | Convert SeuratObject to AnnData, SingleCellExperiment, CellDataSet/cell_data_set and loom |
Convert between different single-cell objects | ImportSeurat | Convert AnnData, SingleCellExperiment, CellDataSet/cell_data_set and loom to SeuratObject |
Convert between different single-cell objects | SCEAnnData | Convert between SingleCellExperiment and AnnData |
Convert between different single-cell objects | SCELoom | Convert between SingleCellExperiment and loom |
Summarize datasets based on attributes | StatDBAttribute | Summarize datasets in PanglaoDB, UCSC Cell Browser and CELLxGENE based on attributes |
Contact
For any questions, feature requests or bug reports, please email songyb0519@gmail.com.
Code of Conduct
Please note that the scfetch project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.