DownloadMatrices

Introduction

scfetch provides functions for users to download count matrices and annotations (e.g. cell type annotation and composition) from GEO and some single-cell databases (e.g. PanglaoDB and UCSC Cell Browser). scfetch also supports loading the downloaded data to Seurat.

The currently supported public resources and the results they return:

Resources	URL	Download Type	Returned results
GEO	https://www.ncbi.nlm.nih.gov/geo/	count matrix	SeuratObject or count matrix for bulk RNA-seq
PanglaoDB	https://panglaodb.se/index.html	count matrix	SeuratObject
UCSC Cell Browser	https://cells.ucsc.edu/	count matrix	SeuratObject

GEO

GEO is an international public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data submitted by the research community. It provides a convenient way for users to explore and select scRNA-seq datasets of interest.

Extract metadata

scfetch provides ExtractGEOMeta to extract sample metadata, including sample title, source name/tissue, description, cell type, treatment, paper title, paper abstract, organism, protocol, data processing methods, etc.

# library
library(scfetch)

## Setting options('download.file.method.GEOquery'='auto')

## Setting options('GEOquery.inmemory.gpl'=FALSE)

## Registered S3 method overwritten by 'SeuratDisk':
##   method            from  
##   as.sparse.H5Group Seurat

# extract metadata of specified platform
GSE200257.meta <- ExtractGEOMeta(acce = "GSE200257", platform = "GPL24676")

## Found 1 file(s)

## GSE200257_series_matrix.txt.gz

## Rows: 0 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (9): ID_REF, GSM6025648, GSM6025649, GSM6025650, GSM6025651, GSM6025652,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## File stored at: 
## 
## /var/folders/_4/k4qmvf7s2gx_6789px8n_sxh0000gn/T//RtmpeDiEtS/GPL24676.soft

# set VROOM_CONNECTION_SIZE to avoid error: Error: The size of the connection buffer (786432) was not large enough
Sys.setenv("VROOM_CONNECTION_SIZE"=131072*60)
# extract metadata of all platforms
GSE94820.meta <- ExtractGEOMeta(acce = "GSE94820", platform = NULL)

## Found 2 file(s)
## GSE94820-GPL15520_series_matrix.txt.gz
## Rows: 0 Columns: 651
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (651): ID_REF, GSM2485115, GSM2485116, GSM2485117, GSM2485118, GSM248511...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## File stored at: 
## /var/folders/_4/k4qmvf7s2gx_6789px8n_sxh0000gn/T//RtmpeDiEtS/GPL15520.soft
## GSE94820-GPL16791_series_matrix.txt.gz
## Rows: 0 Columns: 1735
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1735): ID_REF, GSM2483594, GSM2483595, GSM2483596, GSM2483597, GSM24835...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## File stored at: 
## /var/folders/_4/k4qmvf7s2gx_6789px8n_sxh0000gn/T//RtmpeDiEtS/GPL16791.soft

head(GSE94820.meta)

##   Platform                  title geo_accession source_name_ch1
## 1 GPL15520  AXLSIGLEC6_Bin1_S1_S5    GSM2485115            PBMC
## 2 GPL15520 AXLSIGLEC6_Bin1_S10_S2    GSM2485116            PBMC
## 3 GPL15520 AXLSIGLEC6_Bin1_S11_S3    GSM2485117            PBMC
## 4 GPL15520 AXLSIGLEC6_Bin1_S12_S4    GSM2485118            PBMC
## 5 GPL15520 AXLSIGLEC6_Bin1_S13_S5    GSM2485119            PBMC
## 6 GPL15520 AXLSIGLEC6_Bin1_S14_S6    GSM2485120            PBMC
##               description sorted_gate_identity
## 1 paired-end RNA-seq data      AXLSIGLEC6_Bin1
## 2 paired-end RNA-seq data      AXLSIGLEC6_Bin1
## 3 paired-end RNA-seq data      AXLSIGLEC6_Bin1
## 4 paired-end RNA-seq data      AXLSIGLEC6_Bin1
## 5 paired-end RNA-seq data      AXLSIGLEC6_Bin1
## 6 paired-end RNA-seq data      AXLSIGLEC6_Bin1
##                                                                                             Title
## 1 Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes and progenitors
## 2 Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes and progenitors
## 3 Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes and progenitors
## 4 Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes and progenitors
## 5 Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes and progenitors
## 6 Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes and progenitors
##                                                 Type     Organism
## 1 Expression profiling by high throughput sequencing Homo sapiens
## 2 Expression profiling by high throughput sequencing Homo sapiens
## 3 Expression profiling by high throughput sequencing Homo sapiens
## 4 Expression profiling by high throughput sequencing Homo sapiens
## 5 Expression profiling by high throughput sequencing Homo sapiens
## 6 Expression profiling by high throughput sequencing Homo sapiens
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Abstract
## 1 Peripheral blood mononuclear cells (PBMCs) were isolated from fresh blood using Ficoll-Paque density gradient centrifugation. Single-cell suspensions were stained with different antibody cocktails designed to enrich for particular immune cell populations, which were then single cell sorted in 96-well plates.  Single cell RNA-sequencing libraries were subsequently generated for 2422 single cells using Smart-Seq2 (Picelli et al., Nature Methods, 2014). Cells were sequenced at a depth of 1-2M reads/cell.
## 2 Peripheral blood mononuclear cells (PBMCs) were isolated from fresh blood using Ficoll-Paque density gradient centrifugation. Single-cell suspensions were stained with different antibody cocktails designed to enrich for particular immune cell populations, which were then single cell sorted in 96-well plates.  Single cell RNA-sequencing libraries were subsequently generated for 2422 single cells using Smart-Seq2 (Picelli et al., Nature Methods, 2014). Cells were sequenced at a depth of 1-2M reads/cell.
## 3 Peripheral blood mononuclear cells (PBMCs) were isolated from fresh blood using Ficoll-Paque density gradient centrifugation. Single-cell suspensions were stained with different antibody cocktails designed to enrich for particular immune cell populations, which were then single cell sorted in 96-well plates.  Single cell RNA-sequencing libraries were subsequently generated for 2422 single cells using Smart-Seq2 (Picelli et al., Nature Methods, 2014). Cells were sequenced at a depth of 1-2M reads/cell.
## 4 Peripheral blood mononuclear cells (PBMCs) were isolated from fresh blood using Ficoll-Paque density gradient centrifugation. Single-cell suspensions were stained with different antibody cocktails designed to enrich for particular immune cell populations, which were then single cell sorted in 96-well plates.  Single cell RNA-sequencing libraries were subsequently generated for 2422 single cells using Smart-Seq2 (Picelli et al., Nature Methods, 2014). Cells were sequenced at a depth of 1-2M reads/cell.
## 5 Peripheral blood mononuclear cells (PBMCs) were isolated from fresh blood using Ficoll-Paque density gradient centrifugation. Single-cell suspensions were stained with different antibody cocktails designed to enrich for particular immune cell populations, which were then single cell sorted in 96-well plates.  Single cell RNA-sequencing libraries were subsequently generated for 2422 single cells using Smart-Seq2 (Picelli et al., Nature Methods, 2014). Cells were sequenced at a depth of 1-2M reads/cell.
## 6 Peripheral blood mononuclear cells (PBMCs) were isolated from fresh blood using Ficoll-Paque density gradient centrifugation. Single-cell suspensions were stained with different antibody cocktails designed to enrich for particular immune cell populations, which were then single cell sorted in 96-well plates.  Single cell RNA-sequencing libraries were subsequently generated for 2422 single cells using Smart-Seq2 (Picelli et al., Nature Methods, 2014). Cells were sequenced at a depth of 1-2M reads/cell.
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Design
## 1 The study was divided into two stages: (1) an exploratory phase, where 1140 single human blood dendritic cells and monocytes were profiled and 12 population samples; (2) a deep characterization phase, where an additional 1261 single cells and 9 population samples were profiled as part of follow-up studies.  A total of 2422 single cell and population samples were processed using Smart-Seq2 protocol (Picelli et al., Nature Methods, 2014), which allows for the generation of full-length single cell cDNA, and sequencing libraries were generated using Illumina Nextera XT DNA library preparation kit.\n\nPlease note that [1] the raw data have been submitted to dbGaP (which has controlled access mechanisms at: http://www.ncbi.nlm.nih.gov/gap; phs001294.v1.p1) due to potential privacy concerns. Please contact the submitter or dbGaP to request access to controlled-access datasets.\n\n[2] a few samples (38) were profiled but excluded from the processed data file since they were either a bulk sample (21) or excluded due to QC (17). Therefore, there are 1,140 and 1,244 data columns in two processed data files respectively, corresponding to total 2384 samples described in the records.
## 2 The study was divided into two stages: (1) an exploratory phase, where 1140 single human blood dendritic cells and monocytes were profiled and 12 population samples; (2) a deep characterization phase, where an additional 1261 single cells and 9 population samples were profiled as part of follow-up studies.  A total of 2422 single cell and population samples were processed using Smart-Seq2 protocol (Picelli et al., Nature Methods, 2014), which allows for the generation of full-length single cell cDNA, and sequencing libraries were generated using Illumina Nextera XT DNA library preparation kit.\n\nPlease note that [1] the raw data have been submitted to dbGaP (which has controlled access mechanisms at: http://www.ncbi.nlm.nih.gov/gap; phs001294.v1.p1) due to potential privacy concerns. Please contact the submitter or dbGaP to request access to controlled-access datasets.\n\n[2] a few samples (38) were profiled but excluded from the processed data file since they were either a bulk sample (21) or excluded due to QC (17). Therefore, there are 1,140 and 1,244 data columns in two processed data files respectively, corresponding to total 2384 samples described in the records.
## 3 The study was divided into two stages: (1) an exploratory phase, where 1140 single human blood dendritic cells and monocytes were profiled and 12 population samples; (2) a deep characterization phase, where an additional 1261 single cells and 9 population samples were profiled as part of follow-up studies.  A total of 2422 single cell and population samples were processed using Smart-Seq2 protocol (Picelli et al., Nature Methods, 2014), which allows for the generation of full-length single cell cDNA, and sequencing libraries were generated using Illumina Nextera XT DNA library preparation kit.\n\nPlease note that [1] the raw data have been submitted to dbGaP (which has controlled access mechanisms at: http://www.ncbi.nlm.nih.gov/gap; phs001294.v1.p1) due to potential privacy concerns. Please contact the submitter or dbGaP to request access to controlled-access datasets.\n\n[2] a few samples (38) were profiled but excluded from the processed data file since they were either a bulk sample (21) or excluded due to QC (17). Therefore, there are 1,140 and 1,244 data columns in two processed data files respectively, corresponding to total 2384 samples described in the records.
## 4 The study was divided into two stages: (1) an exploratory phase, where 1140 single human blood dendritic cells and monocytes were profiled and 12 population samples; (2) a deep characterization phase, where an additional 1261 single cells and 9 population samples were profiled as part of follow-up studies.  A total of 2422 single cell and population samples were processed using Smart-Seq2 protocol (Picelli et al., Nature Methods, 2014), which allows for the generation of full-length single cell cDNA, and sequencing libraries were generated using Illumina Nextera XT DNA library preparation kit.\n\nPlease note that [1] the raw data have been submitted to dbGaP (which has controlled access mechanisms at: http://www.ncbi.nlm.nih.gov/gap; phs001294.v1.p1) due to potential privacy concerns. Please contact the submitter or dbGaP to request access to controlled-access datasets.\n\n[2] a few samples (38) were profiled but excluded from the processed data file since they were either a bulk sample (21) or excluded due to QC (17). Therefore, there are 1,140 and 1,244 data columns in two processed data files respectively, corresponding to total 2384 samples described in the records.
## 5 The study was divided into two stages: (1) an exploratory phase, where 1140 single human blood dendritic cells and monocytes were profiled and 12 population samples; (2) a deep characterization phase, where an additional 1261 single cells and 9 population samples were profiled as part of follow-up studies.  A total of 2422 single cell and population samples were processed using Smart-Seq2 protocol (Picelli et al., Nature Methods, 2014), which allows for the generation of full-length single cell cDNA, and sequencing libraries were generated using Illumina Nextera XT DNA library preparation kit.\n\nPlease note that [1] the raw data have been submitted to dbGaP (which has controlled access mechanisms at: http://www.ncbi.nlm.nih.gov/gap; phs001294.v1.p1) due to potential privacy concerns. Please contact the submitter or dbGaP to request access to controlled-access datasets.\n\n[2] a few samples (38) were profiled but excluded from the processed data file since they were either a bulk sample (21) or excluded due to QC (17). Therefore, there are 1,140 and 1,244 data columns in two processed data files respectively, corresponding to total 2384 samples described in the records.
## 6 The study was divided into two stages: (1) an exploratory phase, where 1140 single human blood dendritic cells and monocytes were profiled and 12 population samples; (2) a deep characterization phase, where an additional 1261 single cells and 9 population samples were profiled as part of follow-up studies.  A total of 2422 single cell and population samples were processed using Smart-Seq2 protocol (Picelli et al., Nature Methods, 2014), which allows for the generation of full-length single cell cDNA, and sequencing libraries were generated using Illumina Nextera XT DNA library preparation kit.\n\nPlease note that [1] the raw data have been submitted to dbGaP (which has controlled access mechanisms at: http://www.ncbi.nlm.nih.gov/gap; phs001294.v1.p1) due to potential privacy concerns. Please contact the submitter or dbGaP to request access to controlled-access datasets.\n\n[2] a few samples (38) were profiled but excluded from the processed data file since they were either a bulk sample (21) or excluded due to QC (17). Therefore, there are 1,140 and 1,244 data columns in two processed data files respectively, corresponding to total 2384 samples described in the records.
##   SampleCount  Molecule
## 1         650 polyA RNA
## 2         650 polyA RNA
## 3         650 polyA RNA
## 4         650 polyA RNA
## 5         650 polyA RNA
## 6         650 polyA RNA
##                                                                                                                                                                                                                                                                ExtractProtocol
## 1 Enriched immune cell fractions isolated from healthy blood PBMCs were FACS sorted in 96-well plates (single cell sorted) containing lysis buffer (TCL together with 1% of 2-Mercaptoethanol).. Smart-seq2 (Picelli et al., Nature Methods, 2014). Full-length RNA-sequencing
## 2 Enriched immune cell fractions isolated from healthy blood PBMCs were FACS sorted in 96-well plates (single cell sorted) containing lysis buffer (TCL together with 1% of 2-Mercaptoethanol).. Smart-seq2 (Picelli et al., Nature Methods, 2014). Full-length RNA-sequencing
## 3 Enriched immune cell fractions isolated from healthy blood PBMCs were FACS sorted in 96-well plates (single cell sorted) containing lysis buffer (TCL together with 1% of 2-Mercaptoethanol).. Smart-seq2 (Picelli et al., Nature Methods, 2014). Full-length RNA-sequencing
## 4 Enriched immune cell fractions isolated from healthy blood PBMCs were FACS sorted in 96-well plates (single cell sorted) containing lysis buffer (TCL together with 1% of 2-Mercaptoethanol).. Smart-seq2 (Picelli et al., Nature Methods, 2014). Full-length RNA-sequencing
## 5 Enriched immune cell fractions isolated from healthy blood PBMCs were FACS sorted in 96-well plates (single cell sorted) containing lysis buffer (TCL together with 1% of 2-Mercaptoethanol).. Smart-seq2 (Picelli et al., Nature Methods, 2014). Full-length RNA-sequencing
## 6 Enriched immune cell fractions isolated from healthy blood PBMCs were FACS sorted in 96-well plates (single cell sorted) containing lysis buffer (TCL together with 1% of 2-Mercaptoethanol).. Smart-seq2 (Picelli et al., Nature Methods, 2014). Full-length RNA-sequencing
##   LibraryStrategy
## 1         RNA-Seq
## 2         RNA-Seq
## 3         RNA-Seq
## 4         RNA-Seq
## 5         RNA-Seq
## 6         RNA-Seq
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  DataProcessing
## 1 Read were aligned to the UCSC hg19 transcriptomee using Bowtie v0.12.7. Expression levels were quantified using RSEM v1.2.1 (TPM values). UCSC genome table browser was used to map UCSC gene ID (kgID) and the gene name (geneSymbol) for all genes in hg19. If multiple UCSC gene IDs are assigned to the same geneSymbol, the TPM values of all kgIDs that share the same geneSymbol are summed. Genome_build: hg19. Supplementary_files_format_and_content: Text file with tab delimiters
## 2 Read were aligned to the UCSC hg19 transcriptomee using Bowtie v0.12.7. Expression levels were quantified using RSEM v1.2.1 (TPM values). UCSC genome table browser was used to map UCSC gene ID (kgID) and the gene name (geneSymbol) for all genes in hg19. If multiple UCSC gene IDs are assigned to the same geneSymbol, the TPM values of all kgIDs that share the same geneSymbol are summed. Genome_build: hg19. Supplementary_files_format_and_content: Text file with tab delimiters
## 3 Read were aligned to the UCSC hg19 transcriptomee using Bowtie v0.12.7. Expression levels were quantified using RSEM v1.2.1 (TPM values). UCSC genome table browser was used to map UCSC gene ID (kgID) and the gene name (geneSymbol) for all genes in hg19. If multiple UCSC gene IDs are assigned to the same geneSymbol, the TPM values of all kgIDs that share the same geneSymbol are summed. Genome_build: hg19. Supplementary_files_format_and_content: Text file with tab delimiters
## 4 Read were aligned to the UCSC hg19 transcriptomee using Bowtie v0.12.7. Expression levels were quantified using RSEM v1.2.1 (TPM values). UCSC genome table browser was used to map UCSC gene ID (kgID) and the gene name (geneSymbol) for all genes in hg19. If multiple UCSC gene IDs are assigned to the same geneSymbol, the TPM values of all kgIDs that share the same geneSymbol are summed. Genome_build: hg19. Supplementary_files_format_and_content: Text file with tab delimiters
## 5 Read were aligned to the UCSC hg19 transcriptomee using Bowtie v0.12.7. Expression levels were quantified using RSEM v1.2.1 (TPM values). UCSC genome table browser was used to map UCSC gene ID (kgID) and the gene name (geneSymbol) for all genes in hg19. If multiple UCSC gene IDs are assigned to the same geneSymbol, the TPM values of all kgIDs that share the same geneSymbol are summed. Genome_build: hg19. Supplementary_files_format_and_content: Text file with tab delimiters
## 6 Read were aligned to the UCSC hg19 transcriptomee using Bowtie v0.12.7. Expression levels were quantified using RSEM v1.2.1 (TPM values). UCSC genome table browser was used to map UCSC gene ID (kgID) and the gene name (geneSymbol) for all genes in hg19. If multiple UCSC gene IDs are assigned to the same geneSymbol, the TPM values of all kgIDs that share the same geneSymbol are summed. Genome_build: hg19. Supplementary_files_format_and_content: Text file with tab delimiters
##                                                                                                                                                                                                                                                  SupplementaryFile
## 1 ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE94nnn/GSE94820/suppl/GSE94820_raw.expMatrix_DCnMono.discovery.set.submission.txt.gz, ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE94nnn/GSE94820/suppl/GSE94820_raw.expMatrix_deeper.characterization.set.submission.txt.gz
## 2 ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE94nnn/GSE94820/suppl/GSE94820_raw.expMatrix_DCnMono.discovery.set.submission.txt.gz, ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE94nnn/GSE94820/suppl/GSE94820_raw.expMatrix_deeper.characterization.set.submission.txt.gz
## 3 ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE94nnn/GSE94820/suppl/GSE94820_raw.expMatrix_DCnMono.discovery.set.submission.txt.gz, ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE94nnn/GSE94820/suppl/GSE94820_raw.expMatrix_deeper.characterization.set.submission.txt.gz
## 4 ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE94nnn/GSE94820/suppl/GSE94820_raw.expMatrix_DCnMono.discovery.set.submission.txt.gz, ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE94nnn/GSE94820/suppl/GSE94820_raw.expMatrix_deeper.characterization.set.submission.txt.gz
## 5 ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE94nnn/GSE94820/suppl/GSE94820_raw.expMatrix_DCnMono.discovery.set.submission.txt.gz, ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE94nnn/GSE94820/suppl/GSE94820_raw.expMatrix_deeper.characterization.set.submission.txt.gz
## 6 ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE94nnn/GSE94820/suppl/GSE94820_raw.expMatrix_DCnMono.discovery.set.submission.txt.gz, ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE94nnn/GSE94820/suppl/GSE94820_raw.expMatrix_deeper.characterization.set.submission.txt.gz
##          Contact               PMID
## 1 Nir,,Hacohen;  28428369, 30405621
## 2 Nir,,Hacohen;  28428369, 30405621
## 3 Nir,,Hacohen;  28428369, 30405621
## 4 Nir,,Hacohen;  28428369, 30405621
## 5 Nir,,Hacohen;  28428369, 30405621
## 6 Nir,,Hacohen;  28428369, 30405621

Download matrix and load to Seurat

After manually checking the extracted metadata, users can download the count matrix and load it to Seurat with ParseGEO.

ParseGEO can obtain the count matrix in two ways: by downloading it from supplementary files or by extracting it from the ExpressionSet. Users can choose the source with down.supp or let ParseGEO detect it automatically. In automatic mode, ParseGEO first extracts the count matrix from the ExpressionSet and, if that matrix is NULL or contains non-integer values, downloads the supplementary files instead. Supplementary files come in two main types: a single count matrix file containing all cells, and CellRanger-style outputs (barcode, matrix, feature/gene). Users are required to specify the type with supp.type.

With the count matrix, ParseGEO will load the matrix to Seurat automatically. If multiple samples are available, users can choose to merge the SeuratObjects with merge.
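The automatic detection described above can be pictured with a small base R check. This is only a sketch under stated assumptions, not the code ParseGEO runs: the helper name is_usable_count_matrix is hypothetical, it accepts an ordinary numeric matrix, and treating NA values as unusable is an extra assumption that the text above does not state.

```r
# Hypothetical helper (not part of scfetch): decide whether a matrix taken
# from an ExpressionSet looks like raw counts, or whether the supplementary
# files should be downloaded instead.
is_usable_count_matrix <- function(mat) {
  # A missing or empty matrix cannot be used.
  if (is.null(mat) || length(mat) == 0) {
    return(FALSE)
  }
  values <- as.numeric(mat)
  # Assumption of this sketch: NA values also make the matrix unusable.
  if (anyNA(values)) {
    return(FALSE)
  }
  # Raw counts must be whole numbers (e.g. TPM values fail this test).
  all(values == round(values))
}

counts.like <- matrix(c(0, 3, 12, 7), nrow = 2)
tpm.like <- matrix(c(0.5, 3, 12, 7), nrow = 2)

is_usable_count_matrix(NULL)        # no matrix: fall back to supplementary files
is_usable_count_matrix(counts.like) # whole numbers: usable as counts
is_usable_count_matrix(tpm.like)    # non-integer values: fall back
```

When the check fails, the fallback corresponds to the down.supp = TRUE behaviour described above.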

# for cellranger output
GSE200257.seu <- ParseGEO(
  acce = "GSE200257", platform = NULL, supp.idx = 1, down.supp = TRUE, supp.type = "10x",
  out.folder = "/Users/soyabean/Desktop/tmp/scdown/dwonload_geo"
)
# for a single count matrix file: out.folder is not needed, the matrix is downloaded to a temporary folder
GSE94820.seu <- ParseGEO(acce = "GSE94820", platform = NULL, supp.idx = 1, down.supp = TRUE, supp.type = "count")

The structure of the downloaded matrices for 10x:

tree /Users/soyabean/Desktop/tmp/scdown/dwonload_geo

## /Users/soyabean/Desktop/tmp/scdown/dwonload_geo
## ├── GSM6025652_1
## │   ├── barcodes.tsv.gz
## │   ├── features.tsv.gz
## │   └── matrix.mtx.gz
## ├── GSM6025653_2
## │   ├── barcodes.tsv.gz
## │   ├── features.tsv.gz
## │   └── matrix.mtx.gz
## ├── GSM6025654_3
## │   ├── barcodes.tsv.gz
## │   ├── features.tsv.gz
## │   └── matrix.mtx.gz
## └── GSM6025655_4
##     ├── barcodes.tsv.gz
##     ├── features.tsv.gz
##     └── matrix.mtx.gz
## 
## 5 directories, 12 files
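Before loading such a folder, it can be useful to confirm that every sample directory holds the three CellRanger-style files. The following is a minimal base R sketch; the helper check_10x_layout and the temporary demo folder are hypothetical and are not part of scfetch.

```r
# Hypothetical helper (not part of scfetch): for every sample folder directly
# under `root`, report whether all three CellRanger-style files are present.
check_10x_layout <- function(root) {
  expected <- c("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")
  sample.dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  complete <- vapply(
    sample.dirs,
    function(d) all(file.exists(file.path(d, expected))),
    logical(1)
  )
  names(complete) <- basename(sample.dirs)
  complete
}

# Build a toy layout with empty placeholder files in a temporary folder.
demo.root <- file.path(tempdir(), "demo_10x_layout")
for (s in c("GSM6025652_1", "GSM6025653_2")) {
  dir.create(file.path(demo.root, s), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(demo.root, s, c("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")))
}

layout.ok <- check_10x_layout(demo.root)
print(layout.ok)
```

A folder that passes this check has the layout that Seurat's Read10X expects for its data.dir argument.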

For bulk RNA-seq, set data.type = "bulk" in ParseGEO; this returns the count matrix.

PanglaoDB

PanglaoDB is a database of scRNA-seq datasets from mouse and human. It currently contains 5,586,348 cells from 1368 datasets (1063 from Mus musculus and 305 from Homo sapiens) and has well-organized metadata for every dataset, including tissue, protocol, species, number of cells and cell type annotation (computationally identified). Daniel Osorio developed rPanglaoDB to access PanglaoDB data, and the scfetch functions described here are based on rPanglaoDB.

Since PanglaoDB is no longer maintained, scfetch has cached all metadata and cell type composition and uses the cached data by default to speed up access. Users can access the cached data with PanglaoDBMeta (all metadata) and PanglaoDBComposition (all cell type composition).

Summary attributes

scfetch provides StatDBAttribute to summarize the attributes of PanglaoDB:

# use cached metadata
StatDBAttribute(df = PanglaoDBMeta, filter = c("species", "protocol"), database = "PanglaoDB")

## $species
##          Value  Num     Key
## 1 Mus musculus 1063 species
## 2 Homo sapiens  305 species
## 
## $protocol
##           Value  Num      Key
## 1  10x chromium 1046 protocol
## 2      drop-seq  204 protocol
## 3 microwell-seq   74 protocol
## 4    Smart-seq2   26 protocol
## 5   C1 Fluidigm   16 protocol
## 6       CEL-seq    1 protocol
## 7       inDrops    1 protocol
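The shape of this summary (one Value/Num/Key table per attribute) can be reproduced on any metadata table with base R. The sketch below is an illustration only, not the StatDBAttribute implementation: count_attribute is a hypothetical helper and toy.meta is a made-up three-row table rather than PanglaoDB data.

```r
# Hypothetical helper (not part of scfetch): count the values of one column
# and return them in a Value / Num / Key layout, most frequent first.
count_attribute <- function(df, column) {
  counts <- sort(table(df[[column]]), decreasing = TRUE)
  data.frame(
    Value = names(counts),
    Num = as.integer(counts),
    Key = column,
    stringsAsFactors = FALSE
  )
}

# Toy metadata, invented for this example.
toy.meta <- data.frame(
  species = c("Mus musculus", "Mus musculus", "Homo sapiens"),
  protocol = c("10x chromium", "drop-seq", "10x chromium"),
  stringsAsFactors = FALSE
)

# One summary table per requested attribute, returned as a named list.
attribute.summary <- lapply(
  c(species = "species", protocol = "protocol"),
  function(column) count_attribute(toy.meta, column)
)
print(attribute.summary)
```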

Extract metadata

scfetch provides ExtractPanglaoDBMeta to select datasets of interest by species, protocol, tissue and cell number (the available values of these attributes can be obtained with StatDBAttribute). Users can also choose whether to add cell type annotation to every dataset with show.cell.type.

scfetch uses cached metadata and cell type composition by default; users can change this by setting local.data = FALSE.

hsa.meta <- ExtractPanglaoDBMeta(species = "Homo sapiens", protocol = c("Smart-seq2", "10x chromium"), show.cell.type = TRUE, cell.num = c(1000, 2000))
head(hsa.meta)

##         SRA        SRS                             Tissue     Protocol
## 1 SRA550660 SRS2089635 Peripheral blood mononuclear cells 10x chromium
## 2 SRA550660 SRS2089636 Peripheral blood mononuclear cells 10x chromium
## 3 SRA550660 SRS2089638 Peripheral blood mononuclear cells 10x chromium
## 4 SRA605365 SRS2492922            Nasal airway epithelium 10x chromium
## 5 SRA608611 SRS2517316                   Lung progenitors 10x chromium
## 6 SRA608353 SRS2517519           Hepatocellular carcinoma 10x chromium
##        Species Cells
## 1 Homo sapiens  1860
## 2 Homo sapiens  1580
## 3 Homo sapiens  1818
## 4 Homo sapiens  1932
## 5 Homo sapiens  1077
## 6 Homo sapiens  1230
##                                                                      CellType
## 1                                                           Unknown, NK cells
## 2                              Unknown, T cells, Plasmacytoid dendritic cells
## 3 Unknown, Gamma delta T cells, Dendritic cells, Plasmacytoid dendritic cells
## 4       Luminal epithelial cells, Basal cells, Keratinocytes, Ependymal cells
## 5                                           Unknown, Hepatocytes, Basal cells
## 6                                        Unknown, Hepatocytes, Foveolar cells
##   CellNum
## 1    1860
## 2    1580
## 3    1818
## 4    1932
## 5    1077
## 6    1230
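Conceptually, this selection is ordinary data-frame subsetting on species, protocol and a cell-number range. The sketch below shows the idea on a made-up table. The helper filter_meta and the toy rows are hypothetical, and treating both ends of cell.num as inclusive is an assumption of this example, not a documented property of ExtractPanglaoDBMeta.

```r
# Toy metadata, invented for this example (not PanglaoDB records).
toy.meta <- data.frame(
  SRS = c("toy_A", "toy_B", "toy_C", "toy_D"),
  Species = c("Homo sapiens", "Homo sapiens", "Mus musculus", "Homo sapiens"),
  Protocol = c("10x chromium", "drop-seq", "10x chromium", "Smart-seq2"),
  Cells = c(1860, 1500, 1200, 5000),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of scfetch): keep rows that match the species,
# one of the protocols, and a cell number inside [cell.num[1], cell.num[2]].
filter_meta <- function(df, species, protocol, cell.num) {
  keep <- df$Species %in% species &
    df$Protocol %in% protocol &
    df$Cells >= cell.num[1] &
    df$Cells <= cell.num[2]
  df[keep, , drop = FALSE]
}

selected <- filter_meta(
  toy.meta,
  species = "Homo sapiens",
  protocol = c("Smart-seq2", "10x chromium"),
  cell.num = c(1000, 2000)
)
print(selected)
```

Only toy_A survives: toy_B uses a protocol that was not requested, toy_C is from another species, and toy_D has too many cells.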

Extract cell type composition

scfetch provides ExtractPanglaoDBComposition to extract cell type annotation and composition. Cached data are used by default to speed up access; users can change this by setting local.data = FALSE.

hsa.composition <- ExtractPanglaoDBComposition(species = "Homo sapiens", protocol = c("Smart-seq2", "10x chromium"))
head(hsa.composition)

##           SRA        SRS                        Tissue     Protocol
## 1.1 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.2 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.3 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.4 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.5 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.6 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
##          Species Cluster Cells Cell Type
## 1.1 Homo sapiens       0  1572   Unknown
## 1.2 Homo sapiens       1   563   Unknown
## 1.3 Homo sapiens       2   280   Unknown
## 1.4 Homo sapiens       3   270   Unknown
## 1.5 Homo sapiens       4   220   Unknown
## 1.6 Homo sapiens       5   192   Unknown

Download matrix and load to Seurat

After users have manually checked the extracted metadata, ParsePanglaoDB can download the count matrix and load it to Seurat. When cell type annotation is available, users can filter out datasets that lack the specified cell type with cell.type. Users can also include/exclude cells expressing specified genes with include.gene/exclude.gene.

With the count matrix, ParsePanglaoDB will load the matrix to Seurat automatically. If multiple datasets are available, users can choose to merge the SeuratObjects with merge.

hsa.seu <- ParsePanglaoDB(hsa.meta, merge = TRUE)
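To make the idea behind the cell.type filter concrete, here is a small base R sketch of dropping datasets whose annotation lacks a wanted cell type. It is not ParsePanglaoDB's implementation: the column name CellType, the accession values, and the helper has_cell_type are hypothetical, and the comma-separated annotation format is assumed from the metadata printed earlier.

```r
# Hypothetical metadata: one row per dataset, cell types as a comma-separated string
meta <- data.frame(
  SRS = c("SRS0000001", "SRS0000002", "SRS0000003"),
  CellType = c("Unknown, Hepatocytes, Basal cells",
               "Luminal epithelial cells, Keratinocytes",
               "Unknown, Hepatocytes, Foveolar cells"),
  stringsAsFactors = FALSE
)

# TRUE for each dataset whose annotation contains at least one wanted cell type
has_cell_type <- function(annotation, wanted) {
  vapply(strsplit(annotation, ",\\s*"), function(types) {
    any(tolower(trimws(types)) %in% tolower(wanted))
  }, logical(1))
}

# Keep only the datasets annotated with hepatocytes
kept <- meta[has_cell_type(meta$CellType, "Hepatocytes"), ]
print(kept$SRS)
```

Splitting on commas and matching whole names avoids partial hits, for example "Basal cells" matching inside a longer cell type name.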

UCSC Cell Browser

The UCSC Cell Browser is a web-based tool that allows scientists to interactively visualize scRNA-seq datasets. It contains 1040 single-cell datasets from 17 different species, organized in a hierarchical structure that helps users quickly locate the datasets they are interested in.

Show available datasets

scfetch provides ShowCBDatasets to show all available datasets. Because the number of datasets is large, ShowCBDatasets lets users lazily load local dataset json files instead of downloading them online on every call, which is time-consuming. Lazy loading requires users to provide a json.folder in which to save the json files and to set lazy = TRUE. On the first run, ShowCBDatasets downloads the current json files to json.folder; on later runs with lazy = TRUE, it loads the downloaded json files from json.folder. ShowCBDatasets also supports updating the local datasets with update = TRUE.

# first run: the json files are downloaded and stored under json.folder
# ucsc.cb.samples <- ShowCBDatasets(lazy = TRUE, json.folder = "/Users/soyabean/Desktop/tmp/scdown/cell_browser/json", update = TRUE)

# later runs: load the downloaded json files from json.folder
ucsc.cb.samples <- ShowCBDatasets(lazy = TRUE, json.folder = "/Users/soyabean/Desktop/tmp/scdown/cell_browser/json", update = FALSE)

## Lazy mode is on, load downloaded json from /Users/soyabean/Desktop/tmp/scdown/cell_browser/json
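The lazy-load behaviour described above follows a common download-once, read-from-disk pattern. The sketch below illustrates that pattern in base R under stated assumptions: load_cached and fake_fetch are hypothetical helpers, the download is simulated rather than real, and nothing here reflects how ShowCBDatasets is actually implemented.

```r
# Hypothetical helper: reuse a cached file if present, otherwise fetch and cache it
load_cached <- function(name, cache.folder, fetch, update = FALSE) {
  dir.create(cache.folder, recursive = TRUE, showWarnings = FALSE)
  cache.file <- file.path(cache.folder, name)
  if (update || !file.exists(cache.file)) {
    writeLines(fetch(name), cache.file)
  }
  readLines(cache.file)
}

# Stand-in for a slow download; counts how often it is called
n.fetch <- 0
fake_fetch <- function(name) {
  n.fetch <<- n.fetch + 1
  sprintf('{"name": "%s"}', name)
}

cache.folder <- tempfile("cb_json")
first <- load_cached("dataset.json", cache.folder, fake_fetch)                 # fetches and caches
second <- load_cached("dataset.json", cache.folder, fake_fetch)                # served from disk
third <- load_cached("dataset.json", cache.folder, fake_fetch, update = TRUE)  # forces a refresh
print(n.fetch)
```

Only the first call and the update = TRUE call reach the simulated download, which is why a persistent json.folder saves time on repeated runs.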

head(ucsc.cb.samples)

##                               name                 shortLabel
## 1  adult-brain-vasc/am-endothelial Vasculature of Adult Brain
## 2       adult-brain-vasc/am-immune Vasculature of Adult Brain
## 3 adult-brain-vasc/am-perivascular Vasculature of Adult Brain
## 4     adult-brain-vasc/endothelial Vasculature of Adult Brain
## 5    adult-brain-vasc/perivascular Vasculature of Adult Brain
## 6                     adult-testis               Adult Testis
##                                              subLabel tags body_parts
## 1  Endothelial - Arteriovenous Malformation & Control           brain
## 2       Immune - Arteriovenous Malformation & Control           brain
## 3 Perivascular - Arteriovenous Malformation & Control           brain
## 4                             Adult Brain Endothelial           brain
## 5                            Adult Brain Perivascular           brain
## 6                                                <NA>          testis
##         diseases                 organisms projects life_stages domains sources
## 1 Healthy|parent Human (H. sapiens)|parent                           NA      NA
## 2 Healthy|parent Human (H. sapiens)|parent                           NA      NA
## 3 Healthy|parent Human (H. sapiens)|parent                           NA      NA
## 4 Healthy|parent Human (H. sapiens)|parent                           NA      NA
## 5 Healthy|parent Human (H. sapiens)|parent                           NA      NA
## 6        Healthy        Human (H. sapiens)                                     
##   sampleCount assays            matrix         barcode         feature
## 1        9541        exprMatrix.tsv.gz                                
## 2       55255            matrix.mtx.gz barcodes.tsv.gz features.tsv.gz
## 3      101317            matrix.mtx.gz barcodes.tsv.gz features.tsv.gz
## 4        5018        exprMatrix.tsv.gz                                
## 5       49553        exprMatrix.tsv.gz                                
## 6        6199        exprMatrix.tsv.gz                                
##   matrixType                                               title
## 1     matrix  Arteriovenous malformation and control endothelial
## 2        10x       Arteriovenous malformation and control immune
## 3        10x Arteriovenous malformation and control perivascular
## 4     matrix                  Adult Brain Endothelial Cell Types
## 5     matrix                 Adult Brain Perivascular Cell Types
## 6     matrix   The adult human testis transcriptional cell atlas
##                                                                          paper
## 1                                                                             
## 2                                                                             
## 3                                                                             
## 4                                                                             
## 5                                                                             
## 6 https://www.nature.com/articles/s41422-018-0099-2 Guo et al. 2018. Cell Res.
##   abstract
## 1 \nArteriovenous malformation and control endothelial cell types coembedded with\ntheir respective cell types: nidus, arterial, venous, and venules.\n
## 2 \nArteriovenous malformation and control immune cell types coembedding of myeloid\nand lymphoid cells.\n
## 3 \nArteriovenous malformation and control perivascular cell types coembedded with\ntheir respective cell types: smooth muscle cells, pericytes, fibroblasts, and\nfibromyocytes.\n
## 4 \nAdult brain endothelial cell types broken down into four broad cell types:\ncapillary, arterial, venous, and venules.\n
## 5 \nAdult brain perivascular cell types broken down into four broad cell\ntypes: smooth muscle cells, pericytes, fibroblasts, and fibromyocytes.\n
## 6 \n<p>\nFrom <a href="https://www.nature.com/articles/s41422-018-0099-2"\ntarget="_blank">Guo et al</a>:\n</p>\n\n<p>\nHuman adult spermatogenesis balances spermatogonial stem cell (SSC)\nself-renewal and differentiation, alongside complex germ cell-niche\ninteractions, to ensure long-term fertility and faithful genome propagation.\nHere, we performed single-cell RNA sequencing of ~6500 testicular cells from\nyoung adults. We found five niche/somatic cell types (Leydig, myoid, Sertoli,\nendothelial, macrophage), and observed germline-niche interactions and key\nhuman-mouse differences. Spermatogenesis, including meiosis, was reconstructed\ncomputationally, revealing sequential coding, non-coding, and repeat-element\ntranscriptional signatures. Interestingly, we identified five discrete\ntranscriptional/developmental spermatogonial states, including a novel early\nSSC state, termed State 0. Epigenetic features and nascent transcription\nanalyses suggested developmental plasticity within spermatogonial States. To\nunderstand the origin of State 0, we profiled testicular cells from infants,\nand identified distinct similarities between adult State 0 and infant SSCs.\nOverall, our datasets describe key transcriptional and epigenetic signatures of\nthe normal adult human testis, and provide new insights into germ cell\ndevelopmental transitions and plasticity.\n</p>\n
##   unit                    coords
## 1      Seurat_umap.coords.tsv.gz
## 2      Seurat_umap.coords.tsv.gz
## 3      Seurat_umap.coords.tsv.gz
## 4      Seurat_umap.coords.tsv.gz
## 5             UMAP.coords.tsv.gz
## 6          umap_hm.coords.tsv.gz
##   methods
## 1
## 2
## 3
## 4
## 5
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
## 6 \n<p>\nDataset was imported from the h5ad file available on the <a href="https://www.covid19cellatlas.org/"\ntarget="_blank"> COVID-19 Cell Atlas website</a> using the UCSC Cell Browser utility\n<code>cbImportScanpy</code>\n</p>\n\n<section>Single cell RNA-seq performance, library preparation and sequencing</section>\n<p>\nscRNA-Seq was performed using the 10× Genomics system. Briefly, each experiment\ncaptured ~1500 single cells, in order to obtain ~0.8% multiplex rate. Cells\nwere diluted following manufacturer recommendations, and mixed with 33.8 µL of\ntotal mixed buffer before being loaded into 10× Chromium Controller using\nChromium Single Cell 3’ v2 reagents. Each sequencing library was prepared\nfollowing the manufacturer’s instructions, with 13 cycles used for cDNA\namplification. Then ~100 ng of cDNA were used for library amplification by 12\ncycles. The resulting libraries were then sequenced on a 26 × 100 cycle\npaired-end run on an Illumina HiSeq 2500 instrument.\n</p>\n\n<section>Process of single cell RNA-seq data</section>\n<p>\nRaw sequencing data were demultiplexed using the mkfastq application (Cell\nRanger v1.2.1). Three types of fastq files were generated: I1 contains 8 bp\nsample index; R1 contains 26 bp (10 bp cell-BC + 16 bp UMI) index and R2\ncontains 100 bp cDNA sequence. Fastq files were then run with the cellranger\ncount application (Cell Ranger v1.2.1) using default settings, to perform\nalignment (using STAR v2.5.4a), filtering and cellular barcode and UMI\ncounting. The UMI count tables of each cellular barcode were used for further\nanalysis.\n</p>\n\n<section>Cell type identification and clustering analysis using Seurat program</section>\n<p>\nThe Seurat program (http://satijalab.org/seurat/, R package, v.2.0.0) was\nfirstly applied for analysis of RNA-Sequencing data. To start with, UMI count\ntables from each replicates and donors were loaded into R using Read10X\nfunction, and Seurat objects were built from each experiment. 
Each experiment\nwas filtered and normalized with default settings. Specifically, cells were\nretained only when they had greater than 500 genes expressed, and less than 20%\nreads mapped to mitochondrial genome. We first ran t-SNE and the clustering\nanalysis for each replicate, which resulted in similar t-SNE map. Next, to\nminimize variation between technical replicates, we normalized and combined\ntechnical replicates from the same donor using the 10× Genomics built-in\napplication from Cell Ranger “cellrange aggr”. Data matrices from different\ndonors were then loaded into R using Seurat. Next, cells were normalized to the\ntotal UMI read count as well as mitochondrial read percentage, as instructed in\nthe manufacturer’s manual (http://satijalab.org/seurat/). Seurat objects\n(matrices from different donors) were then combined using RunCCA function.\nt-SNE and clustering analyses were then performed on the combined dataset using\nthe top 5000 highly variable genes and PCs 1–15, which showed most significant\np-values. Given the low number of Sertoli cells (underrepresented due to size\nfiltering), the initial clustering analysis did not identify them as a separate\ncluster. We performed deeper clustering of somatic cells, identified the\nSertoli cell cluster, and projected it back to the overall clusters, which\nresulted in 13 discrete cell clusters. Correlation of different replicates was\ncalculated based on average expression (normalized UMIs by Seurat) in each\nexperiment.\n</p>\n\n<p>\nSee the source paper <a href="https://www.nature.com/articles/s41422-018-0099-2"\ntarget="_blank">Guo et al. 2018. Cell Res.</a> for more details.\n</p>\n
##         geo
## 1          
## 2          
## 3          
## 4          
## 5          
## 6 GSE120508

# always read online
# ucsc.cb.samples = ShowCBDatasets(lazy = FALSE)

Check the number of datasets and all available species:

# the number of datasets
nrow(ucsc.cb.samples)

## [1] 1040

# available species: drop the "|parent" suffix, then split multi-species entries
organisms.clean <- gsub(pattern = "\\|parent", replacement = "", x = ucsc.cb.samples$organisms)
unique(unlist(strsplit(x = organisms.clean, split = ", ")))

##  [1] "Human (H. sapiens)"          "Mouse (M. musculus)"        
##  [3] "Rhesus macaque (M. mulatta)" "Dog (C. familiaris)"        
##  [5] "Human (H. Sapiens)"          "C. intestinalis"            
##  [7] "C. robusta"                  "Zebrafish (D. rerio)"       
##  [9] "Fruit fly (D. melanogaster)" "Hydra vulgaris"             
## [11] "Capitella teleta"            "Spongilla lacustris"        
## [13] "X. tropicalis"               "Chimp (P. troglodytes)"     
## [15] "Bonobo (P. paniscus)"        "S. mansoni"                 
## [17] "Sea urchin (S. purpuratus)"  "Human-Mouse Xenograft"
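
The species list above contains both "Human (H. sapiens)" and "Human (H. Sapiens)", which differ only in letter case. The following is a minimal sketch on toy data (the input vector is illustrative and is not taken from the UCSC Cell Browser) of how the same clean-and-split step works and how such case variants could be merged. The variable names are assumptions made for this example.

```r
# Toy stand-in for ucsc.cb.samples$organisms (illustrative values only)
organisms <- c(
  "Human (H. sapiens)|parent",
  "Human (H. sapiens), Mouse (M. musculus)|parent",
  "Human (H. Sapiens)"
)

# Drop the "|parent" suffix, then split multi-species entries on ", "
cleaned <- gsub(pattern = "\\|parent", replacement = "", x = organisms)
species <- unique(unlist(strsplit(x = cleaned, split = ", ")))

# Case-insensitive de-duplication: keep the first spelling of each species
species.dedup <- species[!duplicated(tolower(species))]
print(species.dedup)
```

Comparing on the lowercased values while keeping the original spelling avoids counting the same species twice without rewriting the labels.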

Summarize attributes

scfetch provides StatDBAttribute to summarize the attributes of UCSC Cell Browser datasets:

StatDBAttribute(df = ucsc.cb.samples, filter = c("organism", "organ"), database = "UCSC")

## $organism
##                          Value Num      Key
## 1           human (h. sapiens) 525 organism
## 2          mouse (m. musculus) 196 organism
## 3  fruit fly (d. melanogaster)  32 organism
## 4  rhesus macaque (m. mulatta)  25 organism
## 5             capitella teleta  18 organism
## 6               hydra vulgaris  18 organism
## 7          spongilla lacustris  18 organism
## 8         zebrafish (d. rerio)  10 organism
## 9              c. intestinalis   9 organism
## 10      chimp (p. troglodytes)   8 organism
## 11         dog (c. familiaris)   6 organism
## 12                  c. robusta   5 organism
## 13        bonobo (p. paniscus)   3 organism
## 14  sea urchin (s. purpuratus)   3 organism
## 15       human-mouse xenograft   1 organism
## 16                  s. mansoni   1 organism
## 17               x. tropicalis   1 organism
## 
## $organ
##                              Value Num   Key
## 1                            brain 175 organ
## 2                              eye 136 organ
## 3                           retina 133 organ
## 4                             lung  72 organ
## 5                           muscle  44 organ
## 6                            blood  42 organ
## 7                            heart  36 organ
## 8                  skeletal muscle  35 organ
## 9                         pancreas  24 organ
## 10                          thymus  23 organ
## 11                          kidney  22 organ
## 12                          immune  21 organ
## 13                     bone marrow  20 organ
## 14                            skin  20 organ
## 15                             gut  19 organ
## 16                           liver  19 organ
## 17                  whole organism  18 organ
## 18                          embryo  17 organ
## 19                           ovary  16 organ
## 20                          spleen  15 organ
## 21                peripheral blood  13 organ
## 22                             all  12 organ
## 23                           colon  11 organ
## 24                           nasal  11 organ
## 25                           tumor  11 organ
## 26                          testis  10 organ
## 27                        organoid   9 organ
## 28                          cortex   8 organ
## 29                           fetal   8 organ
## 30                     hippocampus   7 organ
## 31                       intestine   7 organ
## 32                 large intestine   7 organ
## 33                      lymph node   7 organ
## 34                 small intestine   6 organ
## 35                          airway   5 organ
## 36                          breast   5 organ
## 37       leptomeningeal metastasis   5 organ
## 38                        placenta   5 organ
## 39              respiratory system   5 organ
## 40                     spinal cord   5 organ
## 41                         stomach   5 organ
## 42                         trachea   5 organ
## 43                          ureter   5 organ
## 44                            balf   4 organ
## 45                         bladder   4 organ
## 46                      epithelium   4 organ
## 47                       esophagus   4 organ
## 48                   mammary gland   4 organ
## 49                        striatum   4 organ
## 50                          tongue   4 organ
## 51                         adrenal   3 organ
## 52                      cerebellum   3 organ
## 53                  early develop.   3 organ
## 54                     oral cavity   3 organ
## 55                        prostate   3 organ
## 56                  salivary gland   3 organ
## 57                       cell line   2 organ
## 58                        cerebrum   2 organ
## 59                      cord blood   2 organ
## 60                         decidua   2 organ
## 61                        ectoderm   2 organ
## 62                     endothelial   2 organ
## 63                      epithelial   2 organ
## 64                             fat   2 organ
## 65                     fetal liver   2 organ
## 66                       forebrain   2 organ
## 67                         gingiva   2 organ
## 68                           ileum   2 organ
## 69                            limb   2 organ
## 70                     nasopharynx   2 organ
## 71                            nose   2 organ
## 72                      oesophagus   2 organ
## 73                placenta/decidua   2 organ
## 74                          rectum   2 organ
## 75                           teeth   2 organ
## 76                         antenna   1 organ
## 77                            body   1 organ
## 78                       body wall   1 organ
## 79                            bone   1 organ
## 80                     bone marroe   1 organ
## 81            brown adipose tissue   1 organ
## 82                    cell culture   1 organ
## 83                          cornea   1 organ
## 84                           cotex   1 organ
## 85                       diaphragm   1 organ
## 86  dorsolateral prefrontal cortex   1 organ
## 87                     endometrium   1 organ
## 88          enteric nervous system   1 organ
## 89                       epidermis   1 organ
## 90                         fatbody   1 organ
## 91                           fovea   1 organ
## 92                     gallbladder   1 organ
## 93          gonadal adipose tissue   1 organ
## 94                         haltere   1 organ
## 95                            head   1 organ
## 96                             leg   1 organ
## 97        male reproductive glands   1 organ
## 98               malpighian-tubule   1 organ
## 99       mesechymal adipost tissue   1 organ
## 100                   nasal mucosa   1 organ
## 101                      neocortex   1 organ
## 102                   neural crest   1 organ
## 103                       oenocyte   1 organ
## 104              peritoneal cavity   1 organ
## 105  proboscis and maxillary palps   1 organ
## 106                        stromal   1 organ
## 107    subcutaneous adipose tissue   1 organ
## 108                         uterus   1 organ
## 109                    vasculature   1 organ
## 110                   whole embryo   1 organ
## 111                           wing   1 organ
## 112                       yolk sac   1 organ
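
To make the shape of this summary concrete, here is a rough base R sketch that builds a similar Value/Num/Key table from a comma-separated attribute column. It assumes that values are lowercased and that the "|parent" suffix is stripped before counting, which matches the appearance of the output above but is not confirmed as the internal behavior of StatDBAttribute. The helper count_attribute and the toy input are hypothetical.

```r
# Hypothetical helper (not part of scfetch): count the values of one
# comma-separated attribute column, ignoring case and the "|parent" suffix.
count_attribute <- function(x, key) {
  cleaned <- tolower(gsub(pattern = "\\|parent", replacement = "", x = x))
  values <- trimws(unlist(strsplit(x = cleaned, split = ",")))
  values <- values[values != ""]  # drop empty annotations
  counts <- sort(table(values), decreasing = TRUE)
  data.frame(
    Value = names(counts),
    Num = as.integer(counts),
    Key = key,
    stringsAsFactors = FALSE
  )
}

# Toy stand-in for a body_parts column (illustrative values only)
body.parts <- c("brain|parent", "Brain, spinal cord", "blood", "")
organ.stat <- count_attribute(x = body.parts, key = "organ")
print(organ.stat)
```

Because the counts are taken on the raw labels, misspelled entries in the source metadata (for example "bone marroe" and "cotex" in the table above) show up as separate values and may need to be queried explicitly.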

Extract metadata

scfetch provides ExtractCBDatasets to filter metadata by collection, sub-collection, organ, disease status, organism, project, and cell number. The available values of all these attributes except cell number can be obtained with StatDBAttribute. All attributes except cell number also support fuzzy matching via fuzzy.match, which is useful when selecting datasets.

# select brain or blood datasets annotated as human with 1000 to 2000 cells
hbb.sample.df <- ExtractCBDatasets(
  all.samples.df = ucsc.cb.samples, organ = c("brain", "blood"),
  organism = "Human (H. sapiens)", cell.num = c(1000, 2000)
)

## Use all shortLabel as input!

## Use all subLabel as input!

## Use all diseases as input!

## Use all projects as input!

head(hbb.sample.df)

##                                              name
## 1   allen-celltypes/comparative-thalmus/human-lgn
## 2 allen-celltypes/comparative-thalmus/macaque-lgn
## 3   allen-celltypes/comparative-thalmus/mouse-lgd
## 4                      lepto-metastasis/patient-d
##                                             shortLabel
## 1                 Allen Brain Map: Cell Types Database
## 2                 Allen Brain Map: Cell Types Database
## 3                 Allen Brain Map: Cell Types Database
## 4 Single-cell atlas of human leptomeningeal metastasis
##                            subLabel tags
## 1                         Human LGN     
## 2                       Macaque LGN     
## 3                         Mouse LGd     
## 4 Patient D with Lung Primary Tumor     
##                                             body_parts
## 1                                         brain|parent
## 2                                         brain|parent
## 3                                         brain|parent
## 4 brain, spinal cord, tumor, leptomeningeal metastasis
##                         diseases
## 1                 Healthy|parent
## 2                 Healthy|parent
## 3                 Healthy|parent
## 4 Leptomeningeal Melanoma|parent
##                                                                     organisms
## 1 Human (H. sapiens), Mouse (M. musculus), Rhesus macaque (M. mulatta)|parent
## 2 Human (H. sapiens), Mouse (M. musculus), Rhesus macaque (M. mulatta)|parent
## 3 Human (H. sapiens), Mouse (M. musculus), Rhesus macaque (M. mulatta)|parent
## 4                                                   Human (H. sapiens)|parent
##                   projects life_stages domains sources sampleCount assays
## 1 Allen Brain Atlas|parent                  NA      NA        1576       
## 2 Allen Brain Atlas|parent                  NA      NA        1092       
## 3 Allen Brain Atlas|parent                  NA      NA        1996       
## 4                                           NA      NA        1682       
##              matrix barcode feature matrixType
## 1 exprMatrix.tsv.gz                     matrix
## 2 exprMatrix.tsv.gz                     matrix
## 3 exprMatrix.tsv.gz                     matrix
## 4 exprMatrix.tsv.gz                     matrix
##                                                                title paper
## 1                             Human Lateral Geniculate Nucleus (LGN)      
## 2                           Macaque Lateral Geniculate Nucleus (LGN)      
## 3 Comparative Thalamus - Mouse Dorsolateral Geniculate Complex (LGd)      
## 4                                  Patient D with Lung Primary Tumor      
##                                                                                                           abstract
## 1                                                             This dataset covers 1,576 nuclei from human samples.
## 2                                                           This dataset covers 1,092 nuclei from macaque samples.
## 3                                                              This dataset covers 1,996 cells from mouse samples.
## 4 CSF cell fraction isolated from Patient D (primary tumor in lung) with newly diagnosed leptomeningeal metastasis
##   unit                                                            coords
## 1      UMAP.coords.tsv.gz, ForceAtlas2.coords.tsv.gz, tSNE.coords.tsv.gz
## 2      UMAP.coords.tsv.gz, ForceAtlas2.coords.tsv.gz, tSNE.coords.tsv.gz
## 3      UMAP.coords.tsv.gz, ForceAtlas2.coords.tsv.gz, tSNE.coords.tsv.gz
## 4                                                     UMAP.coords.tsv.gz
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   methods
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       See the Allen Brain Atlas\n<a href="http://help.brain-map.org/display/celltypes/Documentation" traget="_blank"\n>Documentation</a> for details about how the different aspects of this project\nwere carried out.\n
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       See the Allen Brain Atlas\n<a href="http://help.brain-map.org/display/celltypes/Documentation" traget="_blank"\n>Documentation</a> for details about how the different aspects of this project\nwere carried out.\n
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       See the Allen Brain Atlas\n<a href="http://help.brain-map.org/display/celltypes/Documentation" traget="_blank"\n>Documentation</a> for details about how the different aspects of this project\nwere carried out.\n
## 4 <section>Human CSF Single-Cell Transcriptomic Analysis</section>\n<p>Single-cell and bulk RNA-sequencing data have been deposited to NCBI GEO as\nGSE150681 SuperSeries. CSF, collected with informed consent from patients under\nprotocol IRB 13-039, was processed to isolate the sample into cell-free CSF and the cellular contents of the CSF: The whole CSF sample was centrifuged at 600 x g for 5 minutes without brake at 4 ºC to pellet the cells, and the supernatant was saved as cellfree CSF. The pellet was resuspended, washed with PBS supplemented with 0.4% BSA twice and processed immediately. The cells were manually counted with a hematocytometer. scRNA-Seq was performed with 10X genomics system using Chromium Single Cell 3' Library and Gel Bead Kit V2 (catalog no. 120234). Briefly,8,700 cells (viability 70-80%) were processed per sample, targeting recovery of ~5,000\ncells with 3.9% multiplet rate. In cases, where cell count was too low to target 5,000\ncells, maximum volume (34 µl) was loaded in the microfluidic droplet generation device.\nAfter reverse transcription reaction emulsions were broken, barcoded cDNA was purified\nwith DynaBeads, followed by 12-cycles of PCR amplification. The resulting amplified\nbarcoded-cDNA library was fragmented to ~400-600 bp, ligated to sequencing adapter\nand PCR-amplified to obtain sufficient amount of material for next-generation\nsequencing. The final libraries were sequenced on an Illumina NovaSeq 6000 system\n(Read 1 - 28 cycles, Index Read  8 cycles, and Read 2 - 96 cycles).\n</p> \n<section>Annotation of cell types</section>\n<p>Cell quality control steps described earlier are quite permissive and sometimes fail to\neliminate all low-quality cells. These cells typically do not express any particular\ngenes and frequently have low library size. 
In our dataset, after careful consideration,\ncluster 10 was identified as low-quality, dying cells and ultimately eliminated.\nMoreover, cluster 5 and 17 exhibited both T cell and myeloid lineage phenotypic\nprofiles and were eliminated from the dataset. To facilitate annotation of remaining 15\nclusters identified by PhenoGraph, we have examined MAGIC imputed gene expression\nof known marker genes (Figure S6A) and MAST derived differentially expressed genes\nbetween PhenoGraph clusters. Markers used to identify major cell types included MS4A1\nand CD79A (B cells), IL3RA and CLEC4C (plasmacytoid DC), EPCAM and KRT18\n(Cancer cells), CD3D and CD8A (CD8 T cells), CD3D, IL7R and CD4 (CD4 T cells),\nGNLY, NKG7, KLRB1 and NCAM1 (NK cells), CD14, CD68 and CST3\n(Macrophages), FCGR3A, LYZ and CST3 (Monocytes), CST3 and FCER1A (conventional DC).</p>\n\n\n
##   geo
## 1    
## 2    
## 3    
## 4

Extract cell type composition

scfetch provides ExtractCBComposition to extract cell type annotation and composition.

hbb.sample.ct <- ExtractCBComposition(json.folder = "/Users/soyabean/Desktop/tmp/scdown/cell_browser/json", sample.df = hbb.sample.df)
head(hbb.sample.ct)

##                             shortLabel  subLabel           CellType Num tags
## 1 Allen Brain Map: Cell Types Database Human LGN      LGN Exc BTNL9 908     
## 2 Allen Brain Map: Cell Types Database Human LGN          Oligo MAG 188     
## 3 Allen Brain Map: Cell Types Database Human LGN LGN Exc PRKCG BCHE 102     
## 4 Allen Brain Map: Cell Types Database Human LGN      LGN Inh CTXN3  96     
## 5 Allen Brain Map: Cell Types Database Human LGN      LGN Inh LAMP5  72     
## 6 Allen Brain Map: Cell Types Database Human LGN      LGN Inh NTRK1  42     
##     body_parts       diseases
## 1 brain|parent Healthy|parent
## 2 brain|parent Healthy|parent
## 3 brain|parent Healthy|parent
## 4 brain|parent Healthy|parent
## 5 brain|parent Healthy|parent
## 6 brain|parent Healthy|parent
##                                                                     organisms
## 1 Human (H. sapiens), Mouse (M. musculus), Rhesus macaque (M. mulatta)|parent
## 2 Human (H. sapiens), Mouse (M. musculus), Rhesus macaque (M. mulatta)|parent
## 3 Human (H. sapiens), Mouse (M. musculus), Rhesus macaque (M. mulatta)|parent
## 4 Human (H. sapiens), Mouse (M. musculus), Rhesus macaque (M. mulatta)|parent
## 5 Human (H. sapiens), Mouse (M. musculus), Rhesus macaque (M. mulatta)|parent
## 6 Human (H. sapiens), Mouse (M. musculus), Rhesus macaque (M. mulatta)|parent
##                   projects life_stages domains sources sampleCount assays
## 1 Allen Brain Atlas|parent                  NA      NA        1576       
## 2 Allen Brain Atlas|parent                  NA      NA        1576       
## 3 Allen Brain Atlas|parent                  NA      NA        1576       
## 4 Allen Brain Atlas|parent                  NA      NA        1576       
## 5 Allen Brain Atlas|parent                  NA      NA        1576       
## 6 Allen Brain Atlas|parent                  NA      NA        1576       
##                                    title paper
## 1 Human Lateral Geniculate Nucleus (LGN)      
## 2 Human Lateral Geniculate Nucleus (LGN)      
## 3 Human Lateral Geniculate Nucleus (LGN)      
## 4 Human Lateral Geniculate Nucleus (LGN)      
## 5 Human Lateral Geniculate Nucleus (LGN)      
## 6 Human Lateral Geniculate Nucleus (LGN)      
##                                               abstract
## 1 This dataset covers 1,576 nuclei from human samples.
## 2 This dataset covers 1,576 nuclei from human samples.
## 3 This dataset covers 1,576 nuclei from human samples.
## 4 This dataset covers 1,576 nuclei from human samples.
## 5 This dataset covers 1,576 nuclei from human samples.
## 6 This dataset covers 1,576 nuclei from human samples.
##                                                                                                                                                                                                               methods
## 1 See the Allen Brain Atlas\n<a href="http://help.brain-map.org/display/celltypes/Documentation" traget="_blank"\n>Documentation</a> for details about how the different aspects of this project\nwere carried out.\n
## 2 See the Allen Brain Atlas\n<a href="http://help.brain-map.org/display/celltypes/Documentation" traget="_blank"\n>Documentation</a> for details about how the different aspects of this project\nwere carried out.\n
## 3 See the Allen Brain Atlas\n<a href="http://help.brain-map.org/display/celltypes/Documentation" traget="_blank"\n>Documentation</a> for details about how the different aspects of this project\nwere carried out.\n
## 4 See the Allen Brain Atlas\n<a href="http://help.brain-map.org/display/celltypes/Documentation" traget="_blank"\n>Documentation</a> for details about how the different aspects of this project\nwere carried out.\n
## 5 See the Allen Brain Atlas\n<a href="http://help.brain-map.org/display/celltypes/Documentation" traget="_blank"\n>Documentation</a> for details about how the different aspects of this project\nwere carried out.\n
## 6 See the Allen Brain Atlas\n<a href="http://help.brain-map.org/display/celltypes/Documentation" traget="_blank"\n>Documentation</a> for details about how the different aspects of this project\nwere carried out.\n
##   geo
## 1    
## 2    
## 3    
## 4    
## 5    
## 6
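
The composition table can be summarised further with base R. The following is a minimal sketch, not part of scfetch: it rebuilds the six rows printed above as a small stand-in data frame (`ct` is a hypothetical name), assumes `Num` may be stored as character, and computes the fraction of each cell type per dataset. The fractions cover only the six rows shown by `head()`, not all 1,576 nuclei.

```r
# Hypothetical stand-in for the first rows of hbb.sample.ct
# (values copied from the printed output above).
ct <- data.frame(
  subLabel = rep("Human LGN", 6),
  CellType = c("LGN Exc BTNL9", "Oligo MAG", "LGN Exc PRKCG BCHE",
               "LGN Inh CTXN3", "LGN Inh LAMP5", "LGN Inh NTRK1"),
  Num = c("908", "188", "102", "96", "72", "42"),
  stringsAsFactors = FALSE
)

# Assumption: Num may be character, so coerce before doing arithmetic.
ct$Num <- as.numeric(ct$Num)

# Fraction of each cell type within its dataset (grouped by subLabel).
ct$Fraction <- ct$Num / ave(ct$Num, ct$subLabel, FUN = sum)

# Order from most to least abundant and show the relevant columns.
ct <- ct[order(ct$Fraction, decreasing = TRUE), ]
print(ct[, c("CellType", "Num", "Fraction")])
```

With the real `hbb.sample.ct`, the same two lines (`as.numeric` and `ave`) should apply unchanged, since grouping by `subLabel` keeps datasets separate when several are present.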

Load the online datasets to Seurat

After manually checking the extracted metadata, users can load the online count matrices into Seurat with ParseCBDatasets. ParseCBDatasets accepts the same attributes as ExtractCBDatasets. Please note that ParseCBDatasets reads the count matrix directly from the online source instead of downloading it to a local file. If multiple datasets are available, users can choose to merge the resulting SeuratObjects with merge.

hbb.sample.seu <- ParseCBDatasets(sample.df = hbb.sample.df)

2023-07-23
