Introduction
GEfetch2R provides functions for users to download count matrices and annotations
(e.g. cell type annotation and composition) from GEO and some
single-cell databases (e.g. PanglaoDB and UCSC Cell Browser).
GEfetch2R also supports loading the downloaded data into Seurat.
The currently supported public resources and their returned values are:
Resources | URL | Download Type | Returned values |
---|---|---|---|
GEO | https://www.ncbi.nlm.nih.gov/geo/ | count matrix | SeuratObject or count matrix for bulk RNA-seq |
PanglaoDB | https://panglaodb.se/index.html | count matrix and annotation | SeuratObject |
UCSC Cell Browser | https://cells.ucsc.edu/ | count matrix and annotation | SeuratObject |
GEO
GEO is an international public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data submitted by the research community. It provides a convenient way for users to explore and select scRNA-seq datasets of interest.
## Warning: replacing previous import 'LoomExperiment::import' by
## 'reticulate::import' when loading 'GEfetch2R'
Extract metadata
GEfetch2R provides ExtractGEOMeta to
extract sample metadata, including sample title, source name/tissue,
description, cell type, treatment, paper title, paper abstract,
organism, protocol, data processing methods, etc.
# library
library(GEfetch2R)
# extract metadata of specified platform
GSE200257.meta <- ExtractGEOMeta(acce = "GSE200257", platform = "GPL24676")
# set VROOM_CONNECTION_SIZE to avoid error: Error: The size of the connection buffer (786432) was not large enough
Sys.setenv("VROOM_CONNECTION_SIZE" = 131072 * 60)
# extract metadata of all platforms
GSE94820.meta <- ExtractGEOMeta(acce = "GSE94820", platform = NULL)
Show the metadata:
head(GSE94820.meta)
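The buffer-size setting used above can be written out explicitly. The following is a minimal sketch in base R; the variable name `buffer.size` is an assumption for illustration. It computes the same value as `131072 * 60` and passes it as a plain digit string, so that the environment variable does not end up in scientific notation if a larger multiplier is chosen.

```r
# Buffer size in bytes: the same arithmetic as in the call above.
buffer.size <- 131072 * 60

# Environment variables are strings; format() keeps plain digits.
Sys.setenv(VROOM_CONNECTION_SIZE = format(buffer.size, scientific = FALSE))

# Confirm what downstream readers will see.
Sys.getenv("VROOM_CONNECTION_SIZE")
```

Set the variable before the call that reads the large file; it has no effect on a call that has already failed.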
Download matrix and load to Seurat
After manually checking the extracted metadata, users can
download the count matrix and load it
into Seurat with ParseGEO.
For the count matrix, ParseGEO supports two sources: downloading the
matrix from supplementary files or extracting it from the
ExpressionSet. Users can choose the source with down.supp
or let ParseGEO detect it automatically (ParseGEO
first extracts the count matrix from the ExpressionSet; if
that matrix is NULL or contains non-integer values,
ParseGEO downloads the supplementary files instead).
Supplementary files come in two main types: a single count matrix file
containing all cells, and CellRanger-style outputs (barcode, matrix,
feature/gene). Users must specify the type of supplementary
files with supp.type.
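The automatic source detection described above can be illustrated with a small base R sketch. The helper `needs_supp_download` is hypothetical and is not part of GEfetch2R; it only mirrors the stated rule (fall back to supplementary files when the extracted matrix is NULL or holds non-integer values) and makes no claim about how ParseGEO implements it internally.

```r
# Hypothetical helper (not a GEfetch2R function): decide whether the
# matrix taken from the ExpressionSet is usable as raw counts.
needs_supp_download <- function(mat) {
  if (is.null(mat)) {
    return(TRUE)
  }
  vals <- as.numeric(mat)
  vals <- vals[!is.na(vals)]
  # Any fractional value suggests normalized data rather than counts.
  any(vals != round(vals))
}

counts <- matrix(c(0, 3, 12, 7), nrow = 2)         # integer-valued counts
normalized <- matrix(c(0.5, 3.2, 12, 7), nrow = 2) # normalized values

needs_supp_download(NULL)
needs_supp_download(counts)
needs_supp_download(normalized)
```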
Once the count matrix is available, ParseGEO loads it into
Seurat automatically. If multiple samples are available, users can choose to
merge the resulting SeuratObjects with merge.
# cellranger out
GSE200257.seu <- ParseGEO(
acce = "GSE200257", platform = NULL, supp.idx = 1,
down.supp = TRUE, supp.type = "10x",
out.folder = "/Volumes/soyabean/GEfetch2R/dwonload_geo"
)
# count matrix
GSE94820.seu <- ParseGEO(
acce = "GSE94820", platform = NULL,
supp.idx = 1, down.supp = TRUE,
supp.type = "count"
)
Show GSE200257.seu:
# for cellranger output
GSE200257.seu
Show GSE94820.seu:
# for count matrix
GSE94820.seu
The structure of downloaded count matrix for 10x:
tree /Volumes/soyabean/GEfetch2R/dwonload_geo
## /Volumes/soyabean/GEfetch2R/dwonload_geo
## ├── GSM6025652_1
## │ ├── barcodes.tsv.gz
## │ ├── features.tsv.gz
## │ └── matrix.mtx.gz
## ├── GSM6025653_2
## │ ├── barcodes.tsv.gz
## │ ├── features.tsv.gz
## │ └── matrix.mtx.gz
## ├── GSM6025654_3
## │ ├── barcodes.tsv.gz
## │ ├── features.tsv.gz
## │ └── matrix.mtx.gz
## └── GSM6025655_4
## ├── barcodes.tsv.gz
## ├── features.tsv.gz
## └── matrix.mtx.gz
##
## 5 directories, 12 files
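Before loading a folder like this, it can help to confirm that every sample directory holds the three expected files. The sketch below uses base R only and builds a throwaway copy of the layout under `tempdir()`; the folder name `download_geo_demo` and the variable names are assumptions for illustration, and the empty files stand in for real downloads.

```r
# Build a small demo layout that mimics the structure shown above.
root <- file.path(tempdir(), "download_geo_demo")
samples <- c("GSM6025652_1", "GSM6025653_2")
expected <- c("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")
for (s in samples) {
  dir.create(file.path(root, s), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(root, s, expected))
}

# For every sample directory, check that all three files are present.
sample.dirs <- list.dirs(root, recursive = FALSE)
complete <- vapply(
  sample.dirs,
  function(d) all(expected %in% list.files(d)),
  logical(1)
)
names(complete) <- basename(sample.dirs)
complete
```

To check a real download, point `root` at the folder passed as out.folder.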
For bulk RNA-seq, set data.type = "bulk" in ParseGEO;
it will then return the count matrix instead of a SeuratObject.
PanglaoDB
PanglaoDB is a database
of scRNA-seq datasets from mouse and human. To date, it
contains 5,586,348 cells from 1368 datasets
(1063 from Mus musculus and 305 from Homo sapiens). It has well-organized
metadata for every dataset, including tissue, protocol,
species, number of cells and cell type annotation (computationally
identified). Daniel Osorio has developed rPanglaoDB to access
PanglaoDB data; the GEfetch2R functions
described here are based on rPanglaoDB.
Since PanglaoDB is no
longer maintained, GEfetch2R has cached all metadata and
cell type composition and uses the cached data by default to
speed up queries. Users can access the cached data with
PanglaoDBMeta (all metadata) and
PanglaoDBComposition (all cell type composition).
Summary attributes
GEfetch2R provides StatDBAttribute to
summarize attributes of PanglaoDB:
# use cached metadata
StatDBAttribute(df = PanglaoDBMeta, filter = c("species", "protocol"), database = "PanglaoDB")
## $species
## Value Num Key
## 1 Mus musculus 1063 species
## 2 Homo sapiens 305 species
##
## $protocol
## Value Num Key
## 1 10x chromium 1046 protocol
## 2 drop-seq 204 protocol
## 3 microwell-seq 74 protocol
## 4 Smart-seq2 26 protocol
## 5 C1 Fluidigm 16 protocol
## 6 CEL-seq 1 protocol
## 7 inDrops 1 protocol
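The shape of this summary (one Value/Num/Key table per attribute) can be reproduced on a toy data frame with base R. This is only a sketch of the idea, not the StatDBAttribute implementation; `toy.meta` and `summarise_attr` are hypothetical names and the rows are made up for illustration.

```r
# Toy metadata (illustrative values, not taken from PanglaoDB).
toy.meta <- data.frame(
  species = c("Mus musculus", "Mus musculus", "Homo sapiens"),
  protocol = c("10x chromium", "drop-seq", "10x chromium"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count the values of one column, most frequent first.
summarise_attr <- function(df, key) {
  tab <- sort(table(df[[key]]), decreasing = TRUE)
  data.frame(
    Value = names(tab), Num = as.integer(tab), Key = key,
    stringsAsFactors = FALSE
  )
}

# One summary table per requested attribute, returned as a named list.
attr.summary <- lapply(
  c(species = "species", protocol = "protocol"),
  function(k) summarise_attr(toy.meta, k)
)
attr.summary
```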
Extract metadata
GEfetch2R provides ExtractPanglaoDBMeta to
select datasets of interest by species,
protocol, tissue and cell
number (the available values of these attributes can be
obtained with StatDBAttribute). Users can also choose
whether to add the cell type annotation to every dataset with
show.cell.type.
GEfetch2R uses cached metadata and cell type composition
by default; users can change this by setting
local.data = FALSE.
hsa.meta <- ExtractPanglaoDBMeta(
species = "Homo sapiens", protocol = c("Smart-seq2", "10x chromium"),
show.cell.type = TRUE, cell.num = c(1000, 2000)
)
Show the metadata:
head(hsa.meta)
## SRA SRS Tissue Protocol
## 1 SRA550660 SRS2089635 Peripheral blood mononuclear cells 10x chromium
## 2 SRA550660 SRS2089636 Peripheral blood mononuclear cells 10x chromium
## 3 SRA550660 SRS2089638 Peripheral blood mononuclear cells 10x chromium
## 4 SRA605365 SRS2492922 Nasal airway epithelium 10x chromium
## 5 SRA608611 SRS2517316 Lung progenitors 10x chromium
## 6 SRA608353 SRS2517519 Hepatocellular carcinoma 10x chromium
## Species Cells
## 1 Homo sapiens 1860
## 2 Homo sapiens 1580
## 3 Homo sapiens 1818
## 4 Homo sapiens 1932
## 5 Homo sapiens 1077
## 6 Homo sapiens 1230
## CellType
## 1 Unknown, NK cells
## 2 Unknown, T cells, Plasmacytoid dendritic cells
## 3 Unknown, Gamma delta T cells, Dendritic cells, Plasmacytoid dendritic cells
## 4 Luminal epithelial cells, Basal cells, Keratinocytes, Ependymal cells
## 5 Unknown, Hepatocytes, Basal cells
## 6 Unknown, Hepatocytes, Foveolar cells
## CellNum
## 1 1860
## 2 1580
## 3 1818
## 4 1932
## 5 1077
## 6 1230
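The selection above combines a protocol filter with a cell-number range. A minimal base R sketch of the same kind of filter on a toy table follows; the data are made up, and treating both ends of cell.num as inclusive is an assumption made for illustration, not a documented property of ExtractPanglaoDBMeta.

```r
# Toy metadata table (illustrative values only).
toy.meta <- data.frame(
  SRS = c("S1", "S2", "S3", "S4"),
  Species = "Homo sapiens",
  Protocol = c("10x chromium", "Smart-seq2", "drop-seq", "10x chromium"),
  Cells = c(850, 1500, 1200, 2600),
  stringsAsFactors = FALSE
)

protocol <- c("Smart-seq2", "10x chromium")
cell.num <- c(1000, 2000) # assumed here to be an inclusive range

# Keep rows that match a requested protocol and fall inside the range.
keep <- toy.meta$Protocol %in% protocol &
  toy.meta$Cells >= cell.num[1] &
  toy.meta$Cells <= cell.num[2]
selected <- toy.meta[keep, ]
selected
```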
Extract cell type composition
GEfetch2R provides
ExtractPanglaoDBComposition
to extract cell type annotation
and composition (cached data are used by default for speed; users can
change this by setting local.data = FALSE).
hsa.composition <- ExtractPanglaoDBComposition(
species = "Homo sapiens",
protocol = c("Smart-seq2", "10x chromium")
)
Show the extracted cell type annotation and composition:
head(hsa.composition)
## SRA SRS Tissue Protocol
## 1.1 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.2 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.3 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.4 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.5 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.6 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## Species Cluster Cells Cell Type
## 1.1 Homo sapiens 0 1572 Unknown
## 1.2 Homo sapiens 1 563 Unknown
## 1.3 Homo sapiens 2 280 Unknown
## 1.4 Homo sapiens 3 270 Unknown
## 1.5 Homo sapiens 4 220 Unknown
## 1.6 Homo sapiens 5 192 Unknown
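A per-cluster table like this one can be collapsed into per-sample cell type fractions with base R. The sketch below works on a toy data frame with made-up values; the column name `CellType` (without the space used in the real output) and the object names are assumptions for illustration.

```r
# Toy composition table: one row per cluster (illustrative values only).
toy.comp <- data.frame(
  SRS = c("S1", "S1", "S1", "S2", "S2"),
  Cluster = c(0, 1, 2, 0, 1),
  Cells = c(600, 300, 100, 150, 50),
  CellType = c("T cells", "Unknown", "T cells", "NK cells", "Unknown"),
  stringsAsFactors = FALSE
)

# Sum the cells of each cell type within each sample.
per.type <- aggregate(Cells ~ SRS + CellType, data = toy.comp, FUN = sum)

# Divide by the sample total to obtain fractions.
per.type$Fraction <- per.type$Cells / ave(per.type$Cells, per.type$SRS, FUN = sum)
per.type[order(per.type$SRS, -per.type$Fraction), ]
```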
Download matrix and load to Seurat
After the extracted metadata has been checked manually, GEfetch2R
provides ParsePanglaoDB to download the count
matrix and load it into Seurat.
When cell type annotation is available, users can filter out datasets that
lack the specified cell type with cell.type. Users can also
include/exclude cells expressing specified genes with
include.gene/exclude.gene.
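Gene-based cell filtering of this kind can be sketched on a toy count matrix with base R. The matrix values are made up, and the exact semantics are assumptions for illustration (a cell is kept if it expresses every include gene and none of the exclude genes); ParsePanglaoDB may define inclusion and exclusion differently.

```r
# Toy gene-by-cell count matrix (illustrative values only).
toy.counts <- matrix(
  c(5, 0, 2, 0,
    0, 0, 1, 3,
    1, 4, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD3E", "MS4A1", "ACTB"), paste0("cell", 1:4))
)

include.gene <- "CD3E"
exclude.gene <- "MS4A1"

# Assumed rule: all include genes detected, no exclude gene detected.
has.include <- colSums(toy.counts[include.gene, , drop = FALSE] > 0) == length(include.gene)
has.exclude <- colSums(toy.counts[exclude.gene, , drop = FALSE] > 0) > 0

filtered <- toy.counts[, has.include & !has.exclude, drop = FALSE]
colnames(filtered)
```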
Once the count matrix is available, ParsePanglaoDB loads it
into Seurat automatically. If multiple datasets are available, users
can choose to merge the resulting SeuratObjects with merge.
hsa.seu <- ParsePanglaoDB(hsa.meta[1:3, ], merge = TRUE)
Show the returned SeuratObject:
hsa.seu
## An object of class Seurat
## 25917 features across 4996 samples within 1 assay
## Active assay: RNA (25917 features, 0 variable features)
UCSC Cell Browser
The UCSC Cell Browser is a web-based tool that allows scientists to interactively visualize scRNA-seq datasets. It contains 1267 single-cell datasets from 25 different species. The datasets are organized hierarchically, which helps users quickly locate the ones they are interested in.
Show available datasets
GEfetch2R provides ShowCBDatasets to show
all available datasets. Because the number of datasets is large,
reading every dataset json file online is time-consuming, so
ShowCBDatasets supports lazy loading of locally stored
json files. To use it, provide json.folder to store the
json files and set lazy = TRUE: on the first run,
ShowCBDatasets downloads the current json files to
json.folder; on later runs with lazy = TRUE,
it loads the downloaded json files from json.folder.
ShowCBDatasets can also refresh the local json files with
update = TRUE.
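The lazy-load behaviour described above follows a common caching pattern, sketched here in base R. Everything in the sketch is hypothetical: `load_cached`, `slow_fetch`, the `cb_demo` folder and the RDS file are stand-ins for illustration, whereas ShowCBDatasets stores json files and its internals are not shown in this document.

```r
# Hypothetical cache helper: fetch only when the cache is missing
# or when a refresh is requested, otherwise read the local copy.
load_cached <- function(cache.file, fetch, update = FALSE) {
  if (update || !file.exists(cache.file)) {
    dir.create(dirname(cache.file), recursive = TRUE, showWarnings = FALSE)
    saveRDS(fetch(), cache.file)
  }
  readRDS(cache.file)
}

# Stand-in for a slow online query; counts how often it is called.
n.fetch <- 0
slow_fetch <- function() {
  n.fetch <<- n.fetch + 1
  data.frame(name = c("dataset-a", "dataset-b"), stringsAsFactors = FALSE)
}

cache.file <- file.path(tempdir(), "cb_demo", "datasets.rds")
unlink(cache.file) # start from an empty cache for the demo

first <- load_cached(cache.file, slow_fetch)                 # fetches and saves
second <- load_cached(cache.file, slow_fetch)                # reads the cache
third <- load_cached(cache.file, slow_fetch, update = TRUE)  # forces a refresh
n.fetch
```

The counter shows that only the first call and the forced update reach the slow source, which is the saving that lazy = TRUE with a populated json.folder is meant to provide.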
# first time run, the json files are stored under json.folder
# ucsc.cb.samples = ShowCBDatasets(lazy = TRUE, json.folder = "/Volumes/soyabean/GEfetch2R/cell_browser/json", update = TRUE)
# second time run, load the downloaded json files
ucsc.cb.samples <- ShowCBDatasets(lazy = TRUE, json.folder = "/Volumes/soyabean/GEfetch2R/cell_browser/json", update = FALSE)
# always read online
# ucsc.cb.samples = ShowCBDatasets(lazy = FALSE)
Show the metadata:
head(ucsc.cb.samples)
## name shortLabel
## 1 ad-aging-brain/ad-aging-brain Aging Brain and Alzheimer's Disease
## 2 ad-aging-brain/ad-atac/erosion Aging Brain and Alzheimer's Disease
## 3 ad-aging-brain/ad-atac/hq Aging Brain and Alzheimer's Disease
## 4 ad-aging-brain/ad-atac/integration Aging Brain and Alzheimer's Disease
## 5 ad-aging-brain/ad-atac/rna-hq Aging Brain and Alzheimer's Disease
## 6 ad-aging-brain/microglia-states Aging Brain and Alzheimer's Disease
## subLabel tags
## 1 AD and Aging Prefrontal Cortex Across 427 individuals 10x
## 2 snATAC-seq of Epigenomic Erosion 10x
## 3 snATAC-seq - AD and aging prefrontal cortex across 92 individuals 10x
## 4 Integrated snRNA-seq and snATAC-seq 10x
## 5 snRNA-seq - AD and aging prefrontal cortex across 92 individuals 10x
## 6 Microglia States In The Aged Brain 10x
## body_parts diseases
## 1 brain, cortex, prefrontal cortex Alzheimer’s disease
## 2 brain Alzheimer's disease|parent
## 3 brain Alzheimer's disease|parent
## 4 brain Alzheimer's disease|parent
## 5 brain Alzheimer's disease|parent
## 6 brain, cortex, prefrontal cortex Alzheimer’s disease
## organisms projects life_stages domains sources
## 1 Human (H. sapiens) ROSMAP atlas NA
## 2 Human (H. sapiens)|parent ROSMAP|parent NA
## 3 Human (H. sapiens)|parent ROSMAP|parent NA
## 4 Human (H. sapiens)|parent ROSMAP|parent NA
## 5 Human (H. sapiens)|parent ROSMAP|parent NA
## 6 Human (H. sapiens) ROSMAP atlas NA
## sampleCount assays matrix barcode
## 1 2327742 ad430_matrix.mtx.gz ad430_barcodes.tsv.gz
## 2 437098 matrix.mtx.gz barcodes.tsv.gz
## 3 171119 matrix.mtx.gz barcodes.tsv.gz
## 4 586083 matrix.mtx.gz barcodes.tsv.gz
## 5 414964 matrix.mtx.gz barcodes.tsv.gz
## 6 152459 matrix.mtx.gz barcodes.tsv.gz
## feature matrixType
## 1 ad430_features.tsv.gz 10x
## 2 features.tsv.gz 10x
## 3 features.tsv.gz 10x
## 4 features.tsv.gz 10x
## 5 features.tsv.gz 10x
## 6 features.tsv.gz 10x
## title
## 1 Single-cell atlas reveals correlates of high cognitive function, dementia, and resilience to Alzheimer's disease pathology
## 2 Epigenomic dissection of Alzheimer's disease pinpoints causal variants and reveals epigenome erosion
## 3 Epigenomic dissection of Alzheimer's disease pinpoints causal variants and reveals epigenome erosion
## 4 Epigenomic dissection of Alzheimer's disease pinpoints causal variants and reveals epigenome erosion
## 5 Epigenomic dissection of Alzheimer's disease pinpoints causal variants and reveals epigenome erosion
## 6 Human Microglial State Dynamics in Alzheimer's Disease Progression
## paper
## 1
## 2
## 3
## 4
## 5
## 6
## abstract
## 1 \nAlzheimer's disease (AD) is the most common cause of dementia worldwide, but\nthe molecular and cellular mechanisms underlying cognitive impairment remain\npoorly understood. To address this, we generated a single-cell transcriptomic\natlas of the aged human prefrontal cortex covering 2.3 million cells from\npost-mortem human brain samples of 427 individuals with varying degrees of AD\npathology and cognitive impairment. Our analyses identified AD\npathology-associated alterations shared between excitatory neuron subtypes,\nrevealed a coordinated increase of the cohesin complex and DNA damage response\nfactors in excitatory neurons and in oligodendrocytes, and uncovered genes and\npathways associated with high cognitive function, dementia, and resilience to\nAD pathology. Furthermore, we identified selectively vulnerable somatostatin\ninhibitory neuron subtypes depleted in AD, discovered two distinct groups of\ninhibitory neurons that were more abundant in individuals with preserved high\ncognitive function late in life, and uncovered a link between inhibitory\nneurons and resilience to AD pathology.\n
## 2 \nRecent work has identified dozens of non-coding loci for Alzheimer's disease\n(AD) risk, but their mechanisms and AD transcriptional regulatory circuitry are\npoorly understood. Here, we profile epigenomic and transcriptomic landscapes of\n850k nuclei from prefrontal cortexes of 92 individuals with and without AD to\nbuild a map of the brain regulome, including epigenomic profiles,\ntranscriptional regulators, co-accessibility modules, and peak-to-gene links in\na cell-type-specific manner. We develop methods for multimodal integration and\ndetecting regulatory modules using peak-to-gene linking. We show AD risk loci\nare enriched in microglial enhancers and for specific TFs including SPI1, ELF2,\nand RUNX1. We detect 9,628 cell-type-specific ATAC-QTL loci, which we integrate\nalongside peak-to-gene links to prioritize AD variant regulatory circuits. We\nreport differential accessibility of regulatory modules in late-AD in glia and\nin early-AD in neurons. Strikingly, late-stage AD brains show global epigenome\ndysregulation indicative of epigenome erosion and cell identity loss.\n
## 3 \nRecent work has identified dozens of non-coding loci for Alzheimer's disease\n(AD) risk, but their mechanisms and AD transcriptional regulatory circuitry are\npoorly understood. Here, we profile epigenomic and transcriptomic landscapes of\n850k nuclei from prefrontal cortexes of 92 individuals with and without AD to\nbuild a map of the brain regulome, including epigenomic profiles,\ntranscriptional regulators, co-accessibility modules, and peak-to-gene links in\na cell-type-specific manner. We develop methods for multimodal integration and\ndetecting regulatory modules using peak-to-gene linking. We show AD risk loci\nare enriched in microglial enhancers and for specific TFs including SPI1, ELF2,\nand RUNX1. We detect 9,628 cell-type-specific ATAC-QTL loci, which we integrate\nalongside peak-to-gene links to prioritize AD variant regulatory circuits. We\nreport differential accessibility of regulatory modules in late-AD in glia and\nin early-AD in neurons. Strikingly, late-stage AD brains show global epigenome\ndysregulation indicative of epigenome erosion and cell identity loss.\n
## 4 \nRecent work has identified dozens of non-coding loci for Alzheimer's disease\n(AD) risk, but their mechanisms and AD transcriptional regulatory circuitry are\npoorly understood. Here, we profile epigenomic and transcriptomic landscapes of\n850k nuclei from prefrontal cortexes of 92 individuals with and without AD to\nbuild a map of the brain regulome, including epigenomic profiles,\ntranscriptional regulators, co-accessibility modules, and peak-to-gene links in\na cell-type-specific manner. We develop methods for multimodal integration and\ndetecting regulatory modules using peak-to-gene linking. We show AD risk loci\nare enriched in microglial enhancers and for specific TFs including SPI1, ELF2,\nand RUNX1. We detect 9,628 cell-type-specific ATAC-QTL loci, which we integrate\nalongside peak-to-gene links to prioritize AD variant regulatory circuits. We\nreport differential accessibility of regulatory modules in late-AD in glia and\nin early-AD in neurons. Strikingly, late-stage AD brains show global epigenome\ndysregulation indicative of epigenome erosion and cell identity loss.\n
## 5 \nRecent work has identified dozens of non-coding loci for Alzheimer's disease\n(AD) risk, but their mechanisms and AD transcriptional regulatory circuitry are\npoorly understood. Here, we profile epigenomic and transcriptomic landscapes of\n850k nuclei from prefrontal cortexes of 92 individuals with and without AD to\nbuild a map of the brain regulome, including epigenomic profiles,\ntranscriptional regulators, co-accessibility modules, and peak-to-gene links in\na cell-type-specific manner. We develop methods for multimodal integration and\ndetecting regulatory modules using peak-to-gene linking. We show AD risk loci\nare enriched in microglial enhancers and for specific TFs including SPI1, ELF2,\nand RUNX1. We detect 9,628 cell-type-specific ATAC-QTL loci, which we integrate\nalongside peak-to-gene links to prioritize AD variant regulatory circuits. We\nreport differential accessibility of regulatory modules in late-AD in glia and\nin early-AD in neurons. Strikingly, late-stage AD brains show global epigenome\ndysregulation indicative of epigenome erosion and cell identity loss.\n
## 6 \nAltered microglial states affect neuroinflammation, neurodegeneration, and\ndisease, but remain poorly understood. Here, we report 194k single-nucleus\nmicroglial transcriptomes and epigenomes across 443 human subjects, and diverse\nAlzheimer's disease (AD) pathological phenotypes. We annotate 12 microglial\ntranscriptional states, including AD-dysregulated homeostatic, inflammatory,\nand lipid-processing states. We identify 1,542 AD-differentially-expressed\ngenes, including both microglia-state-specific and disease-stage-specific\nalterations. By integrating epigenomic, transcriptomic, and motif information,\nwe infer upstream regulators of microglial cell states, gene-regulatory\nnetworks, enhancer-gene links, and transcription factor-driven microglial state\ntransitions. We demonstrate that ectopic expression of our predicted\nhomeostatic-state activators induces homeostatic features in human iPSC-derived\nmicroglia-like cells, while inhibiting activators of inflammation can block\ninflammatory progression. Lastly, we pinpoint the expression of AD-risk genes\nin microglial states and differential expression of AD-risk genes and their\nregulators during AD progression. Overall, we provide insights underlying\nmicroglial states, including state-specific and AD-stage-specific microglial\nalterations at unprecedented resolution. \n
## unit coords
## 1 UMAP_coordinates.coords.tsv.gz
## 2 UMAP_coordinates.coords.tsv.gz
## 3 UMAP_coordinates.coords.tsv.gz
## 4 UMAP_coordinates.coords.tsv.gz
## 5 UMAP_coordinates.coords.tsv.gz
## 6 UMAP_coordinates.coords.tsv.gz
## methods
## 1 \n<p>\nWe selected 427 individuals from the Religious Orders Study and Rush Memory and\nAging Project (ROSMAP), both ongoing longitudinal clinical-pathologic cohort\nstudies of aging and dementia, in which all participants are brain donors. The\nstudies include clinical data collected annually, detailed post- mortem\npathological evaluations, and extensive genetic, epigenomic, transcriptomic,\nproteomic, and metabolomic bulk-tissue profiling (DA et al., 2018) .\nIndividuals were balanced between sexes (male:female ratio 212:215). Informed\nconsent was obtained from each subject, and the Religious Orders Study and Rush\nMemory and Aging Project were each approved by an Institutional Review Board\n(IRB) of Rush University Medical Center. Participants also signed an Anatomic\nGift Act, and a repository consent to allow their data to be repurposed.\n\n<p>\nFor droplet-based snRNA-seq, libraries were prepared using the Chromium Single\nCell 3′ Reagent Kits v3 according to the manufacturer's protocol (10x\nGenomics). The generated snRNA-seq libraries were sequenced using NextSeq\n500/550 High Output v2 kits (150 cycles) or NovaSeq 6000 S2 Reagent Kits. Gene\ncounts were obtained by aligning reads to the GRCh38 genome using Cell Ranger\nsoftware (v.3.0.2) (10x Genomics) (Zheng et al., 2017). To account for\nunspliced nuclear transcripts, reads mapping to pre-mRNA were counted. After\nquantification of pre-mRNA using the Cell Ranger count pipeline, the Cell\nRanger aggr pipeline was used to aggregate all libraries (without equalizing\nthe read depth between groups) to generate a gene-count matrix. The Cell Ranger\n3.0 default parameters were used to call cell barcodes. We used SCANPY (Wolf,\nAngerer and Theis, 2018) to process and cluster the expression profiles and\ninfer cell identities of major cell classes. 
To remove doublets and\npoor-quality cells, cells were excluded from subsequent analysis if they were\nextreme outliers (observations outside the range [Q1 – k(Q3 – Q1),Q3 +\nk(Q3-Q1)], with k=3 and Q1 and Q3 as the lower and upper quartiles) in terms of\nnumber of genes, number of unique molecular identifiers (UMIs), and percentage\nof mitochondrial genes. In addition, we called doublets using Scrublet (Wolock,\nLopez and Klein, 2019) and flagged and removed cells that were labeled as\ndoublets.\n
## 2 \n<p>\nWe selected 427 individuals from ROSMAP (Mathys et al., co-submitted) and\nperformed snRNA-seq on the prefrontal cortex region of these samples. For 92 of\nthe samples, we carried out snATAC profiling as well. The 92 samples are with a\nvariety of AD-related pathological and cognitive measurements, and are grouped\ninto 48 controls (22 female; 26 male), 29 early-stage AD (15 female; 14 male)\nand 15 late-stage AD (9 female; 6 male) samples. The ages between AD groups are\nmatched, without significant difference). \n\n<p>\nWe used "cellranger-atac mkfastq" (V1.1.0) to demultiplex BCL files into Fastq\nraw sequencing data, and then ran "cellranger-atac count" to map the reads onto\nhuman reference genome (GRCh38) and obtain the fragment file of each sample. We\nthen utilized ArchR (V1.0.1) to process the snATAC-seq data, using fragment\nfiles as input (Granja et al. 2021). We performed the first round of doublet\nremoval within each sample using the "filterDoublets" function.\n\n<p>\nFor the "high-quality cell" analysis, we selected the cells with TSS enrichment\n> 6 and number of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=4, resolution=0.2, varFeat=50000". Cell embedding visualization was\nperformed using UMAP. We estimated gene expression score using ArchR, and then\nquantified the cell type signature of each cluster based on PsychENCODE marker\ngenes (PsychENCODE Consortium et al. 2015). We calculated the mean of gene\nscores of all the cell marker genes in each cell type for each cluster, which\nwere then used to assign cell types. 
We performed a second round of doublet\nremoval at cluster level by discarding the cell clusters that show ambiguous\ncell type marker signal and meanwhile locate between two clusters that show\nclear cell type signature.\n\n<p>\nFor the "erosion" analysis, we selected the cells with TSS enrichment > 1 and\nnumber of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=3, resolution=0.2, varFeat=50000". Cell embedding visualization was\nperformed using UMAP. We performed a second round of doublet removal at cluster\nlevel using the same strategy as "TSS enrichment > 6" analysis described above. \n
## 3 \n<p>\nWe selected 427 individuals from ROSMAP (Mathys et al., co-submitted) and\nperformed snRNA-seq on the prefrontal cortex region of these samples. For 92 of\nthe samples, we carried out snATAC profiling as well. The 92 samples are with a\nvariety of AD-related pathological and cognitive measurements, and are grouped\ninto 48 controls (22 female; 26 male), 29 early-stage AD (15 female; 14 male)\nand 15 late-stage AD (9 female; 6 male) samples. The ages between AD groups are\nmatched, without significant difference). \n\n<p>\nWe used "cellranger-atac mkfastq" (V1.1.0) to demultiplex BCL files into Fastq\nraw sequencing data, and then ran "cellranger-atac count" to map the reads onto\nhuman reference genome (GRCh38) and obtain the fragment file of each sample. We\nthen utilized ArchR (V1.0.1) to process the snATAC-seq data, using fragment\nfiles as input (Granja et al. 2021). We performed the first round of doublet\nremoval within each sample using the "filterDoublets" function.\n\n<p>\nFor the "high-quality cell" analysis, we selected the cells with TSS enrichment\n> 6 and number of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=4, resolution=0.2, varFeat=50000". Cell embedding visualization was\nperformed using UMAP. We estimated gene expression score using ArchR, and then\nquantified the cell type signature of each cluster based on PsychENCODE marker\ngenes (PsychENCODE Consortium et al. 2015). We calculated the mean of gene\nscores of all the cell marker genes in each cell type for each cluster, which\nwere then used to assign cell types. 
We performed a second round of doublet\nremoval at cluster level by discarding the cell clusters that show ambiguous\ncell type marker signal and meanwhile locate between two clusters that show\nclear cell type signature.\n\n<p>\nFor the "erosion" analysis, we selected the cells with TSS enrichment > 1 and\nnumber of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=3, resolution=0.2, varFeat=50000". Cell embedding visualization was\nperformed using UMAP. We performed a second round of doublet removal at cluster\nlevel using the same strategy as "TSS enrichment > 6" analysis described above. \n
## 4 \n<p>\nWe selected 427 individuals from ROSMAP (Mathys et al., co-submitted) and\nperformed snRNA-seq on the prefrontal cortex region of these samples. For 92 of\nthe samples, we carried out snATAC profiling as well. The 92 samples are with a\nvariety of AD-related pathological and cognitive measurements, and are grouped\ninto 48 controls (22 female; 26 male), 29 early-stage AD (15 female; 14 male)\nand 15 late-stage AD (9 female; 6 male) samples. The ages between AD groups are\nmatched, without significant difference). \n\n<p>\nWe used "cellranger-atac mkfastq" (V1.1.0) to demultiplex BCL files into Fastq\nraw sequencing data, and then ran "cellranger-atac count" to map the reads onto\nhuman reference genome (GRCh38) and obtain the fragment file of each sample. We\nthen utilized ArchR (V1.0.1) to process the snATAC-seq data, using fragment\nfiles as input (Granja et al. 2021). We performed the first round of doublet\nremoval within each sample using the "filterDoublets" function.\n\n<p>\nFor the "high-quality cell" analysis, we selected the cells with TSS enrichment\n> 6 and number of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=4, resolution=0.2, varFeat=50000". Cell embedding visualization was\nperformed using UMAP. We estimated gene expression score using ArchR, and then\nquantified the cell type signature of each cluster based on PsychENCODE marker\ngenes (PsychENCODE Consortium et al. 2015). We calculated the mean of gene\nscores of all the cell marker genes in each cell type for each cluster, which\nwere then used to assign cell types. 
We performed a second round of doublet\nremoval at cluster level by discarding the cell clusters that show ambiguous\ncell type marker signal and meanwhile locate between two clusters that show\nclear cell type signature.\n\n<p>\nFor the "erosion" analysis, we selected the cells with TSS enrichment > 1 and\nnumber of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=3, resolution=0.2, varFeat=50000". Cell embedding visualization was\nperformed using UMAP. We performed a second round of doublet removal at cluster\nlevel using the same strategy as "TSS enrichment > 6" analysis described above. \n
## 5 \n<p>\nWe selected 427 individuals from ROSMAP (Mathys et al., co-submitted) and\nperformed snRNA-seq on the prefrontal cortex region of these samples. For 92 of\nthe samples, we carried out snATAC profiling as well. The 92 samples are with a\nvariety of AD-related pathological and cognitive measurements, and are grouped\ninto 48 controls (22 female; 26 male), 29 early-stage AD (15 female; 14 male)\nand 15 late-stage AD (9 female; 6 male) samples. The ages between AD groups are\nmatched, without significant difference). \n\n<p>\nWe used "cellranger-atac mkfastq" (V1.1.0) to demultiplex BCL files into Fastq\nraw sequencing data, and then ran "cellranger-atac count" to map the reads onto\nhuman reference genome (GRCh38) and obtain the fragment file of each sample. We\nthen utilized ArchR (V1.0.1) to process the snATAC-seq data, using fragment\nfiles as input (Granja et al. 2021). We performed the first round of doublet\nremoval within each sample using the "filterDoublets" function.\n\n<p>\nFor the "high-quality cell" analysis, we selected the cells with TSS enrichment\n> 6 and number of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=4, resolution=0.2, varFeat=50000". Cell embedding visualization was\nperformed using UMAP. We estimated gene expression score using ArchR, and then\nquantified the cell type signature of each cluster based on PsychENCODE marker\ngenes (PsychENCODE Consortium et al. 2015). We calculated the mean of gene\nscores of all the cell marker genes in each cell type for each cluster, which\nwere then used to assign cell types. 
We performed a second round of doublet\nremoval at cluster level by discarding the cell clusters that show ambiguous\ncell type marker signal and meanwhile locate between two clusters that show\nclear cell type signature.\n\n<p>\nFor the "erosion" analysis, we selected the cells with TSS enrichment > 1 and\nnumber of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=3, resolution=0.2, varFeat=50000". Cell embedding visualization was\nperformed using UMAP. We performed a second round of doublet removal at cluster\nlevel using the same strategy as "TSS enrichment > 6" analysis described above. \n
## 6 \n<p>\nNuclei isolation from frozen postmortem brain tissue: We isolated nuclei from\nfrozen postmortem brain tissue as previously described (Mathys et al., Nature\n2019) with some modifications. Briefly, we homogenized the brain tissue in 700\nμL Homogenization buffer and filtered the homogenate through a 40 μm cell\nstrainer (Corning, NY), added 450 μL Working solution and loaded it as a 25%\nOptiPrep solution on top of a 30%/40% OptiPrep density gradient (750 μL 30%\nOptiPrep solution, 300 μL 40% OptiPrep solution). We separated the nuclei by\ncentrifugation using a fixed rotor, tabletop centrifuge (5 minutes, 10000g,\n4˚C). We collected the nuclei pellet at the 30%/40% interphase, transferred it\non a new tube, washed it twice with 1mL ice-cold PBS containing 0.04% BSA\n(centrifuged 3 minutes, 300g, 4˚C) and finally resuspended it in 100 μL PBS\ncontaining 0.04% BSA. After counting, we diluted the nuclei to a concentration\nof 1000 nuclei per μL. We used the isolated nuclei for the droplet-based 10x\nscRNA-seq assay, targeting 5000 nuclei per brain region and individual, and\nprepared libraries using the Chromium Single-Cell 3′ Reagent Kits v3 (10x\nGenomics, Pleasanton, CA) according to the manufacturer's protocol. We\nsequenced pooled libraries using the NovaSeq 6000 S2 sequencing kits (100\ncycles, Illumina).\n\n<p>\nsnRNA-seq data processing: We mapped the raw reads to human reference genome\nversion GRCh38 and quantified unique molecular identifiers (UMIs) counts for\neach gene in each cell using CellRanger software v3.0.1 (10x Genomics). We\npre-processed this count matrix (gene-cell) using the Seurat R package\nv.4.0.3). We kept the cells with more than 500 UMIs and less than 5%\nmitochondrial genes, and genes with expression at least in 50 cells for further\nanalysis. We normalized the counts by the total UMI counts for each cell,\nmultiplied by 10,000, and then log-transformed. 
We used the top 2,000 highly\nvariable genes for principal component analysis (PCA) and the top 30 principal\ncomponents (PCs) as inputs to perform UMAP. We used Harmony for batch\ncorrection. We used the resolution as 0.5 to identify clusters. We used\nDoubletFinder to estimate the potential doublets formed by two or more cells\nbased on the by default parameters. The cells with high doublet scores (0.2 as\ncutoff) were removed for further analysis.After generating clusters, one\ncluster showing high expression of markers of two or more cell types was also\ntreated as doublets and removed for further analysis.\n\n<p>\nIn Silico Sorting to enrich immune cells and cell type annotation in snRNA-seq\ndata: For the full datasets with all cell types (2.8 million cells), we first\nannotated the cell type for each cluster based on three widely-used canonical\nmarkers of major cell types in the brain (including excitatory and inhibitory\nneurons, astrocytes, oligodendrocytes, OPCs, microglia and vascular cells) and\na list of markers for immune cells. We also tested the enrichment of a large\nset of markers in highly expressed genes for each cluster to confirm the\nannotation based on several marker genes. We next calculated the cell type\nscores (i.e. astrocyte, oligodendrocyte, microglia, etc) for each cell, which\nwere represented by the average expression of a group of markers for each cell\ntype. The cells were then selected as microglia/immune cells for further\nintegrative analysis if and only if (1) the clusters that the cells belong to\nwere annotated as microglia/immune cells; and (2) the cells had the highest\nscore for microglia/immune cells, and 3) the score for microglia/immune cells\nwas 2-fold higher than the second highest score. For the selected\nmicroglia/immune cells, we followed the same pipeline to perform dimensional\nreduction and clustering with the same parameters as full datasets. 
We used the\nWilcoxon rank-sum test in Seurat with customized parameters (min.pct = 0.25,\nlogfc.threshold = 0.25) to identify highly expressed genes for each cluster\ncompared to all cells from other clusters.\n
## geo
## 1
## 2
## 3
## 4
## 5
## 6
The number of datasets and all available species:
# the number of datasets
nrow(ucsc.cb.samples)
## [1] 1318
# available species
unique(unlist(sapply(unique(gsub(pattern = "\\|parent", replacement = "", x = ucsc.cb.samples$organisms)), function(x) {
unlist(strsplit(x = x, split = ", "))
})))
## [1] "Human (H. sapiens)" "Mouse (M. musculus)"
## [3] "Rhesus macaque (M. mulatta)" "Chimp (P. troglodytes)"
## [5] "Brine Shrimp (A. franciscana)" "Canis lupus familiaris"
## [7] "Dog (C. familiaris)" "Human (H. Sapiens)"
## [9] "C. intestinalis" "C. robusta"
## [11] "Zebrafish (D. rerio)" "Fruit fly (D. melanogaster)"
## [13] "Horse (E. caballus)" "Hydra vulgaris"
## [15] "Capitella teleta" "Spongilla lacustris"
## [17] "H. symbiolongicarpus" "X. tropicalis"
## [19] "Marmoset (C. jacchus)" "P. leidyi"
## [21] "Bonobo (P. paniscus)" "Rat (R. norvegicus)"
## [23] "S. mansoni" "Starlet Sea Anemone (N. vectensis)"
## [25] "Nematostella vectensis" "Sea urchin (S. purpuratus)"
## [27] "Mouse lemur (T. microcebus)" "Human-Mouse Xenograft"
## [29] "Xenopus laevis"
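The one-liner above does two things: it strips the `|parent` suffix and splits multi-species entries on `", "`. A minimal sketch of the same steps on a toy vector (the values below are illustrative stand-ins, not real rows of `ucsc.cb.samples`):

```r
# Toy stand-in for ucsc.cb.samples$organisms (hypothetical values)
organisms <- c(
  "Human (H. sapiens)|parent",
  "Human (H. sapiens), Mouse (M. musculus)",
  "Mouse (M. musculus)|parent",
  "Zebrafish (D. rerio)"
)

# Step 1: drop the "|parent" suffix
cleaned <- gsub(pattern = "\\|parent", replacement = "", x = organisms)

# Step 2: split entries that list several species, flatten, and deduplicate
species <- unique(unlist(strsplit(x = cleaned, split = ", ")))
species
```

Deduplication here is by exact string, which is why the real output lists both "Human (H. sapiens)" and "Human (H. Sapiens)" as separate entries.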
Summary attributes
GEfetch2R
provides StatDBAttribute
to
summarize attributes of UCSC Cell
Browser datasets:
StatDBAttribute(
df = ucsc.cb.samples, filter = c("organism", "organ"),
database = "UCSC", combine = TRUE
)
## # A tibble: 301 × 3
## # Groups: organisms [29]
## organisms body_parts Num
## <chr> <chr> <int>
## 1 human (h. sapiens) brain 171
## 2 human (h. sapiens) eye 111
## 3 human (h. sapiens) retina 109
## 4 mouse (m. musculus) brain 96
## 5 human (h. sapiens) lung 63
## 6 human (h. sapiens) muscle 46
## 7 human (h. sapiens) skeletal muscle 40
## 8 human (h. sapiens) cortex 38
## 9 human (h. sapiens) blood 34
## 10 human (h. sapiens) heart 32
## # ℹ 291 more rows
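The following is a rough sketch of how a combined organism and organ summary like the one above could be computed with base R. It is not the implementation of `StatDBAttribute`. It assumes that comma-separated values are split, the `|parent` suffix is dropped, and values are lower-cased (as the output above suggests). The data frame `toy` and the helper `split_values` are hypothetical.

```r
# Toy metadata with the same column names as the output above (hypothetical rows)
toy <- data.frame(
  organisms = c("Human (H. sapiens)|parent", "Human (H. sapiens)", "Mouse (M. musculus)|parent"),
  body_parts = c("brain, cortex|parent", "brain", "brain"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip "|parent", split on commas, trim and lower-case
split_values <- function(x) {
  x <- gsub("\\|parent", "", x)
  tolower(trimws(strsplit(x, ",")[[1]]))
}

# Expand every dataset into all of its organism x organ pairs
pairs <- do.call(rbind, lapply(seq_len(nrow(toy)), function(i) {
  expand.grid(
    organisms = split_values(toy$organisms[i]),
    body_parts = split_values(toy$body_parts[i]),
    stringsAsFactors = FALSE
  )
}))

# Count datasets per pair and sort by decreasing count
counts <- aggregate(Num ~ organisms + body_parts, data = cbind(pairs, Num = 1L), FUN = sum)
counts <- counts[order(-counts$Num), ]
counts
```

Because a dataset tagged with several organs contributes to each of them, the `Num` column can sum to more than the number of datasets.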
Extract metadata
GEfetch2R
provides ExtractCBDatasets
to
filter metadata by collection,
sub-collection, organ, disease
status, organism, project and
cell number (the available values of these attributes,
except cell number, can be obtained with StatDBAttribute
). All attributes except cell number support fuzzy matching
via fuzzy.match
, which is useful when selecting
datasets.
hbb.sample.df <- ExtractCBDatasets(
all.samples.df = ucsc.cb.samples, organ = c("skeletal muscle"),
organism = "Human (H. sapiens)", cell.num = c(1000, 2000)
)
Show the metadata:
head(hbb.sample.df)
## name shortLabel
## 1 skeletal-muscle/embryonic/embryonic-wk7-8-myogenic Skeletal Muscle
## 2 skeletal-muscle/fetal/fetal-wk12-14-hindlimb Skeletal Muscle
## 3 skeletal-muscle/in-vitro/hx-protocol-wk4 Skeletal Muscle
## 4 skeletal-muscle/in-vitro/hx-protocol-wk6-myogenic Skeletal Muscle
## 5 skeletal-muscle/in-vitro/hx-protocol-wk8-myogenic Skeletal Muscle
## 6 skeletal-muscle/juvenile/juvenile-hindlimb Skeletal Muscle
## subLabel tags body_parts
## 1 Embryonic Week 7-8 Myogenic Subset muscle, skeletal muscle|parent
## 2 Fetal Week 12-14 Hindlimb Muscle muscle, skeletal muscle|parent
## 3 HX Protocol Week 4 Culture muscle, skeletal muscle|parent
## 4 HX Protocol Week 6 Myogenic Subset muscle, skeletal muscle|parent
## 5 HX Protocol Week 8 Myogenic Subset muscle, skeletal muscle|parent
## 6 Juvenile Hindlimb Muscle muscle, skeletal muscle|parent
## diseases organisms projects life_stages domains sources
## 1 Healthy|parent Human (H. sapiens)|parent NA
## 2 Healthy|parent Human (H. sapiens)|parent NA
## 3 Healthy|parent Human (H. sapiens)|parent NA
## 4 Healthy|parent Human (H. sapiens)|parent NA
## 5 Healthy|parent Human (H. sapiens)|parent NA
## 6 Healthy|parent Human (H. sapiens)|parent NA
## sampleCount assays matrix barcode feature matrixType
## 1 1448 exprMatrix.tsv.gz matrix
## 2 1545 exprMatrix.tsv.gz matrix
## 3 1562 exprMatrix.tsv.gz matrix
## 4 1598 exprMatrix.tsv.gz matrix
## 5 1350 exprMatrix.tsv.gz matrix
## 6 1982 exprMatrix.tsv.gz matrix
## title paper
## 1 Embryonic Week 7-8 Myogenic Subset
## 2 Fetal Week 12-14 Hindlimb Muscle
## 3 HX Protocol Week 4 Culture
## 4 HX Protocol Week 6 Myogenic Subset
## 5 HX Protocol Week 8 Myogenic Subset
## 6 Juvenile Hindlimb Muscle
## abstract
## 1 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from hindlimbs\nof 7-8 week human embryos (Carnegie Stage: CS19-21). Cells are subjected to\nre-clustering to reveal potential heterogeneity within the myogenic subset.\nThree independent biological samples are included.\n
## 2 \nSingle cell transcriptomes of the total skeletal muscle tissues of\nhindlimbs from human fetuses of 12-14 weeks of development. Hematopoietic and\nendothelial cells are pre-excluded by FACS. Two independent biological samples\nare included.\n
## 3 \nSingle cell transcriptomes of cell cultures differentiated from human\npluripotent stem cells (hPSCs) following the HX protocol (Xi et al., Cell Rep,\n2017) for 4 weeks. Cells are either non-enriched or enriched using an\nendogenous PAX7-driven GFP reporter.\n
## 4 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from human\npluripotent stem cell (hPSCs) differentiated for 6 weeks using the HX protocol\n(Xi et al., Cell Rep, 2017). Cells are subjected to re-clustering to reveal\npotential heterogeneity within the myogenic subset.\n
## 5 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from human\npluripotent stem cell (hPSCs) differentiated for 8 weeks using the HX protocol\n(Xi et al., Cell Rep, 2017). Cells are subjected to re-clustering to reveal\npotential heterogeneity within the myogenic subset.\n
## 6 \nSingle cell transcriptomes of the gastrocnemius or quadriceps muscles from\nhuman juveniles. Hematopoietic and endothelial cells are pre-excluded by FACS.\nTwo independent biological samples at 7 and 11 years old are included.\n
## unit coords
## 1 Seurat_tsne.coords.tsv.gz
## 2 Seurat_tsne.coords.tsv.gz
## 3 Seurat_tsne.coords.tsv.gz
## 4 Seurat_tsne.coords.tsv.gz
## 5 Seurat_tsne.coords.tsv.gz
## 6 Seurat_tsne.coords.tsv.gz
## methods
## 1 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 2 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 3 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 4 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 5 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 6 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## geo
## 1
## 2
## 3
## 4
## 5
## 6
Extract cell type composition
GEfetch2R provides ExtractCBComposition to extract cell type annotation and composition.
hbb.sample.ct <- ExtractCBComposition(
  json.folder = "/Volumes/soyabean/GEfetch2R/cell_browser/json",
  sample.df = hbb.sample.df
)
Show the extracted cell type annotation and composition:
head(hbb.sample.ct)
## shortLabel subLabel CellType Num tags
## 1 Skeletal Muscle Embryonic Week 7-8 Myogenic Subset MP 785
## 2 Skeletal Muscle Embryonic Week 7-8 Myogenic Subset MB 303
## 3 Skeletal Muscle Embryonic Week 7-8 Myogenic Subset SkM.Mesen 264
## 4 Skeletal Muscle Embryonic Week 7-8 Myogenic Subset MC 96
## 5 Skeletal Muscle Fetal Week 12-14 Hindlimb Muscle MSC 822
## 6 Skeletal Muscle Fetal Week 12-14 Hindlimb Muscle SkM 554
## body_parts diseases organisms
## 1 muscle, skeletal muscle|parent Healthy|parent Human (H. sapiens)|parent
## 2 muscle, skeletal muscle|parent Healthy|parent Human (H. sapiens)|parent
## 3 muscle, skeletal muscle|parent Healthy|parent Human (H. sapiens)|parent
## 4 muscle, skeletal muscle|parent Healthy|parent Human (H. sapiens)|parent
## 5 muscle, skeletal muscle|parent Healthy|parent Human (H. sapiens)|parent
## 6 muscle, skeletal muscle|parent Healthy|parent Human (H. sapiens)|parent
## projects life_stages domains sources sampleCount assays
## 1 NA 1448
## 2 NA 1448
## 3 NA 1448
## 4 NA 1448
## 5 NA 1545
## 6 NA 1545
## title paper
## 1 Embryonic Week 7-8 Myogenic Subset
## 2 Embryonic Week 7-8 Myogenic Subset
## 3 Embryonic Week 7-8 Myogenic Subset
## 4 Embryonic Week 7-8 Myogenic Subset
## 5 Fetal Week 12-14 Hindlimb Muscle
## 6 Fetal Week 12-14 Hindlimb Muscle
## abstract
## 1 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from hindlimbs\nof 7-8 week human embryos (Carnegie Stage: CS19-21). Cells are subjected to\nre-clustering to reveal potential heterogeneity within the myogenic subset.\nThree independent biological samples are included.\n
## 2 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from hindlimbs\nof 7-8 week human embryos (Carnegie Stage: CS19-21). Cells are subjected to\nre-clustering to reveal potential heterogeneity within the myogenic subset.\nThree independent biological samples are included.\n
## 3 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from hindlimbs\nof 7-8 week human embryos (Carnegie Stage: CS19-21). Cells are subjected to\nre-clustering to reveal potential heterogeneity within the myogenic subset.\nThree independent biological samples are included.\n
## 4 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from hindlimbs\nof 7-8 week human embryos (Carnegie Stage: CS19-21). Cells are subjected to\nre-clustering to reveal potential heterogeneity within the myogenic subset.\nThree independent biological samples are included.\n
## 5 \nSingle cell transcriptomes of the total skeletal muscle tissues of\nhindlimbs from human fetuses of 12-14 weeks of development. Hematopoietic and\nendothelial cells are pre-excluded by FACS. Two independent biological samples\nare included.\n
## 6 \nSingle cell transcriptomes of the total skeletal muscle tissues of\nhindlimbs from human fetuses of 12-14 weeks of development. Hematopoietic and\nendothelial cells are pre-excluded by FACS. Two independent biological samples\nare included.\n
## methods
## 1 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 2 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 3 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 4 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 5 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 6 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## geo
## 1
## 2
## 3
## 4
## 5
## 6
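As a quick follow-up, the composition table can be turned into per-dataset fractions. The sketch below is only an illustration: it rebuilds the six rows printed above as a small data frame instead of calling ExtractCBComposition, and it assumes that sampleCount is the total number of cells in each dataset. The object comp and the column frac are hypothetical names introduced here.

```r
# Minimal sketch in base R; 'comp' mimics the first six rows of hbb.sample.ct
# shown above (only the columns needed here are reproduced).
comp <- data.frame(
  subLabel = rep(
    c("Embryonic Week 7-8 Myogenic Subset", "Fetal Week 12-14 Hindlimb Muscle"),
    times = c(4, 2)
  ),
  CellType = c("MP", "MB", "SkM.Mesen", "MC", "MSC", "SkM"),
  Num = c(785, 303, 264, 96, 822, 554),
  sampleCount = c(1448, 1448, 1448, 1448, 1545, 1545),
  stringsAsFactors = FALSE
)

# Assumption: sampleCount is the total cell number of the dataset, so
# Num / sampleCount is the fraction of cells assigned to each cell type.
comp$frac <- comp$Num / comp$sampleCount

# Sum of the listed fractions per dataset; a value below 1 means that some
# cell types of that dataset are not among the rows reproduced here.
frac.sum <- tapply(comp$frac, comp$subLabel, sum)

print(comp[, c("subLabel", "CellType", "Num", "frac")])
print(frac.sum)
```

For the embryonic dataset, the four listed cell types add up to all 1448 cells. The two fetal rows cover 1376 of 1545 cells, presumably because head only shows the first six rows of the table.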
Load the online datasets to Seurat
After manually checking the extracted metadata, users can load the online count
matrix into Seurat with ParseCBDatasets. All the attributes available in
ExtractCBDatasets can also be used here. Please note that ParseCBDatasets reads
the count matrix directly from the online source instead of downloading it to
local disk. If multiple datasets are available, users can choose to merge the
resulting SeuratObjects with merge.
ParseCBDatasets supports extracting a subset of the data by metadata and gene:
# parse the whole datasets
hbb.sample.seu <- ParseCBDatasets(sample.df = hbb.sample.df)
# subset metadata and gene
hbb.sample.seu <- ParseCBDatasets(
  sample.df = hbb.sample.df,
  obs.value.filter = "Cell.Type == 'MP' & Phase == 'G2M'",
  include.genes = c(
    "PAX7", "MYF5", "C1QTNF3", "MYOD1", "MYOG", "RASSF4", "MYH3", "MYL4",
    "TNNT3", "PDGFRA", "OGN", "COL3A1"
  )
)
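To make the two subsetting arguments more concrete, the sketch below mimics them on toy data in base R. It is not the GEfetch2R implementation: the metadata table toy.meta, the count matrix toy.counts, and the use of eval(parse()) to evaluate the filter string are assumptions made for illustration only. It merely shows which cells and genes a filter expression like the obs.value.filter value and a gene vector like include.genes would select.

```r
# Toy cell metadata; column names follow the filter expression used above.
toy.meta <- data.frame(
  Cell.Type = c("MP", "MP", "MB", "MC", "MP"),
  Phase = c("G2M", "G1", "G2M", "S", "G2M"),
  row.names = paste0("cell", 1:5),
  stringsAsFactors = FALSE
)

# Toy count matrix (genes x cells) filled with arbitrary values.
toy.genes <- c("PAX7", "MYF5", "MYOD1", "ACTB")
toy.counts <- matrix(
  seq_len(length(toy.genes) * nrow(toy.meta)),
  nrow = length(toy.genes),
  dimnames = list(toy.genes, rownames(toy.meta))
)

# Evaluate the filter string against the metadata columns
# (an assumption about the semantics, not the package's own code).
filter.str <- "Cell.Type == 'MP' & Phase == 'G2M'"
keep.cells <- eval(parse(text = filter.str), envir = toy.meta)

# Keep only the requested genes that are present in the matrix.
include.genes <- c("PAX7", "MYF5", "MYOD1", "MYOG")
keep.genes <- intersect(include.genes, rownames(toy.counts))

toy.subset <- toy.counts[keep.genes, keep.cells, drop = FALSE]
print(dim(toy.subset))
print(colnames(toy.subset))
```

In this toy example two of the five cells pass the filter, and MYOG is dropped because it is absent from the toy matrix.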
Show the returned SeuratObject:
hbb.sample.seu
## An object of class Seurat
## 14 features across 5684 samples within 1 assay
## Active assay: RNA (14 features, 0 variable features)