
Introduction

GEfetch2R provides functions for users to download count matrices and annotations (e.g. cell type annotation and composition) from GEO and some single-cell databases (e.g. PanglaoDB and UCSC Cell Browser). GEfetch2R also supports loading the downloaded data to Seurat.

The public resources currently supported and the values returned:

Resources | URL | Download Type | Returned values
GEO | https://www.ncbi.nlm.nih.gov/geo/ | count matrix | SeuratObject or count matrix for bulk RNA-seq
PanglaoDB | https://panglaodb.se/index.html | count matrix and annotation | SeuratObject
UCSC Cell Browser | https://cells.ucsc.edu/ | count matrix and annotation | SeuratObject

GEO

GEO is an international public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data submitted by the research community. It provides a convenient way for users to explore and select scRNA-seq datasets of interest.

## Warning: replacing previous import 'LoomExperiment::import' by
## 'reticulate::import' when loading 'GEfetch2R'

Extract metadata

GEfetch2R provides ExtractGEOMeta to extract sample metadata, including sample title, source name/tissue, description, cell type, treatment, paper title, paper abstract, organism, protocol, data processing methods, etc.

# library
library(GEfetch2R)

# extract metadata of specified platform
GSE200257.meta <- ExtractGEOMeta(acce = "GSE200257", platform = "GPL24676")
# set VROOM_CONNECTION_SIZE to avoid error: Error: The size of the connection buffer (786432) was not large enough
Sys.setenv("VROOM_CONNECTION_SIZE" = 131072 * 60)
# extract metadata of all platforms
GSE94820.meta <- ExtractGEOMeta(acce = "GSE94820", platform = NULL)

Show the metadata:

head(GSE94820.meta)

Download matrix and load to Seurat

After manually checking the extracted metadata, users can download the count matrix and load it to Seurat with ParseGEO.

ParseGEO can obtain the count matrix in two ways: by downloading it from supplementary files or by extracting it from the ExpressionSet. Users can choose the source with down.supp or let ParseGEO detect it automatically: ParseGEO first extracts the count matrix from the ExpressionSet and, if that matrix is NULL or contains non-integer values, downloads the supplementary files instead. Supplementary files come in two main types: a single count matrix file containing all cells, and CellRanger-style outputs (barcode, matrix, feature/gene). Users must specify the type with supp.type.
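
The automatic detection rule described above can be sketched in base R. This is only an illustration of the stated rule (fall back to supplementary files when the extracted matrix is NULL or holds non-integer values); the helper name needs_supp_files is hypothetical and is not part of GEfetch2R, whose internal checks may differ.

```r
# Hypothetical helper (not a GEfetch2R function): decide whether the
# supplementary files are needed, following the rule described in the text.
needs_supp_files <- function(count.mat) {
  # Nothing could be extracted from the ExpressionSet
  if (is.null(count.mat)) {
    return(TRUE)
  }
  # Raw counts are whole numbers; a fractional value suggests normalized data
  vals <- as.numeric(as.matrix(count.mat))
  vals <- vals[!is.na(vals)]
  any(vals != round(vals))
}

# Toy matrices (made-up values, for illustration only)
int.mat <- matrix(c(0, 3, 12, 7), nrow = 2)
norm.mat <- matrix(c(0.5, 3.2, 12, 7), nrow = 2)

needs_supp_files(NULL)     # TRUE: no matrix available
needs_supp_files(int.mat)  # FALSE: integer counts can be used directly
needs_supp_files(norm.mat) # TRUE: non-integer values found
```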

With the count matrix, ParseGEO will load the matrix to Seurat automatically. If multiple samples are available, users can choose to merge the SeuratObjects with merge.

# CellRanger-style output
GSE200257.seu <- ParseGEO(
  acce = "GSE200257", platform = NULL, supp.idx = 1,
  down.supp = TRUE, supp.type = "10x",
  out.folder = "/Volumes/soyabean/GEfetch2R/dwonload_geo"
)
# count matrix
GSE94820.seu <- ParseGEO(
  acce = "GSE94820", platform = NULL,
  supp.idx = 1, down.supp = TRUE,
  supp.type = "count"
)

Show the GSE200257.seu:

# for cellranger output
GSE200257.seu

Show the GSE94820.seu:

# for count matrix
GSE94820.seu

The structure of the downloaded count matrices for 10x:

tree /Volumes/soyabean/GEfetch2R/dwonload_geo
## /Volumes/soyabean/GEfetch2R/dwonload_geo
## ├── GSM6025652_1
## │   ├── barcodes.tsv.gz
## │   ├── features.tsv.gz
## │   └── matrix.mtx.gz
## ├── GSM6025653_2
## │   ├── barcodes.tsv.gz
## │   ├── features.tsv.gz
## │   └── matrix.mtx.gz
## ├── GSM6025654_3
## │   ├── barcodes.tsv.gz
## │   ├── features.tsv.gz
## │   └── matrix.mtx.gz
## └── GSM6025655_4
##     ├── barcodes.tsv.gz
##     ├── features.tsv.gz
##     └── matrix.mtx.gz
## 
## 5 directories, 12 files

For bulk RNA-seq, set data.type = "bulk" in ParseGEO; it will then return the count matrix.


PanglaoDB

PanglaoDB is a database of scRNA-seq datasets from mouse and human. It currently contains 5,586,348 cells from 1368 datasets (1063 from Mus musculus and 305 from Homo sapiens) and provides well-organized metadata for every dataset, including tissue, protocol, species, number of cells and cell type annotation (computationally identified). Daniel Osorio developed rPanglaoDB to access PanglaoDB data, and the PanglaoDB functions in GEfetch2R are based on rPanglaoDB.

Since PanglaoDB is no longer maintained, GEfetch2R has cached all metadata and cell type composition and uses the cached data by default to speed up access. Users can access the cached data with PanglaoDBMeta (all metadata) and PanglaoDBComposition (all cell type composition).

Summary attributes

GEfetch2R provides StatDBAttribute to summarize attributes of PanglaoDB:

# use cached metadata
StatDBAttribute(df = PanglaoDBMeta, filter = c("species", "protocol"), database = "PanglaoDB")
## $species
##          Value  Num     Key
## 1 Mus musculus 1063 species
## 2 Homo sapiens  305 species
## 
## $protocol
##           Value  Num      Key
## 1  10x chromium 1046 protocol
## 2      drop-seq  204 protocol
## 3 microwell-seq   74 protocol
## 4    Smart-seq2   26 protocol
## 5   C1 Fluidigm   16 protocol
## 6       CEL-seq    1 protocol
## 7       inDrops    1 protocol
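
As a rough illustration of what such a summary involves, the per-attribute counts can be reproduced in base R with table(). The function stat_attribute below is a hypothetical re-implementation written for this example, and the toy metadata rows are made up; the real StatDBAttribute may sort or post-process values differently.

```r
# Hypothetical sketch (not the GEfetch2R implementation): count how often each
# value of the requested attributes occurs and return one data frame per attribute.
stat_attribute <- function(df, filter) {
  res <- lapply(filter, function(key) {
    counts <- sort(table(df[[key]]), decreasing = TRUE)
    data.frame(
      Value = names(counts),
      Num = as.integer(counts),
      Key = key,
      stringsAsFactors = FALSE
    )
  })
  names(res) <- filter
  res
}

# Toy metadata (made-up rows, not PanglaoDB content)
toy.meta <- data.frame(
  species = c("Mus musculus", "Mus musculus", "Homo sapiens"),
  protocol = c("10x chromium", "drop-seq", "10x chromium"),
  stringsAsFactors = FALSE
)

toy.stat <- stat_attribute(toy.meta, filter = c("species", "protocol"))
toy.stat$species
```

The result has the same Value/Num/Key layout as the output shown above, so it can be used to look up valid values before filtering.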

Extract metadata

GEfetch2R provides ExtractPanglaoDBMeta to select datasets of interest by species, protocol, tissue and cell number (the available values of these attributes can be obtained with StatDBAttribute). Users can also choose whether to add cell type annotation to every dataset with show.cell.type.

GEfetch2R uses cached metadata and cell type composition by default; users can change this by setting local.data = FALSE.

hsa.meta <- ExtractPanglaoDBMeta(
  species = "Homo sapiens", protocol = c("Smart-seq2", "10x chromium"),
  show.cell.type = TRUE, cell.num = c(1000, 2000)
)

Show the metadata:

head(hsa.meta)
##         SRA        SRS                             Tissue     Protocol
## 1 SRA550660 SRS2089635 Peripheral blood mononuclear cells 10x chromium
## 2 SRA550660 SRS2089636 Peripheral blood mononuclear cells 10x chromium
## 3 SRA550660 SRS2089638 Peripheral blood mononuclear cells 10x chromium
## 4 SRA605365 SRS2492922            Nasal airway epithelium 10x chromium
## 5 SRA608611 SRS2517316                   Lung progenitors 10x chromium
## 6 SRA608353 SRS2517519           Hepatocellular carcinoma 10x chromium
##        Species Cells
## 1 Homo sapiens  1860
## 2 Homo sapiens  1580
## 3 Homo sapiens  1818
## 4 Homo sapiens  1932
## 5 Homo sapiens  1077
## 6 Homo sapiens  1230
##                                                                      CellType
## 1                                                           Unknown, NK cells
## 2                              Unknown, T cells, Plasmacytoid dendritic cells
## 3 Unknown, Gamma delta T cells, Dendritic cells, Plasmacytoid dendritic cells
## 4       Luminal epithelial cells, Basal cells, Keratinocytes, Ependymal cells
## 5                                           Unknown, Hepatocytes, Basal cells
## 6                                        Unknown, Hepatocytes, Foveolar cells
##   CellNum
## 1    1860
## 2    1580
## 3    1818
## 4    1932
## 5    1077
## 6    1230
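
The kind of selection performed here can be approximated on a plain data frame. The sketch below rebuilds a small table from the six rows printed above and applies a species, protocol and cell number filter. The helper filter_meta is hypothetical, and treating both cell.num bounds as inclusive is an assumption, not documented behaviour of ExtractPanglaoDBMeta.

```r
# Rows copied from the head(hsa.meta) output (subset of columns)
toy.meta <- data.frame(
  SRS = c(
    "SRS2089635", "SRS2089636", "SRS2089638",
    "SRS2492922", "SRS2517316", "SRS2517519"
  ),
  Protocol = "10x chromium",
  Species = "Homo sapiens",
  Cells = c(1860, 1580, 1818, 1932, 1077, 1230),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a GEfetch2R function).
# Assumption: cell.num is c(min, max) and both bounds are inclusive.
filter_meta <- function(meta, species, protocol, cell.num) {
  keep <- meta$Species %in% species &
    meta$Protocol %in% protocol &
    meta$Cells >= cell.num[1] &
    meta$Cells <= cell.num[2]
  meta[keep, , drop = FALSE]
}

sub.meta <- filter_meta(
  toy.meta,
  species = "Homo sapiens",
  protocol = c("Smart-seq2", "10x chromium"),
  cell.num = c(1500, 2000)
)
sub.meta$SRS
# "SRS2089635" "SRS2089636" "SRS2089638" "SRS2492922"
```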

Extract cell type composition

GEfetch2R provides ExtractPanglaoDBComposition to extract cell type annotation and composition (cached data is used by default to speed up access; users can change this by setting local.data = FALSE).

hsa.composition <- ExtractPanglaoDBComposition(
  species = "Homo sapiens",
  protocol = c("Smart-seq2", "10x chromium")
)

Show the extracted cell type annotation and composition:

head(hsa.composition)
##           SRA        SRS                        Tissue     Protocol
## 1.1 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.2 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.3 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.4 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.5 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
## 1.6 SRA553822 SRS2119548 Cultured embryonic stem cells 10x chromium
##          Species Cluster Cells Cell Type
## 1.1 Homo sapiens       0  1572   Unknown
## 1.2 Homo sapiens       1   563   Unknown
## 1.3 Homo sapiens       2   280   Unknown
## 1.4 Homo sapiens       3   270   Unknown
## 1.5 Homo sapiens       4   220   Unknown
## 1.6 Homo sapiens       5   192   Unknown
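
Because the composition table has one row per cluster, per-sample proportions are easy to derive from it. The sketch below rebuilds the six rows printed above and computes each cluster's share of cells. It covers only those six rows, so the fractions are relative to them rather than to the full sample, and the Fraction column is an addition made for this example.

```r
# Rows copied from the head(hsa.composition) output (subset of columns)
toy.comp <- data.frame(
  SRS = "SRS2119548",
  Cluster = 0:5,
  Cells = c(1572, 563, 280, 270, 220, 192),
  `Cell Type` = "Unknown",
  check.names = FALSE, # keep the space in the "Cell Type" column name
  stringsAsFactors = FALSE
)

# Share of each cluster within its sample (grouped by SRS)
toy.comp$Fraction <- toy.comp$Cells / ave(toy.comp$Cells, toy.comp$SRS, FUN = sum)

# Total number of cells per annotated cell type
type.total <- tapply(toy.comp$Cells, toy.comp[["Cell Type"]], sum)
type.total
```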

Download matrix and load to Seurat

After manually checking the extracted metadata, users can download the count matrix and load it to Seurat with ParsePanglaoDB. When cell type annotation is available, users can filter out datasets that lack a specified cell type with cell.type. Users can also include/exclude cells expressing specified genes with include.gene/exclude.gene.

With the count matrix, ParsePanglaoDB will load the matrix to Seurat automatically. If multiple datasets are available, users can choose to merge the SeuratObjects with merge.

hsa.seu <- ParsePanglaoDB(hsa.meta[1:3, ], merge = TRUE)

Show the returned SeuratObject:

hsa.seu
## An object of class Seurat 
## 25917 features across 4996 samples within 1 assay 
## Active assay: RNA (25917 features, 0 variable features)
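
The cell.type and include.gene/exclude.gene filters described above can be pictured with two small base R helpers. Both helpers are hypothetical and written only for this illustration. The gene names and counts in the toy matrix are made up, and the rule that a cell must express every include gene and none of the exclude genes is an assumption about the semantics, not a statement of how ParsePanglaoDB behaves.

```r
# Hypothetical helper: does a comma-separated CellType string contain
# any of the requested cell types (exact match after splitting)?
has_cell_type <- function(cell.type.str, cell.type) {
  vapply(
    strsplit(cell.type.str, ",\\s*"),
    function(x) any(x %in% cell.type),
    logical(1)
  )
}

# CellType strings copied from rows 1, 2 and 5 of the hsa.meta output
cell.types <- c(
  "Unknown, NK cells",
  "Unknown, T cells, Plasmacytoid dendritic cells",
  "Unknown, Hepatocytes, Basal cells"
)
has.hep <- has_cell_type(cell.types, "Hepatocytes")
has.hep # FALSE FALSE TRUE

# Hypothetical helper: keep cells (columns) by gene expression.
# Assumption: "expressing" means count > 0; all include genes are required
# and no exclude gene may be expressed.
filter_cells_by_gene <- function(mat, include.gene = NULL, exclude.gene = NULL) {
  keep <- rep(TRUE, ncol(mat))
  if (!is.null(include.gene)) {
    keep <- keep & colSums(mat[include.gene, , drop = FALSE] > 0) == length(include.gene)
  }
  if (!is.null(exclude.gene)) {
    keep <- keep & colSums(mat[exclude.gene, , drop = FALSE] > 0) == 0
  }
  mat[, keep, drop = FALSE]
}

# Toy count matrix (made-up genes and counts)
toy.mat <- matrix(
  c(5, 0, 2,
    0, 0, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB"), c("cell1", "cell2", "cell3"))
)
kept <- filter_cells_by_gene(toy.mat, include.gene = "GeneA", exclude.gene = "GeneB")
colnames(kept) # "cell1"
```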

UCSC Cell Browser

The UCSC Cell Browser is a web-based tool that allows scientists to interactively visualize scRNA-seq datasets. It contains 1267 single cell datasets from 25 different species and is organized hierarchically, which helps users quickly locate the datasets they are interested in.

Show available datasets

GEfetch2R provides ShowCBDatasets to show all available datasets. Because the number of datasets is large, downloading the dataset json files online on every call is time-consuming, so ShowCBDatasets supports lazy loading of these files. Lazy loading requires users to provide json.folder for storing the json files and to set lazy = TRUE: on the first run, ShowCBDatasets downloads the current json files to json.folder; on later runs with lazy = TRUE, it loads the downloaded json files from json.folder. ShowCBDatasets can also refresh the local json files with update = TRUE.

# first time run, the json files are stored under json.folder
# ucsc.cb.samples = ShowCBDatasets(lazy = TRUE, json.folder = "/Volumes/soyabean/GEfetch2R/cell_browser/json", update = TRUE)

# second time run, load the downloaded json files
ucsc.cb.samples <- ShowCBDatasets(lazy = TRUE, json.folder = "/Volumes/soyabean/GEfetch2R/cell_browser/json", update = FALSE)

# always read online
# ucsc.cb.samples = ShowCBDatasets(lazy = FALSE)
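
The lazy-loading behaviour can be sketched as a small download-once cache. This is not the ShowCBDatasets implementation: load_json_lazy and fake_fetch are hypothetical names, the cache is a single .rds file rather than a set of json files, and fake_fetch stands in for the online download so the example runs offline.

```r
# Hypothetical sketch of a download-once cache (not GEfetch2R code).
# fetch.fun stands in for the time-consuming online download.
load_json_lazy <- function(json.folder, fetch.fun, lazy = TRUE, update = FALSE) {
  if (!lazy) {
    # Always read online
    return(fetch.fun())
  }
  cache.file <- file.path(json.folder, "datasets.rds")
  if (update || !file.exists(cache.file)) {
    # First run, or an explicit refresh: download and store locally
    dir.create(json.folder, recursive = TRUE, showWarnings = FALSE)
    saveRDS(fetch.fun(), cache.file)
  }
  # Later runs read the local copy
  readRDS(cache.file)
}

# Count how often the "download" actually happens
counter <- new.env()
counter$n <- 0
fake_fetch <- function() {
  counter$n <- counter$n + 1
  list(name = "toy-dataset")
}

tmp.folder <- file.path(tempdir(), "cb_json_demo")
unlink(tmp.folder, recursive = TRUE)

first <- load_json_lazy(tmp.folder, fake_fetch)                 # downloads and caches
second <- load_json_lazy(tmp.folder, fake_fetch)                # served from the cache
third <- load_json_lazy(tmp.folder, fake_fetch, update = TRUE)  # forces a refresh
counter$n # 2
```

Only the first call and the update = TRUE call reach the download function; the second call is answered from disk, which is the saving that lazy = TRUE is meant to provide.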

Show the metadata:

head(ucsc.cb.samples)
##                                 name                          shortLabel
## 1      ad-aging-brain/ad-aging-brain Aging Brain and Alzheimer's Disease
## 2     ad-aging-brain/ad-atac/erosion Aging Brain and Alzheimer's Disease
## 3          ad-aging-brain/ad-atac/hq Aging Brain and Alzheimer's Disease
## 4 ad-aging-brain/ad-atac/integration Aging Brain and Alzheimer's Disease
## 5      ad-aging-brain/ad-atac/rna-hq Aging Brain and Alzheimer's Disease
## 6    ad-aging-brain/microglia-states Aging Brain and Alzheimer's Disease
##                                                            subLabel tags
## 1             AD and Aging Prefrontal Cortex Across 427 individuals  10x
## 2                                  snATAC-seq of Epigenomic Erosion  10x
## 3 snATAC-seq - AD and aging prefrontal cortex across 92 individuals  10x
## 4                               Integrated snRNA-seq and snATAC-seq  10x
## 5  snRNA-seq - AD and aging prefrontal cortex across 92 individuals  10x
## 6                                Microglia States In The Aged Brain  10x
##                         body_parts                   diseases
## 1 brain, cortex, prefrontal cortex        Alzheimer’s disease
## 2                            brain Alzheimer's disease|parent
## 3                            brain Alzheimer's disease|parent
## 4                            brain Alzheimer's disease|parent
## 5                            brain Alzheimer's disease|parent
## 6 brain, cortex, prefrontal cortex        Alzheimer’s disease
##                   organisms      projects life_stages domains sources
## 1        Human (H. sapiens)        ROSMAP               atlas      NA
## 2 Human (H. sapiens)|parent ROSMAP|parent                          NA
## 3 Human (H. sapiens)|parent ROSMAP|parent                          NA
## 4 Human (H. sapiens)|parent ROSMAP|parent                          NA
## 5 Human (H. sapiens)|parent ROSMAP|parent                          NA
## 6        Human (H. sapiens)        ROSMAP               atlas      NA
##   sampleCount assays              matrix               barcode
## 1     2327742        ad430_matrix.mtx.gz ad430_barcodes.tsv.gz
## 2      437098              matrix.mtx.gz       barcodes.tsv.gz
## 3      171119              matrix.mtx.gz       barcodes.tsv.gz
## 4      586083              matrix.mtx.gz       barcodes.tsv.gz
## 5      414964              matrix.mtx.gz       barcodes.tsv.gz
## 6      152459              matrix.mtx.gz       barcodes.tsv.gz
##                 feature matrixType
## 1 ad430_features.tsv.gz        10x
## 2       features.tsv.gz        10x
## 3       features.tsv.gz        10x
## 4       features.tsv.gz        10x
## 5       features.tsv.gz        10x
## 6       features.tsv.gz        10x
##                                                                                                                        title
## 1 Single-cell atlas reveals correlates of high cognitive function, dementia, and resilience to Alzheimer's disease pathology
## 2                       Epigenomic dissection of Alzheimer's disease pinpoints causal variants and reveals epigenome erosion
## 3                       Epigenomic dissection of Alzheimer's disease pinpoints causal variants and reveals epigenome erosion
## 4                       Epigenomic dissection of Alzheimer's disease pinpoints causal variants and reveals epigenome erosion
## 5                       Epigenomic dissection of Alzheimer's disease pinpoints causal variants and reveals epigenome erosion
## 6                                                         Human Microglial State Dynamics in Alzheimer's Disease Progression
##   paper
## 1      
## 2      
## 3      
## 4      
## 5      
## 6      
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      abstract
## 1                                                                                                                                                                                                                                           \nAlzheimer's disease (AD) is the most common cause of dementia worldwide, but\nthe molecular and cellular mechanisms underlying cognitive impairment remain\npoorly understood. To address this, we generated a single-cell transcriptomic\natlas of the aged human prefrontal cortex covering 2.3 million cells from\npost-mortem human brain samples of 427 individuals with varying degrees of AD\npathology and cognitive impairment. Our analyses identified AD\npathology-associated alterations shared between excitatory neuron subtypes,\nrevealed a coordinated increase of the cohesin complex and DNA damage response\nfactors in excitatory neurons and in oligodendrocytes, and uncovered genes and\npathways associated with high cognitive function, dementia, and resilience to\nAD pathology. Furthermore, we identified selectively vulnerable somatostatin\ninhibitory neuron subtypes depleted in AD, discovered two distinct groups of\ninhibitory neurons that were more abundant in individuals with preserved high\ncognitive function late in life, and uncovered a link between inhibitory\nneurons and resilience to AD pathology.\n
## 2                                                                                                                                                                                                                                                                \nRecent work has identified dozens of non-coding loci for Alzheimer's disease\n(AD) risk, but their mechanisms and AD transcriptional regulatory circuitry are\npoorly understood. Here, we profile epigenomic and transcriptomic landscapes of\n850k nuclei from prefrontal cortexes of 92 individuals with and without AD to\nbuild a map of the brain regulome, including epigenomic profiles,\ntranscriptional regulators, co-accessibility modules, and peak-to-gene links in\na cell-type-specific manner. We develop methods for multimodal integration and\ndetecting regulatory modules using peak-to-gene linking. We show AD risk loci\nare enriched in microglial enhancers and for specific TFs including SPI1, ELF2,\nand RUNX1. We detect 9,628 cell-type-specific ATAC-QTL loci, which we integrate\nalongside peak-to-gene links to prioritize AD variant regulatory circuits. We\nreport differential accessibility of regulatory modules in late-AD in glia and\nin early-AD in neurons. Strikingly, late-stage AD brains show global epigenome\ndysregulation indicative of epigenome erosion and cell identity loss.\n
## 3                                                                                                                                                                                                                                                                \nRecent work has identified dozens of non-coding loci for Alzheimer's disease\n(AD) risk, but their mechanisms and AD transcriptional regulatory circuitry are\npoorly understood. Here, we profile epigenomic and transcriptomic landscapes of\n850k nuclei from prefrontal cortexes of 92 individuals with and without AD to\nbuild a map of the brain regulome, including epigenomic profiles,\ntranscriptional regulators, co-accessibility modules, and peak-to-gene links in\na cell-type-specific manner. We develop methods for multimodal integration and\ndetecting regulatory modules using peak-to-gene linking. We show AD risk loci\nare enriched in microglial enhancers and for specific TFs including SPI1, ELF2,\nand RUNX1. We detect 9,628 cell-type-specific ATAC-QTL loci, which we integrate\nalongside peak-to-gene links to prioritize AD variant regulatory circuits. We\nreport differential accessibility of regulatory modules in late-AD in glia and\nin early-AD in neurons. Strikingly, late-stage AD brains show global epigenome\ndysregulation indicative of epigenome erosion and cell identity loss.\n
## 4                                                                                                                                                                                                                                                                \nRecent work has identified dozens of non-coding loci for Alzheimer's disease\n(AD) risk, but their mechanisms and AD transcriptional regulatory circuitry are\npoorly understood. Here, we profile epigenomic and transcriptomic landscapes of\n850k nuclei from prefrontal cortexes of 92 individuals with and without AD to\nbuild a map of the brain regulome, including epigenomic profiles,\ntranscriptional regulators, co-accessibility modules, and peak-to-gene links in\na cell-type-specific manner. We develop methods for multimodal integration and\ndetecting regulatory modules using peak-to-gene linking. We show AD risk loci\nare enriched in microglial enhancers and for specific TFs including SPI1, ELF2,\nand RUNX1. We detect 9,628 cell-type-specific ATAC-QTL loci, which we integrate\nalongside peak-to-gene links to prioritize AD variant regulatory circuits. We\nreport differential accessibility of regulatory modules in late-AD in glia and\nin early-AD in neurons. Strikingly, late-stage AD brains show global epigenome\ndysregulation indicative of epigenome erosion and cell identity loss.\n
## 5                                                                                                                                                                                                                                                                \nRecent work has identified dozens of non-coding loci for Alzheimer's disease\n(AD) risk, but their mechanisms and AD transcriptional regulatory circuitry are\npoorly understood. Here, we profile epigenomic and transcriptomic landscapes of\n850k nuclei from prefrontal cortexes of 92 individuals with and without AD to\nbuild a map of the brain regulome, including epigenomic profiles,\ntranscriptional regulators, co-accessibility modules, and peak-to-gene links in\na cell-type-specific manner. We develop methods for multimodal integration and\ndetecting regulatory modules using peak-to-gene linking. We show AD risk loci\nare enriched in microglial enhancers and for specific TFs including SPI1, ELF2,\nand RUNX1. We detect 9,628 cell-type-specific ATAC-QTL loci, which we integrate\nalongside peak-to-gene links to prioritize AD variant regulatory circuits. We\nreport differential accessibility of regulatory modules in late-AD in glia and\nin early-AD in neurons. Strikingly, late-stage AD brains show global epigenome\ndysregulation indicative of epigenome erosion and cell identity loss.\n
## 6 \nAltered microglial states affect neuroinflammation, neurodegeneration, and\ndisease, but remain poorly understood. Here, we report 194k single-nucleus\nmicroglial transcriptomes and epigenomes across 443 human subjects, and diverse\nAlzheimer's disease (AD) pathological phenotypes. We annotate 12 microglial\ntranscriptional states, including AD-dysregulated homeostatic, inflammatory,\nand lipid-processing states. We identify 1,542 AD-differentially-expressed\ngenes, including both microglia-state-specific and disease-stage-specific\nalterations. By integrating epigenomic, transcriptomic, and motif information,\nwe infer upstream regulators of microglial cell states, gene-regulatory\nnetworks, enhancer-gene links, and transcription factor-driven microglial state\ntransitions. We demonstrate that ectopic expression of our predicted\nhomeostatic-state activators induces homeostatic features in human iPSC-derived\nmicroglia-like cells, while inhibiting activators of inflammation can block\ninflammatory progression. Lastly, we pinpoint the expression of AD-risk genes\nin microglial states and differential expression of AD-risk genes and their\nregulators during AD progression. Overall, we provide insights underlying\nmicroglial states, including state-specific and AD-stage-specific microglial\nalterations at unprecedented resolution. \n
##   unit                         coords
## 1      UMAP_coordinates.coords.tsv.gz
## 2      UMAP_coordinates.coords.tsv.gz
## 3      UMAP_coordinates.coords.tsv.gz
## 4      UMAP_coordinates.coords.tsv.gz
## 5      UMAP_coordinates.coords.tsv.gz
## 6      UMAP_coordinates.coords.tsv.gz
##   methods
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n<p>\nWe selected 427 individuals from the Religious Orders Study and Rush Memory and\nAging Project (ROSMAP), both ongoing longitudinal clinical-pathologic cohort\nstudies of aging and dementia, in which all participants are brain donors. 
The\nstudies include clinical data collected annually, detailed post- mortem\npathological evaluations, and extensive genetic, epigenomic, transcriptomic,\nproteomic, and metabolomic bulk-tissue profiling (DA et al., 2018) .\nIndividuals were balanced between sexes (male:female ratio 212:215). Informed\nconsent was obtained from each subject, and the Religious Orders Study and Rush\nMemory and Aging Project were each approved by an Institutional Review Board\n(IRB) of Rush University Medical Center. Participants also signed an Anatomic\nGift Act, and a repository consent to allow their data to be repurposed.\n\n<p>\nFor droplet-based snRNA-seq, libraries were prepared using the Chromium Single\nCell 3′ Reagent Kits v3 according to the manufacturer's protocol (10x\nGenomics). The generated snRNA-seq libraries were sequenced using NextSeq\n500/550 High Output v2 kits (150 cycles) or NovaSeq 6000 S2 Reagent Kits. Gene\ncounts were obtained by aligning reads to the GRCh38 genome using Cell Ranger\nsoftware (v.3.0.2) (10x Genomics) (Zheng et al., 2017). To account for\nunspliced nuclear transcripts, reads mapping to pre-mRNA were counted. After\nquantification of pre-mRNA using the Cell Ranger count pipeline, the Cell\nRanger aggr pipeline was used to aggregate all libraries (without equalizing\nthe read depth between groups) to generate a gene-count matrix. The Cell Ranger\n3.0 default parameters were used to call cell barcodes. We used SCANPY (Wolf,\nAngerer and Theis, 2018) to process and cluster the expression profiles and\ninfer cell identities of major cell classes. To remove doublets and\npoor-quality cells, cells were excluded from subsequent analysis if they were\nextreme outliers (observations outside the range [Q1 – k(Q3 – Q1),Q3 +\nk(Q3-Q1)], with k=3 and Q1 and Q3 as the lower and upper quartiles) in terms of\nnumber of genes, number of unique molecular identifiers (UMIs), and percentage\nof mitochondrial genes. 
In addition, we called doublets using Scrublet (Wolock,\nLopez and Klein, 2019) and flagged and removed cells that were labeled as\ndoublets.\n
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n<p>\nWe selected 427 individuals from ROSMAP (Mathys et al., co-submitted) and\nperformed snRNA-seq on the prefrontal cortex region of these samples. For 92 of\nthe samples, we carried out snATAC profiling as well. 
The 92 samples are with a\nvariety of AD-related pathological and cognitive measurements, and are grouped\ninto 48 controls (22 female; 26 male), 29 early-stage AD (15 female; 14 male)\nand 15 late-stage AD (9 female; 6 male) samples. The ages between AD groups are\nmatched, without significant difference).  \n\n<p>\nWe used "cellranger-atac mkfastq" (V1.1.0) to demultiplex BCL files into Fastq\nraw sequencing data, and then ran "cellranger-atac count" to map the reads onto\nhuman reference genome (GRCh38) and obtain the fragment file of each sample. We\nthen utilized ArchR (V1.0.1) to process the snATAC-seq data, using fragment\nfiles as input (Granja et al. 2021). We performed the first round of doublet\nremoval within each sample using the "filterDoublets" function.\n\n<p>\nFor the "high-quality cell" analysis, we selected the cells with TSS enrichment\n> 6 and number of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=4, resolution=0.2, varFeat=50000". Cell embedding visualization was\nperformed using UMAP. We estimated gene expression score using ArchR, and then\nquantified the cell type signature of each cluster based on PsychENCODE marker\ngenes (PsychENCODE Consortium et al. 2015). We calculated the mean of gene\nscores of all the cell marker genes in each cell type for each cluster, which\nwere then used to assign cell types. We performed a second round of doublet\nremoval at cluster level by discarding the cell clusters that show ambiguous\ncell type marker signal and meanwhile locate between two clusters that show\nclear cell type signature.\n\n<p>\nFor the "erosion" analysis, we selected the cells with TSS enrichment > 1 and\nnumber of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=3, resolution=0.2, varFeat=50000". 
Cell embedding visualization was\nperformed using UMAP. We performed a second round of doublet removal at cluster\nlevel using the same strategy as "TSS enrichment > 6" analysis described above. \n
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n<p>\nWe selected 427 individuals from ROSMAP (Mathys et al., co-submitted) and\nperformed snRNA-seq on the prefrontal cortex region of these samples. For 92 of\nthe samples, we carried out snATAC profiling as well. 
The 92 samples are with a\nvariety of AD-related pathological and cognitive measurements, and are grouped\ninto 48 controls (22 female; 26 male), 29 early-stage AD (15 female; 14 male)\nand 15 late-stage AD (9 female; 6 male) samples. The ages between AD groups are\nmatched, without significant difference).  \n\n<p>\nWe used "cellranger-atac mkfastq" (V1.1.0) to demultiplex BCL files into Fastq\nraw sequencing data, and then ran "cellranger-atac count" to map the reads onto\nhuman reference genome (GRCh38) and obtain the fragment file of each sample. We\nthen utilized ArchR (V1.0.1) to process the snATAC-seq data, using fragment\nfiles as input (Granja et al. 2021). We performed the first round of doublet\nremoval within each sample using the "filterDoublets" function.\n\n<p>\nFor the "high-quality cell" analysis, we selected the cells with TSS enrichment\n> 6 and number of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=4, resolution=0.2, varFeat=50000". Cell embedding visualization was\nperformed using UMAP. We estimated gene expression score using ArchR, and then\nquantified the cell type signature of each cluster based on PsychENCODE marker\ngenes (PsychENCODE Consortium et al. 2015). We calculated the mean of gene\nscores of all the cell marker genes in each cell type for each cluster, which\nwere then used to assign cell types. We performed a second round of doublet\nremoval at cluster level by discarding the cell clusters that show ambiguous\ncell type marker signal and meanwhile locate between two clusters that show\nclear cell type signature.\n\n<p>\nFor the "erosion" analysis, we selected the cells with TSS enrichment > 1 and\nnumber of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=3, resolution=0.2, varFeat=50000". 
Cell embedding visualization was\nperformed using UMAP. We performed a second round of doublet removal at cluster\nlevel using the same strategy as "TSS enrichment > 6" analysis described above. \n
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n<p>\nWe selected 427 individuals from ROSMAP (Mathys et al., co-submitted) and\nperformed snRNA-seq on the prefrontal cortex region of these samples. For 92 of\nthe samples, we carried out snATAC profiling as well. 
The 92 samples are with a\nvariety of AD-related pathological and cognitive measurements, and are grouped\ninto 48 controls (22 female; 26 male), 29 early-stage AD (15 female; 14 male)\nand 15 late-stage AD (9 female; 6 male) samples. The ages between AD groups are\nmatched, without significant difference).  \n\n<p>\nWe used "cellranger-atac mkfastq" (V1.1.0) to demultiplex BCL files into Fastq\nraw sequencing data, and then ran "cellranger-atac count" to map the reads onto\nhuman reference genome (GRCh38) and obtain the fragment file of each sample. We\nthen utilized ArchR (V1.0.1) to process the snATAC-seq data, using fragment\nfiles as input (Granja et al. 2021). We performed the first round of doublet\nremoval within each sample using the "filterDoublets" function.\n\n<p>\nFor the "high-quality cell" analysis, we selected the cells with TSS enrichment\n> 6 and number of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=4, resolution=0.2, varFeat=50000". Cell embedding visualization was\nperformed using UMAP. We estimated gene expression score using ArchR, and then\nquantified the cell type signature of each cluster based on PsychENCODE marker\ngenes (PsychENCODE Consortium et al. 2015). We calculated the mean of gene\nscores of all the cell marker genes in each cell type for each cluster, which\nwere then used to assign cell types. We performed a second round of doublet\nremoval at cluster level by discarding the cell clusters that show ambiguous\ncell type marker signal and meanwhile locate between two clusters that show\nclear cell type signature.\n\n<p>\nFor the "erosion" analysis, we selected the cells with TSS enrichment > 1 and\nnumber of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=3, resolution=0.2, varFeat=50000". 
Cell embedding visualization was\nperformed using UMAP. We performed a second round of doublet removal at cluster\nlevel using the same strategy as "TSS enrichment > 6" analysis described above. \n
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     \n<p>\nWe selected 427 individuals from ROSMAP (Mathys et al., co-submitted) and\nperformed snRNA-seq on the prefrontal cortex region of these samples. For 92 of\nthe samples, we carried out snATAC profiling as well. 
The 92 samples are with a\nvariety of AD-related pathological and cognitive measurements, and are grouped\ninto 48 controls (22 female; 26 male), 29 early-stage AD (15 female; 14 male)\nand 15 late-stage AD (9 female; 6 male) samples. The ages between AD groups are\nmatched, without significant difference).  \n\n<p>\nWe used "cellranger-atac mkfastq" (V1.1.0) to demultiplex BCL files into Fastq\nraw sequencing data, and then ran "cellranger-atac count" to map the reads onto\nhuman reference genome (GRCh38) and obtain the fragment file of each sample. We\nthen utilized ArchR (V1.0.1) to process the snATAC-seq data, using fragment\nfiles as input (Granja et al. 2021). We performed the first round of doublet\nremoval within each sample using the "filterDoublets" function.\n\n<p>\nFor the "high-quality cell" analysis, we selected the cells with TSS enrichment\n> 6 and number of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=4, resolution=0.2, varFeat=50000". Cell embedding visualization was\nperformed using UMAP. We estimated gene expression score using ArchR, and then\nquantified the cell type signature of each cluster based on PsychENCODE marker\ngenes (PsychENCODE Consortium et al. 2015). We calculated the mean of gene\nscores of all the cell marker genes in each cell type for each cluster, which\nwere then used to assign cell types. We performed a second round of doublet\nremoval at cluster level by discarding the cell clusters that show ambiguous\ncell type marker signal and meanwhile locate between two clusters that show\nclear cell type signature.\n\n<p>\nFor the "erosion" analysis, we selected the cells with TSS enrichment > 1 and\nnumber of fragments between 1000 to 100,000. We performed Iterative LSI\ndimension reduction and clustering using a 500 bp tile matrix, with parameters\n"iterations=3, resolution=0.2, varFeat=50000". 
Cell embedding visualization was\nperformed using UMAP. We performed a second round of doublet removal at cluster\nlevel using the same strategy as "TSS enrichment > 6" analysis described above. \n
## 6 \n<p>\nNuclei isolation from frozen postmortem brain tissue: We isolated nuclei from\nfrozen postmortem brain tissue as previously described (Mathys et al., Nature\n2019) with some modifications. Briefly, we homogenized the brain tissue in 700\nμL Homogenization buffer and filtered the homogenate through a 40 μm cell\nstrainer (Corning, NY), added 450 μL Working solution and loaded it as a 25%\nOptiPrep solution on top of a 30%/40% OptiPrep density gradient (750 μL 30%\nOptiPrep solution, 300 μL 40% OptiPrep solution). We separated the nuclei by\ncentrifugation using a fixed rotor, tabletop centrifuge (5 minutes, 10000g,\n4˚C). We collected the nuclei pellet at the 30%/40% interphase, transferred it\non a new tube, washed it twice with 1mL ice-cold PBS containing 0.04% BSA\n(centrifuged 3 minutes, 300g, 4˚C) and finally resuspended it in 100 μL PBS\ncontaining 0.04% BSA. After counting, we diluted the nuclei to a concentration\nof 1000 nuclei per μL. We used the isolated nuclei for the droplet-based 10x\nscRNA-seq assay, targeting 5000 nuclei per brain region and individual, and\nprepared libraries using the Chromium Single-Cell 3′ Reagent Kits v3 (10x\nGenomics, Pleasanton, CA) according to the manufacturer's protocol. We\nsequenced pooled libraries using the NovaSeq 6000 S2 sequencing kits (100\ncycles, Illumina).\n\n<p>\nsnRNA-seq data processing: We mapped the raw reads to human reference genome\nversion GRCh38 and quantified unique molecular identifiers (UMIs) counts for\neach gene in each cell using CellRanger software v3.0.1 (10x Genomics). We\npre-processed this count matrix (gene-cell) using the Seurat R package\nv.4.0.3). We kept the cells with more than 500 UMIs and less than 5%\nmitochondrial genes, and genes with expression at least in 50 cells for further\nanalysis. We normalized the counts by the total UMI counts for each cell,\nmultiplied by 10,000, and then log-transformed. 
We used the top 2,000 highly\nvariable genes for principal component analysis (PCA) and  the top 30 principal\ncomponents (PCs) as inputs to perform UMAP. We used Harmony for batch\ncorrection. We used the resolution as 0.5 to identify clusters. We used\nDoubletFinder to estimate the potential doublets formed by two or more cells\nbased on the by default parameters. The cells with high doublet scores (0.2 as\ncutoff) were removed for further analysis.After generating clusters, one\ncluster showing high expression of markers of two or more cell types was also\ntreated as doublets and removed for further analysis.\n\n<p>\nIn Silico Sorting to enrich immune cells and cell type annotation in snRNA-seq\ndata: For the full datasets with all cell types (2.8 million cells), we first\nannotated the cell type for each cluster based on three widely-used canonical\nmarkers of major cell types in the brain (including excitatory and inhibitory\nneurons, astrocytes, oligodendrocytes, OPCs, microglia and vascular cells) and\na list of markers for immune cells. We also tested the enrichment of a large\nset of markers in highly expressed genes for each cluster to confirm the\nannotation based on several marker genes. We next calculated the cell type\nscores (i.e. astrocyte, oligodendrocyte, microglia, etc) for each cell, which\nwere represented by the average expression of a group of markers for each cell\ntype. The cells were then selected as microglia/immune cells for further\nintegrative analysis if and only if (1) the clusters that the cells belong to\nwere annotated as microglia/immune cells; and (2) the cells had the highest\nscore for microglia/immune cells, and 3) the score for microglia/immune cells\nwas 2-fold higher than the second highest score. For the selected\nmicroglia/immune cells, we followed the same pipeline to perform dimensional\nreduction and clustering with the same parameters as full datasets. 
We used the\nWilcoxon rank-sum test in Seurat with customized parameters (min.pct = 0.25,\nlogfc.threshold = 0.25) to identify highly expressed genes for each cluster\ncompared to all cells from other clusters.\n
##   geo
## 1    
## 2    
## 3    
## 4    
## 5    
## 6

The number of datasets and all available species:

# the number of datasets
nrow(ucsc.cb.samples)
## [1] 1318
# available species
unique(unlist(sapply(unique(gsub(pattern = "\\|parent", replacement = "", x = ucsc.cb.samples$organisms)), function(x) {
  unlist(strsplit(x = x, split = ", "))
})))
##  [1] "Human (H. sapiens)"                 "Mouse (M. musculus)"               
##  [3] "Rhesus macaque (M. mulatta)"        "Chimp (P. troglodytes)"            
##  [5] "Brine Shrimp (A. franciscana)"      "Canis lupus familiaris"            
##  [7] "Dog (C. familiaris)"                "Human (H. Sapiens)"                
##  [9] "C. intestinalis"                    "C. robusta"                        
## [11] "Zebrafish (D. rerio)"               "Fruit fly (D. melanogaster)"       
## [13] "Horse (E. caballus)"                "Hydra vulgaris"                    
## [15] "Capitella teleta"                   "Spongilla lacustris"               
## [17] "H. symbiolongicarpus"               "X. tropicalis"                     
## [19] "Marmoset (C. jacchus)"              "P. leidyi"                         
## [21] "Bonobo (P. paniscus)"               "Rat (R. norvegicus)"               
## [23] "S. mansoni"                         "Starlet Sea Anemone (N. vectensis)"
## [25] "Nematostella vectensis"             "Sea urchin (S. purpuratus)"        
## [27] "Mouse lemur (T. microcebus)"        "Human-Mouse Xenograft"             
## [29] "Xenopus laevis"
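
The species extraction above can be traced step by step on a small vector. This is a minimal sketch: the `organisms` vector is made up for illustration and only imitates the format of `ucsc.cb.samples$organisms`. The final case-insensitive de-duplication is an optional extra step, not something GEfetch2R is known to do. It is motivated by entries such as "Human (H. sapiens)" and "Human (H. Sapiens)" both appearing in the output.

```r
# Hypothetical organism strings imitating the `organisms` column format
organisms <- c(
  "Human (H. sapiens)|parent",
  "Human (H. Sapiens), Mouse (M. musculus)",
  "Mouse (M. musculus)|parent"
)

# Step 1: drop the "|parent" suffix
cleaned <- gsub(pattern = "\\|parent", replacement = "", x = organisms)

# Step 2: split comma-separated entries and keep unique values
species <- unique(unlist(strsplit(x = cleaned, split = ", ")))

# Optional step (assumption): merge entries that differ only in letter case,
# keeping the first spelling encountered
species.ci <- species[!duplicated(tolower(species))]
species.ci
```

With the real data, the same optional step would collapse spelling variants that differ only in case. Synonyms such as "Dog (C. familiaris)" and "Canis lupus familiaris" would still need manual curation.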

Summary attributes

GEfetch2R provides StatDBAttribute to summarize the attributes of UCSC Cell Browser datasets:

StatDBAttribute(
  df = ucsc.cb.samples, filter = c("organism", "organ"),
  database = "UCSC", combine = TRUE
)
## # A tibble: 301 × 3
## # Groups:   organisms [29]
##    organisms           body_parts        Num
##    <chr>               <chr>           <int>
##  1 human (h. sapiens)  brain             171
##  2 human (h. sapiens)  eye               111
##  3 human (h. sapiens)  retina            109
##  4 mouse (m. musculus) brain              96
##  5 human (h. sapiens)  lung               63
##  6 human (h. sapiens)  muscle             46
##  7 human (h. sapiens)  skeletal muscle    40
##  8 human (h. sapiens)  cortex             38
##  9 human (h. sapiens)  blood              34
## 10 human (h. sapiens)  heart              32
## # ℹ 291 more rows
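
To make the summary table easier to read, the sketch below reproduces the same kind of count in base R on a toy data frame. It is not the StatDBAttribute implementation. It assumes, based only on the output above, that multi-valued fields are split on ", ", that the "|parent" suffix is dropped, and that values are lowercased before counting. The data frame `toy` and the helpers `split.clean` and `expand.row` are hypothetical.

```r
# Hypothetical metadata using the column names shown in the output above
toy <- data.frame(
  organisms = c("Human (H. sapiens)|parent", "Human (H. sapiens)", "Mouse (M. musculus)"),
  body_parts = c("brain, cortex|parent", "brain", "brain"),
  stringsAsFactors = FALSE
)

# Strip "|parent", split multi-valued fields and lowercase the values
split.clean <- function(x) {
  tolower(trimws(strsplit(gsub("\\|parent", "", x), ", ")[[1]]))
}

# Expand one dataset into one row per (organism, body part) pair
expand.row <- function(org, part) {
  expand.grid(
    organisms = split.clean(org), body_parts = split.clean(part),
    stringsAsFactors = FALSE
  )
}
pairs <- do.call(rbind, unname(Map(expand.row, toy$organisms, toy$body_parts)))

# Count datasets per combination and sort by decreasing count
counts <- aggregate(Num ~ organisms + body_parts,
  data = transform(pairs, Num = 1L), FUN = sum
)
counts <- counts[order(-counts$Num), ]
counts
```

Under these assumptions a dataset annotated with several body parts contributes to several rows, which would explain why "muscle" and "skeletal muscle" are counted separately in the table above.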

Extract metadata

GEfetch2R provides ExtractCBDatasets to filter the metadata by collection, sub-collection, organ, disease status, organism, project, and cell number. The available values of all these attributes except cell number can be obtained with StatDBAttribute. All attributes except cell number support fuzzy matching via fuzzy.match, which is useful when selecting datasets.

hbb.sample.df <- ExtractCBDatasets(
  all.samples.df = ucsc.cb.samples, organ = c("skeletal muscle"),
  organism = "Human (H. sapiens)", cell.num = c(1000, 2000)
)
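
The filtering logic of the call above can be illustrated on a toy data frame. This is a sketch under stated assumptions, not the ExtractCBDatasets implementation: fuzzy matching is approximated here as a case-insensitive substring search, and the cell number range is treated as inclusive at both ends. The data frame `toy` and the helper `fuzzy` are hypothetical.

```r
# Hypothetical metadata; column names follow the output shown in this section
toy <- data.frame(
  name = c("a", "b", "c", "d"),
  body_parts = c("muscle, skeletal muscle|parent", "brain", "skeletal muscle", "muscle"),
  organisms = c(
    "Human (H. sapiens)|parent", "Human (H. sapiens)",
    "Mouse (M. musculus)", "Human (H. sapiens)"
  ),
  sampleCount = c(1448, 1500, 1200, 5000),
  stringsAsFactors = FALSE
)

# Assumed meaning of a fuzzy match: case-insensitive substring search.
# fixed = TRUE keeps "(" and "." literal instead of treating them as regex syntax.
fuzzy <- function(x, value) grepl(tolower(value), tolower(x), fixed = TRUE)

# Combine the organ, organism and cell number conditions
keep <- fuzzy(toy$body_parts, "skeletal muscle") &
  fuzzy(toy$organisms, "Human (H. sapiens)") &
  toy$sampleCount >= 1000 & toy$sampleCount <= 2000

toy[keep, ]
```

Only dataset "a" passes all three conditions: "b" is not muscle, "c" is mouse, and "d" matches neither the organ nor the cell number range.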

Show the metadata:

head(hbb.sample.df)
##                                                 name      shortLabel
## 1 skeletal-muscle/embryonic/embryonic-wk7-8-myogenic Skeletal Muscle
## 2       skeletal-muscle/fetal/fetal-wk12-14-hindlimb Skeletal Muscle
## 3           skeletal-muscle/in-vitro/hx-protocol-wk4 Skeletal Muscle
## 4  skeletal-muscle/in-vitro/hx-protocol-wk6-myogenic Skeletal Muscle
## 5  skeletal-muscle/in-vitro/hx-protocol-wk8-myogenic Skeletal Muscle
## 6         skeletal-muscle/juvenile/juvenile-hindlimb Skeletal Muscle
##                             subLabel tags                     body_parts
## 1 Embryonic Week 7-8 Myogenic Subset      muscle, skeletal muscle|parent
## 2   Fetal Week 12-14 Hindlimb Muscle      muscle, skeletal muscle|parent
## 3         HX Protocol Week 4 Culture      muscle, skeletal muscle|parent
## 4 HX Protocol Week 6 Myogenic Subset      muscle, skeletal muscle|parent
## 5 HX Protocol Week 8 Myogenic Subset      muscle, skeletal muscle|parent
## 6           Juvenile Hindlimb Muscle      muscle, skeletal muscle|parent
##         diseases                 organisms projects life_stages domains sources
## 1 Healthy|parent Human (H. sapiens)|parent                                   NA
## 2 Healthy|parent Human (H. sapiens)|parent                                   NA
## 3 Healthy|parent Human (H. sapiens)|parent                                   NA
## 4 Healthy|parent Human (H. sapiens)|parent                                   NA
## 5 Healthy|parent Human (H. sapiens)|parent                                   NA
## 6 Healthy|parent Human (H. sapiens)|parent                                   NA
##   sampleCount assays            matrix barcode feature matrixType
## 1        1448        exprMatrix.tsv.gz                     matrix
## 2        1545        exprMatrix.tsv.gz                     matrix
## 3        1562        exprMatrix.tsv.gz                     matrix
## 4        1598        exprMatrix.tsv.gz                     matrix
## 5        1350        exprMatrix.tsv.gz                     matrix
## 6        1982        exprMatrix.tsv.gz                     matrix
##                                title paper
## 1 Embryonic Week 7-8 Myogenic Subset      
## 2   Fetal Week 12-14 Hindlimb Muscle      
## 3         HX Protocol Week 4 Culture      
## 4 HX Protocol Week 6 Myogenic Subset      
## 5 HX Protocol Week 8 Myogenic Subset      
## 6           Juvenile Hindlimb Muscle      
##                                                                                                                                                                                                                                                                                           abstract
## 1 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from hindlimbs\nof 7-8 week human embryos (Carnegie Stage: CS19-21). Cells are subjected to\nre-clustering to reveal potential heterogeneity within the myogenic subset.\nThree independent biological samples are included.\n
## 2                                             \nSingle cell transcriptomes of the total skeletal muscle tissues of\nhindlimbs from human fetuses of 12-14 weeks of development. Hematopoietic and\nendothelial cells are pre-excluded by FACS. Two independent biological samples\nare included.\n
## 3                           \nSingle cell transcriptomes of cell cultures differentiated from human\npluripotent stem cells (hPSCs) following the HX protocol (Xi et al., Cell Rep,\n2017) for 4 weeks. Cells are either non-enriched or enriched using an\nendogenous PAX7-driven GFP reporter.\n
## 4 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from human\npluripotent stem cell (hPSCs) differentiated for 6 weeks using the HX protocol\n(Xi et al., Cell Rep, 2017). Cells are subjected to re-clustering to reveal\npotential heterogeneity within the myogenic subset.\n
## 5 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from human\npluripotent stem cell (hPSCs) differentiated for 8 weeks using the HX protocol\n(Xi et al., Cell Rep, 2017). Cells are subjected to re-clustering to reveal\npotential heterogeneity within the myogenic subset.\n
## 6                                                           \nSingle cell transcriptomes of the gastrocnemius or quadriceps muscles from\nhuman juveniles. Hematopoietic and endothelial cells are pre-excluded by FACS.\nTwo independent biological samples at 7 and 11 years old are included.\n
##   unit                    coords
## 1      Seurat_tsne.coords.tsv.gz
## 2      Seurat_tsne.coords.tsv.gz
## 3      Seurat_tsne.coords.tsv.gz
## 4      Seurat_tsne.coords.tsv.gz
## 5      Seurat_tsne.coords.tsv.gz
## 6      Seurat_tsne.coords.tsv.gz
##   methods
## 1 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 2 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 3 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 4 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 5 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 6 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
##   geo
## 1    
## 2    
## 3    
## 4    
## 5    
## 6

Extract cell type composition

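GEfetch2R provides ExtractCBComposition to extract the cell type annotation and composition of the datasets listed in sample.df. In the output shown here, each row is one cell type of one dataset, and the Num column holds the number of cells of that type.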

hbb.sample.ct <- ExtractCBComposition(
  json.folder = "/Volumes/soyabean/GEfetch2R/cell_browser/json",
  sample.df = hbb.sample.df
)
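Since the composition table stores raw cell counts, it can help to convert them to within-dataset fractions before comparing datasets. The following is a minimal sketch under stated assumptions: it uses a toy data frame that copies the subLabel, CellType and Num values of four rows from the output in this vignette instead of calling ExtractCBComposition, and the helper AddCellTypeFraction is hypothetical and not part of GEfetch2R.

```r
# Toy data frame mirroring four rows of the composition output in this vignette
# (columns subLabel, CellType, Num); it is typed by hand, not downloaded.
toy.ct <- data.frame(
  subLabel = rep("Embryonic Week 7-8 Myogenic Subset", 4),
  CellType = c("MP", "MB", "SkM.Mesen", "MC"),
  Num = c(785, 303, 264, 96),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a GEfetch2R function): divide each cell count by
# the total cell count of its dataset, where a dataset is identified by subLabel.
AddCellTypeFraction <- function(ct.df) {
  dataset.total <- ave(ct.df$Num, ct.df$subLabel, FUN = sum)
  ct.df$Fraction <- ct.df$Num / dataset.total
  ct.df
}

toy.frac <- AddCellTypeFraction(toy.ct)
toy.frac[, c("CellType", "Num", "Fraction")]
```

Grouping by subLabel is an assumption that holds when each dataset has a unique subLabel. For a real table, fractions only sum to 1 per dataset if every cell type of that dataset is present in the rows passed in.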

Show the extracted cell type annotation and composition:

head(hbb.sample.ct)
##        shortLabel                           subLabel  CellType Num tags
## 1 Skeletal Muscle Embryonic Week 7-8 Myogenic Subset        MP 785     
## 2 Skeletal Muscle Embryonic Week 7-8 Myogenic Subset        MB 303     
## 3 Skeletal Muscle Embryonic Week 7-8 Myogenic Subset SkM.Mesen 264     
## 4 Skeletal Muscle Embryonic Week 7-8 Myogenic Subset        MC  96     
## 5 Skeletal Muscle   Fetal Week 12-14 Hindlimb Muscle       MSC 822     
## 6 Skeletal Muscle   Fetal Week 12-14 Hindlimb Muscle       SkM 554     
##                       body_parts       diseases                 organisms
## 1 muscle, skeletal muscle|parent Healthy|parent Human (H. sapiens)|parent
## 2 muscle, skeletal muscle|parent Healthy|parent Human (H. sapiens)|parent
## 3 muscle, skeletal muscle|parent Healthy|parent Human (H. sapiens)|parent
## 4 muscle, skeletal muscle|parent Healthy|parent Human (H. sapiens)|parent
## 5 muscle, skeletal muscle|parent Healthy|parent Human (H. sapiens)|parent
## 6 muscle, skeletal muscle|parent Healthy|parent Human (H. sapiens)|parent
##   projects life_stages domains sources sampleCount assays
## 1                                   NA        1448       
## 2                                   NA        1448       
## 3                                   NA        1448       
## 4                                   NA        1448       
## 5                                   NA        1545       
## 6                                   NA        1545       
##                                title paper
## 1 Embryonic Week 7-8 Myogenic Subset      
## 2 Embryonic Week 7-8 Myogenic Subset      
## 3 Embryonic Week 7-8 Myogenic Subset      
## 4 Embryonic Week 7-8 Myogenic Subset      
## 5   Fetal Week 12-14 Hindlimb Muscle      
## 6   Fetal Week 12-14 Hindlimb Muscle      
##                                                                                                                                                                                                                                                                                           abstract
## 1 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from hindlimbs\nof 7-8 week human embryos (Carnegie Stage: CS19-21). Cells are subjected to\nre-clustering to reveal potential heterogeneity within the myogenic subset.\nThree independent biological samples are included.\n
## 2 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from hindlimbs\nof 7-8 week human embryos (Carnegie Stage: CS19-21). Cells are subjected to\nre-clustering to reveal potential heterogeneity within the myogenic subset.\nThree independent biological samples are included.\n
## 3 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from hindlimbs\nof 7-8 week human embryos (Carnegie Stage: CS19-21). Cells are subjected to\nre-clustering to reveal potential heterogeneity within the myogenic subset.\nThree independent biological samples are included.\n
## 4 \nSingle cell transcriptomes of the skeletal muscle (SkM) lineage from hindlimbs\nof 7-8 week human embryos (Carnegie Stage: CS19-21). Cells are subjected to\nre-clustering to reveal potential heterogeneity within the myogenic subset.\nThree independent biological samples are included.\n
## 5                                             \nSingle cell transcriptomes of the total skeletal muscle tissues of\nhindlimbs from human fetuses of 12-14 weeks of development. Hematopoietic and\nendothelial cells are pre-excluded by FACS. Two independent biological samples\nare included.\n
## 6                                             \nSingle cell transcriptomes of the total skeletal muscle tissues of\nhindlimbs from human fetuses of 12-14 weeks of development. Hematopoietic and\nendothelial cells are pre-excluded by FACS. Two independent biological samples\nare included.\n
##   methods
## 1 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 2 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 3 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 4 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 5 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
## 6 <h3>Single cell preparation, cell capture and library construction</h3>\n<p>\nWhole hindlimbs of developmental week 5-9 human embryos and fetuses (feet\nexcluded for week 7.75-9), total hindlimb skeletal muscles of week 12-18 human\nfetuses, and gastrocnemius or quadriceps muscles from juvenile and adult human\nsubjects were digested into single cells. Dissociated cells were either sorted\n(fetal week 12 and above) to exclude the hematopoietic and endothelial lineages\nor directly used for downstream processing. For directed myogenic\ndifferentiation, human pluripotent stem cells (hPSCs) were differentiated using\nthree published protocols (HX protocol: Xi et al., Cell Rep, 2017; JC protocol:\nChal et al., Nat Biotechnol, 2015; MS protocol: Shelton et al., Stem Cell Rep,\n2014). Cell cultures were dissociated at indicated time points. Cells were\neither not enriched or sorted based on an endogenous PAX7-driven GFP reporter\nor cell surface combination of ERBB3+/NGFR+/HNK1- (Hicks, et al., Nat Cell\nBiol, 2018).\n</p>\n\n<p>\nPrepared single cell solutions were subjected to single cell capture and\ndroplet formation following instructions in the online Drop-seq protocol v.3.1\n(<a href="http://mccarrolllab.org/download/905/"\ntarget="_blank">http://mccarrolllab.org/download/905/</a>) and those published\nin the original Drop-seq paper (Macosko et al., Cell, 2015).\n</p>\n\n<h3>Raw read processing and computational analysis of single cell transcriptomes</h3>\n<p>\nThe raw sequencing reads were processed using the Drop-seq_tools-1.13 pipeline\nfrom the McCaroll lab\n(<a href="https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/releases/tag/v1.13</a>),\nfollowing the general guidelines from the Drop-seq Alignment Cookbook v1.2\n(<a 
href="https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf"\ntarget="_blank">https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf</a>)\n(Macosko et al., Cell, 2015). Indexed reads were aligned to the human reference\ngenome hg19 and digital gene expression matrices (DGEs) were generated by\ncounting the number of unique transcripts for each gene associated with each\ncell barcodes.\n</p>\n\n<p>\nDownstream computational analysis of scRNA-seq data was mainly performed using\nthe R package Seurat v2.3.3 (<a href="https://github.com/satijalab/seurat/releases/tag/v2.3.3"\ntarget="_blank">https://github.com/satijalab/seurat/releases/tag/v2.3.3</a>)\n(Butler et al., Nat Biotechnol, 2018) by largely following the standard guidelines from the Satija\nlab (<a href="https://satijalab.org/seurat/" target="_blank">https://satijalab.org/seurat/</a>).\nSeurat objects were generated with DGEs\nconstructed as described above. Violin plots of number of expressed genes and\nunique transcripts (nGene and nUMI, respectively) of each cell were generated\nand outliers with too high or too low nGene/nUMI were removed to exclude\npotential cell doublets/aggregates or low quality cells/cell debris,\nrespectively, based on each object/sample. Raw count data were normalized and\nscaled with regression on effects from cell cycle (Tirosh et al., Science,\n2016) and dissociation-associated stress response (van den Brink et al., Nat\nMethods, 2017). Highly variable genes from each dataset were obtained to\ncalculate the top variable PCs, which were further used in tSNE dimensional\nreduction and cell clustering. The skeletal muscle cell cluster from individual\ndataset was further isolated in silico and subjected to re-clustering to reveal\npotential myogenic subpopulations. 
Skeletal muscle stem and progenitor cells\nfrom different stage in vivo human samples were computationally purified and\nassembled to construct a trajectory of human myogenesis using diffusion map in\nSeurat. Myogenic progenitors derived from hPSCs in vitro were aligned to the\ntrajectory to determine their developmental identities.\n</p>\n
##   geo
## 1    
## 2    
## 3    
## 4    
## 5    
## 6

Load the online datasets to Seurat

After manually checking the extracted metadata, users can load the online count matrix to Seurat with ParseCBDatasets. All the attributes available in ExtractCBDatasets are also available here. Please note that ParseCBDatasets reads the online count matrix directly instead of downloading it to a local file. If multiple datasets are available, users can choose to merge the resulting SeuratObjects with merge.

ParseCBDatasets supports extracting a subset of cells by metadata and a subset of genes:

# parse the whole datasets
hbb.sample.seu <- ParseCBDatasets(sample.df = hbb.sample.df)
# subset metadata and gene
hbb.sample.seu <- ParseCBDatasets(
  sample.df = hbb.sample.df, obs.value.filter = "Cell.Type == 'MP' & Phase == 'G2M'",
  include.genes = c(
    "PAX7", "MYF5", "C1QTNF3", "MYOD1", "MYOG", "RASSF4", "MYH3", "MYL4",
    "TNNT3", "PDGFRA", "OGN", "COL3A1"
  )
)

Show the returned SeuratObject:

hbb.sample.seu
## An object of class Seurat 
## 14 features across 5684 samples within 1 assay 
## Active assay: RNA (14 features, 0 variable features)
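
To make the metadata filter used above more concrete, the following base R sketch shows how a filter string such as the one passed to obs.value.filter can be read as a logical expression over cell metadata columns. This illustrates the filter semantics only; it is not GEfetch2R's internal implementation. The toy data frame cell.meta and its values are made up for this example.

```r
# Toy cell metadata (hypothetical values); the column names mirror
# those referenced in the filter string used with ParseCBDatasets.
cell.meta <- data.frame(
  Cell.Type = c("MP", "MP", "MC", "MP"),
  Phase = c("G2M", "G1", "G2M", "G2M"),
  row.names = c("cell1", "cell2", "cell3", "cell4")
)

# The filter is a plain string holding an R logical expression.
obs.value.filter <- "Cell.Type == 'MP' & Phase == 'G2M'"

# Evaluate the expression with the metadata columns in scope,
# which yields one TRUE/FALSE value per cell.
keep <- eval(parse(text = obs.value.filter), envir = cell.meta)

# Names of the cells that satisfy both conditions.
kept.cells <- rownames(cell.meta)[keep]
print(kept.cells)
```

Conditions joined with & must all hold for a cell to be kept, so only cells annotated as MP and assigned to the G2M phase pass this filter.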