Introduction

Many tools have been developed to process scRNA-seq data, such as Scanpy, Seurat, scran and Monocle. Each tool has its own object class: AnnData for Scanpy, SeuratObject for Seurat, SingleCellExperiment for scran, and CellDataSet/cell_data_set for Monocle2/Monocle3. There are also file formats designed for large omics datasets, such as loom. A comprehensive scRNA-seq analysis usually combines several of these tools, which means converting between objects frequently. To make this easier, scfetch provides functions for converting between widely used objects and formats. Object conversion in scfetch has two main advantages:

  • One-step conversion between objects. No intermediate object is created, which avoids unnecessary information loss.
  • Wherever possible, the conversion tools are developed by the team behind the source or destination object. For example, SeuratDisk is used to convert SeuratObject to loom, and zellkonverter is used to convert between SingleCellExperiment and AnnData. When no such tool exists, sceasy is used.
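
The backends named in this article can be collected into a small lookup table, which also makes it easy to check what is installed before starting. This is only an illustrative sketch: the table is assembled from the sections of this article rather than read from scfetch, and `conversion_backends` and `missing_backends()` are hypothetical names that are not part of the package.

```r
# Backends per conversion, as described in this article (not queried from scfetch).
conversion_backends <- data.frame(
  from = c(rep("SeuratObject", 5), rep("SingleCellExperiment", 2)),
  to = c("SingleCellExperiment", "CellDataSet", "cell_data_set", "AnnData", "loom",
         "AnnData", "loom"),
  backend = c("Seurat", "Seurat", "SeuratWrappers", "sceasy", "SeuratDisk",
              "zellkonverter", "LoomExperiment"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the backend packages that are not installed.
# Note that requireNamespace() loads a namespace as a side effect when it is found.
missing_backends <- function(backends = unique(conversion_backends$backend)) {
  installed <- vapply(backends, requireNamespace, logical(1), quietly = TRUE)
  backends[!installed]
}

missing_backends()
```

An empty result means every backend listed in the table can be loaded; otherwise the returned names are the packages still to install.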

Test data

# library
library(scfetch)
## Setting options('download.file.method.GEOquery'='auto')
## Setting options('GEOquery.inmemory.gpl'=FALSE)
## Registered S3 method overwritten by 'SeuratDisk':
##   method            from  
##   as.sparse.H5Group Seurat
library(Seurat) # pbmc_small
## Attaching SeuratObject
library(scRNAseq) # seger
## Loading required package: SingleCellExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
## 
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
## Warning: package 'BiocGenerics' was built under R version 4.0.5
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
## 
##     expand.grid
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Warning: package 'GenomeInfoDb' was built under R version 4.0.5
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians
## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians
## 
## Attaching package: 'SummarizedExperiment'
## The following object is masked from 'package:SeuratObject':
## 
##     Assays
## The following object is masked from 'package:Seurat':
## 
##     Assays
## Possible Ensembl SSL connectivity problems detected.
## Please see the 'Connection Troubleshooting' section of the biomaRt vignette
## vignette('accessing_ensembl', package = 'biomaRt')
## Error in curl::curl_fetch_memory(url, handle = handle) : 
##   SSL peer certificate or SSH remote key was not OK: [uswest.ensembl.org] SSL certificate problem: certificate has expired

SeuratObject:

# object
pbmc_small
## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 20 variable features)
##  2 dimensional reductions calculated: pca, tsne
# metadata
head(pbmc_small@meta.data)
##                   orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8
## ATGCCAGAACGACT SeuratProject         70           47               0
## CATGGCCTGTGCAT SeuratProject         85           52               0
## GAACCTGATGAACC SeuratProject         87           50               1
## TGACTGGATTCTCA SeuratProject        127           56               0
## AGTCAGACTGCACA SeuratProject        173           53               0
## TCTGATACACGTGT SeuratProject         70           48               0
##                letter.idents groups RNA_snn_res.1
## ATGCCAGAACGACT             A     g2             0
## CATGGCCTGTGCAT             A     g1             0
## GAACCTGATGAACC             B     g2             0
## TGACTGGATTCTCA             A     g2             0
## AGTCAGACTGCACA             A     g2             0
## TCTGATACACGTGT             A     g1             0

SingleCellExperiment:

seger <- scRNAseq::SegerstolpePancreasData()
## snapshotDate(): 2020-10-27
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
## loading from cache
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
## loading from cache
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
## loading from cache
## snapshotDate(): 2020-10-27
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
## loading from cache
seger
## class: SingleCellExperiment 
## dim: 26179 3514 
## metadata(0):
## assays(1): counts
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5 eGFP
## rowData names(2): symbol refseq
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
##   HP1526901T2D_A8
## colData names(8): Source Name individual ... age body mass index
## reducedDimNames(0):
## altExpNames(1): ERCC

Convert SeuratObject to other objects

Here, we convert a SeuratObject to SingleCellExperiment, CellDataSet/cell_data_set, AnnData, and loom.

SeuratObject to SingleCellExperiment

The conversion is performed with functions implemented in Seurat:

sce.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", to = "SCE")
## Convert SeuratObject to SingleCellExperiment (suitable for scater)!
sce.obj
## class: SingleCellExperiment 
## dim: 230 80 
## metadata(0):
## assays(2): counts logcounts
## rownames(230): MS4A1 CD79B ... SPON2 S100B
## rowData names(5): vst.mean vst.variance vst.variance.expected
##   vst.variance.standardized vst.variable
## colnames(80): ATGCCAGAACGACT CATGGCCTGTGCAT ... GGAACACTTCAGAC
##   CTTGATTGATCTTC
## colData names(8): orig.ident nCount_RNA ... RNA_snn_res.1 ident
## reducedDimNames(2): PCA TSNE
## altExpNames(0):
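
A quick sanity check after any conversion is to confirm that the dimensions and the feature and cell names survived. The sketch below is a minimal version of such a check. `compare_layout()` is a hypothetical helper, not an scfetch function, and it assumes both objects provide `dim()`, `rownames()` and `colnames()` methods. It is demonstrated on toy matrices so that it runs without any single-cell packages.

```r
# Hypothetical helper: compare dimensions and feature/cell names of two objects.
# Works for anything with dim(), rownames() and colnames() methods.
compare_layout <- function(before, after) {
  c(
    same_dim      = identical(dim(before), dim(after)),
    same_features = identical(rownames(before), rownames(after)),
    same_cells    = identical(colnames(before), colnames(after))
  )
}

# Toy stand-ins for a source object and its converted counterpart.
src <- matrix(
  0, nrow = 3, ncol = 2,
  dimnames = list(c("MS4A1", "CD79B", "SPON2"), c("cell1", "cell2"))
)
dst <- src
compare_layout(src, dst)

# With the objects from this article, the call would be:
# compare_layout(pbmc_small, sce.obj)
```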

SeuratObject to CellDataSet/cell_data_set

To CellDataSet (The conversion is performed with functions implemented in Seurat):

# BiocManager::install("monocle") # requires monocle
cds.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", reduction = "tsne", to = "CellDataSet")
## Convert SeuratObject to CellDataSet (suitable for Monocle)!
cds.obj
## CellDataSet (storageMode: environment)
## assayData: 230 features, 80 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: ATGCCAGAACGACT CATGGCCTGTGCAT ... CTTGATTGATCTTC (80
##     total)
##   varLabels: orig.ident nCount_RNA ... Size_Factor (8 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: MS4A1 CD79B ... S100B (230 total)
##   fvarLabels: vst.mean vst.variance ... gene_short_name (6 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:

To cell_data_set (The conversion is performed with functions implemented in SeuratWrappers):

# remotes::install_github('cole-trapnell-lab/monocle3') # requires monocle3
cds3.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", to = "cell_data_set")
## Convert SeuratObject to cell_data_set (suitable for Monocle3)!
## Warning: Monocle 3 trajectories require cluster partitions, which Seurat does
## not calculate. Please run 'cluster_cells' on your cell_data_set object
cds3.obj
## class: cell_data_set 
## dim: 230 80 
## metadata(0):
## assays(2): counts logcounts
## rownames(230): MS4A1 CD79B ... SPON2 S100B
## rowData names(5): vst.mean vst.variance vst.variance.expected
##   vst.variance.standardized vst.variable
## colnames(80): ATGCCAGAACGACT CATGGCCTGTGCAT ... GGAACACTTCAGAC
##   CTTGATTGATCTTC
## colData names(9): orig.ident nCount_RNA ... ident Size_Factor
## reducedDimNames(2): PCA TSNE
## altExpNames(0):

SeuratObject to AnnData

AnnData is a Python object, so reticulate is used to communicate between Python and R. Create a Python environment that contains the anndata package, and pass its path via conda.path so that exactly this environment is used.

The conversion is performed with functions implemented in sceasy:

# remove any existing pbmc_small.h5ad first
ExportSeurat(
  seu.obj = pbmc_small, assay = "RNA", to = "AnnData", conda.path = "/Applications/anaconda3",
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.h5ad"
)
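
Because the conversion depends on an external Python environment, it can help to fail early with a clear message when the directory passed as conda.path does not look like an environment. The sketch below only checks that a Python interpreter exists under the directory. `find_env_python()` is a hypothetical helper, and the two layouts it looks for (`bin/python` and `python.exe`) are assumptions about typical conda installations. It does not confirm that anndata is importable. Once reticulate is bound to the environment, `reticulate::py_module_available("anndata")` could be used for that.

```r
# Hypothetical helper: locate the Python interpreter inside an environment directory.
# Assumes the usual layouts: <env>/bin/python (Unix-like) or <env>/python.exe (Windows).
find_env_python <- function(conda.path) {
  candidates <- file.path(conda.path, c("bin/python", "python.exe"))
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) {
    stop("No Python interpreter found under: ", conda.path)
  }
  found[1]
}

# Demonstration with a throwaway directory that mimics an environment.
fake_env <- file.path(tempdir(), "fake_env")
dir.create(file.path(fake_env, "bin"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(fake_env, "bin", "python"))
find_env_python(fake_env)
```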

SeuratObject to loom

The conversion is performed with functions implemented in SeuratDisk:

ExportSeurat(
  seu.obj = pbmc_small, assay = "RNA", to = "loom",
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom"
)

Convert other objects to SeuratObject

SingleCellExperiment to SeuratObject

The conversion is performed with functions implemented in Seurat:

seu.obj.sce <- ImportSeurat(obj = sce.obj, from = "SCE", count.assay = "counts", data.assay = "logcounts", assay = "RNA")
## Convert SingleCellExperiment to SeuratObject!
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from PC__ to PC_
## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to PC_
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from tSNE__ to tSNE_
## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to tSNE_
seu.obj.sce
## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)
##  2 dimensional reductions calculated: pca, tsne

CellDataSet/cell_data_set to SeuratObject

CellDataSet to SeuratObject (The conversion is performed with functions implemented in Seurat):

seu.obj.cds <- ImportSeurat(obj = cds.obj, from = "CellDataSet", count.assay = "counts", assay = "RNA")
## Convert CellDataSet (Monocle) to SeuratObject!
## Pulling expression data
## Building Seurat object
## Adding feature-level metadata
## No dispersion information in CellDataSet object
## No variable features present
## Adding tSNE dimensional reduction
seu.obj.cds
## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)
##  1 dimensional reduction calculated: tsne

cell_data_set to SeuratObject (The conversion is performed with functions implemented in Seurat):

seu.obj.cds3 <- ImportSeurat(obj = cds3.obj, from = "cell_data_set", count.assay = "counts", data.assay = "logcounts", assay = "RNA")
## Convert cell_data_set (Monocle3) to SeuratObject!
seu.obj.cds3
## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)
##  2 dimensional reductions calculated: pca, tsne

AnnData to SeuratObject

AnnData is a Python object, so reticulate is used to communicate between Python and R. Create a Python environment that contains the anndata package, and pass its path via conda.path so that exactly this environment is used.

The conversion is performed with functions implemented in sceasy:

seu.obj.h5ad <- ImportSeurat(
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.h5ad",
  from = "AnnData", assay = "RNA", conda.path = "/Applications/anaconda3"
)
## Convert AnnData to SeuratObject!
## X -> counts
seu.obj.h5ad
## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)
##  2 dimensional reductions calculated: pca, tsne

loom to SeuratObject

The conversion is performed with functions implemented in SeuratDisk and Seurat:

# dimensional reductions are lost when converting through loom
seu.obj.loom <- ImportSeurat(loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom", from = "loom")
## Convert loom to SeuratObject!
## Reading in /matrix
## Storing /matrix as counts
## Saving /matrix to assay 'RNA'
## Loading graph RNA_snn
seu.obj.loom
## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)

Conversion between SingleCellExperiment and AnnData

The conversion is performed with functions implemented in zellkonverter.

SingleCellExperiment to AnnData

# remove any existing seger.h5ad first
SCEAnnData(
  from = "SingleCellExperiment", to = "AnnData", sce = seger, X_name = "counts",
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.h5ad"
)
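
The comment in the block above asks for the old output file to be removed before exporting, presumably because an existing file gets in the way of writing. A small helper can do this and also avoid hard-coded, machine-specific paths. This is a sketch under those assumptions: `fresh_output_path()` is a hypothetical helper, and `tempdir()` is used only so that the example runs anywhere.

```r
# Hypothetical helper: build an output path and delete any stale file at that location.
fresh_output_path <- function(filename, out.dir = tempdir()) {
  dir.create(out.dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(out.dir, filename)
  if (file.exists(path)) {
    unlink(path)
  }
  path
}

h5ad.file <- fresh_output_path("seger.h5ad")
file.exists(h5ad.file)

# The path could then be passed on, for example:
# SCEAnnData(from = "SingleCellExperiment", to = "AnnData", sce = seger,
#            X_name = "counts", anndata.file = h5ad.file)
```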

AnnData to SingleCellExperiment

seger.anndata <- SCEAnnData(
  from = "AnnData", to = "SingleCellExperiment",
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.h5ad"
)
## Convert AnnData to SingleCellExperiment.
seger.anndata
## class: SingleCellExperiment 
## dim: 26179 3514 
## metadata(0):
## assays(1): counts
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5 eGFP
## rowData names(2): symbol refseq
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
##   HP1526901T2D_A8
## colData names(8): Source Name individual ... age body mass index
## reducedDimNames(0):
## altExpNames(0):

Conversion between SingleCellExperiment and loom

The conversion is performed with functions implemented in LoomExperiment.

SingleCellExperiment to loom

# remove any existing seger.loom first
SCELoom(
  from = "SingleCellExperiment", to = "loom", sce = seger,
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.loom"
)

loom to SingleCellExperiment

seger.loom <- SCELoom(
  from = "loom", to = "SingleCellExperiment",
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.loom"
)
## Convert loom to SingleCellExperiment.
seger.loom
## class: SingleCellExperiment 
## dim: 26179 3514 
## metadata(0):
## assays(1): counts
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5.1 eGFP
## rowData names(2): refseq symbol
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
##   HP1526901T2D_A8
## colData names(8): Source.Name age ... sex single.cell.well.quality
## reducedDimNames(0):
## altExpNames(0):
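
Comparing this output with the original `seger` object shows that some names changed in the round trip: `Source Name` became `Source.Name`, `BIVM-ERCC5` became `BIVM-ERCC5.1`, and the ERCC alternative experiment is no longer listed. The name changes resemble what base R's `make.names()` and `make.unique()` produce, although whether the loom import actually calls these functions is an assumption here. The sketch below reproduces the pattern on toy vectors and lists the names that no longer match. `changed_names()` is a hypothetical helper.

```r
# Column names containing spaces become syntactic names.
make.names(c("Source Name", "body mass index"))

# Duplicated feature names receive a numeric suffix.
make.unique(c("BIVM-ERCC5", "BIVM-ERCC5"))

# Hypothetical helper: names present before conversion but absent afterwards.
changed_names <- function(before, after) {
  setdiff(before, after)
}

before <- c("Source Name", "age", "sex")
after  <- make.names(before)
changed_names(before, after)
```

Running a check like this on `colnames(colData(...))` and `rownames(...)` of the original and re-imported objects would show which downstream code needs updated column or gene names.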