Introduction

Many tools have been developed to process scRNA-seq data, such as Scanpy, Seurat, scran and Monocle. Each tool has its own object class: AnnData for Scanpy, SeuratObject for Seurat, SingleCellExperiment for scran and CellDataSet/cell_data_set for Monocle2/Monocle3. There are also file formats designed for large omics datasets, such as loom. A comprehensive scRNA-seq analysis usually combines several of these tools, which means objects have to be converted between formats frequently.

To facilitate user analysis of scRNA-seq data, GEfetch2R:

  • benchmarks format conversion tools and provides guidance on tool selection for different scenarios (AnnData to SeuratObject, SeuratObject to AnnData, AnnData to SingleCellExperiment, SingleCellExperiment to AnnData)
  • provides multiple functions to perform format conversion between widely used scRNA-seq objects (SeuratObject, AnnData, SingleCellExperiment, CellDataSet/cell_data_set and loom)

Test data

# library
library(GEfetch2R)
## Setting options('download.file.method.GEOquery'='auto')
## Setting options('GEOquery.inmemory.gpl'=FALSE)
## Registered S3 method overwritten by 'SeuratDisk':
##   method            from  
##   as.sparse.H5Group Seurat
## Warning: replacing previous import 'LoomExperiment::import' by
## 'reticulate::import' when loading 'GEfetch2R'
## Registered S3 method overwritten by 'zellkonverter':
##   method                                             from      
##   py_to_r.pandas.core.arrays.categorical.Categorical reticulate
library(Seurat) # pbmc_small
## Attaching SeuratObject
# library(scRNAseq) # seger

SeuratObject:

# object
pbmc_small
## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 20 variable features)
##  2 dimensional reductions calculated: pca, tsne
# metadata
head(pbmc_small@meta.data)
##                   orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8
## ATGCCAGAACGACT SeuratProject         70           47               0
## CATGGCCTGTGCAT SeuratProject         85           52               0
## GAACCTGATGAACC SeuratProject         87           50               1
## TGACTGGATTCTCA SeuratProject        127           56               0
## AGTCAGACTGCACA SeuratProject        173           53               0
## TCTGATACACGTGT SeuratProject         70           48               0
##                letter.idents groups RNA_snn_res.1
## ATGCCAGAACGACT             A     g2             0
## CATGGCCTGTGCAT             A     g1             0
## GAACCTGATGAACC             B     g2             0
## TGACTGGATTCTCA             A     g2             0
## AGTCAGACTGCACA             A     g2             0
## TCTGATACACGTGT             A     g1             0

SingleCellExperiment:

# seger <- scRNAseq::SegerstolpePancreasData()
# load from local
seger <- readRDS("/Users/soyabean/Desktop/tmp/scdown/conversion/seger.rds")
seger
## class: SingleCellExperiment 
## dim: 26179 3514 
## metadata(0):
## assays(1): counts
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5 eGFP
## rowData names(2): refseq symbol
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
##   HP1526901T2D_A8
## colData names(9): individual sex ... submitted single cell quality cell
##   type
## reducedDimNames(0):
## altExpNames(1): ERCC

Convert SeuratObject to other objects

Here, we will convert SeuratObject to SingleCellExperiment, CellDataSet/cell_data_set, AnnData and loom.

SeuratObject to SingleCellExperiment

The conversion is performed with functions implemented in Seurat:

sce.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", to = "SCE")
## Convert SeuratObject to SingleCellExperiment (suitable for scater)!
sce.obj
## class: SingleCellExperiment 
## dim: 230 80 
## metadata(0):
## assays(2): counts logcounts
## rownames(230): MS4A1 CD79B ... SPON2 S100B
## rowData names(5): vst.mean vst.variance vst.variance.expected
##   vst.variance.standardized vst.variable
## colnames(80): ATGCCAGAACGACT CATGGCCTGTGCAT ... GGAACACTTCAGAC
##   CTTGATTGATCTTC
## colData names(8): orig.ident nCount_RNA ... RNA_snn_res.1 ident
## reducedDimNames(2): PCA TSNE
## altExpNames(0):
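
For readers who want to see what such a conversion can look like without the wrapper, Seurat itself exposes `as.SingleCellExperiment()`. The sketch below assumes (the text above does not confirm it) that `ExportSeurat(to = "SCE")` does something similar internally; the variable name `sce.direct` is only illustrative.

```r
library(Seurat)               # provides pbmc_small and as.SingleCellExperiment()
library(SingleCellExperiment) # accessors such as reducedDimNames()

# Direct conversion with Seurat's own converter (assumed to be roughly what
# ExportSeurat(to = "SCE") wraps)
sce.direct <- as.SingleCellExperiment(pbmc_small, assay = "RNA")

# Basic checks: dimensions and cell names should be unchanged
dim(sce.direct)
identical(colnames(sce.direct), colnames(pbmc_small))
reducedDimNames(sce.direct)
```

Comparing `dim()` and the cell names before and after is a cheap way to confirm that no cells or features were dropped.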

SeuratObject to CellDataSet/cell_data_set

To CellDataSet (The conversion is performed with functions implemented in Seurat):

# BiocManager::install("monocle") # requires monocle
cds.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", reduction = "tsne", to = "CellDataSet")
## Convert SeuratObject to CellDataSet (suitable for Monocle)!
cds.obj
## CellDataSet (storageMode: environment)
## assayData: 230 features, 80 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: ATGCCAGAACGACT CATGGCCTGTGCAT ... CTTGATTGATCTTC (80
##     total)
##   varLabels: orig.ident nCount_RNA ... Size_Factor (8 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: MS4A1 CD79B ... S100B (230 total)
##   fvarLabels: vst.mean vst.variance ... gene_short_name (6 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:

To cell_data_set (The conversion is performed with functions implemented in SeuratWrappers):

# remotes::install_github('cole-trapnell-lab/monocle3') # requires monocle3
cds3.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", to = "cell_data_set")
cds3.obj

SeuratObject to AnnData

There are multiple tools available for format conversion from SeuratObject to AnnData:

  • scDIOR is the best method in terms of information kept and usability
  • sceasy has the best performance in running time and disk usage

# sceasy
Seu2AD(
  seu.obj = pbmc_small, method = "sceasy", out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion",
  assay = "RNA", slot = "counts", conda.path = "/Applications/anaconda3"
)
## AnnData object with n_obs × n_vars = 80 × 230
##     obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'RNA_snn_res.0.8', 'letter.idents', 'groups', 'RNA_snn_res.1'
##     var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
##     obsm: 'X_pca', 'X_tsne'
# # SeuratDisk
# Seu2AD(seu.obj = pbmc_small, method = "SeuratDisk", out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion",
#        assay = "RNA", save.scale = TRUE)
# # scDIOR
# Seu2AD(seu.obj = pbmc_small, method = "scDIOR",
#        out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion", assay = "RNA", save.scale = TRUE)
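
As a point of comparison, the sceasy package can also be called directly. The following is a minimal sketch, assuming sceasy is installed and reticulate can find a Python environment with the `anndata` module; whether `Seu2AD(method = "sceasy")` passes exactly these arguments is an assumption. The output path `h5ad.file` is a temporary file chosen for illustration.

```r
library(Seurat)  # pbmc_small
library(sceasy)  # convertFormat(); needs Python with anndata via reticulate

# Write to a temporary file so nothing in the working directory is overwritten
h5ad.file <- tempfile(fileext = ".h5ad")

# Convert the Seurat object and save it as an h5ad file
sceasy::convertFormat(
  pbmc_small,
  from = "seurat", to = "anndata",
  outFile = h5ad.file
)

file.exists(h5ad.file)
```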

SeuratObject to loom

The conversion is performed with functions implemented in SeuratDisk:

ExportSeurat(
  seu.obj = pbmc_small, assay = "RNA", to = "loom",
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom"
)
## Convert SeuratObject to loom!
## Warning: Overwriting previous file
## /Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom
## Saving data from RNA as /matrix
## Adding slot counts for assay RNA
## Adding layer counts
## Adding col attribute CellID
## Adding col attribute orig.ident
## Adding col attribute nCount_RNA
## Adding col attribute nFeature_RNA
## Adding col attribute RNA_snn_res.0.8
## Adding col attribute letter.idents
## Adding col attribute groups
## Adding col attribute RNA_snn_res.1
## Adding row attribute Gene

Convert other objects to SeuratObject

SingleCellExperiment to SeuratObject

The conversion is performed with functions implemented in Seurat:

seu.obj.sce <- ImportSeurat(
  obj = sce.obj, from = "SCE", count.assay = "counts",
  data.assay = "logcounts", assay = "RNA"
)
## Convert SingleCellExperiment to SeuratObject!
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from PC__ to PC_
## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to PC_
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from tSNE__ to tSNE_
## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to tSNE_
seu.obj.sce
## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)
##  2 dimensional reductions calculated: pca, tsne
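
The reverse direction can be sketched with Seurat's `as.Seurat()` method for SingleCellExperiment objects. This example assumes that `ImportSeurat(from = "SCE")` maps `count.assay` and `data.assay` onto the `counts` and `data` arguments used here, which the text above does not state explicitly; `sce.tmp` and `seu.direct` are illustrative names.

```r
library(Seurat)
library(SingleCellExperiment)

# Build a SingleCellExperiment from the demo object, then convert it back
sce.tmp <- as.SingleCellExperiment(pbmc_small, assay = "RNA")
seu.direct <- as.Seurat(sce.tmp, counts = "counts", data = "logcounts")

# Features and cells should survive the round trip
dim(seu.direct)
```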

CellDataSet/cell_data_set to SeuratObject

CellDataSet to SeuratObject (The conversion is performed with functions implemented in Seurat):

seu.obj.cds <- ImportSeurat(
  obj = cds.obj, from = "CellDataSet",
  count.assay = "counts", assay = "RNA"
)
## Convert CellDataSet (Monocle) to SeuratObject!
## Pulling expression data
## Building Seurat object
## Adding feature-level metadata
## No dispersion information in CellDataSet object
## No variable features present
## Adding tSNE dimensional reduction
seu.obj.cds
## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)
##  1 dimensional reduction calculated: tsne

cell_data_set to SeuratObject (The conversion is performed with functions implemented in Seurat):

seu.obj.cds3 <- ImportSeurat(
  obj = cds3.obj, from = "cell_data_set",
  count.assay = "counts", data.assay = "logcounts", assay = "RNA"
)
seu.obj.cds3

AnnData to SeuratObject

There are multiple tools available for format conversion from AnnData to SeuratObject:

  • scDIOR is the best method in terms of information kept (GEfetch2R integrates scDIOR and SeuratDisk to achieve the best performance in information kept)
  • schard is the best method in terms of usability
  • schard and sceasy have comparable performance when the cell number is below 200k, but sceasy scales better to larger datasets
  • sceasy has better performance in disk usage

# sceasy
ann.sceasy <- AD2Seu(
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad", method = "sceasy",
  assay = "RNA", slot = "scale.data"
)
## X -> data
ann.sceasy
## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)
##  2 dimensional reductions calculated: pca, tsne
# # SeuratDisk
# ann.seu <- AD2Seu(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad",
#                   method = "SeuratDisk", assay = "RNA", load.assays = c("RNA"))
# # scDIOR
# ann.scdior <- AD2Seu(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad",
#                      method = "scDIOR", assay = "RNA")
# # schard
# ann.schard <- AD2Seu(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad",
#                      method = "schard", assay = "RNA", use.raw = TRUE)
# # SeuratDisk+scDIOR
# ann.seuscdior <- AD2Seu(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad",
#                         method = "SeuratDisk+scDIOR", assay = "RNA", load.assays = c("RNA"))

loom to SeuratObject

The conversion is performed with functions implemented in SeuratDisk and Seurat:

# note: dimensional reductions are lost when going through loom
seu.obj.loom <- ImportSeurat(loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom", from = "loom")
## Convert loom to SeuratObject!
## Reading in /matrix
## Storing /matrix as counts
## Saving /matrix to assay 'RNA'
## Loading graph RNA_snn
seu.obj.loom
## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)

Conversion between SingleCellExperiment and AnnData

SingleCellExperiment to AnnData

There are multiple tools available for format conversion from SingleCellExperiment to AnnData:

  • zellkonverter is the best method in terms of information kept and running time
  • scDIOR is the best method in terms of usability and disk usage

# zellkonverter
SCE2AD(
  sce.obj = seger, method = "zellkonverter",
  out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion", slot = "counts",
  conda.path = "/Applications/anaconda3"
)
## NULL
# # sceasy
# SCE2AD(sce.obj = seger, method = "sceasy", out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion",
#        slot = "counts", conda.path = "/Applications/anaconda3")
#
# # scDIOR
# seger.scdior <- seger
# library(SingleCellExperiment)
# # scDIOR does not support varm in rowData
# rowData(seger.scdior)$varm <- NULL
# SCE2AD(sce.obj = seger.scdior, method = "scDIOR", out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion")
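
For a quick check of the zellkonverter route on data that does not depend on local files, the sketch below builds a toy SingleCellExperiment and writes and reads it with `writeH5AD()` and `readH5AD()`. The toy object and all variable names are made up for illustration, and it is an assumption that `SCE2AD(method = "zellkonverter")` calls `writeH5AD()` in this way.

```r
library(SingleCellExperiment)
library(zellkonverter)  # sets up its own Python environment on first use

# Toy SingleCellExperiment: 5 genes x 4 cells, purely illustrative
set.seed(1)
toy.counts <- matrix(
  rpois(20, lambda = 5), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)
toy.sce <- SingleCellExperiment(assays = list(counts = toy.counts))

# Write the counts assay as AnnData's X, then read the file back
h5ad.tmp <- tempfile(fileext = ".h5ad")
writeH5AD(toy.sce, file = h5ad.tmp, X_name = "counts")
toy.back <- readH5AD(h5ad.tmp)

dim(toy.back)
```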

AnnData to SingleCellExperiment

There are multiple tools available for format conversion from AnnData to SingleCellExperiment:

  • zellkonverter is the best method in terms of information kept
  • schard is the best method in terms of usability and running time
  • schard and scDIOR have comparable performance in disk usage

# zellkonverter
sce.zell <- AD2SCE(
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger_zellkonverter.h5ad",
  method = "zellkonverter", slot = "scale.data",
  use.raw = TRUE, conda.path = "/Applications/anaconda3"
)
## Warning in py_to_r.pandas.core.frame.DataFrame(adata$var): index contains
## duplicated values: row names not set
## Warning: The names of these selected obs columns have been modified to match R
## conventions: 'body mass index' -> 'body.mass.index', 'clinical information' ->
## 'clinical.information', 'single cell well quality' ->
## 'single.cell.well.quality', 'submitted single cell quality' ->
## 'submitted.single.cell.quality', and 'cell type' -> 'cell.type'
sce.zell
## class: SingleCellExperiment 
## dim: 26179 3514 
## metadata(1): X_name
## assays(1): scale.data
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5 eGFP
## rowData names(2): refseq symbol
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
##   HP1526901T2D_A8
## colData names(9): individual sex ... submitted.single.cell.quality
##   cell.type
## reducedDimNames(0):
## altExpNames(0):
# # scDIOR
# sce.scdior <- AD2SCE(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger_zellkonverter.h5ad",
#                      method = "scDIOR", assay = "RNA",
#                      use.raw = TRUE, conda.path = "/Applications/anaconda3")
# # schard
# sce.schard <- AD2SCE(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger_zellkonverter.h5ad",
#                      method = "schard", use.raw = TRUE)
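
The zellkonverter warning above reports that several obs column names were changed "to match R conventions". The new names are what base R's `make.names()` produces from the originals; whether zellkonverter calls exactly this function is an assumption. A small lookup table like the one below can help restore the original labels later.

```r
# Column names as reported in the warning (original AnnData obs names)
obs.names <- c(
  "body mass index", "clinical information",
  "single cell well quality", "submitted single cell quality",
  "cell type"
)

# make.names() replaces characters that are invalid in R names with "."
r.names <- make.names(obs.names)
r.names

# Lookup table: R-style name -> original name
name.map <- setNames(obs.names, r.names)
name.map[["cell.type"]]
```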

Conversion between SingleCellExperiment and loom

The conversion is performed with functions implemented in LoomExperiment.
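
As a rough illustration of the underlying LoomExperiment calls, the sketch below exports a toy SingleCellExperiment to loom and imports it again. The toy data and variable names are invented for this example, and the exact arguments that `SCELoom()` passes on are an assumption.

```r
library(SingleCellExperiment)
library(LoomExperiment)

# Toy SingleCellExperiment: 5 genes x 4 cells, purely illustrative
set.seed(1)
toy.counts <- matrix(
  rpois(20, lambda = 5), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)
toy.sce <- SingleCellExperiment(assays = list(counts = toy.counts))

# Wrap the object in a loom-aware class and export it to a new file
toy.scle <- SingleCellLoomExperiment(toy.sce)
loom.tmp <- tempfile(fileext = ".loom")
export(toy.scle, loom.tmp)

# Import the file and drop back to a plain SingleCellExperiment
toy.back <- import(loom.tmp, type = "SingleCellLoomExperiment")
toy.back <- as(toy.back, "SingleCellExperiment")
dim(toy.back)
```

Writing to a fresh temporary file sidesteps the need to delete an existing loom file before exporting.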

SingleCellExperiment to loom

# delete any existing seger.loom file before exporting
SCELoom(
  from = "SingleCellExperiment", to = "loom", sce = seger,
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.loom"
)

loom to SingleCellExperiment

seger.loom <- SCELoom(
  from = "loom", to = "SingleCellExperiment",
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.loom"
)
## Convert loom to SingleCellExperiment.
seger.loom
## class: SingleCellExperiment 
## dim: 26179 3514 
## metadata(0):
## assays(1): counts
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5.1 eGFP
## rowData names(2): refseq symbol
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
##   HP1526901T2D_A8
## colData names(8): Source.Name age ... sex single.cell.well.quality
## reducedDimNames(0):
## altExpNames(0):
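
Note that the row name `BIVM-ERCC5` in the original object appears as `BIVM-ERCC5.1` after the loom round trip, and that the earlier zellkonverter warning mentioned duplicated index values. One plausible explanation, stated here as an assumption and not as confirmed behaviour of any of the packages, is that duplicated gene symbols are made unique on import in the way base R's `make.unique()` does. The gene vector below is hypothetical.

```r
# Hypothetical gene symbols containing one duplicate
genes <- c("SGIP1", "BIVM-ERCC5", "eGFP", "BIVM-ERCC5")

# Which symbols occur more than once?
genes[duplicated(genes)]

# make.unique() appends ".1", ".2", ... to later occurrences
unique.genes <- make.unique(genes)
unique.genes
```

Checking `anyDuplicated(rownames(seger))` before converting shows whether an object is affected.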