Introduction
Many tools have been developed to process scRNA-seq data, such as Scanpy, Seurat, scran and Monocle.
Each tool has its own object class: AnnData for Scanpy, SeuratObject for Seurat, SingleCellExperiment
for scran and CellDataSet/cell_data_set for Monocle2/Monocle3. There are also file formats designed
for large omics datasets, such as loom. A comprehensive scRNA-seq data analysis usually combines
several of these tools, which means objects have to be converted frequently.
To facilitate the analysis of scRNA-seq data, GEfetch2R:
- benchmarks the format conversion tools and provides guidance for tool selection under different scenarios (AnnData to SeuratObject, SeuratObject to AnnData, AnnData to SingleCellExperiment, SingleCellExperiment to AnnData)
- provides multiple functions to perform format conversion between widely used scRNA-seq objects (SeuratObject, AnnData, SingleCellExperiment, CellDataSet/cell_data_set and loom)
Test data
library(GEfetch2R)
library(Seurat) # attaches SeuratObject, which provides pbmc_small
## Setting options('download.file.method.GEOquery'='auto')
## Setting options('GEOquery.inmemory.gpl'=FALSE)
## Registered S3 method overwritten by 'SeuratDisk':
## method from
## as.sparse.H5Group Seurat
## Warning: replacing previous import 'LoomExperiment::import' by
## 'reticulate::import' when loading 'GEfetch2R'
## Registered S3 method overwritten by 'zellkonverter':
## method from
## py_to_r.pandas.core.arrays.categorical.Categorical reticulate
## Attaching SeuratObject
# library(scRNAseq) # seger
SeuratObject:
# object
pbmc_small
## An object of class Seurat
## 230 features across 80 samples within 1 assay
## Active assay: RNA (230 features, 20 variable features)
## 2 dimensional reductions calculated: pca, tsne
# metadata
head(pbmc_small@meta.data)
## orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8
## ATGCCAGAACGACT SeuratProject 70 47 0
## CATGGCCTGTGCAT SeuratProject 85 52 0
## GAACCTGATGAACC SeuratProject 87 50 1
## TGACTGGATTCTCA SeuratProject 127 56 0
## AGTCAGACTGCACA SeuratProject 173 53 0
## TCTGATACACGTGT SeuratProject 70 48 0
## letter.idents groups RNA_snn_res.1
## ATGCCAGAACGACT A g2 0
## CATGGCCTGTGCAT A g1 0
## GAACCTGATGAACC B g2 0
## TGACTGGATTCTCA A g2 0
## AGTCAGACTGCACA A g2 0
## TCTGATACACGTGT A g1 0
SingleCellExperiment:
# seger <- scRNAseq::SegerstolpePancreasData()
# load from local
seger <- readRDS("/Users/soyabean/Desktop/tmp/scdown/conversion/seger.rds")
seger
## class: SingleCellExperiment
## dim: 26179 3514
## metadata(0):
## assays(1): counts
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5 eGFP
## rowData names(2): refseq symbol
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
## HP1526901T2D_A8
## colData names(9): individual sex ... submitted single cell quality cell
## type
## reducedDimNames(0):
## altExpNames(1): ERCC
Convert SeuratObject to other objects
Here, we will convert SeuratObject to SingleCellExperiment, CellDataSet/cell_data_set, AnnData and loom.
SeuratObject to SingleCellExperiment
The conversion is performed with functions implemented in Seurat:
sce.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", to = "SCE")
## Convert SeuratObject to SingleCellExperiment (suitable for scater)!
sce.obj
## class: SingleCellExperiment
## dim: 230 80
## metadata(0):
## assays(2): counts logcounts
## rownames(230): MS4A1 CD79B ... SPON2 S100B
## rowData names(5): vst.mean vst.variance vst.variance.expected
## vst.variance.standardized vst.variable
## colnames(80): ATGCCAGAACGACT CATGGCCTGTGCAT ... GGAACACTTCAGAC
## CTTGATTGATCTTC
## colData names(8): orig.ident nCount_RNA ... RNA_snn_res.1 ident
## reducedDimNames(2): PCA TSNE
## altExpNames(0):
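For comparison, the same kind of conversion can be sketched with Seurat's own converter. This is a minimal sketch under stated assumptions, not a description of GEfetch2R internals: it assumes that Seurat, SeuratObject and SingleCellExperiment are installed, and that ExportSeurat(to = "SCE") delegates to Seurat::as.SingleCellExperiment() is only an assumption. The name sce.direct is introduced here for illustration.

```r
# Minimal sketch: convert pbmc_small with Seurat's own converter.
# Assumption: Seurat, SeuratObject and SingleCellExperiment are installed;
# the conversion is skipped otherwise and sce.direct stays NULL.
sce.direct <- NULL
pkgs <- c("Seurat", "SeuratObject", "SingleCellExperiment")
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  pbmc <- SeuratObject::pbmc_small
  sce.direct <- Seurat::as.SingleCellExperiment(pbmc, assay = "RNA")
  # Inspect what was carried over: dimensions and dimensional reductions
  print(dim(sce.direct))
  print(SingleCellExperiment::reducedDimNames(sce.direct))
}
```

Calling the converter directly is useful mainly for debugging, for example to see whether a difference comes from the wrapper or from the underlying package.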
SeuratObject to CellDataSet/cell_data_set
To CellDataSet (the conversion is performed with functions implemented in Seurat):
# BiocManager::install("monocle") # requires monocle
cds.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", reduction = "tsne", to = "CellDataSet")
## Convert SeuratObject to CellDataSet (suitable for Monocle)!
cds.obj
## CellDataSet (storageMode: environment)
## assayData: 230 features, 80 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: ATGCCAGAACGACT CATGGCCTGTGCAT ... CTTGATTGATCTTC (80
## total)
## varLabels: orig.ident nCount_RNA ... Size_Factor (8 total)
## varMetadata: labelDescription
## featureData
## featureNames: MS4A1 CD79B ... S100B (230 total)
## fvarLabels: vst.mean vst.variance ... gene_short_name (6 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
To cell_data_set (the conversion is performed with functions implemented in SeuratWrappers):
# remotes::install_github('cole-trapnell-lab/monocle3') # requires monocle3
cds3.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", to = "cell_data_set")
cds3.obj
SeuratObject to AnnData
There are multiple tools available for format conversion from SeuratObject to AnnData:
- scDIOR is the best method in terms of information kept and usability
- sceasy has the best performance in running time and disk usage
# sceasy
Seu2AD(
  seu.obj = pbmc_small, method = "sceasy", out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion",
  assay = "RNA", slot = "counts", conda.path = "/Applications/anaconda3"
)
## AnnData object with n_obs × n_vars = 80 × 230
## obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'RNA_snn_res.0.8', 'letter.idents', 'groups', 'RNA_snn_res.1'
## var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
## obsm: 'X_pca', 'X_tsne'
# # SeuratDisk
# Seu2AD(seu.obj = pbmc_small, method = "SeuratDisk", out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion",
# assay = "RNA", save.scale = TRUE)
# # scDIOR
# Seu2AD(seu.obj = pbmc_small, method = "scDIOR",
# out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion", assay = "RNA", save.scale = TRUE)
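For readers who want to see what a direct call to one of the benchmarked tools looks like, the following is a minimal sketch that uses sceasy without the GEfetch2R wrapper. It is not taken from GEfetch2R: the output path is hypothetical, and the sketch assumes that sceasy, Seurat, SeuratObject and reticulate are installed and that the Python module anndata is visible to reticulate's default Python (Seu2AD instead takes a conda.path argument).

```r
# Minimal sketch: write pbmc_small to an .h5ad file with sceasy directly.
# Assumptions: the R packages below are installed and the Python module
# 'anndata' is available to reticulate; the conversion is skipped otherwise.
h5ad.file <- file.path(tempdir(), "pbmc_small_direct.h5ad") # hypothetical output path
have.pkgs <- all(vapply(
  c("sceasy", "Seurat", "SeuratObject", "reticulate"),
  requireNamespace, logical(1), quietly = TRUE
))
if (have.pkgs && reticulate::py_module_available("anndata")) {
  sceasy::convertFormat(
    SeuratObject::pbmc_small,
    from = "seurat", to = "anndata",
    main_layer = "counts", outFile = h5ad.file
  )
  # Confirm that the file was written
  print(file.exists(h5ad.file))
}
```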
SeuratObject to loom
The conversion is performed with functions implemented in SeuratDisk:
ExportSeurat(
  seu.obj = pbmc_small, assay = "RNA", to = "loom",
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom"
)
## Convert SeuratObject to loom!
## Warning: Overwriting previous file
## /Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom
## Saving data from RNA as /matrix
## Adding slot counts for assay RNA
## Adding layer counts
## Adding col attribute CellID
## Adding col attribute orig.ident
## Adding col attribute nCount_RNA
## Adding col attribute nFeature_RNA
## Adding col attribute RNA_snn_res.0.8
## Adding col attribute letter.idents
## Adding col attribute groups
## Adding col attribute RNA_snn_res.1
## Adding row attribute Gene
Convert other objects to SeuratObject
SingleCellExperiment to SeuratObject
The conversion is performed with functions implemented in Seurat:
seu.obj.sce <- ImportSeurat(
  obj = sce.obj, from = "SCE", count.assay = "counts",
  data.assay = "logcounts", assay = "RNA"
)
## Convert SingleCellExperiment to SeuratObject!
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from PC__ to PC_
## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to PC_
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from tSNE__ to tSNE_
## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to tSNE_
seu.obj.sce
## An object of class Seurat
## 230 features across 80 samples within 1 assay
## Active assay: RNA (230 features, 0 variable features)
## 2 dimensional reductions calculated: pca, tsne
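After a round trip such as SeuratObject to SingleCellExperiment and back, it can help to confirm that features and cells survived. The helper below is a minimal sketch written for this purpose; check_roundtrip is a hypothetical name and not part of GEfetch2R. It relies only on dim(), rownames() and colnames(), so it is demonstrated here on plain matrices.

```r
# Compare dimensions, feature names and cell names of two objects.
# Works for any object with dim(), rownames() and colnames() methods.
# 'check_roundtrip' is a hypothetical helper written for this sketch.
check_roundtrip <- function(original, converted) {
  c(
    same_dim      = all(dim(original) == dim(converted)),
    same_features = setequal(rownames(original), rownames(converted)),
    same_cells    = setequal(colnames(original), colnames(converted))
  )
}

# Toy demonstration on plain matrices (made-up gene and cell names)
m1 <- matrix(1:6, nrow = 2, dimnames = list(c("g1", "g2"), c("c1", "c2", "c3")))
m2 <- m1[, c("c3", "c1", "c2")] # same content, different cell order
m3 <- m1[, 1:2]                 # one cell lost
print(check_roundtrip(m1, m2))
print(check_roundtrip(m1, m3))
```

With the objects from this section the call would be check_roundtrip(pbmc_small, seu.obj.sce). Such a check covers names and dimensions only: the summaries printed in this vignette show that the variable features (20 in pbmc_small, 0 in seu.obj.sce) are not restored by the round trip.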
CellDataSet/cell_data_set to SeuratObject
CellDataSet to SeuratObject (the conversion is performed with functions implemented in Seurat):
seu.obj.cds <- ImportSeurat(
  obj = cds.obj, from = "CellDataSet",
  count.assay = "counts", assay = "RNA"
)
## Convert CellDataSet (Monocle) to SeuratObject!
## Pulling expression data
## Building Seurat object
## Adding feature-level metadata
## No dispersion information in CellDataSet object
## No variable features present
## Adding tSNE dimensional reduction
seu.obj.cds
## An object of class Seurat
## 230 features across 80 samples within 1 assay
## Active assay: RNA (230 features, 0 variable features)
## 1 dimensional reduction calculated: tsne
cell_data_set to SeuratObject (the conversion is performed with functions implemented in Seurat):
seu.obj.cds3 <- ImportSeurat(
  obj = cds3.obj, from = "cell_data_set",
  count.assay = "counts", data.assay = "logcounts", assay = "RNA"
)
seu.obj.cds3
AnnData to SeuratObject
There are multiple tools available for format conversion from AnnData to SeuratObject:
- scDIOR is the best method in terms of information kept (GEfetch2R integrates scDIOR and SeuratDisk to achieve the best performance in information kept)
- schard is the best method in terms of usability
- schard and sceasy have comparable performance when the cell number is below 200k, but sceasy has better scalability
- sceasy has better performance in disk usage
# sceasy
ann.sceasy <- AD2Seu(
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad", method = "sceasy",
  assay = "RNA", slot = "scale.data"
)
## X -> data
ann.sceasy
## An object of class Seurat
## 230 features across 80 samples within 1 assay
## Active assay: RNA (230 features, 0 variable features)
## 2 dimensional reductions calculated: pca, tsne
# # SeuratDisk
# ann.seu <- AD2Seu(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad",
# method = "SeuratDisk", assay = "RNA", load.assays = c("RNA"))
# # scDIOR
# ann.scdior <- AD2Seu(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad",
# method = "scDIOR", assay = "RNA")
# # schard
# ann.schard <- AD2Seu(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad",
# method = "schard", assay = "RNA", use.raw = TRUE)
# # SeuratDisk+scDIOR
# ann.seuscdior <- AD2Seu(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad",
# method = "SeuratDisk+scDIOR", assay = "RNA", load.assays = c("RNA"))
loom to SeuratObject
The conversion is performed with functions implemented in SeuratDisk and Seurat:
# note: dimensional reductions are lost when converting through loom
seu.obj.loom <- ImportSeurat(loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom", from = "loom")
## Convert loom to SeuratObject!
## Reading in /matrix
## Storing /matrix as counts
## Saving /matrix to assay 'RNA'
## Loading graph RNA_snn
seu.obj.loom
## An object of class Seurat
## 230 features across 80 samples within 1 assay
## Active assay: RNA (230 features, 0 variable features)
Conversion between SingleCellExperiment and AnnData
SingleCellExperiment to AnnData
There are multiple tools available for format conversion from SingleCellExperiment to AnnData:
- zellkonverter is the best method in terms of information kept and running time
- scDIOR is the best method in terms of usability and disk usage
# zellkonverter
SCE2AD(
  sce.obj = seger, method = "zellkonverter",
  out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion", slot = "counts",
  conda.path = "/Applications/anaconda3"
)
## NULL
# # sceasy
# SCE2AD(sce.obj = seger, method = "sceasy", out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion",
# slot = "counts", conda.path = "/Applications/anaconda3")
#
# # scDIOR
# seger.scdior <- seger
# library(SingleCellExperiment)
# # scDIOR does not support varm in rowData
# rowData(seger.scdior)$varm <- NULL
# SCE2AD(sce.obj = seger.scdior, method = "scDIOR", out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion")
AnnData to SingleCellExperiment
There are multiple tools available for format conversion from AnnData to SingleCellExperiment:
- zellkonverter is the best method in terms of information kept
- schard is the best method in terms of usability and running time
- schard and scDIOR have comparable performance in disk usage
# zellkonverter
sce.zell <- AD2SCE(
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger_zellkonverter.h5ad",
  method = "zellkonverter", slot = "scale.data",
  use.raw = TRUE, conda.path = "/Applications/anaconda3"
)
## Warning in py_to_r.pandas.core.frame.DataFrame(adata$var): index contains
## duplicated values: row names not set
## Warning: The names of these selected obs columns have been modified to match R
## conventions: 'body mass index' -> 'body.mass.index', 'clinical information' ->
## 'clinical.information', 'single cell well quality' ->
## 'single.cell.well.quality', 'submitted single cell quality' ->
## 'submitted.single.cell.quality', and 'cell type' -> 'cell.type'
sce.zell
## class: SingleCellExperiment
## dim: 26179 3514
## metadata(1): X_name
## assays(1): scale.data
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5 eGFP
## rowData names(2): refseq symbol
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
## HP1526901T2D_A8
## colData names(9): individual sex ... submitted.single.cell.quality
## cell.type
## reducedDimNames(0):
## altExpNames(0):
# # scDIOR
# sce.scdior <- AD2SCE(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger_zellkonverter.h5ad",
# method = "scDIOR", assay = "RNA",
# use.raw = TRUE, conda.path = "/Applications/anaconda3")
# # schard
# sce.schard <- AD2SCE(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger_zellkonverter.h5ad",
# method = "schard", use.raw = TRUE)
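The recommendations stated in the four benchmarked sections of this vignette can be collected into a small lookup table, which makes it easier to choose a value for the method argument. The sketch below only restates those recommendations; the table layout and the helper pick_method are hypothetical and not part of GEfetch2R.

```r
# Recommendations as stated in this vignette, by direction and criterion.
# 'recommend' and 'pick_method' are hypothetical names used for this sketch.
recommend <- data.frame(
  from = c(rep("SeuratObject", 4), rep("AnnData", 4),
           rep("SingleCellExperiment", 4), rep("AnnData", 4)),
  to = c(rep("AnnData", 4), rep("SeuratObject", 4),
         rep("AnnData", 4), rep("SingleCellExperiment", 4)),
  criterion = c(
    "information kept", "usability", "running time", "disk usage",
    "information kept", "usability", "scalability", "disk usage",
    "information kept", "usability", "running time", "disk usage",
    "information kept", "usability", "running time", "disk usage"
  ),
  method = c(
    "scDIOR", "scDIOR", "sceasy", "sceasy",
    "scDIOR", "schard", "sceasy", "sceasy", # "SeuratDisk+scDIOR" combines two tools
    "zellkonverter", "scDIOR", "zellkonverter", "scDIOR",
    "zellkonverter", "schard", "schard", "schard/scDIOR" # last entry: comparable
  ),
  stringsAsFactors = FALSE
)

# Return the recommended method, or NA if the vignette gives no recommendation
pick_method <- function(from, to, criterion) {
  hit <- recommend$method[recommend$from == from & recommend$to == to &
                            recommend$criterion == criterion]
  if (length(hit) == 0) NA_character_ else hit
}

print(pick_method("SeuratObject", "AnnData", "running time"))
print(pick_method("AnnData", "SingleCellExperiment", "information kept"))
```

Directions that were not benchmarked (for example anything involving loom) return NA, which keeps the table honest about what the vignette actually compares.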
Conversion between SingleCellExperiment and loom
The conversion is performed with functions implemented in LoomExperiment.
SingleCellExperiment to loom
# remove an existing seger.loom before exporting
SCELoom(
  from = "SingleCellExperiment", to = "loom", sce = seger,
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.loom"
)
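The comment in the call above asks for an existing seger.loom to be removed before exporting. A minimal sketch of that step in base R follows; remove_if_exists is a hypothetical helper, and a temporary file stands in for the real loom file.

```r
# Delete a file if it is present; returns TRUE when the path is free afterwards.
# 'remove_if_exists' is a hypothetical helper written for this sketch.
remove_if_exists <- function(path) {
  if (file.exists(path)) {
    file.remove(path)
  }
  !file.exists(path)
}

# Demonstration with a temporary file standing in for seger.loom
loom.tmp <- file.path(tempdir(), "seger_demo.loom")
file.create(loom.tmp)
print(remove_if_exists(loom.tmp))
```

In the vignette's setting the helper would be called on the same path that is passed as loom.file, immediately before SCELoom().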
loom to SingleCellExperiment
seger.loom <- SCELoom(
  from = "loom", to = "SingleCellExperiment",
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.loom"
)
## Convert loom to SingleCellExperiment.
seger.loom
## class: SingleCellExperiment
## dim: 26179 3514
## metadata(0):
## assays(1): counts
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5.1 eGFP
## rowData names(2): refseq symbol
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
## HP1526901T2D_A8
## colData names(8): Source.Name age ... sex single.cell.well.quality
## reducedDimNames(0):
## altExpNames(0):
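In the output above, the row name BIVM-ERCC5 of the original object appears as BIVM-ERCC5.1 after the loom round trip. One plausible explanation, stated here as an assumption rather than documented behaviour of SCELoom or LoomExperiment, is that a gene symbol occurring more than once is made unique along the way. The base R sketch below shows how to check for duplicated names before converting and what make.unique() does to them; the vector of names is made up for illustration.

```r
# Illustration with made-up row names in which one symbol occurs twice.
genes <- c("SGIP1", "BIVM-ERCC5", "AZIN2", "BIVM-ERCC5", "eGFP")

# Are there duplicated names, and which ones are repeated?
print(anyDuplicated(genes) > 0)
print(genes[duplicated(genes)])

# make.unique() appends a numeric suffix to every repeated name
genes.unique <- make.unique(genes)
print(genes.unique)
```

Running anyDuplicated(rownames(seger)) before the conversion would show whether this situation applies to the test data.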