ObjectConversion

Introduction

Many tools have been developed to process scRNA-seq data, such as Scanpy, Seurat, scran and Monocle. Each tool has its own object class: AnnData for Scanpy, SeuratObject for Seurat, SingleCellExperiment for scran, and CellDataSet/cell_data_set for Monocle2/Monocle3. There are also file formats designed for large omics datasets, such as loom. A comprehensive scRNA-seq analysis usually combines several of these tools, so objects have to be converted between formats frequently.

To facilitate user analysis of scRNA-seq data, GEfetch2R:

benchmarks the format conversion tools and provides guidance for tool selection under different scenarios (AnnData to SeuratObject, SeuratObject to AnnData, AnnData to SingleCellExperiment, SingleCellExperiment to AnnData)
provides multiple functions to perform format conversion between widely used scRNA-seq objects (SeuratObject, AnnData, SingleCellExperiment, CellDataSet/cell_data_set and loom)

Test data

# library
library(GEfetch2R)

## Setting options('download.file.method.GEOquery'='auto')

## Setting options('GEOquery.inmemory.gpl'=FALSE)

## Registered S3 method overwritten by 'SeuratDisk':
##   method            from  
##   as.sparse.H5Group Seurat

## Warning: replacing previous import 'LoomExperiment::import' by
## 'reticulate::import' when loading 'GEfetch2R'

## Registered S3 method overwritten by 'zellkonverter':
##   method                                             from      
##   py_to_r.pandas.core.arrays.categorical.Categorical reticulate

library(Seurat) # pbmc_small

## Attaching SeuratObject

# library(scRNAseq) # seger

SeuratObject:

# object
pbmc_small

## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 20 variable features)
##  2 dimensional reductions calculated: pca, tsne

# metadata
head(pbmc_small@meta.data)

##                   orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8
## ATGCCAGAACGACT SeuratProject         70           47               0
## CATGGCCTGTGCAT SeuratProject         85           52               0
## GAACCTGATGAACC SeuratProject         87           50               1
## TGACTGGATTCTCA SeuratProject        127           56               0
## AGTCAGACTGCACA SeuratProject        173           53               0
## TCTGATACACGTGT SeuratProject         70           48               0
##                letter.idents groups RNA_snn_res.1
## ATGCCAGAACGACT             A     g2             0
## CATGGCCTGTGCAT             A     g1             0
## GAACCTGATGAACC             B     g2             0
## TGACTGGATTCTCA             A     g2             0
## AGTCAGACTGCACA             A     g2             0
## TCTGATACACGTGT             A     g1             0

SingleCellExperiment:

# seger <- scRNAseq::SegerstolpePancreasData()
# load from local
seger <- readRDS("/Users/soyabean/Desktop/tmp/scdown/conversion/seger.rds")
seger

## class: SingleCellExperiment 
## dim: 26179 3514 
## metadata(0):
## assays(1): counts
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5 eGFP
## rowData names(2): refseq symbol
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
##   HP1526901T2D_A8
## colData names(9): individual sex ... submitted single cell quality cell
##   type
## reducedDimNames(0):
## altExpNames(1): ERCC

Convert SeuratObject to other objects

Here, we will convert a SeuratObject to SingleCellExperiment, CellDataSet/cell_data_set, AnnData, and loom.

SeuratObject to SingleCellExperiment

The conversion is performed with functions implemented in Seurat:

sce.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", to = "SCE")

## Convert SeuratObject to SingleCellExperiment (suitable for scater)!

sce.obj

## class: SingleCellExperiment 
## dim: 230 80 
## metadata(0):
## assays(2): counts logcounts
## rownames(230): MS4A1 CD79B ... SPON2 S100B
## rowData names(5): vst.mean vst.variance vst.variance.expected
##   vst.variance.standardized vst.variable
## colnames(80): ATGCCAGAACGACT CATGGCCTGTGCAT ... GGAACACTTCAGAC
##   CTTGATTGATCTTC
## colData names(8): orig.ident nCount_RNA ... RNA_snn_res.1 ident
## reducedDimNames(2): PCA TSNE
## altExpNames(0):

SeuratObject to CellDataSet/cell_data_set

To CellDataSet (the conversion is performed with functions implemented in Seurat):

# BiocManager::install("monocle") # requires monocle
cds.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", reduction = "tsne", to = "CellDataSet")

## Convert SeuratObject to CellDataSet (suitable for Monocle)!

cds.obj

## CellDataSet (storageMode: environment)
## assayData: 230 features, 80 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: ATGCCAGAACGACT CATGGCCTGTGCAT ... CTTGATTGATCTTC (80
##     total)
##   varLabels: orig.ident nCount_RNA ... Size_Factor (8 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: MS4A1 CD79B ... S100B (230 total)
##   fvarLabels: vst.mean vst.variance ... gene_short_name (6 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:

To cell_data_set (the conversion is performed with functions implemented in SeuratWrappers):

# remotes::install_github('cole-trapnell-lab/monocle3') # requires monocle3
cds3.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", to = "cell_data_set")
cds3.obj

SeuratObject to AnnData

There are multiple tools available for format conversion from SeuratObject to AnnData:

scDIOR is the best method in terms of information kept and usability
sceasy has the best performance in running time and disk usage

# sceasy
Seu2AD(
  seu.obj = pbmc_small, method = "sceasy", out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion",
  assay = "RNA", slot = "counts", conda.path = "/Applications/anaconda3"
)

## AnnData object with n_obs × n_vars = 80 × 230
##     obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'RNA_snn_res.0.8', 'letter.idents', 'groups', 'RNA_snn_res.1'
##     var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
##     obsm: 'X_pca', 'X_tsne'

# # SeuratDisk
# Seu2AD(seu.obj = pbmc_small, method = "SeuratDisk", out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion",
#        assay = "RNA", save.scale = TRUE)
# # scDIOR
# Seu2AD(seu.obj = pbmc_small, method = "scDIOR",
#        out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion", assay = "RNA", save.scale = TRUE)

SeuratObject to loom

The conversion is performed with functions implemented in SeuratDisk:

ExportSeurat(
  seu.obj = pbmc_small, assay = "RNA", to = "loom",
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom"
)

## Convert SeuratObject to loom!

## Warning: Overwriting previous file
## /Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom

## Saving data from RNA as /matrix

## Adding slot counts for assay RNA

## Adding layer counts

## Adding col attribute CellID

## Adding col attribute orig.ident

## Adding col attribute nCount_RNA

## Adding col attribute nFeature_RNA

## Adding col attribute RNA_snn_res.0.8

## Adding col attribute letter.idents

## Adding col attribute groups

## Adding col attribute RNA_snn_res.1

## Adding row attribute Gene

Convert other objects to SeuratObject

SingleCellExperiment to SeuratObject

The conversion is performed with functions implemented in Seurat:

seu.obj.sce <- ImportSeurat(
  obj = sce.obj, from = "SCE", count.assay = "counts",
  data.assay = "logcounts", assay = "RNA"
)

## Convert SingleCellExperiment to SeuratObject!

## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from PC__ to PC_

## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to PC_

## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from tSNE__ to tSNE_

## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to tSNE_

seu.obj.sce

## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)
##  2 dimensional reductions calculated: pca, tsne
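
The printed summary shows that the round trip kept 230 features and 80 samples, while the variable-feature count dropped from 20 to 0. A quick programmatic comparison can confirm which parts survived. The helper below is a minimal sketch and not part of GEfetch2R: `same_shape_and_names` is a hypothetical name, and small matrices stand in for the real objects. It uses only `dim()`, `rownames()` and `colnames()`, which Seurat and SingleCellExperiment objects also support, so the same call should work on `pbmc_small` and `seu.obj.sce`.

```r
# Hypothetical helper (not a GEfetch2R function): compare the shape and the
# feature/cell names of an object before and after a conversion round trip.
same_shape_and_names <- function(before, after) {
  c(
    dims     = identical(dim(before), dim(after)),
    features = identical(rownames(before), rownames(after)),
    cells    = identical(colnames(before), colnames(after))
  )
}

# Toy stand-ins for the original and the round-tripped object.
original <- matrix(
  0,
  nrow = 3, ncol = 2,
  dimnames = list(
    c("MS4A1", "CD79B", "S100B"),
    c("ATGCCAGAACGACT", "CATGGCCTGTGCAT")
  )
)
round_tripped <- original

# Same dimensions, same names in the same order.
check_same <- same_shape_and_names(original, round_tripped)

# Reordering the cells keeps the dimensions but changes the cell names.
check_reordered <- same_shape_and_names(original, original[, 2:1])

print(check_same)
print(check_reordered)
```

With the real objects the call would be `same_shape_and_names(pbmc_small, seu.obj.sce)`. Metadata columns and reductions need separate checks, because this helper only looks at dimensions and names.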

CellDataSet/cell_data_set to SeuratObject

CellDataSet to SeuratObject (the conversion is performed with functions implemented in Seurat):

seu.obj.cds <- ImportSeurat(
  obj = cds.obj, from = "CellDataSet",
  count.assay = "counts", assay = "RNA"
)

## Convert CellDataSet (Monocle) to SeuratObject!

## Pulling expression data

## Building Seurat object

## Adding feature-level metadata

## No dispersion information in CellDataSet object

## No variable features present

## Adding tSNE dimensional reduction

seu.obj.cds

## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)
##  1 dimensional reduction calculated: tsne

cell_data_set to SeuratObject (the conversion is performed with functions implemented in Seurat):

seu.obj.cds3 <- ImportSeurat(
  obj = cds3.obj, from = "cell_data_set",
  count.assay = "counts", data.assay = "logcounts", assay = "RNA"
)
seu.obj.cds3

AnnData to SeuratObject

There are multiple tools available for format conversion from AnnData to SeuratObject:

scDIOR is the best method in terms of information kept (GEfetch2R combines scDIOR and SeuratDisk to maximize the information kept)
schard is the best method in terms of usability
schard and sceasy perform comparably when the cell number is below 200k, but sceasy has better scalability
sceasy has better performance in disk usage

# sceasy
ann.sceasy <- AD2Seu(
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad", method = "sceasy",
  assay = "RNA", slot = "scale.data"
)

## X -> data

ann.sceasy

## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)
##  2 dimensional reductions calculated: pca, tsne

# # SeuratDisk
# ann.seu <- AD2Seu(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad",
#                   method = "SeuratDisk", assay = "RNA", load.assays = c("RNA"))
# # scDIOR
# ann.scdior <- AD2Seu(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad",
#                      method = "scDIOR", assay = "RNA")
# # schard
# ann.schard <- AD2Seu(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad",
#                      method = "schard", assay = "RNA", use.raw = TRUE)
# # SeuratDisk+scDIOR
# ann.seuscdior <- AD2Seu(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small_sceasy.h5ad",
#                         method = "SeuratDisk+scDIOR", assay = "RNA", load.assays = c("RNA"))

loom to SeuratObject

The conversion is performed with functions implemented in SeuratDisk and Seurat:

# dimensional reductions are lost when going through loom
seu.obj.loom <- ImportSeurat(loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom", from = "loom")

## Convert loom to SeuratObject!

## Reading in /matrix

## Storing /matrix as counts

## Saving /matrix to assay 'RNA'

## Loading graph RNA_snn

seu.obj.loom

## An object of class Seurat 
## 230 features across 80 samples within 1 assay 
## Active assay: RNA (230 features, 0 variable features)

Conversion between SingleCellExperiment and AnnData

SingleCellExperiment to AnnData

There are multiple tools available for format conversion from SingleCellExperiment to AnnData:

zellkonverter is the best method in terms of information kept and running time
scDIOR is the best method in terms of usability and disk usage

# zellkonverter
SCE2AD(
  sce.obj = seger, method = "zellkonverter",
  out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion", slot = "counts",
  conda.path = "/Applications/anaconda3"
)

## NULL

# # sceasy
# SCE2AD(sce.obj = seger, method = "sceasy", out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion",
#        slot = "counts", conda.path = "/Applications/anaconda3")
#
# # scDIOR
# seger.scdior <- seger
# library(SingleCellExperiment)
# # scDIOR does not support varm in rowData
# rowData(seger.scdior)$varm <- NULL
# SCE2AD(sce.obj = seger.scdior, method = "scDIOR", out.folder = "/Users/soyabean/Desktop/tmp/scdown/conversion")

AnnData to SingleCellExperiment

There are multiple tools available for format conversion from AnnData to SingleCellExperiment:

zellkonverter is the best method in terms of information kept
schard is the best method in terms of usability and running time
schard and scDIOR have comparable performance in disk usage

# zellkonverter
sce.zell <- AD2SCE(
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger_zellkonverter.h5ad",
  method = "zellkonverter", slot = "scale.data",
  use.raw = TRUE, conda.path = "/Applications/anaconda3"
)

## Warning in py_to_r.pandas.core.frame.DataFrame(adata$var): index contains
## duplicated values: row names not set

## Warning: The names of these selected obs columns have been modified to match R
## conventions: 'body mass index' -> 'body.mass.index', 'clinical information' ->
## 'clinical.information', 'single cell well quality' ->
## 'single.cell.well.quality', 'submitted single cell quality' ->
## 'submitted.single.cell.quality', and 'cell type' -> 'cell.type'

sce.zell

## class: SingleCellExperiment 
## dim: 26179 3514 
## metadata(1): X_name
## assays(1): scale.data
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5 eGFP
## rowData names(2): refseq symbol
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
##   HP1526901T2D_A8
## colData names(9): individual sex ... submitted.single.cell.quality
##   cell.type
## reducedDimNames(0):
## altExpNames(0):
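
The warning above reports that five obs column names were changed to match R conventions. For these particular names, base R's `make.names()` produces the same mapping, which makes it a convenient way to predict how column names will look after import. This is a sketch under an assumption: that the renaming behaves like `make.names()` is inferred from the warning text, not confirmed from the zellkonverter source.

```r
# Column names as they appear in the AnnData obs table (taken from the warning).
obs_names <- c(
  "body mass index",
  "clinical information",
  "single cell well quality",
  "submitted single cell quality",
  "cell type"
)

# make.names() turns each name into a syntactically valid R name,
# replacing spaces with dots.
r_names <- make.names(obs_names)

# Side-by-side view of the original and converted names.
name_map <- data.frame(original = obs_names, converted = r_names)
print(name_map)
```

Knowing the converted names in advance helps when downstream code refers to columns such as `cell.type` rather than `cell type`.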

# # scDIOR
# sce.scdior <- AD2SCE(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger_zellkonverter.h5ad",
#                      method = "scDIOR", assay = "RNA",
#                      use.raw = TRUE, conda.path = "/Applications/anaconda3")
# # schard
# sce.schard <- AD2SCE(anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger_zellkonverter.h5ad",
#                      method = "schard", use.raw = TRUE)
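
The benchmark recommendations listed in the four AnnData sections above can be collected into one lookup table, which makes it easier to pick a `method` value for a given priority. The sketch below only transcribes the bullet points of this document. The table `recommendations` and the helper `recommend_method` are hypothetical names and not part of GEfetch2R.

```r
# Hypothetical lookup table, transcribed from the recommendation lists above.
recommendations <- data.frame(
  from = c(
    rep("SeuratObject", 4), rep("AnnData", 4),
    rep("SingleCellExperiment", 4), rep("AnnData", 4)
  ),
  to = c(
    rep("AnnData", 4), rep("SeuratObject", 4),
    rep("AnnData", 4), rep("SingleCellExperiment", 4)
  ),
  criterion = c(
    "information kept", "usability", "running time", "disk usage",
    "information kept", "usability", "scalability", "disk usage",
    "information kept", "running time", "usability", "disk usage",
    "information kept", "usability", "running time", "disk usage"
  ),
  method = c(
    "scDIOR", "scDIOR", "sceasy", "sceasy",
    "scDIOR", "schard", "sceasy", "sceasy",
    "zellkonverter", "zellkonverter", "scDIOR", "scDIOR",
    "zellkonverter", "schard", "schard", "schard or scDIOR"
  ),
  stringsAsFactors = FALSE
)

# Return the recommended method for one conversion direction and criterion,
# or NA if the document gives no recommendation for that combination.
recommend_method <- function(from, to, criterion) {
  hit <- recommendations$method[
    recommendations$from == from &
      recommendations$to == to &
      recommendations$criterion == criterion
  ]
  if (length(hit) == 0) NA_character_ else hit
}

print(recommend_method("SeuratObject", "AnnData", "running time"))
print(recommend_method("AnnData", "SingleCellExperiment", "information kept"))
```

The returned string can be passed as the `method` argument of `Seu2AD`, `AD2Seu`, `SCE2AD` or `AD2SCE`, except for the combined "schard or scDIOR" entry, where the document reports comparable disk usage and the choice is left to the user.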

Conversion between SingleCellExperiment and loom

The conversion is performed with functions implemented in LoomExperiment.

SingleCellExperiment to loom

# remove any existing seger.loom first
SCELoom(
  from = "SingleCellExperiment", to = "loom", sce = seger,
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.loom"
)
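
The comment in the call above asks for an existing `seger.loom` to be removed before exporting. That step can be scripted in base R so that reruns do not depend on manual cleanup. The following is a minimal sketch: the path under `tempdir()` is a placeholder for illustration, and the `file.create()` call only simulates a file left over from an earlier run.

```r
# Placeholder output location; replace with your own loom path.
loom.file <- file.path(tempdir(), "seger.loom")

# Stand-in for a loom file left over from an earlier run.
invisible(file.create(loom.file))

# Delete the stale file so that the export starts from a clean path.
if (file.exists(loom.file)) {
  invisible(file.remove(loom.file))
}

# FALSE means the path is free and the export can be run.
print(file.exists(loom.file))
```

After this check, the same `loom.file` value can be passed to `SCELoom`.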

loom to SingleCellExperiment

seger.loom <- SCELoom(
  from = "loom", to = "SingleCellExperiment",
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.loom"
)

## Convert loom to SingleCellExperiment.

seger.loom

## class: SingleCellExperiment 
## dim: 26179 3514 
## metadata(0):
## assays(1): counts
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5.1 eGFP
## rowData names(2): refseq symbol
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
##   HP1526901T2D_A8
## colData names(8): Source.Name age ... sex single.cell.well.quality
## reducedDimNames(0):
## altExpNames(0):
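
Note that the row names printed after the loom round trip end in `BIVM-ERCC5.1`, while the original object shows `BIVM-ERCC5`. One plausible explanation, given the earlier warning about duplicated index values, is that a repeated gene name was made unique during import. This is an assumption rather than documented behaviour of LoomExperiment. The sketch below uses base R's `make.unique()` on a toy vector to show how such a suffix arises.

```r
# Toy gene names with one duplicate (illustrative only, not the real rownames).
gene_names <- c("SGIP1", "BIVM-ERCC5", "BIVM-ERCC5", "eGFP")

# Report which names occur more than once.
dup_names <- unique(gene_names[duplicated(gene_names)])

# make.unique() keeps the first occurrence and appends .1, .2, ... to repeats.
unique_names <- make.unique(gene_names)

print(dup_names)
print(unique_names)
```

Running `anyDuplicated(rownames(seger))` on the original object is a quick way to check whether duplicates are present before converting.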

2023-07-30