ObjectConversion
2023-07-30
ObjectConversion.Rmd
Introduction
Many tools have been developed to process scRNA-seq data, such as Scanpy, Seurat, scran and Monocle. Each tool has its own object type: AnnData for Scanpy, SeuratObject for Seurat, SingleCellExperiment for scran, and CellDataSet/cell_data_set for Monocle2/Monocle3. There are also file formats designed for large omics datasets, such as loom. A comprehensive scRNA-seq analysis usually combines several tools, so objects have to be converted frequently. To make this easier, scfetch provides functions that convert objects between widely used tools and formats. The object conversion implemented in scfetch has two main advantages:
- One-step conversion between different objects. No intermediate object is created, which prevents unnecessary information loss.
- Wherever possible, the tools used for conversion are developed by the team behind the source or destination object. For example, we use SeuratDisk to convert SeuratObject to loom and zellkonverter to convert between SingleCellExperiment and AnnData. When no such tool exists, we use sceasy.
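The pairing of each conversion with a backend package, as stated in the sections of this vignette, can be collected in a small lookup table. The sketch below only summarises what the text says, using base R. The names `conversion_backends` and `lookup_backend` are hypothetical and are not part of scfetch.

```r
# Backend package per conversion direction, as named in this vignette.
# `conversion_backends` and `lookup_backend()` are illustrative names only.
pairs <- matrix(c(
  "SeuratObject",         "SingleCellExperiment", "Seurat",
  "SeuratObject",         "CellDataSet",          "Seurat",
  "SeuratObject",         "cell_data_set",        "SeuratWrappers",
  "SeuratObject",         "AnnData",              "sceasy",
  "SeuratObject",         "loom",                 "SeuratDisk",
  "SingleCellExperiment", "SeuratObject",         "Seurat",
  "CellDataSet",          "SeuratObject",         "Seurat",
  "cell_data_set",        "SeuratObject",         "Seurat",
  "AnnData",              "SeuratObject",         "sceasy",
  "loom",                 "SeuratObject",         "SeuratDisk, Seurat",
  "SingleCellExperiment", "AnnData",              "zellkonverter",
  "AnnData",              "SingleCellExperiment", "zellkonverter",
  "SingleCellExperiment", "loom",                 "LoomExperiment",
  "loom",                 "SingleCellExperiment", "LoomExperiment"
), ncol = 3, byrow = TRUE, dimnames = list(NULL, c("from", "to", "backend")))
conversion_backends <- as.data.frame(pairs, stringsAsFactors = FALSE)

# Return the backend listed for one direction, or stop if none is listed.
lookup_backend <- function(from, to) {
  hit <- conversion_backends$from == from & conversion_backends$to == to
  if (!any(hit)) {
    stop("No conversion listed for ", from, " -> ", to)
  }
  conversion_backends$backend[hit]
}

lookup_backend("SeuratObject", "loom")
```

The table is directional on purpose: according to the text, SeuratObject to cell_data_set relies on SeuratWrappers, while the reverse direction relies on Seurat.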
Test data
# library
library(scfetch)
## Setting options('download.file.method.GEOquery'='auto')
## Setting options('GEOquery.inmemory.gpl'=FALSE)
## Registered S3 method overwritten by 'SeuratDisk':
## method from
## as.sparse.H5Group Seurat
## Attaching SeuratObject
library(scRNAseq) # seger
## Loading required package: SingleCellExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
## Warning: package 'BiocGenerics' was built under R version 4.0.5
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
##
## expand.grid
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Warning: package 'GenomeInfoDb' was built under R version 4.0.5
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
##
## Attaching package: 'SummarizedExperiment'
## The following object is masked from 'package:SeuratObject':
##
## Assays
## The following object is masked from 'package:Seurat':
##
## Assays
## Possible Ensembl SSL connectivity problems detected.
## Please see the 'Connection Troubleshooting' section of the biomaRt vignette
## vignette('accessing_ensembl', package = 'biomaRt')Error in curl::curl_fetch_memory(url, handle = handle) :
## SSL peer certificate or SSH remote key was not OK: [uswest.ensembl.org] SSL certificate problem: certificate has expired
SeuratObject:
# object
pbmc_small
## An object of class Seurat
## 230 features across 80 samples within 1 assay
## Active assay: RNA (230 features, 20 variable features)
## 2 dimensional reductions calculated: pca, tsne
# metadata
head(pbmc_small@meta.data)
## orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8
## ATGCCAGAACGACT SeuratProject 70 47 0
## CATGGCCTGTGCAT SeuratProject 85 52 0
## GAACCTGATGAACC SeuratProject 87 50 1
## TGACTGGATTCTCA SeuratProject 127 56 0
## AGTCAGACTGCACA SeuratProject 173 53 0
## TCTGATACACGTGT SeuratProject 70 48 0
## letter.idents groups RNA_snn_res.1
## ATGCCAGAACGACT A g2 0
## CATGGCCTGTGCAT A g1 0
## GAACCTGATGAACC B g2 0
## TGACTGGATTCTCA A g2 0
## AGTCAGACTGCACA A g2 0
## TCTGATACACGTGT A g1 0
SingleCellExperiment:
seger <- scRNAseq::SegerstolpePancreasData()
## snapshotDate(): 2020-10-27
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
## loading from cache
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
## loading from cache
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
## loading from cache
## snapshotDate(): 2020-10-27
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
## loading from cache
seger
## class: SingleCellExperiment
## dim: 26179 3514
## metadata(0):
## assays(1): counts
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5 eGFP
## rowData names(2): symbol refseq
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
## HP1526901T2D_A8
## colData names(8): Source Name individual ... age body mass index
## reducedDimNames(0):
## altExpNames(1): ERCC
Convert SeuratObject to other objects
Here, we will convert SeuratObject to SingleCellExperiment, CellDataSet/cell_data_set, AnnData and loom.
SeuratObject to SingleCellExperiment
The conversion is performed with functions implemented in Seurat:
sce.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", to = "SCE")
## Convert SeuratObject to SingleCellExperiment (suitable for scater)!
sce.obj
## class: SingleCellExperiment
## dim: 230 80
## metadata(0):
## assays(2): counts logcounts
## rownames(230): MS4A1 CD79B ... SPON2 S100B
## rowData names(5): vst.mean vst.variance vst.variance.expected
## vst.variance.standardized vst.variable
## colnames(80): ATGCCAGAACGACT CATGGCCTGTGCAT ... GGAACACTTCAGAC
## CTTGATTGATCTTC
## colData names(8): orig.ident nCount_RNA ... RNA_snn_res.1 ident
## reducedDimNames(2): PCA TSNE
## altExpNames(0):
SeuratObject to CellDataSet/cell_data_set
To CellDataSet (the conversion is performed with functions implemented in Seurat):
# BiocManager::install("monocle") # requires monocle
cds.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", reduction = "tsne", to = "CellDataSet")
## Convert SeuratObject to CellDataSet (suitable for Monocle)!
cds.obj
## CellDataSet (storageMode: environment)
## assayData: 230 features, 80 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: ATGCCAGAACGACT CATGGCCTGTGCAT ... CTTGATTGATCTTC (80
## total)
## varLabels: orig.ident nCount_RNA ... Size_Factor (8 total)
## varMetadata: labelDescription
## featureData
## featureNames: MS4A1 CD79B ... S100B (230 total)
## fvarLabels: vst.mean vst.variance ... gene_short_name (6 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
To cell_data_set (the conversion is performed with functions implemented in SeuratWrappers):
# remotes::install_github('cole-trapnell-lab/monocle3') # requires monocle3
cds3.obj <- ExportSeurat(seu.obj = pbmc_small, assay = "RNA", to = "cell_data_set")
## Convert SeuratObject to cell_data_set (suitable for Monocle3)!
## Warning: Monocle 3 trajectories require cluster partitions, which Seurat does
## not calculate. Please run 'cluster_cells' on your cell_data_set object
cds3.obj
## class: cell_data_set
## dim: 230 80
## metadata(0):
## assays(2): counts logcounts
## rownames(230): MS4A1 CD79B ... SPON2 S100B
## rowData names(5): vst.mean vst.variance vst.variance.expected
## vst.variance.standardized vst.variable
## colnames(80): ATGCCAGAACGACT CATGGCCTGTGCAT ... GGAACACTTCAGAC
## CTTGATTGATCTTC
## colData names(9): orig.ident nCount_RNA ... ident Size_Factor
## reducedDimNames(2): PCA TSNE
## altExpNames(0):
SeuratObject to AnnData
AnnData is a Python object, and reticulate is used to communicate between Python and R. Users should create a Python environment that contains the anndata package and specify the environment path with conda.path, so that exactly this environment is used.
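Before running a conversion that needs Python, it can help to confirm that the environment is usable. The sketch below is one possible pre-flight check and is not part of scfetch: `check_conda_env` is a hypothetical helper, and it assumes that the value given as conda.path is a directory that reticulate's `use_condaenv()` accepts. The reticulate calls run only when `check_module = TRUE` and the package is installed.

```r
# Hypothetical helper: fail early if the conda environment looks unusable.
check_conda_env <- function(conda.path, module = "anndata", check_module = FALSE) {
  # The path must at least exist on disk.
  if (!dir.exists(conda.path)) {
    stop("Conda environment path does not exist: ", conda.path)
  }
  if (check_module) {
    if (!requireNamespace("reticulate", quietly = TRUE)) {
      stop("Package 'reticulate' is needed to check Python modules")
    }
    # Bind reticulate to this environment, then look for the module.
    reticulate::use_condaenv(conda.path, required = TRUE)
    if (!reticulate::py_module_available(module)) {
      stop("Python module '", module, "' not found in ", conda.path)
    }
  }
  invisible(TRUE)
}

# Directory-only check; tempdir() stands in for a real environment path.
check_conda_env(tempdir())
```

With a real environment, calling `check_conda_env(conda.path, check_module = TRUE)` in a fresh R session would also verify that anndata can be found.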
The conversion is performed with functions implemented in sceasy:
# remove pbmc_small.h5ad first
ExportSeurat(
  seu.obj = pbmc_small, assay = "RNA", to = "AnnData", conda.path = "/Applications/anaconda3",
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.h5ad"
)
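The comment `# remove pbmc_small.h5ad first` implies that a stale output file should be deleted before exporting. A minimal base R sketch of that step follows. `fresh_output_path` is a hypothetical helper, and writing under `tempdir()` instead of the absolute path used above is an assumption made so that the example runs on any machine.

```r
# Hypothetical helper: build an output path and delete any existing file there.
fresh_output_path <- function(dir, file.name) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  out.file <- file.path(dir, file.name)
  if (file.exists(out.file)) {
    file.remove(out.file)
  }
  out.file
}

out.dir <- file.path(tempdir(), "conversion")
h5ad.file <- fresh_output_path(out.dir, "pbmc_small.h5ad")
# h5ad.file could then be passed as anndata.file to ExportSeurat()
```

The same helper would apply to the other file-based exports in this vignette, such as the loom files.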
SeuratObject to loom
The conversion is performed with functions implemented in SeuratDisk:
ExportSeurat(
  seu.obj = pbmc_small, assay = "RNA", to = "loom",
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom"
)
Convert other objects to SeuratObject
SingleCellExperiment to SeuratObject
The conversion is performed with functions implemented in Seurat:
seu.obj.sce <- ImportSeurat(obj = sce.obj, from = "SCE", count.assay = "counts", data.assay = "logcounts", assay = "RNA")
## Convert SingleCellExperiment to SeuratObject!
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from PC__ to PC_
## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to PC_
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from tSNE__ to tSNE_
## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to tSNE_
seu.obj.sce
## An object of class Seurat
## 230 features across 80 samples within 1 assay
## Active assay: RNA (230 features, 0 variable features)
## 2 dimensional reductions calculated: pca, tsne
CellDataSet/cell_data_set to SeuratObject
CellDataSet to SeuratObject (the conversion is performed with functions implemented in Seurat):
seu.obj.cds <- ImportSeurat(obj = cds.obj, from = "CellDataSet", count.assay = "counts", assay = "RNA")
## Convert CellDataSet (Monocle) to SeuratObject!
## Pulling expression data
## Building Seurat object
## Adding feature-level metadata
## No dispersion information in CellDataSet object
## No variable features present
## Adding tSNE dimensional reduction
seu.obj.cds
## An object of class Seurat
## 230 features across 80 samples within 1 assay
## Active assay: RNA (230 features, 0 variable features)
## 1 dimensional reduction calculated: tsne
cell_data_set to SeuratObject (the conversion is performed with functions implemented in Seurat):
seu.obj.cds3 <- ImportSeurat(obj = cds3.obj, from = "cell_data_set", count.assay = "counts", data.assay = "logcounts", assay = "RNA")
## Convert cell_data_set (Monocle3) to SeuratObject!
seu.obj.cds3
## An object of class Seurat
## 230 features across 80 samples within 1 assay
## Active assay: RNA (230 features, 0 variable features)
## 2 dimensional reductions calculated: pca, tsne
AnnData to SeuratObject
AnnData is a Python object, and reticulate is used to communicate between Python and R. Users should create a Python environment that contains the anndata package and specify the environment path with conda.path, so that exactly this environment is used.
The conversion is performed with functions implemented in sceasy:
seu.obj.h5ad <- ImportSeurat(
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.h5ad",
  from = "AnnData", assay = "RNA", conda.path = "/Applications/anaconda3"
)
## Convert AnnData to SeuratObject!
## X -> counts
seu.obj.h5ad
## An object of class Seurat
## 230 features across 80 samples within 1 assay
## Active assay: RNA (230 features, 0 variable features)
## 2 dimensional reductions calculated: pca, tsne
loom to SeuratObject
The conversion is performed with functions implemented in SeuratDisk and Seurat:
# note: dimensional reductions are lost when going through loom
seu.obj.loom <- ImportSeurat(loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/pbmc_small.loom", from = "loom")
## Convert loom to SeuratObject!
## Reading in /matrix
## Storing /matrix as counts
## Saving /matrix to assay 'RNA'
## Loading graph RNA_snn
seu.obj.loom
## An object of class Seurat
## 230 features across 80 samples within 1 assay
## Active assay: RNA (230 features, 0 variable features)
Conversion between SingleCellExperiment and AnnData
The conversion is performed with functions implemented in zellkonverter.
SingleCellExperiment to AnnData
# remove seger.h5ad first
SCEAnnData(
  from = "SingleCellExperiment", to = "AnnData", sce = seger, X_name = "counts",
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.h5ad"
)
AnnData to SingleCellExperiment
seger.anndata <- SCEAnnData(
  from = "AnnData", to = "SingleCellExperiment",
  anndata.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.h5ad"
)
## Convert AnnData to SingleCellExperiment.
seger.anndata
## class: SingleCellExperiment
## dim: 26179 3514
## metadata(0):
## assays(1): counts
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5 eGFP
## rowData names(2): symbol refseq
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
## HP1526901T2D_A8
## colData names(8): Source Name individual ... age body mass index
## reducedDimNames(0):
## altExpNames(0):
Conversion between SingleCellExperiment and loom
The conversion is performed with functions implemented in LoomExperiment.
SingleCellExperiment to loom
# remove seger.loom first
SCELoom(
  from = "SingleCellExperiment", to = "loom", sce = seger,
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.loom"
)
loom to SingleCellExperiment
seger.loom <- SCELoom(
  from = "loom", to = "SingleCellExperiment",
  loom.file = "/Users/soyabean/Desktop/tmp/scdown/conversion/seger.loom"
)
## Convert loom to SingleCellExperiment.
seger.loom
## class: SingleCellExperiment
## dim: 26179 3514
## metadata(0):
## assays(1): counts
## rownames(26179): SGIP1 AZIN2 ... BIVM-ERCC5.1 eGFP
## rowData names(2): refseq symbol
## colnames(3514): HP1502401_N13 HP1502401_D14 ... HP1526901T2D_O11
## HP1526901T2D_A8
## colData names(8): Source.Name age ... sex single.cell.well.quality
## reducedDimNames(0):
## altExpNames(0):
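Comparing this output with the original seger object shows that some names changed during the loom round trip, for example `Source Name` became `Source.Name`. The sketch below shows one way to list such differences with base R. `report_name_changes` is a hypothetical helper, and the two name vectors are short stand-ins copied from the printed output rather than extracted from the real objects. That the dots come from a `make.names()`-style sanitisation is an assumption; the output is merely consistent with it.

```r
# Hypothetical helper: names that disappeared and names that appeared.
report_name_changes <- function(before, after) {
  list(
    lost = setdiff(before, after),
    gained = setdiff(after, before)
  )
}

# Stand-in vectors taken from the colData names printed in this vignette.
before <- c("Source Name", "age")
after <- c("Source.Name", "age")

changes <- report_name_changes(before, after)
changes$lost
changes$gained

# make.names() replaces characters that are invalid in R names with dots,
# which reproduces the renamed column seen above.
identical(make.names(before), after)
```

On the real objects, the same helper could be applied to `colnames(colData(seger))` and `colnames(colData(seger.loom))`, or to the row names of both objects.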