Introduction

GEfetch2R provides functions to download processed single-cell RNA-seq data from GEO, Zenodo, CELLxGENE and Human Cell Atlas, including files in rds, RData, h5ad, h5 and loom formats.

The public resources currently supported, and the values returned for each, are:

| Resource | URL | Download type | Returned value |
| --- | --- | --- | --- |
| GEO | https://www.ncbi.nlm.nih.gov/geo/ | rds, RData, h5ad, loom | SeuratObject (rds) or failed datasets |
| Zenodo | https://zenodo.org/ | count matrix, rds, RData, h5ad, etc. | SeuratObject (rds) or failed datasets |
| CELLxGENE | https://cellxgene.cziscience.com/ | rds, h5ad | SeuratObject (rds) or failed datasets |
| Human Cell Atlas | https://www.humancellatlas.org/ | rds, RData, h5, h5ad, loom | SeuratObject (rds) or failed projects |

Check API

Check the availability of APIs used:

CheckAPI(database = c("GEO", "Zenodo", "CELLxGENE", "Human Cell Atlas"))
# start checking APIs to access GEO!
# The API to access the GEO object is OK!
# trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE302nnn/GSE302912/suppl//GSE302912_counts.csv.gz?tool=geoquery'
# Content type 'application/x-gzip' length 332706 bytes (324 KB)
# ==================================================
# downloaded 324 KB
#
# The API to access supplementary files is OK!
# start checking APIs to access Zenodo!
# The API to access detailed information of a given doi is OK!
# The API to access available files is OK!
# start checking APIs to access CELLxGENE!
# The API to access all available collections is OK!
# The API to access detailed information of a given collection is OK!
# The API to access available files is OK!
# The API to access detailed information of a given dataset is OK!
# start checking APIs to access Human Cell Atlas!
# The API to access all available catalogs is OK!
# The API to access all available projects is OK!
# The API to access available files is OK!
# The API to access detailed information of a given project is OK!

GEO

GEO is an international public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data submitted by the research community. It provides a convenient way for users to explore and select scRNA-seq datasets of interest. In addition to count matrices, GEO now includes processed objects uploaded by users as supplementary files.

Extract metadata (optional)

ExtractGEOMeta provides two ways to extract sample metadata:

  • user-provided sample metadata supplied when uploading to GEO (applicable to all GEO accessions), including sample title, source name/tissue, description, cell type, treatment, paper title, paper abstract, organism, protocol, data processing methods, etc.:
# library
library(tidyverse)
library(GEfetch2R)

# set VROOM_CONNECTION_SIZE to avoid error: Error: The size of the connection buffer (786432) was not large enough
Sys.setenv("VROOM_CONNECTION_SIZE" = 131072 * 60)
# extract metadata
GSE285723.meta <- ExtractGEOMeta(acce = "GSE285723")
GSE285723.meta[1:3, c("title", "geo_accession", "source_name_ch1", "description", "cell type")]
#     title geo_accession source_name_ch1           description  cell type
# 1   DRA-F    GSM8707297            Lung   Library name: DRA-F Homogenate
# 2   DRA-M    GSM8707298            Lung   Library name: DRA-M Homogenate
# 3 DRAO3-F    GSM8707299            Lung Library name: DRAO3-F Homogenate
  • metadata in supplementary file:
# example
GSE297431.meta.supp <- ExtractGEOMeta(
  acce = "GSE297431", down.supp = TRUE,
  supp.idx = 2 # specify the index of used supplementary file
)
head(GSE297431.meta.supp)
#             Sample_ID  Batch  Plate   Type Growths Class
# 1   Plate1_mut_A1_S70 batch2 Plate1 mutant       3    M3
# 2 Plate1_mut_A11_S135 batch2 Plate1 mutant       3    M3
# 3 Plate1_mut_A12_S141 batch2 Plate1 mutant       1    M1
# 4   Plate1_mut_A2_S77 batch2 Plate1 mutant       1    M1
# 5   Plate1_mut_A3_S84 batch2 Plate1 mutant       2    M2
# 6   Plate1_mut_A4_S91 batch2 Plate1 mutant       2    M2

Download object and load to R

After downloading the metadata, users can download the specified objects with ParseGEOProcessed. The format of the downloaded objects is controlled by file.ext (choose from "rds", "rdata", "h5ad" and "loom"); the formats must be given in lower case.

The processed objects in the supplementary files are in two forms:

  • a single file (rds(.gz)/rdata(.gz)/h5ad(.gz)/loom(.gz)) containing the processed object, e.g. GSE285723-RDS.gz
  • a (gzip) archive containing rds(.gz)/rdata(.gz)/h5ad(.gz)/loom(.gz) files with the processed objects, e.g. GSE298041-RDS in tar.gz
# return SeuratObject
GSE285723.seu <- ParseGEOProcessed(
  acce = "GSE285723", supp.idx = 1,
  file.ext = c("rdata", "rds"), return.seu = TRUE, timeout = 36000000,
  out.folder = "~/gefetch2r/doc/download_geoObj"
)
GSE285723.seu
# An object of class Seurat
# 56055 features across 49806 samples within 2 assays
# Active assay: SCT (23770 features, 3000 variable features)
#  1 other assay present: RNA
#  4 dimensional reductions calculated: pca, harmony, umap, tsne

# download h5ad objects
GSE311813.h5ad.log <- ParseGEOProcessed(
  acce = "GSE311813", supp.idx = 1,
  file.ext = c("h5ad"),
  out.folder = "~/gefetch2r/doc/download_geoObj"
)

# # The structure of downloaded files
# tree ~/gefetch2r/doc/download_geoObj
# ~/gefetch2r/doc/download_geoObj
# ├── GSE285723
# │   └── GSE285723_Final_Ballinger.RDS
# └── GSE311813
#     └── GSM9332642_merged_raw.ARS.vivo.clean.labelled.h5ad

# 2 directories, 2 files
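
The way file.ext selects supplementary files can be pictured as a case-insensitive test on the end of each file name, with an optional .gz suffix. The sketch below is only an illustration under that assumption: match_file_ext is a hypothetical helper, not a GEfetch2R function, and the package's internal matching may differ. The file names are taken from the examples above.

```r
# Hypothetical helper (not part of GEfetch2R): test whether file names end in
# one of the requested extensions, ignoring case and an optional ".gz" suffix.
match_file_ext <- function(files, file.ext = c("rds", "rdata", "h5ad", "loom")) {
  file.ext <- tolower(file.ext)
  # Build a pattern such as "\\.(rdata|rds)(\\.gz)?$"
  pattern <- paste0("\\.(", paste(file.ext, collapse = "|"), ")(\\.gz)?$")
  grepl(pattern, files, ignore.case = TRUE)
}

supp.files <- c(
  "GSE285723_Final_Ballinger.RDS",
  "GSM9332642_merged_raw.ARS.vivo.clean.labelled.h5ad",
  "GSE302912_counts.csv.gz"
)
matched <- match_file_ext(supp.files, file.ext = c("rdata", "rds"))
supp.files[matched]
# [1] "GSE285723_Final_Ballinger.RDS"
```

Requesting the extension in lower case while matching case-insensitively is what lets "rds" pick up a file named .RDS.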

Zenodo

Zenodo hosts various types of processed objects, such as SeuratObjects that have been clustered and annotated, and AnnData objects that contain processed results generated by scanpy.

Extract metadata (optional)

GEfetch2R provides ExtractZenodoMeta to extract dataset metadata, including dataset title, description, available files and the corresponding md5 checksums. Please note that when access to a dataset is restricted, the returned data frame will be empty.

# single doi
zebrafish.df <- ExtractZenodoMeta(doi = "10.5281/zenodo.7243603")
zebrafish.df
#                              title
# 1 zebrafish scRNA data set objects
# 2 zebrafish scRNA data set objects
#                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                description
# 1 <p>Combined and converted scRNA data from http://tome.gs.washington.edu/ (Qiu et al. 2022), see a detailed description of the study here: https://www.nature.com/articles/s41588-022-01018-x</p>\n\n<p>Data were downloaded from http://tome.gs.washington.edu/ as R rds files, combined into a single Seurat object and converted into loom and AnnData (h5ad) files to be able to analyse with e.g. python scanpy package.</p>\n\n<p>If you use this data, please cite Farrel et al. 2018, Wagner et al. 2018 and Qiu et al. 2022.</p>
# 2 <p>Combined and converted scRNA data from http://tome.gs.washington.edu/ (Qiu et al. 2022), see a detailed description of the study here: https://www.nature.com/articles/s41588-022-01018-x</p>\n\n<p>Data were downloaded from http://tome.gs.washington.edu/ as R rds files, combined into a single Seurat object and converted into loom and AnnData (h5ad) files to be able to analyse with e.g. python scanpy package.</p>\n\n<p>If you use this data, please cite Farrel et al. 2018, Wagner et al. 2018 and Qiu et al. 2022.</p>
#                                                                         url             filename
# 1  https://zenodo.org/api/records/7243603/files/zebrafish_data.h5ad/content  zebrafish_data.h5ad
# 2 https://zenodo.org/api/records/7243603/files/zebrafish_data.RData/content zebrafish_data.RData
#                                md5   license
# 1 124f2229128918b411a7dc7931558f97 cc-by-4.0
# 2 a08c3ebd285b370fcf34cf2f8f9bdb59 cc-by-4.0


# vector dois
multi.dois <- ExtractZenodoMeta(doi = c("1111", "10.5281/zenodo.7243603", "10.5281/zenodo.7244441"))
multi.dois
#                              title
# 1 zebrafish scRNA data set objects
# 2 zebrafish scRNA data set objects
# 3      frog scRNA data set objects
# 4      frog scRNA data set objects
#                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                description
# 1 <p>Combined and converted scRNA data from http://tome.gs.washington.edu/ (Qiu et al. 2022), see a detailed description of the study here: https://www.nature.com/articles/s41588-022-01018-x</p>\n\n<p>Data were downloaded from http://tome.gs.washington.edu/ as R rds files, combined into a single Seurat object and converted into loom and AnnData (h5ad) files to be able to analyse with e.g. python scanpy package.</p>\n\n<p>If you use this data, please cite Farrel et al. 2018, Wagner et al. 2018 and Qiu et al. 2022.</p>
# 2 <p>Combined and converted scRNA data from http://tome.gs.washington.edu/ (Qiu et al. 2022), see a detailed description of the study here: https://www.nature.com/articles/s41588-022-01018-x</p>\n\n<p>Data were downloaded from http://tome.gs.washington.edu/ as R rds files, combined into a single Seurat object and converted into loom and AnnData (h5ad) files to be able to analyse with e.g. python scanpy package.</p>\n\n<p>If you use this data, please cite Farrel et al. 2018, Wagner et al. 2018 and Qiu et al. 2022.</p>
# 3                     <p>Combined and converted scRNA data from http://tome.gs.washington.edu/ (Qiu et al. 2022), see a detailed description of the study here: https://www.nature.com/articles/s41588-022-01018-x</p>\n\n<p>Data were downloaded from http://tome.gs.washington.edu/ as R rds files, combined into a single Seurat object and converted into loom and AnnData (h5ad) files to be able to analyse with e.g. python scanpy package.</p>\n\n<p>If you use this data, please cite Briggs et al. 2018 and Qiu et al. 2022.</p>
# 4                     <p>Combined and converted scRNA data from http://tome.gs.washington.edu/ (Qiu et al. 2022), see a detailed description of the study here: https://www.nature.com/articles/s41588-022-01018-x</p>\n\n<p>Data were downloaded from http://tome.gs.washington.edu/ as R rds files, combined into a single Seurat object and converted into loom and AnnData (h5ad) files to be able to analyse with e.g. python scanpy package.</p>\n\n<p>If you use this data, please cite Briggs et al. 2018 and Qiu et al. 2022.</p>
#                                                                         url             filename
# 1  https://zenodo.org/api/records/7243603/files/zebrafish_data.h5ad/content  zebrafish_data.h5ad
# 2 https://zenodo.org/api/records/7243603/files/zebrafish_data.RData/content zebrafish_data.RData
# 3       https://zenodo.org/api/records/7244441/files/frog_data.h5ad/content       frog_data.h5ad
# 4      https://zenodo.org/api/records/7244441/files/frog_data.RData/content      frog_data.RData
#                                md5   license
# 1 124f2229128918b411a7dc7931558f97 cc-by-4.0
# 2 a08c3ebd285b370fcf34cf2f8f9bdb59 cc-by-4.0
# 3 7be7d6ff024ab2c8579b4d0edb2428e3 cc-by-4.0
# 4 c80f46320c0cff9e341bed195f12c3b1 cc-by-4.0
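
The md5 column makes it possible to check a downloaded file for corruption. The following is a minimal sketch using tools::md5sum from base R; verify_md5 is a hypothetical helper (not part of GEfetch2R), and a temporary file stands in for a real download so that the example runs offline.

```r
# Hypothetical helper: compare a local file's MD5 checksum with an expected value.
# tools::md5sum returns NA for missing or unreadable files, which counts as a mismatch.
verify_md5 <- function(path, expected.md5) {
  observed <- unname(tools::md5sum(path))
  !is.na(observed) && identical(tolower(observed), tolower(expected.md5))
}

# A temporary file stands in for a downloaded object
tmp.file <- tempfile(fileext = ".txt")
writeLines("demo content", tmp.file)
# In practice the expected value would come from the md5 column of the metadata
expected <- unname(tools::md5sum(tmp.file))

verify_md5(tmp.file, expected)
verify_md5(tmp.file, "00000000000000000000000000000000")
```

With real data, the expected checksum would be taken from the row of the metadata whose filename matches the downloaded file.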

Download object and load to R

After manually checking the extracted metadata, users can download the specified objects with ParseZenodo. The format of the downloaded objects is controlled by file.ext; the formats must be given in lower case.

The returned value is a data frame containing the failed objects, or a SeuratObject (if file.ext is rds and return.seu = TRUE). If a data frame is returned, users can re-run ParseZenodo with doi.df set to the returned value.

# download objects
multi.dois.parse <- ParseZenodo(
  doi = c("1111", "10.5281/zenodo.7243603", "10.5281/zenodo.7244441"),
  file.ext = c("rdata"), timeout = 36000000,
  out.folder = "~/gefetch2r/doc/download_zenodo"
)

# return SeuratObject
single.doi.parse.seu <- ParseZenodo(
  doi = "10.5281/zenodo.8011282",
  file.ext = c("rds"), return.seu = TRUE, timeout = 36000000,
  out.folder = "~/gefetch2r/doc/download_zenodo"
)
single.doi.parse.seu
# An object of class Seurat
# 19594 features across 9219 samples within 2 assays
# Active assay: RNA (17594 features, 0 variable features)
#  1 other assay present: integrated
#  2 dimensional reductions calculated: pca, umap

# # The structure of downloaded files
# tree ~/gefetch2r/doc/download_zenodo
# ~/gefetch2r/doc/download_zenodo
# ├── frog_data.RData
# ├── PyMTM_immune_scRNA.rds
# └── zebrafish_data.RData

CELLxGENE

CELLxGENE is a web server that hosts 2043 single-cell datasets; users can explore and download them, and upload their own. Datasets are provided for download in two formats: h5ad (AnnData v0.8) and rds (Seurat v4).

CELLxGENE no longer supports downloading SeuratObject in versions after 2025. Fortunately, we downloaded all the CELLxGENE datasets in May 2024 and stored them in all.cellxgene.datasets.rds, which contains the SeuratObject entries for downloading. However, this does not apply to the "Given dataset" case described next.

CELLxGENE provides an R package (cellxgene.census) to access the data, but it is not always updated promptly. GEfetch2R also lets users access CELLxGENE via cellxgene.census (use.census = TRUE).

Given dataset

With the collection or dataset link(s), users can download the h5ad objects with ParseCELLxGENE.

The returned value is NULL or a data frame containing the failed objects. If a data frame is returned, users can re-run ParseCELLxGENE with meta set to the returned value.

cellxgene.given.h5ad <- ParseCELLxGENE(
  link = c(
    "https://cellxgene.cziscience.com/collections/77f9d7e9-5675-49c3-abed-ce02f39eef1b", # collection
    "https://cellxgene.cziscience.com/e/e12eb8a9-5e8b-4b59-90c8-77d29a811c00.cxg/" # dataset
  ),
  timeout = 36000000,
  out.folder = "~/gefetch2r/doc/download_cellxgene"
)
# The structure of downloaded files
# tree ~/gefetch2r/doc/download_cellxgene
# ~/gefetch2r/doc/download_cellxgene
# ├── Human.Immune.Health.Atlas.B.and.Plasma.cells.h5ad
# ├── Human.Immune.Health.Atlas.CD4.T.cells.h5ad
# ├── Human.Immune.Health.Atlas.CD8.T.cells.h5ad
# ├── Human.Immune.Health.Atlas.DCs.h5ad
# ├── Human.Immune.Health.Atlas.h5ad
# ├── Human.Immune.Health.Atlas.Monocytes.h5ad
# ├── Human.Immune.Health.Atlas.NK.cells.and.ILCs.h5ad
# └── Human.Immune.Health.Atlas.Other.cells.h5ad
#
# 0 directories, 8 files
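
The two kinds of link differ in their path: collection links contain /collections/ and dataset links contain /e/, each followed by a UUID. As a rough illustration of how such links can be told apart, here is a base-R sketch; parse_cellxgene_link is a hypothetical helper, not the parser used by ParseCELLxGENE.

```r
# Hypothetical helper: classify CELLxGENE links and extract their UUIDs.
parse_cellxgene_link <- function(link) {
  uuid <- "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
  type <- ifelse(grepl("/collections/", link, fixed = TRUE), "collection",
    ifelse(grepl("/e/", link, fixed = TRUE), "dataset", NA_character_)
  )
  # regmatches() drops non-matching elements, so fill a vector of NAs instead
  m <- regexpr(uuid, link)
  id <- rep(NA_character_, length(link))
  id[m > 0] <- regmatches(link, m)
  data.frame(type = type, id = id, stringsAsFactors = FALSE)
}

links <- c(
  "https://cellxgene.cziscience.com/collections/77f9d7e9-5675-49c3-abed-ce02f39eef1b",
  "https://cellxgene.cziscience.com/e/e12eb8a9-5e8b-4b59-90c8-77d29a811c00.cxg/"
)
parsed <- parse_cellxgene_link(links)
parsed$type
# [1] "collection" "dataset"
```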

Filter samples based on metadata

Show available datasets

GEfetch2R provides ShowCELLxGENEDatasets to extract dataset metadata, including dataset title, description, contact, organism, ethnicity, sex, tissue, disease, assay, suspension type, cell type, etc.

# all available datasets
all.cellxgene.datasets <- ShowCELLxGENEDatasets()
nrow(all.cellxgene.datasets)
# [1] 2043

Load datasets with SeuratObject:

# the datasets with SeuratObject
# wget https://github.com/showteeth/GEfetch2R/raw/ff2f19f3b557f90fce5f8bf2f8662cebdfd04298/man/benchmark/all.cellxgene.datasets.rds
all.cellxgene.datasets <- readRDS("all.cellxgene.datasets.rds")
nrow(all.cellxgene.datasets)
# [1] 1320

Summary attributes

GEfetch2R provides StatDBAttribute to summarize the attributes of CELLxGENE:

StatDBAttribute(
  df = all.cellxgene.datasets, filter = c("organism", "sex", "disease"),
  database = "CELLxGENE", combine = TRUE
)
# # A tibble: 280 × 4
# # Groups:   organism, sex [18]
#    organism     sex     disease    Num
#    <chr>        <chr>   <chr>    <int>
#  1 homo sapiens male    normal     687
#  2 homo sapiens female  normal     535
#  3 mus musculus male    normal     178
#  4 mus musculus female  normal     125
#  5 mus musculus unknown normal     100
#  6 homo sapiens unknown normal      79
#  7 homo sapiens female  covid-19    50
#  8 homo sapiens female  dementia    50
#  9 homo sapiens male    dementia    50
# 10 homo sapiens male    covid-19    38
# # ℹ 270 more rows
# # ℹ Use `print(n = ...)` to see more rows

# # use cellxgene.census
# StatDBAttribute(filter = c("disease", "tissue", "cell_type"), database = "CELLxGENE",
#                 use.census = TRUE, organism = "homo_sapiens")

Filter metadata

GEfetch2R provides ExtractCELLxGENEMeta to filter dataset metadata. The available values of each attribute (except cell number) can be obtained with StatDBAttribute:

# human 10x v2 and v3 datasets
human.10x.cellxgene.meta <- ExtractCELLxGENEMeta(
  all.samples.df = all.cellxgene.datasets,
  assay = c("10x 3' v2", "10x 3' v3"), organism = "Homo sapiens"
)
nrow(human.10x.cellxgene.meta)
# [1] 627

# subset
cellxgene.down.meta <- human.10x.cellxgene.meta[human.10x.cellxgene.meta$cell_type == "oligodendrocyte" &
  human.10x.cellxgene.meta$tissue == "entorhinal cortex", ]
nrow(cellxgene.down.meta)
# [1] 1

Download object and load to R

After manually checking the extracted metadata, users can download the specified objects with ParseCELLxGENE. The format of the downloaded objects is controlled by file.ext (choose from "rds" and "h5ad"); the formats must be given in lower case.

The returned value is a data frame containing the failed objects, or a SeuratObject (if file.ext is rds and return.seu = TRUE). If a data frame is returned, users can re-run ParseCELLxGENE with meta set to the returned value.

When using cellxgene.census, users can subset by metadata and genes.

# download objects
cellxgene.down <- ParseCELLxGENE(
  meta = cellxgene.down.meta, file.ext = "rds", timeout = 36000000,
  out.folder = "~/gefetch2r/doc/download_cellxgene"
)
cellxgene.down
# NULL

# return SeuratObject
cellxgene.down.seu <- ParseCELLxGENE(
  meta = cellxgene.down.meta, file.ext = "rds", return.seu = TRUE, timeout = 36000000,
  obs.value.filter = "cell_type == 'oligodendrocyte' & disease == 'Alzheimer disease'",
  obs.keys = c("cell_type", "disease", "sex", "suspension_type", "development_stage"),
  out.folder = "~/gefetch2r/doc/download_cellxgene"
)
cellxgene.down.seu
# An object of class Seurat
# 32743 features across 6873 samples within 1 assay
# Active assay: RNA (32743 features, 0 variable features)
#  3 dimensional reductions calculated: cca, cca.aligned, tsne

# # The structure of downloaded files
# tree ~/gefetch2r/doc/download_cellxgene
# ~/gefetch2r/doc/download_cellxgene
# ├── all.cellxgene.datasets.rds
# ├── Human.Immune.Health.Atlas.B.and.Plasma.cells.h5ad
# ├── Human.Immune.Health.Atlas.CD4.T.cells.h5ad
# ├── Human.Immune.Health.Atlas.CD8.T.cells.h5ad
# ├── Human.Immune.Health.Atlas.DCs.h5ad
# ├── Human.Immune.Health.Atlas.h5ad
# ├── Human.Immune.Health.Atlas.Monocytes.h5ad
# ├── Human.Immune.Health.Atlas.NK.cells.and.ILCs.h5ad
# ├── Human.Immune.Health.Atlas.Other.cells.h5ad
# └── Molecular.characterization.of.selectively.vulnerable.neurons.in.Alzheimer.s.Disease..EC.oligodendrocyte.rds

# 0 directories, 10 files

# # use cellxgene.census (support subset, but update is not timely)
# cellxgene.down.census <- ParseCELLxGENE(
#   use.census = TRUE, organism = "Homo sapiens",
#   obs.value.filter = "cell_type == 'B cell' & tissue_general == 'lung' & disease == 'COVID-19'",
#   obs.keys = c("cell_type", "tissue_general", "disease", "sex"),
#   include.genes = c("ENSG00000161798", "ENSG00000188229")
# )

Human Cell Atlas

The Human Cell Atlas aims to map every cell type in the human body. It contains 546 projects, most of which are from Homo sapiens (it also includes projects from Mus musculus, Macaca mulatta and Canis lupus familiaris).

Given dataset

With the dataset link(s), users can download the processed objects with ParseHCA. The format of the downloaded objects is controlled by file.ext (choose from "tsv", "rds", "rdata", "h5", "h5ad" and "loom"); the formats must be given in lower case.

The returned value is a data frame containing the failed objects, or a SeuratObject (if file.ext is rds and return.seu = TRUE). If a data frame is returned, users can re-run ParseHCA with meta set to the returned value.

# download objects
hca.given.download <- ParseHCA(
  link = c(
    "https://explore.data.humancellatlas.org/projects/902dc043-7091-445c-9442-d72e163b9879",
    "https://explore.data.humancellatlas.org/projects/cdabcf0b-7602-4abf-9afb-3b410e545703"
  ), timeout = 36000000,
  out.folder = "~/gefetch2r/doc/download_hca"
)
# # The structure of downloaded files
# tree ~/gefetch2r/doc/download_hca
# ~/gefetch2r/doc/download_hca
# ├── COMBAT2022.h5ad
# └── seurat_object_hca_as_harmonized_AS_SP_nuc_refined_cells.rds.gz
#
# 0 directories, 2 files

# return SeuratObject
hypertrophic.heart.seu <- ParseHCA(
  link = c(
    "https://explore.data.humancellatlas.org/projects/902dc043-7091-445c-9442-d72e163b9879"
  ), timeout = 36000000, return.seu = TRUE,
  out.folder = "~/gefetch2r/doc/download_hca"
)

Filter samples based on metadata

Show available datasets

GEfetch2R provides ShowHCAProjects to extract detailed project metadata, including project title, description, organism, sex, organ/organPart, disease, assay, preservation method, sample type, suspension type, cell type, development stage, etc.

There are 546 unique projects:

all.hca.projects <- ShowHCAProjects()
nrow(all.hca.projects)
# [1] 546

Summary attributes

GEfetch2R provides StatDBAttribute to summarize the attributes of Human Cell Atlas:

StatDBAttribute(df = all.hca.projects, filter = c("organism", "sex"), database = "HCA")
# $organism
#                    Value Num      Key
# 1           homo sapiens 520 organism
# 2           mus musculus  58 organism
# 3 canis lupus familiaris   1 organism
# 4         macaca mulatta   1 organism
#
# $sex
#     Value Num Key
# 1  female 405 sex
# 2    male 392 sex
# 3 unknown 164 sex
# 4   mixed   6 sex

Filter metadata

GEfetch2R provides ExtractHCAMeta to filter project metadata. The available values of each attribute (except cell number) can be obtained with StatDBAttribute:

# human 10x v2 and v3 datasets
hca.human.10x.projects <- ExtractHCAMeta(
  all.projects.df = all.hca.projects, organism = "Homo sapiens",
  protocol = c("10x 3' v2", "10x 3' v3")
)
nrow(hca.human.10x.projects)
# [1] 251

Download object and load to R

After manually checking the extracted metadata, users can download the specified objects with ParseHCA. The format of the downloaded objects is controlled by file.ext (choose from "tsv", "rds", "rdata", "h5", "h5ad" and "loom"); the formats must be given in lower case.

The returned value is a data frame containing the failed objects, or a SeuratObject (if file.ext is rds and return.seu = TRUE). If a data frame is returned, users can re-run ParseHCA with meta set to the returned value.

# download objects
hca.human.10x.down <- ParseHCA(
  meta = hca.human.10x.projects[1:3, ],
  out.folder = "~/gefetch2r/doc/download_hca",
  file.ext = c("h5ad", "rds"), timeout = 36000000
)
# file downloaded
# 4f30b962-d49b-4624-a233-64f048cf8632_b61a921b-7fa3-4b42-b455-aaaf32447920.h5ad

Process RData files

As illustrated above, a downloaded rds file containing a SeuratObject can be loaded into R automatically (return.seu = TRUE). For RData files, GEfetch2R provides LoadRData to inspect the file and extract its contents.

LoadRData loads the RData file to a separate environment, distinguishes the class of each object available, and processes the objects according to the following logic:

  • if widely used scRNA-seq objects (SeuratObject (Seurat v3/v4/v5 package), seuratobject (Seurat v2 package), SingleCellExperiment (SingleCellExperiment package), cell_data_set (Monocle v3 package), CellDataSet (Monocle package)) or bulk RNA-seq objects (DESeqDataSet (DESeq2 package), DGEList (edgeR package)) exist, LoadRData automatically distinguishes the object class and extracts the raw count matrix, normalized count matrix, scaled count matrix, and metadata:
| Object | raw count matrix | normalized count matrix | scaled count matrix | metadata | notes |
| --- | --- | --- | --- | --- | --- |
| SeuratObject (v3, v4) | slot counts | slot data | slot scale.data | obj@meta.data | support multiple assays (e.g.: RNA, integrated) |
| SeuratObject (v5) | layer counts | layer data | layer scale.data | obj@meta.data | support multiple assays (e.g.: RNA, integrated) |
| seuratobject | obj@raw.data | obj@data | obj@scale.data | obj@meta.data | |
| SingleCellExperiment | assay counts | assay logcounts | assay scaledata/scale.data | colData(obj) | support multiple experiments |
| cell_data_set | assay counts | assay logcounts | assay scaledata/scale.data | colData(obj) | support multiple experiments |
| CellDataSet | empty or exprs(obj) | empty or exprs(obj) | empty or exprs(obj) | pData(obj) | |
| DESeqDataSet | counts(obj, normalized = FALSE) | (obj <- estimateSizeFactors(obj)) counts(obj, normalized = TRUE) | empty | obj@colData | |
| DGEList | obj$counts | obj <- calcNormFactors(obj); cpm(obj) | empty | obj$samples | |
  • else if non-standard objects (matrix, data.frame, dgCMatrix, dgRMatrix, dgTMatrix) exist:
    • if matrix/data.frame:
      • if the number of columns is greater than or equal to three:
        • if all column values are in the same class and the class is numeric/integer, treat this object as matrix
        • else if starting from the second column, all column values are numeric/integer, treat this object as matrix
        • else if column names contain pattern "sample|name|cell|id|library|well|barcode|index|type|condition|treat|group", treat this object as metadata/annotation
        • else display the structure of the object and load the object to R
      • else if column names contain pattern "sample|name|cell|id|library|well|barcode|index|type|condition|treat|group", treat this object as metadata/annotation
      • else display the structure of the object and load the object to R
    • else if dgCMatrix/dgRMatrix/dgTMatrix, treat this object as matrix
    • else print the first six (or fewer) elements of the object
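
The rules for matrix/data.frame objects, together with loading into a separate environment, can be sketched in base R as follows. This is an illustration of the logic as described, not LoadRData's actual code: classify_table is a hypothetical helper, the case-insensitive column-name match is an assumption, and the two toy objects are made up so that the example runs offline.

```r
# Column-name pattern quoted from the rules above
meta.pattern <- "sample|name|cell|id|library|well|barcode|index|type|condition|treat|group"

# Hypothetical helper applying the matrix/data.frame rules.
# Simplification: integer and double columns are both treated as numeric.
classify_table <- function(x) {
  df <- as.data.frame(x)
  is.num <- vapply(df, is.numeric, logical(1))
  has.meta.names <- any(grepl(meta.pattern, colnames(df), ignore.case = TRUE))
  if (ncol(df) >= 3) {
    if (all(is.num)) return("matrix") # every column numeric
    if (all(is.num[-1])) return("matrix") # numeric from the second column on
    if (has.meta.names) return("metadata")
    return("unknown")
  }
  if (has.meta.names) "metadata" else "unknown"
}

# Save two toy objects, then load them into a separate environment so that
# the global environment is left untouched.
counts <- matrix(1:12, nrow = 3, dimnames = list(paste0("gene", 1:3), paste0("c", 1:4)))
anno <- data.frame(sample = c("s1", "s2"), group = c("ctrl", "treat"))
rdata.file <- tempfile(fileext = ".RData")
save(counts, anno, file = rdata.file)

env <- new.env()
load(rdata.file, envir = env)
res <- vapply(ls(env), function(nm) classify_table(get(nm, envir = env)), character(1))
res
#       anno     counts
# "metadata"   "matrix"
```

Loading into env rather than the global environment means objects stored in the RData file cannot silently overwrite variables that already exist in the session.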

To demonstrate the usability of LoadRData and its cross-repository support, we tested it on RData files downloaded from GEO (the primary source; a few representative RData files were selected), Zenodo and Human Cell Atlas, and on a simulated RData file containing non-standard objects.

GEO examples

GSE244572: SeuratObject with multiple assays

# download the RData file
ParseGEOProcessed(acce = "GSE244572", timeout = 360000, supp.idx = 1, file.ext = c("rdata", "rds", "h5ad", "loom"))

# process the RData file
GSE244572.list <- LoadRData(
  rdata = "GSE244572/GSE244572_RPE_CITESeq.RData",
  accept.fmt = c("Seurat", "seurat", "SingleCellExperiment", "cell_data_set", "CellDataSet", "DESeqDataSet", "DGEList"),
  slot = "counts", return.obj = TRUE
)
# # message:
# The object classes stored in RData: Seurat.
#      Class
# obj Seurat
# Detect 1 object(s) in given class(s): Seurat, seurat, SingleCellExperiment, cell_data_set, CellDataSet, DESeqDataSet, DGEList.
# Load object: obj (Seurat) to global environment!
# Extract count matrix and metadata (if available) from: obj (Seurat).
# Detect Seurat version: 4.0.4, with assay(s): RNA, ADT, nADT, SCT, integrated, IADT.

# returned value(s)
ls() # list object(s) in the global environment
# [1] "GSE244572.list" "obj"
names(GSE244572.list) # one valid object
# [1] "obj"
head(GSE244572.list$obj$meta.data)
#                  orig.ident nCount_RNA nFeature_RNA sample percent.mt nCount_ADT nFeature_ADT donor time   rpe nCount_nADT nFeature_nADT nCount_SCT nFeature_SCT integrated_snn_res.0.5 seurat_clusters
# CCTAGATTAAT_1 SeuratProject     487751         3922 318_2W   13.34062     387973          121   318   2W adult          22            14     214660         3175                      5               0
# GTTGAGCTGCG_1 SeuratProject     124013         1477 318_2W   13.27925     312671          110   318   2W adult          21            14     215337         1475                      2               4
# GTTGATCTTCT_1 SeuratProject      95909         1344 318_2W   15.60333      44974          100   318   2W adult          35            27     214940         1359                      0               2
# TAACCTTGAGT_1 SeuratProject     328530         2361 319_2W   17.52930     279285          128   319   2W adult          23            18     215249         2349                      2               0
# CCTCCGCCTGC_1 SeuratProject     702493         4766 319_2W   22.13887    1579539          143   319   2W adult          32            17     215407         4027                      1               6
# CCATATACGAC_1 SeuratProject     497485         3694 319_2W   14.13791     873609          117   319   2W adult          23            12     214400         2913                      1               4
#               integrated.weight wsnn_res.0.5 wsnn_res.0.2
# CCTAGATTAAT_1         0.7954453            3            0
# GTTGAGCTGCG_1         0.6762115            5            4
# GTTGATCTTCT_1         0.9341439            0            2
# TAACCTTGAGT_1         0.7959367            3            0
# CCTCCGCCTGC_1         0.5048201           10            6
# CCATATACGAC_1         0.7556721            5            4
names(GSE244572.list$obj$count.mat) # list of six assay(s)
# [1] "RNA"        "ADT"        "nADT"       "SCT"        "integrated" "IADT"
names(GSE244572.list$obj$count.mat$RNA) # list of slot(s)
# [1] "counts"
GSE244572.list$obj$count.mat$RNA$counts[1:5, 1:5] # count matrix
# 5 x 5 sparse Matrix of class "dgCMatrix"
#            CCTAGATTAAT_1 GTTGAGCTGCG_1 GTTGATCTTCT_1 TAACCTTGAGT_1 CCTCCGCCTGC_1
# WASH7P                 .             .             .             .             .
# CICP27                 .             .             .             .             .
# AL627309.6             .             .             .             .             .
# AL627309.5             .             .             .             .             .
# FO538757.1             .             .             .             .             .

Key parameters:

  • accept.fmt: vector, the class of objects for loading.
  • slot: vector, the type of count matrix to pull. counts: raw, un-normalized counts, data: normalized data, scale.data: z-scored/variance-stabilized data.
  • return.obj: logical value, whether to load the available objects in accept.fmt to global environment.
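
The returned value is a nested list: one element per valid object, each holding meta.data and count.mat, with count.mat organized by assay and then by slot. A small sketch for summarizing such a list is shown below; summarize_count_mats is a hypothetical helper, and toy.list is a made-up stand-in that mimics only the structure seen above.

```r
# Toy stand-in for a LoadRData result (structure only; values are made up)
toy.list <- list(
  obj = list(
    meta.data = data.frame(sample = c("a", "b"), row.names = c("cell1", "cell2")),
    count.mat = list(
      RNA = list(counts = matrix(0, nrow = 3, ncol = 2)),
      ADT = list(counts = matrix(0, nrow = 1, ncol = 2))
    )
  )
)

# Hypothetical helper: report the dimensions of every matrix,
# walking object -> assay -> slot.
summarize_count_mats <- function(res) {
  rows <- list()
  for (obj.name in names(res)) {
    for (assay in names(res[[obj.name]]$count.mat)) {
      for (slot in names(res[[obj.name]]$count.mat[[assay]])) {
        m <- res[[obj.name]]$count.mat[[assay]][[slot]]
        rows[[length(rows) + 1]] <- data.frame(
          object = obj.name, assay = assay, slot = slot,
          features = nrow(m), cells = ncol(m)
        )
      }
    }
  }
  do.call(rbind, rows)
}

mat.summary <- summarize_count_mats(toy.list)
mat.summary
```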

GSE249307: multiple SeuratObject objects

# download the RData file
ParseGEOProcessed(acce = "GSE249307", timeout = 360000, supp.idx = 1, file.ext = c("rdata", "rds", "h5ad", "loom"))

# process the RData file
GSE249307.list <- LoadRData(
  rdata = "GSE249307/GSE249307_scRNA_seurat_data.RData",
  accept.fmt = c("Seurat", "seurat", "SingleCellExperiment", "cell_data_set", "CellDataSet", "DESeqDataSet", "DGEList"),
  slot = "counts"
)
# # message:
# The object classes stored in RData: Seurat.
#                          Class
# processed_seurat_object Seurat
# raw_seurat_object       Seurat
# Detect 2 object(s) in given class(s): Seurat, seurat, SingleCellExperiment, cell_data_set, CellDataSet, DESeqDataSet, DGEList.
# Load object: processed_seurat_object (Seurat) to global environment!
# Extract count matrix and metadata (if available) from: processed_seurat_object (Seurat).
# Detect Seurat version: 3.0.0.9000, with assay(s): RNA.
# Load object: raw_seurat_object (Seurat) to global environment!
# Extract count matrix and metadata (if available) from: raw_seurat_object (Seurat).
# Detect Seurat version: 3.0.0.9000, with assay(s): RNA.

# returned value(s)
ls() # list object(s) in the global environment
# [1] "GSE249307.list"          "processed_seurat_object" "raw_seurat_object"
names(GSE249307.list) # two valid objects
# [1] "processed_seurat_object" "raw_seurat_object"
head(GSE249307.list$raw_seurat_object$meta.data)
#                               orig.ident nCount_RNA nFeature_RNA sample species umi_count_50dup rnd1_well rnd2_well rnd3_well valid library        cell_barcode cells_unique treatment1  treatment2 cytokine_timepoint
# lib1_AGATCGCACATCAAGT_42 lib1_2_analysis     356571        13917 samp43    hg38        689652.3        42         7        24     1    lib1 AGATCGCACATCAAGT_42       unique  IL17+TNFa  Virus_6hpi               late
# lib1_ACTATGCAATAGCGAC_22 lib1_2_analysis     198322        11831 samp23    hg38        383579.2        22        79        22     1    lib1 ACTATGCAATAGCGAC_22       unique    Vehicle  Virus_6hpi           baseline
# lib1_CGACTGGAGACAGTGC_23 lib1_2_analysis     187640        11865 samp24    hg38        362918.9        23        92        40     1    lib1 CGACTGGAGACAGTGC_23       unique       IL13  Virus_6hpi              early
# lib1_AACGCTTAGTACGCAA_24 lib1_2_analysis     177905        11530 samp25    hg38        344090.2        24        55        14     1    lib1 AACGCTTAGTACGCAA_24       unique  IL17+TNFa  Virus_6hpi              early
# lib1_ACACGACCAGTCACTA_11 lib1_2_analysis     156939        10976 samp12    hg38        303539.4        11        26        73     1    lib1 ACACGACCAGTCACTA_11       unique    Vehicle Virus_72hpi           baseline
# lib1_TCTTCACAATAGCGAC_42 lib1_2_analysis     142185        11049 samp43    hg38        275003.3        42        79        62     1    lib1 TCTTCACAATAGCGAC_42       unique  IL17+TNFa  Virus_6hpi               late
#                          donor
# lib1_AGATCGCACATCAAGT_42     C
# lib1_ACTATGCAATAGCGAC_22     B
# lib1_CGACTGGAGACAGTGC_23     B
# lib1_AACGCTTAGTACGCAA_24     B
# lib1_ACACGACCAGTCACTA_11     A
# lib1_TCTTCACAATAGCGAC_42     C
names(GSE249307.list$raw_seurat_object$count.mat) # list of available assay(s)
# [1] "RNA"
names(GSE249307.list$raw_seurat_object$count.mat$RNA) # list of slot(s)
# [1] "counts"
GSE249307.list$raw_seurat_object$count.mat$RNA$counts[1:5, 1:5] # count matrix
# 5 x 5 sparse Matrix of class "dgCMatrix"
#          lib1_AGATCGCACATCAAGT_42 lib1_ACTATGCAATAGCGAC_22 lib1_CGACTGGAGACAGTGC_23 lib1_AACGCTTAGTACGCAA_24 lib1_ACACGACCAGTCACTA_11
# A1BG                            .                        .                        .                        .                        .
# A1BG-AS1                        1                        .                        1                        .                        .
# A1CF                            .                        .                        .                        .                        .
# A2M                             .                        .                        .                        .                        .
# A2ML1                           .                        .                        .                        .                        .

GSE282783: SeuratObject and non-standard objects

# download the RData file
ParseGEOProcessed(acce = "GSE282783", timeout = 360000, supp.idx = 1, file.ext = c("rdata", "rds", "h5ad", "loom"))

# process the RData file
GSE282783.list <- LoadRData(
  rdata = "GSE282783/GSE282783_E16_FT_E17_hGFAP-Cre_mek12dcko.Rdata",
  accept.fmt = c("Seurat", "seurat", "SingleCellExperiment", "cell_data_set", "CellDataSet", "DESeqDataSet", "DGEList"),
  slot = "counts"
)
# # message:
# The object classes stored in RData: character, data.frame, IntegrationAnchorSet, Seurat, function.
#                                Class
# all.genes                  character
# Cluster                   data.frame
# g2m.genes                 data.frame
# immune.anchors  IntegrationAnchorSet
# immune.combined               Seurat
# Mapkdcko_CTR                  Seurat
# Mapkdcko_EXP                  Seurat
# MarkerPlot                  function
# s.genes                   data.frame
# Detect 3 object(s) in given class(s): Seurat, seurat, SingleCellExperiment, cell_data_set, CellDataSet, DESeqDataSet, DGEList.
# Load object: immune.combined (Seurat) to global environment!
# Extract count matrix and metadata (if available) from: immune.combined (Seurat).
# Detect Seurat version: 4.0.4, with assay(s): RNA, integrated.
# Load object: Mapkdcko_CTR (Seurat) to global environment!
# Extract count matrix and metadata (if available) from: Mapkdcko_CTR (Seurat).
# Detect Seurat version: 4.0.4, with assay(s): RNA.
# Load object: Mapkdcko_EXP (Seurat) to global environment!
# Extract count matrix and metadata (if available) from: Mapkdcko_EXP (Seurat).
# Detect Seurat version: 4.0.4, with assay(s): RNA.

# returned value(s)
ls() # list object(s) in the global environment
# [1] "GSE282783.list"  "immune.combined" "Mapkdcko_CTR"    "Mapkdcko_EXP"
names(GSE282783.list) # three valid objects
# [1] "immune.combined" "Mapkdcko_CTR"    "Mapkdcko_EXP"
head(GSE282783.list$immune.combined$meta.data)
#                        orig.ident nCount_RNA nFeature_RNA       Sample      S.Score   G2M.Score Phase    old.ident integrated_snn_res.0.6 seurat_clusters
# AAACCCAAGCACGTCC-1_1 Mapkdcko_CTR       3584         1586 Mapkdcko_CTR -0.010667752 -0.10155295    G1 Mapkdcko_CTR                      0               0
# AAACCCAAGTGCAGGT-1_1 Mapkdcko_CTR      11187         3232 Mapkdcko_CTR  0.544009405  0.92782765   G2M Mapkdcko_CTR                      9               9
# AAACCCACAACGTAAA-1_1 Mapkdcko_CTR       6498         2217 Mapkdcko_CTR -0.138904079 -0.09983576    G1 Mapkdcko_CTR                      2               2
# AAACCCACAACGTTAC-1_1 Mapkdcko_CTR       4531         1873 Mapkdcko_CTR -0.008647976 -0.16412585    G1 Mapkdcko_CTR                      0               0
# AAACCCACACTGCGAC-1_1 Mapkdcko_CTR       6588         2568 Mapkdcko_CTR -0.175856885 -0.19708486    G1 Mapkdcko_CTR                      5               5
# AAACCCAGTACGATCT-1_1 Mapkdcko_CTR       3790         1616 Mapkdcko_CTR -0.101067444 -0.06548076    G1 Mapkdcko_CTR                      0               0
names(GSE282783.list$immune.combined$count.mat) # list of available assay(s)
# [1] "RNA"        "integrated"
names(GSE282783.list$immune.combined$count.mat$RNA) # list of slot(s)
# [1] "counts"
GSE282783.list$immune.combined$count.mat$RNA$counts[1:5, 1:5] # count matrix
# 5 x 5 sparse Matrix of class "dgCMatrix"
#         AAACCCAAGCACGTCC-1_1 AAACCCAAGTGCAGGT-1_1 AAACCCACAACGTAAA-1_1 AAACCCACAACGTTAC-1_1 AAACCCACACTGCGAC-1_1
# Xkr4                       .                    .                    .                    .                    .
# Gm1992                     .                    .                    .                    .                    .
# Gm37381                    .                    .                    .                    .                    .
# Rp1                        .                    .                    .                    .                    .
# Rp1.1                      .                    .                    .                    .                    .
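
The lists returned in the two GEO examples above share the same nesting: object, then count.mat, then assay, then slot. A quick way to get an overview of such a list is to tabulate the dimensions of every matrix it contains. The sketch below uses a hypothetical helper, summarise_count_mats, and a mock list with toy matrices; it assumes only the structure shown in the output above.

```r
# Hypothetical helper (not part of GEfetch2R): tabulate the dimensions of every
# matrix stored under <object>$count.mat$<assay>$<slot>.
summarise_count_mats <- function(res.list) {
  rows <- list()
  for (obj in names(res.list)) {
    for (assay in names(res.list[[obj]]$count.mat)) {
      for (slot.name in names(res.list[[obj]]$count.mat[[assay]])) {
        mat <- res.list[[obj]]$count.mat[[assay]][[slot.name]]
        rows[[length(rows) + 1]] <- data.frame(
          object = obj, assay = assay, slot = slot.name,
          features = nrow(mat), cells = ncol(mat)
        )
      }
    }
  }
  do.call(rbind, rows)
}

# mock list mimicking the returned structure (toy values, not real data)
mock.mat <- matrix(
  0,
  nrow = 3, ncol = 2,
  dimnames = list(c("gene_a", "gene_b", "gene_c"), c("cell_1", "cell_2"))
)
mock.meta <- data.frame(orig.ident = c("s1", "s1"), row.names = c("cell_1", "cell_2"))
mock.list <- list(
  obj_a = list(meta.data = mock.meta, count.mat = list(RNA = list(counts = mock.mat))),
  obj_b = list(
    meta.data = mock.meta,
    count.mat = list(
      RNA = list(counts = mock.mat),
      integrated = list(counts = mock.mat[1:2, ])
    )
  )
)

mat.summary <- summarise_count_mats(mock.list)
print(mat.summary)
```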

Zenodo examples

Download the RData files:

multi.dois.parse <- ParseZenodo(
  doi = c("1111", "10.5281/zenodo.7243603", "10.5281/zenodo.7244441"),
  file.ext = c("rdata"), timeout = 36000000,
  out.folder = "~/gefetch2r/doc/download_zenodo"
)

# # The structure of downloaded files
# tree ~/gefetch2r/doc/download_zenodo
# ~/gefetch2r/doc/download_zenodo
# ├── frog_data.RData
# └── zebrafish_data.RData

Dissect and extract the RData file (frog_data.RData):

zenodo.frog.list <- LoadRData(
  rdata = "~/gefetch2r/doc/download_zenodo/frog_data.RData",
  accept.fmt = c("Seurat", "seurat", "SingleCellExperiment", "cell_data_set", "CellDataSet", "DESeqDataSet", "DGEList"),
  return.obj = FALSE
)
# # message:
# The object classes stored in RData: Seurat.
#            Class
# frog_data Seurat
# Detect 1 object(s) in given class(s): Seurat, seurat, SingleCellExperiment, cell_data_set, CellDataSet, DESeqDataSet, DGEList.
# Extract count matrix and metadata (if available) from: frog_data (Seurat).
# Detect Seurat version: 4.0.4, with assay(s): RNA.

# returned value(s)
ls() # list object(s) in the global environment
# [1] "zenodo.frog.list"
names(zenodo.frog.list) # one valid object
# [1] "frog_data"
head(zenodo.frog.list$frog_data$meta.data)
#           orig.ident nCount_RNA nFeature_RNA sample stage    group  cell_state cell_type
# S8_cell_1       cell      20658         5506 cell_1    S8 Clutch_1 S8:blastula  blastula
# S8_cell_2       cell      17002         5209 cell_2    S8 Clutch_1 S8:blastula  blastula
# S8_cell_3       cell      16190         4880 cell_3    S8 Clutch_1 S8:blastula  blastula
# S8_cell_4       cell      15652         4930 cell_4    S8 Clutch_1 S8:blastula  blastula
# S8_cell_5       cell      14325         4598 cell_5    S8 Clutch_1 S8:blastula  blastula
# S8_cell_6       cell      12658         4242 cell_6    S8 Clutch_1 S8:blastula  blastula
names(zenodo.frog.list$frog_data$count.mat) # list of available assay(s)
# [1] "RNA"
names(zenodo.frog.list$frog_data$count.mat$RNA) # list of slot(s)
# [1] "counts"     "data"       "scale.data"
zenodo.frog.list$frog_data$count.mat$RNA$counts[1:5, 1:5] # count matrix
# 5 x 5 sparse Matrix of class "dgCMatrix"
#                      S8_cell_1 S8_cell_2 S8_cell_3 S8_cell_4 S8_cell_5
# 42Sp43                       4         1         1         8         3
# 42Sp50                       1         .         7         .         1
# 6330408a02rik-like.1         .         .         .         .         .
# 6330408a02rik-like.2         .         .         .         .         .
# AK6                          2         .         .         .         1

Dissect and extract the RData file (zebrafish_data.RData):

zenodo.zebrafish.list <- LoadRData(
  rdata = "~/gefetch2r/doc/download_zenodo/zebrafish_data.RData",
  accept.fmt = c("Seurat", "seurat", "SingleCellExperiment", "cell_data_set", "CellDataSet", "DESeqDataSet", "DGEList"),
  return.obj = FALSE
)
# # message:
# The object classes stored in RData: Seurat.
#                 Class
# zebrafish_data Seurat
# Detect 1 object(s) in given class(s): Seurat, seurat, SingleCellExperiment, cell_data_set, CellDataSet, DESeqDataSet, DGEList.
# Extract count matrix and metadata (if available) from: zebrafish_data (Seurat).
# Detect Seurat version: 4.0.4, with assay(s): RNA.

# returned value(s)
ls() # list object(s) in the global environment
# [1] "zenodo.zebrafish.list"
names(zenodo.zebrafish.list) # one valid object
# [1] "zebrafish_data"
head(zenodo.zebrafish.list$zebrafish_data$meta.data)
#                                   orig.ident nCount_RNA nFeature_RNA                     sample  stage group        cell_state  cell_type
# hpf3.3_ZFHIGH_WT_DS5_AAAAGTTGCCTC     ZFHIGH       5773         2570 ZFHIGH_WT_DS5_AAAAGTTGCCTC hpf3.3 F_3.3 hpf3.3:blastomere blastomere
# hpf3.3_ZFHIGH_WT_DS5_AAACAAGTGTAT     ZFHIGH       2312         1451 ZFHIGH_WT_DS5_AAACAAGTGTAT hpf3.3 F_3.3 hpf3.3:blastomere blastomere
# hpf3.3_ZFHIGH_WT_DS5_AAACACCTCGTC     ZFHIGH       4180         2166 ZFHIGH_WT_DS5_AAACACCTCGTC hpf3.3 F_3.3 hpf3.3:blastomere blastomere
# hpf3.3_ZFHIGH_WT_DS5_AAATGAGGTTTN     ZFHIGH       6686         2845 ZFHIGH_WT_DS5_AAATGAGGTTTN hpf3.3 F_3.3 hpf3.3:blastomere blastomere
# hpf3.3_ZFHIGH_WT_DS5_AACCCTCTCGAT     ZFHIGH      20095         4993 ZFHIGH_WT_DS5_AACCCTCTCGAT hpf3.3 F_3.3 hpf3.3:blastomere blastomere
# hpf3.3_ZFHIGH_WT_DS5_AACGAAAGGTAA     ZFHIGH       1443         1019 ZFHIGH_WT_DS5_AACGAAAGGTAA hpf3.3 F_3.3 hpf3.3:blastomere blastomere
names(zenodo.zebrafish.list$zebrafish_data$count.mat) # list of available assay(s)
# [1] "RNA"
names(zenodo.zebrafish.list$zebrafish_data$count.mat$RNA) # list of slot(s)
# [1] "counts"     "data"       "scale.data"
zenodo.zebrafish.list$zebrafish_data$count.mat$RNA$counts[1:5, 1:5] # count matrix
# 5 x 5 sparse Matrix of class "dgCMatrix"
#                    hpf3.3_ZFHIGH_WT_DS5_AAAAGTTGCCTC hpf3.3_ZFHIGH_WT_DS5_AAACAAGTGTAT hpf3.3_ZFHIGH_WT_DS5_AAACACCTCGTC hpf3.3_ZFHIGH_WT_DS5_AAATGAGGTTTN hpf3.3_ZFHIGH_WT_DS5_AACCCTCTCGAT
# ENSDARG00000002968                                 .                                 .                                 .                                 .                                 .
# ENSDARG00000056314                                 .                                 .                                 .                                 .                                 .
# ENSDARG00000102274                                 .                                 .                                 .                                 .                                 .
# ENSDARG00000012468                                 .                                 .                                 .                                 .                                 .
# ENSDARG00000063621                                 .                                 .                                 .                                 .                                 .
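
Before using an extracted count matrix together with its metadata, it can be worth confirming that the cell names agree, as they do in the frog and zebrafish output above (metadata row names equal count matrix column names). The following sketch shows one way to check and enforce this in base R; align_meta_to_counts and the toy data are hypothetical and not part of GEfetch2R.

```r
# Hypothetical helper (not part of GEfetch2R): confirm that every cell in the
# count matrix has a metadata row, then reorder the metadata to match the matrix.
align_meta_to_counts <- function(counts, meta) {
  missing.cells <- setdiff(colnames(counts), rownames(meta))
  if (length(missing.cells) > 0) {
    stop("Cells without metadata: ", paste(head(missing.cells), collapse = ", "))
  }
  meta[colnames(counts), , drop = FALSE]
}

# toy data: metadata rows are deliberately out of order
toy.counts <- matrix(
  c(4, 1, 1, 0, 2, 0),
  nrow = 2,
  dimnames = list(c("gene_a", "gene_b"), c("cell_1", "cell_2", "cell_3"))
)
toy.meta <- data.frame(
  stage = c("S8", "S8", "S8"),
  row.names = c("cell_3", "cell_1", "cell_2")
)

toy.meta.aligned <- align_meta_to_counts(toy.counts, toy.meta)
print(rownames(toy.meta.aligned))
```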

Human Cell Atlas

Download the RData files:

hca.given.download <- ParseHCA(
  link = c(
    "https://explore.data.humancellatlas.org/projects/c302fe54-d22d-451f-a130-e24df3d6afca",
    "https://explore.data.humancellatlas.org/projects/34c9a62c-a610-4e31-b343-8fb7be676f8c"
  ), timeout = 360000000000000, file.ext = "rdata", parallel = FALSE,
  out.folder = "./RData"
)

# # The structure of downloaded files
# tree RData/
# RData/
# ├── GSE130560_matrix.RData.gz
# └── GSE134174_Processed_invivo_seurat.Rdata.gz
#
# 0 directories, 2 files

Dissect and extract the RData file (GSE130560_matrix.RData.gz):

hca.GSE130560.list <- LoadRData(
  rdata = "RData/GSE130560_matrix.RData.gz",
  accept.fmt = c("Seurat", "seurat", "SingleCellExperiment", "cell_data_set", "CellDataSet", "DESeqDataSet", "DGEList"),
  return.obj = FALSE
)
# # message:
# Detect RData file in compressed format, decompressing now!
# The object classes stored in RData: dgCMatrix.
#            Class
# matrix dgCMatrix
# No valid object in given class(s): Seurat, seurat, SingleCellExperiment, cell_data_set, CellDataSet, DESeqDataSet, DGEList. Now we will guess the type!
# The slot parameter does not work here!
# matrix is a sparse matrix. Most likely a count matrix!

# returned value(s)
ls() # list object(s) in the global environment
# [1] "hca.GSE130560.list"
names(hca.GSE130560.list) # two elements: count matrix and metadata
# [1] "count" "meta"
names(hca.GSE130560.list$meta) # list of available metadata (no metadata)
# NULL
names(hca.GSE130560.list$count) # list of available count matrices
# [1] "matrix"
hca.GSE130560.list$count$matrix[1:5, 1:5] # count matrix
# 5 x 5 sparse Matrix of class "dgCMatrix"
#               AAACCTGGTCTAACGT_1 AACACGTGTATATGAG_1 AACTGGTAGTTAGGTA_1 AACTTTCTCATCGCTC_1 AAGACCTAGCTAGCCC_1
# FO538757.2                     .                  .                  .                  .                  .
# AP006222.2                     1                  .                  .                  1                  .
# RP11-206L10.9                  .                  .                  .                  .                  .
# LINC00115                      .                  .                  .                  .                  .
# FAM41C                         .                  1                  .                  .                  .

Dissect and extract the RData file (GSE134174_Processed_invivo_seurat.Rdata.gz):

hca.GSE134174.list <- LoadRData(
  rdata = "RData/GSE134174_Processed_invivo_seurat.Rdata.gz",
  accept.fmt = c("Seurat", "seurat", "SingleCellExperiment", "cell_data_set", "CellDataSet", "DESeqDataSet", "DGEList"),
  return.obj = FALSE
)
# # message:
# Detect RData file in compressed format, decompressing now!
# The object classes stored in RData: Seurat.
#          Class
# T15_int Seurat
# Detect 1 object(s) in given class(s): Seurat, seurat, SingleCellExperiment, cell_data_set, CellDataSet, DESeqDataSet, DGEList.
# Extract count matrix and metadata (if available) from: T15_int (Seurat).
# Loading required package: Seurat
# Attaching SeuratObject
# Detect Seurat version: 3.0.3.9015, with assay(s): RNA, SCT, integrated.

# returned value(s)
ls() # list object(s) in the global environment
# [1] "hca.GSE134174.list"
names(hca.GSE134174.list) # one valid object
# [1] "T15_int"
head(hca.GSE134174.list$T15_int$meta.data)
#                    orig.ident nCount_RNA nFeature_RNA     propMT donor smoke smoke_noT89 Smoke_status pack_years age
# AAACCCAAGGCGACAT_1        T15      13929         4785 0.13461624  T101 heavy       heavy        heavy         25  55
# AAACCCAGTACTCGAT_1        T15       8738         3489 0.14035088  T101 heavy       heavy        heavy         25  55
# AAACCCAGTATGTGTC_1        T15       3108         1671 0.01377682  T101 heavy       heavy        heavy         25  55
# AAACCCAGTTAGCGGA_1        T15      30747         6538 0.14076632  T101 heavy       heavy        heavy         25  55
# AAACCCAGTTGCCGAC_1        T15      55390         8317 0.14026411  T101 heavy       heavy        heavy         25  55
# AAACCCATCTTTCTAG_1        T15      27888         5982 0.13987778  T101 heavy       heavy        heavy         25  55
#                    sex clusters_10         cluster_ident clusters10_smoke clusters_16a      subcluster_ident
# AAACCCAAGGCGACAT_1   M          c2 Differentiating.basal         c2_heavy           c2 Differentiating.basal
# AAACCCAGTACTCGAT_1   M          c3             SMG.basal         c3_heavy          c3a           SMG.basal.A
# AAACCCAGTATGTGTC_1   M          c4             KRT8.high         c4_heavy           c4             KRT8.high
# AAACCCAGTTAGCGGA_1   M          c4             KRT8.high         c4_heavy           c4             KRT8.high
# AAACCCAGTTGCCGAC_1   M          c1   Proliferating.basal         c1_heavy           c1   Proliferating.basal
# AAACCCATCTTTCTAG_1   M          c4             KRT8.high         c4_heavy           c4             KRT8.high
#                    clusters16a_smoke
# AAACCCAAGGCGACAT_1          c2_heavy
# AAACCCAGTACTCGAT_1         c3a_heavy
# AAACCCAGTATGTGTC_1          c4_heavy
# AAACCCAGTTAGCGGA_1          c4_heavy
# AAACCCAGTTGCCGAC_1          c1_heavy
# AAACCCATCTTTCTAG_1          c4_heavy
names(hca.GSE134174.list$T15_int$count.mat) # list of available assay(s)
# [1] "RNA"        "SCT"        "integrated"
names(hca.GSE134174.list$T15_int$count.mat$RNA) # list of slot(s)
# [1] "counts"     "data"       "scale.data"
hca.GSE134174.list$T15_int$count.mat$RNA$counts[1:5, 1:5] # count matrix
# 5 x 5 sparse Matrix of class "dgCMatrix"
#            AAACCCAAGGCGACAT_1 AAACCCAGTACTCGAT_1 AAACCCAGTATGTGTC_1 AAACCCAGTTAGCGGA_1 AAACCCAGTTGCCGAC_1
# AL627309.1                  .                  .                  .                  .                  .
# AL669831.5                  .                  .                  .                  2                  .
# LINC00115                   .                  .                  .                  .                  .
# FAM41C                      .                  .                  .                  .                  .
# AL645608.3                  .                  .                  .                  .                  .
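
The downloaded files above differ in capitalization (.RData.gz, .Rdata.gz) and may or may not be gzip-compressed. When several files have to be processed in turn, a case-insensitive pattern is a simple way to collect them. The sketch below is an illustration only: find_rdata_files is a hypothetical helper, and the file names are empty placeholders created in a temporary folder.

```r
# Hypothetical helper (not part of GEfetch2R): find RData files in a folder,
# ignoring case and allowing an optional .gz suffix.
find_rdata_files <- function(folder) {
  list.files(
    folder,
    pattern = "\\.rdata(\\.gz)?$",
    ignore.case = TRUE, full.names = TRUE
  )
}

# placeholder files in a temporary folder (empty, for illustration only)
toy.dir <- tempfile("rdata_demo_")
dir.create(toy.dir)
toy.names <- c("a_matrix.RData.gz", "b_seurat.Rdata.gz", "c_plain.rdata", "notes.txt")
invisible(file.create(file.path(toy.dir, toy.names)))

found.files <- basename(find_rdata_files(toy.dir))
print(found.files)
```

Each returned path could then be passed to the rdata argument of LoadRData, as in the examples above.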

Simulated non-standard objects

Generate an RData file containing a mixture of non-standard objects:

# dgCMatrix
sparse.mat <- SeuratObject::GetAssayData(SeuratObject::pbmc_small, assay = "RNA", slot = "counts")
# count matrix and metadata from GSE297431
GSE297431.meta.supp <- ExtractGEOMeta(acce = "GSE297431", down.supp = TRUE, supp.idx = 2)
GSE297431.cnt <- ParseGEO(acce = "GSE297431", down.supp = TRUE, supp.idx = 1, supp.type = "count", load2R = FALSE)
# move rownames into the first column of the data frame
GSE297431.cnt.row2col <- GSE297431.cnt %>%
  tibble::rownames_to_column(var = "Gene") %>%
  dplyr::relocate(Gene)
# dataframe to matrix
GSE297431.cnt.mat <- GSE297431.cnt %>% as.matrix()
# list (noise)
cc.genes <- Seurat::cc.genes
# dataframe (noise)
s.genes <- data.frame(gene = cc.genes$s.genes)
# vector (noise)
g2m.genes <- cc.genes$g2m.genes
# save
save(sparse.mat, GSE297431.meta.supp, GSE297431.cnt, GSE297431.cnt.row2col, GSE297431.cnt.mat, cc.genes, s.genes, g2m.genes,
  file = "simulated_non_standard_objects.RData"
)

Dissect and extract the generated RData file:

# process the object
non.standard.list <- LoadRData(
  rdata = "simulated_non_standard_objects.RData",
  accept.fmt = c("Seurat", "seurat", "SingleCellExperiment", "cell_data_set", "CellDataSet", "DESeqDataSet", "DGEList"),
  return.obj = FALSE
)
# # message:
# The object classes stored in RData: list, character, data.frame, matrix, array, dgCMatrix.
#                               Class
# cc.genes                       list
# g2m.genes                 character
# GSE297431.cnt            data.frame
# GSE297431.cnt.mat     matrix, array
# GSE297431.cnt.row2col    data.frame
# GSE297431.meta.supp      data.frame
# s.genes                  data.frame
# sparse.mat                dgCMatrix
# No valid object in given class(s): Seurat, seurat, SingleCellExperiment, cell_data_set, CellDataSet, DESeqDataSet, DGEList. Now we will guess the type!
# The slot parameter does not work here!
# cc.genes is list.
# $s.genes
#  [1] "MCM5"     "PCNA"     "TYMS"     "FEN1"     "MCM2"     "MCM4"     "RRM1"     "UNG"      "GINS2"    "MCM6"     "CDCA7"    "DTL"      "PRIM1"    "UHRF1"    "MLF1IP"   "HELLS"    "RFC2"     "RPA2"     "NASP"
# [20] "RAD51AP1" "GMNN"     "WDR76"    "SLBP"     "CCNE2"    "UBR7"     "POLD3"    "MSH2"     "ATAD2"    "RAD51"    "RRM2"     "CDC45"    "CDC6"     "EXO1"     "TIPIN"    "DSCC1"    "BLM"      "CASP8AP2" "USP1"
# [39] "CLSPN"    "POLA1"    "CHAF1B"   "BRIP1"    "E2F8"
#
# $g2m.genes
#  [1] "HMGB2"   "CDK1"    "NUSAP1"  "UBE2C"   "BIRC5"   "TPX2"    "TOP2A"   "NDC80"   "CKS2"    "NUF2"    "CKS1B"   "MKI67"   "TMPO"    "CENPF"   "TACC3"   "FAM64A"  "SMC4"    "CCNB2"   "CKAP2L"  "CKAP2"   "AURKB"
# [22] "BUB1"    "KIF11"   "ANP32E"  "TUBB4B"  "GTSE1"   "KIF20B"  "HJURP"   "CDCA3"   "HN1"     "CDC20"   "TTK"     "CDC25C"  "KIF2C"   "RANGAP1" "NCAPD2"  "DLGAP5"  "CDCA2"   "CDCA8"   "ECT2"    "KIF23"   "HMMR"
# [43] "AURKA"   "PSRC1"   "ANLN"    "LBR"     "CKAP5"   "CENPE"   "CTCF"    "NEK2"    "G2E3"    "GAS2L3"  "CBX5"    "CENPA"
#
# g2m.genes is character.
# [1] "HMGB2"  "CDK1"   "NUSAP1" "UBE2C"  "BIRC5"  "TPX2"
# GSE297431.cnt has 107 columns and each column is numerical! Most likely a count matrix!
# GSE297431.cnt.mat has 107 columns and each column is numerical! Most likely a count matrix!
# GSE297431.cnt.row2col has 108 columns, all of which are numerical except for the first column! Maybe a count matrix!
# Detect possible sample metadata keys: Sample_ID, Type in GSE297431.meta.supp. Maybe metadata/annotation!
# Can not determine if s.genes is metadata/annotation. Load to the global environment, please manually check!
# 'data.frame': 43 obs. of  1 variable:
#  $ gene: chr  "MCM5" "PCNA" "TYMS" "FEN1" ...
# NULL
# sparse.mat is a sparse matrix. Most likely a count matrix!

# returned value(s)
ls() # list object(s) in the global environment
# [1] "non.standard.list" "s.genes"
names(non.standard.list) # two elements: count matrix and metadata
# [1] "count" "meta"
names(non.standard.list$meta) # list of available metadata
# [1] "GSE297431.meta.supp"
head(non.standard.list$meta$GSE297431.meta.supp)
#             Sample_ID  Batch  Plate   Type Growths Class
# 1   Plate1_mut_A1_S70 batch2 Plate1 mutant       3    M3
# 2 Plate1_mut_A11_S135 batch2 Plate1 mutant       3    M3
# 3 Plate1_mut_A12_S141 batch2 Plate1 mutant       1    M1
# 4   Plate1_mut_A2_S77 batch2 Plate1 mutant       1    M1
# 5   Plate1_mut_A3_S84 batch2 Plate1 mutant       2    M2
# 6   Plate1_mut_A4_S91 batch2 Plate1 mutant       2    M2
names(non.standard.list$count) # list of available count matrices
# [1] "GSE297431.cnt"         "GSE297431.cnt.mat"     "GSE297431.cnt.row2col" "sparse.mat"
non.standard.list$count$GSE297431.cnt[1:5, 1:5] # count matrix
#       Plate1_mut_A1_S70 Plate1_mut_A11_S135 Plate1_mut_A12_S141 Plate1_mut_A2_S77 Plate1_mut_A3_S84
# Gnai3               382                 201                 279               261                 8
# Pbsn                  0                   0                   0                 0                 0
# Cdc45               117                  86                  56               230                 7
# Scml2               268                 116                 204               105                31
# Apoh                  0                   0                   0                 0                 0
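
The type-guessing messages above ("each column is numerical", "all of which are numerical except for the first column") suggest a rule based on column types. The sketch below shows how such a rule could be written in base R. It is a hypothetical heuristic with toy data, written under the assumption that column types alone drive the decision; it is not GEfetch2R's actual code.

```r
# Hypothetical heuristic (not GEfetch2R's actual code): classify a data frame
# by the types of its columns.
guess_table_type <- function(df) {
  is.num <- vapply(df, is.numeric, logical(1))
  if (length(is.num) > 0 && all(is.num)) {
    "count matrix"
  } else if (length(is.num) > 1 && !is.num[1] && all(is.num[-1])) {
    "count matrix with identifier column"
  } else {
    "undetermined"
  }
}

# toy tables (made-up values, for illustration only)
toy.cnt <- data.frame(
  sample_1 = c(10, 0), sample_2 = c(3, 0),
  row.names = c("gene_a", "gene_b")
)
toy.cnt.row2col <- cbind(Gene = rownames(toy.cnt), toy.cnt)
toy.genes <- data.frame(gene = c("gene_a", "gene_b"))

guesses <- c(
  cnt = guess_table_type(toy.cnt),
  row2col = guess_table_type(toy.cnt.row2col),
  genes = guess_table_type(toy.genes)
)
print(guesses)
```

A table that fits neither pattern, such as the single-column s.genes data frame in the example above, is left for manual inspection.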