
Download a count matrix from GEO and load it into Seurat/DESeq2.

Usage

ParseGEO(
  acce,
  platform = NULL,
  down.supp = FALSE,
  supp.idx = 1,
  timeout = 3600,
  data.type = c("sc", "bulk"),
  supp.type = c("count", "10x", "10xSingle"),
  file.regex = NULL,
  extra.cols = c("chr", "start", "end", "strand", "length", "width", "chromosome",
    "seqnames", "seqname", "chrom", "chromosome_name", "seqid", "stop"),
  transpose = TRUE,
  out.folder = NULL,
  accept.fmt = c("MEX", "h5"),
  gene2feature = TRUE,
  load2R = TRUE,
  merge = TRUE,
  meta.data = NULL,
  fmu = NULL,
  ...
)

Arguments

acce

GEO accession number.

platform

Platform information/field. Ignored when down.supp is TRUE. Default: NULL (disabled).

down.supp

Logical value, whether to download supplementary files to create the count matrix. If TRUE, supplementary files are always downloaded. If FALSE, the ExpressionSet is used; if its matrix contains non-integer values or is empty, supplementary files are downloaded automatically. Default: FALSE.
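The fallback rule for down.supp = FALSE can be pictured with a small base-R check. This is a sketch, not the package's code: needs_supp_files() is a hypothetical helper, and it assumes the expression matrix has already been pulled out of the ExpressionSet (for example with Biobase::exprs()).

```r
# Hypothetical helper (not part of ParseGEO): return TRUE when an
# ExpressionSet matrix looks unusable as raw counts, i.e. it is empty
# or holds non-integer values, so supplementary files would be needed.
needs_supp_files <- function(expr_mat) {
  # NULL or a matrix with zero rows/columns: nothing to build counts from
  if (length(expr_mat) == 0) {
    return(TRUE)
  }
  vals <- as.vector(expr_mat)
  vals <- vals[!is.na(vals)]
  # A matrix holding only NA values is treated as empty as well
  if (length(vals) == 0) {
    return(TRUE)
  }
  # Non-integer values suggest normalized or log-scaled data
  any(vals != round(vals))
}

counts <- matrix(c(0, 5, 12, 3), nrow = 2)
needs_supp_files(counts)            # FALSE: integer-valued counts
needs_supp_files(log2(counts + 1))  # TRUE: log-scaled values
```

Checking for integers rather than a value range keeps the rule simple: raw counts are whole numbers, while most processed GEO series matrices are not.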

supp.idx

Index of the supplementary file to download; this should be consistent with platform. Used when supp.type is count or 10x. Default: 1.

timeout

Timeout (in seconds) for download.file. Default: 3600.
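In base R, download.file() reads its limit from the timeout option (in seconds, default 60). The sketch below shows one common way to raise that limit for a single call and restore it afterwards. It assumes ParseGEO applies timeout through options(); with_download_timeout() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: evaluate `expr` with R's download timeout (seconds)
# raised to `timeout`, then restore the previous setting on exit.
with_download_timeout <- function(timeout, expr) {
  old <- options(timeout = timeout)  # options() returns the old values
  on.exit(options(old), add = TRUE)  # restore even if expr errors
  force(expr)                        # lazy argument is evaluated here
}

before <- getOption("timeout")
with_download_timeout(3600, getOption("timeout"))  # 3600
identical(getOption("timeout"), before)            # TRUE
```

Large tar archives of supplementary files can easily exceed the 60-second default, which is why a much larger value such as 3600 is useful here.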

data.type

The data type of the dataset, choose from "sc" (single-cell) and "bulk" (bulk). Default: "sc".

supp.type

Type of the downloaded supplementary files, one of: count (count matrix files or a single count matrix file), 10x (cellranger output files packed in tar/gz supplementary files, containing barcodes, genes/features and matrix, e.g. GSE200257) and 10xSingle (cellranger output files provided directly as supplementary files, e.g. GSE236082). Default: count.

file.regex

Regular expression used to select the correct count matrix files. Used when supp.type is count. Default: NULL (when the downloaded tar file contains files with multiple extensions, the files with the first extension are used).
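The file.regex rule and its NULL default can be illustrated with a short base-R function. This is a sketch under stated assumptions: select_count_files() is hypothetical, and it treats "extension" as everything after the first dot of the file name, which may differ from how ParseGEO defines it.

```r
# Hypothetical helper illustrating the documented file.regex behaviour.
select_count_files <- function(files, file.regex = NULL) {
  if (!is.null(file.regex)) {
    # Keep only the files whose names match the user-supplied regex
    return(files[grepl(file.regex, files)])
  }
  # Assumed definition of "extension": everything after the first dot of
  # the base name, so "GSM1_counts.txt.gz" has extension "txt.gz"
  ext <- sub("^[^.]*\\.", "", basename(files))
  # Keep the files that share the extension of the first file
  files[ext == ext[1]]
}

files <- c("GSM1_counts.txt.gz", "GSM2_counts.txt.gz", "GSM1_meta.csv.gz")
select_count_files(files)                       # the two *.txt.gz files
select_count_files(files, file.regex = "meta")  # "GSM1_meta.csv.gz"
```

Supplying file.regex is the safer choice when an archive mixes count files with metadata or annotation tables.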

extra.cols

Extra columns to remove, e.g., "Chr", "Start", "End", "Strand", "Length" (featureCounts). Used when supp.type is count. Default: "chr", "start", "end", "strand", "length", "width", "chromosome", "seqnames", "seqname", "chrom", "chromosome_name", "seqid", "stop".
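The effect of extra.cols on a featureCounts-style table can be sketched as below. drop_extra_cols() is a hypothetical helper, and the case-insensitive matching is an assumption suggested by the mixed-case examples ("Chr", "Start") next to the lowercase defaults.

```r
# Hypothetical helper: drop annotation columns such as featureCounts'
# Chr/Start/End/Strand/Length. Matching is case-insensitive here (an
# assumption based on the mixed-case examples in the argument docs).
drop_extra_cols <- function(df,
                            extra.cols = c("chr", "start", "end", "strand", "length")) {
  keep <- !(tolower(colnames(df)) %in% tolower(extra.cols))
  df[, keep, drop = FALSE]
}

fc <- data.frame(
  Geneid = c("g1", "g2"), Chr = c("1", "2"), Start = c(10L, 50L),
  End = c(40L, 90L), Strand = c("+", "-"), Length = c(31L, 41L),
  S1 = c(5L, 0L), S2 = c(3L, 8L)
)
colnames(drop_extra_cols(fc))  # "Geneid" "S1" "S2"
```

Removing these columns leaves only the gene identifier and per-sample counts, which is the shape a count matrix needs.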

transpose

Logical value, whether to transpose the matrix. The matrix is transposed only when it has fewer rows than columns. Used when supp.type is count. Default: TRUE.
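The transpose condition can be written as a one-line check. The sketch assumes the target orientation is features (genes) in rows and samples or cells in columns; maybe_transpose() is a hypothetical helper.

```r
# Hypothetical helper: transpose only when the matrix is wider than tall,
# assuming the expected orientation is features (genes) in rows.
maybe_transpose <- function(mat, transpose = TRUE) {
  if (transpose && nrow(mat) < ncol(mat)) {
    mat <- t(mat)
  }
  mat
}

m <- matrix(1:6, nrow = 2,
            dimnames = list(c("cell1", "cell2"), c("g1", "g2", "g3")))
dim(maybe_transpose(m))                     # 3 2
dim(maybe_transpose(m, transpose = FALSE))  # 2 3
```

The heuristic relies on genes usually outnumbering samples, so it is worth setting transpose = FALSE when a dataset breaks that pattern.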

out.folder

Output folder to save 10x files. Used when supp.type is 10x/10xSingle. Default: NULL (current working directory).

accept.fmt

Vector of accepted 10x output formats: MEX (barcodes/features/genes/matrix) and h5 (h5/h5.gz). Used when supp.type is 10x/10xSingle. Default: c("MEX", "h5").
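To illustrate how accept.fmt could filter downloaded files, the sketch below labels file names as MEX or h5 and keeps only the accepted formats. classify_10x_files() and its name patterns are assumptions for illustration, not ParseGEO's actual matching rules.

```r
# Hypothetical helper: label each file as "MEX", "h5", or NA by name.
classify_10x_files <- function(files) {
  fmt <- rep(NA_character_, length(files))
  # MEX: barcodes/features/genes .tsv files and matrix .mtx, optionally gzipped
  fmt[grepl("(barcodes|features|genes)\\.tsv(\\.gz)?$|matrix\\.mtx(\\.gz)?$", files)] <- "MEX"
  # h5: .h5 or .h5.gz
  fmt[grepl("\\.h5(\\.gz)?$", files)] <- "h5"
  fmt
}

files <- c("S1_barcodes.tsv.gz", "S1_genes.tsv.gz", "S1_matrix.mtx.gz",
           "S2_filtered_feature_bc_matrix.h5", "S3_readme.txt")
accept.fmt <- c("MEX", "h5")
files[classify_10x_files(files) %in% accept.fmt]  # drops "S3_readme.txt"
```

Because NA %in% accept.fmt is FALSE, unrecognized files drop out without extra handling.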

gene2feature

Logical value, whether to rename genes.tsv.gz to features.tsv.gz. Used when supp.type is 10x/10xSingle. Default: TRUE.
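The cellranger v3+ MEX layout names the feature file features.tsv.gz, while older datasets often ship a gzipped genes.tsv.gz. A rename like the one gene2feature describes can be sketched in base R. rename_genes_to_features() is a hypothetical helper; the demo works on an empty placeholder file in a temporary directory.

```r
# Hypothetical helper: rename every *genes.tsv.gz in `dir` to
# *features.tsv.gz, the file name used by the cellranger v3+ layout.
rename_genes_to_features <- function(dir) {
  old <- list.files(dir, pattern = "genes\\.tsv\\.gz$", full.names = TRUE)
  new <- sub("genes\\.tsv\\.gz$", "features.tsv.gz", old)
  ok <- file.rename(old, new)
  new[ok]
}

tmp <- tempfile("gse_demo_")
dir.create(tmp)
file.create(file.path(tmp, "S1_genes.tsv.gz"))  # empty placeholder file
rename_genes_to_features(tmp)
list.files(tmp)  # "S1_features.tsv.gz"
```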

load2R

Logical value, whether to load the count matrix to R. Default: TRUE.

merge

Logical value, whether to merge the Seurat list when there are multiple 10x files. Used when supp.type is 10x/10xSingle. Default: TRUE.

meta.data

Data frame containing sample information for the DESeqDataSet. Used when data.type is bulk. Default: NULL.

fmu

Column of meta.data containing group information. Used when data.type is bulk. Default: NULL.
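For bulk data, meta.data and fmu have to line up with the count matrix before a DESeqDataSet can be built. The sketch below shows one way to prepare and check them. It assumes fmu is a column name given as a string and that the row names of meta.data are the sample IDs used as count-matrix columns. check_bulk_meta() is a hypothetical helper, not ParseGEO's internal code.

```r
# Hypothetical helper: validate meta.data against a count matrix and turn
# the fmu column into a factor. Assumes fmu is a column name (string) and
# that meta.data row names are the sample IDs used as count-matrix columns.
check_bulk_meta <- function(counts, meta.data, fmu) {
  stopifnot(fmu %in% colnames(meta.data))
  # DESeq2 expects one colData row per count column, in the same order
  stopifnot(identical(rownames(meta.data), colnames(counts)))
  meta.data[[fmu]] <- factor(meta.data[[fmu]])
  meta.data
}

counts <- matrix(c(10L, 0L, 5L, 7L, 3L, 12L, 8L, 1L), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), paste0("S", 1:4)))
md <- data.frame(condition = c("ctrl", "ctrl", "treat", "treat"),
                 row.names = paste0("S", 1:4))
md <- check_bulk_meta(counts, md, fmu = "condition")
levels(md$condition)  # "ctrl" "treat"
# With DESeq2 installed, these inputs fit its constructor, e.g.:
# DESeq2::DESeqDataSetFromMatrix(countData = counts, colData = md,
#                                design = ~ condition)
```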

...

Additional arguments passed to getGEO. Used when down.supp is FALSE.

Value

If load2R is FALSE, return the count matrix. If data.type is "sc", return a SeuratObject (if merge is TRUE), a list of SeuratObjects (if merge is FALSE), or NULL (if no SeuratObject is detected). If data.type is "bulk", return a DESeqDataSet.

Examples

if (FALSE) { # \dontrun{
# the supp files are count matrices
GSE94820.seu <- ParseGEO(acce = "GSE94820", down.supp = TRUE, supp.idx = 1, supp.type = "count")
# the supp files are cellranger output files: barcodes, genes/features and matrix
# users need to provide the output folder
GSE200257.seu <- ParseGEO(
  acce = "GSE200257", down.supp = TRUE, supp.idx = 1, supp.type = "10x",
  out.folder = "/path/to/output/folder"
)
# the supp files are cellranger output files provided directly; users need to provide the output folder
GSE236082.seu <- ParseGEO(
  acce = "GSE236082", down.supp = TRUE, supp.type = "10xSingle",
  out.folder = "/path/to/output/folder"
)
} # }