Download Matrix from GEO and Load to Seurat/DESeq2.
Usage
ParseGEO(
  acce,
  platform = NULL,
  down.supp = FALSE,
  supp.idx = 1,
  timeout = 3600,
  data.type = c("sc", "bulk"),
  supp.type = c("count", "10x", "10xSingle"),
  file.regex = NULL,
  extra.cols = c("chr", "start", "end", "strand", "length", "width", "chromosome",
    "seqnames", "seqname", "chrom", "chromosome_name", "seqid", "stop"),
  transpose = TRUE,
  out.folder = NULL,
  accept.fmt = c("MEX", "h5"),
  gene2feature = TRUE,
  load2R = TRUE,
  merge = TRUE,
  meta.data = NULL,
  fmu = NULL,
  ...
)
Arguments
- acce
GEO accession number.
- platform
Platform information/field. Disabled when down.supp is TRUE. Default: NULL (disabled).
- down.supp
Logical value, whether to download supplementary files to create the count matrix. If TRUE, always download supplementary files. If FALSE, use the ExpressionSet (if it contains non-integer or empty values, supplementary files are downloaded automatically). Default: FALSE.
- supp.idx
The index of supplementary files to download. This should be consistent with platform. Used when supp.type is 10x/count. Default: 1.
- timeout
Timeout for download.file. Default: 3600.
- data.type
The data type of the dataset, choose from "sc" (single-cell) and "bulk" (bulk). Default: "sc".
- supp.type
The type of downloaded supplementary files, choose from count (count matrix file or single count matrix file), 10x (cellranger output files in tar/gz supplementary files, containing barcodes, genes/features and matrix, e.g. GSE200257) and 10xSingle (cellranger output files provided directly as supplementary files, e.g. GSE236082). Default: count.
- file.regex
The regex to extract correct count matrix files. Used when supp.type is count. Default: NULL (when multiple file extensions are available in the downloaded tar file, use the first file extension).
- extra.cols
Extra columns to remove, e.g., "Chr", "Start", "End", "Strand", "Length" (featureCounts). Used when supp.type is count. Default: "chr", "start", "end", "strand", "length", "width", "chromosome", "seqnames", "seqname", "chrom", "chromosome_name", "seqid", "stop".
- transpose
Logical value, whether to transpose the matrix. Used when the number of rows is less than the number of columns. Used when supp.type is count. Default: TRUE.
- out.folder
Output folder to save 10x files. Used when supp.type is 10x/10xSingle. Default: NULL (current working directory).
- accept.fmt
Vector of accepted 10x output format, MEX (barcode/feature/gene/matrix), h5 (h5/h5.gz). Used when supp.type is 10x/10xSingle. Default: c("MEX", "h5").
- gene2feature
Logical value, whether to rename genes.tsv.gz to features.tsv.gz. Used when supp.type is 10x/10xSingle. Default: TRUE.
- load2R
Logical value, whether to load the count matrix to R. Default: TRUE.
- merge
Logical value, whether to merge Seurat list when there are multiple 10x files (supp.type is 10x). Used when supp.type is 10x/10xSingle. Default: TRUE.
- meta.data
Data frame containing sample information for DESeqDataSet. Used when data.type is bulk. Default: NULL.
- fmu
Column of meta.data containing group information. Used when data.type is bulk. Default: NULL.
- ...
Parameters for getGEO. Used when down.supp is FALSE.
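The effect described for extra.cols and transpose can be illustrated with a small base R sketch. This is not the package's implementation: clean_counts is a hypothetical helper, the toy tables are made up, and case-insensitive matching of column names is an assumption suggested by the lower-case defaults.

```r
# Hypothetical helper (not part of the package): mimics what extra.cols and
# transpose are described as doing when supp.type is count.
clean_counts <- function(df,
                         extra.cols = c("chr", "start", "end", "strand", "length"),
                         transpose = TRUE) {
  # Assumption: annotation columns are matched case-insensitively
  keep <- !(tolower(colnames(df)) %in% tolower(extra.cols))
  mat <- as.matrix(df[, keep, drop = FALSE])
  # Transpose only when there are fewer rows than columns
  if (transpose && nrow(mat) < ncol(mat)) {
    mat <- t(mat)
  }
  mat
}

# Toy featureCounts-like table: genes in rows, two annotation columns
toy <- data.frame(
  Chr = c("chr1", "chr2", "chr3"),
  Length = c(1000L, 2000L, 1500L),
  sampleA = c(5L, 0L, 12L),
  sampleB = c(7L, 3L, 9L),
  row.names = c("geneA", "geneB", "geneC")
)
counts <- clean_counts(toy)

# Toy table with samples in rows: fewer rows than columns, so it is flipped
wide <- data.frame(
  geneA = c(5L, 7L), geneB = c(0L, 3L), geneC = c(12L, 9L),
  row.names = c("cell1", "cell2")
)
flipped <- clean_counts(wide)
```

Both counts and flipped end up with genes in rows and samples in columns, which is the orientation Seurat and DESeq2 expect for a count matrix.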
Value
If load2R is FALSE, returns the count matrix. If data.type is "sc", returns a SeuratObject (if merge is TRUE), a list of SeuratObjects (if merge is FALSE), or NULL (no SeuratObject detected).
If data.type is "bulk", returns a DESeqDataSet.
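Because the return type depends on several arguments, downstream code may want to check what it received before using it. The following is a minimal sketch, assuming the objects carry the usual class names "Seurat" and "DESeqDataSet"; describe_result is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of the package): reports which of the
# documented return types a ParseGEO result appears to be.
describe_result <- function(res) {
  if (is.null(res)) {
    "no Seurat object detected"
  } else if (inherits(res, "Seurat")) {
    "single (merged) Seurat object"
  } else if (inherits(res, "DESeqDataSet")) {
    "DESeqDataSet (bulk)"
  } else if (is.list(res) && !is.data.frame(res)) {
    sprintf("list of %d objects", length(res))
  } else {
    "count matrix (load2R = FALSE)"
  }
}

# Stand-in objects for illustration only; real results come from ParseGEO
fake.seu <- structure(list(), class = "Seurat")
describe_result(NULL)
describe_result(fake.seu)
describe_result(list(fake.seu, fake.seu))
describe_result(matrix(0L, nrow = 2, ncol = 2))
```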
Examples
if (FALSE) { # \dontrun{
  # the supp files are count matrix
  GSE94820.seu <- ParseGEO(acce = "GSE94820", down.supp = TRUE, supp.idx = 1, supp.type = "count")
  # the supp files are cellranger output files: barcodes, genes/features and matrix
  # need users to provide the output folder
  GSE200257.seu <- ParseGEO(
    acce = "GSE200257", down.supp = TRUE, supp.idx = 1, supp.type = "10x",
    out.folder = "/path/to/output/folder"
  )
  # need users to provide the output folder
  GSE236082.seu <- ParseGEO(
    acce = "GSE236082", down.supp = TRUE, supp.type = "10xSingle",
    out.folder = "/path/to/output/folder"
  )
} # }
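The examples above cover single-cell data only. A bulk call might look like the sketch below, which is an illustration under stated assumptions rather than a tested recipe: the accession "GSExxxxxx" is a placeholder, the sample names and the condition column are made up, and it assumes that fmu takes the name of a meta.data column and that the row names of meta.data match the sample columns of the count matrix (the usual DESeq2 convention).

```r
# Sample sheet: one row per sample. Sample names and groups are made up;
# row names are assumed to match the column names of the count matrix.
meta <- data.frame(
  condition = factor(c("control", "control", "treated", "treated")),
  row.names = c("sample1", "sample2", "sample3", "sample4")
)

if (FALSE) {
  # "GSExxxxxx" is a placeholder; replace it with a real bulk RNA-seq accession
  bulk.dds <- ParseGEO(
    acce = "GSExxxxxx", down.supp = TRUE, supp.type = "count",
    data.type = "bulk", meta.data = meta, fmu = "condition"
  )
}
```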