Conduct Differential Analysis with DESeq2.

ConductDESeq2(
  counts.folder,
  count.matrix.file = NULL,
  meta.file,
  group.key = NULL,
  count.type = c("htseq-count", "featurecounts"),
  min.count = 10,
  ref.group = NULL,
  out.folder = NULL,
  data.type = c("RNA", "ChIP", "ATAC"),
  peak.anno.key = c("Promoter", "5' UTR", "3' UTR", "Exon", "Intron", "Downstream",
    "Distal Intergenic", "All"),
  qc.ndepth = 10,
  transform.method = c("rlog", "vst", "ntd"),
  var.genes = NULL,
  batch = NULL,
  outlier.detection = T,
  rpca.method = c("PcaGrid", "PcaHubert"),
  k = 2,
  pca.x = "PC1",
  pca.y = "PC2",
  pca.z = "PC3",
  loding.pc = 1:5,
  loading.gene.num = 10,
  loading.ncol = 2,
  enrich.loading.pc = 1:5,
  enrich.loading.gene = 200,
  gene.type = c("ENSEMBL", "ENTREZID", "SYMBOL"),
  enrich.type = c("ALL", "GO", "KEGG"),
  go.type = c("ALL", "BP", "MF", "CC"),
  enrich.pvalue = 0.05,
  enrich.qvalue = 0.05,
  org.db = "org.Mm.eg.db",
  organism = "mmu",
  padj.method = c("BH", "holm", "hochberg", "hommel", "bonferroni", "BY", "fdr",
    "none"),
  show.term = 15,
  str.width = 30,
  signif = "padj",
  signif.threshold = 0.05,
  l2fc.threshold = 1,
  gene.map = NULL,
  gtf.file = NULL,
  norm.type = c("DESeq2", "TMM", "CPM", "RPKM", "TPM"),
  log.counts = TRUE,
  deg.label.df = NULL,
  deg.label.key = NULL,
  deg.label.num = 2,
  deg.label.color = NULL,
  fe.gene.key = NULL,
  gmt.file,
  gene.sets = NULL,
  minGSSize = 10,
  maxGSSize = 500,
  gsea.pvalue = 0.05
)
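
Several arguments in the usage above list a vector of allowed values (for example `transform.method`), and the Arguments section gives the first value as the default in each case. A common R idiom that produces this behaviour is `match.arg()`. The sketch below illustrates that idiom with a hypothetical helper, `pick.choice()`, which is not part of DEbPeak; whether `ConductDESeq2` resolves its choices this way internally is an assumption.

```r
# Hypothetical helper (not a DEbPeak function) showing how a choice vector
# used as a default value is typically resolved with match.arg().
pick.choice <- function(transform.method = c("rlog", "vst", "ntd")) {
  # With no value supplied, match.arg() returns the first choice;
  # otherwise it validates the supplied value against the choices.
  match.arg(transform.method)
}

default.method <- pick.choice()       # "rlog"
chosen.method  <- pick.choice("vst")  # "vst"
```

Under this idiom, passing a value outside the listed choices raises an error rather than silently falling back to the default.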

Arguments

counts.folder	Folder containing the count files of all samples. Each count file should be named SampleName.txt.
count.matrix.file	File containing the count matrix. If provided, it is used instead of `counts.folder`. Default: NULL.
meta.file	File containing the sample metadata.
group.key	Column in `meta.file` that represents sample group information. Default: NULL.
count.type	The source of count file, chosen from htseq-count, featurecounts. Default: htseq-count.
min.count	A feature is considered to be detected if its number of read counts is > `min.count`. Default: 10.
ref.group	Reference group name. When set to NULL, the first element of the groups is used. Default: NULL.
out.folder	Folder to save enrichment results. Default: NULL (working directory).
data.type	Input data type, choose from RNA, ChIP, ATAC. Default: RNA.
peak.anno.key	Peak location, chosen from "Promoter", "5' UTR", "3' UTR", "Exon", "Intron", "Downstream", "Distal Intergenic", "All". Default: "Promoter".
qc.ndepth	Number of different sequencing depths to be simulated and plotted apart from the real depth. Default: 10. This parameter is only used by type "saturation".
transform.method	Data transformation methods, chosen from rlog, vst and ntd. Default: rlog.
var.genes	Select the genes with the largest variance for PCA. Default: NULL (use all genes).
batch	Batch column used for batch correction. Default: NULL (no batch correction).
outlier.detection	Logical value. If TRUE, conduct outlier detection with robust PCA. Default: TRUE.
rpca.method	Robust PCA method, chosen from `PcaGrid`, `PcaHubert`. Default: PcaGrid.
k	Number of principal components to compute with `PcaGrid` or `PcaHubert`. Default: 2.
pca.x	The principal component to display on the x axis. Default: PC1.
pca.y	The principal component to display on the y axis. Default: PC2.
pca.z	The principal component to display on the z axis. Default: PC3.
loding.pc	PCs used to create the loading plot. Default: 1:5.
loading.gene.num	Number of genes per PC shown in the loading plot. Default: 10.
loading.ncol	The columns of loading bar or heatmap. Default: 2.
enrich.loading.pc	PCs used for enrichment analysis. Default: 1:5.
enrich.loading.gene	Number of genes per PC used for enrichment analysis. Default: 200.
gene.type	Gene name type, chosen from ENSEMBL, ENTREZID, SYMBOL. Default: ENSEMBL.
enrich.type	Enrichment type, chosen from ALL, GO, KEGG. Default: ALL.
go.type	GO enrichment type, chosen from ALL, BP, MF, CC. Default: ALL.
enrich.pvalue	Cutoff value of pvalue. Default: 0.05.
enrich.qvalue	Cutoff value of qvalue. Default: 0.05.
org.db	Organism database. Default: org.Mm.eg.db.
organism	Supported organism listed in 'http://www.genome.jp/kegg/catalog/org_list.html'. Default: mmu.
padj.method	One of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". Default: BH.
show.term	Number of enrichment terms to show. Default: 15.
str.width	Length of enrichment term in plot. Default: 30.
signif	Significance criterion. For DESeq2 results, can be chosen from padj, pvalue. For edgeR results, can be chosen from FDR, PValue. Default: padj.
signif.threshold	Significance threshold to get differentially expressed genes or accessible/binding peaks. Default: 0.05.
l2fc.threshold	Log2 fold change threshold to get differentially expressed genes or accessible/binding peaks. Default: 1.
gene.map	Data frame used instead of `org.db` for gene ID conversion; the first column should be Gene. Default: NULL.
gtf.file	Gene annotation file used to get gene length, used if `norm.type=="RPKM"` or `norm.type=="TPM"`. Default: NULL.
norm.type	Normalization method, chosen from DESeq2, TMM, CPM, RPKM, TPM. Default: DESeq2.
log.counts	Logical value, if TRUE, export log2(normalized.counts + 1), else export normalized.counts. Default: TRUE.
deg.label.df	Label data frame containing at least a Gene column. Default: NULL. When set to NULL, `deg.label.num` is used. When provided, the second column should not be SYMBOL, ENSEMBL, ENTREZID.
deg.label.key	Which column to use as label. Default: NULL (use Gene column of `deg.label.df`).
deg.label.num	Number of genes to label, chosen according to log2FoldChange. Used to determine the genes to label when `deg.label.df` is NULL. Default: 2.
deg.label.color	Color vector for labels. Default: NULL.
fe.gene.key	Column name to conduct enrichment analysis. Default: NULL.
gmt.file	Gene set file in Gene Matrix Transposed (GMT) format.
gene.sets	Gene sets information, containing two columns: gs_name, entrez_gene. Default: NULL.
minGSSize	Minimal size of each geneSet for analyzing. Default: 10.
maxGSSize	Maximal size of genes annotated for testing. Default: 500.
gsea.pvalue	Cutoff value of pvalue for GSEA. Default: 0.05.
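
To make the interaction of `padj.method`, `signif`, `signif.threshold` and `l2fc.threshold` concrete, the following base R sketch applies them to a toy results table. The table values are invented for illustration, the column names mirror those of a DESeq2 results table, and the use of strict inequalities is an assumption; this is not the function's internal code.

```r
# Toy results table (made-up values); columns named as in DESeq2 results.
res <- data.frame(
  Gene = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.1, -1.5, 0.4, -3.0),
  pvalue = c(0.001, 0.01, 0.02, 0.2),
  stringsAsFactors = FALSE
)

# padj.method = "BH": Benjamini-Hochberg adjustment via stats::p.adjust().
res$padj <- p.adjust(res$pvalue, method = "BH")

# Values matching the documented defaults.
signif <- "padj"
signif.threshold <- 0.05
l2fc.threshold <- 1

# Assumed rule: a feature is kept when it passes both the significance
# cutoff and the absolute log2 fold change cutoff.
is.deg <- res[[signif]] < signif.threshold &
  abs(res$log2FoldChange) > l2fc.threshold
deg <- res[is.deg, ]
deg$Gene  # "g1" "g2"
```

Here g3 passes the significance cutoff but not the fold change cutoff, and g4 passes the fold change cutoff but not the significance cutoff, so both are dropped. Setting `signif <- "pvalue"` would filter on the unadjusted p-values instead.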

Examples

# library(DESeq2)
# library(DEbPeak)
# count.file <- system.file("extdata", "snon_count.txt", package = "DEbPeak")
# meta.file <- system.file("extdata", "snon_meta.txt", package = "DEbPeak")
# gmt.file <- system.file("extdata", "m5.go.bp.v2022.1.Mm.entrez.gmt", package = "DEbPeak")
# ConductDESeq2(count.matrix.file = count.file, meta.file = meta.file, group.key = "condition",
#               count.type = "htseq-count", ref.group = "WT", signif = "pvalue", l2fc.threshold = 0.3, gmt.file = gmt.file)
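
As a complement to the `gmt.file` example above, the sketch below builds a small `gene.sets` data frame in the documented two-column layout (`gs_name`, `entrez_gene`) and shows how a size filter such as `minGSSize`/`maxGSSize` would act on it. The set names and Entrez IDs are arbitrary placeholders, the toy `minGSSize` of 3 differs from the documented default of 10, and the inclusive bounds are an assumption about how the limits are applied.

```r
# Toy gene sets in the documented layout: one row per (set, gene) pair.
# Set names and Entrez IDs are placeholders for illustration only.
gene.sets <- data.frame(
  gs_name = c(rep("SET_A", 3), rep("SET_B", 2)),
  entrez_gene = c(11303L, 11304L, 11305L, 11303L, 11307L),
  stringsAsFactors = FALSE
)

# Number of genes annotated to each set.
set.sizes <- table(gene.sets$gs_name)

# Toy limits (documented defaults are minGSSize = 10, maxGSSize = 500).
minGSSize <- 3
maxGSSize <- 500

# Assumed rule: keep sets whose size lies within [minGSSize, maxGSSize].
kept.sets <- names(set.sizes)[set.sizes >= minGSSize & set.sizes <= maxGSSize]
kept.sets  # "SET_A"
```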