Conduct Differential Analysis with DESeq2.

ConductDESeq2(
  counts.folder,
  count.matrix.file = NULL,
  meta.file,
  group.key = NULL,
  count.type = c("htseq-count", "featurecounts"),
  min.count = 10,
  ref.group = NULL,
  out.folder = NULL,
  data.type = c("RNA", "ChIP", "ATAC"),
  peak.anno.key = c("Promoter", "5' UTR", "3' UTR", "Exon", "Intron", "Downstream",
    "Distal Intergenic", "All"),
  qc.ndepth = 10,
  transform.method = c("rlog", "vst", "ntd"),
  var.genes = NULL,
  batch = NULL,
  outlier.detection = T,
  rpca.method = c("PcaGrid", "PcaHubert"),
  k = 2,
  pca.x = "PC1",
  pca.y = "PC2",
  pca.z = "PC3",
  loding.pc = 1:5,
  loading.gene.num = 10,
  loading.ncol = 2,
  enrich.loading.pc = 1:5,
  enrich.loading.gene = 200,
  gene.type = c("ENSEMBL", "ENTREZID", "SYMBOL"),
  enrich.type = c("ALL", "GO", "KEGG"),
  go.type = c("ALL", "BP", "MF", "CC"),
  enrich.pvalue = 0.05,
  enrich.qvalue = 0.05,
  org.db = "org.Mm.eg.db",
  organism = "mmu",
  padj.method = c("BH", "holm", "hochberg", "hommel", "bonferroni", "BY", "fdr",
    "none"),
  show.term = 15,
  str.width = 30,
  signif = "padj",
  signif.threshold = 0.05,
  l2fc.threshold = 1,
  gene.map = NULL,
  gtf.file = NULL,
  norm.type = c("DESeq2", "TMM", "CPM", "RPKM", "TPM"),
  log.counts = TRUE,
  deg.label.df = NULL,
  deg.label.key = NULL,
  deg.label.num = 2,
  deg.label.color = NULL,
  fe.gene.key = NULL,
  gmt.file,
  gene.sets = NULL,
  minGSSize = 10,
  maxGSSize = 500,
  gsea.pvalue = 0.05
)

Arguments

counts.folder

Folder containing the count files for all samples. Each count file should be named SampleName.txt.

count.matrix.file

File containing the count matrix. If provided, it is used instead of counts.folder. Default: NULL.

meta.file

File containing sample metadata.

group.key

Column in meta.file that represents sample group information. Default: NULL.

count.type

The source of the count files, chosen from htseq-count, featurecounts. Default: htseq-count.

min.count

A feature is considered to be detected if the corresponding number of read counts is > min.count. By default, min.count = 10.
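
The detection rule above can be illustrated with a short base R sketch. The source does not say how counts are aggregated across samples, so the row-sum rule and the `toy.counts` matrix below are illustrative assumptions, not the package's implementation.

```r
# Toy count matrix (hypothetical data): 4 features x 3 samples.
toy.counts <- matrix(
  c( 0,  1,  2,
     5,  4,  3,
    20, 30, 25,
     3,  3,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

min.count <- 10

# Assumption: a feature counts as detected when its summed count is
# strictly greater than min.count (the description uses ">").
detected <- rowSums(toy.counts) > min.count
filtered.counts <- toy.counts[detected, , drop = FALSE]

print(rownames(filtered.counts))
```

Note that gene4 has a total of exactly 10 and is dropped, because the comparison is strict.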

ref.group

Reference group name. When set to NULL, the first element of the groups is used. Default: NULL.
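
As a rough illustration of what a reference group means, the base R sketch below relevels a group factor. The `groups` vector is made up, and treating "first element of groups" as the first factor level is an assumption; the package may resolve it differently.

```r
# Hypothetical group labels as they might appear in the metadata.
groups <- c("KO", "KO", "WT", "WT")
group.factor <- factor(groups)

# Assumption: with ref.group = NULL, the first level acts as the reference.
ref.group <- NULL
if (is.null(ref.group)) {
  ref.group <- levels(group.factor)[1]
}
default.ref <- ref.group

# Setting the reference explicitly moves that level to the front,
# so comparisons are reported relative to it.
wt.ref.factor <- relevel(group.factor, ref = "WT")
print(levels(wt.ref.factor))
```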

out.folder

Folder to save enrichment results. Default: working directory.

data.type

Input data type, chosen from RNA, ChIP, ATAC. Default: RNA.

peak.anno.key

Peak location, chosen from "Promoter", "5' UTR", "3' UTR", "Exon", "Intron", "Downstream", "Distal Intergenic", "All". Default: "Promoter".

qc.ndepth

Number of different sequencing depths to be simulated and plotted apart from the real depth. Default: 10. This parameter is only used by type "saturation".

transform.method

Data transformation methods, chosen from rlog, vst and ntd. Default: rlog.

var.genes

Select genes with larger variance for PCA analysis. Default: all genes.

batch

Batch column used for batch correction. Default: NULL (no batch correction).

outlier.detection

Logical value. If TRUE, conduct outlier detection with robust PCA. Default: TRUE.

rpca.method

Robust PCA method, chosen from PcaGrid, PcaHubert. Default: PcaGrid.

k

Number of principal components to compute for PcaGrid or PcaHubert. Default: 2.

pca.x

The principal component to display on the x axis. Default: PC1.

pca.y

The principal component to display on the y axis. Default: PC2.

pca.z

The principal component to display on the z axis. Default: PC3.

loding.pc

Specify the PCs used to create the loading plot. Default: 1:5.

loading.gene.num

Specify the number of genes per PC used to create the loading plot. Default: 10.

loading.ncol

Number of columns in the loading bar plot or heatmap. Default: 2.

enrich.loading.pc

Specify the PCs used for enrichment analysis. Default: 1:5.

enrich.loading.gene

Specify the number of genes per PC used for enrichment analysis. Default: 200.

gene.type

Gene name type, chosen from ENSEMBL, ENTREZID, SYMBOL. Default: ENSEMBL.

enrich.type

Enrichment type, chosen from ALL, GO, KEGG. Default: ALL.

go.type

GO enrichment type, chosen from ALL, BP, MF, CC. Default: ALL.

enrich.pvalue

Cutoff value of pvalue. Default: 0.05.

enrich.qvalue

Cutoff value of qvalue. Default: 0.05.

org.db

Organism database. Default: org.Mm.eg.db.

organism

Supported organism listed in 'http://www.genome.jp/kegg/catalog/org_list.html'. Default: mmu.

padj.method

One of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". Default: BH.
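
The method names listed above match those accepted by base R's `p.adjust`. The sketch below shows how two of them behave on a made-up p-value vector; that the package forwards `padj.method` to `p.adjust` or an equivalent is an assumption.

```r
# Hypothetical raw p-values from four tests.
pvals <- c(0.001, 0.01, 0.03, 0.5)

# Benjamini-Hochberg controls the false discovery rate.
bh <- p.adjust(pvals, method = "BH")

# Bonferroni multiplies by the number of tests and caps at 1.
bonf <- p.adjust(pvals, method = "bonferroni")

# In p.adjust, "fdr" is an alias for "BH".
fdr <- p.adjust(pvals, method = "fdr")

print(data.frame(pvals, bh, bonf))
```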

show.term

Number of enrichment terms to show. Default: 15.

str.width

Length of enrichment term in plot. Default: 30.

signif

Significance criterion. For DESeq2 results, can be chosen from padj, pvalue. For edgeR results, can be chosen from FDR, PValue. Default: padj.

signif.threshold

Significance threshold to get differentially expressed genes or accessible/binding peaks. Default: 0.05.

l2fc.threshold

Log2 fold change threshold to get differentially expressed genes or accessible/binding peaks. Default: 1.
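
Taken together, signif, signif.threshold and l2fc.threshold describe a two-part filter. A minimal sketch on a made-up results table follows; the column names are those of a standard DESeq2 results table, while the strict inequalities and the up/down split are assumptions about how the thresholds are applied.

```r
# Hypothetical differential analysis results.
toy.res <- data.frame(
  Gene = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.5, -1.8, 0.4, 1.2),
  pvalue = c(0.0001, 0.002, 0.01, 0.04),
  padj = c(0.001, 0.01, 0.03, 0.20)
)

signif <- "padj"
signif.threshold <- 0.05
l2fc.threshold <- 1

# A feature must pass the significance cutoff AND the fold-change cutoff.
is.signif <- toy.res[[signif]] < signif.threshold
up <- toy.res$Gene[is.signif & toy.res$log2FoldChange > l2fc.threshold]
down <- toy.res$Gene[is.signif & toy.res$log2FoldChange < -l2fc.threshold]

print(list(up = up, down = down))
```

Here g3 is significant but changes too little, and g4 changes enough but is not significant under padj, so neither is selected.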

gene.map

Data frame used instead of org.db for gene ID conversion; the first column should be Gene. Default: NULL.

gtf.file

Gene annotation file used to get gene length, used if norm.type=="RPKM" or norm.type=="TPM". Default: NULL.

norm.type

Normalization method, chosen from DESeq2, TMM, CPM, RPKM, TPM. Default: DESeq2.
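
RPKM and TPM need gene lengths, which is why gtf.file is required for them. The sketch below uses the standard textbook definitions of CPM, RPKM and TPM on made-up counts and lengths; it is not taken from the package source, and the gene names and values are hypothetical.

```r
# Hypothetical raw counts and gene lengths (in bp) for one sample.
counts <- c(geneA = 100, geneB = 300, geneC = 600)
gene.length <- c(geneA = 1000, geneB = 2000, geneC = 3000)

# CPM: counts scaled to a library size of one million reads.
cpm <- counts / sum(counts) * 1e6

# RPKM: reads per kilobase of gene per million mapped reads.
rpkm <- counts / (gene.length / 1e3) / (sum(counts) / 1e6)

# TPM: length-normalize first, then scale so the sample sums to one million.
rate <- counts / gene.length
tpm <- rate / sum(rate) * 1e6

print(rbind(cpm, rpkm, tpm))
```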

log.counts

Logical value, if TRUE, export log2(normalized.counts + 1), else export normalized.counts. Default: TRUE.
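
The transformation named above is a plain log2 with a pseudocount of 1, which keeps zero counts at zero. A small sketch on a hypothetical `normalized.counts` matrix:

```r
# Hypothetical normalized counts: 2 genes x 3 samples.
normalized.counts <- matrix(
  c(0, 1, 3,
    7, 15, 31),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene1", "gene2"), paste0("sample", 1:3))
)

log.counts <- TRUE

# Export log2(normalized.counts + 1) when log.counts is TRUE,
# otherwise export the normalized counts unchanged.
exported <- if (log.counts) log2(normalized.counts + 1) else normalized.counts

print(exported)
```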

deg.label.df

Label data frame that contains at least a Gene column. Default: NULL. When set to NULL, deg.label.num is used. When provided, the second column should not be named SYMBOL, ENSEMBL, or ENTREZID.

deg.label.key

Which column to use as label. Default: NULL (use Gene column of deg.label.df).

deg.label.num

Number of genes to label, chosen according to log2FoldChange. Used to determine which genes to label when deg.label.df is NULL. Default: 2.

deg.label.color

Color vector for labels. Default: NULL.

fe.gene.key

Name of the column used for enrichment analysis. Default: NULL.

gmt.file

File in Gene Matrix Transposed (GMT) format.

gene.sets

Gene sets information, containing two columns: gs_name, entrez_gene. Default: NULL.
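
A two-column table of this shape might be built as in the sketch below. The set names and Entrez IDs are made-up placeholders, and the `split` step is only one way to inspect the sets; it is not claimed to be what the function does internally.

```r
# Hypothetical gene sets in long format: one row per (set, gene) pair.
gene.sets <- data.frame(
  gs_name = c("SET_A", "SET_A", "SET_A", "SET_B", "SET_B"),
  entrez_gene = c(101L, 102L, 103L, 102L, 204L)  # placeholder IDs
)

# Collapse to a named list of gene vectors, one element per set.
gene.set.list <- split(gene.sets$entrez_gene, gene.sets$gs_name)

# Set sizes, useful for checking against minGSSize and maxGSSize.
set.sizes <- lengths(gene.set.list)
print(set.sizes)
```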

minGSSize

Minimal size of each geneSet for analyzing. Default: 10.

maxGSSize

Maximal size of genes annotated for testing. Default: 500.

gsea.pvalue

Cutoff value of pvalue. Default: 0.05.

Examples

# library(DESeq2)
# library(DEbPeak)
# count.file <- system.file("extdata", "snon_count.txt", package = "DEbPeak")
# meta.file <- system.file("extdata", "snon_meta.txt", package = "DEbPeak")
# gmt.file <- system.file("extdata", "m5.go.bp.v2022.1.Mm.entrez.gmt", package = "DEbPeak")
# ConductDESeq2(count.matrix.file = count.file, meta.file = meta.file, group.key = "condition",
#               count.type = "htseq-count", ref.group = "WT", signif = "pvalue", l2fc.threshold = 0.3, gmt.file = gmt.file)