
Introduction

DEbPeak aims to explore, visualize and interpret multi-omics data, and to unravel the regulation of gene expression by combining RNA-seq with peak-related data (e.g. ChIP-seq, ATAC-seq, m6A-seq). It contains ten functional modules:

  • Parse GEO: Extract study information, raw count matrix and metadata from the GEO database.
  • Quality Control (QC): QC on count matrix and samples.
    • QC on count matrix: Proportion of genes detected in different samples under different CPM thresholds and the saturation of the number of genes detected.
    • QC on samples: Euclidean distance and Pearson correlation coefficient of samples across different conditions, sample similarity on selected principal components (to check batch information and conduct batch correction) and outlier detection with robust PCA.
  • Principal Component Analysis (PCA): this module is divided into three submodules: basic info, loading related and 3D visualization.
    • Basic info: scree plot (helps to select the useful PCs), biplot (sample similarity together with the genes that have larger loadings) and PC pairs plot (sample similarity under different PC combinations).
    • Loading related: visualize genes with larger positive and negative loadings on selected PCs, conduct GO enrichment analysis on genes with larger positive and negative loadings on selected PCs.
    • 3D visualization: visualize samples on three selected PCs.
  • Differential Analysis and Visualization: this module includes seven powerful visualization methods (Volcano Plot, Scatter Plot, MA Plot, Rank Plot, Gene/Peak Plot, Heatmap, Pie Plot for peak-related data).
  • Functional Enrichment Analysis (FEA): GO enrichment analysis, KEGG enrichment analysis, Gene Set Enrichment Analysis (GSEA).
    • GO (Biological Process, Molecular Function, Cellular Component) and KEGG on differential expression genes or accessible/binding peaks.
    • GSEA on all genes (Notice: GSEA is not available for peak-related data)
  • Predict transcription factors (PredictTFs): Identify transcription factors from differentially expressed genes. DEbPeak provides three methods (BART, ChEA3 and TFEA.ChIP).
  • Integrate RNA-seq with peak-related data:
    • Get consensus peaks: For multiple peak files, get consensus peaks; for single peak file, use it directly (used in consensus integration mode).
    • Peak profile plots: Heatmap of peak binding to TSS regions, Average Profile of ChIP peaks binding to TSS region, Profile of ChIP peaks binding to different regions (used in consensus integration mode).
    • Peak annotation (used in consensus integration mode).
    • Integrate RNA-seq with peak-related data (consensus mode): Integrate RNA-seq with peak-related data to find direct targets, including up-regulated and down-regulated.
    • Integrate RNA-seq with peak-related data (differential mode): Integrate RNA-seq and peak-related data based on differential analysis.
    • Integration summary: include venn diagram and quadrant diagram (differential mode).
    • GO enrichment on integrated results.
    • Find motif on integrated results: Due to the nature of ATAC-seq, we usually need to find motif on integrated results to obtain potential regulatory factors.
  • Integrate RNA-seq with RNA-seq:
    • Integration summary: include venn diagram and quadrant diagram.
    • GO enrichment on integrated results.
  • Integrate peak-related data with peak-related data:
    • Integration summary: include venn diagram and quadrant diagram (differential mode).
    • GO enrichment on integrated results.
  • Utils: useful functions, including creating enrichment plots for selected enrichment terms, gene ID conversion and count normalization (DESeq2’s median of ratios, TMM, CPM, TPM, RPKM).
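
The idea behind the consensus integration mode in the list above (finding up-regulated and down-regulated genes that also carry a peak) can be sketched in base R. This is a minimal illustration under stated assumptions, not DEbPeak's implementation: the gene names, peak names, values and cutoffs (|log2FoldChange| >= 1, padj < 0.05) are all hypothetical, and no DEbPeak function is called.

```r
# Toy differential expression results (hypothetical genes and values).
de_res <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  log2FoldChange = c(2.1, -1.8, 0.2, 1.5, -2.4),
  padj = c(0.001, 0.01, 0.8, 0.03, 0.2),
  stringsAsFactors = FALSE
)

# Toy peak annotation: each peak is assigned to a nearby gene (hypothetical).
peak_anno <- data.frame(
  peak = c("peak1", "peak2", "peak3", "peak4"),
  gene = c("GeneA", "GeneB", "GeneC", "GeneF"),
  stringsAsFactors = FALSE
)

# Label genes as up-regulated, down-regulated or not significant.
# The cutoffs below are illustrative assumptions, not package defaults.
de_res$regulation <- "NotSig"
de_res$regulation[de_res$padj < 0.05 & de_res$log2FoldChange >= 1] <- "Up"
de_res$regulation[de_res$padj < 0.05 & de_res$log2FoldChange <= -1] <- "Down"

# Keep significant genes that are also annotated with at least one peak.
de_sig <- de_res[de_res$regulation != "NotSig", ]
targets <- merge(de_sig, peak_anno, by = "gene")
print(targets[, c("gene", "regulation", "peak")])
```

In this toy input, GeneD is significant but has no peak and GeneC has a peak but is not significant, so neither is reported as a candidate direct target.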

To make the tool easier to use, we have also developed a web server for DEbPeak that allows users to submit files through a web page and set parameters to get the desired results. The web server uses built-in DESeq2 for differential analysis, whereas the standalone R package accepts user-supplied results from either DESeq2 or edgeR and is therefore more flexible.

All generated plots are publication-ready, and most of them are based on ggplot2, so users can easily modify them according to their needs. We also provide various color palettes, including discrete and continuous palettes, color-blind-friendly palettes and palettes for multiple categorical variables.


Framework

[Figure: DEbPeak framework]

Installation

You can install the package from the GitHub repository:

# install.packages("remotes")   # In case you have not installed it.
remotes::install_github("showteeth/DEbPeak")

In general, installing from the GitHub repository is recommended because it receives updates most promptly.


Usage

Vignette

Detailed usage is available in the vignettes, which are divided into four categories.

Function list

| Type | Function | Description | Key packages |
|---|---|---|---|
| Parse GEO | ParseGEO | Extract study information, raw count matrix and metadata from GEO database | GEOquery |
| Quality Control | CountQC | Quality control on count matrix (gene detection sensitivity and sequencing depth saturation) | NOISeq |
| | SampleRelation | Quality control on samples (sample clustering based on Euclidean distance and Pearson correlation coefficient) | stats |
| | OutlierDetection | Detect outliers with robust PCA | rrcov |
| | QCPCA | PCA related functions used in quality control (batch detection and correction, outlier detection) | stats, sva, rrcov |
| Principal Component Analysis | PCA | Conduct principal component analysis | stats |
| | PCABasic | Generate basic PCA plots, including scree plot, biplot and pairs plot | PCAtools |
| | ExportPCGenes | Export genes of selected PCs | tidyverse |
| | LoadingPlot | PCA loading plot, including bar plot and heatmap | ggplot2, ComplexHeatmap |
| | LoadingGO | GO enrichment on PC’s loading genes | clusterProfiler |
| | PCA3D | Create 3D PCA plot | plot3D |
| Differential Expression Analysis | ExtractDA | Extract differential analysis results | tidyverse |
| | VolcanoPlot | Volcano plot for differential analysis results | ggplot2 |
| | ScatterPlot | Scatter plot for differential analysis results | ggplot2 |
| | MAPlot | MA plot for differential analysis results | ggplot2 |
| | RankPlot | Rank plot for differential analysis results | ggplot2 |
| | GenePlot | Gene expression or peak accessibility/binding plot | ggplot2 |
| | DEHeatmap | Heatmap for differential analysis results | ComplexHeatmap |
| | DiffPeakPie | Summarize genomic regions of differential peaks with a pie plot | ggpie |
| | ConductDESeq2 | Conduct differential analysis with DESeq2 | NOISeq, stats, sva, rrcov, PCAtools, DESeq2, ggplot2, ComplexHeatmap, clusterProfiler, plot3D, tidyverse |
| Functional Enrichment Analysis | ConductFE | Conduct functional enrichment analysis (GO and KEGG) | clusterProfiler |
| | ConductGSEA | Conduct gene set enrichment analysis (GSEA) | clusterProfiler |
| | VisGSEA | Visualize GSEA results | enrichplot |
| Peak-related Analysis | PeakMatrix | Prepare count matrix and sample metadata for peak-related data | DiffBind, ChIPseeker |
| | GetConsensusPeak | Get consensus peak from replicates | MSPC |
| | PeakProfile | Visualize peak accessibility/binding profile | ChIPseeker |
| | AnnoPeak | Assign peaks with the genomic binding region and nearby genes | ChIPseeker |
| | PeakAnnoPie | Visualize peak annotation results with pie plot | ggpie |
| | MotifEnrich | Motif enrichment for differentially accessible/binding peaks | HOMER |
| Integrate RNA-seq with Peak-related Data | DEbPeak | Integrate differential expression results and peak annotation/differential analysis results | tidyverse |
| | InteVenn | Create venn diagram for integration results (supports DEbPeak, DEbDE, PeakbPeak) | ggvenn |
| | InteDiffQuad | Create quadrant diagram for integration results (supports DEbPeak, DEbDE, PeakbPeak) | ggplot2 |
| | InteFE | GO enrichment on integration results (supports DEbPeak, DEbDE, PeakbPeak) | clusterProfiler |
| | FindMotif | Find motif on integration results | HOMER |
| | DEbCA | Integrate differential expression results and peak annotation results (two kinds of peak-related data) | tidyverse |
| Integrate RNA-seq with RNA-seq | DEbDE | Integrate two differential expression results | tidyverse |
| | DEbDEFE | GO enrichment on two differential expression integration results | clusterProfiler |
| Integrate Peak-related Data with Peak-related Data | PeakbPeak | Integrate two peak annotation/differential analysis results | tidyverse |
| | PeakbPeakFE | GO enrichment on two peak annotation/differential analysis integration results | clusterProfiler |
| Utils | EnrichPlot | Create bar or dot plot for selected functional enrichment analysis results (GO and KEGG) | ggplot2 |
| | IDConversion | Gene ID conversion between ENSEMBL, ENTREZID and SYMBOL | clusterProfiler |
| | GetGeneLength | Get gene length from GTF | GenomicFeatures, GenomicRanges |
| | NormalizedCount | Perform count normalization (DESeq2’s median of ratios, TMM, CPM, RPKM, TPM) | DESeq2, edgeR, tidyverse |
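
To make the normalization methods named for NormalizedCount more concrete, the following base R sketch computes CPM, RPKM and TPM from a toy count matrix. It only illustrates the standard formulas and does not show how NormalizedCount is implemented or called; the counts and gene lengths are hypothetical. DESeq2’s median of ratios and TMM are left out because they depend on DESeq2 and edgeR.

```r
# Toy raw count matrix: 3 genes x 2 samples (hypothetical values).
counts <- matrix(
  c(100, 300, 600,
    200, 200, 600),
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("S1", "S2"))
)

# Gene lengths in base pairs (hypothetical).
gene_length <- c(GeneA = 1000, GeneB = 2000, GeneC = 3000)

# CPM: scale each sample by its library size, per million reads.
cpm <- t(t(counts) / colSums(counts)) * 1e6

# RPKM: CPM further divided by gene length in kilobases.
rpkm <- cpm / (gene_length / 1000)

# TPM: length-normalize first, then scale each sample to sum to one million.
rate <- counts / (gene_length / 1000)
tpm <- t(t(rate) / colSums(rate)) * 1e6

print(round(tpm))
```

CPM and RPKM normalize by library size before (or without) accounting for gene length, while TPM applies the length correction first, so only the TPM columns are guaranteed to sum to one million in every sample.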

Notice

  • The KEGG API has changed. To perform KEGG enrichment, update clusterProfiler to version 4.7.1 or later.
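
One possible way to check the installed clusterProfiler version before running KEGG enrichment is sketched below. It uses only base R (`requireNamespace` and `packageVersion`); the variable name `min_version` and the messages are illustrative choices.

```r
# Minimum clusterProfiler version stated in the notice above.
min_version <- "4.7.1"

if (!requireNamespace("clusterProfiler", quietly = TRUE)) {
  message("clusterProfiler is not installed.")
} else if (packageVersion("clusterProfiler") < min_version) {
  message(
    "clusterProfiler ", as.character(packageVersion("clusterProfiler")),
    " is older than ", min_version, "; consider updating it."
  )
} else {
  message("clusterProfiler version is sufficient for KEGG enrichment.")
}
```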

Contact

For any question, feature request or bug report please write an email to .


Code of Conduct

Please note that the DEbPeak project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.