Introduction

For peak-related data, motif enrichment can be used to find the TFs involved in the gene regulation, especially for ATAC-seq (ATAC-seq can proflie global chromatin accessibility. Thus, the peaks of ATAC-seq may originate from hundreds of TFs.).

Here, we will perform motif enrichment on differentially accessible (DA) peaks.


Example data

The data used here are from In vivo CD8+ T cell CRISPR screening reveals control by Fli1 in infection and cancer:

  • RNA-seq data: the sgCtrl vs sgFli1 RNA sequencing at D8 Cl13 p.i., the raw data are stored in GSE149838
  • ATAC-seq data: sgCtrl vs sgFli1 ATAC sequencing at D9 Cl13 p.i., the raw data are stored in GSE149836

Load packages


Differential Expression Analysis

To simplify the steps of obtaining differential expression analysis results, DEbPeak provides ConductDESeq2 to perfrom all steps of differential expression analysis, including quality control, principal component analysis, differential expression analysis and functional enrichment analysis.

peak.matrix.file <- system.file("extdata", "RA_ATAC_count.txt", package = "DEbPeak")
peak.meta.file <- system.file("extdata", "RA_ATAC_meta.txt", package = "DEbPeak")
ConductDESeq2(count.matrix.file = peak.matrix.file, meta.file = peak.meta.file, data.type = "ATAC",
              gene.type = "SYMBOL", outlier.detection = F, min.count = 0,
              peak.anno.key = "All", group.key = "condition", ref.group = "WT",
              out.folder = '/home/songyabing/R/learn/tmp/DEbPeak/ATACDE',
              signif = "padj", signif.threshold = 0.05, l2fc.threshold = 0, gmt.file = '')
## Create DESeqDataSet from count matrix!
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## Conduct Quality Control on Counts!
## Differential expression analysis with DESeq2!
## [1] "Warning: 7 features with 0 counts in all samples are to be removed for this analysis."
## [1] "Count distributions are to be computed for:"
## [1] "Fli1KO_1" "Fli1KO_2" "Fli1KO_3" "Fli1KO_4" "WT_1"     "WT_2"
## Differential expression analysis with DESeq2!
## Raw gene number: 101921
## Gene number after filtered with counts bigger than 0 : 101921
## Conduct Quality Control on Samples!
## Differential expression analysis with DESeq2!
## Conduct PCA analysis!
## Conduct PCA analysis!
## Differential expression analysis with DESeq2!
## Use all genes for PCA!
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Selecting by Loadding
## Selecting by Loadding
## Differential expression analysis with DESeq2!
## 
## Selecting by Loadding
## Selecting by Loadding
## 
## 载入程辑包:'BiocGenerics'
## 
## 
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## 
## 
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which.max, which.min
## 
## 
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## 
## 
## 载入程辑包:'S4Vectors'
## 
## 
## The following object is masked from 'package:base':
## 
##     expand.grid
## 
## 
## 
## 
## Conduct Functional Enrichment on PC Loading genes!
## 
## Selecting by Loadding
## Selecting by Loadding
## Convert SYMBOL to ENTREZID!
## 
## 'select()' returned 1:1 mapping between keys and columns
## 
## conduct ALL GO enrichment analysis on: PC1_Positive
## 
## conduct KEGG enrichment analysis: PC1_Positive
## 
## Reading KEGG annotation online: "https://rest.kegg.jp/link/mmu/pathway"...
## 
## Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/mmu"...
## 
## Convert SYMBOL to ENTREZID!
## 
## 'select()' returned 1:1 mapping between keys and columns
## 
## conduct ALL GO enrichment analysis on: PC1_Negative
## 
## conduct KEGG enrichment analysis: PC1_Negative
## 
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Convert SYMBOL to ENTREZID!
## 
## 'select()' returned 1:1 mapping between keys and columns
## 
## conduct ALL GO enrichment analysis on: PC2_Positive
## 
## conduct KEGG enrichment analysis: PC2_Positive
## 
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Convert SYMBOL to ENTREZID!
## 
## 'select()' returned 1:1 mapping between keys and columns
## 
## conduct ALL GO enrichment analysis on: PC2_Negative
## 
## conduct KEGG enrichment analysis: PC2_Negative
## 
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Convert SYMBOL to ENTREZID!
## 
## 'select()' returned 1:1 mapping between keys and columns
## 
## conduct ALL GO enrichment analysis on: PC3_Positive
## 
## conduct KEGG enrichment analysis: PC3_Positive
## 
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Convert SYMBOL to ENTREZID!
## 
## 'select()' returned 1:1 mapping between keys and columns
## 
## conduct ALL GO enrichment analysis on: PC3_Negative
## 
## conduct KEGG enrichment analysis: PC3_Negative
## 
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Convert SYMBOL to ENTREZID!
## 
## 'select()' returned 1:1 mapping between keys and columns
## 
## conduct ALL GO enrichment analysis on: PC4_Positive
## 
## conduct KEGG enrichment analysis: PC4_Positive
## 
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Convert SYMBOL to ENTREZID!
## 
## 'select()' returned 1:1 mapping between keys and columns
## 
## conduct ALL GO enrichment analysis on: PC4_Negative
## 
## conduct KEGG enrichment analysis: PC4_Negative
## 
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Convert SYMBOL to ENTREZID!
## 
## 'select()' returned 1:1 mapping between keys and columns
## 
## conduct ALL GO enrichment analysis on: PC5_Positive
## 
## conduct KEGG enrichment analysis: PC5_Positive
## 
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Convert SYMBOL to ENTREZID!
## 
## 'select()' returned 1:1 mapping between keys and columns
## 
## conduct ALL GO enrichment analysis on: PC5_Negative
## 
## conduct KEGG enrichment analysis: PC5_Negative
## 
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## wrong orderBy parameter; set to default `orderBy = "x"`
## 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Conduct Differential Expression Analysis!
## 
## estimating size factors
## 
## estimating dispersions
## 
## gene-wise dispersion estimates
## 
## mean-dispersion relationship
## 
## -- note: fitType='parametric', but the dispersion trend was not well captured by the
##    function: y = a/x + b, and a local regression fit was automatically substituted.
##    specify fitType='local' or 'mean' to avoid this message next time.
## 
## final dispersion estimates
## 
## fitting model and testing
## 
## Conduct Gene ID Conversion!
## 
## 'select()' returned 1:many mapping between keys and columns
## Warning in clusterProfiler::bitr(de.df[, "Gene2Convert"], fromType =
## gene.type, : 1.63% of input gene IDs are fail to map...
## Visualize Differential Analysis Results!
## Differential expression analysis with DESeq2!
## Differential expression analysis with DESeq2!
## Differential expression analysis with DESeq2!
## Differential expression analysis with DESeq2!
## Differential expression analysis with DESeq2!
## Differential expression analysis with DESeq2!
## Conduct Functional Enrichment Analysis!
## Differential expression analysis with DESeq2!
## Convert SYMBOL to ENTREZID!
## 'select()' returned 1:1 mapping between keys and columns
## conduct ALL GO enrichment analysis on: UP
## conduct KEGG enrichment analysis: UP
## wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.Convert SYMBOL to ENTREZID!
## 'select()' returned 1:1 mapping between keys and columns
## conduct ALL GO enrichment analysis on: DOWN
## conduct KEGG enrichment analysis: DOWN
## wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.Gene Set Enrichment Analysis is not available for ChIP-seq or ATAC-seq data!
## All Analysis Done!

The structure of result directory:

tree /home/songyabing/R/learn/tmp/DEbPeak/ATACDE
## /home/songyabing/R/learn/tmp/DEbPeak/ATACDE
## ├── CountQC_CPM.pdf
## ├── CountQC_saturation.pdf
## ├── DA
## │   ├── Condition_KO_WT_all.csv
## │   ├── Condition_KO_WT_padj0.05_FC0.csv
## │   ├── DA_GenePlot.pdf
## │   ├── DA_Heatmap.pdf
## │   ├── DA_MAPlot.pdf
## │   ├── DA_RankPlot.pdf
## │   ├── DA_ScatterPlot.pdf
## │   ├── DA_VolcanoPlot.pdf
## │   └── normalized_counts.csv
## ├── DEG
## │   ├── Condition_KO_WT_all.csv
## │   ├── Condition_KO_WT_padj0.05_FC0.csv
## │   ├── DEG_GenePlot.pdf
## │   ├── DEG_Heatmap.pdf
## │   ├── DEG_MAPlot.pdf
## │   ├── DEG_RankPlot.pdf
## │   ├── DEG_ScatterPlot.pdf
## │   ├── DEG_VolcanoPlot.pdf
## │   └── normalized_counts.csv
## ├── FE
## │   ├── DOWN_ALL_GO.csv
## │   ├── DOWN_Biological_Process.png
## │   ├── DOWN_Cellular_Component.png
## │   ├── DOWN_Functional_Enrichment.pdf
## │   ├── DOWN_KEGG.csv
## │   ├── DOWN_KEGG_Enrichment.png
## │   ├── DOWN_Molecular_Function.png
## │   ├── UP_ALL_GO.csv
## │   ├── UP_Biological_Process.png
## │   ├── UP_Cellular_Component.png
## │   ├── UP_Functional_Enrichment.pdf
## │   ├── UP_KEGG.csv
## │   ├── UP_KEGG_Enrichment.png
## │   └── UP_Molecular_Function.png
## ├── PCA
## │   ├── PC1
## │   │   ├── Negative
## │   │   │   ├── PC1_Negative_ALL_GO.csv
## │   │   │   └── PC1_Negative_KEGG.csv
## │   │   └── Positive
## │   │       ├── PC1_Positive_ALL_GO.csv
## │   │       └── PC1_Positive_KEGG.csv
## │   ├── PC1_Negative_Biological_Process.png
## │   ├── PC1_Negative_Cellular_Component.png
## │   ├── PC1_Negative_Functional_Enrichment.pdf
## │   ├── PC1_Negative_KEGG_Enrichment.png
## │   ├── PC1_Negative_Molecular_Function.png
## │   ├── PC1_Positive_Biological_Process.png
## │   ├── PC1_Positive_Cellular_Component.png
## │   ├── PC1_Positive_Functional_Enrichment.pdf
## │   ├── PC1_Positive_KEGG_Enrichment.png
## │   ├── PC1_Positive_Molecular_Function.png
## │   ├── PC2
## │   │   ├── Negative
## │   │   │   ├── PC2_Negative_ALL_GO.csv
## │   │   │   └── PC2_Negative_KEGG.csv
## │   │   └── Positive
## │   │       ├── PC2_Positive_ALL_GO.csv
## │   │       └── PC2_Positive_KEGG.csv
## │   ├── PC2_Negative_Biological_Process.png
## │   ├── PC2_Negative_Cellular_Component.png
## │   ├── PC2_Negative_Functional_Enrichment.pdf
## │   ├── PC2_Negative_KEGG_Enrichment.png
## │   ├── PC2_Negative_Molecular_Function.png
## │   ├── PC2_Positive_Biological_Process.png
## │   ├── PC2_Positive_Cellular_Component.png
## │   ├── PC2_Positive_Functional_Enrichment.pdf
## │   ├── PC2_Positive_KEGG_Enrichment.png
## │   ├── PC2_Positive_Molecular_Function.png
## │   ├── PC3
## │   │   ├── Negative
## │   │   │   ├── PC3_Negative_ALL_GO.csv
## │   │   │   └── PC3_Negative_KEGG.csv
## │   │   └── Positive
## │   │       ├── PC3_Positive_ALL_GO.csv
## │   │       └── PC3_Positive_KEGG.csv
## │   ├── PC3_Negative_Biological_Process.png
## │   ├── PC3_Negative_Cellular_Component.png
## │   ├── PC3_Negative_Functional_Enrichment.pdf
## │   ├── PC3_Negative_KEGG_Enrichment.png
## │   ├── PC3_Negative_Molecular_Function.png
## │   ├── PC3_Positive_Biological_Process.png
## │   ├── PC3_Positive_Cellular_Component.png
## │   ├── PC3_Positive_Functional_Enrichment.pdf
## │   ├── PC3_Positive_KEGG_Enrichment.png
## │   ├── PC3_Positive_Molecular_Function.png
## │   ├── PC4
## │   │   ├── Negative
## │   │   │   ├── PC4_Negative_ALL_GO.csv
## │   │   │   └── PC4_Negative_KEGG.csv
## │   │   └── Positive
## │   │       ├── PC4_Positive_ALL_GO.csv
## │   │       └── PC4_Positive_KEGG.csv
## │   ├── PC4_Negative_Biological_Process.png
## │   ├── PC4_Negative_Cellular_Component.png
## │   ├── PC4_Negative_Functional_Enrichment.pdf
## │   ├── PC4_Negative_KEGG_Enrichment.png
## │   ├── PC4_Negative_Molecular_Function.png
## │   ├── PC4_Positive_Biological_Process.png
## │   ├── PC4_Positive_Cellular_Component.png
## │   ├── PC4_Positive_Functional_Enrichment.pdf
## │   ├── PC4_Positive_KEGG_Enrichment.png
## │   ├── PC4_Positive_Molecular_Function.png
## │   ├── PC5
## │   │   ├── Negative
## │   │   │   ├── PC5_Negative_ALL_GO.csv
## │   │   │   └── PC5_Negative_KEGG.csv
## │   │   └── Positive
## │   │       ├── PC5_Positive_ALL_GO.csv
## │   │       └── PC5_Positive_KEGG.csv
## │   ├── PC5_Negative_Biological_Process.png
## │   ├── PC5_Negative_Cellular_Component.png
## │   ├── PC5_Negative_Functional_Enrichment.pdf
## │   ├── PC5_Negative_KEGG_Enrichment.png
## │   ├── PC5_Negative_Molecular_Function.png
## │   ├── PC5_Positive_Biological_Process.png
## │   ├── PC5_Positive_Cellular_Component.png
## │   ├── PC5_Positive_Functional_Enrichment.pdf
## │   ├── PC5_Positive_KEGG_Enrichment.png
## │   ├── PC5_Positive_Molecular_Function.png
## │   ├── PCA_3DPCA.pdf
## │   ├── PCA_biplot.pdf
## │   ├── PCA_loading_bar.pdf
## │   ├── PCA_loading_heat.pdf
## │   ├── PCA_overview.pdf
## │   ├── PCA_pairs_plot.pdf
## │   ├── PCA_screen_plot.pdf
## │   └── PCA_scree_plot.pdf
## └── SampleQC_dist_pcc.pdf
## 
## 19 directories, 113 files

Read the results:

dds.peak.results.ordered = read.csv( "/home/songyabing/R/learn/tmp/DEbPeak/ATACDE/DEG/Condition_KO_WT_all.csv", row.names = 1)
head(dds.peak.results.ordered)
##                                           baseMean log2FoldChange    lfcSE
## chr8:94825359-94825432|Ciapin1|P         12.106577       6.439216 1.527019
## chr3:105628355-105628501|Ddx20|I         11.180004       6.323179 1.523447
## chr1:118440678-118440831|2900060B14Rik|I  7.836836       5.790918 1.569651
## chr1:165368382-165368517|Dcaf6|I          6.140173       5.439287 1.619636
## chr16:59434969-59435053|Gabrr3|I          5.706767       5.348676 1.631421
## chr11:20780454-20780545|Aftph|DI          5.390470       5.269629 1.636609
##                                              stat       pvalue        padj
## chr8:94825359-94825432|Ciapin1|P         4.216854 2.477343e-05 0.001265734
## chr3:105628355-105628501|Ddx20|I         4.150575 3.316417e-05 0.001598853
## chr1:118440678-118440831|2900060B14Rik|I 3.689304 2.248687e-04 0.006892699
## chr1:165368382-165368517|Dcaf6|I         3.358339 7.841226e-04 0.017866061
## chr16:59434969-59435053|Gabrr3|I         3.278539 1.043461e-03 0.021915949
## chr11:20780454-20780545|Aftph|DI         3.219846 1.282594e-03 0.025309971
##                                                 SYMBOL            ENSEMBL
## chr8:94825359-94825432|Ciapin1|P               Ciapin1 ENSMUSG00000031781
## chr3:105628355-105628501|Ddx20|I                 Ddx20 ENSMUSG00000027905
## chr1:118440678-118440831|2900060B14Rik|I 2900060B14Rik ENSMUSG00000107722
## chr1:165368382-165368517|Dcaf6|I                 Dcaf6 ENSMUSG00000026571
## chr16:59434969-59435053|Gabrr3|I                Gabrr3 ENSMUSG00000074991
## chr11:20780454-20780545|Aftph|DI                 Aftph ENSMUSG00000049659
##                                          ENTREZID
## chr8:94825359-94825432|Ciapin1|P           109006
## chr3:105628355-105628501|Ddx20|I            53975
## chr1:118440678-118440831|2900060B14Rik|I    68204
## chr1:165368382-165368517|Dcaf6|I            74106
## chr16:59434969-59435053|Gabrr3|I           328699
## chr11:20780454-20780545|Aftph|DI           216549

Motif enrichment

There are three categories for users to choose to perform functional enrichment: Up_regulated, Down_regulated, Not_regulated. Here, we will use Up_regulated as example.

# this step is time consuming!
da.peak.motif = MotifEnrich(de.res = dds.peak.results.ordered, reg.type = "Up_regulated",
                            homer.motif.path = '~/anaconda3/bin/findMotifsGenome.pl',
                            out.folder = "/home/songyabing/R/learn/tmp/DEbPeak/DAmotif",
                            other.paras = "-len 8,10,12 -size -100,50 -S 25")