Prepare Count Matrix and Sample Metadata for Peak-related data.

PeakMatrix(
  meta.file,
  count.matrix = FALSE,
  min.overlap = 2,
  summits = 200,
  use.summarizeOverlaps = TRUE,
  filter = 1,
  blacklist = TRUE,
  sub.control = TRUE,
  used.cols = c("SampleID", "Condition"),
  out.folder = NULL,
  species = c("Human", "Mouse", "Rat", "Fly", "Arabidopsis", "Yeast", "Zebrafish",
    "Worm", "Bovine", "Pig", "Chicken", "Rhesus", "Canine", "Xenopus", "Anopheles",
    "Chimp", "E coli strain Sakai", "Myxococcus xanthus DK 1622"),
  seq.style = c("UCSC", "NCBI", "Ensembl", "None"),
  gtf.file = NULL,
  up.dist = 3000,
  down.dist = 3000,
  ...
)

Arguments

meta.file

Path to a tab-separated file: either sample metadata describing the peak data (e.g. sample, peakPath, bamPath, condition) or a peak count matrix.
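
A minimal sketch of writing such a tab-separated metadata file from R. The column names below follow the DiffBind sample sheet convention (SampleID, Condition, bamReads, Peaks, PeakCaller); that PeakMatrix expects exactly these columns is an assumption, and all sample names and paths are hypothetical placeholders.

```r
# Hypothetical sample sheet; column names follow the DiffBind convention (assumption).
samples <- c("ctrl_1", "ctrl_2", "treat_1", "treat_2")
meta <- data.frame(
  SampleID   = samples,
  Condition  = c("Control", "Control", "Treated", "Treated"),
  bamReads   = file.path("path/to/bam", paste0(samples, ".bam")),
  Peaks      = file.path("path/to/peaks", paste0(samples, ".narrowPeak")),
  PeakCaller = "narrow",  # peak file format; it decides which column holds the score
  stringsAsFactors = FALSE
)

# Write as tab-separated text, which is the layout meta.file is documented to use.
meta.file <- file.path(tempdir(), "peak_metadata_input.txt")
write.table(meta, file = meta.file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Keeping SampleID as the first column matches the default of used.cols.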

count.matrix

Logical value, whether the meta.file is a count matrix file. Default: FALSE.

min.overlap

Only include peaks in at least this many peaksets in the main binding matrix. Default: 2. Parameter of dba. Used when count.matrix is FALSE.
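
The effect of min.overlap can be sketched with a toy presence/absence table. This is only an illustration in base R: the peak and sample names are hypothetical, and the actual consensus building in dba also merges overlapping intervals, which is not shown here.

```r
# Rows are candidate peaks, columns are peaksets; TRUE means the peak was called.
called <- matrix(
  c(TRUE,  TRUE,  FALSE,
    TRUE,  FALSE, FALSE,
    TRUE,  TRUE,  TRUE),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("peak_a", "peak_b", "peak_c"), c("s1", "s2", "s3"))
)

min.overlap <- 2

# Count how many peaksets contain each peak and keep those meeting the threshold.
n.peaksets <- rowSums(called)
consensus  <- names(n.peaksets)[n.peaksets >= min.overlap]
print(consensus)
```

Here peak_b is dropped because it occurs in only one peakset.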

summits

If the value is greater than zero, all consensus peaks will be re-centered around a consensus summit, with the value of summits indicating how many base pairs to include upstream and downstream of the summit (so all consensus peaks will be of the same width, namely 2 * summits + 1). Default: 200. Parameter of dba.count. Used when count.matrix is FALSE.
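
The width arithmetic above can be checked with a few lines of base R. The summit positions are hypothetical; only the 2 * summits + 1 relationship comes from the description.

```r
summits <- 200

# Hypothetical consensus summit positions (1-based genomic coordinates).
summit.pos <- c(10500, 48230)

# Extend by `summits` bp on each side of the summit.
peak.start <- summit.pos - summits
peak.end   <- summit.pos + summits

# Inclusive width: upstream flank + summit base + downstream flank.
peak.width <- peak.end - peak.start + 1
print(peak.width)
```

With the default of 200, every re-centered peak is 401 bp wide.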

use.summarizeOverlaps

Logical value, indicating that summarizeOverlaps should be used for counting instead of the built-in counting code. Default: TRUE. Parameter of dba.count. Used when count.matrix is FALSE.

filter

Filter intervals with low read counts based on RPKM values. Default: 1. Parameter of dba.count. Used when count.matrix is FALSE.
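
As a rough illustration of RPKM-based filtering, the sketch below applies the standard RPKM definition (reads per kilobase of interval per million mapped reads) to made-up counts. The counts, widths, and library size are hypothetical, and whether dba.count keeps intervals at or strictly above the threshold is an assumption here.

```r
filter <- 1

# Hypothetical raw read counts for three consensus peaks in one sample.
counts      <- c(peak_a = 5, peak_b = 120, peak_c = 40)
widths      <- c(peak_a = 401, peak_b = 401, peak_c = 401)  # bp
total.reads <- 2e7                                          # mapped reads in the library

# RPKM = reads / (interval length in kb) / (library size in millions).
rpkm <- counts / (widths / 1e3) / (total.reads / 1e6)

# Keep intervals whose RPKM reaches the threshold (">=" is an assumption).
keep <- names(rpkm)[rpkm >= filter]
print(round(rpkm, 2))
print(keep)
```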

blacklist

Species-specific problematic regions to remove. One of "DBA_BLACKLIST_HG19", "DBA_BLACKLIST_HG38", "DBA_BLACKLIST_GRCH37", "DBA_BLACKLIST_GRCH38", "DBA_BLACKLIST_MM9", "DBA_BLACKLIST_MM10", "DBA_BLACKLIST_CE10", "DBA_BLACKLIST_CE11", "DBA_BLACKLIST_DM3", "DBA_BLACKLIST_DM6", TRUE (detect the genome automatically), or a GRanges object containing the blacklisted regions. Default: TRUE. Parameter of dba.blacklist. Used when count.matrix is FALSE.

sub.control

Logical value, whether Control read counts are subtracted for each site in each sample. Default: TRUE. Parameter of dba.count. Used when count.matrix is FALSE.

used.cols

Columns used to create the sample metadata. If specified, SampleID should be placed first. Default: c("SampleID", "Condition"). Used when count.matrix is FALSE.

out.folder

Output folder to save created count matrix and sample metadata. Default: NULL (current working directory).

species

Species used, chosen from "Human", "Mouse", "Rat", "Fly", "Arabidopsis", "Yeast", "Zebrafish", "Worm", "Bovine", "Pig", "Chicken", "Rhesus", "Canine", "Xenopus", "Anopheles", "Chimp", "E coli strain Sakai", "Myxococcus xanthus DK 1622". Default: "Human".

seq.style

The sequence naming style, chosen from "UCSC", "NCBI", "Ensembl", "None". This should match the genome and GTF file used to generate the count matrix and peak files. Default: "UCSC".

gtf.file

GTF file used to create the TxDb object. Useful when your species is not among the options in species. Default: NULL.

up.dist

The upstream distance from the TSS. Default: 3000bp.

down.dist

The downstream distance from the TSS. Default: 3000bp.
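
To make the up.dist and down.dist window concrete, the sketch below computes a strand-aware region around two hypothetical TSS positions and checks whether a peak center falls inside it. The gene names and coordinates are made up, and the way annotatePeak uses this window internally is not reproduced here.

```r
up.dist   <- 3000
down.dist <- 3000

# Hypothetical TSS coordinates and strands.
tss    <- c(geneA = 100000, geneB = 250000)
strand <- c(geneA = "+", geneB = "-")

# Upstream means lower coordinates on "+" and higher coordinates on "-".
win.start <- ifelse(strand == "+", tss - up.dist,   tss - down.dist)
win.end   <- ifelse(strand == "+", tss + down.dist, tss + up.dist)

# A hypothetical peak center, tested against each window.
peak.center <- 101500
in.window   <- peak.center >= win.start & peak.center <= win.end
print(in.window)
```

Only geneA's window contains the peak center. With asymmetric distances, the window shifts in opposite directions for the two strands.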

...

Additional parameters passed to annotatePeak.

Value

A data frame containing the count matrix, peak annotation, and sample metadata (if used.cols is provided). The corresponding results are also saved to the files consensus_peak_matrix.txt, consensus_peak_anno.txt, and peak_metadata.txt.

Examples

# library(DEbPeak)
# library(DESeq2)
# metadata file contains peak and bam information
# be aware of the PeakCaller type (it determines the score column)
# meta.file = 'path/to/metadata'
# PeakMatrix(meta.file = meta.file, species = "Human", seq.style = "UCSC",
#            up.dist = 20000, down.dist = 20000)