Peak Calling

Identification of likely regions where a protein is bound given a set of mapped reads.

“Peaks” are locations where the signal is elevated. Here the signal is the number of mapped reads covering a particular position along the genome.

Peak calling transforms the mapping data into signal data and then reports the enriched parts of that signal as regions of the genome (peaks). These peaks are usually stored in a BED file.

Intuitively, one can think of a “peak” as a stack of aligned reads. The higher the stack, the higher the peak. The wider the stack (reads that overlap but are offset from one another widen it), the wider the peak.
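The "stack of reads" picture can be made concrete with a deliberately naive sketch. It makes several assumptions. Reads are given as half-open `(chrom, start, end)` intervals on a single chromosome of known length. The signal is plain read coverage. A peak is any maximal run of positions whose coverage reaches a hypothetical fixed cutoff `min_depth`. Real peak callers typically model background or control signal instead of using a fixed cutoff. The output uses BED-style 0-based, half-open coordinates.

```python
from typing import List, Tuple

# An aligned read as a half-open interval [start, end) on one chromosome.
Read = Tuple[str, int, int]

def coverage(reads: List[Read], chrom: str, length: int) -> List[int]:
    """Count how many reads cover each position of `chrom` (the "stack height")."""
    diff = [0] * (length + 1)          # difference array: +1 where a read starts, -1 where it ends
    for c, start, end in reads:
        if c != chrom:
            continue
        diff[start] += 1
        diff[end] -= 1
    cov, running = [], 0
    for i in range(length):
        running += diff[i]              # running sum turns the difference array into coverage
        cov.append(running)
    return cov

def call_peaks(cov: List[int], chrom: str, min_depth: int) -> List[str]:
    """Report maximal runs with coverage >= min_depth as tab-separated BED lines."""
    peaks, start = [], None
    for i, depth in enumerate(cov):
        if depth >= min_depth and start is None:
            start = i                   # entering an enriched run
        elif depth < min_depth and start is not None:
            peaks.append(f"{chrom}\t{start}\t{i}")
            start = None                # leaving an enriched run
    if start is not None:               # run extends to the end of the chromosome
        peaks.append(f"{chrom}\t{start}\t{len(cov)}")
    return peaks

reads = [("chr1", 10, 20), ("chr1", 12, 22), ("chr1", 15, 25), ("chr1", 40, 50)]
cov = coverage(reads, "chr1", 60)
print(max(cov))                         # 3  (three reads stacked at positions 15-19)
print(call_peaks(cov, "chr1", 2))       # ['chr1\t12\t22']
print(call_peaks(cov, "chr1", 1))       # ['chr1\t10\t25', 'chr1\t40\t50']
```

Lowering the cutoff widens the first peak and admits the lone read at 40-50 as a second "peak". This shows how strongly the result depends on how enrichment is defined.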

Peaks are called from protein-binding (ChIP-seq, CUT&RUN), chromatin-conformation (Hi-C, 4C-seq), or chromatin-accessibility (ATAC-seq) assays.

  • separating nearby or overlapping peaks into distinct, meaningful regions is difficult
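One reason separation is hard is that "one peak or two?" often comes down to a chosen parameter. The sketch below assumes enriched regions have already been found (for example by thresholding coverage). It merges regions separated by at most a hypothetical `max_gap` bases, similar in spirit to `bedtools merge -d`. The same three regions become two or three peaks depending on that choice.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome

def merge_regions(regions: List[Interval], max_gap: int) -> List[Interval]:
    """Merge enriched regions whose gap is <= max_gap bases into a single peak."""
    merged: List[Interval] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))  # extend the previous peak
        else:
            merged.append((start, end))                     # start a new peak
    return merged

regions = [(100, 200), (230, 330), (600, 700)]
print(merge_regions(regions, max_gap=10))  # [(100, 200), (230, 330), (600, 700)]
print(merge_regions(regions, max_gap=50))  # [(100, 330), (600, 700)]
```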

Regulatory Elements

  • Transcription factors interact with histones
  • Histone modifications form a “code” that results in different regulatory effects

Assays

  • HiChIP and Hi-C are chromatin conformation assays
  • DNase-seq and ATAC-seq are accessibility assays
  • ChIP-seq, CUT&RUN, and CUT&Tag are protein (e.g. TF) binding assays

When there is no antibody for the protein

  • Fuse a GFP tag to the protein of interest and use an anti-GFP antibody

Links to this page
  • ChIP-seq

    The initial output of ChIP-seq is a set of (short) reads from the DNA fragments that were pulled down with the protein, so they are enriched for protein-bound sequences. These reads are run through quality control and then mapped to a genome. The mapping locations alone do not tell us where on the genome the protein was bound, so Peak Calling must be performed to identify likely binding regions (peaks). These peaks are then annotated with relevant information, such as transcription start sites, known promoter locations, etc.
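    The annotation step can be sketched as attaching the nearest gene start site (transcription start site, TSS) to each peak. The sketch makes several assumptions. TSS coordinates are already available as `(chrom, position, gene)` tuples; the gene names below are made up. The peak midpoint stands in for its summit. Distance is absolute, so strand is ignored. Dedicated annotation tools report much more, such as promoter/intron/intergenic categories.

```python
from typing import Dict, List, Optional, Tuple

Peak = Tuple[str, int, int]   # (chrom, start, end), BED-style half-open
TSS = Tuple[str, int, str]    # (chrom, position, gene) -- hypothetical input format

def annotate(peaks: List[Peak], tss_sites: List[TSS]) -> List[Tuple[Peak, Optional[str], Optional[int]]]:
    """Return (peak, nearest gene, absolute distance from peak midpoint to its TSS)."""
    by_chrom: Dict[str, List[Tuple[int, str]]] = {}
    for chrom, pos, gene in tss_sites:
        by_chrom.setdefault(chrom, []).append((pos, gene))
    annotated = []
    for chrom, start, end in peaks:
        mid = (start + end) // 2
        candidates = by_chrom.get(chrom, [])
        if not candidates:                       # no TSS known on this chromosome
            annotated.append(((chrom, start, end), None, None))
            continue
        pos, gene = min(candidates, key=lambda t: abs(t[0] - mid))
        annotated.append(((chrom, start, end), gene, abs(mid - pos)))
    return annotated

peaks = [("chr1", 1000, 1200), ("chr2", 500, 700), ("chr3", 10, 20)]
tss_sites = [("chr1", 900, "geneA"), ("chr1", 5000, "geneB"), ("chr2", 2000, "geneC")]
for row in annotate(peaks, tss_sites):
    print(row)
# (('chr1', 1000, 1200), 'geneA', 200)
# (('chr2', 500, 700), 'geneC', 1400)
# (('chr3', 10, 20), None, None)
```

A linear scan per peak keeps the sketch short. For genome-scale inputs, sorting TSSs per chromosome and using binary search (`bisect`) would be the usual choice.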