Widespread evidence for the role of flanking regions in transcription factor binding preferences
Genome-wide technologies have identified the DNA binding motifs of hundreds of different transcription factors (TFs). However, it is still not clear what distinguishes bound motifs from the vast number of similar but unbound motifs in the genome. Although it is well established that TF binding depends on several mechanisms, such as methylation level, chromatin structure and cofactors, these effects have not been sufficient to explain the majority of TF binding preferences. Moreover, recent in vitro binding assays, which exclude the effect of the cellular environment, demonstrate selective binding of motifs for many TFs. These observations raise the possibility that the information determining TF specificity is also encoded directly within the DNA, most likely with the context surrounding a motif playing an important role. Here we aimed to investigate the direct contribution of the DNA sequences flanking the core binding motif to DNA binding. To this end, we analyzed the DNA sequences and the three-dimensional DNA structure flanking the motifs of 192 and 72 TFs, extracted from in vitro binding assays and in vivo ChIP-seq data, respectively. We selected all bound sequences containing the known motifs and compared their DNA sequence and structural properties with those of unbound sequences. This comparison revealed significant differences between bound and unbound motifs in the regions flanking the core motifs. Notably, the binding sites of TFs belonging to similar families exhibited common features, both in vitro and in vivo. We propose that these unique features assist in guiding TFs to their cognate binding sites.
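The bound-versus-unbound comparison described above can be illustrated with a minimal sketch. It is not the pipeline used in this work. It assumes that GC content of the flanks is one sequence property of interest and leaves out the three-dimensional structural features entirely. The function names, the flank width of 4 bp, the E-box motif `CACGTG`, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def flanks(seq, motif, width):
    """Return (left, right) flanks of the first motif occurrence, or None."""
    i = seq.find(motif)
    if i < 0:
        return None
    left = seq[max(0, i - width):i]
    right = seq[i + len(motif):i + len(motif) + width]
    return left, right


def gc_fraction(s):
    """Fraction of G or C bases in s (0.0 for an empty string)."""
    return sum(c in "GC" for c in s) / len(s) if s else 0.0


def mean_flank_gc(seqs, motif, width):
    """Mean GC fraction of the combined flanks over sequences containing the motif."""
    values = []
    for s in seqs:
        f = flanks(s, motif, width)
        if f is not None:
            values.append(gc_fraction(f[0] + f[1]))
    return mean(values)


# Toy data, invented for illustration only (not from the study).
MOTIF = "CACGTG"
bound = ["GGCCCACGTGGCGC", "GCGCCACGTGCCGG"]
unbound = ["ATTACACGTGTATA", "TTAACACGTGAATT"]

print("bound flank GC:  ", mean_flank_gc(bound, MOTIF, 4))
print("unbound flank GC:", mean_flank_gc(unbound, MOTIF, 4))
```

In a real analysis the two groups would hold many sequences per TF, and the difference would be assessed with a statistical test at each flank position, not with a single mean.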