Covariation between homeodomain transcription factors and the shape of their DNA binding sites
Protein–DNA recognition is a critical component of gene regulatory processes but the underlying molecular mechanisms are not yet completely understood. Whereas the DNA binding preferences of transcription factors (TFs) are commonly described using nucleotide sequences, the 3D DNA structure is recognized by proteins and is crucial for achieving binding specificity. However, the ability to analyze DNA shape in a high-through-put manner made it only recently feasible to integrate structural information into studies of protein–DNA binding. Here we focused on the homeodomain family of TFs and analyzed the DNA shape of thousands of their DNA binding sites, investigating the covariation between the protein sequence and the sequence and shape of their DNA targets. We found distinct homeodomain regions that were more correlated with either the nucleotide sequence or the DNA shape of their preferred binding sites, demonstrating different readout mechanisms through which homeodomains attain DNA binding specificity. We identified specific homeodomain residues that likely play key roles in DNA recognition via shape readout. Finally, we showed that adding DNA shape information when characterizing binding sites improved the prediction accuracy of homeodomain binding specificities. Taken together, our findings indicate that DNA shape information can generally provide new mech-anistic insights into TF binding.
Correlation between homeodomain sequences and the DNA sequences and shapes of their binding sites. (A) PCC between pair-wise HD sequence similarity and pair-wise DNA sequence and shape similarity are shown for HDs in mouse derived from PBM data (14). (B) Co-crystal structure of engrailed HD in complex with DNA (PDB ID 3HDD) (29). Purple represents the N-terminal tail, and blue highlights the recognition helix. (C) PCCs for the comparison of pair-wise HD sequence similarity scores, using only the sequence of the N-terminal tail (purple) or the recognition helix (blue), with pair-wise DNA sequence and shape similarity scores. PBM data for 168 mouse HDs (14) were used in this analysis.