J. Biol. Chem., Vol. 281, Issue 51, 39236-39248, December 22, 2006
Indirect Recognition in Sequence-specific DNA Binding by Escherichia coli Integration Host Factor: THE ROLE OF DNA DEFORMATION ENERGY*
From the
Received for publication, July 5, 2006, and in revised form, October 10, 2006.
Integration host factor (IHF) is a bacterial histone-like protein whose primary biological role is to condense the bacterial nucleoid and to constrain DNA supercoils. It does so by binding in a sequence-independent manner throughout the genome. However, unlike other structurally related bacterial histone-like proteins, IHF has evolved a sequence-dependent, high affinity DNA-binding motif. The high affinity binding sites are important for the regulation of a wide range of cellular processes. A remarkable feature of IHF is that it employs an indirect readout mechanism to bind and wrap DNA at both the nonspecific and high affinity (sequence-dependent) DNA sites. In this study we assessed the contributions of pre-formed and protein-induced DNA conformations to the energetics of IHF binding. Binding energies determined experimentally were compared with energies predicted for the IHF-induced deformation of the DNA helix (DNA deformation energy) in the IHF-DNA complex. Combinatorial sets of de novo DNA sequences were designed to systematically evaluate the influence of sequence-dependent structural characteristics of the conserved IHF recognition elements of the consensus DNA sequence. We show that IHF recognizes pre-formed conformational characteristics of the consensus DNA sequence at high affinity sites, whereas at all other sites relative affinity is determined by the deformational energy required for nearest-neighbor base pairs to adopt the DNA structure of the bound DNA-IHF complex.
Site-specific DNA binding by regulatory proteins is a feature of the regulatory processes that maintain, expand, and express genetic information such as replication, recombination, transposition, and transcription. The chemical and physical mechanisms that underlie sequence-specific recognition of regulatory elements by cognate DNA-binding proteins are typically classified as direct versus indirect readout. The former refers primarily to hydrogen bonds between proteins and the unique extra-cyclic substituents at C-4 of pyrimidines, C-6 of purines, and N-7 of purines. These groups provide a base pair-specific pattern of hydrogen bond donors and acceptors in the major groove of DNA that can be directly read by a complementary pattern of amino acid side chain donors and acceptors. Indirect readout refers to recognition of aspects of DNA structure such as intrinsic curvature, topology of major and minor grooves, ordered water structures, local geometry of backbone phosphates, and flexibility or deformability. Because both the local DNA structure and energy to deform DNA are themselves intrinsic sequence-dependent properties, the conserved sequences that distinguish binding sites necessarily include contributions from both direct and indirect mechanisms. Consequently, although the contribution from indirect mechanisms is expected to be significant in protein-DNA complexes that feature substantial DNA deformation, it has proven difficult to evaluate these contributions quantitatively. A protein that relies exclusively, or primarily, on indirect readout would clearly be advantageous for this purpose. The Escherichia coli integration host factor (IHF)4 is one such example of a DNA-binding protein that relies on indirect readout for sequence-specific recognition. IHF is a small, basic (22 kDa) heterodimeric protein that belongs to a general class of histone-like DNA minor groove-binding proteins present in both prokaryotes and eukaryotes (1, 2). 
Like other members of its class, IHF forms a complex in which the DNA is wrapped around the protein, producing a bend of 180°. This bending serves an architectural function in the primary role of IHF, chromosome condensation (3). However, unlike other class members that exhibit no sequence specificity in DNA binding, IHF also binds in a sequence-specific manner to sites at which bending aids in the formation of higher order structures required for a variety of cellular functions such as site-specific recombination (4), transposition of mobile genetic elements (5), gene regulation (6), and DNA replication (7).
Evaluation of known IHF-binding sites, of which over 170 have been identified (8), has revealed a consensus DNA-binding motif consisting of several small clusters of conserved bases (9-11). These are located primarily in the 3'-half of the site. The two most highly conserved elements are the sequence WATCAA, starting near the center of the site, and the sequence TTR, located 4 bp in the 3' direction from WATCAA (9). Some IHF sites also contain a poly(A)-tract of 4-6 adenines located 8-9 bp in the 5' direction from the WATCAA element (12, 13). Despite a nonrandom distribution of bases throughout the sites, sequence consensus is limited to just these few elements. Sites that contain all of these elements bind IHF with affinities on the order of 1 nM and are preferred over random sequences by a factor of
A crystallographic model of IHF bound to a 34-bp DNA fragment containing the H' site of bacteriophage
Much of the bend is centered on two positions 9 bp apart at which proline side chains at the tip of the
Previously, we applied a bioinformatics approach to explore the relationship between DNA flexibility and IHF binding. DNA sequences were threaded onto the structure of the H' site in the bound complex (17), and the energy difference between these DNA sequences in the bound conformation and the same sequences in their native, unbound conformations was estimated using the harmonic conformational potentials of Olson et al. (20). Results indicated lower average deformation energy and a narrower distribution for known IHF sites as compared with either sequences selected at random from the E. coli genome, or generated as random sequences matching the E. coli base composition (21). Deformation energy calculated in this manner was used to seed classifiers that could be trained to identify IHF-binding sites (22). Subsequently, this result was extended to four other DNA-binding proteins, which feature highly degenerate consensus DNA-binding sequences and substantial DNA deformation in the bound complexes, and for which high resolution structures are available (23).
In this work, we have analyzed experimentally the connection between the free energy change for specific IHF binding and the deformation energy. The goal was to assess quantitatively the contribution from DNA flexibility to indirect readout recognition of specific IHF-binding sites. Initial analysis of data taken from the literature for 32 IHF sites that have been carefully characterized quantitatively suggested both the need and an approach to separate effects of DNA flexibility from sequence-specific, albeit still indirect, recognition mechanisms that pertain to the conserved elements.
A subsequent analysis was carried out on de novo sequences designed either to maintain particular combinations of consensus elements while varying the remaining sequence to generate the widest possible range of deformation energies, or designed as control sequences in which consensus elements and nonconsensus sequences were varied interchangeably. An additional set of E. coli genomic sequences was also analyzed. These sequences were selected as putative, but as yet uncharacterized, IHF regulatory sites based on classifiers described previously (21, 22, 24). Like the control sequences and other natural IHF regulatory sites, these exhibit variation in both consensus elements and nonconserved sequences. Results obtained from this analysis detail a significant effect of DNA flexibility that is of increasing significance when fewer consensus elements are conserved.
Materials. Buffer components and reagents were electrophoresis grade if available and reagent grade otherwise.
IHF Purification and Activity. IHF was purified according to Nash et al. (25). Aliquots were stored at -70 °C. The specific DNA binding activity was determined by conducting site titrations of IHF binding to a high affinity IHF-binding site (Kd
IHF-binding DNA Fragments. Double-stranded oligonucleotides used in IHF binding assays were generated by annealing complementary single-stranded oligonucleotides. The double-stranded DNA oligomers were end-labeled with ³²P at both 5' ends using T4 polynucleotide kinase. DNA sequences were 50 bp in length and conformed to the sequence template, 5'-GTTGGCAT(X34)GAACAGGT-3'. X34 denotes a variable sequence corresponding to either different natural IHF-binding sites or selected based on adherence to different combinations of IHF consensus elements (11) and to otherwise maximize variation in deformation energy. The 8-bp flanking sequences are from the
IHF Binding Assays. Equilibrium binding of IHF was monitored by using the electrophoretic mobility shift assay (EMSA), or gel shift assay. Reaction mixtures were prepared containing 1 × 10⁻¹⁰ M duplex DNA oligomer and either no IHF or IHF at concentrations ranging from 4 × 10⁻¹¹ to 3 × 10⁻⁷ M in binding buffer consisting of 40 mM Tris-HCl, pH 8.0, 4 mM MgCl2, 70 mM KCl, 0.1 mM dithiothreitol, and 2 µg/ml bovine serum albumin. After 10 min of incubation at room temperature, the full reaction mixture volumes (20 µl) were loaded with the current running on 16 × 20-cm, 8% polyacrylamide gels (7.73% acrylamide, 0.26% N,N'-methylene bisacrylamide) in TAE buffer (0.04 M Tris acetate, pH 8.0, 0.001 M EDTA). Bound and free DNA species were separated by electrophoresis at constant 140 V for 2.5 h at room temperature. Typically, four independent titrations were conducted for each DNA sequence. The de novo sequences were investigated in groups of several at a time, together with the
Dried gels were used to expose PhosphorImager screens, which were subsequently scanned using a Bio-Rad Molecular Imager FX. Fractions of bound and free DNA were determined by volume integration of the corresponding electrophoretic bands using the Bio-Rad Quantity One software, version 4.2.1. DNA present in each lane was divided into three fractions. Specifically bound DNA was identified as the distinct mobility-shifted band. A smear of DNA present only at high IHF concentration was interpreted as nonspecifically bound (26). Background exposure was determined by analyzing the regions in the lane with no IHF that would correspond to where bound species would migrate. These background values were subtracted from the pixel values used in the volume integration.
Numerical Analysis. IHF binding was analyzed according to the competitive specific and nonspecific finite lattice model developed for IHF binding by Tsodikov et al. (27). Application of this model to mobility shift assays, for which the experimental observables are fractions of DNA free, of DNA bound specifically, and of DNA bound nonspecifically, yields the partition function given by Equation 1, where
The last two terms in the product give the conditional probability of n-1 empty lattice positions adjacent to any particular empty position in an infinite lattice, as formulated originally by McGhee and von Hippel (28) with a correction to account for end effects in a finite lattice (27); ν is the binding density in IHF per base pair. ΔGspec and ΔGnonspec values were obtained by analyzing the fractions of unbound and of specifically bound DNA according to Equations 3 and 4.
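As a point of reference for the conditional probability described above, the noncooperative infinite-lattice isotherm of McGhee and von Hippel is commonly written as shown below. This is the textbook form only; it omits the finite-lattice end correction of Ref. 27 and is not a reproduction of Equations 1-4. The symbols are the usual ones and are assumptions with respect to this text: ν is the binding density (bound ligand per base pair), L the free ligand concentration, K the intrinsic nonspecific association constant, and n the site size in base pairs.

```latex
\frac{\nu}{L} \;=\; K\,(1 - n\nu)\left[\frac{1 - n\nu}{1 - (n-1)\,\nu}\right]^{\,n-1}
```

The bracketed factor raised to the power n-1 is the conditional probability that the n-1 positions adjacent to a free position are also free.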
An issue in applying Equations 3 and 4 is the value of n, the site size. Both forward, or ligand to lattice, and also reverse, or lattice to ligand, titrations are needed to evaluate this term empirically. The mobility shift assay provides only the former. Holbrook et al. (26) report n = 9 and n = 16, respectively, at 60 and at 100 mM KCl, ionic strengths that bracket our conditions. We found fitted values of ΔGspec and ΔGnonspec to be insensitive to the value of n over this range. Consequently, the data were analyzed with an intermediate value, fixed as n = 12.
The closed form expression for the conditional probability embedded in Equation 2 is an approximation of the exact numerical finite lattice isotherm derived by Epstein (29). Tsodikov et al. (27) find the difference to be within the limits of reasonable experimental uncertainty when the lattice size is at least severalfold larger than the site size, particularly at the low binding densities to which the mobility shift assay is sensitive. Nevertheless, we also analyzed the mobility shift data using a second formulation for nonspecific binding to assess whether this approximation in the nonspecific binding isotherm might affect fitted values of
In this model, Nave accounts for both the average number of IHF dimers bound in the mobility-shifted bands obtained in the gel shifts and the lattice binding statistics. Nave is not restricted to an integer value. Although incorrect chemically and so of little value to explain actual molecular behavior, this model does yield a reasonable phenomenological alternative to the competitive specific and nonspecific finite lattice model, i.e. one that conforms to the nonspecific binding phase well, and so yields a precise estimate of ΔGspec, the parameter of interest.
Nonlinear least squares analysis was conducted using the Origin 7 software (OriginLab Corp.). The Origin software estimates parameter values corresponding to a minimum in the variance. Joint confidence limits that account for correlation between parameters are estimated by adjusting each parameter individually while refitting the others to search for a variance ratio as predicted by the F statistic (30). Confidence limits reported correspond to the 95% interval. When global analysis of multiple experiments was conducted, normalized weights were calculated for the individual data from the square roots of the variances of separate fits to each of the individual experiments.
Analysis of Naturally Occurring IHF-binding Sites. Fig. 1 shows an atomic model based on the crystallographic structure of IHF bound to the 34-bp DNA fragment containing the H' site of bacteriophage λ (17). Consensus elements are color-coded as described in the figure legend for ease of identification and discussion. The dominant features contributing to the DNA deformation are the sharp kinks at the ApA steps of the two proline intercalation sites, as shown in orange. Although the DNA flanking these sites returns to a canonical B-form structure within half a helical turn in either direction, the different base pair steps are deformed to varying extents; bending is anisotropic both with respect to degree and direction. Approaches based on general DNA flexibility, even at individual steps, are inadequate to describe the energetic consequences of this situation. For this reason, we have applied the harmonic spring model of Olson et al. (20) to capture the molecular level detail of the structural model in estimating deformation energies. The reference structure for these calculations is the structure shown in Fig. 1. All other sequences were threaded onto this structure.
As a starting point for the analysis, we compared the free energy change for IHF binding to the deformation energy in a series of naturally occurring, specific binding sites. The sites included in this analysis meet three criteria. First, each site is involved in a specific regulatory process, such as described in the Introduction, thereby confirming the location-specific nature of binding. Second, the evidence for DNA binding was derived from direct DNA binding assays. Third, titrations were conducted as necessary to provide quantitative estimation of the binding free energy. Table 1 presents the results compiled for 32 IHF-binding sites, the complete list for which reliable binding energies were obtained according to these criteria.
A concern in comparing literature values is the effect of different reaction conditions that were employed in the various studies. In general, variations in monovalent and divalent salt concentration, temperature, and pH would be expected to contribute significantly to apparent differences in DNA binding affinity. However, these IHF binding affinities are particularly advantageous in this regard. First, the majority of binding experiments were conducted within fairly narrow ranges of KCl concentration (50-70 mM), MgCl2 concentration (4-6 mM), temperature (20-25 °C), and pH (7.5-8.0), although some of the reaction conditions do fall outside these ranges. Second, IHF is unusual among DNA-binding proteins for its lack of salt dependence over the range of concentrations used in all of the experiments reported (26).5 Third, where hard data exist, reported values were corrected to a standard set of reaction conditions. ΔG° values calculated from the reported Kd values were corrected to a standard temperature (20 °C) using the standard thermodynamic relationship as shown in Equation 9,
where Ka,ref and Ka,T refer to the association equilibrium constants at the reference (293 K) and experimental temperatures, and R is the gas constant. Record and co-workers (26) used isothermal titration calorimetry to determine ΔH° and ΔC°p values for IHF binding to the H' sequence at different KCl concentrations. Interpolation was used to calculate values for the particular KCl concentrations used for the determination of each Kd value. We did not attempt to correct to a standard pH because dlnK/dln[H+] is not known for most of the applicable pH range.
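The conversion and temperature correction described above can be sketched as follows. This is a minimal illustration, assuming a 1 M standard state and the integrated van't Hoff relation with a temperature-independent heat capacity change; it is not necessarily the exact form of Equation 9. The function names and the enthalpy and heat capacity values used in any call are hypothetical, and `dh_ref` is assumed to be the enthalpy at the reference temperature.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temp_k=293.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = RT ln(Kd), i.e. dG = -RT ln(Ka), with a 1 M standard state.
    """
    return R_KCAL * temp_k * math.log(kd_molar)


def ka_at_reference(ka_t, temp_k, dh_ref, dcp, ref_k=293.15):
    """Shift an association constant measured at temp_k to ref_k.

    Assumes constant dcp (kcal/(mol*K)) and dh_ref (kcal/mol) defined at
    ref_k, so that
      ln(Ka_T / Ka_ref) = -(dh_ref/R) (1/T - 1/Tref)
                          + (dcp/R) (ln(T/Tref) + Tref/T - 1).
    """
    ln_ratio = (
        -(dh_ref / R_KCAL) * (1.0 / temp_k - 1.0 / ref_k)
        + (dcp / R_KCAL) * (math.log(temp_k / ref_k) + ref_k / temp_k - 1.0)
    )
    return ka_t / math.exp(ln_ratio)


# Example: free energy for a 1 nM site at 20 degrees C.
dg_1nm = delta_g_from_kd(1e-9)
```

With these conventions a 1 nM dissociation constant at 20 °C corresponds to roughly -12 kcal/mol, and for an exothermic reaction with zero heat capacity change a constant measured above 20 °C is corrected upward.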
Only a weak correlation between these calculated standard binding energies (
In accordance with this expectation, a different picture emerges when variation in the complement of consensus elements is controlled. For example, among the 32 sites are 4 that contain all three direct interaction nucleotides and the ApA steps at both sites of proline intercalation, and that match the sequence-defined consensus at all 9-bp positions, differing only at degenerate positions (H', tdcA, fimA-II, and fimA-I in Table 1). ΔG°' for IHF binding to these sites varies over a range of 0.8 kcal/mol, or only 3-fold in affinity. Regression analysis of these sites yields R = 0.95, indicating that deformation energy accounts for 90% of this variation (Fig. 2) despite the small variation in ΔG°'. The sequences of six additional sites contain either 1 or 2 mismatches to the sequence-defined consensus (pspA, ilvGMEDA-1, ndh, hycA, ompF-2, and dmsABC in Table 1). In general, the affinity for IHF binding to these sites is lower than when the consensus is matched completely, as might be expected, and the range of variation in ΔG°' for IHF binding is much greater. Because both the position(s) of the mismatch(es) and the particular base(s) vary, a weaker correlation would also be anticipated. Even so, regression analysis yields R = 0.72, indicating that deformation energy accounts for 52% of the variation in this case (Fig. 2). In contrast to these findings, sequences that vary more widely within the consensus elements, or that are missing elements altogether, do not show similar correlation.
Analysis of de Novo IHF Sites. These results from analysis of natural IHF sites suggested a general approach to the analysis of the contribution from DNA flexibility in which recognition of specific sequences would be controlled by fixing specific combinations of consensus binding elements, whereas other sequence positions would be varied to yield the widest possible range of deformation energy.
An advantage is that this yields a single set of internally consistent results at a standard condition, thus removing a large source of expected variation. In addition, it addresses an inherent limitation of the natural sequences, which is a relatively narrow range of deformation energy as compared with the entire E. coli genome (21). Sequences designed in this manner were compared with a set of control sequences that feature variation both in the conservation of consensus elements and also in nonconserved regions to generate a broad range of deformation energies. Finally, E. coli sites predicted to be high affinity IHF-binding sites by algorithms based in part on deformation energy (24) were also evaluated. Sequences and results for 49 sequences analyzed are presented in Table 2. These results represent the analysis of over 200 separate binding experiments.
Representative gel mobility shift data for two sites are shown in Fig. 3. A distinct band of lower mobility shown in Fig. 3A contains the specific IHF-DNA complex with the DNA wrapped around the protein as shown in Fig. 1. We confirmed this interpretation by conducting gel shift experiments as site titrations of IHF binding to DNA in which the 5'-phosphates at the two ends were conjugated with fluorescein and with tetramethylrhodamine to yield a donor-acceptor pair for Forster resonance energy transfer. DNA bending in the specifically bound complex brings the ends of the DNA to within about 55 Å resulting in efficient fluorescence resonance energy transfer that is absent in free DNA (32). Additional bands with successively decreasing mobility become evident at higher IHF concentrations. These are poorly resolved, thus generating a smear. These necessarily represent higher order complexes with more than one IHF dimer bound. A limiting band was observed at the highest IHF concentrations, indicating that saturation had been achieved. Record and co-workers (26) have described distinct specific and nonspecific IHF binding modes and used isothermal titration calorimetry to investigate the thermodynamics of both binding modes. Nonspecific binding does not produce the large DNA bend that is characteristic of the specific complex, thus yielding a much smaller length of contact with DNA and providing for simultaneous binding of multiple IHF dimers on a 50-bp oligonucleotide. At high IHF concentrations where IHF is in great molar excess over DNA, the nonspecific binding mode can compete successfully with the specific binding mode because of its higher stoichiometry, despite lower intrinsic affinity. Thus, the additional lower mobility bands in our experiments represent successive nonspecific associations by IHF. Different mobility bands might also represent different locations of bound IHF along the 50-bp DNA fragment. 
The concentration ranges over which IHF binds in specific and nonspecific modes overlap, particularly for the lower affinity specific IHF sites. Consequently, it was necessary to analyze the binding resulting from both mechanisms to obtain accurate quantitative values for specific binding. Because the individual bands produced by nonspecific binding are poorly resolved electrophoretically and their identity is difficult or impossible to interpret as reflecting particular molecular species, these bands were combined into a single fraction comprising all nonspecifically bound DNA, without regard to number or location of bound IHF dimers. Thus, our analysis of the gel images generated three fractions, corresponding to unliganded oligonucleotides, the 1:1 specifically liganded complex, and nonspecifically liganded complexes.
Numerical analysis of the gel shift data applied the competitive specific and nonspecific finite lattice model (27) that is implemented in Equations 3 and 4 as described under "Experimental Procedures." Fig. 4 presents data from three independent titrations of one IHF binding sequence analyzed globally using this model. Fig. 4 also shows the results of analysis by the phenomenological Hill model described by Equations 5-8 ("Experimental Procedures"). The fitting results yield essentially identical transitions for specific binding and indistinguishable transitions for nonspecific binding. The former was found to be the case for all 49 DNA sequences evaluated. In all cases, the difference between the estimates of ΔGspec obtained from the two models was within the confidence intervals obtained by either model alone. Results for all 49 sequences are listed in Table 2.
It is significant that a tight distribution of nonspecific affinities was obtained for these sequences (
The range of deformation energies exhibited by these sequences expands that of the natural IHF sites by more than 2-fold. These synthetic sites mimic the full range of deformation energies represented by the E. coli genome. Specific binding affinities vary by over 500-fold for these sequences. In nine cases, the specific complex band was not observed, indicating insufficient specific binding affinity to compete with nonspecific binding. Analysis of weakly specific binding sequences indicates that
Taken as a whole, the data for all 49 sequences exhibit a very weak correlation between binding energy and deformation energy. The correlation coefficient is only 0.23, the same as for the natural sites, suggesting that only 5-6% of the overall variation can be attributed to deformation energy. However, this value is adversely affected by the significant number of sequences (20%) for which low affinity specific binding is obscured by nonspecific binding. These sequences are skewed toward higher than average deformation energy, so that the limit to measurable specific affinity tends to decrease the apparent correlation between affinity and deformation energy. This effect is apparent in plots of the de novo sequences shown in Fig. 5. In support of this point, it is interesting to note that no high affinity sequence was found whose deformation energy exceeds the largest found among the natural IHF sites, i.e. greater than 135 kcal/mol. The absence of points in Fig. 5, lower right quadrant, contributes significantly to the correlation observed.
The first 20 sequences (labeled A-G) listed in Table 2 include seven distinct series of related sequences. The sequences within each series are identical with respect to consensus elements but are variable otherwise. Thus, for example, sequences A.1-A.6 contain all consensus elements, including a 6-bp A-tract and ApA steps at both sites of proline intercalation.
Sequences designated B-D have each relaxed the requirement for one or more consensus element or have substituted a different base at one of the degenerate positions in the consensus sequence. Series E also has relaxed the requirement to match the consensus WATCAAnnnnTTR element, other than the three directly contacted bases. The sequences in series F and G each contain the sequence from E.1 that was substituted for the WATCAAnnnnTTR consensus element. This was treated as constituting a modified consensus sequence. In series F, this was held constant, along with all other consensus elements, and the remainder of the sequence varied, just as in series A. In series G, the poly(A)-tract was also allowed to vary, just as in series B.
Each of these series A-G exhibits a distinct correlation between
The individual regression lines form a fan-like pattern as if radiating from a single point at a low value of the deformation energy. To assess this possibility, the data analyzed globally were subject to the constraint that the regression lines intersect at a single point. This generated a family of regression lines that are similar to, or indistinguishable from, the individual regression lines, e.g. most notably for the limiting two series that have the flattest and steepest trend lines, respectively (Fig. 5A). One way to compare the pairs of lines derived from local and global regression models is by their slopes, because these indicate the correlation between ΔGspec and deformation energy. As shown in Table 3, the slopes are not distinguishable for any pair of lines. The correlation coefficient from the global fit is 0.96, suggesting that most of the variation observed can be accounted for by this model. The intersection of the lines occurs at a deformation energy of 17 kcal/mol (confidence limits of 0 and 45 kcal/mol), corresponding to a relatively small contribution from DNA flexibility. ΔGspec is -12.0 ± 0.6 kcal/mol at the point of intersection. At this point, IHF binding becomes insensitive to DNA flexibility, suggesting an upper limit to the affinity achievable by IHF of 1 nM under these conditions.
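One way to impose the single-intersection constraint is sketched below. The fitting procedure actually used (in Origin) is not described in detail here, so this is an assumption-laden illustration: each series i is modeled as y = y0 + m_i (x - x0), the shared x0 is scanned over a grid, and for each trial x0 the shared y0 and the per-series slopes are obtained in closed form by ordinary least squares. The function name, the grid, and the data are all hypothetical; the data are synthetic points placed exactly on two lines through (17, -12).

```python
def fit_common_intersection(series, x0_grid):
    """Fit y = y0 + m_i * (x - x0) to several series sharing (x0, y0).

    series: dict mapping a series name to a list of (x, y) points.
    Returns (x0, y0, slopes, sse) for the grid value of x0 with the
    smallest sum of squared residuals. Needs at least two series with
    different slopes, each with two or more distinct x values.
    """
    best = None
    for x0 in x0_grid:
        stats = {}
        num = 0.0
        den = 0.0
        for name, pts in series.items():
            u = [x - x0 for x, _ in pts]          # x measured from x0
            y = [yy for _, yy in pts]
            s_u, s_y = sum(u), sum(y)
            s_uu = sum(a * a for a in u)
            s_uy = sum(a * b for a, b in zip(u, y))
            stats[name] = (u, y, s_u, s_uu, s_uy)
            # Normal equation for y0 after eliminating each slope m_i
            num += s_y - s_uy * s_u / s_uu
            den += len(pts) - s_u * s_u / s_uu
        y0 = num / den
        slopes = {}
        sse = 0.0
        for name, (u, y, s_u, s_uu, s_uy) in stats.items():
            m = (s_uy - y0 * s_u) / s_uu          # best slope given x0, y0
            slopes[name] = m
            sse += sum((b - y0 - m * a) ** 2 for a, b in zip(u, y))
        if best is None or sse < best[3]:
            best = (x0, y0, slopes, sse)
    return best


# Synthetic demonstration data (not measurements from Table 2).
demo_series = {
    "flat": [(x, -12.0 + 0.01 * (x - 17.0)) for x in (60.0, 90.0, 120.0)],
    "steep": [(x, -12.0 + 0.04 * (x - 17.0)) for x in (70.0, 100.0, 130.0)],
}
demo_grid = [i * 0.5 for i in range(91)]  # trial x0 from 0 to 45 kcal/mol
demo_fit = fit_common_intersection(demo_series, demo_grid)
```

Because the synthetic lines were constructed to cross at a grid point, the scan recovers that point; with real data the grid would be refined and confidence limits estimated separately, for example by the variance-ratio search described under "Experimental Procedures."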
Contrasting results were obtained for two sets of control sequences. The first control set (series H, Table 2) contains 15 synthetic sequences in which both the complement of conserved elements and all nonconserved positions were varied to adjust the deformation energy over the widest possible range. These sequences cover very similar ranges of both deformation energy and
The second control consists of a set of 14 putative IHF regulatory sites selected from the E. coli genome. The purpose of this analysis was to complement the analysis of natural IHF regulatory sites listed in Table 1 but with self-consistent quantitative assays of IHF binding. These are previously uncharacterized sites that were selected based on the proximity to E. coli promoters and their location in supercoiling-induced DNA deformation or SIDD loci (33, 34). SIDD loci contain DNA sequences with the greatest propensity to denature or undergo B-Z transitions under conditions of high torsional strain resulting from DNA supercoiling. These sites have been implicated in the mechanism of global transcriptional activation mediated by IHF (35, 36). The first 10 sequences (named in Table 2 by proximity to a particular E. coli gene) were also selected by one or more classifier algorithms trained to recognize aspects of IHF site sequence and structure (full description of selection given in Ref. 24). The last four sequences were scored as less likely to be high affinity IHF sites, based on these same selection criteria. Ten of these 14 sites are found to be high affinity, site-specific sites for IHF; four display only the nonspecific binding mode. These sequences share several characteristics of natural IHF sites, including highly variable complement of consensus elements, a similar narrow range of deformation energies (
DNA-binding proteins recognize specific sequences of base pairs both via direct contacts between amino acid side chains and the purine and pyrimidine bases, and also indirectly by recognizing elements of DNA structure. Moreover, indirect readout mechanisms include not only recognition of preformed features of DNA such as bends, groove topology, local geometry of base pair steps, and geometry of backbone phosphates but also binding-induced distortion of DNA or flexibility. The latter is particularly important to DNA scaffolding proteins such as are required both for chromosome organization and also for regulation of a variety of processes that physically manipulate DNA such as recombination, transposition, replication, and transcription. Because all of these recognition mechanisms rely on intrinsic properties of the particular sequence of bases, it poses a difficult problem to assess quantitatively the relative contributions from each. Because it relies entirely on indirect readout mechanisms, IHF presents an excellent opportunity to assess the relative contributions from preformed structure and from DNA flexibility to indirect readout. Moreover, because IHF functions both as a chromatin organizing protein and also as a specific regulator of genomic processes, this issue is directly relevant to its biological function. To assess the contribution from DNA flexibility, it is first necessary to evaluate the energy required to deform a particular DNA sequence from its preferred unbound conformation to the conformation in the bound complex. Olson et al. (20) first realized that a local sequence-dependent potential could be calculated by analyzing the conformational ensembles of individual base pair steps. Harmonic potentials of mean force were derived from analysis of the structures of protein-DNA complexes along the coordinates of a simplified conformational model that used six parameters to describe the translational and rotational orientation of adjacent base pairs. 
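The form of such a calculation can be illustrated with a minimal sketch. It assumes the usual quadratic expression, E = ½ Σ f_ij Δθ_i Δθ_j summed over base pair steps, where Δθ is the deviation of the six step parameters from their sequence-dependent rest values. The names `TOY_MODEL` and `deformation_energy` and every numerical value below are hypothetical placeholders chosen for easy arithmetic; they are not the published parameters of Olson et al. (20), and this is not the authors' implementation.

```python
# Order of the six base-pair-step parameters used throughout this sketch.
STEP_PARAMS = ("shift", "slide", "rise", "tilt", "roll", "twist")


def diag(values):
    """Build a diagonal 6x6 stiffness matrix (no cross terms) from a list."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical rest states and stiffness constants per dinucleotide step.
TOY_MODEL = {
    "AA": {"rest": [0.0, 0.0, 3.4, 0.0, 0.0, 36.0],
           "stiffness": diag([1.0, 1.0, 1.0, 0.01, 0.02, 0.03])},
    "AT": {"rest": [0.0, 0.0, 3.4, 0.0, 0.0, 36.0],
           "stiffness": diag([1.0, 1.0, 1.0, 0.01, 0.04, 0.03])},
}


def step_energy(observed, rest, stiffness):
    """Harmonic energy 0.5 * d^T F d for a single base pair step."""
    d = [o - r for o, r in zip(observed, rest)]
    n = len(d)
    return 0.5 * sum(stiffness[i][j] * d[i] * d[j]
                     for i in range(n) for j in range(n))


def deformation_energy(sequence, bound_steps, model):
    """Thread `sequence` onto a bound conformation and sum step energies.

    bound_steps[k] holds the six step parameters observed in the complex
    for the step between bases k and k+1 of the threaded sequence.
    """
    if len(bound_steps) != len(sequence) - 1:
        raise ValueError("need one set of step parameters per base pair step")
    total = 0.0
    for k, observed in enumerate(bound_steps):
        params = model[sequence[k:k + 2]]
        total += step_energy(observed, params["rest"], params["stiffness"])
    return total


# Toy bound conformation: a 40-degree roll kink, then a 10-degree untwisting.
BOUND_STEPS = [
    [0.0, 0.0, 3.4, 0.0, 40.0, 36.0],
    [0.0, 0.0, 3.4, 0.0, 0.0, 26.0],
]
demo_energy = deformation_energy("AAT", BOUND_STEPS, TOY_MODEL)
```

Because the bound conformation is held fixed and only the step identities change when a new sequence is threaded, differences in the summed energy reflect sequence-dependent rest states and stiffness alone.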
Sarai and coworkers (37, 38) developed a similar formalism independently and used it to calculate the interaction energies and relative contribution to specificities of direct and indirect readouts in many protein-DNA complexes. More recently, molecular dynamics simulations of all 136 unique tetranucleotides have been used to obtain ensembles of conformations of individual base pair steps (39). The resulting harmonic potentials of mean force yielded results similar to those obtained by Olson et al. (20). When we applied this model to assess the change in conformational energy for wrapping of known IHF-binding sites, we found a moderate negative correlation between the free energy change for binding and the deformation energy. This is as expected if favorable protein-DNA contacts are used to drive the (necessarily) unfavorable change in DNA conformation. Nevertheless, the correlation coefficient suggests that the majority of the variation in binding affinity must be accounted for by another mechanism. The probable reason for this is the strong influence of pre-existing structural characteristics of the conserved IHF recognition elements (17, 40). For example, the consensus hexamer, WATCAA, in the center of the site features a number of unusual base stacking and pairing geometries that facilitate van der Waals packing of protein side chains in the minor groove and optimize H-bonds (17). The narrow minor groove in this region is recognized specifically by a protein clamp that makes salt bridges to phosphates flanking the groove. The A-tract also features a narrow minor groove and a characteristic spine of hydration that provides a regular array of water-mediated H-bond partners. The preference for ApA steps at both sites of proline intercalation might be explained by the energetic preference of this step, relative to other base pair steps, to form the particular kinked conformation found in the complex (41).
Recent structural and biochemical analysis of a reduced specificity IHF mutant indicates how the unique, sequence-dependent structure of the TTR element contributes to specificity (42). Thus, pre-formed, sequence-dependent structural characteristics constitute a strong component of recognition for each of the conserved elements (17, 18). It is not surprising that variable combinations of these recognition elements, as found in the natural binding site sequences, would tend to obscure a contribution from flexibility. This led to our strategy of assessing the effect of deformation energy within individual groups of related sequences, where each group contains a fixed configuration of consensus elements. Preliminary analysis of such groups as afforded by natural IHF sites indicates high correlation, suggesting that deformation energy accounts for the majority of the variation within a group. A more systematic analysis of designed sequences confirms this result. These sequences expand the range of deformation energies by about 2-fold as compared with natural IHF sites, approximating the distribution of E. coli sequences chosen at random (22). The correlation between binding and deformation energy is also similar to that for natural IHF sites, both for the control sequences plotted in Fig. 5B and for all 49 sequences listed in Table 2 considered as a whole. In contrast, very strong correlation is observed for each of the individual series plotted in Fig. 5A. The most significant feature apparent in Fig. 5A is the effect when consensus elements are removed and replaced by DNA sequences that expand the range of deformation energy. In every case, i.e. removal of the A-tract (compare series B to A and series G to F), removal of a proline intercalation ApA step (compare series D to B), and alteration of the WATCAAnnnnTTR consensus sequence (compare series F to A), the result is decreased affinity.
This result is consistent with what has been demonstrated previously in a systematic manner for the A-tract (13). In addition, the slopes of the regression lines are steeper in direct proportion to the effect on affinity. Thus, substitution for consensus sequences both lowers affinity and increases sensitivity to DNA flexibility. Three of the seven series considered contain only two members. Because these lines have no degrees of freedom, their slopes are of course quite sensitive to experimental uncertainty or secondary effects of sequence on either of the two experimental points. However, two of these three cases conform well to the overall pattern. The one case that does not (series C) is the only one to replace two consensus elements together. Caution should also be exercised when interpreting the absolute values of the slopes. The magnitude of the calculated deformation energies is large compared with experimental values of the free energy to bend DNA, e.g. as determined from circular permutation assays. In applying these potentials to analyze contributions to binding affinity in a number of systems, it has been found necessary to fit scaling factors to bring the calculated values into the range of experimentally observed values (cf. Morozov et al. (43)). Although the scale of the independent variable may not accurately reflect the translation of conformational energy into binding energy, the relative slopes are significant. These vary by over 30-fold in a systematic manner. These data indicate that deformation can contribute up to 1.4 kcal/mol, or an order of magnitude in affinity, over the observable range of deformation energy and suggest a much larger potential contribution, were it not obscured in our assays because of competition from the nonspecific IHF binding mode. The fan-shaped family of regression lines that these individual sequence series define suggests a point of convergence at low deformation energy.
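The equivalence quoted above between 1.4 kcal/mol and roughly an order of magnitude in affinity follows from the standard relation ΔΔG = RT ln(K1/K2). The short sketch below works through that arithmetic. The temperature of 298.15 K is an assumption made for illustration, and the function names are hypothetical.

```python
import math

# Molar gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def fold_change_in_affinity(ddg_kcal_per_mol, temperature_k=298.15):
    """Ratio of equilibrium constants for a given free energy difference.

    The default temperature (25 degrees C) is an assumption for illustration.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


def ddg_for_fold_change(fold, temperature_k=298.15):
    """Free energy difference (kcal/mol) for a given ratio of constants."""
    return R_KCAL * temperature_k * math.log(fold)


if __name__ == "__main__":
    # A 1.4 kcal/mol difference corresponds to slightly more than 10-fold.
    print(round(fold_change_in_affinity(1.4), 1))
    # A 10-fold change in affinity corresponds to about 1.36 kcal/mol.
    print(round(ddg_for_fold_change(10.0), 2))
```

Under this assumption, each factor of 10 in binding constant corresponds to about 1.36 kcal/mol, so a contribution of 1.4 kcal/mol is very nearly one order of magnitude.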
The convergence is confirmed by the results of a global regression analysis that was conducted to define this point. The analysis yields lines that are essentially indistinguishable from the separate regression lines in half of the cases, and within the confidence limits for the rest. The two significant observations are as follows: first, the point of convergence is at a finite, albeit low, value of deformation energy; and second, the slope for series A, sequences that include perfect matches to all consensus elements, is essentially zero (0.0012 ±