Theory is as Theory Does.

Posted 11 October 2004 by Reed A. Cartwright

↗ The current version of this post is on the live site: https://pandasthumb.org/archives/2004/10/theory-is-as-th.html

by Ian F. Musgrave, Steve Reuland, and Reed A. Cartwright

“There’s precious little real biology in this project,” Mr. Behe said. For example, he said, the results might be more persuasive if the simulations had operated on genetic sequences rather than fictitious computer programs.

Michael J. Behe was commenting in The Chronicle of Higher Education (Kiernan 2003) on a paper reporting that digital organisms could evolve irreducibly complex systems without intervention (Lenski et al. 2003). Ironically, Behe has just coauthored a theoretical paper with David W. Snoke on the evolution of complex molecular adaptations that has “precious little real biology” in it. William Dembski has already stated that Behe and Snoke’s research “may well be the nail in the coffin [and] the crumbling of the Berlin wall of Darwinian evolution” (Dembski 2004). Despite the common claim made by “intelligent design” activists that evolution is in trouble, they have so far been unsuccessful in presenting their arguments to the scientific community. Is this the long-awaited peer-reviewed publication which will finally do it?

No.

Although some in the “intelligent design” community tout Behe and Snoke’s paper as the long-awaited theoretical paper (Discovery Institute 2004), it contains no “design theory”, makes no attempt to model an “intelligent design” process, and proposes no alternative to evolution. In reality Behe & Snoke (2004) is an unmemorable investigation of neutral drift in protein and nucleic acid sequences. As we will show, the paper cannot even support the modest claims it does make.

Behe has argued that certain biochemical systems or structures have a property he calls “irreducible complexity” (Behe 1996). Irreducible complexity is an update on the old anti-evolutionist argument of “What good is half a wing?” Simply put, the idea is that all the components of a system must be present before the system can accomplish its current function, and, therefore, there is no gradual, step-by-step way for natural selection to construct the system being considered. Behe’s arguments have attracted significant criticism, and Behe has recast his definition of “irreducible complexity” from time to time to attempt to deal with these criticisms. One of these re-definitions of “irreducible complexity” introduces the notion of unavoidable gaps in a stepwise progression of adaptive evolutionary intermediates of a system, such that the only means of bridging the gap is via neutral evolution (Behe 2000). The current paper is apparently an attempt to quantify just how improbable this process of bridging gaps via neutral mutation might be, and thus we may expect that when Behe makes arguments along these lines in the future he will point to this paper as the justification for saying that evolution does not have the time available to fashion “irreducibly complex” structures.

Although Behe and Snoke are “skeptical” of the ability of “Darwinian processes” to account for the evolution of complex biochemical systems (Behe & Snoke 2004), there is substantial scientific evidence for the conclusion that complex biochemical systems can and have evolved on earth. There is direct experimental research (Hall 2003). There are observed instances of it occurring in nature (Copley 2000, Seffernick & Wackett 2001, Johnson et al. 2002). There is also much evidence from comparative studies (Melendez-Hevia et al. 1996, Cunchillos & Lecointre 2003 as two examples). However, we don’t have enough space to fully explore the large amount of data that Behe and Snoke’s “skepticism” must overcome. We have our hands full just explaining some of the manifold ways in which Behe and Snoke’s model fails to represent molecular evolution. The flavor of Behe and Snoke’s paper may be gauged by the fact that the authors are skeptical of the ability of Darwinian processes to produce complex structures, yet use a model which largely ignores Darwinian processes. In the following sections we will examine some of the more questionable aspects of Behe and Snoke’s paper.

Contents
One True Sequence?
Evolution of DPG Binding
Subfunctionalization vs. Neofunctionalization
Rho-Oh!
How unlikely is the evolution of MR features?
Applying Behe & Snoke’s equations to the DPG binding site example
Conclusion
Acknowledgements
References

One True Sequence?

Theory is rather hard to review, and it is not uncommon for problematic theory to get published. For this reason, post-publication review is very important for sifting good theory from bad. We cannot emphasize this enough. The theoretician among us only trusts theory that either has withstood the test of time or has been satisfactorily replicated by him. (This paper satisfies neither.) There are two issues in creating good theory: (1) getting the modeling right and (2) using assumptions relevant to nature. Sometimes the former dooms theory, but in most instances it is the latter. We believe there are some issues with the modeling in the paper, but investigating them is too complex for this essay. However, even a cursory reading of the paper makes it clear that Behe and Snoke’s work and the conclusions they draw are not relevant to nature.

Behe and Snoke are attempting to estimate how long it would take, and how large a population would be needed, for a protein to evolve a new binding site or other complex feature in the absence of selection. They assume up front that multiple amino acid substitutions would be required before the new feature can be preserved by natural selection. They call this a “multiresidue” (MR) feature.

Behe and Snoke are modeling the formation of a completely new binding site in a duplicated protein. While this process is important in generating some kinds of new function, in the majority of duplicated proteins existing binding sites are modified either to act on new substrates (for example, in the expansion of the protein kinase family (Manning et al. 2002, Caenepeel et al. 2004, Waterston et al. 2002) and of the G-protein coupled receptor family in vertebrates) or to use new catalytic mechanisms (the TIM family of proteins; Schmidt et al. 2003, Gerlt & Babbitt 2001). (See Musgrave 2003 for more discussion about the evolution of binding sites. See Musgrave 2004 for a discussion on disulfide bonds.) Duplication of enzymes and modification of their existing binding sites can produce quite complex pathways; for example, the clotting cascade arose through duplication of proteolytic enzymes with subsequent changes in substrates (Miller 1999 and the references therein).

Behe and Snoke use five parameters to model the neutral evolution of their sequences: the number of mutations needed for the multi-residue feature (λ), the ratio of null mutations (ones that create a pseudogene) to needed mutations (ρ), the number of duplicate genes in the population (N), the selection coefficient of the multi-residue feature (s), and the instantaneous rate of point mutation per site (ν). (Contrary to what Behe and Snoke claim, ν is not a per generation rate of nucleotide mutation.) In the model, selection only acts upon a duplicate gene’s multi-residue feature when it has acquired the particular λ “compatible” mutations and none of the λρ mutations that would nullify it. Behe and Snoke implement nullification by assuming that the duplication is deleted when the first “compatible” mutation (forward or backward) occurs after the first null mutation occurs. The deleted duplication is instantly replaced by a new gene that has no compatible and no null mutations. We call this “reset on null.” After a bit of math and some simulations, Behe and Snoke estimate the average time to first appearance (T_f) and the average time to fixation (T_fx) of a multi-residue trait under various parameter sets. They use these results to conclude that point mutations plus selection cannot explain the evolution of multi-residue traits.
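To make the “reset on null” rule concrete, here is a minimal Python sketch that follows a single duplicate-gene lineage in continuous time. It is our own illustration, not Behe and Snoke’s code: it ignores the population size N, the selection coefficient s and fixation entirely, and only estimates the waiting time until one lineage carries all λ compatible mutations and no null mutation. The function names and arguments are hypothetical; `lam`, `rho` and `nu` stand in for λ, ρ and ν, and time comes out in units of 1/ν.

```python
import random
import statistics


def time_to_mr_feature(lam, rho, nu=1.0, rng=random):
    """Waiting time until one duplicate-gene lineage carries all `lam`
    compatible mutations and no null mutation.

    Hypothetical single-lineage sketch of the "reset on null" rule: each of
    the `lam` compatible sites and each of the `lam * rho` null sites mutates
    at instantaneous rate `nu`. Null sites never back-mutate. Once a null
    mutation is present, the next compatible-site mutation deletes the copy
    and a fresh duplicate (no compatible, no null mutations) takes its place.
    """
    t = 0.0
    k = 0          # compatible sites currently in the required state
    null = False   # has a null mutation occurred in this copy?
    while True:
        compat_rate = lam * nu
        null_rate = 0.0 if null else lam * rho * nu
        total = compat_rate + null_rate
        t += rng.expovariate(total)           # time to the next relevant mutation
        if rng.random() < null_rate / total:  # the mutation hit a null site
            null = True
        elif null:                            # compatible mutation after a null: reset
            k, null = 0, False
        elif rng.random() < (lam - k) / lam:  # forward compatible mutation
            k += 1
            if k == lam:
                return t
        else:                                 # backward mutation at an acquired site
            k -= 1


def mean_time(lam, rho, trials=20000, seed=1):
    """Average waiting time over many independent lineages."""
    rng = random.Random(seed)
    return statistics.fmean(time_to_mr_feature(lam, rho, rng=rng) for _ in range(trials))


# For nu = 1 the exact means are 2 (lam=2, rho=0) and 42 (lam=2, rho=5).
print(mean_time(2, 0), mean_time(2, 5, trials=2000))
```

Even with only two compatible sites and ten null sites, the waiting time rises about twentyfold, which is why the value chosen for ρ matters so much to the model’s conclusions.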

However, the model has many restrictive assumptions that prevent it from supporting the conclusion that Behe and Snoke make. In reality, the paper says that if you have a protein function that requires two or more specific mutations in specific locations in a specific gene in a specific population, and if the function cannot be acted on by natural selection until all mutations are in place, and if the only form of mutation is point mutation, and if the population of organisms is asexual, then it will take a very large population and a very long time to evolve that function. This is not unexpected.

Furthermore, this paper follows the fallacious fascination with the “one true sequence” that is popular in the “intelligent design” community (for example Meyer 2004). Behe and Snoke end their paper with the following conclusion:

Although large uncertainties remain, it nonetheless seems reasonable to conclude that, although gene duplication and point mutation may be an effective mechanism for exploring closely neighboring genetic space for novel functions, where single mutations produce selectable effects, this conceptually simple pathway for developing new functions is problematic when multiple mutations are required. Thus, as a rule, we should look to more complicated pathways, perhaps involving insertion, deletion, recombination, selection of intermediate states, or other mechanisms, to account for most [multi-residue] protein features.

(Behe & Snoke 2004 p11)

No matter how much Behe and Snoke want to make this conclusion, it is simply not warranted by any work presented in the paper. The evolution of new functions is not a process that requires a certain target to be hit. There can be multiple new functions that any starting protein can acquire. Likewise, there can be multiple ways of acquiring any given function. And finally, evolution doesn’t happen in a single population; it happens in multiple populations at the same time.

Calculating that it would take a long time for a specific new function to evolve from a specific gene with a specific set of mutations in a specific population in no way suggests that it would take a long time for any new function to evolve in any gene from any set of mutations in any population. Behe and Snoke’s work attempts to show the former, but in their discussion they conclude the latter. Their conclusion simply does not follow.

Behe and Snoke do mention this early in their discussion:

… because the simulation looks for the production of a particular MR feature in a particular gene, the values will be overestimates of the time necessary to produce some MR feature in some duplicated gene. In other words, the simulation takes a prospective stance, asking for a certain feature to be produced, but we look at modern proteins retrospectively. Although we see a particular disulfide bond or binding site in a particular protein, there may have been several sites in the protein that could have evolved into disulfide bonds or binding sites, or other proteins may have fulfilled the same role. For example, Matthews’ group engineered several nonnative disulfide bonds into lysozyme that permit function (Matsumura et al. 1989). We see the modern product but not the historical possibilities.

(Behe & Snoke 2004 p11, emphasis original)

Unfortunately Behe and Snoke ignore their own caveat in the rest of their discussion, resulting in an entirely unjustified conclusion. At the very least, it biases their estimates of population sizes and generations considerably upward. A more realistic scenario, in which multiple targets could be reached, would result in a much smaller number of generations and/or smaller required population sizes. (To be fair, it would be more difficult to model such a scenario.) At worst, the fact that they only consider specific changes at specific locations makes their model meaningless because it assumes a fundamentally different process than the one that occurs in nature.
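The effect of allowing several targets can be illustrated with a deliberately simple sketch. Suppose, as a hypothetical simplification that is not part of Behe and Snoke’s model, that each alternative MR feature arises as an independent Poisson process. The first of m independent exponential waiting times, each with rate r, is itself exponential with rate m·r, so the expected wait shrinks by a factor of m. The Python sketch below checks this by simulation; the rate of 10⁻³ per generation is purely illustrative.

```python
import random
import statistics


def first_arrival_time(rates, rng):
    """Waiting time until the first of several alternative targets appears.

    Hypothetical simplification: each alternative MR feature arises as an
    independent Poisson process with its own rate, so the first arrival is
    the minimum of independent exponential waiting times.
    """
    return min(rng.expovariate(r) for r in rates)


def mean_first_arrival(rates, trials=20000, seed=0):
    rng = random.Random(seed)
    return statistics.fmean(first_arrival_time(rates, rng) for _ in range(trials))


rate = 1e-3                                    # illustrative rate per target, per generation
one_target = mean_first_arrival([rate])        # expected value 1/rate = 1000
ten_targets = mean_first_arrival([rate] * 10)  # expected value 1/(10*rate) = 100
print(one_target, ten_targets)
```

Real alternatives are neither independent nor equally likely, so this only shows the direction and rough size of the bias, not its exact magnitude.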

The rest of this essay will be divided into two parts. The first part will explore the assumptions that Behe and Snoke use to build their model, and why those assumptions bias their model toward their favored conclusion. The second part will discuss the likelihood of MR features evolving, and will include a calculation for Behe and Snoke’s flagship example using realistic parameters.

Evolution of DPG Binding

Let us now look at one of Behe and Snoke’s own examples, the evolution of β-hemoglobin’s 2,3-diphosphoglycerate binding site. Hemoglobin is the protein that carries oxygen in vertebrate blood. In most vertebrates, hemoglobin is a tetramer composed of two α and two β protein chains. There is much evidence that the α and β chains evolved from duplicates of an ancestral globin chain (Li 1997 pp289-292). In mammals, frogs, and reptiles there is a small pocket which binds the organic phosphate molecule 2,3-diphosphoglycerate (DPG) at the interface between the two beta chains. (Reptiles use inositol pentaphosphate (IPP) as the physiological modulator, but their site binds DPG very effectively.) If a molecule of DPG is bound to hemoglobin, it helps stabilize the deoxygenated form of the tetramer, reducing the hemoglobin’s affinity for oxygen. Therefore, DPG regulation allows hemoglobin to release more oxygen in capillaries. DPG is produced by a shunt pathway that prevents glycolysis (the pathway that “burns” glucose to produce energy) from producing the energy molecule ATP when levels of ATP are high. This means that DPG levels are positively correlated with oxygen levels. When oxygen levels are high, ATP levels are high, and DPG levels are also high. When oxygen levels are low, ATP levels are low, and DPG levels are also low. Therefore, low levels of oxygen favor increased oxygen affinity of hemoglobin. In sum, DPG regulation ensures that hemoglobin can work efficiently even when oxygen levels are low.

Behe and Snoke refer to DPG binding as an example of a function that must have involved multiple neutral mutations, citing Li (1997). At the beginning of their discussion, Behe and Snoke state:

Some features of proteins, such as disulfide bonds and ligand binding sites, … are composed of multiple amino acid residues. As Li (1997) points out, the evolutionary origins of such features must have involved multiple mutations that were initially neutral with respect to the MR feature.

(Behe & Snoke 2004 p7, emphasis added)

However, this is simply not supported by Li (1997):

The emergence of a new function in a DNA or protein sequence is supposedly advantageous and is commonly believed to have occurred by advantageous mutations. However, acquiring a new function may require many mutational steps, and a point that needs emphasis is that the early steps might have been selectively neutral because the new function might not be manifested until a certain number of steps had already occurred.

(Li 1997 p427, emphasis added)

Note the change from “might have been” to “must have involved.” Li (1997) in fact never says that early mutational steps “must have been” neutral. He points out that at least one mutation involved in forming the DPG binding site was selectable: the one that generated histidine at position 2. He also says “Of course one cannot rule out the possibility that either Lys82 or His143 or both have evolved because of a selective advantage other than DPG binding” (p 428). As we shall see below, this is in fact the case. Behe and Snoke overemphasize “must,” creating a misleading impression that there is authoritative support for the concept that all amino acids of a site must be in place for a selectable function to occur. This is exactly the opposite of current knowledge. It is interesting to point out that in their introduction Behe and Snoke actually quote from this section of Li (1997), yet manage to misread it.

As Li (1997) noted, the binding site for diphosphoglycerate in hemoglobin requires three residues. The population size required to produce an MR feature consisting of three interacting residues by point mutation in a duplicated gene initially lacking those residues would depend on the number of nucleotides that had to be changed–a minimum of three and a maximum of nine.

(Behe & Snoke 2004 p10)

This gives the misleading impression that Li (1997) states that the DPG binding site requires all 3 amino acids, and only those amino acids, in place before selectable binding occurs. Unfortunately for Behe and Snoke, the DPG binding site is a good example of plasticity and redundancy which highlights a key flaw in their “one true sequence” assumption. Li (1997), the very article they quote, gives examples of alternate sequences that work well, thus invalidating the assumption that only their “one true sequence” will have that function.

The modern mammalian DPG binding site is formed from 3 amino acids on the beta chain: histidines (H in the one-letter amino acid code) at positions 2 and 143, and lysine (K) at position 82. All these amino acids are basic; we can call these amino acids responsible for DPG binding the HKH triad. Behe and Snoke imply that you must have the HKH triad in place for both binding and selection to take place.

Now, while all of these sites are required for good binding, there are mammals without these 3 that do quite nicely. You can replace basic histidine with basic arginine (R) and a functional DPG binding site is obtained at either position 2 or 143 (Bonaventura et al. 1975). So you can have RKH and HKR and still get DPG binding with selectable function. There are even more variants with selectable function. Mice have an asparagine (N) at position 2, giving an NKH triad. Lemurs make do with leucine (LKH). Ruminants have methionine (M) at position 2. The MKH variant is much less sensitive to DPG, but still has binding and selectable function (Angeletti et al. 2001). Even with a limited number of vertebrates, we can show that the actual example cited by Behe and Snoke, the DPG binding site, does not in fact follow their “one true sequence” model.

But it gets more interesting than that.

As one example, Li (1997) has argued that the precursor to modern hemoglobins that can bind diphosphoglycerate did not have any of the three amino acid residues involved in the interaction.

(Behe & Snoke 2004 p11)

This is true, but highly misleading. DPG binding is at the end of a sequence of development of allosteric modulation by anions and organic phosphates paralleling the development of tetrameric hemoglobin.

While most vertebrate hemoglobins are α₂β₂ tetramers, most species of hagfish, a primitive jawless fish, have monomeric hemoglobin. In other species of hagfish, the oxygenated form of hemoglobin is a monomer, and the deoxygenated form is a dimer. In all known species of lamprey, another primitive jawless fish, the oxygenated form of hemoglobin is a monomer, and the deoxygenated form is either a dimer or tetramer. In sharks, the oxygenated form is the αβ dimer, and the deoxygenated form is a tetramer. In bony fish, frogs, lizards and mammals hemoglobin is the familiar α₂β₂ tetramer.

In most vertebrates, organic phosphates modulate the oxygen affinity of hemoglobin, and the modulator most vertebrates use is ATP, the energy-bearing phosphate produced by oxidative metabolism (Nikinmaa 2001, Coates 1975a). Mammalian DPG binding evolved not from scratch, as implied by Behe and Snoke, but from an ATP binding site.

Lamprey hemoglobin is not modulated by organic phosphates; it has PSS (proline, serine, serine) at the position equivalent to the mammalian HKH triad (see Figure 1). Most hagfish hemoglobins are not modulated by organic phosphates (Brittain & Wells 1986). However, some hagfish hemoglobins that form dimers in the deoxy form are (weakly) modulated by ATP (Nikinmaa 2001). Hagfish have TKS (threonine, lysine, serine) at the position equivalent to the mammalian HKH triad.

Figure 1. Aligned beta hemoglobin sequences of rat (204569), bony fish (2154747, 2154902, 38606321) and sharks (4512338, 451454), together with hagfish (5114419) and lamprey (7677498) globins. Sequences are aligned to the hagfish sequence, so the mammalian amino acids that form the 2,3-diphosphoglycerate binding site are at different positions from those given in the text: histidine 2 (H) is at position 11, lysine 82 (K) is at position 96 and histidine 143 (H) is at position 158. Note the significant variation at these positions.

Sharks have dimers which associate into tetramers; these are also modulated by ATP. The key to ATP modulation appears to be the presence of lysine at position 82 on the beta chain. Although some sharks use a KES motif, lysine 82 seems to be the minimal configuration. Frog hemoglobin binds both ATP and DPG (Coates 1975b). Frogs have lysine at position 82, lysine at position 143, and either nothing or glutamic acid (E) at position 2. Thus you can bind DPG (and get selectable modulation of oxygen binding) without the HKH triad: a -KK triad will do. That triad in turn builds on an XKX triad (X being any amino acid) which binds ATP, and so onward and upward.

Triad	Vertebrate	Function
PSS	Lamprey	no organic phosphate modulation
TKS	Hagfish	some ATP modulation
HKK/KES	Sharks	ATP modulation
HKR/HKK/EKK	Fish	ATP modulation
-KK/EKK	Frogs	ATP/DPG modulation
HKR	Lizards	ATP/DPG modulation. DPG not physiological in lizards
HKH/NKH/RKH/HKR	Mammals	DPG/ATP modulation

Figure 2. This presents the same information as above in a more graphical way and shows that the 2,3-diphosphoglycerate binding site is the result of concerted evolution of organic phosphate binding sites, with significant flexibility in structure.

The TKS phosphate binding triad in hagfish is probably neutral, as their red blood cells have little ATP or DPG (Coates 1975a). However, in sharks, which do have significant levels of ATP (Coates 1975a), such a triad would be selectable, and only a few selectable steps are needed to get from it to HKS. XKK and XKR also detectably bind DPG (and modulate oxygen affinity), so we have precursors to the mammalian DPG binding site which are functional and selectable. Thus a DPG binding site can evolve from an ATP binding site using selectively advantageous mutations almost all the way.
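How many nucleotide changes separate these triads can be checked against the standard genetic code. The Python sketch below computes, for each position, the fewest point mutations from any codon of one amino acid to any codon of another, and sums over the triad (read in the order position 2, 82, 143). Because it minimizes over all codons of the starting residue rather than using the codons actually present in hagfish, shark or mammalian genes, it gives only a lower bound; the function names are our own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(aa):
    """All codons for a one-letter amino acid code."""
    return [codon for codon, a in CODE.items() if a == aa]


def min_point_mutations(aa_from, aa_to):
    """Fewest nucleotide substitutions from any codon of aa_from to any codon of aa_to."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa_from)
        for c2 in codons_for(aa_to)
    )


def triad_distance(triad_from, triad_to):
    """Lower bound on nucleotide changes between two triads, e.g. "TKS" -> "HKS"."""
    return sum(min_point_mutations(a, b) for a, b in zip(triad_from, triad_to))


for start, end in [("HKH", "RKH"), ("HKH", "HKR"), ("TKS", "HKS"), ("PSS", "HKH")]:
    print(start, "->", end, triad_distance(start, end))
```

Swapping histidine for arginine (CAT to CGT) needs a single base change, and going from TKS to HKS needs at least two. Triads containing a gap, such as the frog “-KK”, are outside what this sketch handles.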

So the DPG binding site is not a good example of a site requiring 3 neutral mutations to produce a fixed sequence before selectable function occurs. Some functions may require a “one true sequence” in the Behe-and-Snoke sense, but they are likely to be vanishingly rare, as even Behe and Snoke’s flagship example can be derived from simpler, selectable sequences.

Subfunctionalization vs. Neofunctionalization

Behe and Snoke’s model assumes a simplified “neofunctionalization” model of novel gene evolution. The scenario works as follows: A gene duplicates, resulting in two copies of the same gene. Because one copy continues to provide the ancestral function, the other copy is redundant, and thus has relaxed selective pressure. The redundant copy is then free to accumulate mutations, either those that give the gene a new activity, or those that inactivate the gene and render it completely nonfunctional, which then results in a pseudogene. However, as the authors explain, this simplified scenario may be the exception to the rule (Behe & Snoke 2004, p. 8). Most gene duplicates probably experience a “subfunctionalization” or similar scenario, in which both gene copies are maintained by selection for different ancestral activities. Since selection is operating more or less continuously in these scenarios, null mutations should be weeded out, increasing the likelihood that a novel feature will evolve.

Behe and Snoke are not the first to conclude that the “classic” neofunctionalization model is probably insufficient to explain the observed rate of duplicate gene preservation. Lynch and Force (2000) write,

Under the classical model of gene duplication, nonfunctionalization of one member of the pair by degenerative mutation has generally been viewed as inevitable unless the fixation of a silencing mutation is preceded by a mutation to a novel beneficial function. However, there now appear to be several plausible mechanisms for the preservation of duplicate genes….

There is, however, nothing inherent in the [subfunctionalization] model that denies the significance of gene duplication in the origin of evolutionary novelty. Indeed, the subfunctionalization process may facilitate such evolution by preserving gene duplicates and maintaining their exposure to natural selection and/or by removing pleiotropic constraints.

Unfortunately, Behe and Snoke’s model cannot be applied to situations other than a simplistic neofunctionalization scenario, in which a gene duplicate has completely relaxed selection up to the point where the MR feature evolves.

There are many more questionable assumptions in Behe and Snoke’s model. (Yes, assumptions have to be made to simplify nature into tractable models, but one has to be careful not to over-simplify.) These include ignoring back mutation from null sites, resetting a gene when it acquires a null mutation, ignoring recombination and gene conversion, and ignoring confidence intervals on waiting times, which are notoriously large. However, we do not have the space to consider them all.
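The point about confidence intervals can be made concrete. If a waiting time is roughly exponential, a common approximation for rare events but our assumption here rather than something Behe and Snoke report, its spread is as large as its mean. The short Python sketch below computes the central 95% interval from the exponential quantile function; the mean of 10⁸ generations is only an illustrative input.

```python
import math


def exponential_interval(mean_time, level=0.95):
    """Central interval of an exponential waiting time with the given mean.

    Uses the exponential quantile function q(p) = -mean * ln(1 - p).
    """
    tail = (1.0 - level) / 2.0
    lower = -mean_time * math.log(1.0 - tail)   # q(tail)
    upper = -mean_time * math.log(tail)         # q(1 - tail)
    return lower, upper


lower, upper = exponential_interval(1e8)  # a mean of 10^8 generations, for illustration
print(f"{lower:.3g} to {upper:.3g} generations; {upper / lower:.0f}-fold wide")
```

Under this assumption the 95% interval runs from about 2.5% of the mean to about 3.7 times the mean, so a single average waiting time says little about how long any particular case might take.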

Rho-Oh!

There is another assumption that goes into the model that directly affects the results. This one has to do with Behe and Snoke’s estimation of the frequency of null mutations. The parameter they factor into their model is ρ, which is defined as the ratio of null mutations to “compatible” mutations that contribute to the new function. They give an example:

… consider a gene of a thousand nucleotides. If a total of 2400 point mutations of those positions would yield a null allele, whereas three positions must be changed to build a new MR feature such as a disulfide bond, then ρ would be 2400/3, or 800. (Any possible mutations which are neutral are ignored.)

(Behe & Snoke 2004 p3)

We had a hard time understanding how they got 2400 null mutations from a gene 1000 nucleotides long. A gene of that size has 3000 possible point mutations, since each nucleotide can change into one of the remaining three. But thanks to the degeneracy of the genetic code, roughly a quarter of these will be “silent” because they won’t affect the amino acid sequence of the protein. Since Behe and Snoke are ignoring neutral mutations, that should leave a maximum of about 2250 non-neutral mutations. How are they coming up with 2400?

It seemed like such a simple miscalculation that we figured the problem had to be with us. So one of us emailed Michael Behe and asked about how the 2400 number was reached. He graciously replied the next day and said that the example was only intended for illustrative purposes, which is certainly fair. He also said that mutations in non-coding portions of the gene, which control regulation, could also render it nonfunctional, which tends to push the number upward. As shown with the DPG binding site example below, it doesn’t push it up that much. However, we don’t see this as being valid no matter how one looks at it. Non-coding DNA wasn’t taken into account anywhere in their model, and a new gene duplicate doesn’t necessarily need regulation anyway. Either it can be regulated by a neighbor’s regulatory region (for example, its parent gene), or, if its regulatory regions are nullified, it can be turned “on” constantly. But the real kicker is that the number was based on assuming that around half of amino acid changes would result in a nonfunctional protein. Okay … but the example uses nucleotides, and more importantly, the simulations they do factor in nucleotide changes and not amino acid changes. For example …

consider a case where three nucleotide changes must be made to generate a novel feature such as a disulfide bond. In that instance, Figure 6 shows that a population size of approximately 10¹¹ organisms on average would be required to give rise to the feature over the course of 10⁸ generations….

(Behe & Snoke 2004 p10)

And later …

The population size required to produce [a multi-residue] feature consisting of three interacting residues by point mutation in a duplicated gene initially lacking those residues would depend on the number of nucleotides that had to be changed–a minimum of three and a maximum of nine. If six mutations were required then, as indicated by Figure 6, on average a population size of ~10²² organisms would be necessary to fix the MR feature in 10⁸ generations….

(Behe & Snoke 2004 p10)

It’s clear they’re using nucleotide substitutions in their model. This is critical because, if they used amino acid changes instead, the number of changes required to produce the multi-residue feature (what they call λ) would be about half as many, and this would revise their numbers strongly downward. On the other hand, if we use nucleotides to calculate ρ, then that figure should be a great deal less, which also revises their numbers downward. (Our opinion is that nucleotides should be used consistently, as we have done in working out the DPG example.) It’s much less likely that a nucleotide substitution will be detrimental than an amino acid substitution. Aside from silent mutations, the genetic code favors conservative substitutions, and disruptive amino acids (like tryptophan) tend to be coded for infrequently. Using nucleotides rather than amino acids automatically factors this in. However, we will point out that Behe and Snoke do not even base their model on nucleotides. Instead, they use a biologically unrealistic binary model. This leads to further problems determining biologically relevant parameter values.
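How large the silent fraction is can be checked directly from the standard genetic code. The Python sketch below takes every sense codon, applies each of its nine possible single-base changes, and counts how many leave the amino acid unchanged; changes to a stop codon count as non-silent. It assumes, for illustration only, that all 61 sense codons are equally common and every base change is equally likely; real genes have biased codon usage and mutation spectra, so the result is a rough guide.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_silent_point_mutations():
    """Count synonymous changes among all single-base changes of sense codons.

    Assumes every sense codon is equally common and every single-base change
    equally likely; a change that creates a stop codon counts as non-silent.
    """
    silent = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # only sense codons are starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                silent += CODE[mutant] == aa
    return silent, total


silent, total = count_silent_point_mutations()
print(silent, total, round(silent / total, 3))  # 134 549 0.244
```

Under these assumptions about 24% of point mutations are silent, so even before counting conservative amino acid changes, a 1000-nucleotide gene has fewer than 2400 non-silent point mutations available.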

The simulations run by Behe and Snoke set ρ to 1000. They get this value from Walsh (1995) which uses this value as the conservative end of a range of estimates. However, Walsh doesn’t explain why he chose that range of values, and in the context of his analysis, it didn’t matter. Walsh’s analysis concerns the ratio of ρ to s (the selection coefficient) and shows under what conditions a new function is likely to be fixed for a duplicate gene. He doesn’t try to apply it to real-life examples, and changing values of ρ don’t affect the key findings of his analysis. Behe and Snoke, on the other hand, are trying to base their estimate on real-world mutational tolerance:

A gene coding for a duplicate, redundant protein would contain many nucleotides. The majority of nonneutral point mutations to the gene will yield a null allele (again, by which we mean a gene coding for a nonfunctional protein) because most mutations that alter the amino acid sequence of a protein effectively eliminate function (Reidhaar-Olson and Sauer 1988, 1990; Bowie and Sauer 1989; Lim and Sauer 1989; Bowie et al. 1990; Reidhaar-Olson and Sauer 1990; Rennell et al. 1991; Axe et al. 1996; Huang et al. 1996; Sauer et al. 1996; Suckow et al. 1996).

(Behe & Snoke 2004 pp2-3, emphasis added)

And later they quantify what they mean by “most”:

An estimate of ρ can be inferred from studies of the tolerance of proteins to amino acid substitution. Although there is variation among different positions in a protein sequence, with surface residues in general being more tolerant of substitution than buried residues, it can be calculated that on average a given position will tolerate about six different amino acid residues and still maintain function (Reidhaar-Olson and Sauer 1988, 1990; Bowie and Sauer 1989; Lim and Sauer 1989; Bowie et al. 1990; Rennell et al.1991; Axe et al. 1996; Huang et al. 1996; Sauer et al. 1996; Suckow et al. 1996). Conversely, mutations to an average of 14 residues per site will produce a null allele, that is, one coding for a nonfunctional protein. Thus, in the coding sequence for an average-sized protein domain of 200 amino acid residues, there are, on average, 2800 possible substitutions that lead to a nonfunctional protein as a result of direct effects on protein structure or function. If several mutations are required to produce a new MR feature in a protein, then ρ is roughly of the order of 1000.

(Behe & Snoke 2004 p10)

Here we can see that Behe and Snoke assume that about 70% of amino acid substitutions will result in a nonfunctioning protein. But this is almost certainly a vast overestimate. The best estimate for a protein’s tolerance to random amino acid change comes from a recent paper from Guo and coworkers (2004), in which they calculate, through direct empirical investigation, that 34 +/- 6% of random amino acid changes will eliminate a protein’s function (what they call the protein’s “x factor”). They find this to be in accord with similar studies using a broad array of proteins. In fairness to Behe and Snoke, they couldn’t have known about Guo et al. (2004) since it was published right about the time that their own paper was accepted. But the papers that they cite don’t support the 70% figure either. Each of those studies either applies mutagenesis only to a conserved region (typically the hydrophobic core or an active site), or they mutate more than one amino acid at a time, or both.
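The 70% figure follows directly from the per-site numbers in the passage quoted above. A minimal sketch of that arithmetic, using only the quoted values (about 6 tolerated and 14 null residues per site, a 200-residue domain) and, for comparison, Guo et al.’s (2004) amino acid x factor of 34%; the helper name is hypothetical and the code is our illustration, not anything from Behe and Snoke:

```python
def null_fraction(tolerated: int, null: int) -> float:
    """Fraction of amino acid substitutions at a site that destroy function."""
    # Hypothetical helper: tolerated + null residues together cover the 20 amino acids.
    return null / (tolerated + null)


# Behe and Snoke's per-site averages: about 6 tolerated and 14 null residues.
bs_fraction = null_fraction(tolerated=6, null=14)  # 14 / 20

# Null substitutions in a 200-residue domain under their assumption.
domain_length = 200
bs_null_substitutions = domain_length * 14

# Guo et al. (2004): 34 +/- 6% of random amino acid changes are inactivating.
guo_fraction = 0.34

print(f"Behe and Snoke null fraction: {bs_fraction:.0%}")
print(f"Null substitutions in a {domain_length}-residue domain: {bs_null_substitutions}")
print(f"Ratio to Guo et al.'s estimate: {bs_fraction / guo_fraction:.1f}")
```

The ratio comes out at about 2, so the assumed null fraction is roughly double the best empirical estimate before silent mutations are even considered.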

Consider Axe et al. (1996), which Behe and Snoke cite in support of their 70% estimate. In this study, the researchers applied random mutagenesis to the hydrophobic core of the enzyme barnase. They didn’t just mutate one amino acid at a time; they mutated the whole lot. Not only is the hydrophobic core expected to be much less tolerant to mutation than the protein as a whole, it’s also expected that multiple amino acid changes should be less easily tolerated than single mutations. Yet despite all that, 23% of their variants were functional, far more than expected. This means that 77% were nonfunctional, which would accord well with Behe and Snoke’s estimate, if only this were what they were estimating. But what Behe and Snoke are actually estimating is the likelihood that a single amino acid change at a random location throughout the entire protein renders it nonfunctional. Axe et al. (1998) later did such an experiment and found that a big bad 5% of random mutations rendered barnase nonfunctional, far less than Guo et al.’s estimate of 34%. Behe and Snoke later cite Axe et al. (1998) and note that their estimate of ρ may be too high. We wonder why they cite Axe et al. (1996) in support of their 70% assumption, when the paper is both irrelevant and contradicted by a more recent study using the same enzyme.

Being off by this much wouldn’t be such a big deal if not for the fact that ρ factors prominently into Behe and Snoke’s calculations. Small changes in the value of ρ make a large difference to the model, as the authors explain on page 11. Behe and Snoke used a ρ of 1000 for their examples, as explained in the passage quoted above. But now let’s take their example and figure out what a more realistic value of ρ should be. A 200 amino acid protein will have a coding region of 600 nucleotides, which has 1800 possible point mutations. Guo et al. (2004) calculate a value for the nucleotide “x factor”, based upon the protein “x factor” but taking into account silent mutations. The number they get is 26%. We can therefore assume that 26% of our 1800 mutations will eliminate function, which gives us 486. So ρ should be 486 divided by the number of mutations necessary to produce the MR feature. No matter what, it will be substantially less than the 1000 figure that Behe and Snoke use. In the example with disulfide bonds, where the MR feature is assumed to require 3 nucleotide changes, ρ would be 162. According to Behe and Snoke’s own equations, this replacement could occur in 3.6x10⁷ generations in a population size of 10⁸. In the example with DPG binding, where they assume 6 required changes, ρ would be 81. This replacement would occur in 3.28x10¹² generations for a population size of 10⁸. This is orders of magnitude less than the 1.09x10¹⁹ generations that Behe and Snoke’s values would produce. As we show for the DPG binding site below, using realistic values of ρ and λ will produce reasonable times to production of binding sites.
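The ρ estimate above can be sketched in a few lines. This is our own illustration of the recipe in the paragraph (three nucleotides per codon, three alternative bases per nucleotide, times a null fraction, divided by λ), not code from Behe and Snoke, and the function name `rho` is hypothetical. Worked through exactly, 26% of 1800 is 468 rather than 486, which puts ρ at 156 and 78 for the three-change and six-change examples rather than 162 and 81; the lower figures only move ρ further below 1000.

```python
def rho(protein_length_aa: int, null_fraction: float, lam: int) -> float:
    """Approximate rho: null point mutations per mutation the MR feature needs.

    Hypothetical helper following the recipe in the text, not Behe and
    Snoke's code: 3 nucleotides per codon, 3 alternative bases per
    nucleotide, times the null fraction, divided by lambda.
    """
    coding_nucleotides = protein_length_aa * 3
    point_mutations = coding_nucleotides * 3
    null_mutations = point_mutations * null_fraction
    return null_mutations / lam


GUO_NUCLEOTIDE_X = 0.26  # Guo et al. (2004) nucleotide "x factor"

# Disulfide-bond example (lambda = 3) and DPG-binding example (lambda = 6).
for lam in (3, 6):
    print(f"lambda = {lam}: rho = {rho(200, GUO_NUCLEOTIDE_X, lam):.0f}")
```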

How unlikely is the evolution of MR features?

Despite using assumptions that render their model overly pessimistic, the population sizes and numbers of generations that Behe and Snoke calculate are not prohibitive for the types of organisms (haploid, asexual) to which their model is most applicable. The authors conclude that population sizes of 10⁹ would require at least 10⁸ generations to evolve a two-site MR feature (λ = 2) under their model.

And while this does seem prohibitive for large, multicellular eukaryotes, it’s actually easily achievable for bacteria. A population size of 10⁹ is what one would find in a very small culture growing in a lab; even small handfuls of dirt, or the average human gut, will contain populations in excess of this number. Bacteria reproduce quickly; under optimal conditions for E. coli, 10⁸ generations will occur in less than 40,000 years, a geological blink of the eye. Given that there are about 5x10³⁰ bacteria on Earth (Whitman et al. 1998), we should expect the evolution of novel MR features to be an extremely common event – an average of many times per microsecond – even if we accept Behe and Snoke’s unrealistic assumptions.
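The bacterial arithmetic in the last two paragraphs can be checked with a rough sketch. The 20-minute doubling time for E. coli under optimal conditions and the one-day generation time for wild bacteria are our assumptions, as is the simplification of treating the roughly 5x10³⁰ bacteria on Earth as independent populations of 10⁹ cells, each producing one new MR feature per 10⁸ generations; the helper names are hypothetical.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # Julian year


def years_for_generations(generations: float, generation_minutes: float) -> float:
    """Years needed for a number of generations at a fixed generation time."""
    return generations * generation_minutes / MINUTES_PER_YEAR


def mr_features_per_microsecond(total_cells: float, population_size: float,
                                generations_per_feature: float,
                                generation_minutes: float) -> float:
    """Rough rate of new MR features, treating all cells as independent populations.

    Assumes (hypothetically) that each population of `population_size` cells
    yields one new MR feature every `generations_per_feature` generations.
    """
    populations = total_cells / population_size
    features_per_generation = populations / generations_per_feature
    microseconds_per_generation = generation_minutes * 60 * 1e6
    return features_per_generation / microseconds_per_generation


# Assumed 20-minute doublings for E. coli: about 3,800 years for 10^8 generations.
print(round(years_for_generations(1e8, 20)))
# A 3.5-hour generation time gives just under 40,000 years.
print(round(years_for_generations(1e8, 3.5 * 60)))
# About 5x10^30 bacteria (Whitman et al. 1998), assumed one generation per day.
print(round(mr_features_per_microsecond(5e30, 1e9, 1e8, 24 * 60)))
```

With 20-minute generations the 10⁸ generations take under 4,000 years, and the 40,000-year figure corresponds to generations of about three and a half hours. Even at one generation per day, the sketch gives several hundred new MR features per microsecond.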

Since we can be confident that their numbers are a vast overestimate, Behe and Snoke have ironically demonstrated that the evolution of novel gene functions is not unlikely at all. And yet, it has been a long-standing claim of the ID movement that the evolution of “novelty” simply cannot happen, period. Behe and Snoke have done us the favor of disproving this bogus notion once and for all.

Applying Behe & Snoke’s equations to the DPG binding site example

Let’s take this further and apply Behe and Snoke’s own equations to the DPG binding site example. First we need to calculate λ, the number of neutral nucleotide substitutions (above we were looking at amino acid substitutions) to produce a DPG binding site.

To calculate λ we’ve used two scenarios: drift from the hagfish or lamprey sequence to a known DPG binding site, or to a bony fish or shark ATP binding site.

Now, let’s first chart some single-nucleotide steps from the hagfish or lamprey sequence to HKS, a known DPG binding sequence with weak but functional (and hence selectable) DPG binding. Then let’s look at the steps from there to the ATP binding sites of a bony fish (tuna, HKR) and the Port Jackson shark (HKK). Again, the letters are one-letter amino acid codes, and the number in parentheses is the number of nucleotides that must change to get from one amino acid state to the next. These are not the only paths (or possible endpoints), but they show that there are multiple ways to reach even a narrowly defined target. The number of nucleotide steps differs from the number of amino acid substitutions above, because in some cases you can’t go directly from the codon for one amino acid to the codon for the other.

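Hagfish to DPG binding, path 1: T K S → N K S (1) → H K S (1)
Hagfish to DPG binding, path 2: T K S → K K S (1) → N K S (1) → H K S (1)
Hagfish to DPG binding, path 3: T K S → R K S (1) → R K S (1) → H K S (1)
Lamprey to DPG binding: P S S → H S S (1) → H R S (1) → H K S (1)
DPG to Shark: H K S → H K R (1) → H K K (1)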

Note that the maximum path length from no or very weak DPG binding to selectable binding is 2 to 3 substitutions, rather than the 6 nucleotide substitutions that Behe and Snoke imply. Note also that the “neutral” substitutions N, K and R, especially the basic K and R, are amino acids found at those positions in ATP/DPG-binding hemoglobins (see above). Indeed, HKR is the ATP binding motif in bony fish. So the “neutral drift” from either hagfish or lamprey to a selectable organic phosphate binding site is between 2 and 4 nucleotide substitutions, depending on your endpoint. Thus λ is between 2 and 4.
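The “(1)” counts in the paths above can be checked against the standard genetic code. The sketch below is our own illustration with hypothetical helper names: for each amino acid change it takes the smallest number of nucleotide differences between any codon of the old residue and any codon of the new one, then sums along a path. That is a simplification, since it ignores which codon the real hagfish or lamprey gene actually uses, so it can undercount; for instance, it scores the synonymous R to R step in the third hagfish path as zero.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons in TTT, TTC, TTA, ..., GGG order,
# with "*" marking stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO_ACIDS))


def codons_for(aa: str) -> list[str]:
    """All codons that translate to the one-letter amino acid `aa`."""
    return [codon for codon, residue in CODE.items() if residue == aa]


def min_nt_changes(aa1: str, aa2: str) -> int:
    """Fewest point mutations between any codon of aa1 and any codon of aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


def path_length(path: list[str]) -> int:
    """Total nucleotide changes along a path of equal-length motifs like 'TKS'."""
    return sum(
        min_nt_changes(a, b)
        for before, after in zip(path, path[1:])
        for a, b in zip(before, after)
    )


paths = {
    "hagfish, path 1": ["TKS", "NKS", "HKS"],
    "hagfish, path 2": ["TKS", "KKS", "NKS", "HKS"],
    "lamprey": ["PSS", "HSS", "HRS", "HKS"],
    "DPG site to shark": ["HKS", "HKR", "HKK"],
}
for name, path in paths.items():
    print(name, path_length(path))
```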

Next we need ρ, which is defined as the number of null mutations divided by λ. To find the number of null mutations, we multiply the size of the gene in nucleotides by 3 (to get the number of alternative nucleotides) and multiply that by the fraction of nucleotide replacements that are null (i.e. that generate functionless peptides). For these calculations we use the length of the lamprey gene, which is 452 nucleotides. This makes things harder for ourselves, as the lamprey and hagfish genes are substantially longer than the shark or bony fish genes. Contrast Behe and Snoke’s null factor (0.7), derived from amino acids, with a more realistic one (0.3), derived from nucleotides, which is still conservative compared with the null factor of 0.26 calculated by Guo et al. (2004). Now we are in a position to calculate the time to fixation (T_fx).

We’ve used a program kindly provided by E. Tellgren to calculate T_fx using Behe and Snoke’s equation 4. Plugging Behe and Snoke’s figures of 10⁻⁸ for the mutation rate and 0.7 for the null factor, together with realistic values of ρ and λ, into this program gives the following numbers of generations to fixation for population sizes (N) from 10⁶ to 10⁹.

N	λ = 2 (ρ = 316.4)	λ = 3 (ρ = 210.933)	λ = 4 (ρ = 158.2)	λ = 5 (ρ = 126.56)	λ = 6 (ρ = 105.467)
10⁶	3.55x10⁷	1.48x10⁹	1.71x10¹¹	1.76x10¹³	1.57x10¹⁵
10⁷	1.04x10⁷	1.93x10⁸	1.72x10¹⁰	1.76x10¹²	1.57x10¹⁴
10⁸	3.21x10⁶	5.04x10⁷	1.76x10⁹	1.76x10¹¹	1.57x10¹³
10⁹	1.01x10⁶	1.93x10⁷	2.17x10⁸	1.76x10¹⁰	1.57x10¹²

Table: Effect of realistic population sizes (N), ρ and λ (number of neutral replacements required to form the site) on time to fixation (T_fx) of a DPG binding site in generations. T_fx calculated from Behe and Snoke’s equation 4.

Now, Behe and Snoke claim in their paper that, for an MR feature with six sites (λ = 6) and a ρ of 1000, a population would require 1x10²² generations to fix such a feature, a mind-bogglingly large amount of time and far longer than the age of the universe if one generation is one year. They imply that these figures are applicable to the DPG binding site. However, as the table shows, even using a protein size that overestimates the size of the evolving hemoglobin protein and Behe and Snoke’s overly conservative null mutation rate (0.7), a DPG binding site could evolve quite rapidly.

We’re pretty sure that there are more than 10 million hagfish or lampreys around. As the table shows, with a mere 10 million hagfish, Behe and Snoke’s own equation finds that on average a population would reach the functional bony fish ATP binding site in 190 million years, assuming one generation a year. This is not unrealistic considering that agnathans and elasmobranchs are separated by approximately 100 million years in the fossil record as it stands. If we use Behe and Snoke’s own cutoff of 100 million individuals, the hagfish to bony fish sequence could be acquired on average in a mere 50 million years. The lamprey to bony fish site would require on average around 2 billion years with the same 100-million-individual cutoff: rather long, but orders of magnitude below the 1x10²² generations Behe and Snoke imply.
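The figures in the paragraph above are read straight off the table, taking λ = 3 for the hagfish to bony fish (HKR) route and λ = 4 for the lamprey route (the DPG paths charted earlier plus the one extra step to HKR), and assuming one generation per year. A small lookup sketch, with hypothetical names; only the four table entries used in the paragraph are copied:

```python
# Generations to fixation copied from the table above (null factor 0.7),
# keyed by (population size N, lambda). Only the entries used in the text.
T_FX_GENERATIONS = {
    (10**7, 3): 1.93e8,
    (10**7, 4): 1.72e10,
    (10**8, 3): 5.04e7,
    (10**8, 4): 1.76e9,
}

# Lambda for each route: hagfish 2 steps + 1, lamprey 3 steps + 1 (to HKR).
ROUTE_LAMBDA = {"hagfish to bony fish": 3, "lamprey to bony fish": 4}


def years_to_fixation(route: str, population: int,
                      years_per_generation: float = 1.0) -> float:
    """Expected years to fixation, assuming a fixed generation time (default: 1 year)."""
    return T_FX_GENERATIONS[(population, ROUTE_LAMBDA[route])] * years_per_generation


for population in (10**7, 10**8):
    for route in ROUTE_LAMBDA:
        years = years_to_fixation(route, population)
        print(f"N = {population:.0e}, {route}: {years / 1e6:,.0f} million years")
```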

Now let’s look at what happens when we use the more realistic null mutation rate of 0.3. With a population of 10 million (again, realistic for vertebrates), you can get from the hagfish to the bony fish site in 50 million years, and from the lamprey to the bony fish site in 600 million years. Using Behe and Snoke’s population of 100 million, you can get on average from the hagfish to the bony fish site in 10 million years and from the lamprey to the bony fish site in about 100 million years. Again, remember that we are biasing the calculation against ourselves by using larger proteins and by assuming that intermediates are neutral even though we know they are selectable.

In the above calculations we’ve been using the length of the protein-coding region, that is, from the translation start codon to the translation stop codon, but actual genes are larger than the translated region. So here we have looked at the effect on the time to fixation (T_fx) of using the whole gene, not just the translated parts. Aside from the transcription initiation site, the rest of the gene outside the translated region is probably neutral, so this overestimates the time to fixation as well. We couldn’t find the shark transcription sequence, so we’ve used the lamprey sequence for comparison. Using Behe and Snoke’s population of 100 million organisms, you can get from the hagfish to the bony fish site on average in 50 million years, and from the lamprey to the bony fish site on average in 600 million years.

So, plugging Behe and Snoke’s own example into Behe and Snoke’s own equation gives values that are compatible with the fossil record, even when assuming that intermediates with known or probable function were functionless, overestimating protein size, and overestimating the null mutation rate. More realistic values of the null mutation rate suggest that even systems with large λ values can evolve in reasonable times.

Conclusion

We began this essay with a quotation from Behe complaining that a paper describing an evolutionary simulation (Lenski et al. 2003) had “precious little real biology” in it. What we see here is that Behe and Snoke’s paper is acutely vulnerable to the same criticism. A theoretical model is useful to the extent that it accurately represents or appropriately idealizes the processes that occur in the phenomenon being studied. Although it is worthwhile to investigate the importance of neutral drift, Behe and Snoke have in our opinion over-simplified the process, resulting in questionable conclusions.

Their assumptions bias their results towards more pessimistic numbers. The worst assumption is that only one target sequence can be hit to produce a new function. This is probably false under all circumstances. The notion that a newly arisen duplicate will remain selectively neutral until the modern function is firmly in place is also probably false as a general rule. Their assumption that 70% of all amino acid substitutions will destroy a protein’s function is much too high. And finally, we have shown that their flagship example does not require a large multi-residue change before being selectable.

And ironically, despite these faulty assumptions, Behe and Snoke show that the probability of small multi-residue features evolving is extremely high, given the types of organisms that Behe and Snoke’s model applies to. When we use more realistic assumptions, though many bad ones still remain, we find that the evolution of multi-residue features is quite likely, even when there are smaller populations and larger changes involved. In fact, the times required are within the estimated divergence times gleaned from the fossil record. We can therefore say, with confidence, that the evolution of novel genes via multi-residue changes is not problematic for evolutionary theory as currently understood.

Acknowledgements

The authors would like to thank the many people who offered suggestions on this essay, our fellow contributors to The Panda’s Thumb, and especially E. Tellgren who provided an application to evaluate Behe and Snoke’s equations.

References

Angeletti M et al. (2001) Different functional modulation by heterotropic ligands (2,3-diphosphoglycerate and chlorides) of the two haemoglobins from fallow-deer (Dama dama). Eur J Biochem 268: 603-611.
Axe DD et al. (1996) Active barnase variants with completely random hydrophobic cores. PNAS 93: 5590-5594.
Axe DD et al. (1998) A search for single substitutions that eliminate enzymatic function in a bacterial ribonuclease. Biochemistry 37(20): 7157-66.
Behe MJ (1996) Darwin’s Black Box. The Free Press, New York.
Behe MJ (2000) In Defense of the Irreducibility of the Blood Clotting Cascade: Response to Russell Doolittle, Ken Miller and Keith Robison. Access Research Network. (Accessed 9/25/04).
Behe MJ & Snoke DW (2004) Simulating evolution by gene duplication of protein features that require multiple amino acid residues. Protein Science, Epub ahead of print (Accessed 8/31/04).
Bonaventura J et al. (1975) Hemoglobin Deer Lodge (beta 2 His replaced by Arg). Consequences of altering the 2,3-diphosphoglycerate binding site. J Biol Chem 250: 9250-5.
Brittain T & Wells RM (1986) Characterization of the changes in the state of aggregation induced by ligand binding in the hemoglobin system of a primitive vertebrate, the hagfish Eptatretus cirrhatus. Comp Biochem Physiol A 85(4): 785-90.
Caenepeel S et al. (2004) The mouse kinome: discovery and comparative genomics of all mouse protein kinases. PNAS 101(32):11707-12.
Coates ML (1975a) Hemoglobin function in the vertebrates: an evolutionary model. J Mol Evol 6(4): 285-307.
Coates ML (1975b) Studies on the interaction of organic phosphates with haemoglobin in an amphibian (Bufo marinus), a reptile (Trachydosaurus rugosus) and man. Aust J Biol Sci 28(4): 367-78.
Copley SD (2000) Evolution of a metabolic pathway for degradation of a toxic xenobiotic: the patchwork approach. Trends Biochem Sci 25(6): 261-5.
Cunchillos C & Lecointre G (2003) Evolution of amino acid metabolism inferred through cladistic analysis. J Biol Chem 278(48):47960-70.
Dembski W (2004) Presentation at DDDV. Communicated by MK Johnson.
Discovery Institute (2004) Media Backgrounder: Intelligent Design Article Sparks Controversy. (Accessed 10/7/04)
Gerlt JA & Babbitt PC (2001) Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally distinct suprafamilies. Annu Rev Biochem 70:209-46.
Guo HH et al. (2004) Protein tolerance to random amino acid change. PNAS 101(25):9205-10.
Hall BG (2003) The EBG system of E. coli: origin and evolution of a novel beta-galactosidase for the metabolism of lactose. Genetica 118(2-3):143-56.
Huang W et al. (1996) Amino acid sequence determinants of beta-lactamase structure and activity. J Mol Biol 258: 688-703.
Johnson GR et al. (2002) Origins of the 2,4-dinitrotoluene pathway. J Bacteriol 184(15): 4219-4232.
Kiernan V (2003) Simulation Demonstrates Evolutionary Process. The Chronicle of Higher Education June 6.
Lenski RE et al. (2003) The Evolutionary Origin of Complex Features. Nature 423(6936): 139-44.
Li WH (1997) Molecular Evolution. Sinauer Associates, Sunderland, MA.
Lynch M & Force A (2000) The probability of duplicate gene preservation by subfunctionalization. Genetics 154(1):459-73.
Manning G et al. (2002) The protein kinase complement of the human genome. Science 298(5600):1912-34.
Melendez-Hevia E et al. (1996) The puzzle of the Krebs citric acid cycle: assembling the pieces of chemically feasible reactions, and opportunism in the design of metabolic pathways during evolution. J Mol Evol 43(3):293-303.
Meyer SC (2004) The Origin of Biological Information and the Higher Taxonomic Categories. Proc Biol Soc Wash 117(2):213-239.
Miller KR (1999) The Evolution of Vertebrate Blood Clotting (Accessed 10/01/04).
Musgrave I (2003) Spetner and Biological Information (Accessed 10/01/04).
Musgrave I (2004) ID research, is that all there is? (Accessed 10/01/04).
Nikinmaa M (2001) Haemoglobin function in vertebrates: evolutionary changes in cellular regulation in hypoxia. Resp Physiol 128: 317-329.
Ohta T (1987) Simulating evolution by gene duplication. Genetics 115: 207-214.
Sauer RT et al. (1996) Sequence determinants of folding and stability for the P22 Arc repressor dimer. FASEB J 10: 42-48.
Schmidt et al. (2003) Evolutionary potential of (beta/alpha)8-barrels: functional promiscuity produced by single substitutions in the enolase superfamily. Biochemistry 42(28):8387-93.
Seffernick JL & Wackett LP (2001) Rapid evolution of bacterial catabolic enzymes: a case study with atrazine chlorohydrolase. Biochemistry 40(43): 12747-53.
Walsh JB (1995) How often do duplicated genes evolve new functions? Genetics 139(1):421-8.
Waterston RH et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915):520-62.
Whitman WB et al. (1998) Prokaryotes: the unseen majority. PNAS 95(12):6578-83.

114 Comments

PvM · 11 October 2004

Excellent article. While science depends on falsification of proposed mechanisms and pathways, science based on looking at worst case scenarios does not serve much of a purpose but shows how science is limited by an ID view. Convinced that evolutionary mechanisms are unable to explain particular features, ID seems to rely on unimaginative scenarios. Starting with Dembski's protein 'calculations', continuing with MESA and now the paper by Behe and Snoke, we see how appeal to ignorance can become detrimental to scientific progress. Notice that in none of these cases has ID proposed its own hypotheses. In this context it may be interesting to point out the work by Koonin and others (proteins) or Schuster, Stadler, Fontana and others (RNA) who have shown that simple Birth-Innovation-Death models can explain the distribution of RNA/Protein networks.

Results: A simple model of evolution of the domain composition of proteomes was developed, with the following elementary processes: i) domain birth (duplication with divergence), ii) death (inactivation and/or deletion), and iii) innovation (emergence from non-coding or non-globular sequences or acquisition via horizontal gene transfer). This formalism can be described as a birth, death and innovation model (BDIM). We apply the BDIM formalism to the analysis of the domain family size distributions in prokaryotic and eukaryotic proteomes and show an excellent fit between these empirical data and a particular form of the model, the second-order balanced linear BDIM.

Birth and death of protein domains: A simple model of evolution explains power law behavior, Georgy P Karev, Yuri I Wolf, Andrey Y Rzhetsky, Faina S Berezovskaya and Eugene V Koonin, BMC Evolutionary Biology 2002, 2:18

In a more recent paper the authors looked at non-linear models since linear models were 'too slow':

Conclusions: The analysis of stochastic BDIMs presented here shows that non-linear versions of such models can well approximate not only the size distribution of gene families but also the dynamics of their formation during genome evolution. The fact that only higher degree BDIMs are compatible with the observed characteristics of genome evolution suggests that the growth of gene families is self-accelerating, which might reflect differential selective pressure acting on different genes.

Gene family evolution: an in-depth theoretical and simulation analysis of non-linear birth-death-innovation models, Georgy P Karev, Yuri I Wolf, Faina S Berezovskaya, Eugene V Koonin, BMC Evolutionary Biology 2004, 4:32

Similarly, birth and death modeling in RNA has shown how gene duplication and divergence can explain the characteristics of RNA networks.

Nick · 11 October 2004

Musgrave et al. write,

Since we can be confident that their numbers are a vast overestimate, Behe and Snoke have ironically demonstrated that the evolution of novel gene functions is not unlikely at all. And yet, it has been a long standing claim of the ID movement that the evolution of "novelty" simply cannot happen, period. Behe and Snoke have done us the favor of disproving this bogus notion once and for all. "Theory is as Theory Does"

I wonder if this "discovery" of Behe and Snokes will be highlighted at the next ID conference?

Alan Gourant · 11 October 2004

Musgrave-Reuland-Cartwright = MRC = "Most Reasonable Counter-argument" against Behe-Snoke = BS = "........"

Aarobyl · 11 October 2004

An excellent article, but I think it doesn't help at all. Behe's paper has nothing to do with science at all ( except with abuse of science, of course... ), so showing it to be bad science doesn't help any further.
Behe's only aim is just to lay another cuckoo egg, which will allow him to pose as a scientist who writes about evolution. No matter how bad or irrelevant the paper actually is, no matter how successful all the others will be in their refutations, he can still say to the plain folks that he is an expert on evolution, because he is writing 'scientific' papers about it. And what to do against this ? I have no clue...

frank schmidt · 11 October 2004

Is there some confusion here?

DPG is produced by a shunt pathway that prevents glycolysis (the pathway that "burns" glucose to produce energy) from producing the energy molecule ATP when levels of ATP are high...When oxygen levels are low, ATP levels are low, and DPG levels are also low. Therefore, low levels of oxygen favor increased oxygen affinity of hemoglobin.

Actually, DPG is produced during conditions of hypoxia; for example, [DPG] goes up when subjects experience a gain of altitude. This makes sense, because the objective is for Hemoglobin to release O2 to the tissues when there isn't much around. The shunt pathway to make DPG is off the glycolytic pathway. Glycolysis is anaerobic, so when ATP synthesis by glycolysis is high, the concentration of 1,3-diphosphoglycerate increases. 1,3-DPG is converted to DPG. It may well be a simple case of mass action, although DPG also inhibits its own synthesis by feedback. Source: Newsholme and Leach, Biochemistry for the Medical Sciences (1983).

Reed A. Cartwright · 11 October 2004

Frank,

I spent a lot of time trying to figure out the confusing physiology. Ian should be along to explain it better than I can.

My understanding is that DPG increases at high altitude because of upregulation of the shunting enzymes, not directly because of less oxygen. A lot of this has to do with understanding oxygen affinity curves and trade-offs between oxygen uptake and oxygen release.

Steve Reuland · 11 October 2004

An excellent article, but I think it doesn't help at all. Behe's paper has nothing to do with science at all ( except with abuse of science, of course . . . ), so showing it to be bad science doesn't help any further. Behe's only aim is just to lay another cuckoo egg, which will allow him to pose as a scientist who writes about evolution. No matter how bad or irrelevant the paper actually is, no matter how successful all the others will be in their refutations, he can still say to the plain folks that he is an expert on evolution, because he is writing 'scientific' papers about it. And what to do against this ? I have no clue . . .
— Aarobyl

Thanks for the compliment, but I don't agree that Behe and Snoke's paper "has nothing to do with science". In fact, it's just about the first and only stab at science we've seen out of the ID movement. And if the paper's conclusions were warranted (and the authors, to their credit, are careful to issue many caveats), then it would be meaningful to science. I'm personally glad they've published this because it gives us something concrete to critique (finally!), rather than a bunch of vague musings and unsupported assertions. As for the propaganda value of the paper, sure, I expect there will be a number of hackneyed celebratory missives from the Discovery Institute. But they've been doing that for years, claiming to have discovered something profound even without having any papers to show for all the money they've spent. The important point is for people who want a deeper understanding of issues to be able to weigh their claims -- paper or no paper -- against reality. As for the people who don't care about that... well, what can you do?

Steve · 11 October 2004

Thanks, Ian, Steve, Reed, for the excellent article.

The worst assumption is that only one target sequence can be hit to produce a new function. This is probably false under all circumstances.

To my mind, this fact condemns the project of ID statistics to failure. In statistical mechanics terms, they're trying to say that the macrostate is impossible, based on the odds of the single microstate. It's just stupidity.

JohnK · 11 October 2004

Is anyone intimately familiar enough with Protein Science to speculate on its depth of review on evo subjects beyond phylogenies, etc.?

Great White Wonder · 11 October 2004

I note at least three misleading statements are referred to in the excellent review of Behe et al.'s paper, above. And the term "questionable" pops up as well.

Just out of curiosity ... are there any examples of misleading statements or questionable assumptions in the paper that do NOT serve to bolster Behe's alleged conclusion and/or Behe's ultimate goal?

Admonitus · 11 October 2004

Great article. I've been awaiting this for some time, but I must say my initial reaction to the article was basically the same. They're concluding something can't happen based on the most simple of all possible models, in defiance of historical observations, and then opening their discussion with the statement that they didn't consider the historical possibilities. It's saying "evolution is not intelligent design because intelligent design can do something that evolution cannot (specifically get from point A directly to point B)" and then following it with "therefore evolution can't explain the history of biology if we assume that intelligent design was responsible for it."

Hence, my PennySaver arguments that "the probability of getting from here to there are THIS SMALL." But, people have denounced PennySaver as recycled creationism and thrown her out on the curbside to be picked up.

Admonitus · 11 October 2004

But, John Bracht would of course argue (after noting that you all are a bunch of atheist ID-haters, mostly not even real scientists, confined by your narrow worldview to conclude evolution, and not allowed to openly question the theory) that you are not following the proper course of action. He'd insist that you try submitting your rebuttal to a peer-reviewed journal, rather than retreating to the ramparts of your weblog where you can say whatever you want (much of which, he insists, doesn't hold up when you really examine the sources you all cite). Gotta love the guy...

Admonitus Again · 11 October 2004

Oh, and I am familiar with Protein Science, just having attended their conference (which wasn't that great). It's a respected journal in biophysics and structural biology, but it doesn't have a lot to do with evolutionary biology at all.

Steve Reuland · 11 October 2004

Oh, and I am familiar with Protein Science, just having attended their conference (which wasn't that great). It's a respected journal in biophysics and structural biology, but it doesn't have a lot to do with evolutionary biology at all.
— Admonitous

I was there too. There were a few talks that dealt specifically with evolutionary topics, including the opening plenary by Chris Walsh, which was about antibiotic resistance (it was my fave). The Protein Society certainly doesn't specialize in evolutionary biology (believe it or not, it specializes in proteins), but evolution is integrated into it just as it is with every other area of biology. If anyone in attendance was an "evolution skeptic", you couldn't tell. As for the journal they publish, it is a respectable journal, and Behe and Snoke could have done worse. I think there are things in the paper that the reviewers probably insisted on (mostly a large number of caveats) that wouldn't be present otherwise. So I don't see any reason to think that the reviewers failed or that the paper shouldn't have been published. Of course, that doesn't mean it's without flaws...

Steve Reuland · 11 October 2004

John Bracht would of course argue (after noting that you all are a bunch of atheist ID-haters, mostly not even real scientists, confined by your narrow worldview to conclude evolution, and not allowed to openly question the theory) that you are not following the proper course of action.
— Admonitous

I'm only vaguely familiar with who John Bracht is, but I don't remember him using those kinds of ad hominem arguments. (Though his mentors in the ID movement have never hesitated to.) Do you have any examples?

He'd insist that you try submitting your rebuttal to a peer-reviewed journal, rather than retreating to the ramparts of your weblog where you can say whatever you want (much of which, he insists, doesn't hold up when you really examine the sources you all cite).

Who says we're not?

Admonitus · 11 October 2004

The litany of ad-hominems is what he does in conversation about this, and in email with me. He insists, though, that he's spent a year debating you guys and he's just come to the conclusion that you're not interested in truth.

As for the submission to a peer-reviewed journal, good show! I taught for a professor of physical chemistry who was quite happy to rebut an article in the pedagogical literature stating that the second law of thermodynamics can be derived from the first. The abstract read something like "we identify the errors and give explicit counterexamples to the claim."

Steve Reuland · 11 October 2004

Just out of curiosity . . . are there any examples of misleading statements or questionable assumptions in the paper that do NOT serve to bolster Behe's alleged conclusion and/or Behe's ultimate goal?
— GWW

I don't know about "misleading", but they do make some simplifying assumptions that work against, rather than towards, their conclusion. Most of these are rather minor, but one significant one that comes to mind is the assumption that the gene duplicate starts out in every member of the population. In reality, it would first need to spread either by selection or drift, which would increase the amount of time required. I think this would be partially off-set by the fact that there's more than one gene in the genome, but I'd have to go back and look to figure out the relevance of that. At any rate, I don't think there's any question that the assumptions that bias their model towards greater times and/or population sizes are more numerous and severe than those that bias it in the opposite direction.

Admonitus · 11 October 2004

I should say, though, that John Bracht is sincere about his desire to do science in the Intelligent Design arena, and he does admit that it shouldn't be taught in classes (yet). He vehemently denies that the Discovery Institute is trying to get Intelligent Design into the school curriculum, and not knowing very much about their specific goals (or the exact statements thereof) I am unable to argue that point with him.

Steve Reuland · 11 October 2004

The litany of ad-hominems is what he does in conversation about this, and in email with me. He insists, though, that he's spent a year debating you guys and he's just come to the conclusion that you're not interested in truth.
— Admonitus

Funny, I came to the same conclusion about him.

He vehemently denies that the Discovery Institute is trying to get Intelligent Design into the school curriculum, and not knowing very much about their specific goals (or the exact statements thereof) I am unable to argue that point with him.

I think even a passing familiarity with the DI shows this to be false. They have specifically worked in Kansas, Ohio, and other states to get ID (or something like it) into school curricula. They've also worked in Texas to try to influence which textbooks were adopted. The only sense in which Bracht's claim is true is that the DI has frequently moved the goal posts and claimed that they weren't really trying to get ID into schools, rather they were trying to get "the evidence against evolution", or "objective origins science", and so forth to be taught. It's a semantic game, not much more.

Steve Reuland · 11 October 2004

I should say, though, that John Bracht is sincere about his desire to do science in the Intelligent Design arena...
— Admonitus

I would be too, if I thought there were anything to be researched. So far, the best I've seen are proposals to try to test some evolutionary hypothesis or another, which means that they'd really be doing research on evolution, not on ID. Anyone who wants to see how fertile ID is for research can pop over to Brainstorms and judge for themselves if the quality or quantity of research proposals stacks up. Bracht himself contributed some of the more, um, interesting ones. My favorite was the argument that C. elegans must have been "designed for discovery" because it's such a useful lab organism. (I guess that means elephants were designed to confound discovery.) Still not sure how he plans to test that hypothesis...

Admonitus · 11 October 2004

I thought as much about Ohio. Where's that list of misrepresented papers (38, IIRC) that they tried to use? I've been looking around for it and the rebuttals that the NCSE put together.

RBH · 11 October 2004

Here.

Nick · 11 October 2004

Regarding C. elegans and other model organisms, they are always picked because they are (usually unlike most other organisms) easy to experiment on in various ways. Small genomes, ultra-rapid reproduction, uncomplicated growing conditions, etc. Therefore they are often not particularly representative of even their relatives.

So there is some intelligent design involved, it's just in the selection of the model organisms...

Admonitus · 11 October 2004

How cute! Thanks for the heads-up!

And, by the way RBH, you're not Russ Humphreys, right? Just checking...

Ian Musgrave · 11 October 2004

Actually, DPG is produced during conditions of hypoxia; for example, [DPG] goes up when subjects experience a gain of altitude. This makes sense, because the objective is for hemoglobin to release O2 to the tissues when there isn't much around.
— Frank Schmidt

Actually, it is a bit more complicated than that. The objective of haemoglobin is to bind oxygen from the oxygen-rich tissue of the lungs, and release the oxygen in the oxygen-poor tissues. DPG binds to the deoxy form of haemoglobin, stabilising it and making it more likely that it will release O₂. But this also means that it is less likely to take up oxygen. This makes things a bit of a trade-off. When oxygen levels are low in the lungs, high DPG levels would mean that haemoglobin would bind less oxygen, and thus be rather bad for the organism. Now, things get a little complicated, because hypoxia ain't hypoxia. In mild to modest hypoxia due to exposure to high altitude, mammals hyperventilate. This actually increases the O₂ levels in lung tissue, saturating the haemoglobin. To shift the balance to tissue release, DPG levels are increased. DPG levels increase as DPG synthesis is increased and breakdown is slowed by the alkalosis produced by hyperventilation (which also increases oxygen affinity). At higher elevations, where hyperventilation cannot maintain lung O₂ saturation, DPG levels fall. Nikinmaa M (2000) Haemoglobin function in vertebrates: evolutionary changes in cellular regulation in hypoxia, Respiration Physiology 128, 217-329

Matthew Heaney · 11 October 2004

What's Snoke's story? Is he a creationist, too? Is Snoke aware of Behe's creationist pedigree?

Admonitus · 12 October 2004

Matt Heaney, you didn't go to Rice, did you?

RBH · 12 October 2004

Admonitus asked

And, by the way RBH, you're not Russ Humphreys, right? Just checking . . .

Um. Right. Not me. :) RBH

Reed A. Cartwright · 12 October 2004

What's Snoke's story? Is he a creationist, too? Is Snoke aware of Behe's creationist pedigree?
— Matthew Heaney

As far as I can tell, Snoke is an old-earth creationist (à la Hugh Ross). You can find some of his theological writings here. Notice his defence of "God-of-the-Gaps."

Russell · 12 October 2004

What's Snoke's story? Is he a creationist, too? Is Snoke aware of Behe's creationist pedigree?

Note, also, Snoke does not claim to be a biologist. He's in the Department of Physics and Astronomy, University of Pittsburgh. I suspect the collaboration results from a mutual commitment to creationism and Behe's need for mathematical expertise.

Reed A. Cartwright · 14 October 2004

I am curious what supporters of Behe and Snoke (2004) think of our critical essay. So far I have seen no reaction from the anti-evolution community.

Gary Hurd · 14 October 2004

It was great fun looking over everyone's shoulder while this article was written. And it was a similar process to "Meyer's Hopeless Monster" in that there were weeks of careful discussion of the science literature and no pointless discussion of "creationist" versus "militant Darwinist" points of view.

This is what made the recent whining from the DI so pathetic: the criticism of Behe and Snoke and of Meyer is devastating, and is purely and simply on the science or lack of science.

Pim van Meurs · 14 October 2004

In addition, the contributors to "Meyer's Hopeless Monster" did not hide behind the anonymity of the term 'Discovery Institute Fellows' to let their objections be known. As Gary points out, the recent publications of the various ID papers have presented an unprecedented opportunity to analyse their claims. Nothing much relevant to ID remains.

Admonitus · 14 October 2004

I'll ask John Bracht what he thinks of the review. He asked me to review it in depth, and when I presented two paragraphs of hand-waving (which captured the salient points you all have brought up, nonetheless), he challenged me for not reading carefully enough. My impression is that he thinks I'm running from the issue because, like you all, I'm just a closed-minded Darwinian sycophant. But I told him that I thought there was a flaw in the fact that Behe and Snoke were looking for the chances of getting a particular poker hand, not the historical possibilities. He insists that Behe and Snoke 2004 is a solid contribution, motivated by Intelligent Design. You're right, though--there's no test of intelligent design in the paper!

Steve · 14 October 2004

As far as I can tell, this paper is only a solid contribution to my amusement.

"may well be the nail in the coffin [and] the crumbling of the Berlin wall of Darwinian evolution"--Dembski

Ah, that's the good stuff.

Dave Cerutti · 15 October 2004

Well, I suppose I shouldn't go speaking for John, seeing as he and I grew up on the same street and he's a friendly guy once you get past his ability to blithely insist that he's figured out the cornerstone of biology by reading a few books and studying with Bill Dembski for a year. Yes, the Behe and Snoke paper should get its comeuppance. Russell said you were submitting a rebuttal to a peer-reviewed journal--is that so?

Reed A. Cartwright · 15 October 2004

Russell said you were submitting a rebuttal to a peer-reviewed journal---is that so?
— Dave Cerutti

Not a rebuttal per se, more like better theory.

Dave Cerutti · 15 October 2004

While you're at it, you might want to draw the line between theory and computer simulation. I do the latter--and I'd call it alchemy, not theory.

Reed A. Cartwright · 15 October 2004

While you're at it, you might want to draw the line between theory and computer simulation.
— Dave Cerutti

What line would that be? Simulation is just as much a part of theoretical work as mathematics and statistics are.

Dave Cerutti · 15 October 2004

I distinguish theory from simulation, but you are correct, simulation is merely the application of theory. There have been occasional articles warning of the dangers of hailing certain simulations because they were done with a bigger computer and were more expensive to calculate than others with better theory behind them, but that particular case does not actually seem to be a problem with Behe and Snoke.

RBH · 15 October 2004

Dave Cerutti wrote

I distinguish theory from simulation, but you are correct, simulation is merely the application of theory.

I myself prefer the locution "simulation is an embodiment of theory." A well-designed simulation embodies and thereby formalizes a theory in much the same way that mathematicizing a theory formalizes it. In particular, a well written simulation can formalize theories for which no mathematics yet exist, as for example the problem of coupling linked levels (e.g., individuals and populations) characterized by different non-linear dynamics. RBH

~DS~ · 16 October 2004

O.T. but BTW, Mr. George Gilder has made an appearance on Pharyngula.

Reed A. Cartwright · 16 October 2004

LOL, Gilder is sounding like the new-age quantum nutjobs that Nick and I saw in What the #$*! Do We Know!?.

Neil Johnson · 18 October 2004

Reed,

James Randi has taken to referring to "new-age" concepts by dropping the hyphen: "newage". Pronounce it like it smells...

Neil

Russell · 18 October 2004

"newage". Pronounce it like it smells

In a similar vein, for when the Discovery Institute and its supporters put their spin machine into overdrive spewing out disinformation and distraction, as e.g. recently in l'affaire Meyer, I have coined the term: spewage.

Reed A. Cartwright · 18 October 2004

Apparently Salvador thinks Musgrave et al. (2004) is based on a straw-man of Behe and Snoke (2004). I'm curious what exactly this straw-man is.

PvM · 18 October 2004

That is another misrepresentation. Behe and Snoke never argue for one true sequence.
— Sal

Has Sal read the paper?

It's depressing the Pandas continue to use tactics like literature bluffing and incorrect representation to argue their case.
— Sal

Irony alert... So far ID proponents seem to have remained quiet on the Behe and Snoke critique :-)

PvM · 18 October 2004

The three boxes outlined in blue are the positions that must change in order to produce the new multiresidue (MR) feature
— Behe Snoke

For simplicity, each position in an array, representing sites which must be changed to yield an MR feature, can be in either of just two states---the original incompatible state or the mutated, compatible state. Mutations can change a site either forward from incompatible to compatible or backward from compatible to incompatible.
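The two-state site description quoted above can be sketched in a few lines. This is a hypothetical illustration, not Behe and Snoke's code: it assumes each required position mutates independently with probability `u` per generation and that a mutation simply flips the site between the incompatible and compatible states; the names `mutate` and `has_mr_feature` are made up for the example.

```python
import random

# Hypothetical encoding of the two states described in the quote.
INCOMPATIBLE, COMPATIBLE = 0, 1


def mutate(sites, u, rng):
    """Return a copy of `sites` in which each position flips with probability u.

    A flip is either a forward mutation (incompatible -> compatible) or a
    backward mutation (compatible -> incompatible).
    """
    return tuple(1 - s if rng.random() < u else s for s in sites)


def has_mr_feature(sites):
    """The multiresidue (MR) feature exists only if every required site is compatible."""
    return all(s == COMPATIBLE for s in sites)


rng = random.Random(42)
start = (INCOMPATIBLE,) * 3  # three required positions, as in the quoted figure
print(mutate(start, 1.0, rng))  # (1, 1, 1): every site flips when u = 1
```

The all-or-nothing test in `has_mr_feature` is the key modelling choice: partially compatible arrays are treated as no better than the starting state.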

Reed A. Cartwright · 18 October 2004

I see that Salvador is already quote-mining.

Even the article the Pandas Link to Discovery Institute 2004 does not proclaim it as the long awaited paper. Another misrepresentation.
— Salvador on ARN

Salvador forgot to include the end of the previous paragraph, which establishes the context of "long-awaited." Also, Salvador apparently hasn't read Musgrave et al. (2004) carefully. Although we didn't quote Behe and Snoke's "we are skeptical" comment, we do devote a paragraph in the introduction to it.

Reed A. Cartwright · 18 October 2004

For crying out loud! [Mad] Mike and Dave said these are the very things worth exploring because the simplistic pathway will not work.
— Salvador on ARN

No, that is not what Behe and Snoke are saying. Their conclusion is that "gene duplication and point mutation . . . is problematic when multiple mutations are required." This is different than saying that the particular structure of their model needs to be tweaked. The problems mentioned are all issues that can be incorporated into a model looking at duplication and point mutation. For example, there is no hint by Behe and Snoke that confidence intervals should be explored. IMO, this is a considerable oversight in the paper.

Salvador T. Cordova · 18 October 2004

Hi PvM, well, I have a policy. If you mention my name at PandasThumb, I feel free to visit and post a response. Everyone is invited to see: PandasThumbs Representations. This supposed rebuttal by Musgrave, Reuland, Cartwright (MRC) is a strawman. They insinuate that Behe and Snoke are trying to extrapolate the simplistic gene duplication model to all biology. That's misleading. The voluminous attack of the strawman representation and caricature is misleading. Nonetheless I found it entertaining. Reminded me of similar attacks on strawmen by GME and Elsberry and Shallit.

Behe and Snoke write: Although large uncertainties remain, it nonetheless seems reasonable to conclude that, although gene duplication and point mutation may be an effective mechanism for exploring closely neighboring genetic space for novel functions, where single mutations produce selectable effects, this conceptually simple pathway for developing new functions is problematic when multiple mutations are required. Thus, as a rule, we should look to more complicated pathways, perhaps involving insertion, deletion, recombination, selection of intermediate states, or other mechanisms, to account for most MR protein features.

In no way are they arguing the model they explore repudiates all of Darwinism. They have simply negated one simplistic pathway which cannot work and suggest exploration of other pathways. But this amusing blurb made it through peer review:

Although many scientists assume that Darwinian processes account for the evolution of complex biochemical systems, we are skeptical.

An IDist and a creationist manage to slip that into a peer-reviewed paper. Sweet! Salvador PS Nonetheless I credit you guys with a most impressive slaying of a strawman. Never saw so much intellectual artillery brought to bear on a scarecrow.

Reed A. Cartwright · 18 October 2004

They insinuate that Behe and Snoke are trying to extrapolate the simplistic gene duplication model to all biology.
— Salvador

Not at all. We demonstrate that Behe and Snoke use a simplistic model of the evolution of new protein functions by duplication and point mutation to falsely conclude that point mutation and gene duplication are a problematic explanation in nature. Behe and Snoke do demonstrate that their ideal process, under their parameters, is not a good fit for the results we see in nature. However, we point out that their ideal process is not a faithful representation of the natural process of the evolution of new molecular functions via duplication and point mutation. This is why their conclusion is unjustified.

Steve Reuland · 18 October 2004

In no way are they arguing the model they explore repudiates all of Darwinism.
— Salvador

Erm, we never said they did. For someone who seems convinced that we're slaying strawmen, you sure don't hesitate to slay them yourself.

Dave Cerutti · 18 October 2004

Salvador wrote:

Well, I have a policy. If you mention my name at PandasThumb, I feel free to visit and post a response.

You're always free to do that. What's your point?

Pim van Meurs · 18 October 2004

Given Sal's track record it may be wiser for him to change his policy to remain quiet. It's just embarrassing to see him make accusations which are often plain wrong. When confronted with these facts I wonder if Sal will apologize to the authors of the critique for the way he 'represented' their arguments.
Btw, Sal you are always free to visit and post a response. Keep up the good work my friend.

Randy · 18 October 2004

Did you notice that he accused you of making the strawman argument BEFORE HE READ the B&S paper?

Reed A. Cartwright · 18 October 2004

Anyone else think that Salvador is trying to shift the ARN thread away from Behe and Snoke's paper?

Stirling Newberry · 18 October 2004

Behe's math doesn't add up on its own terms. If the rate of dysfunctionalization were 70%, this would be selective pressure enough to keep multiple copies of important alleles, since one cosmic ray ripping through can kill an organism with a very high degree of probability.

Second, their model does not assume one of the most important selective pressures - pathogens. Developing neutral, but different, versions of the same genetic material is a protection against pathogens binding to a key site. A variation of "cousin" versions of the same allele has selective advantage in itself.

In short, their paper concludes "if natural selection didn't work, it couldn't possibly work."

Salvador T. Cordova · 19 October 2004

Erm, we never said they did. For someone who seems convinced that we're slaying strawmen, you sure don't hesitate to slay them yourself.

Ok, I'll retract that at ARN. I'm for fair play, gentlemen. My apologies.

Anyone else think that Salvador is trying to shift the ARN thread away from Behe and Snoke's paper?

Look, your critiques are valuable from a technical standpoint; I hope the IDists will read them and learn and correct if need be. I have in the past sided with you guys (a few times) when I thought you had good arguments, because it does not serve the ID cause well to go around with flawed arguments. I was just a little irritated your presentation wasn't more charitable. I'll go back and try to tone down what I said at ARN. I commend your hard work in your review of Behe and Snoke's work. respectfully, Salvador

Reed A. Cartwright · 19 October 2004

Salvador on ARN has objected to our statement:

They use these results to conclude that point mutations plus selection cannot explain the evolution of multi-residue traits.

Apparently he needs to look harder at the last paragraph of Behe and Snoke (2004), with some of the fatty clauses trimmed:

. . . it nonetheless seems reasonable to conclude that . . . gene duplication and point mutation . . . is problematic [as a pathway for new functions] when multiple mutations are required.

In the conclusion "this conceptually simple pathway" refers to "gene duplication and point mutation" not to their simplistic model. If Salvador wants more confirmation, this is what Behe said at DDDV:

So the point is that, whenever you have an apparatus that needs three or more proteins in order to work, that's essentially beyond the capacity of random mutation and natural selection to produce. Even when you have something that only has two proteins stuck together, that's a very very rare event in the history of life on earth. So the tentative conclusions from this--and there are a couple of caveats that I haven't gone into but which I'd be glad to talk about if you wanted to--two tentative conclusions: that is that the formation of new protein-protein interactions would be very rare in the history of life, and the formation of two such interactions in an irreducibly complex complex is practically impossible. So what I think is going to turn out to be the case--although we will be required to do it--is that design is going to be seen to extend very deeply into the cell and perhaps beyond that as well.

Salvador also tries to defend Behe and Snoke by arguing that the model was just a start, but he misses the point of our criticism: their current conclusions are not supported by the work they have done so far. In other words, if they are just starting their research, why are they making the big conclusions they make? "Just a start" is a criticism of the conclusions drawn in the paper, not a defense.

Steve Reuland · 20 October 2004

Look, your critiques are valuable from a technical standpoint; I hope the IDists will read them and learn and correct if need be. I have in the past sided with you guys (a few times) when I thought you had good arguments, because it does not serve the ID cause well to go around with flawed arguments. I was just a little irritated your presentation wasn't more charitable. I'll go back and try to tone down what I said at ARN. I commend your hard work in your review of Behe and Snoke's work.
— Salvador

Thank you for the kind words Salvador. I hope you won't read our critique as being uncharitable; I honestly don't think that Behe and Snoke's paper was bad; I think it was very well written and certainly thoughtful and relevant to the issues that Behe has been trying to raise. As I said earlier, I prefer this kind of technical approach to simply making bald assertions that are hard to counter other than saying, no, you're wrong. I liked the paper in that it attempted to do what we've been saying ID advocates should do: Deal with the actual science of evolutionary biology, rather than pretend it doesn't exist. (The other part -- coming up with an actual theory of ID -- is forthcoming, I'm sure ;)). But the paper does have some issues that need to be aired out, and it definitely doesn't score any real points for ID (in my opinion at least -- and Reed's and Ian's too, of course). Like you say, I hope we've done a service by correcting any technical errors that may be present in their paper.

Salvador T. Cordova · 20 October 2004

Steve Reuland wrote: Thank you for the kind words Salvador.

You are welcome Steve. I just finished editing out some of my wording over at ARN to reciprocate the hospitality and civility you have extended to me. I appreciate the scholarly approach you all have taken, and I will try to reciprocate. I believe the IDists will comb through what you all have written here at Panda's Thumb on Behe's paper. The post-diction single target specification is a difficult problem to overcome in making specifications. My feeling is Elsberry actually inspired a solution with his SAI, namely IDists can make their probability arguments more solid by appealing to sequence convergence in unrelated lineages. Behe and Snoke's equations, with appropriate modifications, might actually be well suited for applications in that context, especially if the expression pathways for identical protein sequences are arrived at by different genes and gene circuits in these unrelated organisms. I believe Rick Sternberg, though he disagreed with Meyer's ultimate conclusions, recognized the problems posed by molecular convergence. I would hope Behe and Snoke submit another paper which addresses the considerations you all have posted in your critique and that they seek ways to apply their models in the context of molecular convergences. respectfully, Salvador

PvM · 21 October 2004

My feeling is Elsberry actually inspired a solution with his SAI, namely IDists can make their probability arguments more solid by appealing to sequence convergence in unrelated lineages.
— Salvador

Are you a supporter of MDT then? Where the designers find varying solutions to the same problem independently? How would sequence convergence in unrelated lineages help ID's case?

Behe and Snoke's equations, with appropriate modifications, might actually be well suited for applications in that context, especially if the expression pathways for identical protein sequences are arrived at by different genes and gene circuits in these unrelated organisms.
— Salvador

I would like to hear more about these appropriate modifications to Behe and Snoke that would serve your above stated purpose? Could you outline the research approach(es)? What is the relevant ID hypothesis for instance?

Great White Wonder · 21 October 2004

I would like to hear more about these appropriate modifications to Behe and Snoke that would serve your above stated purpose? Could you outline the research approach(es)? What is the relevant ID hypothesis for instance?

I remember one time when I was a kid going fishing for northern pike with my dad. Occasionally a pike will take bait and not swim away with it but just roll it around in its mouth for a while. I recall one time realizing that my bait had been taken not by detecting any resistance on the line but by seeing the pike in the water staring up at me with the line hanging out of its mouth. Of course, we took the line out before we ate him.

Alex Merz · 25 November 2004

I just read the paper today, and now this rebuttal. It's absolutely outstanding. I'd like to add one more point about the mutation rate assumptions. B&S themselves point out that their simulations might be most useful for asexual, clonal, haploid organisms. Some - not all, not even most - bacteria fall within this category. And even among these organisms, such as E. coli, it has become evident that mutation rates are not constant. They are not constant over the genome, as Seymour Benzer's magnificent work with T4 phage and Jeff Miller's work with LacI showed us, they are not constant over time, as work from many labs studying, for example, the SOS response has shown us. Average rates can't tell us what's happened at each locus over time.

In addition, even bugs with very low rates of intraspecies recombination show enormous evidence of horizontal gene transfer. What this tells us is that recombination simply cannot be ignored. Moreover, experimental evidence that recombination speeds evolution is overwhelming. Do a PubMed search for [AU] Stemmer WP, and you'll find dozens of papers showing this experimentally.

RBH · 3 December 2004

The title of the Behe & Snoke paper is "Simulating evolution by gene duplication of protein features that require multiple amino acid residues." Given that they ignore natural selection, recombination (lateral gene transfer), and the variability of mutation rates across both loci and time, can they be indicted for violating the Truth in Titles Act?

RBH

John Harshman · 3 December 2004

Quick question for population genetics experts. It's always been my understanding that population size doesn't enter into the equation for neutral fixation probabilities. So why are population size estimates necessary in any of the simulations being discussed, and why do they make such big differences?

Is this because we're actually talking not about fixation rates, but about a race between two competing fixations, i.e. the target mutation and a null mutation?

It's still not clear to me, even given that, why population size ought to matter.

Reed A. Cartwright · 3 December 2004

Under some simple theory, population size doesn't matter if you are looking at the rate of neutral gene substitution. However, if you are looking at the probability that a particular gene copy goes to fixation, size does matter.

John Harshman · 3 December 2004

OK, but why are we looking at that probability? It assumes that we care about whether a particular single mutation goes to fixation, rather than another, identical mutation in the same population. Isn't it the fixation of a particular allele we are talking about, rather than a particular mutational event? The "simple theory" you talk about is basic neutral theory, and the simulations are in fact supposed to represent neutral evolution.

The advantage of a large population is that there are more instances of any given mutation, but the disadvantage is that any given mutation is less likely to be fixed. The advantage of a small population is that any given neutral mutation is more likely to be fixed, but the disadvantage is that there are fewer instances of any particular mutation. The population parameters cancel out if all we care about is whether a change from, say, A to G at some particular site becomes fixed. And isn't that what the simulations are all about?

So I ask again, why is population size even a parameter in these simulations?

Can someone explain what if anything I am missing here?
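The cancellation John describes is the textbook neutral-theory result, and it can be written out exactly. In this sketch `n_copies` is the number of gene copies in the population (N for haploids, 2N for diploids) and `mu` is a per-copy, per-generation mutation rate; the mutation rate and population sizes below are arbitrary values chosen only for illustration.

```python
from fractions import Fraction


def fixation_prob_single_copy(n_copies):
    """Neutral fixation probability of one particular new copy: 1 / n_copies.

    This depends on population size, which is Reed's point about
    tracking a particular gene copy.
    """
    return Fraction(1, n_copies)


def neutral_substitution_rate(n_copies, mu):
    """Expected neutral substitutions per generation at a site.

    New mutations per generation (n_copies * mu) times the chance each one
    fixes (1 / n_copies): n_copies cancels, leaving mu, which is John's point.
    """
    return n_copies * mu * fixation_prob_single_copy(n_copies)


mu = Fraction(1, 10**8)  # illustrative mutation rate, not a value from the paper
for n in (100, 10_000, 1_000_000):
    print(n, fixation_prob_single_copy(n), neutral_substitution_rate(n, mu))
```

Exact fractions are used so the cancellation shows up as an identity rather than as a floating-point coincidence.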

By the way, how do I quote?

Reed A. Cartwright · 3 December 2004

John,

You need a population size parameter if you are simulating drift because otherwise you wouldn't be simulating drift.

Now I'm not exactly sure why the population size doesn't cancel out in the results of BS04 versus classic theory. (I haven't had time to look back over the paper.) I suspect it is because the classic results are steady state results and BS04 results are not from a steady state. Instead of looking at the mean time between substitutions in a steady state, like classic theory does, BS04 looks at the mean time until the first substitution. Other features of the model, like reset on null or multiple mutations, could also be at fault.

PS: You can use Kwickcode to format your post.

John Harshman · 3 December 2004

Now I'm not exactly sure why the population size doesn't cancel out in the results of BS04 versus classic theory. (I haven't had time to look back over the paper.) I suspect it is because the classic results are steady state results and BS04 results are not from a steady state. Instead of looking at the mean time between substitutions in a steady state, like classic theory does, BS04 looks at the mean time until the first substitution.

This should be the same number, given a Poisson process, shouldn't it?

Other features of the model, like reset on null or multiple mutations, could also be at fault.

If that's all it is, then this is a serious flaw in the simulation, in that it doesn't even reproduce the most basic result we expect from neutral evolution. It's such a fundamental flaw that all other flaws pale in comparison. That's why I have trouble crediting that it's really something B&S as well as you three have overlooked until I mentioned it. What gives?
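John's point about a Poisson process can be checked numerically. Assuming substitutions form a homogeneous Poisson process with rate `rate` (so waiting times are exponential), the mean wait measured from time zero and the mean remaining wait after `elapsed` time units with no event should both be about `1 / rate`; this is the memoryless property. The function name and parameter values are illustrative choices.

```python
import random
from statistics import mean


def memoryless_check(rate, elapsed, n=200_000, seed=0):
    """Compare the mean wait from time 0 with the mean remaining wait
    after `elapsed` time units have passed with no event, for exponential
    waiting times (a homogeneous Poisson process)."""
    rng = random.Random(seed)
    waits = [rng.expovariate(rate) for _ in range(n)]
    remaining = [w - elapsed for w in waits if w > elapsed]
    return mean(waits), mean(remaining)


rate = 0.5
print(memoryless_check(rate, elapsed=3.0))  # both close to 1 / rate = 2.0
```

The equivalence holds only if the process really is homogeneous. One standard neutral-theory result that may matter here: a neutral mutation that does eventually fix takes on the order of 4N generations (diploid) to get there, so a population that starts with no variation, rather than from steady state, faces an N-dependent delay before its first substitution.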

Great White Wonder · 3 December 2004

That's why I have trouble crediting that it's really something B&S as well as you three have overlooked until I mentioned it. What gives?

Perhaps the flaw is like the flaw in Dembski's ID "theory": so fundamental that one misses it at first because one mistakenly assumes that the human who put forth the theory actually cared about its scientific validity.

Erik · 4 December 2004

The BS model is not a model of neutral fixation probabilities. It is a model of the emergence of a particular new sequence through neutral evolution, followed by the fixation of the new sequence through non-neutral evolution. Of course, the population size is important for the first part (in an infinite population the new sequence will appear extremely fast no matter how many MR sites must be exactly right) and there's no reason to think that population size effects in the second part would exactly cancel out this effect. Update:

Quick question for population genetics experts. It's always been my understanding that population size doesn't enter into the equation for neutral fixation probabilities. So why are population size estimates necessary in any of the simulations being discussed, and why do they make such big differences?
— John Harshman (#11184)

Quick answer from a population genetics non-expert: Behe & Snoke aren't studying neutral fixation probabilities! In the BS model, a new MR feature is considered to require that all MR sites have the right values. Thus, the perfect MR sequence has a higher fitness than the imperfect ones, which all have the same fitness. Mutations that change an imperfect MR sequence into another imperfect (possibly less imperfect) MR sequence are therefore neutral. Mutations that change an imperfect MR sequence into the perfect MR sequence, however, are not treated as neutral. The larger the population, the shorter the time until the perfect MR sequence appears in the population. Once the perfect MR sequence appears, the population dynamics ceases to be neutral.

Jon Fleming · 4 December 2004

By the way, how do I quote?
— John Harshman

Hi, John.

You use KwikCode, which has a certain similarity to HTML. The quote above is made by

By the way, how do I quote?
— John Harshman

Bob Morrison · 4 December 2004

I am offended by the phrase "evolution/creation" controversy when there is no controversy. Evolution is reality. Creationism is a belief founded on mythology, and very poor myths at that. Consider that the Bible says that the Tower of Babel project was confounded by God because he didn't want anybody building a shortcut to Heaven, and that is why we have different languages today. What rubbish!! That's true any way you consider it. Still, Creationists hope to find some small ray of substance in this stuff that isn't so outright false on its face. Check out our real founding fathers' beliefs at "Religion-Just say NO"

Pete Dunkelberg · 4 December 2004

(paraphrasing) I can't believe the Behe & Snoke paper is that unreal.
— John Harshman

John, I'm surprised at you, thinking their model would be realistic :) But unscrewing it is fun, isn't it? Here's what I think may have happened: by the time of Nick's flagellum article if not sooner, the DI gang must have realized that IC, their one supposed link to biology, was busted. In particular, Nick made it clear that even their prize horse, the bacterial flagellum, could indeed evolve, needing mainly the evolution of several new protein binding sites. There is no in principle difficulty involved. They could either give up or do the equivalent of jumping over the moon by proving that even simple binding sites can't evolve. Dembski stalled while Behe recruited Snoke to do some math and they got to work. As you can see, they produced an unreal model and the desired result. And were promptly busted. Meanwhile, their propaganda has convinced some local school boards to try to teach ID, just when they know more acutely than ever that they have nothing to offer.

John Harshman · 4 December 2004

Erik (*11128) wrote: Quick answer from a population genetics non-expert: Behe & Snoke aren't studying neutral fixation probabilities!

If that's so, then the simulation must incorporate an explicit selection coefficient for the relevant allele. Does it? As far as I can tell it doesn't.

Reed A. Cartwright · 4 December 2004

If that's so, then the simulation must incorporate an explicit selection coefficient for the relevant allele. Does it? As far as I can tell it doesn't.
— John Harshman

I will quote from our critique:

Behe and Snoke use five parameters to model the neutral evolution of their sequences: the number of mutations needed for the multi-residue feature (λ), the ratio of null mutations (ones that create a pseudogene) to needed mutation (ρ), the number of duplicate genes in the population (N), the selection coefficient of the multi-residue feature (s), and the instantaneous rate of point mutation per site (ν). (Contrary to what Behe and Snoke claim, ν is not a per generation rate of nucleotide mutation.) In the model, selection only acts upon a duplicate gene's multi-residue feature when it has acquired the particular λ "compatible" mutations and none of the λρ mutations that would nullify it.

Douglas Theobald · 4 December 2004

Quick answer from a population genetics non-expert: Behe & Snoke aren't studying neutral fixation probabilities!
— Erik (*11128)

If that's so, then the simulation must incorporate an explicit selection coefficient for the relevant allele. Does it? As far as I can tell it doesn't.
— Harshman

Yes, their simulation incorporates a selection coefficient for the MR feature. See equation 4. They mainly use s = 0.01. However, eqn 3 describes the time for the first appearance of an MR feature, without taking selection into account, and can be considered as describing a modified population mutation rate, in essence giving the population mutation rate for an MR feature. It of course then incorporates population size, just like in classical neutral theory where the population mutation rate = 2Nv. They don't really consider neutral fixation rates for the MR feature (since it is a bit weird to consider neutral MR alleles), but you could just multiply by the chance of neutral fixation, 1/N, and population size drops out as it should. Also, remember that in neutral theory, while the fixation rate is population size independent, the time to fixation (in generations) of a neutral allele does depend on population size (t = 4N_e, which in turn is independent of the mutation/fixation rate). Douglas
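Douglas's point that population size cancels out of the neutral substitution rate can be checked with a toy simulation. The minimal sketch below assumes a haploid Wright-Fisher population of n gene copies, so new mutations arise at n·ν per generation rather than the diploid 2Nν. The values of n and ν and the replicate count are illustrative, not taken from BS04. The estimated fixation probability of a single neutral copy comes out near 1/n, so the product n·ν·(1/n) stays near ν whatever n is.

```python
import random


def neutral_fixation_prob(n: int, reps: int, seed: int = 2) -> float:
    """Fraction of runs in which one neutral mutant copy fixes in a haploid
    Wright-Fisher population of n gene copies."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(reps):
        count = 1  # a single new mutant copy
        while 0 < count < n:
            p = count / n
            # Binomial resampling: each of the n offspring has a mutant parent with prob p
            count = sum(1 for _ in range(n) if rng.random() < p)
        if count == n:
            fixed += 1
    return fixed / reps


def substitution_rate(n: int, nu: float, reps: int = 10_000) -> float:
    """New mutations per generation (n * nu) times the chance that each one fixes."""
    return n * nu * neutral_fixation_prob(n, reps)


for n in (10, 20):
    print(n, substitution_rate(n, nu=1e-3))  # both close to nu = 0.001
```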

Douglas Theobald · 4 December 2004

Speaking of the selection coefficient in B&S's model, isn't equation 4 wrong? Shouldn't the denominator for the RHS be s instead of 2s, since they are working with haploids?

Reed A. Cartwright · 5 December 2004

Speaking of the selection coefficient in B&S's model, isn't equation 4 wrong? Shouldn't the denominator for the RHS be s instead of 2s, since they are working with haploids?
— Theobald

It depends. S versus 2S depends on how one sets up the model. This is true even in diploid situations. Some authors prefer {1,1+S,1+2S} while others prefer {1,1+S/2,1+S}. In the former the probability of fixation becomes approximately 2S and in the latter S. If we set up a haploid model as {1,1+S} then S would be the probability of fixation. (It is relatively simple to show using some random walk theory.) Therefore, BS04's selection regime corresponds to {1,1+2S}. I don't think they understood this, but it isn't a problem.
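Reed's remark that the factor of 2 depends on the model setup can be illustrated with two standard closed-form results. The minimal sketch below assumes haploid fitnesses {1, 1+S}. Under a Moran birth-death process, the exact fixation probability of one new mutant is close to S. Kimura's diffusion approximation for a haploid Wright-Fisher population with Poisson offspring gives close to 2S instead. Neither process is claimed to be the one BS04 had in mind. The point is only that the reproduction scheme, and not just the fitness parameterization, decides whether S or 2S appears.

```python
import math


def moran_fixation(s: float, n: int) -> float:
    """Exact fixation probability of one mutant with relative fitness 1 + s
    in a haploid Moran population of n individuals."""
    if s == 0:
        return 1.0 / n  # neutral limit
    r = 1.0 + s
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))


def wright_fisher_fixation(s: float, n: int) -> float:
    """Kimura's diffusion approximation for one mutant in a haploid
    Wright-Fisher population of n individuals."""
    if s == 0:
        return 1.0 / n
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-2.0 * n * s))


print(moran_fixation(0.01, 1000))          # about 0.0099, i.e. roughly S
print(wright_fisher_fixation(0.01, 1000))  # about 0.0198, i.e. roughly 2S
```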

John Harshman · 5 December 2004

OK, here is the part that I managed to blip over:

Behe and Snoke use five parameters to model the neutral evolution of their sequences: the number of mutations needed for the multi-residue feature (λ), the ratio of null mutations (ones that create a pseudogene) to needed mutation (ρ), the number of duplicate genes in the population (N), the selection coefficient of the multi-residue feature (s), and the instantaneous rate of point mutation per site (ν). (Contrary to what Behe and Snoke claim, ν is not a per generation rate of nucleotide mutation.) In the model, selection only acts upon a duplicate gene's multi-residue feature when it has acquired the particular λ "compatible" mutations and none of the λρ mutations that would nullify it.

Let me therefore revise my naive question. If this is working right, population size should not matter until the last mutation happens to create the MRF, right? Thus all the differences between different values of N should result from the time required to fix the final mutation through selection. Is that the way the simulation actually works? It intuitively seems to me that differences among values of N should be less and less important as the size of the MRF increases, since more and more of the time will be spent fixing the neutral mutations before that final selective event, with a limit in very large populations that the selective fixation happens instantly and all the time is spent on the neutral fixations. But that doesn't seem evident from the table. In fact, time differences seem to remain constant, or even to increase slightly with increasing N, as the length of the MRF increases. In fact t seems to be converging on being exactly proportional to 1/N as MRF length increases. Flaw in model or flaw in intuition?

Douglas Theobald · 5 December 2004

If this is working right, population size should not matter until the last mutation happens to create the MRF, right? Thus all the differences between different values of N should result from the time required to fix the final mutation through selection.
— Harshman

No, I don't think that is right. The probability of an MRF occurring in a given population is directly proportional to N, after which selection takes over and fixes it. Neutral fixation doesn't even come into play. In large populations, then, the time due to selection is negligible (relatively), and the time required to generate the first MRF is proportional to 1/N. Which seems to be consistent with the observation you make here:

In fact, time differences seem to remain constant, or even to increase slightly with increasing N, as the length of the MRF increases. In fact t seems to be converging on being exactly proportional to 1/N as MRF length increases.

Douglas

Douglas Theobald · 5 December 2004

If we set up a haploid model as {1,1+S} then S would be the probability of fixation. ... Therefore, BS04's selection regime corresponds to {1,1+2S}. I don't think they understood this, but it isn't a problem.
— Reed

OK, but simply using s seems preferable to me. Anyway, parts of eqns (3) and (4) don't make sense to me (I'm probably missing something simple). First, I don't understand why the LHSs of (3) and (4) have the Nνλ term. Shouldn't it just be Nν? (Can we use that nifty latex math stuff here?) Second, in deriving (4), is it proper to just divide the RHS of (3) by the probability of fixation due to selection? I don't follow.

Reed A. Cartwright · 5 December 2004

I don't enable latex math in comments since I feel it could be a security risk.

From what I can gather BS04's (3) and (4) come from some steady state mathematics. I'm not familiar with it, but I was able to derive (3) a different (and more rigorous) way using order statistics.

Douglas Theobald · 5 December 2004

I don't enable latex math in comments since I feel it could be a security risk.
— Reed

Bummer.

... I was able to derive (3) a different (and more rigorous) way using order statistics.

Good. I suppose that you got the Nνλ term too. So, what concerns me about (4) is relating it to steady-state neutral theory. Eqn (3), the time to generate the first MRF, at steady state reduces to:

(A) t_f = (ρ + 1)^λ (λ^λ/λ!) / (Nνλ)

So, at steady state, the time to fixation of a selective MRF should be equal to the time to generate the first MRF in a population of size N (t_f above) plus the time for selection to fix it in that population:

(B) t_fx = t_f + (ln N)/s

For classical neutral theory, ρ = 0 and λ = 1, and in this case

(C) t_fx = 1/(Nν) + (ln N)/s

After some rearrangement, (C) can be expressed as

(D) t_fx = {s + (ln N)Nν} / (sNν)

The above argument is based only on B&S (3) and established selection results. Is there some way that B&S (4) can agree with this result? If we plug ρ = 0 and λ = 1 into B&S (4) (using s instead of 2s), we get:

(E) t_fx = 1 / (sNν)

which is quite different from (D) and appears to be a problem to me. Have I messed up somewhere?
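The comparison between (D) and (E) can be made concrete numerically. The minimal Python sketch below evaluates (A), (B), and BS04 (4) with s in place of 2s for the classical case ρ = 0, λ = 1. It assumes illustrative values N = 10⁶, ν = 10⁻⁸, and s = 0.01, which are hypothetical and not from BS04.

```python
import math


def t_first(rho: float, lam: int, n: float, nu: float) -> float:
    """(A): time to the first MRF, (rho+1)^lam * (lam^lam / lam!) / (n * nu * lam)."""
    return (rho + 1) ** lam * (lam ** lam / math.factorial(lam)) / (n * nu * lam)


def t_fix_steady_state(rho: float, lam: int, n: float, nu: float, s: float) -> float:
    """(B): first appearance plus the selective sweep time ln(n)/s."""
    return t_first(rho, lam, n, nu) + math.log(n) / s


def t_fix_bs04(rho: float, lam: int, n: float, nu: float, s: float) -> float:
    """BS04 (4) with s in place of 2s: the (A) waiting time divided by s."""
    return t_first(rho, lam, n, nu) / s


n, nu, s = 1e6, 1e-8, 0.01
print(f"{t_first(0, 1, n, nu):.1f}")                # (C) first term 1/(N nu): 100.0
print(f"{t_fix_steady_state(0, 1, n, nu, s):.1f}")  # (D): 1481.6
print(f"{t_fix_bs04(0, 1, n, nu, s):.1f}")          # (E) 1/(s N nu): 10000.0
```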

Reed A. Cartwright · 5 December 2004

BS04 actually ignores the time to fixation in (4). What (4) is really looking at is the time until the appearance of the MRF that becomes fixed. I'm not sure if B&S realized this.

Douglas Theobald · 5 December 2004

BS04 actually ignores the time to fixation in (4).
— Reed

That's certainly not what they intended, given their definition of t_fx.

What (4) is really looking at is the time until the appearance of the MRF that becomes fixed. I'm not sure if B&S realized this.

So then I don't understand why 2s appears in the denominator of the RHS of (4). What does selection have to do with the time until the appearance of the MRF, when before that moment all alleles on the pathway to the MRF are neutral?

Reed A. Cartwright · 5 December 2004

The RHS of (3) is E(opportunities until an MRF is produced).

The RHS of (4) is E(opportunities until an MRF is produced that goes to fixation).

According to BS04, the expected number of opportunities per MRF is z = ((1+ρ)λ)^λ/λ!, so the probability that an MRF is produced at a given opportunity is 1/z. (In truth, this is only approximately the actual probability.)

The probability that an MRF, having just been produced, goes to fixation is 2S. Therefore, the probability that an MRF that goes to fixation is produced is P(fx|MRF)*P(MRF) = 2S/z. From a geometric model, E(opportunities to this event) is then 1/(2S/z) = z/(2S), which is the RHS of (4).
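Reed's geometric argument can be checked with a quick Monte Carlo simulation. The minimal sketch below reads z = ((1+ρ)λ)^λ/λ! as the expected number of opportunities per MRF, so each opportunity yields an MRF with probability 1/z. It also assumes that a new MRF fixes with probability 2S. The values ρ = 1, λ = 2, S = 0.05 are illustrative only. The simulated mean waiting time should sit close to z/(2S).

```python
import math
import random


def mrf_z(rho: float, lam: int) -> float:
    """z = ((1 + rho) * lam)^lam / lam!, read as expected opportunities per MRF."""
    return ((1 + rho) * lam) ** lam / math.factorial(lam)


def mean_wait_for_fixing_mrf(rho: float, lam: int, s: float,
                             reps: int = 20_000, seed: int = 3) -> float:
    """Monte Carlo mean number of opportunities until an MRF that goes on to fix."""
    rng = random.Random(seed)
    p = 2 * s / mrf_z(rho, lam)  # P(MRF) * P(fix | MRF) = (1/z) * 2S
    total = 0
    for _ in range(reps):
        u = rng.random()
        # Inverse-CDF draw from a geometric distribution on {1, 2, ...}
        total += max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))
    return total / reps


rho, lam, s = 1, 2, 0.05
print(mrf_z(rho, lam) / (2 * s))              # analytic z/(2S)
print(mean_wait_for_fixing_mrf(rho, lam, s))  # simulated, close to the value above
```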

Douglas Theobald · 5 December 2004

Thanks Reed, that makes sense to me now. I think you're right -- it seems that B&S consider (4) to be the time to fixation rather than the waiting time for the appearance of the specific MRF that gets fixed.

What do you think about the nonequilibrium stuff? Since organisms have been in existence for nearly 4 billion years, whether you think they've been evolving or not by Darwinian mechanisms, it seems pretty valid to assume steady state just to simplify all the equations and relate the results to previous work.

John Harshman · 5 December 2004

Doug Theobald (#11233) wrote: No, I don't think that is right. The probability of an MRF occurring in a given population is directly proportional to N, after which selection takes over and fixes it. Neutral fixation doesn't even come into play. In large populations, then, the time due to selection is negligible (relatively), and the time required to generate the first MRF is proportional to 1/N.

This would be true if the MRF arose in a single step as a single mutation. But it doesn't. As I understand the model, the system begins with all sites different from the final MRF, and all mutations prior to the final one are neutral. So in a 6-site MRF, the first 5 mutations necessary to create it must be fixed by drift before the final, selected mutation can happen. And the final, selected mutation is whichever one happens last.

Douglas Theobald · 5 December 2004

So in a 6-site MRF, the first 5 mutations necessary to create it must be fixed by drift before the final, selected mutation can happen.
— Harshman

Again, I don't think that is right. B&S don't mention anything about needing each successive mutation to be fixed, and I see no reason why each one should be. As I understand it, the probabilities they work out apply to the 6 successive mutations in a 6-site MRF occurring in any single copy of the gene in the population (and in its descendants).

Douglas Theobald · 5 December 2004

I think you're right --- it seems that B&S consider (4) to be the time to fixation rather than the waiting time for the appearance of the specific MRF that gets fixed.
— Douglas Theobald

I retract that. B&S do indeed realize what eqn (4) represents:

It should be noted that the time calculated from equation 4 reflects the average time required simply to produce the MR mutant that will go on to become fixed in the population; it does not explicitly include the time required for the mutation to spread and become fixed in the population once it has been produced.
— BS04 p9

STEVE · 6 December 2004

THIS IS POO I CAN NOT STAND THE RUBISH THAT IS PRODUCED ON THIS SITE

Scott Simmons · 6 December 2004

Well, that was insightful. I'll skip over the punctuation and capitalization issues in your post, STEVE, and just suggest that you put in the 'under' that you forgot in front of 'stand', and correct the spelling of 'rubish' to 'math'.

John Harshman · 6 December 2004

Doug Theobald (#11259) wrote: Again, I don't think that is right. B&S don't mention anything about needing each successive mutation to be fixed, and I see no reason why each one should be. As I understand it, the probabilities they work out apply to the 6 successive mutations in a 6-site MRF occurring in any single copy of the gene in the population (and in its descendants).

Given no recombination, the mutations must happen sequentially in a single gene-lineage. But surely the probability of additive events must depend partly on the frequencies of the various lineages in the population. Even if they are integrating over all possibilities (and are they really?) the mass of the probability distribution must center on a sequential fixation scenario; other scenarios must be extremely unlikely in comparison, and contribute little to the distribution. Again this is just my intuition, and I would like to see my balloon punctured by an explanation of why my intuition is wrong.

Douglas Theobald · 6 December 2004

But surely the probability of additive events must depend partly on the frequencies of the various lineages in the population. Even if they are integrating over all possibilities (and are they really?) ...
— Harshman

I don't believe you need to explicitly integrate over all possible frequencies in the population. If you can figure out the probability of getting one hit in a single copy, then you can calculate the joint probability of getting multiple hits in that same single copy. Remember, the average frequency of a neutral gene remains constant between generations. The fact that neutral genes are actually randomly increasing and decreasing in frequency during the process of MRF creation will affect the distribution of times until generation of the MRF that gets fixed, but it shouldn't affect the average time. For a λ-site MRF, then, the joint probability of getting λ hits in the same lineage (before nullifying that gene copy) is relatively easy to figure out: approximately λ! / ((1 + ρ)λ)^λ, the inverse of ((1 + ρ)λ)^λ / λ!. In order to randomly generate one single copy of an MRF gene (on average), all you need to do is mutate a number of times equal to the inverse of that probability. Also, I know that B&S don't wait for fixation in their simulations (except sometimes for the selective step after creation of the MRF), and the fact that their equation (4) fits the simulations so closely is good evidence they got it right.
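The combinatorial factor can be checked by brute force. The minimal sketch below uses a deliberately simplified model, not BS04's full simulation. In it, each relevant mutation hits one of (1+ρ)λ target sites uniformly at random: λ of those sites are needed and λρ are nulls. A lineage succeeds only if its first λ hits land on λ distinct needed sites. Under that model the success probability is exactly λ!/((1+ρ)λ)^λ, and its inverse is the expected number of attempts per MRF. The values ρ = 1 and λ = 3 are illustrative.

```python
import math
import random


def p_mrf_exact(rho: int, lam: int) -> float:
    """lam! / ((1 + rho) * lam)^lam: first lam hits are lam distinct needed sites."""
    return math.factorial(lam) / ((1 + rho) * lam) ** lam


def p_mrf_sim(rho: int, lam: int, trials: int = 200_000, seed: int = 4) -> float:
    """Monte Carlo estimate of the same probability under the simplified model."""
    rng = random.Random(seed)
    targets = (1 + rho) * lam  # sites 0..lam-1 are needed, the rest are nulls
    successes = 0
    for _ in range(trials):
        hit = set()
        for _ in range(lam):
            site = rng.randrange(targets)
            if site >= lam or site in hit:
                break  # null mutation, or a repeat hit on a needed site: lineage fails
            hit.add(site)
        else:
            successes += 1  # for-else: no break, so all lam hits were new needed sites
    return successes / trials


print(p_mrf_exact(1, 3), p_mrf_sim(1, 3))  # both about 0.0278
print(1 / p_mrf_exact(1, 3))               # expected attempts per MRF, about 36
```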

John Harshman · 6 December 2004

OK, I think I see this. Let me try to put it in terms I understand and see if you like it. Since the mean frequency of a lineage will be 1/N forever, we can consider a sort of average population as a bundle of N such lineages (like a package of spaghetti), and all we have to do is follow their individual fates, ending either in nullification (at which point the lineage resets to zero) or MRF. B&S are determining the mean time until the bundle produces an MRF in one lineage. There is no selection involved, and no drift since all frequencies are held constant at the mean. Thus the time is inversely proportional to the number of lineages in the bundle, i.e. to N.
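The bundle picture can be turned into a one-line formula. The minimal sketch below assumes that each of the N lineages independently completes an MRF with the same small probability q per generation. Here q is a hypothetical lumped parameter, with resets after null mutations folded into it. The expected wait for the first success in the bundle is then 1/(1 - (1 - q)^N), which is close to 1/(Nq). Doubling N therefore roughly halves the waiting time.

```python
import math


def mean_wait_bundle(q: float, n: int) -> float:
    """Expected generations until the first of n independent lineages produces
    an MRF, if each does so with probability q per generation."""
    # P(at least one success this generation) = 1 - (1 - q)^n, computed stably
    p_any = -math.expm1(n * math.log1p(-q))
    return 1.0 / p_any


for n in (1_000, 2_000, 4_000):
    print(n, mean_wait_bundle(1e-9, n))  # roughly 1 / (n * q): halves as n doubles
```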

Douglas Theobald · 6 December 2004

John, yeah that seems right to me. At least that is how I see equation (3). Equation (4) has the selection term, which takes into account that, even though the MRF once formed is selectable, most rare advantageous alleles are lost in finite populations (the probability of fixing an advantageous allele with frequency 1/N is approximately s). So the time to generate an MRF that will eventually get selected is longer by a factor inversely proportional to s.

B&S don't explicitly consider the time for the MRF (once generated) to fix, since they say that time is negligible relative to the time it takes to randomly generate the first MRF that will fix. They apparently verified that in their simulations.

Douglas Theobald · 7 December 2004

Nick made it clear that even their prize horse, the bacterial flagellum, could indeed evolve, needing mainly the evolution of several new protein binding sites. There is no in principle difficulty involved. They could either give up or do the equivalent of jumping over the moon by proving that even simple binding sites can't evolve. Dembski stalled while Behe recruited Snoke to do some math and they got to work. As you can see, they produced an unreal model and the desired result.
— Dunkelberg

Actually, I think they've shot themselves in the foot here, unreal model and all. Bacteria evolved the bacterial flagellum, not multicellular metazoa. Let's use B&S's own parameters, which are undoubtedly pessimistic. Assuming a MRF of size 10 (a very good-sized complex binding site) and B&S's (unrealistic) estimate that 70% of all coding mutations kill the function of an average-sized domain (150 aa), then ρ = 150 × 3 × 3 × 0.7 / 10 = 94.5. Worldwide bacterial population size is ~5 × 10³⁰. Bacterial colonies are so large (in terms of numbers of individuals) and undergoing rampant horizontal transfer, it is quite reasonable to consider them approximately equilibrated in terms of mutations and duplications. According then to B&S equation (4), on average we will see the emergence of a fixable 10-site MR feature about three times a day. About a thousand a year. All in the complete absence of selection! Who needs selection and gradual Darwinian pathways between intermediate functional proteins when random mutation will create complex functions for you like this? (OK, you do need some weak selection to quickly fix the thing after it's made, but that is trivial).

Israel Falcone · 9 December 2004

The article was good but a little biased.

Reed A. Cartwright · 9 December 2004

Israel,

Care to elaborate on what you think our bias was?

Nick (Matzke) · 10 December 2004

Quoth Doug Theobald:

Actually, I think they've shot themselves in the foot here, unreal model and all. Bacteria evolved the bacterial flagellum, not multicellular metazoa. Let's use B&S's own parameters, which are undoubtedly pessimistic. Assuming a MRF of size 10 (a very good-sized complex binding site) and B&S's (unrealistic) estimate that 70% of all coding mutations kill the function of an average-sized domain (150 aa), then ρ = 150 × 3 × 3 × 0.7 / 10 = 94.5. Worldwide bacterial population size is ~5 × 10^30. Bacterial colonies are so large (in terms of numbers of individuals) and undergoing rampant horizontal transfer, it is quite reasonable to consider them approximately equilibrated in terms of mutations and duplications. According then to B&S equation (4), on average we will see the emergence of a fixable 10-site MR feature about three times a day. About a thousand a year. All in the complete absence of selection! Who needs selection and gradual Darwinian pathways between intermediate functional proteins when random mutation will create complex functions for you like this? (OK, you do need some weak selection to quickly fix the thing after it's made, but that is trivial).

Is this for real? (That is, is this a correct calculation from the model?) We will see a *particular* 10-site MR feature occur three times a day somewhere in the global bacterial population, according to the B&S model? If so: OK, based on the big flagellum paper, let's say that for the flagellum to evolve, it takes:

* about 5 completely new binding sites (each new binding site couples two preexisting subsystems; each new binding site selectable by selection for a new function),
* plus modifying binding sites perhaps 20 times (for the improvement of the pilus attachment, divergence of the ancestral axial protein into the rod/hook/linkers/flagellum/cap proteins, tweaking of chemotaxis and motor proteins, etc.), each time selectable for improvement of current function.
* If you like, in addition to a ~10 MR binding site, you could add another mutation representing a necessary duplication of the appropriate gene(s).
* Let's take for granted the ancestral subsystems with other functions (and in most cases, we have good evidence that they existed, and excellent evidence (i.e., directly observed in modern times) that such subsystems are functionally viable.)
* Time to fixation under mild selection might be rather long in a population of 5x10^30 (or, if it is unrealistic to think that all 5x10^30 bacteria on earth are participating in this, we might consider some fraction of the bacterial biomass, e.g. 0.1%, that represents some cosmopolitan bacterial strain. So perhaps a population of 1x10^27.)

Given these crude assumptions and the B&S model, can you guys calculate how long the evolution of a prokaryote flagellum might take?

PS: Pertinent to this, Sonleitner says in a Pandas update:

Microbiologists have estimated that there are 5 x 10^24 bacteria living on earth in the ocean, in the soil, beneath the surface, in the air, and inside animals. Soil and subsurface habitats account for 94%; the insides of animals account for only a fraction of 1 percent. In the oceans, any given bacterial gene is estimated to undergo an average of 4 mutations every 20 minutes (Anonymous 1998). [...] [Anonymous]. Whole lotta bugs. Discover 1998 Dec; 19 (12): 28.

I have seen something similar in An Official Journal like PNAS. Ah yes:

Genes that are widely distributed in prokaryotes have a tremendous opportunity for mutational change, and the evolution of conserved genes must be otherwise greatly constrained. Assuming a prokaryotic mutation rate of 4x10^-7 mutations per gene per DNA replication (86, 87), four simultaneous mutations in every gene shared by the populations of marine heterotrophs (in the upper 200 m), marine autotrophs, soil prokaryotes, or prokaryotes in domestic animals would be expected to occur once every 0.4, 0.5, 3.4, or 170 hr, respectively. Similarly, five simultaneous mutations in every gene shared by all four populations would be expected to occur every 60 yr. The capacity for a large number of simultaneous mutations distinguishes prokaryotic from eukaryotic evolution and should be explicitly considered in methods of phylogenetic analyses. [William B. Whitman, David C. Coleman, and William J. Wiebe (1998). Prokaryotes: The unseen majority. PNAS. Vol. 95, Issue 12, 6578-6583.]

Douglas Theobald · 10 December 2004

What's the average size, or size range, of the flagellar proteins under consideration here?

Nick · 10 December 2004

Ranges between 89 and 692. I'd say a median is roughly 200 or 225.

Size and stoichiometry table here.

Douglas Theobald · 10 December 2004

OK, here's a crude approximation assuming:

the bacterial population is mutation/duplication saturated

s = 0.001 for all genes once selectable

instantaneous mutation rate ν = 10^-8

population size N = 10²⁷

percentage of null mutations is 70% (B&S's value, unrealistic)

average size of a domain evolving the binding site in each protein is 225 aa (this is conservative, since average domains are not this big)

MRF is of size λ = 10

thus ρ = (225 (aa) × 3 (nuc/aa) × 3 (possible mutations / nuc) × 0.70 (nulls / possible mutation) - λ) / λ = 141

5 new MRF binding sites, each of which must undergo 20 MRF steps (5 × 20 = 100 MRF generations total)

all steps are sequential, i.e. no simultaneous selection going on, each binding site cannot evolve until a previous one has already been fixed (again, conservative and unrealistic)

then, T_fx = (ρ + 1)^λ (λ^λ/λ!) / (2sNνλ) = ~5 × 10⁷ generations per MRF.

The average time for the fixation of each MRF that will be fixed is T_s = ln(N)/s. Total time is then (T_fx + T_s) × 100 = ~5 × 10⁹ generations, or, given an hour per generation, a little over half a million years.

Increasing the population size, selection coeff, or mutation rate decreases the time roughly proportionally. Decreasing the assumed saturation increases the time proportionally. If we use a more realistic null fraction, say 35%, then the total time decreases substantially to a little over a thousand years (and the selection fixation time becomes relatively important). If we further use a more realistic domain size and a larger selection coefficient, then we get really short times, where the limiting factor is fixation by selection.

The ironic thing about this analysis by B&S: I never used to consider a neutral pathway like this (waiting for 10 needed mutations before selection could kick in) to be a viable scenario. Now I think it very well could be in organisms with fast generation times and large populations.
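The arithmetic above is easy to reproduce. The minimal Python sketch below uses the parameter values listed above and one hour per generation, as stated. It also assumes a 365.25-day year for the unit conversion. `flagellum_estimate` is a hypothetical helper name. Like the list of assumptions, the calculation treats the MRF steps as strictly sequential.

```python
import math


def flagellum_estimate(null_frac: float, domain_aa: int = 225, lam: int = 10,
                       s: float = 0.001, nu: float = 1e-8, n: float = 1e27,
                       steps: int = 100, hours_per_gen: float = 1.0):
    """Back-of-envelope time using BS04 eq. (4) plus a ln(N)/s sweep per MRF step."""
    # rho = null mutations per needed mutation: 3 nucleotides per codon,
    # 3 possible substitutions per nucleotide, a fraction null_frac of them nulls
    rho = (domain_aa * 3 * 3 * null_frac - lam) / lam
    # Eq. (4): mean wait for the appearance of an MRF that will go on to fix
    t_fx = (rho + 1) ** lam * (lam ** lam / math.factorial(lam)) / (2 * s * n * nu * lam)
    t_s = math.log(n) / s  # approximate time for selection to fix it
    generations = (t_fx + t_s) * steps
    years = generations * hours_per_gen / (24 * 365.25)
    return rho, t_fx, generations, years


for frac in (0.70, 0.35):
    rho, t_fx, gens, years = flagellum_estimate(frac)
    print(f"null={frac:.2f}  rho={rho:.1f}  T_fx={t_fx:.2e}  total={gens:.2e} gen  ~{years:,.0f} yr")
```

With the defaults this gives ρ ≈ 140.75 (rounded to 141 in the comment), roughly 4.5 × 10⁷ generations per MRF, and a little over half a million years. Setting null_frac to 0.35 gives roughly 1,200 years. Setting steps to 25, the count in Nick's follow-up, gives roughly 130,000 years.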

Nick · 10 December 2004

LOL! This is the one part I didn't get:

5 new MRF binding sites, each that must undergo 20 MRF steps (5 × 20 = 100 MRF generations total)

I was thinking you might need about 20 + 5 = 25 new binding sites total (5 subsystem cooption events, 20 improvement of current system function events) to get a flagellum from subsystems. But you're doing 5 x 20 = 100. So either I'm misunderstanding, or your estimate is 300% too high and the actual estimate should be about 125,000 years.

Nick · 10 December 2004

Regarding neutral pathways vs. selection, my own suspicion is that at the level of bacteria Selection Rules All even more than usual, and that "gradual" pathways dominate ("gradual" = point mutations and single other events, like e.g. duplication/fusion/deletion/rearrangement, each event followed by selection). A 10-aa MRF neutral jump might be possible, but a more gradual route will be found first. IMVVHO of course.

Douglas Theobald · 10 December 2004

So either I'm misunderstanding, or your estimate is 300% too high and the actual estimate should be about 125,000 years.
— Nick

No and yes: I misunderstood you, and given what you meant, the estimate is too high. I also think you're right about the evolution of new functions; selection must dominate. Nevertheless, if you've got neutral alleles on their way to being B&S-style MRFs, one of them may find some beneficial function that you can't get to via the (more) gradual pathway. Kind of the "hopping between adaptive peaks" sort of thing. Which, due to BS04, I think is much more likely now than I did before.

Zipper · 25 December 2004

So how many point mutations does it take to evolve a new function from a duplicated gene? In the abstract of their study, Behe & Snoke write that the models of the process assume that it would take only one mutation. Is this realistic at all? I mean, how well do the experiments support this assumption?

And are simple point mutations the only way to get a new function from a duplicated gene? What about, for example, gene conversion?

John Mercer · 4 February 2005

There's an even bigger hole in this BS that you're all missing--alternative splicing.

Differences in splicing (which can result from single nucleotide changes) dwarf differences in nucleotide and aa sequence when you look at closely-related species.

Ian Musgrave · 4 May 2005

I've decided it, men: there was an intelligent designer.
No other way to answer the life questions.
Sorry to let you all down, but you're wrong.

paul flocken · 4 May 2005

what have you done with Ian Musgrave, you foul fiend
