RNA networks and protein networks all seem to exhibit a scale-free structure. I intend to show not only that this scale-free structure and other aspects of these networks can be expected from simple evolutionary principles, but also how this structure helps explain such issues as modularity, robustness, and evolvability.
Characteristics of scale-free networks
As is well known, DNA sequences map to RNA or protein structures.
There are far more sequences than structures
Contains few highly-connected motifs and many less connected nodes
Motifs have a neutral network which extends throughout sequence space
For these frequent structures, the neutral networks extend through sequence space. This means that for any given fold, one can traverse sequence space (that is, change every nucleotide position) without changing the structure of the fold. In addition, these structures are close, in the sense that any such structure is within a small distance of any random sequence.
These findings have significant implications for our understanding of evolution.
The existence of extended and connected neutral networks in RNA sequence space was proven by an elegant experiment recently published by Erik Schultes and David Bartel [60]. They designed an RNA sequence which forms two known structures (of chain length l = 88) with different catalytic activities, an RNA ligase evolved in the laboratory [61] and a natural cleavage ribozyme isolated from hepatitis delta virus RNA [62]. The two structures have no base pair in common. Folding the synthesized chimeric sequence into structures yielded indeed both activities, although they were substantially weaker than those of the reference ribozymes, the ligase and the cleavage ribozyme, respectively. Only two or three selected point mutations or base pair exchanges are required, however, to reach full catalytic efficiency. Still, the two optimized RNA molecules have a Hamming distance around forty from their reference sequences. Then, Schultes and Bartel [60] explored further the mutational neighborhoods and found neutral paths of Hamming distance about 40, by preparing and analyzing series of RNA sequences, in which neighboring sequences differ in a single base or base pair only. Without interruption these neutral paths lead from the RNA with both catalytic activities to the two reference ribozymes.
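To make the terms "Hamming distance" and "neutral path" concrete, here is a minimal Python sketch. It is only an illustration, not the procedure Schultes and Bartel used: the four-letter sequences are made up, and the phenotype test `closes_stem` is a hypothetical stand-in for real folding and activity assays (it simply asks whether the first and last bases can pair).

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_neutral_path(path, keeps_phenotype, max_step=2):
    """True if every sequence keeps the phenotype and each step changes
    at least one and at most `max_step` positions (2 allows a base pair exchange)."""
    if not all(keeps_phenotype(seq) for seq in path):
        return False
    return all(0 < hamming(p, q) <= max_step for p, q in zip(path, path[1:]))


# Hypothetical toy phenotype: the first and last bases can form a pair.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def closes_stem(seq: str) -> bool:
    return (seq[0], seq[-1]) in PAIRS


path = ["GAAC", "GACC", "GCCC"]  # made-up sequences, one point mutation per step
print(hamming(path[0], path[-1]), is_neutral_path(path, closes_stem))
```

In the real experiment the "phenotype" is measured catalytic activity, so a check like `keeps_phenotype` would be a laboratory assay rather than a function call.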
A Testable Genotype-Phenotype Map: Modeling Evolution of RNA Molecules
60: E. A. Schultes and D. P. Bartel. One sequence, two ribozymes: Implications for the emergence of new ribozyme folds. Science, 289:448-452, 2000.
We describe a single RNA sequence that can assume either of two ribozyme folds and catalyze the two respective reactions. The two ribozyme folds share no evolutionary history and are completely different, with no base pairs (and probably no hydrogen bonds) in common. Minor variants of this sequence are highly active for one or the other reaction, and can be accessed from prototype ribozymes through a series of neutral mutations. Thus, in the course of evolution, new RNA folds could arise from preexisting folds, without the need to carry inactive intermediate sequences. This raises the possibility that biological RNAs having no structural or functional similarity might share a common ancestry. Furthermore, functional and structural divergence might, in some cases, precede rather than follow gene duplication.
Figure 3. A close apposition of two ribozyme neutral networks. (A) Alignment of sequences spanning the distance between the two prototype ribozymes (25). Each sequence differs from its neighbors at no more than two residues. Each variant is named on the basis of whether it catalyzes ligation (LIG) or self-cleavage (HDV) and the number of residues that differ from the intersection sequence (boxed residues). The prototype ligase (LIG P) and HDV (HDV P) sequences are at the top and bottom of the alignment, respectively, with their secondary structures annotated as in Fig. 1. Positions are numbered with respect to the intersection sequence (INT), as in Fig. 2A. (B) Activities of the ribozyme sequences aligned in (A). Self-ligation activity is plotted in blue; self-cleavage activity is plotted in red. The horizontal axis represents the number of residues that differ from the intersection sequence. The vertical axis indicates the reaction rate of each ribozyme, normalized to that of the respective prototype ribozymes (37). The relative rate for uncatalyzed cleavage with formation of a cyclic phosphate (17) is indicated by the long-dashed line. The relative rates for nonenzymatic, template-directed oligonucleotide ligation (17) are indicated by the short-dashed line (ligation with formation of a 2’,5’-linkage) and the dotted line (ligation with formation of a 3’,5’-linkage). Both ligation and cleavage rates are plotted for the intersection sequence, demonstrating an intersection of the two ribozyme networks.
Schultes and Bartel's findings were confirmed in:
HUANG, Z., SZOSTAK, J. W. (2003). Evolution of aptamers with a new specificity and new secondary structures from an ATP aptamer. RNA 9: 1456-1463
It has been recently demonstrated that two completely unrelated ribozymes with different folds and catalytic functions can be evolved from a single RNA via a series of neutral mutations (Schultes and Bartel 2000). Our experimental observations provide another example to suggest that RNA structures may tend to evolve through the accumulation of mutations followed by jumps to distinct structures with novel functions.
The discovery of the scale-free character of the protein domain universe is striking and represents the main result of this paper. It has immediate evolutionary implications by pointing to a possible origin of all proteins from a single or a few precursor folds—a scenario akin to that of the origin of the universe from the Big Bang. An alternative scenario, whereby protein folds evolved de novo and independently, would have resulted in random PDUG (similar to the one shown in Fig. 3b) rather than that observed in the scale-free one.
Expanding protein universe and its origin from the biological Big Bang by Dokholyan et al, PNAS, October 29, 2002, vol. 99, no. 22, 14132-14136

Fig 3a,b
The authors then proceeded to use a simple model of random duplication. In addition, they mention: "Importantly, our evolutionary time step is large enough to allow many mutations as well as more dramatic changes in sequences such as insertions/deletions or shuffling of structural elements to occur in the offspring protein such that sequence similarity with the parent protein is lost (4). Such mutations may or may not lead to significant structural divergence of the offspring from its parent protein because the landscape in sequence space is complex (4)."
Fig 4. Proposed model of domain evolution. (a) Gene duplication (A -> A + B): the structural similarity between A and B is defined by some function w = w(A,B) (e.g., RMSD or DRMSD). (b) If structural similarity w is smaller than some critical value wmax, then we add a link connecting A and B. If structural similarity is above wmax, a new fold family is born. (c) The second generation progeny C (A -> B -> C) can connect to its grandparent A, if there is structural similarity between A and C: wAC <= wmax. (d) With each time step, mutations diverge protein structures from each other; i.e., structural similarity changes by some value D: w -> w' = w + D (D = 10^-4). If w' > wmax, we remove the edge between corresponding proteins. (e) The dependence of the size of the largest cluster in the graphs generated by our model on wmax, averaged over 20 realizations. (f) The probability of the node connectivity in our model, averaged over 102 realizations. Apart from the finite-size effects at large k, it exhibits power law distribution with exponent alpha ~ 1.6.
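The steps in the caption can be sketched as a small simulation. This is a rough illustration under loud assumptions, not the authors' code: the caption does not say how the structural distance of a new duplicate is drawn, so the uniform draw on [0, 2*wmax] is my own placeholder, as are the names `simulate_duplication`, `degrees`, and `drift`, and the choice to let a child test links to its parent and to the parent's current neighbours. It should not be expected to reproduce the reported exponent.

```python
import random


def simulate_duplication(n_steps, w_max=1.0, drift=1e-4, seed=0):
    """Toy duplication/divergence graph. Returns (n_nodes, edges), where
    edges maps (i, j) with i < j to the current structural distance w."""
    rng = random.Random(seed)
    edges = {}
    n_nodes = 1  # start from a single precursor fold
    for _ in range(n_steps):
        # (a) duplication: a random existing node A gives rise to a new node B
        parent = rng.randrange(n_nodes)
        child = n_nodes
        n_nodes += 1
        # (c) candidates for a link: the parent and the parent's current neighbours
        relatives = {parent}
        for i, j in edges:
            if i == parent:
                relatives.add(j)
            elif j == parent:
                relatives.add(i)
        # (b) link only if the drawn distance is within the cutoff
        for r in relatives:
            w = rng.uniform(0.0, 2.0 * w_max)  # ASSUMPTION: placeholder distribution
            if w <= w_max:
                edges[(r, child)] = w  # r < child always holds
        # (d) divergence: every distance grows by `drift`; over-stretched edges break
        for key in list(edges):
            edges[key] += drift
            if edges[key] > w_max:
                del edges[key]
    return n_nodes, edges


def degrees(n_nodes, edges):
    """Connectivity k of every node."""
    deg = [0] * n_nodes
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


n_nodes, edges = simulate_duplication(500)
print(n_nodes, len(edges), max(degrees(n_nodes, edges)))
```

Nodes that end up with no links correspond to the "new fold family is born" case in (b); a histogram of `degrees(...)` is the quantity plotted in panel (f).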
Discussion
The nature of the protein/RNA networks indicates that homology may have to involve not just sequence similarity but also structural similarity. In other words, because these networks are neutral, sequences can 'diffuse' while still maintaining structural similarity. Thus, instances in which convergent evolution was inferred from sequence dissimilarity may need to be revisited.
Given the (recent) findings that many proteins with undetectable sequence similarity nonetheless share a similar fold, this issue may be quite important in understanding the full details of evolution.
Relevant links
The scientific literature on scale-free networks is very extensive. I will focus on just a few of the relevant papers.
Relevant authors include: Peter Schuster, Andreas Wagner, Walter Fontana, and Barabasi
Preferential attachment in the protein network evolution. Eisenberg E, Levanon EY. Phys Rev Lett. 2003 Sep 26;91(13):138701.
The Saccharomyces cerevisiae protein-protein interaction map, as well as many natural and man-made networks, shares the scale-free topology. The preferential attachment model was suggested as a generic network evolution model that yields this universal topology. However, it is not clear that the model assumptions hold for the protein interaction network. Using a cross-genome comparison, we show that (a) the older a protein, the better connected it is, and (b) the number of interactions a protein gains during its evolution is proportional to its connectivity. Therefore, preferential attachment governs the protein network evolution. Evolutionary mechanisms leading to such preference and some implications are discussed.
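For readers who have not met preferential attachment before, here is a minimal sketch of the generic growth rule (the Barabasi-Albert style model), in which a new node links to existing nodes with probability proportional to their current connectivity. It illustrates the general mechanism only, not Eisenberg and Levanon's cross-genome analysis; the function name and the parameters `n_nodes` and `m` are my own choices.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow a graph in which each new node attaches to m distinct existing
    nodes, chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # start from a small, fully connected core of m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # every node appears in `stubs` once per edge end, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = preferential_attachment(2000)
degree = Counter(v for edge in edges for v in edge)
# the oldest nodes tend to be the best connected ("the rich get richer")
print(degree.most_common(5))
```

The two findings quoted above, (a) older proteins are better connected and (b) gains are proportional to connectivity, are exactly what this rule produces by construction, which is why they are read as evidence for it.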
Evolutionary dynamics of protein networks, Berg, Johannes; Lässig, Michael; Wagner, Andreas


33 Comments
Ian Menzies · 30 July 2004
Absolutely fascinating.
I have a rather peripheral question that I've been wondering about for a while:
As I understand it, one of the reigning abiogenesis hypotheses, the RNA world, states that before use of proteins arose, "protolife" (my term) used RNA strands to catalyze reactions, etc. Later protolife used proteins for catalysis and used RNA (and later, DNA) to code for these proteins (as is the case for all life today).
My question is this: is there a connection between the function of an enzyme and the possible catalytic function of the RNA sequence that codes for it? I.e. some form of protolife used a given RNA sequence to catalyze a given reaction, and later "descendants" used that RNA sequence (or a similar one) to instead code for a protein that catalyzed that reaction.
Ian Menzies · 30 July 2004
Russell · 30 July 2004
Ian, that is a fascinating question, and of course I don't have any answers. But here's my guess, for what it's worth.
I'll bet the huge majority of modern protein functions had no counterpart in the RNA world. Some proto-ancestral RNA presumably stumbled upon the ability to catalyze its own replication. But RNA is so mutable, my bet would be that no trace of that original function will be detectable in its (great)^(10^9)granddaughter molecules extant today.
The ability of RNA to catalyze protein polymerization, on the other hand, lives on in ribosomal RNA.
Another bold prediction: I'll bet we've barely scratched the surface of RNA catalytic functions that do exist in modern biota. I hereby claim this bold prediction in the name of Darwin, just to fend off the inevitable declarations from the ID Vatican in Seattle: "See! we would have predicted that on the basis of Intelligent Design"
Pim van Meurs · 30 July 2004
A good point. Since RNA is so 'neutral' it can mutate while maintaining the same phenotype. Old RNA will be 'hard to recognize' at the sequence level.
charlie wagner · 30 July 2004
Pim, I now have a body of work from which I can draw. Below are snippets of my thoughts on these subjects that you might find interesting. They form the basis for my upcoming book "How Really Smart People Can Believe Really Stupid Things".
.............................................................
"The use of the term "self-organizing" by Barabasi and his group is misguided. Correlating biological evolution with technological advancement is a false analogy unless you are willing to admit that the same principles apply to both types of systems. This type of argument has been made in the past with respect to the automobile and the airplane. The reason the analogy is false is because at no point in the development of the automobile, the airplane or the World Wide Web or any other complex network was any element of the design achieved by chance. Only by the most strict application of the rules of engineering and aerodynamics and the intelligent input from human designers were these results obtained. There is no way that a random search could ever have discovered the design of the internal combustion engine or the jet engine or the internet. In all cases, the search for function is intelligently guided. Evolution by random mutation is analogous to problem solving without any intelligent guidance. In the case of every kind of complex, functional system, the total magnitude of all combinational possibilities is nearly infinite. Meaningful islands of function are so isolated that to find even one by chance would be truly a miracle. But the analogy may indeed be valid. If these complex networks, processes and structures that are the product of human design all require a higher intelligence to create, why would one not imagine that the most complex system of all, the living cell, must not have had a similar design component to achieve its function?"
.............................................................
"I am familiar with many of the aspects of "self-organizing networks". In fact, I know enough about these systems to state unequivocally that they have nothing whatsoever to do with biological evolution. It therefore puzzles me why you brought this up here in the first place, if not to subtly suggest that the concept of "self-organization" may somehow have significance in the evolution of biological systems. News flash: it does not! The mathematics is not the reason why I characterized their use of the word as "misguided". The reason was because of the fact that self-organizing networks are irrelevant wrt biological evolution. They are not, however, irrelevant wrt the living cell. Living cells function with a number of similarities and their topology is important in carrying out their developmental and regulatory processes. But the important point to remember, is that these self-organizing networks are the result of intelligent guidance, and any similarities that these networks have with living cells only enhances the notion that there is a component of intelligent guidance involved in their processes, structures and functions as well."
.............................................................
"Self-organizing networks are not the result of random processes, but skillful intelligent guidance at every step in their development. Biological evolution, on the other hand, as described by Darwin and his successors is dependent on a random search strategy, a strategy that can be demonstrated by mathematical analysis to be almost useless in finding isolated islands of function in a sea of noise."
.............................................................
"...if cellular networks were not designed, nor did they arise from random searches, then where, pray tell, did they come from? You state that selection is not random, that it operates on viable living systems yet you fail to explain where these viable living systems came from in the first place. Perhaps they came from other "viable living systems" which had their origin in "earlier viable living systems". Tell me, is it "viable living systems" all the way down?"
...............................................................
"Do you know any radio operators? There are not many left in the world, but I am one. We have an expression that we use when there's a lot of noise on a channel, perhaps static or interference. We say the signal is "down in the mud". It's very difficult to pull a readable signal out of a high signal/noise ratio. It is the same in nature. These variations that occur must be rather large and important to be noticed. Small variations, such as those proposed by Darwin are below the noise level and get lost "down in the mud". A real-life illustration is that an antelope being chased by a lion may have a small variation that allows him to run just a little faster and he would be better fit to survive such a challenge and pass these traits on to his offspring, and they would accumulate over time. The real truth is that this kind of scenario is a fairy story, since the lion fixates on one prey to the exclusion of all others and chases him till he drops. Speed is irrelevant in this context. In addition, this small variation will get lost in the noise of day to day existence and whether or not the antelope gets caught is much more contingent on whether or not he stumbles and falls, not on his absolute speed."
............................................................
"Yes, alleles can change their frequency over time by this method. Mutations do occur and selection is a real phenomenon. But this kind of change is not evolution in my book, because, first of all, it's an oscillatory effect, changing first in one direction and then back in another direction. This is not a path that leads to new processes, structures and adaptations. This was clearly demonstrated by Peter and Rosemary Grant with the Galapagos finches. If you want to call this evolution, fine, but then you cannot use it to explain the appearance of complex, highly organized processes, structures and adaptations. And you cannot demonstrate that these trivial effects have anywhere near the creative proclivities assigned to them by Darwin and his ilk."
...........................................................
"If a specific protein is needed for a certain process, and it's 100 amino acids long, there are 20^100 possible combinations of amino acids that could be correct. How many do you think have to be "tested" before the right one is found? These kinds of numbers are simply beyond the realm of chance. Or, if the right protein just happens to be hanging around, one must marvel at the good fortune and inquire as to where it came from."
.............................................................
"...And these processes and structures and components just happen to be floating around, waiting to be co-opted by evolution? Sure, the wheel bearings on my Buick LeSabre are exactly the same as the wheel bearings on the Cadillac Sedan DeVille, so it's easy to see how GM co-opted an existing bearing for use on another application. In fact, I've seen these kinds of bearings employed in a wide variety of applications, both automotive and otherwise. But you've still got to answer the question: where did the bearing come from in the first place?"
............................................................
"Intelligent does not mean things like "conscious" or "smart". It means (and most any computer scientist will bear this out) the ability to store and process information. Anyway, that's what I mean by intelligent. If it makes you feel any better, I'll stop using the word "intelligent" and will substitute "algorithmic" in its place. The assembly of a living organism from the raw materials in its environment does not take place without guiding instructions. The algorithmic guidance that exists in the genome. Of course, since functional algorithms do not create themselves out of thin air, it becomes self-evident that these algorithms had an intelligent designer that had insight and was able to validate that they were functioning as expected. Chemistry can explain a lot about the DNA in the genome, it can explain how nucleotides assemble together into chains, how the phosphate sugars are incorporated and how the hydrogen bonding works. But chemistry cannot account for the specific sequence of bases, a sequence that contains instructions that regulate the growth and development of the organism. Just like chemistry can explain how paper and inks are made, and their properties and characteristics, how they will behave under various conditions and how they can be preserved and manufactured, it cannot and never will be able to explain the ideas that are created by the specific words written on that paper with those inks. That requires...intelligence."
............................................................
"Other than to remain alive and reproduce, I don't know what the function of life on earth is. All of the life processes, photosynthesis, replication, protein synthesis, cellular respiration, etc. exist to maintain the living state and to allow for growth and reproduction in order to perpetuate life. If this whole business has an ultimate function, it has not been discovered by me. I have a problem understanding what you mean by "natural". Everything that is known to exist is "natural". So what else is there? I don't think that life is something that is "not natural". Even if it turns out that life is the product of intelligent design, why would one imagine that to be "not natural"? Anything that might be classified as "supernatural", upon identifying it and understanding its "nature" would then become part of nature and would properly be called natural. The only difference I see is between living systems and non-living systems. Non-living systems do not adapt means to ends, they do not adapt structure and process to function and they do not self-organize. And one must be careful not to confuse organization with order. There's a lot of talk about ordered systems in the non-living world, snowflakes, tornadoes, etc. but this is not the issue. Living systems are beyond order, which is simply a condition of logical or comprehensible arrangement among the separate elements of a group. Like putting files in alphabetical order or using a sieve to separate items by size. Organization is a much different structure in which something is made up of elements with varied functions that contribute to the whole and to collective functions, such as exist in living organisms. Ordered systems can result from non-intelligent processes, as has been seen many times and cited by the examples given. But organized systems require intelligent guidance. They need to be put together with intent and their assembly requires insight.
They need to be the product of intelligence because it is necessary to determine if they are functioning properly and that can only be achieved by insight. Since living systems display organization, they display means adapted to ends and structures and processes assembled to perform specific functions, it becomes self-evident that they are the product of a higher intelligence. And be careful also not to confuse "higher intelligence" with god, religion or any such human conventions. I have been an agnostic almost my entire life and continue so, despite my belief that life is the product of intelligent design."
.............................................................
"Mathematics is a product of the human mind and our attempts to explain everything in the universe in mathematical terms are doomed to failure. It assumes that the human mind is the highest bar against which everything in the universe must be measured. Humans did not discover mathematics, they invented it. Mathematics does not exist beyond the human mind. If it did, there would be no sqrt(-1) or "i" or 1/0 or anything like that. So, there is presently no mathematical relationship associated with Nelson's law, which is not to say that one may not be formulated in the future. The law of gravity worked just fine long before it was quantified mathematically."
............................................................
"You must demonstrate that living organisms assemble themselves without intelligent guidance. You have not done so. The intelligent guidance is coded into the DNA, just like the instructions for a computer program are coded into the source code. You must show that these genetic instructions are not the product of a higher intelligence. Nelson's law is unscathed."
steve · 30 July 2004
Based on what I see here, it looks like Charlie may be one of the preeminent ID Theorists of his generation.
Pim van Meurs · 30 July 2004
Charlie: There is no way that a random search could ever have discovered the design of the internal combustion engine or the jet engine or the internet.
Charlie, your continued misrepresentation of evolutionary processes is becoming a little bit tedious here.
Neither is the argument that there is a neutral network for the combustion engine. Of course the question really is, what about the cellular 'combustion engine' which can be understood much better in terms of evolutionary concepts.
Thus the statement that organized systems require intelligent guidance is both misguided and erroneous.
Your continued 'missing the point' is well established by your 'Nelson law' claim. I am not interested in how the genetic coding system arose, just in how it evolved. If you want to believe that it required intelligent design, fine. But such a belief is lacking in evidence.
The only way to make your case seems to be through obfuscation and appeal to personal incredulity and ignorance.
Now back to the topic. The scale free nature of RNA space, the neutrality of said space all help understand the many concepts of evolution. Nothing prohibits one from believing in an 'intelligent designer' because it neither relies on such a designer nor denies such designer.
Pim van Meurs · 30 July 2004
Charlie: "You state that selection is not random, that it operates on viable living systems yet you fail to explain where these viable living systems came from in the first place. Perhaps they came from other "viable living systems" which had their origin in "earlier viable living systems". Tell me, is it "viable living systems" all the way down?"
Of course not, such a strawman would misrepresent my position but let's not confuse the issue of abiogenesis with the issue of evolution. Seems that Charlie is proposing a 'front loading' explanation to life. Fine... Lacking any supporting evidence, such a front loading solution seems to suffer from Occam's razor.
As to how life may have originated, what is needed is a self-replicator, followed by encapsulation. Fox protocells may give an outline of what such a system may have looked like. Now that we have replication and encapsulation, evolution is well positioned to guide. Self-replicators, lipid world, RNA world all provide us with answers to these questions. Of course Charlie may reject such science in favor of his 'intelligent design but don't ask me details' approach. Luckily real science IS interested in the details.
charlie wagner · 30 July 2004
Pim van Meurs · 30 July 2004
Charlie: And you cannot demonstrate that these trivial effects have anywhere near the creative proclivities assigned to them by Darwin and his ilk.
But I can and I have Charlie.
Frank J · 30 July 2004
Ian Menzies · 30 July 2004
Bob Maurus · 30 July 2004
Charlie,
You said, in post #5835, "If these complex networks, processes and structures that are the product of human design all require a higher intelligence to create, why would one not imagine that the most complex system of all, the living cell, must not have had a similar design component to achieve it's function?"
The hurdle here for me, and I expect for a whole hell of a lot of others, is the attempt at a seamless segue from inanimate manufactured objects of known human origin to animate, reproducing biological organisms of unknown origin. This is akin to cherries and golfballs - they're both round.
Can you lay out a cogent argument to support that attempted jump to intelligent design, bearing in mind the total absence of evidence for such an origin, and the unsupported assumption that such a jump has any credible basis? The claim that inanimate objects and biological organisms can be qualitatively compared straight up across the board is unsupported by any evidence that I'm aware of. If you have some, let us see it.
You also said, " I know enough about these systems to state unequivocally that they have nothing whatsoever to do with biological evolution."
Please, in the interest of an information "bank" gr ud yput o0bservations
Pim van Meurs · 30 July 2004
Pim van Meurs · 30 July 2004
charlie wagner · 30 July 2004
Pim van Meurs · 30 July 2004
Charlie is basically saying that DNA is intelligent because it can store and process information. Under that definition intelligence loses most of its meaning. As if ID 'theorists' have not done enough in that area :-0
Pim van Meurs · 30 July 2004
Frank J · 31 July 2004
charlie wagner · 31 July 2004
Frank J · 31 July 2004
Pim van Meurs · 31 July 2004
But I agree that Charlie's 'collection' so far would make a great example of "How Really Smart People Can Believe Really Stupid Things". I believe that Lilith or Diana have shown how Charlie's ideas may certainly belong in said category. For starters: Charlie's definition of intelligent, which includes the ability to store and process information. I guess my voice recorder should be treated with the respect reserved for intelligent beings :-)
Perhaps Charlie can focus on such 'silly concepts' as 'no false positives' or 'law of conservation of information'?
charlie wagner · 31 July 2004
Frank J · 31 July 2004
Charlie, please just answer the two questions. It's either yes, no, or "wait 'til the book comes out."
steve · 31 July 2004
I think he should break the chapters down by university.
Papers From Caltech I Reject
Papers From Princeton I Reject
Papers From Johns Hopkins I Reject
Papers From Harvard I Reject
Papers From MIT I Reject
...
And maybe a final dessert chapter like How I got smarter about biology than 72 Nobel Laureates.
Shit, this book writes itself.
charlie wagner · 31 July 2004
Pim van Meurs · 31 July 2004
Abert Brodsky · 31 July 2004
About 2 months ago I found a book on evolution in a public library, one out of hundreds or more books by Dennit. I have thought a lot about the beginning. I concluded that the universe is infinite, has no beginning and no end like a circle or a sphere. The universe was once all energy. To a human that would appear to be nothing. But from that nothing came something, everything. We now know the relationship between energy and matter. The universe is in constant motion. I call it organized chaos. From energy comes motion. There is a continuous conversion of energy to matter. This energy is not destroyed. It reverts to a lower level as is well known on earth. Nature or evolution is a mindless process of creating matter, which is retained if found useful and discarded if not. Evolution has infinite time (time is infinite) to throw things together; but the process is precisely organized according to the principles of chemistry and physics way beyond human present understanding. We only observe nature and copy her. We fly but have to use airplanes, for example. There are simple defects in nature (like an extra electron in an outer ring of some atoms). Humans observed this, and I believe that made semiconductors useful. Therefore the elements were created in this energy conversion and everything else as nature created it, simply throwing things together. After all there may not be an electron for example, it may simply be a quantity of energy in motion. Is this not what Einstein concluded? We can take matter and convert it to energy, but only nature can convert energy into matter (although a little new matter is created in the atomic bomb). And nature has plenty of time. I suppose eventually the energy level in the universe will reach a state that the process will reverse, restoring matter back to energy, and begin all over.
charlie wagner · 31 July 2004
Bob Maurus · 31 July 2004
Hey Charlie,
I'm sitting at the computer with an hour of Fleetwood Mac on PBS TV in the background. I thoroughly enjoyed your Stevie Nicks gallery. We've got a friend from the old days who could be her twin.
Frank J · 1 August 2004
charlie wagner · 1 August 2004
Frank J · 1 August 2004
Charlie, is that another "no"? And given how you answer questions, are you running for office soon?