In this installment of Icons of ID I will explore the evolution of the reliance of Dembski’s ID arguments on the No Free Lunch Theorems. While originally Dembski was suggesting that the “No Free Lunch Theorems” play a major role in his arguments against Darwinian pathways, he seems to have changed his tune when it was pointed out that his application of these theorems was flawed or as Wolpert, who is one of the original authors of the these theorems mentioned “written in jello” and “fatally informal and imprecise”.
1999
Dembski referred to the results being essentially a corollary of the No Free Lunch Theorems
It follows that the vast majority of fitness functions on the phase space that coincide with our original fitness function on the target but reshuffle the function on the partition elements outside the target will not land the evolutionary algorithm in the target (this result is essentially a corollary of the No Free Lunch theorems by Wolpert and Macready). Simply put, the vast majority of fitness functions will not guide E into the target even if they coincide with our original fitness function on the target (see Appendix 8).
This result refutes the claim that evolutionary algorithms can generate specified complexity, for it means that they can yield specified complexity only if such algorithms along with their fitness functions are carefully adapted to the complex specified targets they are meant to attain. In other words, all the specified complexity we get out of an evolutionary algorithm has first to be put into the construction of the evolutionary algorithm and into the fitness function that guides the algorithm. Evolutionary algorithms therefore do not generate or create specified complexity, but merely harness already existing specified complexity. Like a bump under a rug, the specified complexity problem has been shifted around, but it has not been eliminated.
Why Evolutionary Algorithms Cannot Generate Specified Complexity, Nov 1999
Let’s look at the definition of corollary
An immediate consequence of a result already proved. Corollaries usually state more complicated theorems in a language simpler to use and apply
A proposition that follows with little or no proof required from one already proven.
A deduction or an inference.
A natural consequence or effect; a result.
2001
Let’s look at the introduction of Dembski’s book “No Free Lunch”. Dembski not only introduced the book but also the origin of the title of the book as well as the relevance of these theorems to his arguments.
The title of this book, No Free Lunch, refers to a collection of mathematical theorems proved in the past five years about evolutionary algorithms. The upshot of these theorems is that evolutionary algorithms, far from being universal problem solvers, are in fact quite limited problem solvers that depend crucially on additional information not inherent in the algorithms before they are able to solve any interesting problems. This additional information needs to be carefully specified and fine-tuned, and such specification and fine-tuning is always thoroughly teleological. Consequently, evolutionary algorithms are incapable of providing a computational justification for the Darwinian mechanism of natural selection and random variation as the primary creative force in biology. The subtitle, Why Specified Complexity Cannot Be Purchased without Intelligence, refers to that form of information, known as specified complexity or complex specified information, that is increasingly coming to be regarded as a reliable empirical marker of purpose, intelligence, and design.
and
The Design Inference laid the groundwork. This book demonstrates the inadequacy of the Darwinian mechanism to generate specified complexity. Darwinists themselves have made possible such a refutation. By assimilating the Darwinian mechanism to evolutionary algorithms, they have invited a mathematical assessment of the power of the Darwinian mechanism to generate life’s diversity. Such an assessment, begun with the No Free Lunch theorems of David Wolpert and William Macready (see section 4.6), will in this book be taken to its logical conclusion. The conclusion is that Darwinian mechanisms of any kind, whether in nature or in silico, are in principle incapable of generating specified complexity. Coupled with the growing evidence in cosmology and biology that nature is chock-full of specified complexity (cf. the fine-tuning of cosmological constants and the irreducible complexity of biochemical systems), this conclusion implies that naturalistic explanations are incomplete and that design constitutes a legitimate and fundamental mode of scientific explanation.
Both from Introduction to No Free Lunch 10-01-2001
Since then various authors have pointed out the flaws in Dembski’s arguments and his reliance on the “No Free Lunch Theorems”, starting with Wesley Elsberry, Richard Wein, Mark Perakh and culminating with Wolpert, one of the original authors of these theorems.
Wesley Elsberry wrote on June 2001 on Talk.Origins
It’s my opinion that Dembski misconstrues or misunderstands what the NFL theorems say. I’ve passed word along that Dembski’s choice of “No Free Lunch” for the title of a book that is due out this fall sets him up for embarrassment. That’s still the title, so far as I know. It will be interesting to see how the reviews turn out.
2002
Now compare this with Dembski’s reversal
Given my title, it’s not surprising that critics see my book No Free Lunch as depending crucially on the No Free Lunch theorems of Wolpert and Macready. But in fact, my key point concerns displacement, and the NFL theorems merely exemplify one instance (not the general case). The basic idea behind displacement is this: Suppose you need to search a space of possibilities. The space is so large and the possibilities individually so improbable that an exhaustive search is not feasible and a random search is highly unlikely to conclude the search successfully. As a consequence, you need some constraints on the search — some information to help guide the search to a solution (think of an Easter egg hunt where you either have to go it cold or where someone guides you by saying “warm” and “warmer”). All such information that assists your search, however, resides in a search space of its own — an informational space. So the search of the original space gets displaced to a search of an informational space in which the crucial information that constrains the search of the original space resides. I then argue that this higher-order informational space (“higher” with respect to the original search space) is always at least as big and hard to search as the original space
Evolution’s Logic of Credulity: An Unfettered Response to Allen Orr by William Dembski
Or let’s look at the book description on ARN
But by employing powerful recent results from the No Free Lunch Theory, Dembski addresses and decisively refutes such claims.
No Free Lunch announcement on ARN
Or Koons’s review of “No Free Lunch”
Chapter 4, on evolutionary algorithms, is in some ways the crux of the book. It is here that Dembski discusses the No Free Lunch theorems that give the book its title. Dembski demonstrates that so-called “evolutionary” (or trial-and-error) algorithms cannot generate CSI. Instead the problem of generating CSI is simply pushed back from a high CSI result to a very high CSI fitness function. A fitness function is a function that differentially “rewards” or “selects” different trials, whether these trials be algorithms, behavior patterns, neural-net configurations, or biological genotypes. To solve a problem using an evolutionary algorithm, the designer must find a fitness function that can enable the sort of evolution likely to lead to an acceptable solution to the problem. Where there is a large amount of CSI required to solve a problem directly, the NFL theorems entail that an equally large quantity of CSI is required to find a suitable fitness function.
I hope that committed Darwinists will accept this brilliant and illuminating analysis. Even if you are convinced that Dembski is completely wrong about Darwinism, he has made an indispensable contribution to the development of the information-theoretic analysis of evolution. I hope that relatively few commit the genetic fallacy of rejecting Dembski’s work on the grounds that he holds “scientifically incorrect” opinions on other maters.
Or from the Conservative BookClub, we find What are the “No Free Lunch” theorems, and how do they undercut Darwinian theory and support Intelligent Design?
Dembski’s THE DESIGN REVOLUTION: ANSWERING THE TOUGHEST QUESTIONS ABOUT INTELLIGENT DESIGN mentions
35. Displacement and the No Free Lunch Principle
How do the No Free Lunch theorems undercut Darwinian theory and support intelligent design?
On ISCID Iain Strachan
Many critics of Dembski’s use of the “No Free Lunch” theorems take issue with it by pointing out that these theorems are based on averaging over all possible fitness landscapes, the vast majority of which are completely irrelevant, and
represent problems that are not interesting to solve, because the fitness landscape looks like a random array of spikes.I would have to say that I also share this worry. I was aware of the No Free Lunch theorems long before I was aware that Dembski had written a book with the same name and I initially wondered if this might provide a theoretical underpinning to the idea that design cannot be achieved without intelligent input. But then precisely the same objection occurred to me (that the generality of the result severely limited its relevance to practical problem solving), and I felt that one should be cautious in citing the NFL theorems to justify the design hypothesis.
It may be that I have misunderstood Dembski’s argument, and I would certainly like to see him respond to this criticism.
Dembski did not respond on this thread. Nor on this thread where Iain raised the same issue.
Relevant Links:
Information Theory, Evolutionary Computation, and Dembski’s Complex Specified Information by Wesley Elsberry and Jeffrey Shallit
Dembski mistakenly invokes the “No Free Lunch” theorems of [95] as justification for his view that evolutionary computation cannot generate CSI. The NFL theorems compare efficiency between algorithms and characterize averaged performance over all cost functions, but Dembski’s claim is not about average performance: Dembski makes the universal claim that for all evolutionary computation, no instance of CSI can be attributed to any such computation. Dembski’s initial summary of the NFL theorems characterize them as establishing a problem of “regress” to a “higher-order phase space.” This does not appear to reflect what is actually discussed in the papers Dembski cites on NFL. (Search for “regress” in the text of “No Free Lunch theorems for search”, “No Free Lunch theorems for optimization”, and “On the futility of blind search” comes up empty.) The point made by Wolpert and Macready is not that a search of the space of fitness functions must be conducted, but rather
that one must examine the fitness function of interest in order to select an algorithm which will perform more efficiently given that fitness function. This is a far cry from Dembski’s assertion of some “regress” in finding the source of information seen in the output of an algorithm.
Wesley Elsberry collected a number of good essays
THE NO FREE LUNCH THEOREMS AND THEIR APPLICATIONS TO EVOLUTIONARY ALGORITHMS By Mark Perakh
Orr on Boston Review: Book Review: No Free Lunch
Dembski responds to Orr Sheer versus Real Possibilities
The No Free Lunch Theorems and Their Application to Evolutionary Algorithms By Mark Perakh
While essentially Orr’s critical remarks are well taken, their formulation was insufficiently careful from a mathematician’s viewpoint so it may create a distorted impression of what Orr had in mind. Indeed, in a private communication Orr, who is a biologist, provided an explanation of his actual views on the subject of the NFL theorems’ application to Darwinian evolution, and these views are essentially in accord with a proper interpretation of the NFL theorems. Since, however, the rendition of that problem in Orr’s review of Dembski’s book may be confusing for unprepared readers, I will try to clarify the problem by first explaining in way as simple as possible the NFL theorems, and second, briefly discussing their misapplication by Dembski.
The Table of Contents and Preface to William Dembski’s upcoming book, “No Free Lunch.”
Links to “No Free Lunch Theorems
Yin-Yang: No-Free-Lunch Theorems for Search at the 1995 International Conference on Genetic Algorithms (ICGA-95)
William Dembski’s treatment of the No Free Lunch theorems is written in jello By David Wolpert
Recent Results on No-Free-Lunch Theorems for Optimization by Christian Igel, Marc Toussaint
On Classes of Functions for which No Free Lunch Results Hold by Christian Igel, Marc Toussaint
In a recent paper it was shown that No Free Lunch results hold for any subset F of the set of all possible functions from a finite set X to a finite set Y iff F is closed under permutation of X. In this article, we prove that the number of those subsets can be neglected compared to the overall number of possible subsets. Further, we present some arguments why problem classes relevant in practice are not likely to be closed under permutation.
No Free Lunch Theorems for Search (1995) by David H. Wolpert, William G. Macready
No Free Lunch Theorems for Optimization (1996) by David H. Wolpert, William G. Macready
No Free Lunch Theorems: Many good links by Martin Sewell
Critiques and reviews of the works of William Dembski at infidels.org
NO FREE LUNCH: Why Specified Complexity Cannot Be Purchased without Intelligence From the Book — WATCH FOR IT!
55 Comments
Wesley R. Elsberry · 15 June 2004
Mark Perakh · 15 June 2004
Perhaps it is appropriate to point out that the anthology titled Why Intelligent Design Fails: A Scientific Critique of the New Creationism (edited by Matt Young and Taner Edis) which is coming this month from Rutgers University Press, contains, among other chapters, my chapter titled There Is a Free Lunch After All: Dembski's Wrong Answers to Irrelevant Questions. This chapter is mainly about Dembski's misuse of the No Free Lunch theorems and his so called displacement problem.
It looks like changing the tune after having encountered critique (especially from Wolpert, against whom Dembski had no chance as far as the NFL theorems were in question)Dembski made a very awkward attempt to alleviate embarrassment at being rebuffed by the author of the theorems in question - he tried to play down the role of the NFL theorems in his theory. Instead, he stressed the significance of what he calls displacement problem. Unfortunately for Bill Dembski, his appeal to the displacement problem has not improved his position at all but rather should add more embarrassment to him as the problem in point is a phantom, as discussed in detail in the above mentioned chapter.
Koons's accolade and his cry to Darwinists to accept Dembski's allegedly brilliant and illuminating analysis is especially funny given Koons's evident lack of familiarity with information theory and with the essence of the NFL theorems: in a letter to Science Insights (the organ of the National Association of Scholars -see vol. 7, No 5, 2003, at www.nas.org ) Koons wrote that the NFL theorems are part of information theory! This is news for all those having a minimal knowledge of either information theory or optimization theory.
Koons is in general a good comedian. In a blurb to Dembski's Intelligent Design he wrote that Dembski's law of conservation of information was a great breakthrough in information theory. However, after Dembaki alleged law was substantially dismantled by a number of critics, in his (more recent) mentioned letter to Science Insights Koons asserted that Dembski in fact did not suggest any new law but only showed how to apply the NFL theorems to biology, etc. He seemed not to realize that Dembski's pseudo-law and his attempts to apply the NFL theorems stand in no relation to each other. What a circus! With friends like Koons Dembski needs no adversaries.
Hey, Dr. Bill: What about admitting at least a single error for change? After all, you have a place to go - the barbecue stand in Riesel Tx is not a bad backup, is it.
Pim van Meurs · 15 June 2004
Tom English · 16 June 2004
RBH · 16 June 2004
RBH · 16 June 2004
Never mind. Found 'em.
RBH
Tom English · 16 June 2004
All,
Sorry for the duplicate post. My browser said the connection had timed out on the first attempt, and I didn't realize the post might have succeeded.
RBH,
I have some papers, c.v., etc. on my home page, http://www.TomEnglishProject.com. Here's the one where I obtain NFL in terms of conservation of information:
English, T. M. "Evaluation of Evolutionary and Genetic Optimizers: No Free Lunch," in Evolutionary Programming V: Proceedings of the Fifth Annual Conference on Evolutionary Programming, pp. 163-169, 1996.
The following shows that NFL does not hold in the world, and treats search algorithms as operators on probability distributions on problems. The geometry of search in the space of distributions is quite interesting.
English, T. "No More Lunch: Analysis of Sequential Search," to appear.
Mark Perakh · 16 June 2004
Tom English:
Thanks for your comment and the reference to your articles. A few questions:
1. I am confused by the dates you indicated. As far as I am aware, Wolpert's NFL theorems for supervised learning appeared in 1996, and Wolpert-Macready's NFL theorems for search, in 1997, but you refer to 1995 and 1996 in regard to the theorems for search.
2. Have you communicated with Dave Wolpert regarding your proof that the NFL theorems are invalid for the real world problems, and if you did, what his reaction was? (I believe, Wolpert and Macready themselves admitted that their theorems are not very instrumental for solving real world problems, but your approach seems to be even more radical, is it?)
3. Regardless of what Wolpert or other experts may think of it, your discourse is beyond of what non-experts in optimization theory can comprehend. Therefore, IMHO, referring to the averaging as a point negating Dembski's use of the NFL theorems as alegedly proving the impossibility of evolution (which can be easily understood by laymen) seems to be justified. I discussed this point with Dave Wolpert and he did not offer objections to the use of the averaging argument. Cheers, Mark
Tom English · 16 June 2004
More on references
NFL might have become something useless to Dembski if we had been more open to an unusual perspective. The first discussion of algorithmic complexity and NFL came soon after Wolpert and Macready released their tech report. Search for Kihong Park in the 1995 archive at http://www.aic.nrl.navy.mil/galist/. His comments were outstanding, but Wolpert dismissed them, and I was scurrying too much to contemplate learning new math.
I don't know a publication that clearly justifies the claim that run-to-run optimizer adaptation can be immunized against teleology. But consider uniformly sampling the space of optimizer parameters and evaluating the different parameter vectors on a sample of instances drawn according to the problem distribution. The problem distribution is given, and the sample of possible parameter vectors is uninformed, so there can be no teleology. Dembski would claim, perhaps, that it would take a great deal of time to get good results with this approach, but I predict otherwise. To be 99% sure of obtaining a parameter vector with performance ranked in the top percentile requires only a sample of 458 vectors. Massively parallel computers support simultaneous evaluation of many more vectors than that. Thus it may take no longer to discover good optimizer parameters than to evaluate one instance of the optimizer.
Is this invocation of brute-force parallel computation cheating? I'm no biologist, but I hear that nature is plagued with it.
Tom English · 16 June 2004
Mark Perakh · 16 June 2004
Tom English:
Thanks for the clarification. I am familiar with Wolpert's NFL theorems for supervised learning and W-M theorems for optimization to the extent of their published papers (and to the extent allowed by not being a professional mathematician). I did not know, though, that they had predecessors. After W-M paper appeared in 97, there were a few papers wherein limitations of the NFL theorems were discussed. I asked Dave about his opinion of those papers and he largely dismissed them (it was about two years ago, so I don't have references at hand). However, he admitted that the NFL theorems are crude tools shedding light on the relationship between fitness functions and search algorithms but their practical significance is very limited. You seem to be very well versed in the matter, so it'd be interesting if you exchanged views with Dave. He promised to publish a paper (again with Macready) in which to prove that the NFL theorems are invalid in the case of co-evolution - but we are waiting for it for more than a year. Just two weeks ago I asked him once again about that proof but he did no answer my question. It looks like you have such proof, don't you. At least it seems to follow from some of the sentences in your preceding comment. If this is indeed so, it is important as proof that the NFL theorems are of no consequence for biological evolution which is always a co-evolution (search changes the fitness landscape). It is a pity I did not know about the proof in question (if it indeed exists) when writing the chapter for the anthology from Rutgers - it would be a nice argument against Dembski. Anyway, that anthology is supposed to be for general audience, so we could not use sophisticated math in it. I consulted with Wolpert when writing it and he had no critical objection to my text. The loose rendition of the unexspurgated theorem, as you named it, seems to approach the matter from a slightly different angle as compared with the original NFL theorem by W-M: although identical distributions translate also into identical means, the opposite is not necessarily true. Your rendition looks therefore as a broader statement than the NFL theorem by W-M, but on the other hand it is more restrictive in that it limits the types of distributions. Or perhaps I am confused, but don't feel that you owe me any explanation - amateurs do not deserve detailed replies. Cheers, Mark
Tom English · 16 June 2004
Mark,
Most people find my theoretical writing difficult, and I am here because I want to explain things in plain language. If my replies are too detailed, give me a nudge in the ribs.
I suppose you have "practical significance" with respect to biological evolution in mind. In nature the fitness function is nonstationary, the search is random, points are revisited, and the domain of fitness functions cannot be defined as a finite set. These are four substantial violations of the suppositions of the original NFL theorems.
I don't have a coevolution proof. And precisely what to prove is less than obvious. I will give it some thought.
My NFL theorem applies to infinitely many distributions of test functions, while Wolpert and Macready's applies to just one. Mine refers simply to sequences of observed values, while theirs refers to histograms of values. Sequences are simpler for various reasons, and some important aspects of NFL are easy to miss when working with histograms. As you say, "although identical distributions translate also into identical means, the opposite is not necessarily true."
Reed A. Cartwright · 16 June 2004
Would someone mind explaining to me (here or with a link) what the NFL theorems actually state? Specifically the if...then part.
Richard Wein · 17 June 2004
Richard Wein · 17 June 2004
Tom, I'm having difficulty understanding your paper "Evaluation of Evolutionary and Genetic Optimizers: No Free Lunch"
(http://www.tomenglishproject.com/EP96.pdf). I can't even work out where you actually state your NFL theorem! Could you please help.
Tom English · 17 June 2004
Richard,
I don't think that's particular crude. In 1994 I served on the thesis committee of a student addressing function optimization. In his defense he said the ultimate goal was to obtain an optimizer that was superior for all functions, and I pointed out that optimizing a uniformly drawn function was equivalent to "optimizing" the value of a uniform random variable. In the following year, a student who had attended the defense learned of the NFL tech report before I did, and brought me a copy.
Various people understood that no optimizer could be generally superior on "purely random" fitness landscapes prior to 1995. Wolpert and Macready were the first to work out the details, and they did so extremely well, though not without errors.
Something I should have said earlier, and that fits reasonably well here, is that the main NFL results directly address neither evolutionary computation nor biological evolution. Take away the flamboyant attack on fuzzy thinking in EC, and what remains is an outstanding treatment of deterministic sequential search. To debunk the notion that an evolution-inspired algorithm could be generally superior to others, Wolpert and Macready reason about a deterministic sequential simulation of the algorithm. Unfortunately, most of the NFL results cannot be related to EC, and certainly not to evolution, through simulation.
Tom English · 17 June 2004
Hi, Richard.
The interesting thing about it, from the "Icons of ID" perspective, is that the central result is the Conservation Lemma, in which I show that information is conserved in search. NFL follows easily from that. Incidentally, that is the first theoretical paper I ever wrote, so forgive the rough spots.
I wonder if there is a connection between conservation of information in my sense and conservation of complex specified information. I tried to grasp an answer for myself, but the Jello went squish. I would love to hear what someone who has gotten information through a side channel has to say.
Tom
Wesley R. Elsberry · 17 June 2004
Nick · 17 June 2004
Tom English · 17 June 2004
Tom English · 17 June 2004
Reed A. Cartwright · 17 June 2004
Let me see if I have this right.
If the search space is random, then no specific optimizer can do better on average than a random search can.
How does this pertain to optimizing strategies? In other words, instead of picking a specific optimizer, pick a strategy and tune that strategy to each search space encountered. I think this has been answered, but I want to make sure.
Speaking of information gain, I got around to reading Kimura's paper on it about a month ago. It really is an elegant proof.
Under no selection the probability that a new allele goes to fixation is 1/N. If instead, that allele is selected for the probability that it goes to fixation is about 1. Therefore, the "information" gained through selection is equivilent to the uncertianity lost: H(1/N)-H(1) = -log(1/N)/N = log(N)/N. This holds better for larger N.
Tom English · 17 June 2004
Reed,
If you present a test function to a search algorithm and it responds with the sequence of values it observed, there are many ways to measure the quality of the sequence. What we consider the search to be doing is a matter of how quality is measured. For instance, in optimization the objective is sometimes to locate the minimum of the function with as few evaluations as possible. The quality measure, then, is the index of the first occurrence of a minimum in the sequence of observed values.
When there is NFL in search, all search algorithms are indistinguishable by all measures of quality, including measures of optimization quality. That is a very strong statement. There are some non-NFL distributions of test functions for which all search algorithms are indistinguishable under some measure of quality.
Now it happens that there is serious misunderstanding of optimization and NFL. Wolpert and Macready make many comments about optimization speed in their 1997 article, but their theorems do not support that. There is no way to look at a sequence of observed values and tell how long it took the search algorithm to generate it. A fast algorithm and a slow one can visit the domain points in the same order, giving the same value sequence. Optimization quality can be a count of function evaluations, as I illustrated above, but optimization speed cannot be made into a quality of the value sequence.
Wolpert and Macready seem to make counts into speed by assuming that the running time of all search algorithms is linearly proportional to the number of function evaluations. But almost all search algorithms running in linear time are too big to exist in the world. Most of those that are impractically large are not functionally equivalent to smaller algorithms (i.e., they are incompressible). Some are functionally equivalent to smaller, slower search algorithms. The slower algorithm not only does the search, but spends time decompressing itself.
Thus there are intrinsically slower search algorithms. This means that counting function evaluations is a bad way to measure time. The NFL reasoning of Wolpert and Macready does not lead to the conclusion that all algorithms have identical average optimization speed. On the contrary, no algorithm has better average optimization speed than blind search, but some have worse. You may get the impression from some of Wolpert and Macready's writing that everything "balances out" but their argument is asymmetric, and is very much geared toward debunking naive beliefs about evolutionary computation.
The situation with NFL and optimization is much more complicated than Dembski seems to have realized.
Erik 12345 · 17 June 2004
Erik 12345 · 17 June 2004
Reed A. Cartwright · 17 June 2004
Mark Perakh · 17 June 2004
Tom English: you have shed a new light on Wolpert-Macready's paper, and I need some time to digest it. Until now, I had some general idea about the limitations of the NFL theorems, but now you have provided some specifics which I would like to think about. I still think it'd be interesting if you discussed it with David. As to Dembski, there is not much to discuss as he is confused on an elementary level despite his PhD in math. Cheers, Mark
Mark Perakh · 17 June 2004
It looks like Igel and Toissant have proven that for the overwhelming majority of fitness functions encountered in the real world the NFL theorems do not hold. Ha! I am really curious what Wolpert thinks about it. I'll ask him. Most of the other links given in the post that started this thread do not work.
Tom English · 17 June 2004
Tom English · 18 June 2004
Erik 12345 · 19 June 2004
Erik 12345 · 19 June 2004
Erik 12345 · 19 June 2004
Mark Perakh · 19 June 2004
Hi to all: While Panda's Thumb was off, I had an exchange of messages with Dave Wolpert. (For the record, the chapter on NFL in the forthcoming anthology from Rutgers was initially planned to be written by somebody recommended by Dave with him as the second co-author. However, those guys he recommended all finked out. Then I volunteered, as the last resort, to save the chapter by writing it myself - with Dave, again, as a co-author. While I was hitting the keybord, Dave decided that he did not wish to appear in the byline, not because of anything he disagreed with in my text, but rather because there was some misunderstanding among the NFL list members (including the anthology's editors) on matters not related to the gist of the chapter. I was left to do it on my own, which I did.
Now, when replying to my questions in regard to Tom English's and Igel & Toissant's ideas, Dave consented to my partially quoting on PT from his replies.
Here is a quote from Dave's reply to my first inquiry:
>>>Are Igel and Toissant the collaborators of Whitley's? As I recall,
their results characterize whether a subset of the set of all fitness
functions obeys NFL exactly. I never saw the significance of those
results; averaged over all subsets *outside* of those they characterize,
you still have NFL.
It's just like the meaninglessness of Kolmogorov complexity
arguments. You still have zero a priori reason to believe that the
"fitness functions encountered in the real world" favor one algorithm
or another.
Tom English doesn't have anything new to add, as I recall.<<<<
MP: And here is a quote from his reply to my second inquiry.
>>>The *subsets* for which NFL exactly holds may well be a tiny fraction
of all *subsets*. (Not of all functions - that's impossible, since NFL
holds for the subset of all functions.)
And so what? What are the implications for real-world learning/search?
If you know ahead of time (!!!) that your search will be over a
particular subset of all functions, then *of course* in general
algorithm A may outperform algorithm B. That's nothing more than a
(dull) special case of our inner product formula.<<<
MP: Of all the above, I construe his remark about distinguishing between "all functions" and "all subsets" as quite enlightening. While Igel/Toissant's theorem about subsets closed under permutation looks sound, the conclusion that only a tiny fraction of functions (in fact, of subsets, which is a very different story) encountered in the real world meet their condition, in the light of Wolpert's explanation, is not that important. I think I have more or less clarified the matter for myself, mostly in favor of Dave's position.
Tom English: although I am not responsible for Wolpert's remarks in your address, I feel embarrassed as they came in response to my inquiry and therefore I apologize (I did not expect such a wording from Dave). In my inquiry I did not give him any opinions on your contribution, so his response is solely his initiative and his opinion. Mark
Tom English · 19 June 2004
Mark,
No need for apology. I have worked on NFL from 1995 to the present. Wolpert published the opus magnum in 1997 and turned to other research topics. You perceive him as the authority because he wrote the seminal paper, but I think that's a poor criterion.
Most people love to see others extend their work. Others suffer serious cases of NIH (Not Invented Here). There is a reason you have never managed to get Wolpert to respond favorably to work other than his own. There is a reason you have gotten no coevolution work out of him.
As I suggested above, I have emphasized that conservation (unfortunately renamed "no free lunch") is a matter of degree, and will never be absolute in the real world. One of the major points of my latest paper is that search transforms one probability distribution on the functions into another, and the degree of divergence from a precise NFL distriction is preserved in the transformation. The question of whether there could be precise conservation in the real world has been open since 1995, and I resolved it this year arguing in terms of Kolmogorov complexity.
These things have a way of working themselves out in time. Other researchers and I have recently made great progress in conservation. There will be a lot to emerge over the next couple years. And this may be very disturbing for someone who wants everything to stay put.
Richard Wein · 20 June 2004
Tom. I've tried to work through your paper, but I just can't get it. Is there any way you can state your result in simplified terms? I'm used to thinking of NFL theorems in the form "NFL results hold for function distributions meeting such-and-such conditions". Can your theorem be expressed in this way?
Richard Wein · 20 June 2004
Erik 12345 · 20 June 2004
Erik 12345 · 20 June 2004
Mark Perakh · 20 June 2004
Tom English: Perhaps my attitude to Wolpert as an authority is indeed partially due to his having published the seminal article. It reminds me though of an old Russian joke. A radio station answers questions from listeners. One listener asks, "Is it true that Tchaikovski was a pederast?" The radio station answers, "Yes, this is true. However, some people also like him for his music." Besides the fact that Wolpert authored a seminal paper, I happen also to like that paper. I believe it is a very good one. Wolpert indeed has lately abandoned the NFL affair. Oddly, in his recent message to me he suggested that, since he is fed up with that matter, perhaps I take over the mantle of the defender of his theorems. This was absurd as I am not at all qualified to do so, so I even did not reply anything to that idea. As to your theorem, unfortunately the link to it did not work when I tried to read your article, so I have no opinion of it while I readily admit it may perhaps affect my view of Wolpert's theorem. Richard Wein seems to have read it and wrote that it is hard to comprehend. I highly respect Wein's intelligence, so, if he has problems with comprehending your paper, I expect to have such problem as well, and therefore I am joining him in the appeal to you to provide, if possible, an explanation of your theorem in terms as simple as possible (although you certainly owe us nothing). Thank you, and best wishes. Mark
Richard Wein · 21 June 2004
Erik 12345 · 21 June 2004
Reed A. Cartwright · 21 June 2004
Mark Perakh · 21 June 2004
Erik wrote: (the performance measure must be a function of only the sequence of the values of the objective function that the search algorithm encounters).
I don't think there is such a limitation insofar as we are discussing the NFL theorems as such (such a limitaion may, though, possibly arise in some specific applications of the NFL theorems). In many optimization scenarios, performance measure can be, for example, simply the number of iterations the search needs to reach a certain pre-selected value of the fitness function (obvioulsy, this particular choice of the performance measure is not very well suited for biological evolution, but Erik's statement seemed to be meant as a general idea for optimization searches). Perhaps also some other possible performance measures can be imagined not meeting Erik's limitation - to state it, a rigosous proof would be needed with exceptions determined. Just $0.02.
Richard Wein · 21 June 2004
Erik 12345 · 21 June 2004
Erik 12345 · 21 June 2004
Tom English · 21 June 2004
Dear All,
Sorry not to be responding. I am busy at a conference until Wednesday. People filled the room and stood in the doorway for my "No More Lunch" talk. I praised Wolpert and Macready's work, as always.
I met Robert Axelrod yesterday. By chance I had recently developed a minimalistic evolutionary algorithm, with much the flavor of Axelrod's models, to illustrate information gain in a process of random mutation and natural selection. I now believe it has scientific value.
Tom English · 23 June 2004
Mark Perakh · 23 June 2004
Richard Wein wrote: "That's a good point, which completely demolishes my argument. What more can I say?"
Bravo, Richard! You have set an example for all of us. What more can I say? Mark
Tom English · 23 June 2004
Tom English · 23 June 2004
Tom English · 23 June 2004
Pete Dunkelberg · 24 June 2004
Tom, a challenge: first, to note the obvious: if you can only get the message across to Richard, Mark and Eric you are reaching a rather limited audience. You can do better, and the aim of the challenge is to help.
It is of interest to know how the NFL theorem(s) apply, or not, to empirical biological situations. And, sorry to bring this up, it can be difficult to be empirical and uncountable at the same time. So as step 0 of the challenge (this may be the most wrenching part for you :)) I ask you to come down from the continuum, down even from countable, down, down to a number like three or two. Three would be fine although we'll have to add 1 later on.
The challenge is to work a concrete example.
Suppose we have a population of land snails with three phenotypes, let's say three color morphs: light pink, brown, and striped. Let's say the fitnesses of the morphs are 1.1, 1.0 and 0.9 respectively.
The objective is to give the details of a scenario that is both life like and NFL like, that is, the conditions of NFL are satisfied. If this can be done, fine. If not, why not? To make the answer clear, give two scenarios, one life like and the other NFL like.
You have to fill in the blanks from here on:
We need a set of functions. What are they?
The set must be closed under permutations. What permutations? Do the permutations involve associating each of the morphs with each of the fitnesses?
We need a performance measure. What is it?
To go further you may need to consider a mutation that gives us green snails.
Explain the situation(s) with respect to classic NFL, that is, performance of evolution vs blind search.
Explain the situation(s) with respect to your NFL as conservation theorems: what, concretely, is conserved or not in which case?
If for any reason you are not able to work with so simple an example, come as close as possible.
And if you are willing to humor me on this, thanks very much!
Pete
-
"If you can't explain it to the cleaning lady, you don't understand it yourself".
Erik 12345 · 24 June 2004
Pete Dunkelberg, this isn't the realistic example you are seeking, but it is a concrete (if purely fictional) story:
Let me tell you about the intergalactic pet store Unipet's latest business concept: Make-a-pet. The customers of Unipet, coming from all over the universe, have a truly diverse set of preferences and it is difficult to keep a stockroom of pets so that they have something for everyone. For instance, the Beheians would never consider buying any pet that isn't "irreducibly complex". Other customers would never consider buying a pet that is "irreducibly complex". Unipet's strategic surveys and polls show that the number of customer profiles needed to accurately capture all customers is gigantic. Hence the need for Make-a-pet.
The idea behind Make-a-pet is that a pet is tailor-made based on the preferences of each individual customer. Since it is just impossible to learn even a fraction of all languages in the universe well enough to discuss the customer's rather involved preferences, the communication between the customers and the pet engineers is very crude: The pet engineers will show a pet and the customer will simply rate it by holding up a sign with a digit between 0 and 9 on it. (This customer-rating is the objective function.) Based on all previous ratings by the customer, the pet engineers will construct a new pet that hopefully receives a higher rating by the customer, and so on. (The strategy used by the pet engineers to construct new pets based on the customer's ratings of previously constructed pets is the search algorithm. The fact that the customer rating is the only communication between the customer and the pet engineers makes the search algorithm a "black-box" algorithm.) It goes without saying that the most expensive part of this process is the production of new pets, and it is desirable to minimize the number of pets that need to be produced before one with satisfactorily high customer-rating is discovered. But with Unipet's rather advanced technology, the company's experts reasoned that producing pets still beats having a storeroom with a gigantic zoo of pets.
Make-a-pet was neither a stellar success nor a miserable failure. The most remarkable thing about the Make-a-pet project was that it didn't seem to matter which strategy the pet engineers used. They tried hiring the best statisticians in the universe to infer the preferences of individual customers based on their ratings. They tried ignoring the previous ratings by the customer altogether and just follow a fixed sequence of pets. They tried making pets at random. They tried having a population of pets that was allowed to evolve by differential reproductive success without outside intervention, and picking pets at random from this naturally evolving population. (This is how biological evolution comes in. Letting pets evolve in an unconstrained fashion is just one search strategy among zillions of others.) No matter which strategy the pet engineers used, it turned out that, when averaged over a large number of customers, the number of pets that needed to be produced before one with a satisfactorily high customer rating was discovered was the same for all strategies. The best statisticians in the universe performed no better than a random number generator. (This is the NFL theorem.)
When Unipet's statisticians discovered this fact they started thinking about what must be true about the distribution of customer preferences. Theoretically, the preferences of a customer can be represented as a list of pairs between possible pets and ratings between 0 and 9. For instance, such a list could begin with
(giraffe, 0), (cat, 7), (rabbit, 6), (ill-tempered sea-bass, 2), ...
By swapping the scores of two or more pets in such a list, a new list can be created. For instance, swapping the scores of the rabbit and the ill-tempered sea-bass gives the list
(giraffe, 0), (cat, 7), (rabbit, 2), (ill-tempered sea-bass, 6), ...
The criterion for when the strategy used by the pet engineers makes no difference for the average performance is that: All preference lists that can be transformed into each other by swapping scores in the above manner must be equally probable. Preference lists which cannot be transformed into each other in the above manner may have different probabilities, though. (Tom English refers to such probability distributions as "block-uniform", with the blocks being sets of preference lists that can be transformed into other by swapping scores.)
If Unipet doesn't give up on the Make-a-pet project first, the company statisticians may refine the analysis in the future, taking into account that some pets are more expensive to produce than others. When nothing is known about the customer's preferences it makes sense to try the possibilities that are cheap to produce first. Such factors are presently beyond the scope of the analysis, though.