Antievolution.org Discussion Board - Topic: Could ID be "science"? (From PT)

lutsko

Posts: 8
Joined: Nov. 2005

Posted: Nov. 14 2005,08:29

OK, I am here to be crucified. I do not support ID as formulated by Dembski, Behe, etc., but I think some defenders go overboard in denying the possibility that ID could ever be a scientifically respectable hypothesis. An example I mentioned on PT was if a long, easily recognized mathematical sequence (like the first hundred digits of pi, or Euler's constant, or the golden mean, or all of them in sequence) were found encoded in junk DNA. I do not mean any arcane code: suppose each amino acid corresponded to one of the digits from 0 to 3 and the number was coded in base 4.
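A minimal Python sketch of the kind of encoding proposed here, reading it (as later replies point out) at the nucleotide level. The digit-to-base assignment 0→A, 1→C, 2→G, 3→T is a hypothetical choice; any of the 24 one-to-one assignments would serve equally well. Pi's base-4 digits are computed exactly with integer arithmetic using Machin's formula, so floating-point precision is not a limit.

```python
def arccot(x: int, unity: int) -> int:
    """Approximate arctan(1/x) * unity with the alternating series (truncating integer math)."""
    total = term = unity // x
    x_squared = x * x
    n, sign = 3, -1
    while term:
        term //= x_squared
        total += sign * (term // n)
        n, sign = n + 2, -sign
    return total


def pi_base4(n_digits: int, guard_bits: int = 32) -> str:
    """First n_digits base-4 digits of pi, starting with the integer digit 3."""
    frac_bits = 2 * (n_digits - 1)            # each base-4 digit is two bits
    unity = 1 << (frac_bits + guard_bits)      # fixed-point scale plus guard bits
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    scaled = 4 * (4 * arccot(5, unity) - arccot(239, unity))
    scaled >>= guard_bits                      # now roughly pi * 2**frac_bits
    return "".join(str((scaled >> shift) & 3) for shift in range(frac_bits, -1, -2))


DIGIT_TO_BASE = {"0": "A", "1": "C", "2": "G", "3": "T"}  # hypothetical assignment


def encode_as_dna(digits: str) -> str:
    return "".join(DIGIT_TO_BASE[d] for d in digits)


pi_digits = pi_base4(100)
print(pi_digits[:17])                  # 30210033312222020
print(encode_as_dna(pi_digits)[:17])   # TAGCAATTTCGGGGAGA
```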

One could, as suggested on the PT board, conclude that some unknown physical mechanism was at work, such as the ones that produce Fibonacci sequences in nature, but I think it would be hard to say that an "argument from design" could be ruled out. Note that when **I** use the term "design", I only mean that the design was produced by perfectly natural designers, who themselves probably evolved naturalistically: I am a naturalist and would personally not invoke anything supernatural.

So, is it possible to imagine a world in which "design" was a scientifically respectable hypothesis? Shouldn't our political activists be careful about categorically ruling out design on that basis?

Russell

Posts: 1082
Joined: April 2005

Posted: Nov. 14 2005,09:39

Well, I'm game. If, let's say, pi to a hundred digits in base 4 were found in DNA, I can, at least at this moment, think of no plausible explanation for it. I wouldn't rule out, a priori, the possibility that there was an intelligent agency behind it.

That said, as you point out, this is nothing like the situation being discussed by the current champions of ID.

Also, just to forestall nitpickers, I think you mean nucleotides (of which there are, basically, four) not amino acids (of which there are, basically [acidicly?] twenty). And when you say "golden mean", I presume you mean "golden ratio": (1 + (5)^(1/2))/2.

--------------
Must... not... scratch... mosquito bite.

C.J.O'Brien

Posts: 395
Joined: Aug. 2005

Posted: Nov. 14 2005,09:40

I come in peace. (with no desire to crucify anyone-- it just makes a martyr of 'em)

But the difficulty I see here is similar to one with ID proper: what are we saying happened, and when?

Continuing with the thought experiment, let's say that, yes indeed, the earth was visited by tinkering aliens a square billion years ago, and they "encoded" such a sequence into the genome of a protozoan, had a good laugh, and left, never to return. What is to keep that sequence from further mutating in the intervening billion years? Should we be looking for sequences "near" Pi, within the tolerances of the molecular clock?
To go even further, say we identified a very near sequence that later mutated, without human agency, to be a perfect 100 digits of Pi? What should we conclude about that sequence? Was it (to foist on you a stinker of an ID term) frontloaded? How prescient can we make our aliens without stretching credulity to its limits?
And, perhaps most important, what does contemplation of "near-Pi" sequences do to our calculation of the improbability of such sequences, which, remember, is what's leading us to the "design hypothesis" in the first place?
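The "near-Pi" question can be put in numbers under a simple null model, assumed here, in which each of n positions is an independent, uniformly random nucleotide. The number of length-n sequences within d mismatches of a fixed target is the sum over k = 0..d of C(n, k)·3^k, and dividing by 4^n gives the chance that a random site lands that close. A sketch:

```python
from math import comb


def hamming_ball_size(n: int, d: int) -> int:
    """Number of length-n DNA strings differing from a fixed target in at most d positions."""
    return sum(comb(n, k) * 3**k for k in range(d + 1))


def chance_within(n: int, d: int) -> float:
    """Probability that a uniformly random length-n DNA string is within d mismatches."""
    return hamming_ball_size(n, d) / 4**n


for d in (0, 5, 10, 20, 30):
    print(f"100 nt, up to {d:2d} mismatches: {chance_within(100, d):.2e}")
```

Every mismatch the filter tolerates enlarges the target set, which is exactly the tension being raised: the more mutation is allowed for, the less improbable a "hit" becomes.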

--------------
The is the beauty of being me- anything that any man does I can understand.
--Joe G

lutsko

Posts: 8
Joined: Nov. 2005

Posted: Nov. 14 2005,09:47

Quote (C.J.O'Brien @ Nov. 14 2005,15:40)

never to return. What is to keep that sequence from further mutating in the intervening billion years? Should we be

I was actually surprised no one on the PT thread raised this point. Of course, if it happened a billion years ago, one would expect a certain amount of drift, which could, I suppose, be used to date the intervention. I am a physicist, not a molecular biologist, and do not have the numbers at my fingertips, but I suppose that if the encoded sequence were long enough, enough could survive after a billion years for it to be identified to a high statistical accuracy even though errors would have accrued.
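Whether enough would survive can be sketched with the Jukes-Cantor substitution model, chosen here as an assumption: all substitutions equally likely, sites independent, no insertions, deletions, or loss of the segment. Under it, the expected fraction of sites still matching the original after substitution rate mu over time t is 1/4 + 3/4·exp(-4·mu·t/3). The sketch compares that with the 1/4 match rate expected by chance, as a binomial z-score for an insert of n sites. The rates used are illustrative placeholders, not measured values for any organism, and the conclusion flips depending on which one you pick.

```python
from math import exp, sqrt


def jc_identity(rate_per_year: float, years: float) -> float:
    """Expected fraction of sites still identical to the original under Jukes-Cantor."""
    return 0.25 + 0.75 * exp(-4.0 * rate_per_year * years / 3.0)


def z_above_chance(identity: float, n_sites: int) -> float:
    """Binomial standard deviations by which the expected matches exceed chance (p = 1/4)."""
    chance_sd = sqrt(n_sites * 0.25 * 0.75)
    return (identity - 0.25) * n_sites / chance_sd


RATE = 1e-9  # placeholder: substitutions per site per year
for rate in (RATE, 5 * RATE):
    ident = jc_identity(rate, 1e9)
    for n in (100, 10_000):
        print(f"rate={rate:.0e}, n={n:6d}: identity {ident:.3f}, z = {z_above_chance(ident, n):.1f}")
```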

C.J.O'Brien

Posts: 395
Joined: Aug. 2005

Posted: Nov. 14 2005,10:05

Quote

if the encoded sequence were long enough, enough could survive after a billion years for it to be identified to a high statistical accuracy even though errors would have accrued.

But, regarding my last point, wouldn't this "expanded" filter, covering every sequence we would deem improbable enough to even begin considering as designed, create a situation where essentially random-looking sequences have to be considered too?

This sort of destroys the rationale for the inference in the first place.

--------------
The is the beauty of being me- anything that any man does I can understand.
--Joe G

Russell

Posts: 1082
Joined: April 2005

Posted: Nov. 14 2005,10:17

Quote

I was actually surprised no one on the PT thread raised this point. Of course, if it happened a billion years ago, one would expect a certain amount of drift which could, I suppose, be used to date the intervention.

Well, since the whole scenario is contrary to our experience, what's one tiny increment of implausibility?

But, no. Without some kind of "selective pressure" I don't think there would be enough unmutated Pi to be able to recognize it after a billion years of genetic drift.

That's why if something like that were observed you would have to look for some explanation outside of the current theory of evolution.

Note, incidentally, that this is pretty much parallel to the prime number SETI radio signals in Carl Sagan's fictional work, "Contact". Sagan, being safely dead, is regularly trotted out by Dembski & co. in this context as if validating their project. If Sagan were not dead, I have no doubt he would take strong and eloquent exception to this abuse of his work. As Panda's Thumb has documented, the SETI people have no use for ID. (I'd give you the link, but I'm not that dextrous with this system just yet.)

--------------
Must... not... scratch... mosquito bite.

lutsko

Posts: 8
Joined: Nov. 2005

Posted: Nov. 14 2005,10:25

Russell,
I am not a biologist and cannot claim to provide a bullet-proof example. However, I never said the sequence had to be a billion years old - even so, it would depend on the mutation rate.

The SETI business is a red herring. SETI looks for narrow-band signals and that is not relevant to finding a message in DNA.

So I would claim the challenge remains: a recognizable, simply coded sequence outside the bounds of chance is found embedded in the DNA of some organism: would it be unscientific to allow for design as an explanation?

Just to make it a little bit more juicy: suppose some rich, deluded person gave a group of molecular biologists and computer scientists a load of cash to look for such messages - would that be a valid "scientific" project?

C.J.O'Brien

Posts: 395
Joined: Aug. 2005

Posted: Nov. 14 2005,10:52

Quote (lutsko @ Nov. 14 2005,16:25)

So I would claim the challenge remains: a recognizable, simply coded sequence outside the bounds of chance is found embedded in the DNA of some organism: would it be unscientific to allow for design as an explanation?

Just to make it a little bit more juicy: suppose some rich, deluded person gave a group of molecular biologists and computer scientists a load of cash to look for such messages - would that be a valid "scientific" project?

My answers: no, and no.

No, it would not be categorically "unscientific" to posit the design of (some part of) a genome. After all, xenobiologists from Mars would be incorrect if they did not attribute the genomes of genetically engineered organisms to human design.

But, the search, absent a priori reasons to believe design occurred, would be about as productive as an intensive search of pre-Cambrian strata looking for Haldane's rabbit. (That is, not very.)

--------------
The is the beauty of being me- anything that any man does I can understand.
--Joe G

Russell

Posts: 1082
Joined: April 2005

Posted: Nov. 14 2005,11:43

Oh dear. I'm afraid we don't have a very stimulating argument here, because I'm afraid I agree. As I said before: No, I would not rule out "design" in the case described.

Much in the same sense that I would be hard pressed to rule out just about any bizarre explanation if, when Neil Armstrong stepped onto the moon, it turned out it was made of green cheese.

Now, lest I've just inadvertently created a monster, I want to make it clear that high school science classes should not spend any time on the green-cheese-moon theory.

(Just for fun, though, when I have a moment I'll work through some of the math on the Pi in DNA scenario.)

--------------
Must... not... scratch... mosquito bite.

sir_toejam

Posts: 846
Joined: April 2005

Posted: Nov. 14 2005,15:33

and back to the title of the thread..

as i was trying to present in the original post on PT, the question of whether this is science or not is summed up nicely by CJ, though i tended to ramble on a bit in the original PT discussion.

"But, the search, absent a priori reasons to believe design occurred, would be about as productive as an intensive search of pre-Cambrian strata looking for Haldane's rabbit. (That is, not very.)"

it all boils down to what those a priori reasons would be based on, and as far as i can see, currently those are all subjective when we are discussing anything other than ourselves.

it doesn't rule out that there might appear objective evidence that would then give us a priori reasons in the future, but those don't exist now.

sir_toejam

Posts: 846
Joined: April 2005

Posted: Nov. 14 2005,15:45

Quote

suppose some rich, deluded person gave a group of molecular biologists and computer scientists a load of cash to look for such messages - would that be a valid "scientific" project?

uh, correct me if I'm wrong, but didn't you in the same post claim that SETI was a red herring?

why not just simplify the argument and discuss whether SETI itself is a truly scientific endeavor.

normdoering

Posts: 287
Joined: July 2005

Posted: Nov. 14 2005,17:29

Quote (lutsko @ Nov. 14 2005,14:29)

... I think some defenders go overboard in denying the possibility that ID could ever be a scientifically respectable hypothesis. ...

I doubt you'll ever have an ID hypothesis that points at God for one simple reason: I don't think there is a God.

However, in the future, lawmakers and governments may want to know if viruses and germs are evolved or designed so they know when a crime or an act of war has been committed.

They may want to tell genetic doping apart from naturally occurring differences between people.

In those cases the things to look for would be:

1) A lack of similar enough ancestors. (too big an evolutionary jump for too few generations)
2) Lack of junk DNA (though it might be faked)
3) Motive for design.

Can anyone add more?

Hyperion

Posts: 31
Joined: June 2005

Posted: Nov. 14 2005,19:58

In the same vein as similar ancestors, I'd add genetic diversity. One of the few leads in the anthrax cases a few years back was that scientists determined that the spores were all of the Ames strain from Ft. Detrick's labs. Bioweapons would presumably be genetically homogeneous, whereas a wildtype epidemic would have much variation. Look at the multiple strains of ebola, HIV, and influenza that exist in nature.

Genetic homogeneity is a good sign of design. Conversely, genetic diversity is a hallmark of evolved systems.
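One standard way to put a number on "genetic homogeneity" is nucleotide diversity: the average proportion of differing sites over all pairs of sequences in a sample. A minimal sketch, assuming the sequences are already aligned to equal length; the toy sequences below are made up for illustration.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise proportion of differing sites among equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


clonal = ["ACGTACGTAC"] * 5  # e.g. samples from a single lab stock
varied = ["ACGTACGTAC", "ACGAACGTAC", "TCGTACGTAA", "ACGTTCGTAC", "ACGTACCTAC"]
print(nucleotide_diversity(clonal), nucleotide_diversity(varied))  # 0.0 0.2
```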

Tim Hague

Posts: 32
Joined: Nov. 2005

Posted: Nov. 14 2005,20:55

How long would it take for a genetically engineered 'bioweapon' virus or bacteria to start showing diversity once it's released? Not long, I would think.

Also, someone engineering bioweapons would probably create more than one similar strain, specifically to avoid a single vaccine being effective.

sir_toejam

Posts: 846
Joined: April 2005

Posted: Nov. 14 2005,21:13

Quote

Genetic homogeneity is a good sign of design

in a lab, maybe. in the field don't forget the effect of bottlenecks.

for example, california sea otters have a very high degree of homogeneity. does that mean they were designed?

nope, it means they were hunted to the point where there were only one or two mating pairs left that served as the nucleus for most of the otters now existing off the coast.

same with elephant seals iirc, and many populations of african lions (tho in that case it was mostly due to a disease).
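The bottleneck effect can be sketched with the textbook Wright-Fisher expectation, assumed here as an idealized model (random mating, constant effective size N, no mutation or selection): expected heterozygosity decays as H_t = H_0·(1 - 1/(2N))^t. The starting heterozygosity, sizes, and generation count below are illustrative, not estimates for sea otters or any other species, and holding N fixed for 20 generations exaggerates a brief real bottleneck.

```python
def expected_heterozygosity(h0: float, n_effective: int, generations: int) -> float:
    """Expected heterozygosity after `generations` of drift at constant effective size."""
    return h0 * (1.0 - 1.0 / (2.0 * n_effective)) ** generations


H0 = 0.3  # illustrative starting heterozygosity
for n in (4, 50, 10_000):
    h = expected_heterozygosity(H0, n, generations=20)
    print(f"N={n:6d}: H after 20 generations = {h:.3f} ({h / H0:.0%} of H0)")
```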

Tim Hague

Posts: 32
Joined: Nov. 2005

Posted: Nov. 14 2005,21:41

Quote

A lack of similar enough ancestors. (too big an evolutionary jump for too few generations)

The problem with that is that big changes sometimes do happen - particularly with transposons or retrotransposons. Or frame shift mutations. Or even symbiogenesis.

I think overall there is a problem with detecting design. This is the same problem encountered by the ID proponents - their 'design detection mechanisms' have been shown to be useless over and over again.

I realise that what they are trying to show is 'supernatural' design, but I think the same problems occur when trying to demonstrate human design as well.

This was one of the points I was making on the original thread - if hypothetically some (well funded) IDist splices a whale gene into a bacterium and claims it was found in the wild (therefore 'blowing evolution out of the water' yadda yadda) - how do you show it was designed - and specifically how do you show it was designed by a human and not some unspecified supernatural designer?

claw

Posts: 3
Joined: Nov. 2005

Posted: Nov. 15 2005,00:00

Dear Lutsko

Thanks for coming over to the Bar. Others have already said most of what I wanted to say. If you want to take this further, you might want to have a crack at the questions I asked in the other thread.

1. What sequence of pi could not be explained by known genetic events?

2. What sequence of pi could not be explained by as-yet-unknown naturalistic processes?

3. What is more likely, 100 binary places of pi or 1700 consecutive GAA triplets on a chromosome?
See http://www.ich.ucl.ac.uk/cmgs/neuro99.htm

4. How far into pi does your phone number occur?
See http://www.angio.net/pi/piquery

5. What are the chances of finding self-referential loop sequences in transcendental numbers? Would this be evidence for design?
See http://www.angio.net/pi/piquery

PaulK

Posts: 37
Joined: June 2004

(Permalink)

Posted: Nov. 15 2005,01:36

Intelligent design could, in principle, be scientific. But I don't believe that a single data point - no matter how puzzling - is enough.

What it would take in my view is the production of fruitful design hypotheses. That is constructing hypotheses about what the Designer would do that lead to predictions about what we will find.

Wesley R. Elsberry

Posts: 4966
Joined: May 2002

Posted: Nov. 15 2005,02:27

Back in 1997 at the first DI CRSC conference, we critics were asked to say that ID either was scientific or could, in principle, be scientific. We asked what would an ID hypothesis look like, and how would it be tested. We were told that would follow in the fullness of time.

Time seems to be getting fuller and fuller, but so far that specification of an ID hypothesis and means of testing it remains firmly backordered.

ID advocates need to tell us what must be true if their conjecture is true. So far, all we get is Paleyist "looks designed to me" stuff and "evolution doesn't explain X".

As for design detection, try out The Advantages of Theft Over Toil. Then go on to Information Theory, Evolutionary Computation, and Dembski's Complex Specified Information.

--------------
"You can't teach an old dogma new tricks." - Dorothy Parker

Russell

Posts: 1082
Joined: April 2005

Posted: Nov. 15 2005,06:08

ooh ooh ooh! can I play?

Quote

1. What sequence of pi could not be explained by known genetic events?

None, of course. What would seem to call for an explanation is the remarkable coincidence that such a sequence should occur in a genome. To wit: 100 nucleotides of a pre-specified sequence having no conceivable (or at least no conceived of as yet) connection to the biology, chemistry or physics of DNA, should occur by chance about once in every 4^100, or 1.6x10^60 sites examined. H. sapiens, for example, has about 3x10^9 sites to examine, so the odds of finding it by chance would be around 2x10^-51. Examine a million or so more completely independent genomes (of course, they don't exist, due to common descent, but just for the sake of argument...) and you've only increased the chances to 2x10^-45. In other words, finding it would so defy the odds it would seem to call for some explanation. But now include the possibility of finding either pi, or 'e', or the Golden Ratio, or any of about 3 billion more irrational numbers, and your chance of finding one of them is pretty good. (This is essentially what Dembski does.)
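The figures above can be reproduced in a few lines under the stated simplification that every site is an independent, uniformly random start point (ignoring overlaps and common descent). Under this same naive model, multiplying the million-genome figure by 3 billion candidate constants gives roughly 5.6x10^-36, so how "good" the combined chance becomes depends heavily on how many targets, and how short a match, one allows.

```python
target_len = 100
p_site = 4.0 ** -target_len        # chance a given site starts the exact 100-nt target
human_sites = 3e9                  # approximate human genome length used in the post

p_one_genome = human_sites * p_site          # expected hits in one genome (~ a probability, since tiny)
p_million_genomes = 1e6 * p_one_genome       # a million independent genomes
p_3e9_constants = 3e9 * p_million_genomes    # also allow 3 billion different target constants

print(f"4^100          = {4.0 ** target_len:.2e}")
print(f"one genome     = {p_one_genome:.1e}")
print(f"1e6 genomes    = {p_million_genomes:.1e}")
print(f"+3e9 constants = {p_3e9_constants:.1e}")
```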

Quote

2. What sequence of pi could not be explained by as-yet-unknown naturalistic processes?

None, of course.

Quote

3. What is more likely, 100 binary places of pi or 1700 consecutive GAA triplets on a chromosome?

The latter, since they can arise by known mechanisms of polymerase "stuttering" and homologous recombination.

Quote

4. How far into pi does your phone number occur?

Don't know, but it should be on the order of 10,000,000 decimal places (16,666,667 DNA base pairs) - not including the area code. I.e., it has a good chance of being found at least a hundred times in the human genome. In fact, I think that might be where the telemarketers found it.
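The rough numbers follow from treating both pi's digits and the genome as random sequences (for pi this is widely believed but unproven, and the genome is not random at all). A specific k-digit string first turns up on the order of 10^k digits in, and each decimal digit carries log2(10)/2, about 1.66, nucleotides' worth of information at two bits per base. A sketch with a 7-digit local number as the assumed target:

```python
from math import log2

k = 7                                   # digits in a local phone number (assumption)
expected_position = 10 ** k             # order-of-magnitude position of first occurrence
nt_per_decimal_digit = log2(10) / 2     # two bits per nucleotide
nt_span = expected_position * nt_per_decimal_digit
genome_nt = 3e9                         # approximate human genome length

print(f"~{nt_span:,.0f} nt per expected occurrence")
print(f"~{genome_nt / nt_span:.0f} expected occurrences in a genome-length random sequence")
```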

--------------
Must... not... scratch... mosquito bite.

DaveRAFinn

Posts: 15
Joined: April 2005

Posted: Nov. 15 2005,06:33

One of the main reasons that ID has a bad reputation is that the proponents confuse the concepts of intelligent creator and god. The two are quite distinct and indeed incompatible - an intelligent creator is, by definition, working within rules to achieve an effect, a god, presumably, is above rules and operates by whim.

An example of a scientific intelligent creator theory can be obtained by noting that any gravitationally dominated universe is unstable (see Einstein). A stable universe can only be obtained by use of a feedback system. One may offer the theory that since the universe is highly chaotic (in the technical sense), contains black holes (which affect curvature) and has intelligent life it has the necessary components of a feedback system (detector, amplifier, corrective effect) and may eventually acquire one once we have learnt how to play our part. The theory that the universe is optimised for this condition is an ID theory. It is testable - the optimisation can be checked and proved (or more probably disproved). Within any physical theory of this type the optimisation formula is the only representation of ID - essentially the formula is the ID, anything else is speculation and the ID would have a similar function to any other natural law. The only difference being that it is intentional in the sense that it is goal directed - the effect precedes the cause.

lutsko

Posts: 8
Joined: Nov. 2005

Posted: Nov. 15 2005,07:25

"1. What sequence of pi could not be explained by known genetic events? Etc."

I think good answers to these questions were given a few posts up. One interesting question asked was "what are the odds if you consider all the possible irrational numbers one might look for"? Here, I think it is necessary to step back and consider: why would a designer embed such a message in the genome? Clearly, it is **because** he wants to leave a signature. Being a super-capable genetic engineer, I suppose he would know how many base pairs there were, etc., and would make the message unmistakeable, which I take to mean ridiculously outside the bounds of chance. Whether that means pi to 100 digits or Euler's constant to 10,000, I don't know, but it wouldn't make sense otherwise.

[I have no trouble with this sort of speculation. The ID'ers wouldn't like it because it requires "reading the mind of God". God would of course leave a signal just ambiguous enough to be explainable by chance but convincing to the ID'ers - like he always does.]

Hyperion

Posts: 31
Joined: June 2005

Posted: Nov. 15 2005,09:00

I think that it becomes obvious pretty quickly, reading through our hypotheses, that one cannot know what evidence would point to a designer without first knowing the identity, or at least abilities and constraints, of the designer.

Since the ID camp has pretty much refused to state anything about their hypothetical designer, and in many cases their Fellows have implied that their "theory" does not describe the designer at all and never will, there is simply no method of knowing what might constitute evidence for design.

No ID hypothesis or theory can be considered even testable unless it defines the "designer," and describes its methods, as I think our little exercise in hypotheticals here has shown quite well.

Russell

Posts: 1082
Joined: April 2005

Posted: Nov. 15 2005,09:04

I think Lutsko is on to something here! Far more useful than any of the games they're playing now, the ID crowd should immediately drop everything else and begin scouring the genetic databanks for any of the 3 or 4 most obvious irrational numbers. If and when they find any of them to at least 100 base pairs, using any of the 24 possible one-to-one nucleotide to base-4 digit assignment schemes, they will have earned some attention. Until then, their demands for attention are just a nuisance.
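The 24 assignment schemes are just the 4! permutations of the digits 0-3 over A, C, G, T. A minimal search sketch: the toy sequence is made up, with the first 16 base-4 digits of pi planted in it under one hypothetical assignment; a real search would stream genome files and use a much longer target.

```python
from itertools import permutations


def all_assignments():
    """Yield every one-to-one mapping from nucleotides to base-4 digits (4! = 24)."""
    for digits in permutations("0123"):
        yield dict(zip("ACGT", digits))


def find_target(seq: str, target_digits: str):
    """Return (mapping, position) pairs where seq, read under that mapping, contains the target."""
    hits = []
    for mapping in all_assignments():
        translated = "".join(mapping[base] for base in seq)
        pos = translated.find(target_digits)
        while pos != -1:
            hits.append((mapping, pos))
            pos = translated.find(target_digits, pos + 1)
    return hits


PI_BASE4_PREFIX = "3021003331222202"                    # first 16 base-4 digits of pi
toy = "GATTACA" + "CGTAGGCCCATTTTGT" + "CAGT"           # hypothetical sequence
for mapping, pos in find_target(toy, PI_BASE4_PREFIX):
    print(pos, mapping)   # 7 {'A': '1', 'C': '3', 'G': '0', 'T': '2'}
```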

--------------
Must... not... scratch... mosquito bite.

Bulman

Posts: 8
Joined: Nov. 2005

(Permalink)

Posted: Nov. 15 2005,10:57

I feel a strong urge to bring up the "poker response". A royal flush is a highly unlikely hand to get, but given that you were dealt five cards, any hand you could possibly hold has the exact same odds of occurring.

"Junk DNA" read out to a hundred digits, assigning the numbers 1-4 arbitrarily to the nucleotides, will give you some string of a hundred digits, each as likely as any other. Getting Pi would be nothing more than "neat". Maybe even "neat-o".

Although numerology can be fun, and is a science according to Behe (SATB), it should be used for entertainment only.

Russell

Posts: 1082
Joined: April 2005

(Permalink)

Posted: Nov. 15 2005,12:26

Bulman: you're right, the odds of getting a royal flush are the same as the odds of getting any hand of 5 cards. But the odds of saying: "The next hand I am dealt will be 2H,QD,JS,5H,KC" - and then getting it, in that order no less - would be a truly remarkable coincidence, or demand a nonchance explanation.

So, yes, finding any sequence of 100 nucleotides corresponding to an irrational number would be no more than "neat". But saying, in advance, that you're going to find one in particular - like Pi - and then actually finding it. Well, that would be impressive.
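The difference between the two framings is the chance of some hand in a class versus one exact hand named in advance, optionally in a named order. For a standard 52-card deck:

```python
from math import comb, perm

unordered_hands = comb(52, 5)    # distinct 5-card hands, order ignored
ordered_deals = perm(52, 5)      # distinct 5-card deals, order of dealing counted

p_royal_flush = 4 / unordered_hands       # any of the 4 royal flushes
p_named_hand = 1 / unordered_hands        # one specific hand named in advance
p_named_in_order = 1 / ordered_deals      # that hand, dealt in the named order

print(unordered_hands, ordered_deals)     # 2598960 311875200
print(f"{p_royal_flush:.2e} {p_named_hand:.2e} {p_named_in_order:.2e}")
```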

Dembski & co. are right if they say that finding such a thing would deserve more than a passing "neat-o" (provided, as I stipulated above, that (1) we know what we're looking for, (2) we define our "translation algorithm" before we start looking, and (3) what we're looking for has no plausible mechanism, like polymerase stuttering and homologous recombination, to explain it). But they're dead wrong in claiming that anything seen in any genome to date does anything like that.

--------------
Must... not... scratch... mosquito bite.

Bulman

Posts: 8
Joined: Nov. 2005

(Permalink)

Posted: Nov. 15 2005,12:42

"But the odds of saying: "The next hand I am dealt will be 2H,QD,JS,5H,KC" - and then getting it, in that order no less - would be a truly remarkable coincidence, or demand a nonchance explanation."

You're right, that does change things quite a bit from the regular poker argument. I will admit that your argument is better. However, I must succumb to the tempting practice of changing the argument when faced with valid refutation:

Lottery numbers are picked (or randomly generated) prior to drawings, and in the "PowerBall" versions the sequence does play a role.

**Edit** But probability-wise, let them go on a merry chase and perhaps it will advance science by serendipity.

WayneFrancis

Posts: 4
Joined: Nov. 2005

Posted: Nov. 15 2005,19:31

From the original poster
"I do not mean any arcane code: suppose each amino acid corresponded to one of the digits from 0 to 3 and the number was coded in base 4."

First, amino acids are base 22 in DNA. What you are talking about is nucleotides.

And even at that simple level we are left with 24 different combinations we would have to search for, which is far better than the 1,124,000,727,777,607,680,000 different combinations we would have to search for if we worked at the amino acid level.
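Both counts are numbers of one-to-one assignments of symbols to digits: 4! for the four nucleotides and 22! for the 22 amino acids counted in the post. A quick check:

```python
from math import factorial

print(factorial(4))            # 24 nucleotide-level assignment schemes
print(f"{factorial(22):,}")    # 1,124,000,727,777,607,680,000 amino-acid-level schemes
```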

Add to this the arbitrary choice of which "constant" to look for, along with the arbitrary "precision" to which we would look for it, and the search becomes useless.

Hundreds or even thousands of constants to choose from, times the vast range of precisions we could look for in each constant, times 24 different translations into base 4, leaves you with a very good chance that you will find some constant of some significant precision in some genome of some individual.

We should also not expect an Intelligent Designer to put PI into any genome, because PI cannot be represented exactly in digital form, for 2 reasons:
1) digital systems cannot represent irrational numbers exactly
2) digital systems need an agreed-upon standard to translate real numbers, rational and irrational, into a digital format.
This means we have to know how many digits are used to represent a standard number and how many of those digits represent the scale of said number.

Most of our computer systems use the following layout for a double-precision floating point number (i.e. 64 bits):
Sign Bit = 1 bit
Exponent = 11 bits
Fraction = 52 bits
There is some serious logic in the use of floats. The standard took a lot of work before it was agreed upon, and I don't expect anyone who is not familiar with the IEEE 754 standard to be able to look at

11110000 11001100 10101010 00001111

and tell me what that binary number represents.
Just as I could not expect any scientist to look at

GCAGGTTAACAAGGAGTTTGCTAGAT

and tell me what number that represents. Without a documented standard you would have no hope.

Do you have God's, I mean the Intelligent Designer's, standards handy?
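The dependence on a shared standard can be shown with Python's standard struct module. The bit string in the post is 32 bits, so the sketch reads it both as an unsigned integer and as an IEEE 754 single-precision float (choosing single precision is itself an assumption), then splits pi's 64-bit double into the 1 sign, 11 exponent, and 52 fraction bits listed above.

```python
import math
import struct

bits = "11110000 11001100 10101010 00001111".replace(" ", "")
raw = int(bits, 2).to_bytes(4, "big")

as_uint32 = struct.unpack(">I", raw)[0]     # same bytes read as an unsigned integer
as_float32 = struct.unpack(">f", raw)[0]    # same bytes read as an IEEE 754 single

# Split pi's IEEE 754 double into sign, exponent and fraction fields.
(pi_word,) = struct.unpack(">Q", struct.pack(">d", math.pi))
sign = pi_word >> 63
exponent = (pi_word >> 52) & 0x7FF          # biased by 1023
fraction = pi_word & ((1 << 52) - 1)

print(as_uint32, as_float32)
print(sign, exponent - 1023, hex(fraction))  # 0 1 0x921fb54442d18
```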

WayneFrancis

Posts: 4
Joined: Nov. 2005

(Permalink)

Posted: Nov. 15 2005,19:39

Quote

Whether that means pi to 100 digits or Euler's constant to 10,000, I don't know, but it wouldn't make sense otherwise.

so you do not know something, but you know it makes sense?

Please hand me over the "super-capable genetic engineer"'s standards for storing real numbers in a base 4 digital storage system. Since you "know" it to be a sign of the "super-capable genetic engineer", then you must "know" the standards said "super-capable genetic engineer" uses. Without said standards we have Buckley's of finding this "unmistakeable message" left behind for us.

Tim Hague

Posts: 32
Joined: Nov. 2005

(Permalink)

Posted: Nov. 16 2005,22:16

Quote

ooh ooh ooh! can I play?

1. What sequence of pi could not be explained by known genetic events? None, of course.

Unfortunately, you are wrong.

Assuming some standard of coding was agreed on there is no particular reason why a human scientist couldn't manufacture a DNA sequence representing pi and splice it into a living organism.

That would count as a known genetic event.

WayneFrancis

Posts: 4
Joined: Nov. 2005

Posted: Nov. 17 2005,04:09

GCAGGTTAACAAGGAGTTTGCTAGAT

is one way to code PI to 16 digits. For this example, whether it is 16 digits or 1600 digits doesn't really matter.
Say you find that sequence in organism X
Organism Y is thought to be closely related to organism X
Looking at Organism Y we find the sequence

GCTGGTTAACAAGGAGTTTGCTAGAT

What does this mean? The above sequence no longer = PI to any digits. How does this genetic sequence differ from the one above that equals PI in practical terms?
It doesn't! In organism Y we don't find PI. We find a sequence that if we have a point mutation it becomes PI. But both sequences will function exactly the same because they code the same amino acids.
So there you have a mutation that could occur from Organism Y to Organism X, and the end result is that it produces one of many sequences that can be interpreted as PI.
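The two sequences can be compared mechanically. This sketch counts mismatches and translates both in reading frame 0 with the standard genetic code, packed in the common "TCAG" order; which frame, if any, such a sequence would actually be read in is an assumption. In that frame the single change turns GCA into GCT, both alanine, so it is synonymous; note that the same frame also runs into stop codons (TAA, TAG).

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def translate(seq: str) -> str:
    """Translate complete codons in frame 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


x = "GCAGGTTAACAAGGAGTTTGCTAGAT"
y = "GCTGGTTAACAAGGAGTTTGCTAGAT"
print(hamming(x, y), translate(x), translate(y))  # 1 AG*QGVC* AG*QGVC*
```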

But I've said it once and I'll say it again: unless you have the Intelligent Designer's standards for encoding floats into nucleotides, it doesn't matter. You can define an IEEE-style standard before going to look for PI to 1 million digits in any organism, but it doesn't matter. What are the odds that you'll pick the same standard for encoding a float in a digital format that an Intelligent Designer decided to use?

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 17 2005,06:11

I haven't read the other ABC comments, but I want to add something I think is pertinent from my late replies on the Panda's Thumb thread.

First, in response to the subject: No, ID is claptrap. However, the mathematically rigorous field of decryption could be viewed as a set of methods for discovering human agency in data, and this strikes me as a better framework in which to cast the "finding pi" thought experiment.

The reason I favor this view is that it gives a simple counterexample to the notion that "finding pi" is always dismissible as cherry picking. This is easier to follow than my previous points about restricting the family of encodings or detecting string compressibility, so I present an improved thought experiment. It's better than the original because it doesn't depend on ridiculous assumptions about finding things in DNA that you will almost certainly never find there.

Suppose you're employed in a counter-terrorism agency to analyze communications for suspect content. A message comes to you to analyze. You know nothing about the senders, but roughly, it is a short message "Attached is the bacterial gene sequence you requested. If you need any more help, feel free to contact our lab any time. Sincerely, [etc.]" And there is an attached file that looks like a DNA sequence.

Assume we've already ruled out that this is a bacterium of interest to putative terrorists. You're basically a math geek who knows no biology and are employed to find encrypted messages.

One thing to note here is that it's not always feasible to decrypt a single message in isolation. Decryption experts are most successful when they have a long series of communications to analyze. But in this case, you have some time to spare and you get lucky.

After running the usual kinds of frequency analyses and finding nothing that suggests the string is not random, you're about ready to put it aside and go to the next problem.

But, being a math geek, you decide to see what happens if you try each of the 24 different ways of interpreting the bases A,C,T,G as bit pairs 00,01,10,11 and convert the string into bits according to that code. For each of these 24 strings, you take a prefix of the binary digit expansion of pi, XOR it against the string at every starting position, and interpret the resulting string as ASCII encoded text. Note: the likelihood that you did all of this is pretty low, but it is a plausible hypothetical scenario.

On a modern computer, all the tests should run in a matter of seconds. You do a frequency analysis on the resulting strings, and one of them shows a substring that looks statistically like English plaintext written in ASCII. You zoom in for a closer look and it is a paragraph of about 1k of grammatical English text that appears as a result of decoding the gene sequence as A=10, C=11, G=01, T=00 and XORing the resulting string with a prefix of the bit expansion of pi starting at the beginning of the text. Thus, the text has been XORed with about 8k bits of pi starting from the beginning.

Question 1: At this point do you dismiss your find as cherry picking or do you consider it more likely that the English text is an encrypted message put there by human agency and include this in a report as a significant finding?

Question 2: If your answer to the above is the former (dismiss as cherry picking), then do you believe that this "XOR with pi" cipher is a strong encryption method suitable for sending encrypted messages that will not be cracked by suitably motivated decryption experts?

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 17 2005,07:00

Let me add one comment to the previous post to counter objections that I have not "found pi" in the string but only hypothesized that it was used as an encryption pad.

Consider the same scenario, but instead of finding 1k of English text after the transformation, you find a 1k block of zero byte values. In that case, it would literally be the case that the string contained 8k bits of the prefix of pi encoded as one of the 24 substitution codes from nucleotide bases to base-4 digits.

That would be less interesting for counter-terrorism purposes than finding what looked like a message in human natural language, but I can ask the same question. Did I "find pi" because I engaged in cherry picking, or can I reasonably conclude that the inclusion of pi in the file is significant and refutes the null hypothesis that the string is merely a sequence taken from ordinary wild-type bacterial DNA?

And if you still explain it as cherry picking, do you think that the presence of pi in the sequence would escape the notice of decryption experts who put sufficient effort into finding a pattern in the string?

To go off on a little bit of a tangent, suppose it wasn't just an email file, but something that had in fact resulted from sequencing actual DNA. Wouldn't there be some grounds to suspect that human agency was involved in inserting this sequence into the bacteria itself?

Reasonable alternatives are that it arose by random chance, that it contributes to fitness, or that it can be explained in terms of the DNA copying process. Given the difficulties with each of these explanations, and given that a motivated human actually could insert encrypted messages in bacterial DNA, human agency still strikes me as the best starting hypothesis.

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 17 2005,07:05

"given that a motivated human actually could insert encrypted messages in bacterial DNA"

And I would add: given that there are some common sense reasons to speculate that a human would be motivated to do so.

None of this, by the way, would establish the truth of the hypothesis. It would just establish the plausibility of the hypothesis of human agency.

Bulman

Posts: 8
Joined: Nov. 2005

Posted: Nov. 17 2005,09:08

If I were a counter-terrorism agent, I would consider it a lead worthy of follow-up. Likewise, as a scientist I would consider it an observation worthy of follow-up.

Either way, I would not consider it meaningful by itself.

Question 1 - I submit it as cherry picking.

Question 2 - I don't know enough about cryptology to answer. But it sounds "neat".

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 17 2005,09:25

OK. Well, the correct answers are:

(1) the data string failed a rigorously defined test of randomness that can be formalized in terms of Kolmogorov complexity. It is therefore a significant finding.

(2) XORing with a prefix of pi is a very weak encryption method. Don't try it if you really want to keep your data private.

Bulman

Posts: 8
Joined: Nov. 2005

Posted: Nov. 17 2005,09:59

Quote

After running the usual kinds of frequency analyses and finding nothing that suggests the string is not random, you're about ready to put it aside and go to the next problem.

I'm confused. I guess my answer to Question 2 is more accurate than I thought. Are you saying that finding pi is paltry, significant, or sufficient evidence for Design with the big "D"?

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 17 2005,10:45

Quote

I'm confused. I guess my answer to Question 2 is more accurate than I thought.

I can understand that it's a little confusing because people usually don't approach biology this way--and with good reason, since it is unlikely to be a fruitful approach. It is a fruitful approach to finding secret messages in places where you have some reason to suspect secret messages, and it is a rigorous methodology.

First off, my reference to "the usual kinds of frequency analyses" is to make the point that XORing with pi is not, as far as I know, part of any conventional randomness tests. Actually the digits of pi are sometimes used as a pseudorandom source (i.e. not actually random but mimicking many statistical properties of random digits).

I'm not a counter-terrorism agent and have only an amateur's interest in cryptology, so I cannot say if it would routinely be applied in decryption attempts. It seems a little too exotic to me, but what do I know?

Unfortunately, there are no foolproof tests of randomness. You can refute the hypothesis of randomness by showing that a string is compressible (a sufficiently long prefix of the expansion of pi is compressible into a much shorter computer program). However, there is no test that proves for certain that the string is incompressible.
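A quick way to see the one-sided nature of this test, sketched with Python's standard zlib module as a stand-in compressor. This is an assumption: zlib is not part of the argument above, and any lossless compressor only gives an upper bound on description length, up to a constant. A small output refutes randomness. A large output proves nothing. For example, zlib is not expected to shrink the packed binary digits of pi, even though a short program generates them.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Bytes zlib needs at maximum compression: an upper bound on description length, up to a constant."""
    return len(zlib.compress(data, 9))


random_bytes = os.urandom(10_000)   # stand-in for a uniformly random string
periodic_bytes = b"ACGT" * 2_500    # highly regular string of the same length

print(compressed_size(random_bytes))    # slightly above 10,000: no compression found
print(compressed_size(periodic_bytes))  # a tiny fraction of the input
```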

However, a long string of the letters A,C,G,T, each encoding a digit 0-3 in a uniform way so that the string was a prefix of the base-4 expansion of pi, would indeed be a highly compressible string. So it would fail the test of randomness based on Kolmogorov complexity and would require some other explanation.

Quote

Are you saying that finding pi is paltry, significant, or sufficient evidence for Design with the big "D"?

I'm not sure what you mean by big D. (Seriously. Is a human designer a small or a big D?)

I'm saying that "finding pi" in the precise sense I have defined it (taking great care to rule out cherry picking the encoding, but without actually assuming a particular encoding) is significant in the sense that it refutes the hypothesis that the string is random.

If the string were found in bacteria, its non-randomness would be sufficient evidence of some mechanism not covered by our current understanding of the chemistry of DNA copying (we can get repeats, but we have not seen DNA performing numerical calculations that would produce pi) or by natural selection (how could secret encodings of pi improve the fitness?).

As to the particular mechanism, my first guess would be human agency. That would be a guess and not a conclusion unless I could demonstrate this some other way. The longshot runner up would be that somehow DNA can act as a numerical calculator to generate pi. If that could be refuted, then I think I'd just leave the observation open as unexplained until someone thinks of another testable hypothesis. However, the low Kolmogorov complexity would suffice to rule out chance as a reasonable explanation.

The reason ID is claptrap is that it doesn't stop by saying "we can't explain it yet."

Russell

Posts: 1082
Joined: April 2005

Posted: Nov. 17 2005,10:51

PaulC - coupla questions from someone who would like to have been a math geek but just wasn't smart enough:

I would have guessed that it was a significant find, as the odds against finding such a thing by chance with the smallish number of algorithms I could imagine trying (probably less than a million!) strike me as astronomical. But I don't know how to formally apply the test you mention. Can you go through it step by step, or give a reference?

My second question is what exactly does "prefix of pi" mean? Is that just the first (however many) digits?

--------------
Must... not... scratch... mosquito bite.

Bulman

Posts: 8
Joined: Nov. 2005

Posted: Nov. 17 2005,11:53

By Design with a big "D" I am redundantly implying Intelligent Design by emphasizing that I capitalized the word 'design'. Yes, saying it once is better than implying it twice.

Quote

Unfortunately, there are no foolproof tests of randomness. You can refute the hypothesis of randomness by showing that a string is compressible (a sufficiently long prefix of the expansion of pi is compressible into a much shorter computer program). However, there is no test that proves for certain that the string is incompressible.

Quote

I'm saying that "finding pi" in the precise sense I have defined it (taking great care to rule out cherry picking the encoding, but without actually assuming a particular encoding) is significant in the sense that it refutes the hypothesis that the string is random.

Compressible strings never happen by chance? Other than pi and phi, what other numbers never occur by chance?

Again, numerology is fun, but should only be used for entertainment. I'm off to Google Kolmogorov.

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 17 2005,12:47

First off, you can find a lot about Kolmogorov complexity on the web. Here's a wiki page to start: http://en.wikipedia.org/wiki/Kolmogorov_complexity
It's completely rigorous and has nothing in common with numerology.

Quote

Compressible strings never happen by chance? Other than pi and phi, what other numbers never occur by chance?

Obviously "never" is not the word to use when asking questions about probability.

Compressible strings occur with very low probability in one-element samples taken from uniform distributions of strings. Examples of compressible strings include the first n bits of the expansions of pi, phi, e, etc. as well as the bit representation of the first n primes (or Fibonacci numbers, Catalan numbers, etc.) concatenated in increasing order. They also include strings with a lot of repetitions, which are compressible in the sense that a compression algorithm like gzip will reduce the size of files containing them. There are lots of other examples, with Kolmogorov complexity serving as the most general definition of compressibility.

To make this rigorous, you can think of an experiment in which I am able to flip an unbiased coin n times to produce a string of n bits (H=1, T=0). In addition, I have a universal computer of some kind (the details only affect some constants). I can speculate on the probability of finding any n-bit string for which I can write a program on my computer using k bits or less.

You might reasonably set k to about 1000 and n to about 1100 for purposes of illustration. 1000 bits is probably enough space for code to calculate most of the well-known transcendental constants. It might not be the most readable or fastest code, but with some ingenuity you can probably fit some iterative method in this space.

Now I run the experiment. My assistant flips 1100 coins one by one and writes down the result without showing it to me. Before looking at the string, I ask "What is the probability that this string will be identical to the output of some program written in 1000 bits or less of code on the computer I have here in front of me?" Note that output means the final result written by the program after it halts and includes all 1100 bits written out.

There are fewer than 2^1001 strings that can be output using 1000 bits or less of code (each program outputs at most one string, and there are 2^0 + 2^1 + ... + 2^1000 = 2^1001 - 1 programs of those lengths), and there are 2^1100 possible strings that my assistant may have just generated, each occurring with equal probability.

Therefore, I can claim that the probability of it being one of these strings is less than 2^1001/2^1100, or 2^(-99), which is a very small probability (on the order of 10^-30).
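The counting bound can be checked with exact rational arithmetic. This is a small sketch, with n and k taken from the example above. It computes both the simple 2^k/2^n ratio and the version that counts every program length from 0 to k bits (2^(k+1) - 1 programs). The second version at most doubles the bound.

```python
from fractions import Fraction

n, k = 1100, 1000  # coin flips, and the program-length budget in bits

# At most 2^k outputs out of 2^n equally likely strings.
simple_bound = Fraction(2**k, 2**n)

# Counting every program length 0..k gives 2^0 + ... + 2^k = 2^(k+1) - 1 programs,
# so at most that many distinct outputs.
all_lengths_bound = Fraction(2**(k + 1) - 1, 2**n)

print(simple_bound == Fraction(1, 2**100))  # True
print(float(simple_bound))                  # about 7.9e-31
print(float(all_lengths_bound))             # about 1.6e-30, still astronomically small
```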

Now I look at the string, which happens to be the prefix of the expansion of pi. I think "Oh, this is one of the very low probability events that I just defined." I then write a 1000-bit or less program to generate pi just to make sure I'm correct in thinking so.

Now, suppose it's possible my assistant could play a joke on me. Which is more reasonable: that flipping coins gave me one of the very few compressible strings (a probability on the order of 10^-30), or that my assistant is, in fact, playing a joke on me? I would say the latter, but you could disagree. I wouldn't draw hasty conclusions, but would just try to rule out the possibility in the future.

Which step in the above do you consider "numerology"? It is only elementary probability.

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 17 2005,14:13

For that matter, let's consider a more routine example of statistical inference similar to the previous example of flipping a coin.

In the new example, I will flip the coin 100 times and simply count the number of heads and tails that come up. Beforehand, I ask myself "What is the probability that flipping a fair coin in this fashion will come up heads 10 times or fewer?"

I treat this question much like the previous example. There are N=2^100 possible results of fair coin flips, each equally probable. There are far fewer that fit my a priori constraints of having 10 heads or less, namely M=(100 choose 0) + (100 choose 1) + ... +(100 choose 10) (i.e. the sum of the first 11 degree-100 binomial coefficients). The probability that one of these comes up is M/N, which is quite small (left as an exercise).
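The "exercise" can be done exactly in a few lines. This is a sketch using Python's math.comb and Fraction. M and N are the same quantities defined above, and their ratio is the one-sided p-value for seeing 10 or fewer heads from a fair coin.

```python
from fractions import Fraction
from math import comb

N = 2**100                                # equally likely outcomes of 100 fair flips
M = sum(comb(100, k) for k in range(11))  # outcomes with 10 or fewer heads
p = Fraction(M, N)

print(M)         # 19415908147836
print(float(p))  # about 1.5e-17
```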

If I do this experiment and find 10 or fewer heads, I may have witnessed a very low probability event occurring with a fair coin. But I may reasonably infer that in fact it's not a fair coin after all. In fact, there's a way to quantify that inference using something called a p-value, based on this probability. The lower the probability, the more confidence I have that the coin is not fair.

This is all pretty routine stuff, routinely accepted. No casino would use dice that failed this kind of statistical inference test.

Now if I base my statistical inference on Kolmogorov complexity, defining the low-probability events as the subset having low Kolmogorov complexity, then how is that any different from defining them by grouping results by number of heads? Both are simply ways to group results, and both are valid for statistical inference.

I'd like to add that none of this supports ID. I am merely discussing the scientific significance of highly compressible data. When data is compressible, it demands some explanation other than chance. In the case of living things, evolutionary theory provides that explanation.

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 17 2005,18:10

Russell: Prefix of pi means what you think it means.

The way I would formalize my reasoning here would be to define a p-value (a standard tool of statistical inference) for strings of nucleotide bases in terms of P(n, L), the probability that the Kolmogorov complexity of a uniformly chosen random nucleotide string of length n is L or less. For values of L significantly less than n, P(n, L) is vanishingly small (in other words, almost all strings are incompressible; this is a well-known theorem).

Then for any given string s, you can take its Kolmogorov complexity L_s (unfortunately this cannot be calculated exactly, but you can get an upper bound by measuring the length of any computer program that outputs s) and define its p-value as P(n, L_s). For any given string, a low p-value gives you confidence of its significance as follows:

Suppose you find a string s and note that its Kolmogorov complexity is L_s<<n. You might think that's significant, but you need to consider the possibility that it was produced by chance. So you ask, what's the probability that a string with such low Kolmogorov complexity could be produced by chance? That's your p-value. If the p-value is very low (which it will be for L_s<<n) then you have a measure of your confidence that it was not produced by chance.
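A rough, runnable version of this p-value, sketched under loudly stated assumptions. The zlib output size stands in for the upper bound on L_s, and the constant cost of the decompressor itself is ignored. The 2-bit packing code (BASE_CODE) is an arbitrary illustrative choice, and the function names are hypothetical. Fewer than 2^(L+1) programs have length L bits or less, and there are 4^n = 2^(2n) equally likely sequences of n bases, so P(n, L) < 2^(L + 1 - 2n). The code returns the base-2 logarithm of that bound to avoid floating-point underflow.

```python
import random
import zlib

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # one fixed 2-bit code (an arbitrary choice)


def pack(seq: str) -> bytes:
    """Pack four nucleotides per byte so a uniform random sequence becomes uniform random bytes."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_CODE[base]
        out.append(byte << 2 * (4 - len(chunk)))  # left-align a short final chunk
    return bytes(out)


def log2_p_value_bound(seq: str) -> int:
    """log2 of an upper bound on P(n, L_s), using zlib output size as the estimate of L_s."""
    n = len(seq)
    L = 8 * len(zlib.compress(pack(seq), 9))  # bits; ignores the constant-size decompressor
    return min(0, L + 1 - 2 * n)              # a probability bound never needs to exceed 2^0 = 1


random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(4000))
periodic_seq = "ACGT" * 1000

print(log2_p_value_bound(random_seq))    # 0: no evidence against chance
print(log2_p_value_bound(periodic_seq))  # a large negative number: chance is very implausible
```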

Note that a much more conventional use of p-values would allow you to find a p-value for a sequence of coin flips based on the number of heads and conclude with some measurable degree of confidence that it was not produced by flipping an unbiased coin (e.g. because way too many flips came up heads). That is just run-of-the-mill statistical inference, nothing esoteric about it, and this is merely an extension taking Kolmogorov complexity into account.

In practice, nobody uses Kolmogorov complexity like this for scientific purposes, but it is the basis of a rigorous version of Occam's razor called Solomonoff induction. (Some of this is a little new to me from doing recent searches--it's not exactly my field--but there is a lot of information out there.)

Unlike ID, this is sound methodology. Some of Dembski's arguments are superficially similar to this, but he's blowing smoke because all you can do with these arguments is show that the string was not produced by chance alone. This is not sufficient to conclude anything about a "designer."

Evolution is a process of self-organization. It is influenced by chance events, but not identical to chance. There is also a wealth of evidence supporting it. So when you find a pattern (compressibility) in nature, this in no way contradicts the idea that it was produced by a natural process. It only contradicts the assumption that it can be explained by uniform chance.

WayneFrancis

Posts: 4
Joined: Nov. 2005

Posted: Nov. 17 2005,23:32

Comparing a message that says "here is the DNA sequence you asked for [GCAGGTTAACAAGGAGTTTGCTAGAT]" and then working with that sequence and finding it can represent PI is one thing.

Looking at real DNA and finding the sequence GCAGGTTAACAAGGAGTTTGCTAGAT in the DNA is another.

There is nothing saying you can't have a sequence of DNA that can be translated into PI occur naturally.

There is no invisible mechanism that stops said sequence from occurring.

All this is really hypothetical. So what if you analyse organism X's DNA and don't find PI in any one of the 24 forms it could be held in? Also remember I said my example of PI was simplistic. It really does not represent PI but 3141592653589793. To really find PI, you not only have to know the conversion to the 24 different combinations, but you also need to know the standard that was used to store floating-point numbers in a digital format. Is it using an IEEE standard? How many nucleotides are used to store your number PI? How many nucleotides represent the precision of PI? What nucleotide/partial nucleotide represents positive and negative? At what position is the bias information held?
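To make the floating-point point concrete, here is a small sketch using Python's standard struct module. It shows how one common standard (an IEEE 754 double, big-endian) lays out the bits of pi: a sign bit, an 11-bit exponent stored with a bias of 1023, and a 52-bit fraction. The nucleotide code at the end is a hypothetical choice. Note that this fixed-width layout is a different bit string from the plain binary-digit prefix discussed earlier in the thread.

```python
import math
import struct

# math.pi is the IEEE 754 double closest to pi; ">d" packs it big-endian into 8 bytes.
raw = struct.pack(">d", math.pi)
bits = "".join(f"{byte:08b}" for byte in raw)

sign, exponent, fraction = bits[0], bits[1:12], bits[12:]
print(raw.hex())                      # 400921fb54442d18
print(sign, int(exponent, 2) - 1023)  # 0 1  (stored exponent 1024 minus bias 1023)

# One hypothetical nucleotide code; any of the other 23 would be equally arbitrary.
code = {"00": "A", "01": "C", "10": "G", "11": "T"}
as_dna = "".join(code[bits[i:i + 2]] for i in range(0, 64, 2))
print(as_dna)
```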

These are all questions you CAN NOT ANSWER. Give me any 1-million-bit string of data and, given enough time, I can come up with some encoding method that yields PI, or that yields some seemingly non-random data when combined with some other encoding of PI.

In terms of PI in naturally occurring DNA, don't even talk about it as it relates to an "Intelligent Designer" unless you have said designer's data storage standards.

Simple as that. The scenario you mention, with the message with a sequence in it, only means 2 things:
1) the person that encoded the message is an idiot for using said encoding method.
2) you were lucky picking the key.

Finding it in real DNA means NOTHING. There is no biological mechanism that stops DNA from having a string of nucleotides that can be translated into PI by some digital manipulations. None, nada, zilch.

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 18 2005,03:50

Quote

Finding it in real DNA means NOTHING. There is no biological mechanism that stops DNA from having a string of nucleotides that can be translated into PI by some digital manipulations. None, nada, zilch.

Of course there's no mechanism that completely stops it from being explained by chance. However, using a rigorous technique from statistical inference (see my posting above) you can show that the chance explanation is not compelling, because the probability of selecting a sufficiently long string of such low Kolmogorov complexity from a uniform distribution of strings of that length is vanishingly small. If you substitute "uniform distribution" with our best statistical model of junk DNA, the statement still holds, although it would be more difficult to analyze.

This is elementary statistical hypothesis testing. The only difference is that I'm using Kolmogorov complexity rather than some more conventional statistics to group the set of possible outcomes.

If I flipped a coin 100 times and it came up heads only 10 of those times, would you simply accept that there is no "mechanism that stops" this from happening, or would you refer to a huge body of standard statistical techniques that would allow you to state your high confidence that this is not an unbiased coin?

If I'm looking at a sequence purporting to be DNA and I think it contains a sufficiently simple code for a bit prefix of pi, that means something. It probably means that it's not really DNA or that in the unlikely event it is, it was put there by a human being. But to simply claim its presence is dismissable as cherry picking is no more rational than rejecting the statistical inference techniques used to determine if various randomizers (like coins and dice) are actually unbiased.

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 18 2005,04:15

I think I'm beginning to understand some of the resistance here and on PT. Some of the worst obfuscated ID pseudoscience (yes, I mean Dembski) looks superficially like the legitimate, peer-reviewed discipline of randomness testing. http://www.ciphersbyritter.com/RES/RANDTEST.HTM This makes it difficult to bring up such arguments without being suspected of trying to push ID.

It's kind of an old joke that you ask someone to make some random choices. Then, they look at those choices and say, "Oh, that's not random enough." and choose something else. It's a JOKE because the choices taken from a uniform distribution are all equally probable, so, for instance, having my dog's birthday come up as the winning lottery ticket is no less probable than any other sequence.

All well and good, but there is a huge body of modern techniques that can actually take INDIVIDUAL strings and declare one to be more "random" than the other in a rigorous sense. The rigorous sense is always based on some notion of compressibility, the most general statement of which is Kolmogorov complexity. So if I ask you to write random 1s and 0s and you write 0101010101010101, there is a rigorous sense in which this is considered not "random enough." Of course it is as probable as any other uniformly chosen string but its compressibility is a salient feature and the probability of picking a string with that much compressibility is indeed vanishingly small.
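A brute-force check of the 0101010101010101 example. It enumerates every 16-bit string and counts those that are just a pattern of length 1 or 2 repeated. "Period at most 2" is one simple, hypothetical grouping chosen for illustration, a crude special case of compressibility.

```python
from itertools import product


def smallest_period(s: str) -> int:
    """Smallest p such that s is its own length-p prefix repeated (possibly cut off)."""
    for p in range(1, len(s) + 1):
        if all(s[i] == s[i % p] for i in range(len(s))):
            return p
    return len(s)  # only reached for the empty string


n = 16
strings = ["".join(t) for t in product("01", repeat=n)]
short_period = [s for s in strings if smallest_period(s) <= 2]

print(smallest_period("0101010101010101"))  # 2
print(short_period)                         # 0000..., 0101..., 1010..., 1111...
print(len(short_period) / len(strings))     # 4/65536, about 6.1e-05
```

Each individual string is exactly as probable as any other, but the group of strings this regular is tiny.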

Anyway, there is a huge body of rigorous peer-reviewed work aimed at testing individual strings and determining if they came from a random source. Doing this with certainty is literally impossible, because no matter what the string, it COULD have come from the random source, as WayneFrancis points out. But when one is trying to design a good pseudorandom generator for sampling, or trying to resolve signal from noise, it's just not good enough to stop and say "Well, it could be due to chance." Fortunately, we do not have to stop there, as many researchers have demonstrated.

PaulK

Posts: 37
Joined: June 2004

Posted: Nov. 18 2005,06:17

Paul, I agree with the general point that if you specify a sufficiently unlikely target in advance - say 100 digits of pi in a fixed point base-4 notation in the human genome - then finding it would be evidence that something odd is going on.

Going back to the subject of the thread, though, one anomaly does not make a scientific discipline. It might be a starting point for a scientific form of ID, but it could never make ID scientific in itself.

Going further, if ID is to be scientific it is up to the supporters of ID to actually BE scientific.

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 18 2005,06:39

Quote

Paul, I agree with the general point that if you specify a sufficiently unlikely target in advance - say 100 digits of pi in a fixed point base-4 notation in the human genome - then finding it would be evidence that something odd is going on.

Yes, but I'm pre-specifying a much more general unlikely target, namely any member of the set of length n strings of Kolmogorov complexity k<<n. This is why we're justified in taking anything that looks like a pattern (in a rigorous sense of Kolmogorov complexity and Solomonoff induction), and not just one prespecified string, as evidence that something odd is going on. Solomonoff induction is a very important result, since it formalizes scientific induction and really does provide a rigorous formalization (if not an epistemic justification) of the human tendency to spot patterns in data and a way to measure the explanatory value of these patterns (turns out, the shortest explanation is the best one just as in Occam's razor).

I've been a little bowled over by the controversy, since I believed that most rational people would think that a pair of dice that came up snake eyes 50 out of 100 times was compelling evidence of "something odd going on"--i.e., the dice being biased in some way, even though clearly the outcome could with small probability be due to chance.
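The dice example works out the same way as the coin example. This is a sketch of the one-sided binomial p-value for 50 or more snake eyes in 100 rolls of a fair pair, where each roll shows snake eyes with probability 1/36.

```python
from math import comb

n, p = 100, 1 / 36  # 100 rolls of a fair pair; snake eyes has probability 1/36

# One-sided p-value: probability of 50 or more snake eyes if the dice are fair.
p_value = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(50, n + 1))

print(n * p)    # about 2.8 snake eyes expected
print(p_value)  # on the order of 1e-50
```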

The statistical inference in both cases is identical, and actually both are explicable in terms of compressibility. The sequence of rolls from a pair of fair dice is incompressible, whereas one from a biased pair is compressible (e.g. using a Huffman code that represents the more frequent outcomes with shorter bit sequences).
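A sketch of the Huffman comparison. It assumes a hypothetical biased pair that shows snake eyes half the time, with the other 35 ordered outcomes equally likely, and compares it with a fair pair (all 36 outcomes at 1/36). Rather than building the codewords, it uses the standard identity that a Huffman code's expected length equals the sum of the weights of the internal nodes created while merging. Exact fractions keep the comparison free of rounding.

```python
import heapq
from fractions import Fraction


def huffman_expected_length(probs):
    """Expected codeword length (bits per symbol) of a Huffman code for `probs`."""
    heap = list(probs)
    heapq.heapify(heap)
    total = Fraction(0)
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # each merge adds one bit to every symbol beneath it
        heapq.heappush(heap, merged)
    return total


fair = [Fraction(1, 36)] * 36                       # 36 ordered outcomes of a fair pair
biased = [Fraction(1, 2)] + [Fraction(1, 70)] * 35  # hypothetical: snake eyes half the time

print(huffman_expected_length(fair), float(huffman_expected_length(fair)))      # 47/9, about 5.22 bits
print(huffman_expected_length(biased), float(huffman_expected_length(biased)))  # 251/70, about 3.59 bits
```

The biased rolls need about 3.6 bits each instead of about 5.2, which is the sense in which they are compressible.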

Quote

Going back to the subject of the thread, though, one anomaly does not make a scientific discipline. It might be a starting point for a scientific form of ID, but it could never make ID scientific in itself.

Yes, I agree with that entirely. But this is not the same as saying that every anomaly is dismissable as cherry picking. An anomaly is an anomaly and there are rigorous means for identifying anomalies using statistical inference.

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 18 2005,06:55

Quote

Going further, if ID is to be scientific it is up to the supporters of ID to actually BE scientific.

ID is the political arm of a religious movement, so don't hold your breath. In case there's any question, I just want to repeat (I've said it twice now): ID is claptrap.

Randomness testing is, however, a legitimate and fascinating field with many peer-reviewed results and implications about inductive reasoning. Unfortunately for ID, randomness testing cannot prove design, only a failure of uniform random processes to explain observations.

Given that evolution is no more random than water crystallization, galaxy formation, the arrangement of cracks in a field of drying mud, etc., etc., then we should not be very surprised to find that most biological data sets would tend to fail randomness tests.

Bulman

Posts: 8
Joined: Nov. 2005

Posted: Nov. 18 2005,09:46

Adami, Christoph (2002) What is complexity?, BioEssays, Volume 24, Issue 12

Quote

One of the measures most often put forward as a candidate, the Kolmogorov complexity (see, e.g., Ref. 2), turns out to be a measure of the regularity, rather than complexity, of a sequence. This implies that a random sequence is accorded maximum Kolmogorov complexity, clearly not anything we would be interested in as biologists, because random sequences do not give rise to organisms.

I understand this is a thought experiment and you are not fronting Intelligent Design a la Dembski. I assume your arguments are good; however, the water is too deep for me. I'm heading back to the gene pool, but before I go I must ask a few questions:

Shouldn't we expect a certain degree of regularity in complex symmetrical organisms?

Is the lottery sequence 12-34-42-9-3 PowerBall 12 less likely to occur because I pre-specify and buy a ticket with those numbers than if I never bought a ticket?

Is k<<n a typo for k<n or do I need to get a math degree?

Didn't Einstein say, "You do not really understand something unless you can explain it to your grandmother."? (Maybe you can simplify your argument for me. I have some statistical/calculus background, but don't bet on me understanding an argument requiring calculus.)

To return to the original topic: I think that evidence of a recognizable number in nucleotide organization is more likely evidence of an underlying organizational principle rather than Intelligent Design, much the same way 9.8m/s^2 is evidence for local gravitational forces rather than Intelligent Falling.

Russell

Posts: 1082
Joined: April 2005

Posted: Nov. 18 2005,09:53

Disclaimer: I don't think anyone participating in this discussion thinks ID is anything but a sham and a scam. Nothing I write should be interpreted otherwise!

Quote

...then we should not be very surprised to find that most biological data sets would tend to fail randomness tests

But do they? I think Dembski's "specification" (equivalent to stating beforehand that you're looking for Pi, or for string of greater than such & such Kolmogorov complexity in the above discussion) amounts just to "sequence corresponding to something biologically functional".

If you take a random chunk of DNA from, say, a human genome, you're likely to get a pattern that won't trigger any raised eyebrows by any mathematical analysis, but that corresponds to the gene for some essential enzyme. Does that qualify as failing a randomness test?

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 18 2005,10:29

Quote

I understand this is a thought experiment and you are not fronting Intelligent Design a la Dembski.

Glad to hear it. The quote from Adami is correct and a pretty damning criticism of Dembski's entire "research program." Note that I am not proposing a test for high complexity here but for low complexity. The reason a bit-prefix of pi is worthy of note is that it is a LOW complexity object.

Quote

Shouldn't we expect a certain degree of regularity in complex symmetrical organisms?

Yes indeed! And we find it there. The order goes way beyond bilateral or radial symmetry. In fact, every cell is statistically similar to other cells of the same kind. The statistical frequencies of various macromolecules are far from uniform. Living things are just brimming with regularity and therefore higher compressibility than uniformly random arrangements of matter.

In the sense of Kolmogorov complexity, a live frog is LESS complex than a frog that's been in a high speed blender for a minute. That is, the set of configurations of molecules that could be a live frog is a tiny subset of the set of configurations of molecules that could result after blending.

High Kolmogorov complexity is in no way a proxy for organization, and this is where Dembski has to start scrambling for new definitions, which he generally pulls out of a hat (to put it gently). Actually, low Kolmogorov complexity isn't a good proxy either, since living things are less orderly than low-complexity objects like crystals but more orderly than uniform random objects. This is why it's a difficult thing indeed to pin down life in terms of Kolmogorov complexity, and it's not even clear if the question has any meaning.

It is the relatively LOW Kolmogorov complexity of a frog that makes it more interesting than a frogshake. It is the relatively HIGH Kolmogorov complexity of a frog that makes it more interesting than a salt crystal of equal weight. So where does this leave us? Maybe Kolmogorov complexity is not a great tool for understanding why we find frogs interesting.

Quote

Is the lottery sequence 12-34-42-9-3 PowerBall 12 less likely to occur because I pre-specify and buy a ticket with those numbers than if I never bought a ticket?

Clearly not. Not sure where you're going with that. In a lottery any individual outcome is equiprobable. To make any interesting statistical inference, we need to group the values in some way before asking a question.

Quote

Is k<<n a typo for k<n or do I need to get a math degree?

Informally, it is used to mean "much, much less." Obviously that's not defined rigorously, but I wanted to emphasize that I don't mean k=n-1. It would be sufficient to take k<n/10 to make the same point I was making.
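For context, here is the standard counting argument that makes k<<n matter (I am assuming this is the kind of bound under discussion). There are only 2^(k+1) - 1 binary descriptions of length at most k, so at most that many n-bit strings can be described in k bits or fewer. The Python sketch below just evaluates that bound; the sizes (n = 200 bits, i.e. 100 base-4 digits, and the choices of k) are arbitrary examples.

```python
from fractions import Fraction

def max_fraction_compressible(n: int, k: int) -> Fraction:
    """Upper bound on the fraction of n-bit strings that have a description
    of at most k bits, in any fixed description language.

    There are 2**0 + 2**1 + ... + 2**k = 2**(k+1) - 1 binary strings of
    length <= k, and each one can describe at most one n-bit string.
    """
    return Fraction(2 ** (k + 1) - 1, 2 ** n)

n = 200  # e.g. 100 base-4 digits = 200 bits
for k in (n - 1, n // 2, n // 10):
    print(k, float(max_fraction_compressible(n, k)))
```

With k = n-1 the bound is close to 1 and says nothing; with k = n/10 it is about 2^-179.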

Quote

Didn't Einstein say, "You do not really understand something unless you can explain it to your grandmother."? (Maybe you can simplify your argument for me. I have some statistical/calculus background, but don't bet on me understanding an argument requiring calculus.)

He might have, but I have been trying to make this both rigorous and understandable, and this is close to the limit of my expository skills.

Quote

To return to the original topic: I think that evidence of a recognizable number in nucleotide organization is more likely evidence of an underlying organizational principle rather than Intelligent Design, much the same way 9.8 m/s^2 is evidence for local gravitational forces rather than Intelligent Falling.

Well, as I said, my first guess is that it would be evidence of a human either faking the data or (a long shot) actually inserting the artifact into DNA. Beyond that, I don't care to speculate. My point is just that it would be an identifiable anomaly not obviously dismissed as cherry picking.
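To make the thought experiment concrete, here is a sketch of what searching a nucleotide string for pi could look like. Everything here is an assumption for illustration: the digit-to-base mapping 0→A, 1→C, 2→G, 3→T is hypothetical (any of the 4! = 24 bijections is equally "natural"), the function names are made up, and the "genome" is a synthetic string. The base-4 digits of pi come from Machin's formula in integer fixed-point arithmetic.

```python
def arctan_inv(x: int, one: int) -> int:
    """Fixed-point arctan(1/x), scaled by `one`, via its Taylor series."""
    total = term = one // x
    x_squared = x * x
    n, sign = 1, -1
    while term:
        term //= x_squared
        total += sign * (term // (2 * n + 1))
        sign, n = -sign, n + 1
    return total

def pi_base4_digits(count: int, guard: int = 10) -> str:
    """First `count` base-4 digits of pi, leading 3 included.

    Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239). The `guard`
    extra digits absorb the integer truncation error.
    """
    frac_digits = count - 1 + guard
    one = 4 ** frac_digits
    pi_scaled = 16 * arctan_inv(5, one) - 4 * arctan_inv(239, one)
    digits = []
    for _ in range(frac_digits + 1):  # pi < 4, so one integer-part digit
        pi_scaled, d = divmod(pi_scaled, 4)
        digits.append(str(d))
    return "".join(reversed(digits))[:count]

def find_pi(genome: str, count: int, mapping: str = "ACGT") -> int:
    """Index of the first occurrence of pi's digits encoded via `mapping`
    (digit d -> mapping[d]), or -1 if absent. `mapping` is hypothetical."""
    target = "".join(mapping[int(d)] for d in pi_base4_digits(count))
    return genome.find(target)

print(pi_base4_digits(10))                              # 3021003331
print(find_pi("GATTACA" + "TAGCAATTTC" + "CCGG", 10))   # 7
```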

PaulC

Posts: 18
Joined: Nov. 2005

(Permalink)

Posted: Nov. 18 2005,10:41

Quote

If you take a random chunk of DNA from, say, a human genome, you're likely to get a pattern that won't trigger any raised eyebrows by any mathematical analysis, but that corresponds to the gene for some essential enzyme. Does that qualify as failing a randomness test?

Actually, the encoding part of DNA has to specify functional proteins composed of structures such as alpha helices and beta sheets. A uniformly generated sequence of nucleotides would have a very low probability of encoding a biologically functional protein in any organism (at least any that looks like the ones we see in nature). Its 3D conformation would not have the degree of order observed in biologically important proteins.

I agree, though, that you won't necessarily see that in the Kolmogorov complexity of an isolated gene. However, given the set of genes taken in aggregate over an organism, repeated "motifs" (similar functional sections) are often observed. So the DNA as a whole, while not as compressible as a bit prefix of pi, is still more compressible than a randomly generated sequence of nucleotides.
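A toy illustration of the "in aggregate" point, using synthetic data only (gene count, gene length, motif length and seed are arbitrary assumptions). Each toy "gene" is mostly unrelated random bases plus one shared motif, so a single gene on its own offers the compressor nothing to reuse. Taken together, the repeated motif makes the whole set compress better than an equally long motif-free sequence.

```python
import random
import zlib

rng = random.Random(7)  # fixed seed for reproducibility

def random_bases(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))

def zsize(s: str) -> int:
    return len(zlib.compress(s.encode("ascii"), 9))

GENES, GENE_LEN, MOTIF_LEN = 50, 300, 60

# Toy genome 1: every "gene" ends with the same 60-base motif.
motif = random_bases(MOTIF_LEN)
with_motif = "".join(random_bases(GENE_LEN - MOTIF_LEN) + motif for _ in range(GENES))

# Toy genome 2: same total length, no shared motif.
without_motif = random_bases(GENES * GENE_LEN)

print(len(with_motif), zsize(with_motif))
print(len(without_motif), zsize(without_motif))
```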

PaulC

Posts: 18
Joined: Nov. 2005

(Permalink)

Posted: Nov. 18 2005,10:49

I would like to add that the failure to demonstrate the compressibility of some data does not imply that there is no order to it. It may simply mean that you haven't figured out the sense in which it is orderly. Supposing that the orderliness of life comes out in phenotype, this need not be reflected in any obvious way in a gene sequence, and vice versa.
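A small illustration of this (my example, not something from the thread): the bytes below are fully determined by a few lines of Python, so their Kolmogorov complexity is tiny, yet a general-purpose compressor like zlib cannot shrink them at all, because SHA-256 output shows no statistical regularity that zlib can detect.

```python
import hashlib
import zlib

def short_program_output(n_blocks: int) -> bytes:
    """Bytes produced by a tiny program: SHA-256 digests of "0", "1", "2", ..."""
    return b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(n_blocks))

data = short_program_output(1000)   # 32,000 bytes
packed = zlib.compress(data, 9)
print(len(data), len(packed))       # zlib finds nothing to exploit
```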

Bulman

Posts: 8
Joined: Nov. 2005

(Permalink)

Posted: Nov. 18 2005,11:02

Great discussion; it just took me a while to figure out what you were saying. Don't get me wrong, I'm not totally incompetent when it comes to math, but I can more easily understand a frog-blender analogy than Russian mathematicians.

I have tried to rephrase my lottery response in this comment and have failed numerous times. Instead of digging a deeper hole, let me just say two things: my wife often accuses me of using "bad" metaphors, and behavioral observations made by close associates tend to be more accurate than one's own.

PaulC

Posts: 18
Joined: Nov. 2005

(Permalink)

Posted: Nov. 18 2005,11:17

Russell: Actually I think I can do a little better on claiming that a sufficiently long encoding region of DNA is highly compressible.

We need to assume a few things:
(1) You need to have solved protein folding computationally. There has to be a computer program that can predict the conformation of a protein sequence encoded by some DNA.
(2) You need to have an effective method of ruling out proteins that are not even plausibly functional based on 3D structure.
(3) The computer code for these methods needs to fit into space less than that needed to store the genome you want to compress.
(4) The set of plausibly functional proteins of a certain length is much much smaller than the set of all nucleotide sequences that could encode proteins of that length.

The compression/decompression scheme would unfortunately not be feasible to run in any practical time frame (but it doesn't need to be). It would begin by enumerating all possible DNA sequences up to the maximum length needed (you can see this is not feasible). Then it would test each by finding its conformation and checking if it could possibly be functional (e.g. an enzyme). It gives each of these an integer index number starting at 0 and stores it in a table (for purposes here it is not an obstacle that this code would use up more memory than realistically available in the entire universe).

Now the compressed copy of the DNA consists of the computer program followed by the DNA sequence with the encoding regions replaced by the highly compressed index number (OK, you need extra information to deal with the fact that encoding regions may not be contiguous, but this is a broad proof of concept that the genome is highly compressible, not a practical plan.) Since the index numbers come from a smaller domain, it follows that they are shorter than the original nucleotide sequences. So the compressed version as a whole is smaller than the original even if there is no obvious pattern to the encoding regions other than the fact that they have to encode proteins.

To decompress, you "merely" run the program to build the table of proteins, look up the indices in the table, and output the resulting string.
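The counting step of this scheme can be sketched at toy scale. The predicate below is a hypothetical stand-in for the protein-folding and functionality checks that assumptions (1) and (2) say must exist, and the sequences are only 8 bases long so that the enumeration actually runs. The index needs fewer bits than the raw sequence whenever the accepted set is at most half of all 4^L sequences.

```python
import math
from itertools import product

SEQ_LEN = 8  # toy length; real encoding regions are far longer

def plausibly_functional(seq: str) -> bool:
    """HYPOTHETICAL stand-in for 'folds into a plausibly functional protein'.

    Accepts only sequences with no two identical neighboring bases, purely
    so that the accepted set is a strict subset of all sequences.
    """
    return all(a != b for a, b in zip(seq, seq[1:]))

# Build the table once: enumerate every sequence, keep the accepted ones,
# and number them 0, 1, 2, ... in enumeration order.
all_seqs = ("".join(p) for p in product("ACGT", repeat=SEQ_LEN))
table = [s for s in all_seqs if plausibly_functional(s)]
index_of = {seq: i for i, seq in enumerate(table)}

def compress(seq: str) -> int:
    return index_of[seq]   # the index is the compressed form

def decompress(index: int) -> str:
    return table[index]    # "merely" look the index up again

raw_bits = 2 * SEQ_LEN                          # 2 bits per nucleotide
index_bits = math.ceil(math.log2(len(table)))   # bits to store any index
print(len(table), raw_bits, index_bits)
```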

claw

Posts: 3
Joined: Nov. 2005

(Permalink)

Posted: Nov. 19 2005,02:25

This has turned into a fascinating discussion, but we seem to have lost lutsko on the way.

If you're still reading this, lutsko, the point I was trying to make at the start of all this is that "designedness" has no definitive test. All that you were doing was quoting probabilities (i.e., the probability of a DNA sequence that encodes pi to 100 places), but that is not a good test of designedness for a number of reasons that have already been elucidated by other posts here.

Anyway, I hope this thread has helped to pique your interest.

regards,
Chris

PaulC

Posts: 18
Joined: Nov. 2005

(Permalink)

Posted: Nov. 19 2005,05:39

Quote

If you're still reading this, lutsko, the point I was trying to make at the start of all this is that "designedness" has no definitive test.

I agree with this statement (in case it was not clear).

All randomness testing can do is give you a probabilistic measure of your confidence that the data was not generated by the random distribution you are using as your null hypothesis. It says absolutely nothing about where the data came from. To say anything else, you need some other hypothesis about your data, and it needs to be testable. The notion that some mysterious intelligent agency put it there is not testable. ID insists on leaving the "designer" ill-defined and therefore by definition cannot be a science.
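As an example of what such a test does and does not tell you, here is a minimal chi-square goodness-of-fit test of nucleotide counts against the null hypothesis "independent, uniformly distributed A/C/G/T". The function names are made up, and the closed-form tail is specific to 3 degrees of freedom (4 categories). A rejection only says the counts are unlikely under that null. Real genomes, whose base composition is not uniform, would fail it, and that says nothing about design. Conversely, a highly ordered string can pass it.

```python
import math
from collections import Counter

def chi2_sf_3dof(x: float) -> float:
    """P(X >= x) for a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def chi_square_uniform_acgt(seq: str) -> tuple[float, float]:
    """Chi-square test of base counts against equal A/C/G/T frequencies.

    Only the letters A, C, G, T are counted. The usual rule of thumb
    (expected count >= 5 per letter) should hold. Returns (stat, p_value).
    """
    counts = Counter(b for b in seq if b in "ACGT")
    expected = sum(counts.values()) / 4
    stat = sum((counts[b] - expected) ** 2 / expected for b in "ACGT")
    return stat, chi2_sf_3dof(stat)

print(chi_square_uniform_acgt("A" * 300 + "CGT" * 33))  # tiny p: reject the null
print(chi_square_uniform_acgt("A" * 100 + "C" * 100 + "G" * 100 + "T" * 100))  # p = 1.0
```

The second call is blatantly non-random yet passes, because this particular test only looks at letter counts; a different test would catch it.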

Sometimes you can make a testable hypothesis of a human cause, because we can come up with a model for what humans do and don't do. Then, conceivably, you could make predictions that would be supported or refuted by later evidence. This happens in archeology, for instance, in cases where it might initially be unclear if a chipped stone is a human-made tool. And you make predictions about what else you might find at the site and related ones.

I would still say that finding pi in some "reasonable" (sufficiently parsimonious in a rigorous sense) encoding would be reason to hypothesize human agency. That wouldn't apply to any compressible string, but just one that we know humans find interesting like pi. If it happened just once, though, your hypothesis would be untestable and therefore (as I agreed earlier) not a basis for a scientific theory.
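One way to make "sufficiently parsimonious encoding" quantitative is to count how many encodings you allow yourself and scale the chance probability accordingly (a Bonferroni-style correction). This sketch assumes independent, uniform bases as the null model, which real genomes are not, and ignores the second strand, which would double the count. The genome length of 3 × 10^9 is only an order of magnitude, and the 24 counts the 4! ways of assigning digits 0-3 to the four nucleotides.

```python
def expected_chance_matches(pattern_len: int, genome_len: int, n_encodings: int) -> float:
    """Expected number of chance occurrences of any allowed encoding of a
    fixed base-4 pattern in an i.i.d. uniform random sequence.

    Each start position matches one encoded pattern with probability
    4**-pattern_len. Summing over positions and encodings gives the
    expected count, which also bounds the chance of at least one match.
    """
    positions = genome_len - pattern_len + 1
    return n_encodings * positions * 4.0 ** -pattern_len

GENOME_LEN = 3_000_000_000   # rough order of magnitude for a human genome
BIJECTIONS = 24              # 4! ways to map base-4 digits to A, C, G, T

print(expected_chance_matches(10, GENOME_LEN, BIJECTIONS))    # short: expected many times
print(expected_chance_matches(100, GENOME_LEN, BIJECTIONS))   # long: essentially never
```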

lutsko

Posts: 8
Joined: Nov. 2005

(Permalink)

Posted: Nov. 19 2005,21:37

To claw, PaulC, et al.,
I have been following the discussion with interest but have not felt compelled to add anything since you, especially PaulC, are doing an admirable job of fleshing out the thought experiment. I think the point stands that the program of ID could be legitimate science, although I hasten to add, as one must, that ID as pushed by Dembski and friends is anything but scientific. I also think that our friends fighting the political fight need to be careful not to make categorical statements to the contrary which they might find hard to defend.

claw

Posts: 3
Joined: Nov. 2005

(Permalink)

Posted: Nov. 20 2005,05:05

Dear lutsko,

With due respect, I don't think you do understand what has been going on in this forum.

You say "the point stands that the program of ID could be legitimate science" but you fail to understand that this is exactly what everyone else has been arguing *against*. If by your statement you mean that the existence of design in biology could be scientific, then that is not much of a "point" because everyone agreed long before you formulated it. We even gave examples (e.g., bacteria being designed in the lab, which already happens). But that's not what you said. You said "the program of ID could be legitimate science." Now what program is that? There is only one program of ID, and that's "to reverse the stifling dominance of the materialist worldview, and to replace it with a science consonant with Christian and theistic convictions." Quote unquote.

Now I understand that you repudiate Behe and Dembski et al, and I think that's wonderful. BUT, you have to start thinking more about what you are saying and the way you phrase it. ID is not a cover-all term for any form of design in science. It very specifically refers to the program running out of the Discovery Institute. Even the Catholic Church, which believes very strongly in divine intervention in the evolution of humanity, refuses to identify itself with ID. Please choose your words more carefully or else we will continue to waste time pinning down exactly what you mean before we can respond meaningfully.

Also, we have *not* been helping you "flesh out the thought experiment" and the fact that you would say so is actually rather insulting. We came across to this forum because we assumed you wanted to discuss the matter and maybe learn something, not pontificate about the good job everyone else is doing on developing your pet project. I think your thought experiment is fundamentally flawed and I have been trying to explain why it won't work, not "flesh it out." I know you don't support the Discovery Institute view of ID, and I applaud that, but rejecting Dembski et al doesn't automatically make your thought experiment any better than theirs. There is more than one way to be wrong about evolution.

I am also discouraged by your failure to contribute to a thread *you* started to pursue a question *you* asked. Right at the moment I'd like to read a substantive post before I make any further effort in this thread. You could start by attempting an answer to the counter-questions I posed earlier.

regards,
Chris

PaulC

Posts: 18
Joined: Nov. 2005

(Permalink)

Posted: Nov. 20 2005,05:09

Quote

I think the point stands that the program of ID could be legitimate science

I think this is far from obvious, since nobody has yet come up with a way to "infer design" without assuming something about the properties of the designer. IDers as we know them are insistent about making no such assumptions.

The problem I've been criticizing is the conflation of "rejecting random cause with high confidence" and "inferring design." The two are not synonymous, and indeed there are rigorous techniques for rejecting randomness. (Google "randomness testing"; better yet, take a university course in it; or, if you must, read the stuff I wrote above. I think I at least have the main ideas right.)

Suggesting the two are synonymous plays into the hands of people like Dembski, since he can misapply randomness-testing techniques to provide a rigorous-looking veneer to his claims of inferring design. There's nothing new to Dembski's tactic, since creationists have been using inappropriate probabilistic arguments for a long time. Dembski has simply done the best obfuscation job so far, misappropriating even very new and esoteric results to his cause.

Anyway, my point was to refute the categorical claim that every discovery of a pattern in data could be attributed to cherry picking. In fact, the presence of a low-complexity pattern in a much larger set of data can be inferred in a rigorous sense with no a priori assumptions except for the universal computational model in which patterns are described, and this only affects the constants in the analysis.
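The standard textbook results behind this claim can be stated compactly. Here K_U(x) denotes the length of the shortest program for universal machine U that outputs x, and x in the second line is a uniformly random n-bit string; the third line combines the two to show that changing the machine only shifts the bound by a constant.

```latex
% Invariance theorem: for universal machines U and V there is a constant
% c_{U,V}, independent of x, such that
K_U(x) \le K_V(x) + c_{U,V} \quad \text{for all strings } x.

% Counting bound: fewer than 2^{n-d} programs are shorter than n - d bits, so
\Pr\left[ K_U(x) < n - d \right] < 2^{-d}.

% Switching machines changes only the constant:
\Pr\left[ K_V(x) < n - d \right] \le \Pr\left[ K_U(x) < n - d + c_{U,V} \right] < 2^{c_{U,V} - d}.
```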

Quote

I also think that our friends fighting the political fight need to be careful not to make categorical statements to the contrary which they might find hard to defend.

The problem is that once you've finished making a fully qualified statement about the proverbial barn door, the cows may already be out stampeding. In other contexts, I'm comfortable with saying ID is claptrap and leaving it at that. There's no reason to ask if it could be a "real science", because the parts of it that aren't sheer baloney already are real sciences with peer-reviewed results.

There's more reason to believe that Pinocchio could become a "real live boy" some day. At least he learned his lesson after his nose started to grow.

ID as we know it is purely a political artifact. Nobody I can think of got into it out of an actual interest in the problem of inferring design. Certainly, Dembski would be just as happy saying "believe or be damned to ####" if he could get away with it. Behe is a little harder to peg, but it also seems that he started out with an unshakable belief that evolution cannot explain certain things and has been constructing his argument post hoc to defend it. That is not how science works.

	Antievolution.org :: Antievolution.org Discussion Board The Critic's Resource on Antievolution