PaulC
Posts: 18 Joined: Nov. 2005
I haven't read the other ABC comments, but I want to add something I think is pertinent from my late replies on the Panda's Thumb thread.
First, in response to the subject: No, ID is claptrap. However, the mathematically rigorous field of decryption could be viewed as a set of methods for discovering human agency in data, and this strikes me as a better framework in which to cast the "finding pi" thought experiment.
The reason I favor this view is that it gives a simple counterexample to the notion that "finding pi" is always dismissible as cherry picking. This is easier to follow than my previous points about restricting the family of encodings or detecting string compressibility, so I present an improved thought experiment. It's better than the original because it doesn't depend on ridiculous assumptions about finding things in DNA that you will almost certainly never find there.
Suppose you're employed in a counter-terrorism agency to analyze communications for suspect content. A message comes to you to analyze. You know nothing about the senders. Roughly, it is a short message: "Attached is the bacterial gene sequence you requested. If you need any more help, feel free to contact our lab any time. Sincerely, [etc.]" And there is an attached file that looks like a DNA sequence.
Assume we've already ruled out that this is a bacterium of interest to putative terrorists. You're basically a math geek who knows no biology, employed to find encrypted messages.
One thing to note here is that it's not always feasible to decrypt a single message in isolation. Decryption experts are most successful when they have a long series of communications to analyze. But in this case, you have some time to spare and you get lucky.
After running the usual kinds of frequency analyses and finding nothing that suggests the string is not random, you're about ready to put it aside and go to the next problem.
But, being a math geek, you decide to see what happens if you try each of the 24 (that is, 4!) different ways of interpreting the bases A,C,T,G as bit pairs 00,01,10,11 and convert the string into bits according to that code. For each of these 24 bit strings, you take a prefix of the binary digit expansion of pi, XOR it against the string at every starting position, and interpret the resulting string as ASCII encoded text. Note: the likelihood that you would do all of this is pretty low, but it is a plausible hypothetical scenario.
On a modern computer, all the tests should run in a matter of seconds. You do a frequency analysis on the resulting strings, and one of them shows a substring that looks statistically like English plaintext written in ASCII. You zoom in for a closer look and find a paragraph of about 1k characters of grammatical English text. It appears when you decode the gene sequence as A=10, C=11, G=01, T=00 and XOR the resulting bits with a prefix of the bit expansion of pi, with the first bit of pi aligned to the first bit of the text. Thus, the text has been XORed with about 8k bits of pi starting from the beginning.
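For anyone who wants to play with the scenario, here is a rough Python sketch of the search described above. It is only an illustration under several assumptions of mine: the pi bits are taken to start with the integer part (binary 11.001001...), starting positions are tried on base boundaries only, the "frequency analysis" is replaced by a crude letters-and-spaces score over a short window, and the pi bits are computed with Machin's formula in integer arithmetic. All function names and the planted message are hypothetical.

```python
import itertools
import random
import string

BASES = "ACGT"
PAIRS = ("00", "01", "10", "11")
ALLOWED = set(string.ascii_letters + " ")

def arctan_inv(x, scale):
    """Approximate arctan(1/x) * scale with the Taylor series, integers only."""
    term = scale // x
    total = term
    x2 = x * x
    k, sign = 3, -1
    while term:
        term //= x2
        total += sign * (term // k)
        k += 2
        sign = -sign
    return total

def pi_bits(n, guard=64):
    """First n bits of pi, integer part included (assumed convention)."""
    scale = 1 << (n + guard)
    # Machin: pi = 16*arctan(1/5) - 4*arctan(1/239); guard bits absorb rounding.
    pi_scaled = 16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale)
    return format(pi_scaled >> (guard + 2), "b")

def xor_bits(a, b):
    """XOR two bit strings, truncating to the shorter one."""
    return "".join("1" if p != q else "0" for p, q in zip(a, b))

def bits_to_text(bits):
    """Read whole bytes from a bit string as single-byte characters."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) - 7, 8))
    return data.decode("latin-1")

def english_score(text):
    """Crude stand-in for frequency analysis: share of letters and spaces."""
    return sum(c in ALLOWED for c in text) / max(len(text), 1)

def search(dna, key_bits, window_bytes=32, threshold=0.95):
    """Try all 24 base-to-bit-pair codes and every base-aligned offset."""
    hits = []
    width = 8 * window_bytes
    for perm in itertools.permutations(PAIRS):
        code = dict(zip(BASES, perm))
        bits = "".join(code[b] for b in dna)
        for start in range(0, len(bits) - width + 1, 2):
            text = bits_to_text(xor_bits(bits[start:start + width], key_bits))
            if english_score(text) >= threshold:
                hits.append((code, start // 2, text))
    return hits

def embed(message, code, key_bits):
    """Encrypt an ASCII message with the pi prefix and write it as bases."""
    plain = "".join(format(b, "08b") for b in message.encode("ascii"))
    cipher = xor_bits(plain, key_bits)
    inverse = {pair: base for base, pair in code.items()}
    return "".join(inverse[cipher[i:i + 2]] for i in range(0, len(cipher), 2))

# Demo: hide a made-up message inside random bases, then look for it.
random.seed(0)
code = {"A": "10", "C": "11", "G": "01", "T": "00"}
key = pi_bits(512)
noise = lambda n: "".join(random.choice(BASES) for _ in range(n))
dna = noise(200) + embed("Meet at the old bridge at noon tomorrow", code, key) + noise(200)
hits = search(dna, key)
```

In this toy run the planted text should show up in `hits` under the A=10, C=11, G=01, T=00 code at base offset 200. A real analysis would score much longer windows with proper letter-frequency statistics rather than this simple filter.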
Question 1: At this point do you dismiss your find as cherry picking or do you consider it more likely that the English text is an encrypted message put there by human agency and include this in a report as a significant finding?
Question 2: If your answer to the above is the former (dismiss as cherry picking), then do you believe that this "XOR with pi" cipher is a strong encryption method suitable for sending encrypted messages that will not be cracked by suitably motivated decryption experts?