Wesley R. Elsberry
Joined: May 2002
Kirk Durston is back for more at Jeff Shallit's blog, once again going on at length about "functional complexity", but this time using PFAM data related to the RecA protein family instead of "ankyrin". At least, that's what he claims, though the current numbers at PFAM regarding RecA don't precisely match the numbers he gives, and in some cases there's a substantial discrepancy.
So far, it looks like Kirk has thoroughly miscomprehended what numbers have to be plugged into Hazen's "functional complexity" equation or his own derivative to handle non-uniform distributions. Kirk was plugging in an estimate of the number of mutations for the "total configurations" spot, and the number of proteins in the PFAM database for RecA as the "functional configurations" number. For the first, the number of mutations is not a limit on the diversity of sequences generated, since (1) not all mutations are point mutations and (2) other processes, like recombination, produce sequence novelty. For the second, protein databases generally catalog just one protein per protein family per species, ignoring almost all intraspecific protein variation. That means there's a whole repertoire of functional sequence diversity that goes unreported by the protein databases, which seriously biases any attempt to take database holdings as a compendium of such diversity. Kirk doesn't take any of this into account, which makes his "functional complexity" math an overblown "garbage-in, garbage-out" exercise.
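To make the garbage-in point concrete, here is a rough Python sketch. It assumes Hazen's measure takes the form I = -log2(functional configurations / total configurations), which is how I read it; the function name and all the counts below are made up for illustration and are not PFAM figures or Kirk's numbers.

```python
import math


def functional_information(functional_configs: int, total_configs: int) -> float:
    """Bits of 'functional information', assuming I = -log2(functional / total)."""
    if not 0 < functional_configs <= total_configs:
        raise ValueError("need 0 < functional_configs <= total_configs")
    # Subtract logs instead of dividing so very large integer counts stay exact
    # until the final step.
    return math.log2(total_configs) - math.log2(functional_configs)


# Hypothetical inputs: a 100-residue sequence space and two guesses at how many
# functional sequences exist. Neither guess comes from any real database.
total = 20 ** 100
catalogued = 10 ** 3      # e.g. one entry per species in a database
with_variants = 10 ** 6   # same family, counting unreported variation

bits_catalogued = functional_information(catalogued, total)
bits_with_variants = functional_information(with_variants, total)

print(f"catalogued only:    {bits_catalogued:.2f} bits")
print(f"including variants: {bits_with_variants:.2f} bits")
print(f"inflation from undercounting: {bits_catalogued - bits_with_variants:.2f} bits")
```

Under that assumed form, undercounting the functional sequences by a factor of k inflates the result by log2(k) bits, no matter what goes in the "total" slot. So the answer is only as good as the two counts fed into it.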
"You can't teach an old dogma new tricks." - Dorothy Parker