In other words, just because researchers identify function for, say, duplicated pseudogenes doesn’t mean all pseudogenes possess function. Yet in most other scientific investigations, it would be natural to assume that if a particular property has been identified in a representative sample, the entire system possesses that property. When reviewing the ENCODE Project, however, some evolutionary biologists eschew this common practice. Many of the human DNA sequences to which ENCODE assigned function reside within regions of the genome long thought to be junk DNA. It seems Doolittle and others are reluctant to conclude that a part represents the whole because of a prior commitment to the belief that junk DNA arises via evolutionary processes.
According to the evolutionary paradigm, junk DNA sequences originate when random physical, chemical, and biochemical processes convert functional sequences into nonfunctional ones. But this is not the end of the story. Many biologists believe evolutionary processes will occasionally make use of junk DNA sequences, converting them into something useful via a process called neofunctionalization. As Doolittle puts it, “It is surely inevitable that evolution, that inveterate tinkerer, will have sometimes co-opted some TEs [transposable elements, a class of junk DNA] for such purposes.”4 It seems many evolutionary biologists cannot accept the ENCODE data because they believe evolutionary processes would not convert all members of a class of junk DNA into functional elements.
The results of the ENCODE Project challenge this evolutionary perspective. The ENCODE team performed a large number of assays, systematically surveying the human genome and cataloging its functional sequences. The researchers didn’t simply identify a few examples of function in particular members of a junk DNA category and then conclude the whole class must be functional. Instead, they identified, one by one, the members of each group of sequence elements that displayed function.
So, in effect, Doolittle’s complaint holds no weight. It also fails to take into account other work—such as research involving pseudogenes—that not only identifies function for individual members of this junk DNA class but also presents an elegant framework to explain the function of all members of the category. (See my earlier articles on the competitive endogenous RNA hypothesis as a comprehensive model for pseudogene function.) This type of advance coheres nicely with the catalog of functional elements ENCODE identified.
Conflating Biochemical Activity with Function?
Graur’s group, Doolittle, and researchers Deng-Ke Niu and Li Jiang all complain that the ENCODE Project confused biochemical activity with function. This was, in fact, one of the first critiques leveled against the ENCODE Project. As I explained in a previous article, the ENCODE Project measured biochemical activity known to play a role in gene regulation. However, the ubiquity of this objection makes it worth a more comprehensive response.
ENCODE detractors might concede that the biochemical assays ENCODE researchers carried out did indeed measure activity related to function—but these skeptics would still maintain that not all the activity measured is actually functional. For example, the ENCODE Project determined that about 60 percent of the human genome is transcribed to produce RNA. ENCODE skeptics would argue that not all of these transcripts possess function. In fact, they might say that most are nonfunctional. Graur’s group asserts that “some studies even indicate that 90% of transcripts generated by RNA polymerase II may represent transcriptional noise.”5
To me, this criticism of ENCODE seems motivated by a strong commitment to the evolutionary paradigm. In other words, the experimentally generated ENCODE results don’t square with the expectations of the theory of biological evolution; therefore, the ENCODE results must be wrong. This is an example of theory-dependent reasoning, in which the theoretical framework holds more sway than the actual experimental and observational results. ENCODE skeptics’ commitment to the evolutionary paradigm is so strong it appears that they unwittingly abandoned one of science’s central practices: experimental results dictate a theory’s validity, not the other way around.
These criticisms ignore two important points: (1) biochemical noise costs energy; and (2) random interactions among genome components would be highly deleterious to the organism.
Let me illustrate the first point by focusing on transcription. From my vantage point, it is reasonable to conclude that the transcripts produced from the human genome are, by and large, functional. First, researchers know the identity of these transcripts. The various RNA molecules transcribed from the human genome all play a role, either direct or indirect, in protein synthesis or in the regulation of gene expression. So, on this basis alone, it is reasonable to suspect that most of the transcripts possess these functions.
But more importantly, transcription is an energy- and resource-intensive process. Therefore, it would be untenable to believe that most transcripts are mere biochemical noise. Such a view ignores cellular energetics. Transcribing 60 percent of the genome when most of the transcripts serve no useful function would routinely waste a significant amount of the organism’s energy and material stores. If such an inefficient practice existed, surely natural selection would eliminate it and streamline transcription to produce transcripts that contribute to the organism’s fitness.
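To put the energetics argument in rough quantitative terms, here is a minimal back-of-envelope sketch in Python. The genome size, the figure of roughly two ATP-equivalents spent per ribonucleotide polymerized, the 2 percent estimate for protein-coding sequence, and the helper function itself are round-number, illustrative assumptions, not measured values.

```python
# Back-of-envelope estimate of the energetic cost of pervasive transcription.
# All figures below are rough, illustrative assumptions, not measured values.

GENOME_SIZE_BP = 3.2e9   # approximate haploid human genome size (assumption)
ATP_PER_NT = 2           # ~2 ATP-equivalents per ribonucleotide added
                         # (NTP incorporation releases pyrophosphate)

def transcription_cost(fraction_transcribed, passes=1):
    """ATP-equivalents spent transcribing a given fraction of the genome."""
    return GENOME_SIZE_BP * fraction_transcribed * ATP_PER_NT * passes

pervasive = transcription_cost(0.60)    # ENCODE: ~60% of the genome transcribed
coding_only = transcription_cost(0.02)  # ~2% protein-coding sequence, for contrast

print(f"Pervasive transcription:   {pervasive:.1e} ATP-equivalents per pass")
print(f"Coding-only transcription: {coding_only:.1e} ATP-equivalents per pass")
print(f"Cost ratio: {pervasive / coding_only:.0f}x")
```

Even on these crude assumptions, transcribing 60 percent of the genome costs about 30 times more than transcribing only the protein-coding fraction. If most of those transcripts were noise, nearly all of that expenditure would be waste of exactly the kind natural selection is expected to eliminate.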
Graur and his colleagues, along with Niu and Jiang, would argue that most instances of transcription factor binding to DNA are also nothing more than biochemical noise. Graur’s group claims that most transcription factor binding occurs by chance. Each transcription factor latches on to a particular motif (pattern) within a relatively short DNA sequence. These motifs occasionally appear in a random sequence. Graur and his colleagues argue that nonfunctional transcription factor binding sites are dispersed randomly throughout the human genome and account for most of the binding interactions.
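The chance-occurrence claim is easy to make concrete. The short Python sketch below estimates how many times an exact motif of a given length would be expected to appear in a uniformly random sequence the size of the human genome; the genome size, the exact-match requirement, the uniform base composition, and the helper function are all simplifying assumptions for illustration.

```python
# Expected chance matches to a transcription factor binding motif in a
# random genome. A minimal sketch; exact-match motifs and a uniformly
# random sequence are simplifying assumptions.

GENOME_SIZE_BP = 3.2e9  # approximate haploid human genome size (assumption)

def expected_random_hits(motif_length, genome_size=GENOME_SIZE_BP):
    """Expected exact matches to one motif of the given length,
    scanning both strands of a uniform random sequence."""
    p_match = 0.25 ** motif_length    # chance of a match at any one position
    return 2 * genome_size * p_match  # factor of 2 for the two strands

for k in (6, 8, 10, 12):
    print(f"{k}-bp motif: ~{expected_random_hits(k):,.0f} chance matches")
```

On these assumptions, a 6-base-pair motif would be expected to appear by chance more than a million times, while a 12-base-pair motif would appear only a few hundred times. This is why the skeptics expect abundant chance binding sites. The real question, addressed next, is whether cells actually tolerate such indiscriminate binding.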
Apart from energetics considerations, this argument ignores the fact that random binding would make a dire mess of genome operations. In fact, other studies indicate that protein surfaces are designed to minimize so-called promiscuous (random) interactions. (See my earlier articles on the lengths to which biochemical systems go to avoid unwanted protein-protein interactions in the cell’s interior.) Without minimizing these disruptive interactions, biochemical processes in the cell would grind to a halt. It is reasonable to think that the same considerations apply to transcription factor binding to DNA.
While it’s true that biochemical activity doesn’t necessarily equate to function, the ENCODE researchers appear to have gone to great lengths to ensure that they measured activity with biological meaning. The idea that activities associated with the genome—transcription, DNA methylation, histone modification, transcription factor binding, and others—are mostly noise borders on the ridiculous because it ignores well-established principles of cellular biochemistry.
Squaring with the C-Value Paradox?
Both Doolittle and researcher Sean R. Eddy protest that ENCODE’s results don’t make sense in light of the C-value paradox. This conundrum traces back to the early days of molecular biology. Scientists observed that the nucleus of each cell type within a particular organism contained a constant amount of DNA. Therefore, biochemists refer to this amount of DNA as the C value (C for constant).
Initially, researchers expected the amount of DNA to correlate with an organism’s biological complexity. Yet studies revealed that no such relationship existed. Some relatively simple organisms possess a larger C value than more complex organisms do; the onion genome, for example, is roughly five times the size of the human genome. To resolve this paradox, molecular biologists proposed that the majority of an organism’s genome consists of DNA that doesn’t code for proteins or regulate gene expression. Researchers concluded that this noncoding DNA served no real purpose. They viewed it as vestiges of evolutionary processes, or junk.
However, if the ENCODE Project’s conclusion that most, if not all, of the human genome contains functional DNA is valid, then the genome contains very little junk DNA. According to Doolittle, “If the human genome is junk-free, then it must be very luckily poised at some sort of minimal size for organisms of human complexity.”6 From an evolutionary perspective, all the different classes of junk DNA would have to evolve new function to make the ENCODE Project’s conclusion possible. Doolittle states that we would be the “first among many in having made such full and efficient use of all of its millions of SINES and LINES (retrotransposable elements) and introns to encode the multitudes of lncRNAs and house the millions of enhancers necessary to make us the uniquely complex creatures that we believe ourselves to be.”7 For Doolittle, the absurdity of this prospect means the ENCODE Project’s results cannot be correct.
In light of the C-value paradox, the ENCODE results would mean that less sophisticated organisms with genomes larger than ours must also possess more functional elements. Such a scenario makes no sense—at least from an evolutionary perspective. Yet it is possible to account for the larger genomes of organisms less complex than humans: the excess DNA may play a role other than coding for proteins and regulating gene expression. A number of studies, for example, indicate that the amount of DNA in a cell dictates the size of its nucleus. (See my earlier articles to read more about this idea.)
ENCODE’s Validity
Despite these latest criticisms, I see no real scientific reason to dismiss the ENCODE Project’s results. Careful consideration reveals that the objections have more to do with philosophy than with science. The ENCODE skeptics seem to feel that the results must be wrong because they don’t line up with key concepts of the evolutionary paradigm. These skeptics even depart from standard scientific practices to maintain their commitment to evolution in the face of the ENCODE discoveries.
The ENCODE Project’s conclusion—namely, that at least 80 percent of the human genome consists of functional DNA sequences—remains valid evidence for elegant design, befitting the work of a Creator, in the human genome and, by extension, the genomes of other organisms.