Have you ever thought about which characteristics are critical for an infant’s development? By observing whether an infant is breathing, cooing, eating, sleeping, crying, and responding to movement and sounds, we can project that the child will continue growing. But at such an early stage, no one can predict what kind of teen or adult the infant will become. One can speculate, but only time will tell.
In a similar way, we are only in our infancy in understanding and unraveling the human genome. So I was stunned recently to hear a research presentation in which the speaker claimed that we are in a “post-genomic era.” In presenting metagenomic data, he emphasized our ability to collect DNA from every accessible environmental niche, revealing vast numbers of highly diverse organisms. The volume of DNA data generated is so staggering that sorting and assembling it requires sophisticated computer algorithms, a task akin to combing through a field of haystacks for needles and other unknown objects. It is exciting to think of what we will discover as we solve these complex DNA puzzles, but it is ludicrous to assert that we are in a post-genomic era simply because our assembly problems have grown harder.
After the presentation I chatted with the speaker, and he agreed that we are not yet in a post-genomic era but are just beginning to unravel the complexity of the human genome.
While working on a current research project, I was surprised to find that large chunks of the human genome are still inaccessible to today’s sequencing and assembly methods. More than 10 percent of the human genome remains unsequenced, unassembled, and omitted from genomic comparisons. Further confirmation emerged in a recent review of heterochromatin and gene transcription in mammals.1 The article highlighted the still enigmatic elements of the human genome that have yet to be thoroughly sequenced and assembled.
“But wait!” you might be thinking (as I did), “Didn’t we finish sequencing and assembling the human genome in 2004? Haven’t we sequenced numerous genomes since then? Can’t I get my own genome sequenced for $1,000 or less?” Well, no—not quite.
The portions of the human genome left unassembled and unanalyzed are due in part to the difficulty of accessing DNA in constitutive (tightly packed and highly repetitive) heterochromatin (approximately 8 percent) and in other highly repetitive sequences (more than 2 percent).2 Highly repetitive sequences make assembling full genomes from very short sequence reads extremely difficult, if not impossible. Although newer sequencing technologies allow much longer reads and assembly through highly repetitive sections of well-preserved DNA,3 these methods still cannot reach tightly bound (inaccessible) heterochromatin.
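To make the short-read problem concrete, here is a toy Python sketch, not a real assembler, built on simplifying assumptions: reads are error-free, every possible read of a given length is observed, and the flanking sequences, the 10-bp repeat unit, and the helper names `distinct_reads` and `spans_repeat` are all hypothetical. Two genomes that differ only in how many copies of a repeat they carry yield exactly the same set of 6-bp reads, whereas a single read long enough to cover both unique flanks anchors the shorter repeat and reveals its length.

```python
# Toy illustration only: idealized, error-free reads from hypothetical sequences.

def distinct_reads(genome: str, read_len: int) -> set[str]:
    """Every distinct substring of length read_len (an idealized read set)."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}

def spans_repeat(reads: set[str], left: str, right: str) -> bool:
    """True if any single read contains both unique flanks, anchoring the repeat."""
    return any(left in r and right in r for r in reads)

LEFT, RIGHT = "TTGCA", "GGATC"  # hypothetical unique flanking sequences
UNIT = "ACGTACGTAC"             # hypothetical 10-bp repeat unit

genome_a = LEFT + UNIT * 3 + RIGHT  # 3 repeat copies, 40 bp total
genome_b = LEFT + UNIT * 5 + RIGHT  # 5 repeat copies, 60 bp total

# Short reads (6 bp, shorter than one repeat unit): the read sets are identical,
# so the reads alone cannot reveal how many copies each genome carries.
short_a, short_b = distinct_reads(genome_a, 6), distinct_reads(genome_b, 6)
print("short reads identical:", short_a == short_b)  # prints: short reads identical: True

# Long reads (40 bp): one read spans the whole 3-copy repeat, but no 40-bp read
# can span the 50-bp repeat in genome_b.
long_a, long_b = distinct_reads(genome_a, 40), distinct_reads(genome_b, 40)
print("3-copy repeat spanned:", spans_repeat(long_a, LEFT, RIGHT))  # prints: ... True
print("5-copy repeat spanned:", spans_repeat(long_b, LEFT, RIGHT))  # prints: ... False
```

Real assemblers also draw on read depth and paired reads, so this sketch only illustrates the ambiguity rather than how it is handled in practice. It also does not model the separate problem of tightly packed heterochromatin that current methods cannot access at all.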
The gaps remaining in the human genome are significant. It is shortsighted to claim that because something is inaccessible to scrutiny, it lacks significance. Nevertheless, neo-Darwinian dogma predicts that large portions of complex genomes are useless remnants of evolution. Its proponents say that these “artifacts of evolution” point to species relatedness via common descent but are otherwise insignificant, apart from possible contributions to sequence-independent structural functions. This prevailing dogma is why these unassembled, unanalyzed genomic segments are disregarded in genomic comparisons and not even mentioned in the vast majority of corresponding publications.
It seems reasonable at least to consider the possibility that these inaccessible and unanalyzed portions of hominid genomes may account for significant differences between species and among individuals within a given species.
Keep in mind that published genomes are mere genetic snapshots of fully developed, single individuals captured in time. Chromatin structure changes during development and across tissues and physiological states. These changes indicate that chromatin “inaccessible” in one snapshot may, at some point or at various times, contribute to gene expression and regulation. The human being is an extremely complex entity of multiple diverse subsystems in near-constant flux. A single snapshot cannot begin to capture the great variation within a single individual, let alone within a highly diverse species. If much of what happens early in a series of contingent events is critical in shaping ultimate outcomes, then segments of the human genome that may be more accessible during gestation or infancy, but are later sequestered and largely silenced, might be of critical significance and great interest.
As we begin to unravel more data in this new era of deep sequencing and metagenomic, proteomic, and transcriptomic analyses, we must free ourselves from neo-Darwinian paradigms that would restrict our ability to explore. We need humility to acknowledge that we are still in the infancy stage of understanding our own genomes. Caution is warranted before drawing broad, sweeping conclusions when comparing the human genome with those of other species.
One of the hallmarks of science is that much of what has been studied before is continually reevaluated and challenged by new data. Improved methods and recent discoveries render some previous conclusions less significant and, at times, incorrect. New discoveries of critical elements in these omitted stretches of our DNA will continue to be made.4 I believe that as we unpack our genomes more fully, even as challenges to common descent arise, we will see increasing signs of common design when comparing highly divergent species’ genomes. The progressive creation model at RTB, grounded in the Christian Scriptures and committed to scientific inquiry, actually contributes to scientific advancement by providing a rationale that encourages us to look deeper into the human genome to gain a better understanding of human uniqueness.
- Nehmé Saksouk, Elisabeth Simboeck, and Jérôme Déjardin, “Constitutive Heterochromatin Formation and Transcription in Mammals,” Epigenetics & Chromatin 8 (January 2015), doi:10.1186/1756-8935-8-3.
- In humans, this includes centromeric alpha satellite regions (megabase-scale, higher-order repetitive sequences), unique pericentric sequences not represented in human genome drafts, and telomeric and ribosomal regions.
- The PacBio RS II system can sequence very long DNA fragments (up to 10,000 bp). Such reads can capture the nonheterologous sequences at the ends of long, highly repetitive stretches, which helps fit these segments of the human genome into the proper gaps. Long reads can also help determine specific copy number variations, which differ from person to person and from chromosome to chromosome, even within an individual. However, the system requires relatively large amounts of high-quality DNA, so it is not applicable to ancient-DNA sequencing.
- Megan E. Aldrup-MacDonald and Beth A. Sullivan, “The Past, Present, and Future of Human Centromere Genomics,” Genes 5 (January 2014): 33–50, doi:10.3390/genes5010033; Glenn A. Maston, Sara K. Evans, and Michael R. Green, “Transcriptional Regulatory Elements in the Human Genome,” Annual Review of Genomics and Human Genetics 7 (September 2006): 29–59, doi:10.1146/annurev.genom.7.080505.115623.