Error Control Coding in Biology Implies Design, Part 2 (of 5)

In part 1 of this series we learned how the genetic system is an information-processing system, and outlined several reasons why we could expect to find coding techniques in play to protect the genetic data. Such coding techniques are known and used by engineers to protect the data processed by many modern digital communications systems.

We now turn our attention to a few analogies of such coding techniques.

Analogy: Optimality of the Genetic Code and Gray Mapping

The first analogy will show from a qualitative and quantitative perspective that the genetic code is in fact an optimal (or near optimal) mapping from codons to amino acids. (See here for a table describing the genetic code.)

The genetic code appears well matched to the error probabilities of the individual nucleotide positions, as a good code should be from an engineering perspective. For example, the first and third nucleotides of a codon (see here) are more likely to be misread during translation, and the genetic code mapping appears to take this into account. These most common errors, or mutations, turn the intended codon into one that codes either for the same amino acid or for an amino acid with very similar physicochemical properties, thus minimizing loss of function. This is similar to Gray codes used in digital communications.

More specifically, the genetic code seems designed so that the most common types of substitution mutations (errors) yield the same or a very similar amino acid, thereby minimizing loss of protein function. In like fashion, Gray codes used in engineering assign bit patterns that differ in only one bit to neighboring symbols, so that the most common symbol errors (mistaking a symbol for its nearest neighbor) corrupt as few bits as possible, thereby minimizing information loss.
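To make the engineering side of this analogy concrete, here is a minimal Python sketch of a binary-reflected Gray code, one standard construction. It is an illustration only and is not taken from the papers cited in this article; the function names and the choice of eight signal levels are assumptions made for the example.

```python
def to_gray(n: int) -> int:
    """Return the binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def bit_distance(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ (Hamming distance)."""
    return bin(a ^ b).count("1")


# Label 8 adjacent signal levels (0..7) with 3-bit patterns, two ways.
gray_labels = [to_gray(level) for level in range(8)]
binary_labels = list(range(8))

# Bit errors caused when a level is mistaken for its nearest neighbor,
# which is the most common kind of symbol error on a noisy channel.
gray_errors = [bit_distance(gray_labels[i], gray_labels[i + 1]) for i in range(7)]
binary_errors = [bit_distance(binary_labels[i], binary_labels[i + 1]) for i in range(7)]

for level, label in enumerate(gray_labels):
    print(level, format(label, "03b"))
print("Gray mapping, bit errors per neighbor slip:", gray_errors)
print("Plain binary, bit errors per neighbor slip:", binary_errors)
```

With the Gray mapping, every nearest-neighbor slip corrupts exactly one bit, whereas plain binary counting can corrupt up to three bits for the same slip (for example, level 3 mistaken for level 4).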

I noticed the similarity to Gray coding after reading a paper by Dr. Fazale Rana in 2002. The Gray code interpretation was highlighted by Manish Gupta in a paper published in 2006. Gupta plotted the 64 codons used in the genetic code in terms of nucleotide distance (see Figure 3 here), and remarked on the correspondence to Gray codes used in engineering. Together, the concept of nucleotide distance and the plot establish the validity of the Gray map interpretation.
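The idea of nucleotide distance can be sketched in a few lines of Python. The sketch below assumes the standard genetic code (NCBI translation table 1) written in compact form with the bases ordered T, C, A, G; the helper names are hypothetical. It only counts substitutions that leave the amino acid exactly unchanged, ignoring both physicochemical similarity and mutation bias, so it is a much cruder measure than the ones used in the studies discussed in this article.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order;
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nucleotide_distance(codon_a: str, codon_b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(a != b for a, b in zip(codon_a, codon_b))


def synonymous_fraction_by_position() -> list:
    """For each codon position, the fraction of single-nucleotide
    substitutions that leave the encoded amino acid (or stop) unchanged."""
    fractions = []
    for pos in range(3):
        same = total = 0
        for codon, aa in CODON_TABLE.items():
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a substitution
                neighbor = codon[:pos] + base + codon[pos + 1:]
                total += 1
                same += CODON_TABLE[neighbor] == aa
        fractions.append(same / total)
    return fractions


if __name__ == "__main__":
    print(nucleotide_distance("GAA", "GAG"))  # codons one substitution apart
    for pos, frac in enumerate(synonymous_fraction_by_position(), start=1):
        print(f"position {pos}: {frac:.3f} of substitutions are synonymous")
```

On the standard table, the third position has by far the largest share of synonymous substitutions, followed by the first and then the second. That is the same ordering as the positional error ranking described in footnote 1.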

Recall from part 1 that many genetic code mappings are possible due to the high level of redundancy. Therefore, from a qualitative perspective, and from an engineering perspective, the genetic code is superb, perhaps much better than one may expect from a naturalistic perspective.

Recent work shows just how remarkable the natural code is. (See here and here.) Researchers studying the error-minimizing properties of the genetic code noted that earlier work had ranked the natural code in the top 0.02 percent for efficiency but had overlooked bias in mutations. [1] When this bias is taken into account, the natural code jumps from the top 0.02 percent to one in a million.

Dr. Fazale Rana comments further on the error-minimizing properties of the genetic code:

The genetic code’s error-minimization properties are actually more dramatic than these results indicate. When researchers calculated the error-minimization capacity of one million randomly generated genetic codes, they discovered that the error-minimization values formed a distribution where the naturally occurring genetic code’s capacity occurred outside the distribution. Researchers estimate the existence of 10^18 possible genetic codes possessing the same type and degree of redundancy as the universal genetic code. All of these codes fall within the error-minimization distribution. This finding means that of 10^18 possible genetic codes, few, if any, have an error-minimization capacity that approaches the code found universally in nature. [2]

In summary, qualitative and quantitative evidence suggests that the natural genetic code is highly optimized and, in fact, tuned to the most common types of errors (mutations). In addition, this work highlights an underlying analogy between the genetic system and modern communications systems: the Gray code.

The next article in this series will look at another coding analogy between modern digital communications systems and the genetic information-processing system.

Keith McPherson

Keith McPherson received his Master of Science in Electrical Engineering from Georgia Institute of Technology in 1993, and currently works as an electrical engineer in Melbourne, FL, in the fields of communications and signal processing.

  1. Bias includes the fact that not all codons are equally likely to be mistranslated as other codons, and that certain nucleotide positions within the codon are more prone to error. Purine/purine and pyrimidine/pyrimidine errors (transition mutations) are more common than purine/pyrimidine errors (transversion mutations), and the positions rank 3rd, 1st, and 2nd from most to least error-prone.

  2. Fazale Rana, “FYI: I.D. in DNA; Deciphering Design in the Genetic Code,” Facts for Faith, Quarter 1, 2002, 14-23.