The creationist argument about information theory often relies on the analogy of DNA with language. It is often phrased as follows: things like languages, which contain information, cannot come from random processes; they must be designed. Rather than refer to examples that show an information increase during evolutionary processes, or argue about a definition of information, I would like this thread to focus on a simple point: DNA is not a typical language.
I tend to agree with this analogy, adding only that exons act as periods. For clarification, a codon is a set of three nucleotide bases that codes for an amino acid. A gene is a series of codons that produces a string of amino acids, or a protein. A protein is a molecule that performs various functions. (These are layman's definitions.)
DNA is a language with 4 letters (ACGT). Since each word, or codon, is three letters long, there are 4^3 = 64 different codons that can be made. Let me also point out that each codon refers to an amino acid, except for the exons.
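The 64 count can be confirmed by brute force. This minimal Python sketch just lists every three-letter combination of A, C, G and T; it says nothing about which amino acid each codon specifies.

```python
from itertools import product

BASES = "ACGT"

# Every ordered choice of three bases: ("A", "A", "A"), ("A", "A", "C"), ...
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))   # 64
print(codons[:4])    # ['AAA', 'AAC', 'AAG', 'AAT']
```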
Many creationist arguments proceed as follows. Take a word, like "evolution". Perform some random substitution: "evolhtion". Now you have something that is nonsensical; it has lost meaning and is not readable.
In these cases, what is being performed is a point mutation, the substitution of one nucleotide base for another. Let's carry the analogy over. Say we have the codon ACG, which codes for threonine. By a point mutation, we get GCG, which codes for alanine. http://www.geneticengineering.org/chemis/Chemis-NucleicAcid/RNA.htm This link has a chart that gives a complete list of how codons code for amino acids.
In other words, we imposed a random change in the codon, and meaning is still preserved. The important point is that this was not a carefully chosen example. All words or codons contain meaning in the language of DNA: a point mutation can never produce meaningless codons, or degrade genetic information.
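The claim that every codon means something can be checked mechanically rather than by example. The sketch below is a minimal Python version assuming the standard genetic code (the 64-letter string is the usual compact layout of NCBI translation table 1, with one-letter amino acid codes and `*` for stop); it applies every possible single-base substitution to every codon. It also makes one caveat visible: "meaningful" has to include the stop signal, since some substitutions turn an amino acid codon into TAA, TAG or TGA.

```python
# Standard genetic code (NCBI translation table 1). Codons are listed with each
# position cycling through T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def point_mutants(codon):
    """Return the 9 codons that differ from `codon` at exactly one position."""
    mutants = []
    for i, original in enumerate(codon):
        for base in BASES:
            if base != original:
                mutants.append(codon[:i] + base + codon[i + 1:])
    return mutants


# Every single-base change of every codon, as (original, mutant) pairs.
pairs = [(c, m) for c in CODONS for m in point_mutants(c)]
print(len(pairs))                                # 576, i.e. 64 codons x 9
print(all(m in CODON_TABLE for _, m in pairs))   # True: no unassigned codons exist

print(CODON_TABLE["ACG"], CODON_TABLE["GCG"])   # T A (threonine, alanine)

# Changes that turn an amino acid codon into a stop codon ("nonsense" mutations).
nonsense = [(c, m) for c, m in pairs if CODON_TABLE[c] != "*" and CODON_TABLE[m] == "*"]
print(len(nonsense))                             # 23
```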
One possible objection: point mutations often lead to non-functional proteins, indicating a loss of information. Counter: information is not actually lost or degraded. Each codon still has meaning, and the sentence as a whole has meaning: it still produces a protein, if a non-functional one. This is where the analogy really breaks down. The language of DNA is not simply information; it has a physical significance, in that it produces proteins.
DNA can only be viewed as a set of instructions for making proteins rather than any old small talk, though even here the analogy breaks down. DNA doesn't tell something else how to build a protein; it is directly involved in the building process. If only the instructions for my new IKEA table set did that!
Additionally, the instructions have not lost meaning when a point mutation produces a non-functional protein. The instructions are just as clear as before: "build this list of amino acids in a chain." The language is still just as readable. The instructions simply don't work.
Point mutations in the English language produce nonsensical words. Point mutations in the genetic language never produce nonsensical words; they always produce a changed but still meaningful phrase. Thus, treating DNA as a language like English is silly.
Words in natural language generally have an arbitrary relationship to the thing that they signify. So we have "dog" in English, but "Hund" in German, "chien" in French, etc. It doesn't matter what word we use, as long as the community agrees what each word-sound signifies. In human language, what guarantees the relationship between signifier and signified is a consensus over correct word use. At the level of the sentence, meaning is generated by consensual rules of grammar.
In the case of DNA, on the other hand, the "meaning" of a triplet of nucleotides is a physical property of the structure of transfer RNA molecules. The DNA triplet "CTT" codes for the amino acid leucine because the transfer RNA molecule that bears leucine has the nucleotide triplet "GAA" in its anticodon region (written 3' to 5', so that it lines up base by base with the messenger RNA codon "CUU"), and the connection between the signifier and signified is guaranteed by the physical existence of hydrogen bonds between complementary bases. In natural language, the connection between signifier and signified is not guaranteed by a physical property of the universe in this way.
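The pairing logic can be sketched in a few lines of Python. This is a simplification under stated assumptions: the DNA triplet is taken to be on the coding strand, only standard Watson-Crick pairs are used, and wobble pairing at the third codon position (which lets one tRNA read several codons) is ignored.

```python
# Watson-Crick pairing rules for RNA (A-U, G-C).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def transcribe(coding_dna):
    """mRNA has the same sequence as the DNA coding strand, with U in place of T."""
    return coding_dna.replace("T", "U")


def anticodon(codon_rna):
    """Return the anticodon written 3'->5' (aligned base by base with the codon)
    and the same anticodon in the conventional 5'->3' direction."""
    three_to_five = "".join(PAIR[base] for base in codon_rna)
    five_to_three = three_to_five[::-1]
    return three_to_five, five_to_three


mrna = transcribe("CTT")
print(mrna)              # CUU
print(anticodon(mrna))   # ('GAA', 'AAG')
```

The point of the sketch is that nothing in it is a convention agreed by a community: each step is a fixed complementarity rule.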
What difference does grammar make? In human language, the sentence "I like taking my dog for a walk" is well-formed and meaningful, while the 'mutated' version, "I like taking my dog for a", is badly-formed and meaningless, because the interrelationships between component words do not follow consensual rules of grammar.
In DNA, on the other hand, there are no grammar rules, so "ACTGAGACC" is just as meaningful as "ACTGAG". This difference is rooted in the fact that sentences in language are statements about the world. The sentence "I like taking my dog for a walk" is a statement about the world. The question "where are the bananas?" is a statement to the effect "I want you to tell me where the bananas are". The DNA sequence "ACTGAGACC", in contrast, is not a statement about anything.
Structural constraints on the information content of DNA sequences (for example, the rule that each codon must consist of three nucleotides) should not be confused with grammar because these constraints are not related to the meaning of the DNA sequence. This is clear if you imagine a language consisting only of three-letter words, such that each word is an analogy for a codon. This makes no difference to the grammatical rules of language: "the dog ate ham" is a meaningful sentence while "the ate ham dog" is meaningless. In DNA on the other hand, we can switch around codons as much as we like without ever breaking a grammatical rule and hence generating an "invalid" nucleotide sequence.
The analogy does not work even for stop codons, which might be taken at first glance as corresponding to full stops. If we introduce a full stop into the sentence "the dog ate ham", we might get something like "the." which does not mean anything. If we introduce a stop codon into a DNA sequence we will get a shorter but always grammatically valid protein.
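The contrast with grammar can be illustrated with a toy translator. This is a minimal sketch assuming the standard genetic code and a sequence that is already in the correct reading frame (real translation starts at a start codon on the messenger RNA, which the sketch ignores). Whatever order the codons are in, and wherever a stop codon lands, the output is a valid amino acid chain; a premature stop simply gives a shorter one.

```python
# Standard genetic code (NCBI translation table 1); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def translate(dna):
    """Read `dna` codon by codon from the first base and stop at the first stop
    codon. Trailing bases that do not fill a whole codon are ignored."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ACTGAGACC"))      # TET  (Thr-Glu-Thr)
print(translate("ACTGAG"))         # TE   (a shorter chain, equally valid)
print(translate("GAGACCACT"))      # ETT  (same codons, different order)
print(translate("ACTTAAGAGACC"))   # T    (stop codon TAA inserted after the first codon)
```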
So, for me, it is the question of grammar that is the key difference between DNA and human language. Meaning in human language is mediated by grammar and consensus, while in DNA it is guaranteed by the physical structure of the molecules involved. Meaning can be lost in natural language when sentences are poorly formed. DNA sequences cannot be poorly formed because there is no grammar to define what is "poor".
I suppose that a creationist would say that a DNA sequence corresponding to a nonfunctional protein (or a protein with reduced function) is defined as "poor" based on its effect in the living organism. But to preempt this argument, I would point out that poor performance of a protein does not equate to "loss of information", any more than the sentence "I want you to tell me where the bananas are" contains less information when it is ineffective and nobody tells you where to get the fruit.
There is another difference. Having two copies of the sentence "Where are the bananas?" adds no new information in English. However, in genetics, genes and DNA sequences are also raw materials. Having multiple copies of the same sequence does add functionality: it provides additional raw material that can be modified to produce new traits without losing the function of the original.
The IDists and Biblical Creationists love to make comparisons to language, but none of the comparisons I have seen stand up to examination. Even something as simple as word order is specific to a given language, so that in Latin the phrase "Cat house" has much the same meaning as "House cat".
But something else is important to note here about genes. It is the way the genetic code for a protein is stored on DNA. Notice that the exons, as they appear linearly on DNA, have no structural resemblance to the strings of amino acids, as they appear in proteins. This clearly means that one pattern holds a code (digital information) that can be transcribed into another pattern (structural protein). In this respect, genes are not “blueprints” of proteins, because genes look nothing like proteins. Instead the DNA holds only coded messages for proteins. Translating the code into a protein requires rules of language; a simple language, but a language nonetheless. This is why geneticists are utterly dependent upon their Genetic Dictionary; for them, “language” is not a metaphor.
Translating the code into a protein requires rules of language; a simple language, but a language nonetheless.
I'm not sure I consider the arbitrary mapping of symbol to symbol to be a language. On the keyboard I am using to type this, I know that if I press the second key on the fifth row, a letter "a" will appear on the screen. This relationship is largely arbitrary (I know the keyboard was designed so that frequently used letters were in easy-to-locate positions, but the "a" key could easily swap places with the "l" key without the keyboard being useless). Yet I do not believe that the relationship between my keyboard and the text input box of this web form constitutes a "language", despite the fact that a mapping of key position to letter exists.
Similarly, ants which recognize their own species by its scent (an arbitrary mapping of aroma to kinship) are not using language in my view.
The mapping of codon to amino acid falls into the same category, I think. However, I'm not a linguist; perhaps linguists consider this kind of mapping to be a language of sorts.
As a separate point, you are using "code" and "language" interchangeably. Is this justified? Morse code is definitely a code, not a language. Yet Morse code has no meaning on its own. We can perform a one-to-one mapping of the code onto English, and thus get a phrase with meaning. But Morse code only has meaning because it refers to a language.
I think this is the point mick is getting at. DNA does seem to be a sort of code. But what does this code translate into? Certainly not a language. It translates into a set of amino acids, which form a protein. Take the gene for dark skin, for instance. Does your skin "decide" to become dark after "reading" the DNA? No, it produces a protein which produces dark pigments in your skin. (I'm not sure if this is exactly how skin tone is determined, but I would guess that it is something like this, as are many other traits) In this case, there is no exchange of some immaterial information, the process is very physical. You can think of the trait as being contained in the code of DNA, but that analogy can only go so far.
"You hear evolutionists say we are descended from apes and monkeys. Sure, but that's not the point. All of life is related, not just humans with monkeys. If you hug a tree, you're hugging a relative, a very distant relative, but a relative nonetheless." Dr. Joan Roughgarden in Evolution and Christian Faith
Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by itslef but the wrod as a wlohe.'
This is a demonstration that (written) language is actually very 'sloppy'. It is highly redundant and full of extraneous elements. This is actually very important to the utility of the language, particularly in the face of typos, transpositions and other insults. This is equally true of the DNA code and translation machinery. Creationists are fond of stating or implying that a protein has to have an exact sequence of amino acids to perform its function and that any deviation from this sequence renders the protein ineffectual.
I often wonder how many of these creationists are diabetics. If they are diabetics, they most likely take daily insulin shots. The insulin they inject comes from cows or sheep from the slaughterhouse. (I also wonder how many animal rights activists are diabetic and use insulin from slaughtered animals.) The insulin from sheep and cows does not have exactly the same sequence as human insulin, and each differs from human insulin, as well as from each other, in several amino acids. But they are perfectly functional for their intended purpose. Insulin is used in control of glucose metabolism across the entire range of animal species, and if you compare the AA sequences between several distantly related species, what molecular biologists call a 'homology map', you find that only a small fraction of the amino acids are common. There is even a considerable range of lengths to the insulin chain amongst the various species.
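At its simplest, a homology comparison is a position-by-position check of aligned sequences. The sketch below uses the human and bovine insulin A and B chains as they are usually given in textbooks (worth double-checking against a sequence database such as UniProt before relying on them); they differ at three positions, A8, A10 and B30. Because chain lengths vary between distant species, real comparisons need a proper alignment algorithm; this sketch assumes equal-length, already-aligned sequences.

```python
# Human and bovine insulin chains (one-letter amino acid codes), as commonly
# reported; verify against a sequence database before relying on them.
HUMAN = {"A": "GIVEQCCTSICSLYQLENYCN", "B": "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"}
BOVINE = {"A": "GIVEQCCASVCSLYQLENYCN", "B": "FVNQHLCGSHLVEALYLVCGERGFFYTPKA"}


def differences(seq1, seq2):
    """List (position, residue1, residue2) for aligned, equal-length sequences.
    Positions are 1-based, matching the usual A1..A21 / B1..B30 numbering."""
    if len(seq1) != len(seq2):
        raise ValueError("this sketch only handles pre-aligned, equal-length sequences")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]


total = identical = 0
for chain in ("A", "B"):
    diffs = differences(HUMAN[chain], BOVINE[chain])
    print(chain, diffs)
    total += len(HUMAN[chain])
    identical += len(HUMAN[chain]) - len(diffs)

print(f"{identical}/{total} identical ({100 * identical / total:.1f}%)")   # 48/51 identical (94.1%)
```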
In very strong support of the evolutionary hypothesis, the differences in the insulin sequences between species (and those of all other proteins) are comparable to their 'evolutionary' distance based on morphology and paleontology. In fact, these differences are used to estimate when species diverged along different evolutionary branches, and these estimates are in good general agreement with other dating methods.
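How differences become dates can be sketched with one common textbook simplification, a molecular clock. The assumptions are stated plainly here: the fraction p of differing sites is corrected for repeated changes at the same site with the Poisson correction d = -ln(1 - p), both lineages are assumed to change at the same constant rate r after the split, so the time is d / (2r). The rate below is a made-up placeholder; in practice rates are calibrated against fossil-dated divergences, and more elaborate models are used.

```python
import math


def poisson_distance(differences, sites):
    """Substitutions per site, correcting the observed proportion p of differing
    sites for repeated changes at the same site: d = -ln(1 - p)."""
    p = differences / sites
    if p >= 1:
        raise ValueError("too divergent for this correction")
    return -math.log(1 - p)


def divergence_time(differences, sites, rate_per_site_per_year):
    """Simple molecular clock: both lineages change after the split, hence 2 * rate."""
    return poisson_distance(differences, sites) / (2 * rate_per_site_per_year)


HYPOTHETICAL_RATE = 1e-9   # made-up rate (substitutions/site/year), not a measured value

print(poisson_distance(10, 100))                     # a bit more than the raw 0.10
print(divergence_time(10, 100, HYPOTHETICAL_RATE))   # years, under the made-up rate
```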
Maybe I’m pushing it a bit too far to insist that the genetic code is a language. After further consideration I got tripped up on this matter of syntax. In this regard, two leading evolutionary biologists, John Maynard Smith & Eörs Szathmáry, have something worthwhile to say about language and syntax (from The Origins of Life, 1999, p. 169):
The analogy between the genetic code and human language is remarkable. Spoken utterances are composed of a sequence of a rather small number of unit sounds, or phonemes (represented, at least roughly, by the letters of the alphabet). The sequence of these phonemes first specifies different words, and then, through syntax, the meanings of sentences. By this system, the sequence of a small number of kinds of unit can convey an indefinitely large number of meanings. The genetic message is composed of a linear sequence of only four kinds of unit. This sequence is first translated, via the code, into a sequence of 20 kinds of amino acid. These strings of amino acids fold to form three-dimensional functional proteins. Through gene regulation, the right proteins are made at the right times and places, and an indefinite number of morphologies can be specified.
Thus in both systems a linear sequence of a small number of kinds of unit can specify an indefinitely large number of outcomes. But there is one respect in which the two systems cannot usefully be compared. In language, the meanings of sentences depend on the rules of syntax. These rules are formal and logical. In contrast, the ‘meaning’ of the genetic message cannot be derived by logical reasoning. Thus, although the amino acid sequence of the proteins can be simply derived from the genetic message, the way they fold up to form three-dimensional structures, and the chemical reactions that they catalyse, depend on complex dynamic processes determined by the laws of physics and chemistry. It does not seem possible to draw a useful comparison between the way in which meaning emerges from syntax, and that in which chemical properties emerge from the genetic code.
Giving fair weight to their respectable opinions, I will have to re-think or give up my position of a genetic “language.” What remains, however, is this troubling absence of ANY physical or chemical principles that account for the formation of the genetic code in the first place.
I know that you were just using insulin as an example, but I thought I should correct some of what you said. Insulin is no longer harvested from animals but produced recombinantly in microorganisms such as E. coli or yeast. Also, this insulin is often not "human" but an engineered variant. For example, I take two forms of insulin, one for rapid release ("NovoRapid") and one for slow release. The slow-release form has the human sequence, but the rapid one contains a point mutation that inhibits dimer and hexamer formation and allows the protein to be absorbed into the circulation more rapidly.
On a more general note, I notice that everyone keeps using the example of protein coding as if this is all DNA does. In humans, the vast majority of DNA does NOT code for protein. Codon coding is the closest DNA comes to forming a linear digital code. An example of where the DNA code can be thought of as more of an analogue code than a digital one is in transcription factor binding sites. Here the "code" is bound based on its "shape". The binding protein can recognise variants of the sequence and will bind more or less strongly, and so will activate or deactivate transcription depending on the concentration of the transcription factor.
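The graded, "analogue" recognition of binding sites is often modelled with a position weight matrix: each position of the site contributes a score for each base, and the total serves as a rough proxy for binding strength. The matrix below is invented purely for illustration and does not describe any real transcription factor; the sketch scores a preferred site and two variants, then slides the matrix along a sequence to find the best-scoring window.

```python
# Hypothetical position weight matrix for a 4-base site: one dict per position,
# giving a log-odds-style score for each base (higher = preferred).
PWM = [
    {"A": 1.2, "C": -1.0, "G": -0.5, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -0.8},
    {"A": -0.7, "C": 1.0, "G": -1.0, "T": 0.3},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.4},
]


def site_score(site):
    """Sum the per-position scores; the site must be as long as the matrix."""
    if len(site) != len(PWM):
        raise ValueError("site length must match the matrix width")
    return sum(position[base] for position, base in zip(PWM, site))


def best_site(sequence):
    """Slide the matrix along `sequence`; return (score, offset, site) of the best window."""
    width = len(PWM)
    return max(
        (site_score(sequence[i:i + width]), i, sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    )


# The preferred site and two variants: binding is graded, not all-or-nothing.
for site in ("AGCT", "AGTT", "CGTA"):
    print(site, round(site_score(site), 2))

print(best_site("TTAGCTTT"))
```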
In language, the meanings of sentences depend on the rules of syntax. These rules are formal and logical. In contrast, the ‘meaning’ of the genetic message cannot be derived by logical reasoning.
I have to disagree with Maynard Smith & Eörs Szathmáry, although I do so with trepidation since I have an almost reverential respect for these two. One of the fastest growing and most important fields in molecular biology and genetics is called bioinformatics. One of the major thrusts of this field is to use a set of very high-powered computer programs to scan through DNA sequences and do exactly what Maynard Smith & Eörs Szathmáry say cannot be done: derive by logical reasoning the genetic content of that code, the gene interactions, and to some extent the structure of the proteins encoded by the DNA sequence. This is possible precisely because the DNA code does have a fairly rigorous syntax, and a lot of the current work in genetics is devoted to deciphering that syntax.
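As a small taste of this kind of rule-based reasoning, here is a naive open-reading-frame finder: in each of the three forward reading frames it looks for an ATG start codon followed, in frame, by a stop codon. This is a teaching sketch only, with an arbitrary minimum-length parameter; real gene finders also scan the reverse strand, model splice sites and codon usage, and rely on statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, orf) for ATG...stop stretches in the three forward frames.

    `end` is exclusive and includes the stop codon. `min_codons` counts the codons
    before the stop, including the ATG. Only the first ATG of each ORF is used.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGCC"))   # [(2, 14, 'ATGAAATTTTAG')]
```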
The human genome has about three billion base pairs (the As, Ts, Gs, and Cs) in each copy, or about six billion in a diploid cell, which carries two copies, encoding about 20,000 genes. The genes average about 1000 base pairs to encode the typical 300-amino-acid protein. So, only 20 million of the three billion base pairs, or about 0.7% of the DNA, is used to encode proteins. Well, that isn't quite true. Many of the genes occur multiple times; there are also DNA sequences near these genes that regulate when the gene is expressed; several RNA molecules of various types are also encoded; and there are a few other short functional sequences. Still, only about 1.5% of the genome is used. Researchers have been able to delete DNA segments more than a million base pairs long from the mouse genome, and the resulting adult mice appeared normal in all respects. In addition, that typical 1000-base-pair gene sequence is hardly ever in one contiguous piece. It is almost always broken into several pieces, called exons because they are the expressed parts (that is, they provide the code for the protein that results from that gene), separated by non-coding sequences, called introns, that are often thousands and sometimes hundreds of thousands of base pairs long. (Note that the OP is in error on this point. An intron is not like a period. That is the function of the three stop codons. An intron is like what you get when your cat walks across your keyboard while you're typing.)
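The back-of-the-envelope arithmetic can be checked directly. This sketch uses the round figures from the post (20,000 genes, about 1,000 coding base pairs each) and assumes a haploid genome of about 3.1 billion base pairs, since gene counts refer to one copy of each chromosome. Counting a diploid cell doubles both the genes and the genome, so the fraction is unchanged; dividing one copy's coding DNA by the diploid total would halve it by mistake.

```python
# Round figures from the post; the genome size is an assumption (haploid,
# i.e. one copy of each chromosome, which is how gene counts are reported).
GENES = 20_000
BP_PER_GENE = 1_000                 # ~300 amino acids x 3 bases per codon, rounded up
HAPLOID_GENOME_BP = 3_100_000_000

coding_bp = GENES * BP_PER_GENE
fraction = coding_bp / HAPLOID_GENOME_BP

# Doubling both numerator and denominator (a diploid cell) leaves the ratio unchanged.
diploid_fraction = (2 * coding_bp) / (2 * HAPLOID_GENOME_BP)

print(f"{coding_bp:,} coding bp, {fraction:.2%} of the genome")   # 20,000,000 coding bp, 0.65% of the genome
```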
If you read an original paper on the sequencing of the DNA of some critter, once you get past the pages listing all the authors, who often number over 100, you will find charts and diagrams listing the number of genes, the numbers of the various types of proteins and RNAs encoded, how they are controlled, where they are located in the sequence, where the introns are, and a lot of other information. How was this determined? By understanding (applying logical reasoning to) the grammar and syntax of the code. The introns have specific sequences at either end that couple to special complexes called spliceosomes, so they can be removed from each gene and the exons joined to make the final messenger RNA. Genes have special initiator and terminator sequences and are controlled by regulatory regions with specific characteristics (that, for example, allow certain proteins called transcription factors to latch onto them to initiate or inhibit gene transcription).
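The intron-removal step can be sketched as simple string surgery. Most introns begin with GT and end with AG (the "GT-AG rule", which has exceptions); the sketch takes the intron coordinates as given, checks those boundary bases, and joins the exons. The toy gene is invented for illustration, and real splice-site recognition by the spliceosome involves much more than two bases at each end.

```python
def splice(gene_dna, introns):
    """Remove introns, given as sorted, non-overlapping (start, end) half-open
    intervals, and join the remaining exons.

    Raises ValueError if an intron does not follow the common GT...AG pattern.
    """
    exons = []
    previous_end = 0
    for start, end in introns:
        intron = gene_dna[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not follow GT...AG: {intron!r}")
        exons.append(gene_dna[previous_end:start])
        previous_end = end
    exons.append(gene_dna[previous_end:])
    return "".join(exons)


# Toy gene: exon 1 + intron + exon 2 (sequence invented for illustration).
gene = "ATGGCC" + "GTAAGTTTTCAG" + "GAATAA"
print(splice(gene, [(6, 18)]))   # ATGGCCGAATAA
```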
You will also find enumerated a list of "pseudogenes" that have the structure of genes but no initiator and so are never expressed, as well as genes that were spliced into the genome by a virus or a bacterium at some time in the species' history. (The human genome has 139 such insertions.) How can they determine that? Amongst other indicators, viral and bacterial genomes tend to be rich in C-G pairs (more than 50% C-G and less than 50% A-T pairs), while eukaryotic genomes are slightly rich in A-T pairs. So, while all species speak the same 'language', there are definite dialectal differences.
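The "dialect" described here is usually measured with a simple statistic, GC content. The sketch computes it for a whole sequence and in successive windows, which is how a region whose composition stands out from its surroundings might be flagged. The example sequences are arbitrary, and since bacterial genomes span a wide range of GC content, composition is only one indicator among several.

```python
def gc_content(dna):
    """Fraction of G and C among the A/C/G/T bases of `dna` (other symbols, such as N, are skipped)."""
    counted = [base for base in dna.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("no A, C, G or T bases to count")
    return sum(base in "GC" for base in counted) / len(counted)


def gc_windows(dna, window, step):
    """GC content in successive windows, as (start, fraction) pairs."""
    return [(i, gc_content(dna[i:i + window])) for i in range(0, len(dna) - window + 1, step)]


print(gc_content("ATGCGCGTTA"))           # 0.5
print(gc_windows("AAAAGGGGATAT", 4, 4))   # [(0, 0.0), (4, 1.0), (8, 0.0)]
```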
Much more can be said about this exciting field, such as how an experienced researcher can determine a great deal about the structure of the encoded protein by examining the DNA code, but the point is that the 'language' analogy goes very deep and includes syntax and grammar as well as alphabet and words. These rules are so functional and powerful that a genome is more poetry than prose.
Finally, I must confess that I don't have any background in this area; I'm merely a fascinated bystander with his nose pressed against the window, so I would greatly appreciate anyone more knowledgeable correcting or amplifying what I have posted.