Maynard Smith & Eörs Szathmáry write:
In language, the meanings of sentences depend on the rules of syntax. These rules are formal and logical. In contrast, the 'meaning' of the genetic message cannot be derived by logical reasoning.
I have to disagree with Maynard Smith & Eörs Szathmáry, although I do so with trepidation since I have an almost reverential respect for these two. One of the fastest-growing and most important fields in molecular biology and genetics is bioinformatics. One of the major thrusts of this field is to use very powerful computer programs to scan through DNA sequences and do exactly what Maynard Smith & Eörs Szathmáry say cannot be done: derive by logical reasoning the genetic content of the sequence, the gene interactions, and to some extent the structure of the encoded proteins. This is possible precisely because the DNA code does have a fairly rigorous syntax, and a lot of the current work in genetics is devoted to deciphering that syntax.
The human genome has about six billion base pairs (the As, Ts, Gs, and Cs) encoding about 20,000 genes. The genes average about 1000 base pairs to encode the typical 300 amino acid protein. So, only 20 million of the six billion base pairs, or about 0.3% of the DNA, is used to encode proteins. Well, that isn't quite true. Many of the genes occur multiple times, there are also DNA sequences near these genes that regulate when the gene is expressed, several RNA molecules of various types are also encoded, and there are a few other short functional sequences. Still, only about 1.5% of the genome is used. Researchers have been able to excise DNA segments several million base pairs long from the genomes of fertilized mouse eggs, and the resultant adult mice appeared normal in all respects. In addition, that typical 1000 base pair gene sequence is hardly ever in one contiguous piece. It is almost always broken into several pieces, called exons because they are the expressed parts (that is, they provide the code for the protein that results from that gene), separated by non-coding base pair sequences, called introns, that can be up to several thousand base pairs long. (Note that the OP is in error on this point. An intron is not like a period. That is the function of the three stop codons. An intron is like what you get when your cat walks across your keyboard while you're typing.)
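For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses only the round figures quoted above, and the names are mine, made up for illustration.

```python
# Round figures quoted in the post; all are order-of-magnitude values.
GENOME_BASE_PAIRS = 6_000_000_000
GENE_COUNT = 20_000
BASE_PAIRS_PER_GENE = 1_000

# Three bases encode one amino acid, so ~1000 bp gives roughly 300 residues.
CODONS_PER_GENE = BASE_PAIRS_PER_GENE // 3


def coding_fraction(genes, bp_per_gene, genome_bp):
    """Fraction of the genome taken up by protein-coding sequence."""
    return genes * bp_per_gene / genome_bp


coding_bp = GENE_COUNT * BASE_PAIRS_PER_GENE
fraction = coding_fraction(GENE_COUNT, BASE_PAIRS_PER_GENE, GENOME_BASE_PAIRS)

print(f"coding base pairs: {coding_bp:,}")
print(f"coding fraction:   {fraction:.2%}")
```

With these inputs the coding sequence comes to 20 million base pairs, a third of one percent of the total.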
If you read an original paper on the sequencing of the DNA for some critter, once you get past the pages listing all the authors of the paper, who often number over 100, you will find charts and diagrams listing the number of genes, the numbers of the various types of proteins and RNAs encoded, how they are controlled, where they are located in the sequence, where the introns are, and a lot of other information. How was this determined? By understanding (applying logical reasoning to) the grammar and syntax of the code. The introns have specific sequences at either end that couple to special complexes called spliceosomes so they can be removed from each gene and the exons joined to make the final messenger RNA. Genes have special initiator and terminator sequences and are controlled by regulatory regions with specific characteristics (that, for example, allow certain proteins called transcription factors to latch onto them to initiate or inhibit gene transcription).
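To give a flavor of what "applying logical reasoning to the syntax" looks like, here is a toy sketch in Python that scans a DNA string for open reading frames: a start codon (ATG) followed, in steps of three, by one of the three stop codons (TAA, TAG, TGA). The function name and the `min_codons` parameter are made up for illustration. It ignores introns, the opposite strand, and regulatory regions, and it is not how real gene-finding software works, which relies on much richer statistical models.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) pairs of open reading frames on one strand.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `end` is exclusive and includes the stop codon. `min_codons` counts
    the codons before the stop, including the ATG itself.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):              # three possible reading frames
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == START_CODON:
                # Walk forward codon by codon looking for a stop.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j           # resume after this stop codon
                        break
            i += 3
    orfs.sort()
    return orfs


example = "CCATGAAATTTTAGGG"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

On the made-up example string this reports a single reading frame, ATGAAATTTTAG, starting at index 2.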
You will also find enumerated a list of "pseudogenes" that have the structure of genes but no initiator and so are never expressed, as well as genes that were spliced into the genome by a virus or a bacterium at some time in the species' history. (The human genome has 139 such insertions.) How can they determine that? Amongst other indicators, virus and bacterial genomes tend to be rich in C-G pairs (more than 50% C-G and less than 50% A-T pairs) while eukaryotic genomes are slightly rich in A-T pairs. So, while all species speak the same 'language', there are definite dialectal differences.
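The C-G versus A-T comparison is easy to sketch. The Python below computes the C-G fraction of a sequence and then slides a window along it, so that a stretch with an unusual composition stands out. The sequence is invented for the example and the function names are hypothetical. Real analyses combine base composition with other indicators and do not rely on it alone.

```python
def gc_content(dna):
    """Fraction of bases that are G or C, counting only A, C, G and T."""
    dna = dna.upper()
    gc = dna.count("G") + dna.count("C")
    at = dna.count("A") + dna.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(dna, window=10, step=10):
    """C-G fraction of each window, as (start_index, fraction) pairs."""
    return [
        (start, gc_content(dna[start:start + window]))
        for start in range(0, len(dna) - window + 1, step)
    ]


# Invented sequence: an A-T rich "host" with a C-G rich stretch in the middle.
host = "ATATTAATAT" * 2
foreign = "GCGCCGGCGC"
sequence = host + foreign + host

for start, fraction in windowed_gc(sequence):
    flag = "  <-- unusual composition" if fraction > 0.5 else ""
    print(f"{start:3d}  {fraction:.2f}{flag}")
```

Only the window starting at index 20 is flagged, which is exactly where the C-G rich stretch was placed.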
Much more can be said about this exciting field, such as how an experienced researcher can determine a great deal about the structure of the encoded protein by examining the DNA code, but the point is that the 'language' analogy goes very deep and includes syntax and grammar as well as alphabet and words. These rules are so functional and powerful that a genome is more poetry than prose.
Finally, I must confess that I don't have any background in this area. I'm merely a fascinated bystander with his nose pressed against the window, so I would greatly appreciate anyone more knowledgeable correcting or amplifying what I have posted.