29 Evidences for Macroevolution: Part 4

Prediction 17: Functional molecular evidence - Protein functional redundancy

The support for common descent given by studies of molecular sequences can be phrased as a deductive argument. This argument is unique within this FAQ: it is the only instance in which we can directly conclude that similarity implies relatedness. This conclusion depends upon the similarity of biological structures within a specific context: the similarity observed between ubiquitous genes from different species.

Because the following discussion is somewhat technical, it is first presented as the outline of a deductive argument, which makes the logical thread easy to follow. The premises are listed below, followed by the conclusion and further discussion.

The gist of the argument:

(P1) There are certain genes that all living organisms have because they perform very basic life functions; these genes are called ubiquitous genes.

(P2) Ubiquitous genes have no relationship with the specific functions of different species. For example, it doesn't matter whether you are a bacterium, a human, a frog, a whale, a hummingbird, a slug, a fungus, or a sea anemone - you have these ubiquitous genes, and they all perform the same basic biological function no matter what you are.

(P3) Any given ubiquitous protein has an extremely large number of different functionally equivalent forms (i.e. protein sequences).

(P4) Obviously, there is no a priori reason why every organism should have the same sequence or even similar sequences. No specific sequence is functionally necessary in any organism - all that is necessary is one of the large number of functionally equivalent forms of a given ubiquitous gene or protein.

(P5) There is one, and only one, observed mechanism which causes two different organisms to have ubiquitous proteins with similar sequences. That mechanism is heredity.

(C) It follows that organisms which have similar sequences for ubiquitous proteins are genealogically related, and the more similar the sequences, the closer the relationship.

Discussion:

Before the advent of DNA sequencing technology, the amino acid sequences of proteins were used to establish the phylogenetic relationships of species. Sequence studies with functional genes have centered on genes of proteins (or RNAs) that are ubiquitous (i.e. all organisms have them). This is done to ensure that the comparisons are independent of the overall species phenotype.

For example, suppose we are comparing the protein sequence of a chimpanzee and that of a human. Both of these animals have many similar anatomical characters and functions, so we might expect their proteins to be similar too, regardless of whether they are genealogically related or not. However, we can compare the sequences of very basic genes that are used by all living organisms, such as the cytochrome c gene, which have no influence over specific chimpanzee or human characteristics.

Cytochrome c is an essential and ubiquitous protein found in all organisms, including eukaryotes and bacteria (Voet and Voet 1995, p. 24). The mitochondria of cells contain cytochrome c, where it transports electrons in the fundamental metabolic process of oxidative phosphorylation. The oxygen we breathe is used to generate energy in this process (Voet and Voet 1995, pp. 577-582).

Using a ubiquitous gene such as cytochrome c, there is no reason to assume that two different organisms should have the same protein sequence, unless the two organisms are genealogically related. This is due in part to the functional redundancy of protein sequences and structures. Here, "functional redundancy" indicates that many different protein sequences form the same general structure and perform the same general biological role. Cytochrome c is an extremely functionally redundant protein, because many dissimilar sequences all form cytochrome c electron transport proteins. Functional redundancy need not be exact in terms of performance; some functional cytochrome c sequences may be slightly better at electron transport than others, but that is irrelevant for the purposes of this argument.

Decades of biochemical evidence have shown that most amino acid mutations, especially of surface residues, have no effect on protein function or on protein structure (Harris, Sanger et al. 1956; Li 1997, p. 2; Matthews 1996). A striking example is that of the c-type cytochromes from various bacteria, which have virtually no sequence similarity. Nevertheless, they all fold into the same three-dimensional structure, and they all perform the same biological role (Moore and Pettigrew 1990, pp. 161-223; Ptitsyn 1998).

Even within species, most amino acid mutations are functionally silent. For example, there are at least 250 different amino acid mutations known in human hemoglobin, carried by more than 3% of the world's population, that have no clinical manifestation in either heterozygous or homozygous individuals (Bunn and Forget 1986; Voet and Voet 1995, p. 235). The phenomenon of protein functional redundancy is very general, and is observed in all known proteins and genes, regardless of the species.

With this in mind, consider again the molecular sequences of cytochrome c. It has been shown that the human cytochrome c protein works just fine in yeast (a unicellular organism) that has had its own native cytochrome c gene deleted, even though yeast cytochrome c differs from human cytochrome c at over 40% of its amino acid positions (Tanaka et al. 1988a; Tanaka et al. 1988b; Wallace and Tanaka 1994). In fact, the cytochrome c genes from tuna (fish), pigeon (bird), horse (mammal), Drosophila fly (insect), and rat (mammal) all function well in yeast that lack their own native yeast cytochrome c (Clements et al. 1989; Hickey et al. 1991; Koshy et al. 1992; Scarpulla and Nye 1986). Furthermore, extensive genetic analysis of cytochrome c has demonstrated that the majority of the protein sequence is unnecessary for its function in vivo (Hampsey 1986; Hampsey 1988). Only about a third of the roughly 100 amino acids in cytochrome c are necessary to specify its function. Most of the amino acids in cytochrome c are hypervariable (i.e. they can be replaced by a large number of functionally equivalent amino acids) (Dickerson and Timkovich 1975). Importantly, Hubert Yockey has done a careful study in which he calculated that there are at least 2.3 × 10⁹³ possible functional cytochrome c protein sequences, based on these genetic mutational analyses (Hampsey 1986; Hampsey 1988; Yockey 1992, Ch. 6, p. 254). For perspective, 10⁹³ is about ten trillion (10¹³) times larger than the commonly cited estimate of roughly 10⁸⁰ atoms in the observable universe. Thus, functional cytochrome c sequences are virtually unlimited in number, and there is no a priori reason for two different species to have the same, or even mildly similar, cytochrome c protein sequences.
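Counts like Yockey's rest on the observation that many positions independently tolerate several residues, so the per-position choices multiply. A minimal sketch of that multiplication, assuming a hypothetical tolerance profile (the numbers are illustrative, not Yockey's data, and his actual calculation is information-theoretic and more involved). It works in log10 because the counts are astronomically large; the function name is made up for this example.

```python
import math

def log10_sequence_count(allowed_per_site):
    """log10 of the number of distinct sequences when each position
    independently tolerates the given number of amino acids."""
    return sum(math.log10(k) for k in allowed_per_site)

# Hypothetical tolerance profile for a 104-residue protein: 35 positions
# that tolerate only one residue and 69 that tolerate ten. Illustrative
# numbers only, not measured values.
profile = [1] * 35 + [10] * 69
print(f"about 10^{log10_sequence_count(profile):.0f} sequences")

# If N functional sequences were equally likely, the chance that two
# unrelated organisms carry the identical one is 1/N. Using Yockey's
# minimum N = 2.3 x 10^93:
log_p = -math.log10(2.3e93)
print(f"chance of an exact match: 10^{log_p:.1f}")
```

Even this toy profile gives about 10⁶⁹ sequences; with Yockey's minimum the exact-match chance comes out below 10⁻⁹³.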

In terms of a scientific statistical analysis, the "null hypothesis" is that the identity of non-essential amino acids in the cytochrome c proteins from human and chimpanzee should be random with respect to one another. However, from the theory of common descent and our standard phylogenetic tree we know that humans and chimpanzees are quite closely related. We therefore predict, in spite of the odds, that human and chimpanzee cytochrome c sequences should be much more similar than, say, human and yeast cytochrome c - simply due to inheritance.
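The null hypothesis can be made concrete with a deliberately simplified model. This is an assumption of the sketch, not the calculation behind the probabilities quoted in this section: treat each of n variable positions as matching by chance with probability p, independently, and ask how likely it is to see at least m matches (a binomial upper tail). The parameter values are hypothetical.

```python
from math import comb, log10

def prob_at_least(m, n, p):
    """P(X >= m) for X ~ Binomial(n, p): the chance that at least m of n
    independent positions match purely by coincidence."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

# Hypothetical numbers: 60 variable positions, each matching by chance
# with probability 0.1 (e.g. about ten tolerated residues per site).
n, p = 60, 0.1
for m in (10, 30, 60):
    print(m, f"log10 P = {log10(prob_at_least(m, n, p)):.1f}")
```

When all 60 positions match, the probability is p⁶⁰ = 10⁻⁶⁰ under these assumptions. Real sites differ in how many residues they tolerate and are not fully independent, so this only illustrates how quickly the chance of coincidental identity collapses.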

Confirmation:

Humans and chimpanzees have the exact same cytochrome c protein sequence. The "null hypothesis" given above is false. In the absence of common descent, the chance of this occurrence is conservatively less than 10⁻⁹³ (1 out of 10⁹³). Thus, the high degree of similarity in these proteins is a spectacular corroboration of the theory of common descent. Furthermore, human and chimpanzee cytochrome c proteins differ by ~10 amino acids from all other mammals. The chance of sequences this similar arising in the absence of a hereditary mechanism is less than 10⁻²⁹. The yeast Candida krusei is one of the most distantly related eukaryotic organisms from humans. Candida has 51 amino acid differences from the human sequence; even so, a conservative estimate of the probability of this much similarity arising without heredity is less than 10⁻²⁵.

One possible, yet unlikely, objection is that the slight differences in functional performance between the various cytochromes could be responsible for this sequence similarity. This objection is unlikely because of the incredibly high number of nearly equivalent sequences that would be phenotypically indistinguishable for any required level of performance. Additionally, nearly similar sequences do not necessarily give nearly similar levels of performance.

Nonetheless, for the sake of argument, let us assume that a cytochrome c that transports electrons faster is required in organisms with active metabolisms or with high rates of muscle contraction. If this were true, we might expect to observe a pattern of sequence similarity that correlates with similarity of environment or with physiological requirement. However, this is not observed. For example, bat cytochrome c is much more similar to human cytochrome c than to hummingbird cytochrome c; porpoise cytochrome c is much more similar to human cytochrome c than to shark cytochrome c. As stated earlier in prediction 3, the phylogenetic tree constructed from the cytochrome c data exactly recapitulates the relationships of major taxa as determined by the completely independent morphological data (McLaughlin and Dayhoff 1973). These facts only further support the idea that cytochrome c sequences are independent of phenotypic function (other than the obvious requirement for a functional cytochrome c that transports electrons).
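Trees like the cytochrome c tree mentioned above are built from pairwise sequence differences. A minimal sketch of one simple distance method, UPGMA, is shown below. It is not necessarily the method McLaughlin and Dayhoff used, and it runs on made-up aligned sequences: the taxa A-D and their sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma(seqs):
    """Cluster named, aligned sequences with UPGMA.

    Returns a nested tuple of names; child order within a pair is arbitrary.
    """
    names = list(seqs)
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # unordered pair of cluster ids -> average distance between the clusters
    dist = {frozenset((i, j)): p_distance(seqs[names[i]], seqs[names[j]])
            for i, j in combinations(clusters, 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        i, j = tuple(pair)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del dist[pair]
        for k in clusters:                      # size-weighted average distances
            dik = dist.pop(frozenset((i, k)))
            djk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Made-up aligned sequences for four hypothetical taxa.
seqs = {
    "A": "ACDEFGHIKL",
    "B": "ACDEFGHIKM",
    "C": "ACDQFGHRKM",
    "D": "WCDQFSHRNM",
}
print(upgma(seqs))
```

UPGMA assumes a roughly constant rate of change across lineages. Methods such as neighbor joining relax that assumption, but either one shows how a tree emerges from nothing more than pairwise differences.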

Recap:

The point of this prediction is subtly different from prediction 3, "Convergence of independent phylogenies." The evidence given above demonstrates that for many ubiquitous functional proteins (such as cytochrome c), there is an enormous number of equivalent sequences which could form that protein in any given organism. Whenever we find that two organisms have the same or very similar sequences for a ubiquitous protein, we know that something fishy is going on. Why would these two organisms have such similar ubiquitous proteins when the odds are astronomically against it? We know of only one reason why two organisms would have two similar protein sequences in the absence of functional necessity: heredity. Thus, in such cases we can confidently deduce that the two organisms are genealogically related. In this sense, sequence similarity is not only a test of the theory of common descent; common descent is also a deduction from the principle of heredity and the observation of sequence similarity. Finally, the similarity observed for cytochrome c is not confined to this single ubiquitous protein; all ubiquitous proteins that have been compared between chimpanzees and humans are highly similar, and there have been many comparisons.

Potential Falsification:

Without assuming the theory of common descent, the most probable result is that the cytochrome c protein sequences in all these different organisms would be very different from each other. If this were the case, a phylogenetic analysis would be impossible, and this would provide very strong evidence for a genealogically unrelated, perhaps simultaneous, origin of species (Dickerson 1972; Yockey 1992; Li 1997).

Furthermore, the very basis of this argument could be undermined easily if it could be demonstrated (1) that species-specific cytochrome c proteins were functional exclusively in their respective organisms, or (2) that no other cytochrome c sequence could function in an organism other than its own native cytochrome c, or (3) that a mechanism besides heredity can causally correlate the sequence of a ubiquitous protein with a specific organismic morphology.

Prediction 18: Functional molecular evidence - DNA coding redundancy

Like protein sequence similarity, the DNA sequence similarity of two ubiquitous genes also implies common ancestry. Of course, comprehensive DNA sequence comparisons of conserved proteins such as cytochrome c also indirectly take into account amino acid sequences, since the DNA sequence specifies the protein sequence. However, with DNA sequences there is an extra level of redundancy. The genetic code itself is informationally redundant; on average there are three different codons (a codon is a triplet of DNA bases) that can specify the exact same amino acid (Voet and Voet 1995, p. 966). Thus, for cytochrome c there are approximately 3¹⁰⁴, or over 10⁴⁹, different DNA sequences (and, hence, 10⁴⁹ different possible genes) that can specify the exact same protein sequence.
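The 10⁴⁹ figure is a product of per-position codon counts. A minimal sketch, assuming the standard genetic code, counts the coding sequences for any protein string. The exact count for cytochrome c would need its actual amino acid sequence, which is not reproduced here, so the last line just recomputes the 3¹⁰⁴ average-case estimate. The function name is made up for this example.

```python
import math

# Number of codons per amino acid in the standard genetic code
# (61 sense codons in total; the three stop codons are excluded).
CODONS_PER_AA = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}

def synonymous_gene_count(protein):
    """Number of distinct coding sequences (ignoring the stop codon)
    that translate to the given one-letter protein sequence."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein)

print(sum(CODONS_PER_AA.values()) / 20)   # 3.05 codons per amino acid on average
print(synonymous_gene_count("MW"))        # 1: Met and Trp each have one codon
print(synonymous_gene_count("LSR"))       # 216 = 6 * 6 * 6
print(f"{104 * math.log10(3):.1f}")       # 49.6, so 3^104 is about 4 x 10^49
```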

Here we can be quite specific in our prediction. Any sequence differences between two functional cytochrome c genes are necessarily functionally neutral or nearly so. The background mutation rate in humans (and most other mammals) has been measured at ~1-5 × 10⁻⁸ base substitutions per site per generation (Mohrenweiser 1994, pp. 128-129), and an average primate generation is about 20 years. From the fossil record, we know that humans and chimpanzees diverged from a common ancestor less than 10 million years ago (a conservative estimate - most likely less than 6 million years ago) (Stewart and Disotell 1998). Thus, if chimps and humans are truly genealogically related, we predict that the difference between their respective cytochrome c gene DNA sequences should be less than 3% - probably even much less, due to the essential function of the cytochrome c gene.
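The arithmetic behind this prediction can be sketched as follows. The sketch assumes, as neutral theory predicts for functionally neutral sites, that substitutions accumulate at the mutation rate in each of the two lineages since the split, and it ignores repeated hits at the same site. The rates, split times, and 20-year generation come from the paragraph above; the function name is only illustrative.

```python
def expected_divergence(mu_per_generation, split_years, generation_years):
    """Expected fraction of differing sites between two lineages under
    neutrality: each lineage accumulates mu substitutions per site per
    generation, and both lineages have been evolving since the split."""
    generations = split_years / generation_years
    return 2 * mu_per_generation * generations

for mu in (1e-8, 5e-8):
    for years in (6e6, 10e6):
        d = expected_divergence(mu, years, 20)
        print(f"mu={mu:g}, split={years / 1e6:g} Myr -> {100 * d:.1f}%")
```

The four combinations span roughly 0.6% to 5%. Purifying selection on an essential gene like cytochrome c would be expected to lower all of them.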

Confirmation:

As mentioned above, the cytochrome c proteins in chimps and humans are exactly identical. The clincher is that the two DNA sequences that code for cytochrome c in humans and chimps differ by only one base (a 0.3% difference), even though there are 10⁴⁹ different sequences that could code for this protein.

The combined effects of DNA coding redundancy and protein sequence redundancy make DNA sequence comparisons doubly redundant; DNA sequences of ubiquitous genes are completely uncorrelated with phenotype, but they are strongly causally correlated with heredity. This is why DNA sequence phylogenies are considered so robust.

Potential Falsification:

Without common descent, the most probable result is that the DNA sequences coding for these proteins would be radically different. This would be a resounding falsification of macroevolution, and it would be very strong evidence that chimpanzees and humans are not closely genealogically related. Of course, the potential falsifications for prediction 17 also apply to DNA sequences.

Prediction 19: Nonfunctional molecular evidence - Transposons

Transposons are very similar to viruses. However, they lack genes for viral coat proteins, cannot cross cellular boundaries, and thus they replicate only in the genome of their host. They can be thought of as intragenomic parasites. Except in the rarest of circumstances, the only mode of transmission from one metazoan organism to another is directly by DNA duplication and inheritance (e.g. your transposons are given to your children) (Li 1997, pp. 338-345).

Replication for a transposon means copying itself and inserting the copied DNA randomly somewhere else in the host's genome. Transposon replication (also called transposition) has been directly observed in many organisms, including yeast, corn, wallabies, humans, bacteria, and flies, and recently the mechanisms have become well understood (Li 1997, pp. 335-338; Futuyma 1998, pp. 639-641). Specific observed cases of retrotransposition are known to have caused neurofibromatosis and hemophilia in humans (Kazazian, Wong et al. 1988; Wallace, Andersen et al. 1991).

Finding the same transposon in the same chromosomal location in two different species is strong direct evidence of common ancestry, since they insert randomly and generally cannot be transmitted except by inheritance. In addition, once a common ancestor has been postulated that contains this transposition, all the descendants of this common ancestor should also contain the same transposition. A possible exception is if this transposition were removed due to a rare deletion event; however, deletions are never clean and usually part of the transposon sequence remains.
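This prediction can be stated as a tree operation: find the most recent common ancestor of the species known to carry an insertion, then predict that every descendant of that node carries it too, barring deletions. A minimal sketch follows, with a hypothetical tree of taxa W-Z encoded as nested tuples; the function names are made up for this example.

```python
def leaves(tree):
    """Set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def predicted_carriers(tree, observed_carriers):
    """Leaves under the smallest clade containing every observed carrier,
    i.e. all descendants of the carriers' most recent common ancestor."""
    observed = set(observed_carriers)
    node = tree
    while not isinstance(node, str):
        # Descend into the child that still contains all carriers, if any.
        child = next((c for c in node if observed <= leaves(c)), None)
        if child is None:
            break
        node = child
    return leaves(node)

# Hypothetical rooted tree: ((W, X), Y), Z
tree = ((("W", "X"), "Y"), "Z")
print(sorted(predicted_carriers(tree, {"W", "Y"})))  # ['W', 'X', 'Y']
```

Here an insertion seen in W and Y implies that X should carry it too. If X lacked it, a deletion would be required, and such a deletion would usually leave part of the transposon behind.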

Confirmation:

A common class of transposon is the SINE retroelement (Li 1997, pp. 349-352). One important SINE transposon is the 300 bp Alu element. All primates contain many Alu elements, including humans, where they constitute about 10% of the genome (i.e. roughly 300 million bases of functionless DNA) (Smit 1996; Li 1997, pp. 354, 357). Very recent human Alu transpositions have been used to elucidate historic and prehistoric human migrations, since some individuals have newer Alu insertions that other individuals lack. Most importantly, in the human α-globin cluster there are seven Alu elements, and each one is shared with chimpanzees in the exact same seven locations (Sawada, Willard et al. 1985).

More specifically, three different SINE transpositions have been found in the same chromosomal locations of cetaceans (whales), hippos, and ruminants, all of which are closely related according to the standard phylogenetic tree. However, all other mammals, including camels and pigs, lack these three specific transpositions (Shimamura et al. 1997).
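Because an insertion arises once and is then inherited, the sets of species that share two different insertions should be either nested or disjoint. Partial overlap would require the same insertion to arise independently at the same site. A minimal sketch of that pairwise check follows, treating absence as the ancestral state; the insertion names and carrier sets are hypothetical, not Shimamura et al.'s data.

```python
from itertools import combinations

def compatible(a, b):
    """Two insertions (absence ancestral) fit on one tree only if their
    carrier sets are nested or disjoint."""
    return a <= b or b <= a or not (a & b)

def conflicts(insertions):
    """Pairs of insertion names whose carrier sets partially overlap."""
    return [(n1, n2)
            for (n1, s1), (n2, s2) in combinations(insertions.items(), 2)
            if not compatible(s1, s2)]

# Hypothetical presence data: insertion name -> set of taxa carrying it.
insertions = {
    "ins1": {"W", "X", "Y"},
    "ins2": {"W", "X"},
    "ins3": {"Z"},
}
print(conflicts(insertions))  # []

insertions["ins4"] = {"X", "Z"}  # cuts across ins1 and ins2
print(conflicts(insertions))  # [('ins1', 'ins4'), ('ins2', 'ins4')]
```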

More detail on this topic can be found in Edward Max's FAQ, "Plagiarized Errors and Molecular Genetics."

Prediction 20: Nonfunctional molecular evidence - Pseudogenes

Other nonfunctional molecular examples that provide evidence of common ancestry are pseudogenes. Pseudogenes are very closely related to their functional counterparts (in primary sequence and often in chromosomal location), except that either they have faulty regulatory sequences or they have internal stops that keep the protein from being made. They are functionless and do not affect an organism's phenotype when deleted. Pseudogenes, if they are not vestigial (like the examples in prediction 7), are created by gene duplication and subsequent mutation. There are many observed processes that duplicate genes, including transposition events, chromosomal duplication, and unequal crossing over of chromosomes. Like transpositions (cf. prediction 19), gene duplication is a rare and random event and, of course, any duplicated DNA is inherited. Thus, finding the same pseudogene in the same chromosomal location in two species is strong evidence of common ancestry.

Confirmation:

There are very many examples of pseudogenes shared between humans and other primates. One is the ψη-globin gene, a hemoglobin pseudogene. It is shared among the primates only, in the same chromosomal location, with the same mutations that render it nonfunctional (Goodman, Koop et al. 1989). Another example is the steroid 21-hydroxylase gene. Humans have two copies of the steroid 21-hydroxylase gene, a functional one and a nonfunctional pseudogene. Inactivation of the functional gene leads to congenital adrenal hyperplasia (CAH), a rare and serious genetic disease. Chimps and humans share the same eight-base-pair deletion in this pseudogene that renders it nonfunctional (Kawaguchi, O'hUigin et al. 1992). Note that in this case, the nonfunctionality of the pseudogene has been positively demonstrated.

Prediction 21: Nonfunctional molecular evidence - Endogenous retroviruses

Figure 4.20.1. Human endogenous retrovirus K (HERV-K) insertions in identical chromosomal locations in various primates (Lebedev et al. 2000).

Yet another nonfunctional example is given by endogenous retroviruses. Endogenous retroviruses are molecular remnants of a past parasitic viral infection. Occasionally, copies of a retrovirus genome are found in its host's genome, and these retroviral gene copies are called endogenous retroviral sequences. Retroviruses (like the AIDS virus or HTLV-1, which causes a form of leukemia) make a DNA copy of their own viral genome and insert it into their host's genome. If this happens to a germ line cell (i.e. the sperm or egg cells) the retroviral DNA will be inherited by descendants of the host. Again, this process is rare and fairly random, so finding retrogenes in identical chromosomal positions of two different species indicates common ancestry.

Confirmation:

In humans, endogenous retroviruses occupy about 1% of the genome, in total constituting ~30,000 different retroviruses embedded in each person's genomic DNA (Sverdlov 2000). There are at least seven different known instances of common retrogene insertions between chimps and humans, and this number is sure to grow as the genomes of both organisms are sequenced (Bonner, O'Connell et al. 1982; Dangel, Baker et al. 1995; Svensson, Setterblad et al. 1995; Kjellman, Sjogren et al. 1999; Lebedev et al. 2000; Sverdlov 2000). Figure 4.20.1 shows a phylogenetic tree of several primates, including humans, from a recent study which identified numerous shared endogenous retroviruses in the genomes of these primates (Lebedev et al. 2000). The arrows designate the relative insertion times of the viral DNA into the host genome. All branches after the insertion point (to the right) carry that retroviral DNA - a reflection of the fact that once a retrovirus has inserted into the germ-line DNA of a given organism, it will be inherited by all descendants of that organism.

The Felidae (i.e. cats) provide another example. The standard phylogenetic tree has small cats diverging later than large cats. The small cats (e.g. the jungle cat, European wildcat, African wildcat, black-footed cat, and domestic cat) share a specific retroviral gene insertion. In contrast, all other carnivores which have been tested lack this retrogene (Futuyma 1998, pp. 293-294).

Potential Falsification:

It would make no sense, macroevolutionarily, if certain other mammals (e.g. dogs, cows, platypuses, etc.) had these same retrogenes in the exact same chromosomal locations. For instance, it would be incredibly unlikely for dogs to also carry the three HERV-K insertions that are unique to humans, as shown in the upper right of Figure 4.20.1, since none of the other primates have these retroviral sequences.

References

Bonner, T. I., C. O'Connell, et al. (1982). "Cloned endogenous retroviral sequences from human DNA." PNAS 79: 4709.

Bunn, H. F. and B. G. Forget (1986). Hemoglobin: Molecular, Genetic, and Clinical Aspects. Saunders.

Clements, J.M., O'Connell, L.I., Tsunasawa, S., and Sherman, F. (1989) "Expression and activity of a gene encoding rat cytochrome c in the yeast Saccharomyces cerevisiae." Gene 83: 1-14.

Dangel, A. W., B. J. Baker, et al. (1995). "Complement component C4 gene intron 9 as a phylogenetic marker for primates: long terminal repeats of the endogenous retrovirus ERV-K(C4) are a molecular clock of evolution." Immunogenetics 42: 41-52.

Dickerson, R. E. (1972). Scientific American. 226: 58-72.

Dickerson, R. E. and R. Timkovich (1975). "Cytochrome c." In P. D. Boyer (ed.), The Enzymes. New York, Academic Press. 11: 397-547.

Futuyma, D. (1998). Evolutionary Biology. Third edition. Sunderland, MA, Sinauer Associates.

Goodman, M., B. F. Koop, et al. (1989). "Molecular phylogeny of the family of apes and humans." Genome 31: 316-335.

Hampsey, D. M., Das, G., and Sherman, F. (1986). "Amino acid replacements in yeast iso-1-cytochrome c." Journal of Biological Chemistry 261: 3259-71.

Hampsey, D. M., Das, G., and Sherman, F. (1988). "Yeast iso-1-cytochrome c: genetic analysis of structural requirements." FEBS Letters 231: 275-83.

Harris, J. I., F. Sanger, et al. (1956). "Species differences in insulin." Archives of Biochemistry and Biophysics 65: 427-438.

Hickey, D.R., Jayaraman, K., Goodhue, C.T., Shah, J., Fingar, S.A., Clements, J.M., Hosokawa, Y., Tsunasawa, S., and Sherman, F. (1991) "Synthesis and expression of genes encoding tuna, pigeon, and horse cytochromes c in the yeast Saccharomyces cerevisiae." Gene 105: 73-81.

Kawaguchi, H., C. O'hUigin, et al. (1992). "Evolutionary origin of mutations in the primate cytochrome P450c21 gene." American Journal of Human Genetics 50: 766-780.

Kazazian, H. H., C. Wong, et al. (1988). "Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man." Nature 332: 164.

Kjellman, C., H. O. Sjogren, et al. (1999). "HERV-F, a new group of human endogenous retrovirus sequences." Journal of General Virology 80: 2383.

Lebedev, Y. B., Belonovitch, O. S., Zybrova, N. V., Khil, P. P., Kurdyukov, S. G., Vinogradova, T. V., Hunsmann, G., and Sverdlov, E. D. (2000). "Differences in HERV-K LTR insertions in orthologous loci of humans and great apes." Gene 247: 265-277.

Li, W.-H. (1997). Molecular Evolution. Sunderland, MA, Sinauer Associates.

Matthews, B.W. (1996). "Structural and genetic analysis of the folding and function of T4 lysozyme." FASEB J. 10: 35-41.

McLaughlin, P. J. and M. O. Dayhoff (1973). "Eukaryote evolution: a view based on cytochrome c sequence data." Journal of Molecular Evolution 2: 99-116.

Moore, G. R. and G. W. Pettigrew (1990). Cytochromes c: Evolutionary, Structural and Physicochemical Aspects. Berlin, Springer-Verlag.

Ptitsyn, O. B. (1998). "Protein folding and protein evolution: common folding nucleus in different subfamilies of c-type cytochromes." Journal of Molecular Biology 278: 655.

Sawada, I., C. Willard, et al. (1985). "Evolution of Alu family repeats since the divergence of human and chimpanzee." Journal of Molecular Evolution 22: 316.

Scarpulla, R.C., and Nye, S.H. (1986) "Functional expression of rat cytochrome c in Saccharomyces cerevisiae." Proc Natl Acad Sci 83: 6352-6.

Shimamura, M., et al. (1997). "Molecular evidence from retroposons that whales form a clade within even-toed ungulates." Nature 388: 666.

Smit, A. F. A. (1996). "The origin of interspersed repeats in the human genome." Current Opinion in Genetics and Development 6: 743-748.

Svensson, A. C., N. Setterblad, et al. (1995). "Primate DRB genes from the DR3 and DR8 haplotypes contain ERV9 LTR elements at identical positions." Immunogenetics 41: 74.

Sverdlov, E. D. (2000). "Retroviruses and primate evolution." BioEssays 22: 161-171.

Tanaka, Y., Ashikari, T., Shibano, Y., Amachi, T., Yoshizumi, H., and Matsubara, H. (1988a) "Amino acid replacement studies of human cytochrome c by a complementation system using CYC1 deficient yeast." J Biochem (Tokyo) 104: 477-80.

Tanaka, Y., Ashikari, T., Shibano, Y., Amachi, T., Yoshizumi, H., and Matsubara, H. (1988b) "Construction of a human cytochrome c gene and its functional expression in Saccharomyces cerevisiae." J Biochem (Tokyo) 103: 954-61.

Voet, D. and J. Voet. (1995). Biochemistry. New York, John Wiley and Sons.

Wallace, C.J. and Tanaka, Y. (1994) "Improving cytochrome c function by protein engineering?: studies of site-directed mutants of the human protein." J. Biochem. (Tokyo) 115: 693-700.

Wallace, M. R., L. B. Andersen, et al. (1991). "A de novo Alu insertion results in neurofibromatosis type 1." Nature 353: 864-866.

Yockey, H. P. (1992). Information Theory and Molecular Biology. New York, Cambridge University Press.

29 Evidences for Macroevolution

Part 4: The Molecular Sequence Evidence
