Anyone familiar with the debates on this site is likely to have seen many references to the evolution, or rather sudden appearance, of the Nylonase gene. The Nylonase gene is thought to have arisen through a frameshift mutation giving rise to a novel open reading frame encoding the nylon-degrading enzyme.
This sort of de novo generation of a new gene is considered a very rare event, and little is known about such occurrences as they are generally very hard to detect. Apart from Nylonase there are only a few known instances in yeasts and Drosophila.
A recent research paper in 'Genome Research' entitled 'Recent de novo origin of human protein-coding genes' looks for exactly this sort of gene in humans (Knowles and McLysaght, 2009).
Using a variety of sequence similarity search tools on several primate species, and information from a number of other organisms, the authors identified three genes which produce protein products in humans but not in any of the other primate lineages, or in the other species that they studied. In the other primate lineages all three genes share specific sequence differences which mean that the sequence would either not be transcribed or would produce a radically different protein if it were. In the chimpanzee, certainly, all three sequences are considered non-coding.
The mutations producing the de novo genes are a single nucleotide substitution removing a stop codon, a single nucleotide deletion causing a 'frameshift', and a 10bp insertion creating another 'frameshift' (is it really a frameshift if the original sequence is non-coding?).
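To make the three kinds of mutation concrete, here is a rough Python sketch. The sequences are toy examples I made up, not the real human or chimpanzee loci, and the helper `orf_codons` is purely illustrative. It just reads codons from the first base until it hits a stop codon, then shows how each type of change can open up a longer reading frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def orf_codons(seq):
    """Read codons from position 0 until the first stop codon (stop excluded)."""
    codons = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons

# Invented "ancestral" sequence: ATG GCT TGA ... hits a stop after two codons.
ancestral = "ATGGCTTGACCGTTCGGAAAATAA"

# 1. Single nucleotide substitution removing the stop codon (TGA -> TGG).
substituted = ancestral[:8] + "G" + ancestral[9:]

# 2. Single nucleotide deletion: every downstream codon is read in a new frame.
deleted = ancestral[:3] + ancestral[4:]

# 3. 10bp insertion: 10 is not a multiple of 3, so the frame shifts by one base.
inserted = ancestral[:3] + "GCAGCAGCAG" + ancestral[3:]

for name, seq in [("ancestral", ancestral), ("substitution", substituted),
                  ("deletion", deleted), ("insertion", inserted)]:
    print(name, len(orf_codons(seq)), "codons before a stop or the end")
```

In this toy case the ancestral sequence gives only 2 codons before the stop, while the substitution and deletion variants each read through 7 codons and the insertion variant through 11. The real cases are obviously much longer, but the principle is the same: one small change and the downstream sequence reads as an open frame.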
The authors estimate that there are likely to be ~18 such de novo genes unique to the human lineage. They can only estimate because only a subset of human coding genes, ~4000, was suitable for their methods of analysis.
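As a back-of-the-envelope check on that estimate, here is the arithmetic using only the figures quoted above (3 genes found, ~4000 genes analysed, ~18 estimated). I am assuming the estimate is a simple proportional scale-up, which may not be exactly how the authors did it; the variable names are mine.

```python
from fractions import Fraction

found = 3           # de novo genes identified
analysed = 4000     # approximate number of genes suitable for the analysis
estimated = 18      # approximate genome-wide estimate

# Fraction of analysed genes that turned out to be de novo.
rate = Fraction(found, analysed)
print(float(rate * 100), "percent of analysed genes")

# Total gene count implied if the estimate is a straight proportional scale-up.
implied_total = estimated / rate
print(int(implied_total), "genes implied in total")
```

So roughly 0.075% of the analysed genes were de novo, and an estimate of ~18 corresponds to a total of about 24,000 protein-coding genes under that assumption.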
This is a really interesting paper, but what does it mean in terms of the debate between Evolution and Creation? Does the existence of this sort of mutation provide further evidence of the ability of random mutation to create functional sequences, albeit ones whose function we don't really know? Does the pre-existence of a sequence which in one small step can suddenly produce a full coding sequence for a new gene argue instead for some sort of front-loading style ID? How can the switch from non-coding to coding be encompassed in the sort of informational metrics which come up in the EvC debate?
TTFN,
WK