What are the chances that a random DNA sequence will have a beneficial function, and therefore information? A lot better than you may think.
A team of researchers inserted random DNA sequences into E. coli and then tested whether those sequences increased fitness. To their surprise, up to 25% of the random DNA sequences were beneficial, as either RNA molecules or proteins.
Quote: "Random sequences are an abundant source of bioactive RNAs or peptides"
It is generally assumed that new genes arise through duplication and/or recombination of existing genes. The probability that a new functional gene could arise out of random non-coding DNA is so far considered to be negligible, as it seems unlikely that such an RNA or protein sequence could have an initial function that influences the fitness of an organism. Here, we have tested this question systematically, by expressing clones with random sequences in Escherichia coli and subjecting them to competitive growth. Contrary to expectations, we find that random sequences with bioactivity are not rare. In our experiments we find that up to 25% of the evaluated clones enhance the growth rate of their cells and up to 52% inhibit growth. Testing of individual clones in competition assays confirms their activity and provides an indication that their activity could be exerted by either the transcribed RNA or the translated peptide. This suggests that transcribed and translated random parts of the genome could indeed have a high potential to become functional. The results also suggest that random sequences may become an effective new source of molecules for studying cellular functions, as well as for pharmacological activity screening.
Not only is information present in random DNA sequences, it is very common. This counters the ID claim that information can only come about if an intelligence puts it there.
The 25% beneficial claim comes from the clones that perform better than the empty vector, but the empty vector is not a true negative control: it still contains the FLAG tag. So rather than testing whether random sequences are beneficial per se, the experiment is really testing whether having a random sequence in front of the FLAG tag is better or worse than having the FLAG tag alone. Also, I can see no evidence that they controlled for plasmid copy number, and I think the plasmid they're using can have variable copy number.
I would think that an increase in fitness between a FLAG tag and a FLAG fusion protein (or RNA molecule) would indicate that the added 150 base pairs are responsible for the increase in fitness.
As to copy number, it would appear to be a pretty standard expression vector which would have multiple copies and very high induced expression. My criticism is that the introduced gene may make up a disproportionate percentage of total RNA or total protein.
As to stop codons, the gene could still have activity as an RNA gene, so I don't see too much trouble with the translated peptide not having activity.
It does indicate an increase in fitness, but it does not indicate that the random DNA sequence is coding for anything of consequence. They want to argue that the sequences they've introduced are doing something of consequence. I think the more obvious interpretation is that they are not doing anything in and of themselves, but rather preventing a harmful effect.
That is a fair criticism. It would be much more helpful if they moved these genes into the bacterial chromosome so they could compare fitness to the wild type strain, and especially so for the FLAG-only version of the gene. It certainly wouldn't be feasible for the thousands of random sequences in their library, but they could have at least done this with their control plasmid.
Quite. But they measure (organism) fitness using the proxy of the number of copies of the plasmid in their population; if the plasmid varies systematically in copy number that will bias their results.
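To make the copy-number worry concrete, here is a minimal sketch. It assumes a toy model that is not taken from the paper: the reads for a clone are proportional to its cell count times its plasmid copies per cell. The clone names, cell counts, and copy numbers are all made up for illustration. In this model a copy number that differs between clones but stays constant over time cancels out of the fold change; the bias appears when copy number shifts between the two sampling points.

```python
def read_fold_changes(cells_before, cells_after, copies_before, copies_after):
    """Fold change in read frequency per clone.

    Toy assumption: reads for a clone are proportional to
    (number of cells) * (plasmid copies per cell).
    """
    reads_before = {k: cells_before[k] * copies_before[k] for k in cells_before}
    reads_after = {k: cells_after[k] * copies_after[k] for k in cells_before}
    total_before = sum(reads_before.values())
    total_after = sum(reads_after.values())
    return {
        k: (reads_after[k] / total_after) / (reads_before[k] / total_before)
        for k in cells_before
    }


# Two hypothetical clones with identical growth (1000 -> 8000 cells each).
cells_before = {"clone_a": 1000, "clone_b": 1000}
cells_after = {"clone_a": 8000, "clone_b": 8000}

# Case 1: copy number differs between clones but is stable over time.
stable = read_fold_changes(
    cells_before, cells_after,
    copies_before={"clone_a": 10, "clone_b": 30},
    copies_after={"clone_a": 10, "clone_b": 30},
)

# Case 2: clone_b's copy number halves between the two samples.
shifted = read_fold_changes(
    cells_before, cells_after,
    copies_before={"clone_a": 20, "clone_b": 20},
    copies_after={"clone_a": 20, "clone_b": 10},
)

print("stable copy number: ", stable)
print("shifted copy number:", shifted)
```

In the second case clone_a looks enriched and clone_b looks depleted even though the cells grew identically, which is the sort of artifact a copy-number control would rule out.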
A very good point. In my own experience, plasmid stability can vary quite a bit due to different inserts in the same plasmid.
RNA can certainly have activity; however, the majority of active sequences in the body are protein coding. If you're claiming that you have hundreds of beneficial sequences but only one is influenced by the STOP codon, that implies a ratio of active protein sequences to RNA sequences that I find implausible.
From my reading, they only tested 3 clones individually, 2 of which were not affected by the insertion of a stop codon. From the paper:
"Although two of our three individually tested clones suggest that the RNA function could be more important than the protein function, this constitutes at present only a small sample and may not be indicative of the true ratio between RNA and peptide functions."
There is also a Nature blog post critical of the study. Their objections are different from mine, but I agree with them too.
I agree with most of them, too. There is no way around the fact that the lac promoter is leaky, so you will always get some expression. This will bias your results, even in the absence of induction with IPTG. Even after a few generations, their library will be skewed away from lethal and neutral mutations and towards slightly beneficial and beneficial mutations. That selection will be skewed once again when you have strong over-expression with IPTG.
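As a rough illustration of how quickly that skew can build up, here is a small deterministic sketch. The four fitness classes and their relative growth factors per generation are invented for the example, not measured values from the paper, and the model assumes simple exponential growth with no drift.

```python
def frequencies_after(start_freqs, growth_per_generation, generations):
    """Clone frequencies after a number of generations of competitive growth.

    Each clone's share is multiplied by its relative growth factor once per
    generation, then all shares are renormalised to sum to 1.
    """
    weighted = {
        k: start_freqs[k] * growth_per_generation[k] ** generations
        for k in start_freqs
    }
    total = sum(weighted.values())
    return {k: v / total for k, v in weighted.items()}


# Hypothetical library: equal starting shares of four fitness classes.
start = {
    "lethal": 0.25,
    "neutral": 0.25,
    "slightly_beneficial": 0.25,
    "beneficial": 0.25,
}

# Hypothetical relative growth factors under leaky (uninduced) expression.
relative_growth = {
    "lethal": 0.0,
    "neutral": 1.0,
    "slightly_beneficial": 1.02,
    "beneficial": 1.10,
}

after_20 = frequencies_after(start, relative_growth, 20)
for name, freq in after_20.items():
    print(f"{name}: {freq:.3f}")
```

With these made-up numbers the lethal class is gone, the neutral class has lost ground, and the beneficial class dominates after 20 generations, all before any IPTG is added.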
One could also easily argue that a specific gene is slightly beneficial with low expression, but deleterious with high expression. This may be why they see stable variation with no induction, but changes after induction.
Either way, they seem to have found beneficial genes among random sequences. The real question is how common they are.