Message 31 of 32 (720114)
02-20-2014 11:28 AM
Not aligned =/= 0% similarity
The other egregious, but easy to miss, deception that Tomkins uses is the claim that sequence that could not be aligned between humans and chimps is necessarily 0% similar. This is completely false.
When they say that sequence could not be aligned, they are saying that they aren't sure where in the genome that chunk of DNA belongs. This often happens in regions with lots of repeats. If they can't verify where a DNA sequence belongs in a genome, they cannot guarantee that they are looking at orthologous DNA, so they keep it out of the comparisons. For example, if they aren't sure whether 10 repeats or 15 repeats separate a chunk of DNA from the rest of the genome, that chunk is said to be unaligned. This can be due to something as simple as not having enough sequencing coverage in that part of the genome. It does not necessarily mean that they could not find homologous sequence in the other genome. Even two random DNA sequences will match at about 25% of their positions by chance, yet Tomkins claims that unaligned sequence necessarily means that there is no homologous sequence in the other genome.
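The 25% figure for random sequences is easy to check with a quick simulation. This is only a rough sketch: it assumes the four bases are equally likely and compares two equal-length sequences position by position with no gaps, which is far simpler than a real alignment. The function names (random_dna, fraction_identical) are made up for illustration.

```python
import random


def random_dna(length, rng):
    """Return a random DNA string, assuming A, C, G and T are equally likely."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def fraction_identical(seq_a, seq_b):
    """Fraction of positions at which two equal-length sequences carry the same base."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


rng = random.Random(0)  # fixed seed so the run is repeatable
seq_a = random_dna(100_000, rng)
seq_b = random_dna(100_000, rng)
identity = fraction_identical(seq_a, seq_b)
print(f"fraction of matching positions: {identity:.3f}")
```

With sequences this long the result lands very close to 0.25, so even DNA with no shared ancestry at all would not score 0%.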
There are several instances where Tomkins makes this claim. This is one example:
"Nevertheless, enough data from the 2005 chimp genome project was available to allow rough estimates of overall genome similarity. Tomkins and Bergman (2012) derived a calculation that included published concurrent information from the human genome project along with the data reported in the 2005 chimpanzee paper and estimated an overall genome DNA similarity of 80.6%, which they proposed as a very conservative figure (see Tomkins and Bergman 2012, for details). "
How did they get that 80.6% figure? They counted the unaligned sequence as 0% similar.
"In summary, only 2.3 Gb of chimp sequence aligned onto the highly accurate and complete human genome (2.85 Gb) an operation that included the masking of low complexity sequences. For the chimp sequence that aligned, the data for substitutions and indels indicates 95.8% similarity, a biased figure which excludes the masked regions. Using these numbers, an overall estimate of chimp compared to human DNA produces a conservative estimate of genome-wide similarity at 80.6%."
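To see how much the genome-wide number depends on what you assume about the unaligned part, here is a rough sketch of a weighted average. The inputs (2.3 Gb aligned, 2.85 Gb total, 95.8% similarity in the aligned part) are the ones quoted above; the function name overall_similarity and the alternative values tried for the unaligned part are my own assumptions for illustration, not Tomkins and Bergman's actual calculation. Plugging in these quoted numbers gives a value in the high 70s rather than exactly 80.6%, so the published figure presumably used slightly different inputs. The point here is only the sensitivity to the unaligned term.

```python
def overall_similarity(aligned_gb, total_gb, aligned_similarity, unaligned_similarity):
    """Weighted average of similarity over the aligned and unaligned fractions.

    unaligned_similarity is an assumption the caller has to supply;
    it is not something the alignment data can tell you.
    """
    if not 0 < aligned_gb <= total_gb:
        raise ValueError("need 0 < aligned_gb <= total_gb")
    aligned_fraction = aligned_gb / total_gb
    unaligned_fraction = 1.0 - aligned_fraction
    return (aligned_fraction * aligned_similarity
            + unaligned_fraction * unaligned_similarity)


# Figures taken from the quoted passage.
ALIGNED_GB = 2.3
TOTAL_GB = 2.85
ALIGNED_SIMILARITY = 0.958

# Hypothetical assumptions about the unaligned part:
# 0.0 (treated as 0% similar), 0.25 (chance level), 0.958 (same as aligned part).
for assumed in (0.0, 0.25, 0.958):
    result = overall_similarity(ALIGNED_GB, TOTAL_GB, ALIGNED_SIMILARITY, assumed)
    print(f"unaligned assumed {assumed:.1%} similar -> overall {result:.1%}")
```

Only the first assumption drags the total down to the neighborhood of the figure Tomkins reports, and it is exactly the assumption that the unaligned sequence is 0% similar.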
The moral of the story is that you can't compare unaligned sequence because you don't know if it is orthologous or not. That is why it is excluded, not because it lacks any homology to the other genome.
Edited by Taq, : No reason given.