Topic: DNA similarity between Chimpanzee and Human 70%
RAZD Member (Idle past 187 days) Posts: 20714 From: the other end of the sidewalk Joined: |
Hi Telesto, and welcome to the fray.
My guess is that the algorithm is similar to other matching algorithms (such as those used for tree rings). They take one sequence as the baseline, line the second one up with it at one end, and then shift the second one along the first one base at a time, recording the degree of matching at each step. The DNA likely has a lot of regions that were duplicated and then modified, so those would produce matches with lower percentages. Enjoy
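The sliding comparison described above can be sketched in a few lines. This is only a toy illustration of the idea, not how BLAST works internally; the function name and the two short sequences are made up for the example.

```python
from typing import Tuple


def best_ungapped_offset(baseline: str, query: str) -> Tuple[int, float]:
    """Slide `query` along `baseline` one base at a time, without gaps.

    Returns the offset with the highest fraction of matching bases,
    together with that fraction.
    """
    best_offset, best_score = 0, -1.0
    for offset in range(len(baseline) - len(query) + 1):
        window = baseline[offset:offset + len(query)]
        matches = sum(a == b for a, b in zip(window, query))
        score = matches / len(query)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Hypothetical toy sequences: the query is a copy of part of the
# baseline with a single base changed.
baseline = "ACGTTGCAAGGCTTACGGAT"
query = "GCAAGGCTAACG"
offset, score = best_ungapped_offset(baseline, query)
print(f"best offset: {offset}, matching fraction: {score:.2%}")
```

A scan like this handles substitutions well, but a single inserted or deleted base shifts everything after it, which is where gapped alignment comes in.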
|
||||||||||||||||||||||||||||||||||||||||
Taq Member Posts: 8468 Joined: Member Rating: 6.0 |
That isn't true. Go back to my post in message #8. Using those sequences, the best hit for the gapped alignment would be 88% similarity. The best hit for the ungapped alignment would be 29%. This isn't because of different length sequences, or comparing different parts of the genome. This is comparing the same two sequences using different parameters. The creationist article biases their methodology by excluding indels. There is no way around it. They do this in order to get a lower percentage for similarity. They use a different methodology that they know will falsely return a lower percentage, and is different than the methodology used in the other papers. It's not as if the author re-sequenced the genomes from scratch and found out that the scientists had reported the wrong sequence. They are using deception to con people that aren't familiar with genetics.
No, it isn't. Different chromosomes have diverged at different rates. There is no expectation that the similarities will be the same for a comparison of any two chromosomes.
The Y chromosome has 50 million bases, or just 1.6% of the total genome. You do know this, right? Let's put this another way. If I said that the average life expectancy was 85 years old, could I prove this wrong by pointing to a baby that died at 1 year old? If I said that the average life expectancy was 85, does this mean that everyone dies at 85, and at no other age?
More importantly, chimp and gorilla are more different than chimp and human. Chimp and orangutan are more different than chimp and human. No species is closer to chimps than humans.
|
||||||||||||||||||||||||||||||||||||||||
Dr Adequate Member Posts: 16112 Joined: |
He means identical. As a way of calibrating the method --- if he gives it two identical bits of data, it should give him 100% as an answer, or there's something wrong with it.
|
||||||||||||||||||||||||||||||||||||||||
Dr Adequate Member Posts: 16112 Joined: |
It wasn't, but they cite it approvingly and the code is there for you to use.
|
||||||||||||||||||||||||||||||||||||||||
Telesto Junior Member (Idle past 2418 days) Posts: 10 From: Zlín Joined: |
First of all... I think we don't understand each other. It is probably caused by my English - as you realized, I am not a native speaker.
I completely understand. But I think that this is not the case for the blastn algorithm used in the research. I ran a similar experiment. I created two identical strings, each 50 bases long. Then I deleted one base at position 25 in the second one, so that the second string has only 49 bases and everything after that point is shifted by one base. I understand what you have told me about overall differences. But let's try blastn with the parameters -ungapped and -word_size 11. The results are below (numbers: percent identical, sequence length, mismatched bases):
1) For identical strings - 1 hit
2) Second string shortened in the middle - 2 hits
3) One base changed in the middle - 1 hit
These are the results from blastn. What now?
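The 50-base experiment above can be imitated without blastn. The sketch below is not BLAST and does not reproduce its output: it compares a position-by-position (ungapped) identity with a gapped identity based on the longest common subsequence, a simple stand-in for a real gapped alignment. The repeating test sequence and all function names are assumptions made for the illustration.

```python
def positional_identity(a: str, b: str) -> float:
    """Ungapped: compare base i of `a` with base i of `b`; no shifting."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (classic dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def gapped_identity(a: str, b: str) -> float:
    """Gapped stand-in: bases that can be paired up when gaps are allowed."""
    return lcs_length(a, b) / max(len(a), len(b))


# Hypothetical 50-base sequence; delete the base at position 25 (1-based).
original = "ACGT" * 12 + "AC"
shortened = original[:24] + original[25:]

ungapped = positional_identity(original, shortened)
gapped = gapped_identity(original, shortened)
print(f"ungapped: {ungapped:.0%}, gapped: {gapped:.0%}")
```

Everything before the deletion still lines up; everything after it is off by one base unless the comparison is allowed to open a gap. A local ungapped search would instead report the two intact halves as separate hits, which fits the two hits listed for case 2.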
I was talking about exactly the same chromosomes (e.g. Human Y vs. Human Y).
Sure, I know. I chose this chromosome because of its length and because it showed the smallest similarity in the research. I know this has little impact on the whole genome. But in the paper they also analyzed chromosome Y separately, and their result was 43%. I tried to reproduce this number as well.
I meant the difference between rat vs. mouse is larger than between chimp vs. human.
|
||||||||||||||||||||||||||||||||||||||||
Telesto Junior Member (Idle past 2418 days) Posts: 10 From: Zlín Joined: |
You're right. They didn't use it. However, I tried to use these scripts and it seems they calculate something. I tried them on some reference sequences but I failed. I am not sure what values I should set. Perl is quite difficult for me to read.
|
||||||||||||||||||||||||||||||||||||||||
Taq Member Posts: 8468 Joined: Member Rating: 6.0 |
It does. If you leave out gaps you will have a much lower score than if gaps are included.
Actually, sfs over at Christian Forums has already done some of the leg work. sfs also happens to be an author on the chimp genome paper, for what it is worth. In message 56 he writes: "I checked: the low percentage of matches does in fact result from only looking for ungapped alignments. I downloaded the human and chimpanzee genomes and the BLAST executable. As a test set, I pulled 500 randomly sampled, non-overlapping slices from chimpanzee chromosome 12, each 300 base pairs long. After dropping any slices that contained unknown sequence (i.e. 'N's), I had 471 test sequences. I fed these into BLASTN against human chromosome 12, using the parameters specified by Tomkins, with and without allowing gaps in the alignment. With no gaps, 68% of my queries yielded matches, in good agreement with Tomkins's finding. With gaps allowed, 100% of queries matched; of these, one or two were of poor quality and likely represent random matches. So the actual matching rate, when doing a proper alignment, was 99.6%." It has already been confirmed that changing from ungapped to gapped makes a huge difference.
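The sampling step sfs describes (non-overlapping fixed-length slices, dropping any that contain 'N') might look roughly like the sketch below. How sfs actually guaranteed non-overlap is not stated; cutting the chromosome into consecutive windows and sampling among them is an assumption, as are the function name and the toy sequence. The BLASTN run itself is not shown.

```python
import random
from typing import List


def sample_slices(sequence: str, n_slices: int, slice_len: int,
                  seed: int = 0) -> List[str]:
    """Pick up to `n_slices` random, non-overlapping slices of `slice_len`
    bases and drop any slice containing unknown sequence ('N')."""
    # Assumption: non-overlap is enforced by sampling whole windows.
    n_windows = len(sequence) // slice_len
    rng = random.Random(seed)
    chosen = rng.sample(range(n_windows), min(n_slices, n_windows))
    slices = [sequence[w * slice_len:(w + 1) * slice_len]
              for w in sorted(chosen)]
    return [s for s in slices if "N" not in s.upper()]


# Toy stand-in for a chromosome: known sequence with a block of N's.
toy_chromosome = "ACGT" * 100 + "N" * 40 + "ACGT" * 100
test_sequences = sample_slices(toy_chromosome, n_slices=42, slice_len=20)
print(len(test_sequences), "usable slices")
```

For the real experiment the call would use 500 slices of 300 bases on chimpanzee chromosome 12, and the surviving slices would then be written out as queries for BLASTN.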
|
||||||||||||||||||||||||||||||||||||||||
Telesto Junior Member (Idle past 2418 days) Posts: 10 From: Zlín Joined: |
Hi RAZD,
Yes, I understand. Do you think that it is possible to get overall genetic similarity with such a method (gapped or ungapped)? I think the BLAST algorithm was not created for this purpose. Anyway, I would like to reproduce the numbers from the research (even if they are wrong). The problem is that I don't know what to do with all the numbers I got. What is the algorithm for getting one number that represents overall similarity? I always get thousands of numbers. How did they get 43% from these numbers?
|
||||||||||||||||||||||||||||||||||||||||
Taq Member Posts: 8468 Joined: Member Rating: 6.0 |
sfs over at CF had a good analogy for gapped v. ungapped. Let's say that you had 2 books, each with 1,000 pages. When you begin looking at the 2 books you realize that they are nearly identical. The only difference between the 2 books is that there is an extra space smack dab in the middle of one of the books. Every word, letter, and piece of punctuation is otherwise identical. Now, would you say that these two books are nearly 100% identical? Tomkins would say no. He would say that the two books are only 50% identical. Why? Because he ignores the extra space which puts every letter one space off so that they no longer match up. That is how ridiculous Tomkins's comparison is.
|
||||||||||||||||||||||||||||||||||||||||
saab93f Member (Idle past 176 days) Posts: 265 From: Finland Joined:
|
I wonder how the creationists reconcile their utter and total lack of integrity with their preconception of moral superiority compared to "secular scientists"? The scientific community should raise their voice a notch or three and really hammer this deceitful nature of creationism so that every layman can understand it. Loathable folks them cretins...
|
||||||||||||||||||||||||||||||||||||||||
Pressie Member Posts: 2087 From: Pretoria, SA Joined:
|
Thanks guys for all the free education.
I'm about six months into my genetics course and I'm starting to understand what you are trying to say, even though I'm not near the level of even attempting a post on genetics here yet! So much to learn. Edited by Pressie, : Spelling
|
||||||||||||||||||||||||||||||||||||||||
Pressie Member Posts: 2087 From: Pretoria, SA Joined: |
I actually agree with you. However, I don't think that a lot of scientists are really interested in taking note or even contemplating commenting on the ramblings of crazy people. Those scientists who do that are spread very thin. Especially in countries where creationists are an endangered species. Those scientists who do read creationist ramblings do it for the fun of it. It's like an early morning dose of comedy just to wake up laughing.
|
||||||||||||||||||||||||||||||||||||||||
Telesto Junior Member (Idle past 2418 days) Posts: 10 From: Zlín Joined:
|
Hi Taq,
Well, I made a simple application that works as follows:
1) The reference (subject) chromosome is the human chromosome.
a) First of all I check whether the chimp subsequence matched. I am not sure what should be considered a MATCH. I guess a match means the whole subsequence was found. In this case a match is 300 bases long (or longer if I use gaps). If the matched subsequence was shorter, I counted it as "not match". For example: best sequence is 298 bases long with 5 mismatches - NOT match. In the end I calculated the percentage of matched sequences according to the above logic. I think this number has nothing to do with the whole genome comparison. It just says how many 300 (or more) bases long similar subsequences of the chimp chromosome were found in the human chromosome.
b) Then I tried to calculate some relevant similarity percentage. The first number was taken only from matched subsequences. Subsequences shorter than 300 bases were completely ignored. From these numbers I take the best match: the longest sequence with the lowest number of mismatches. Example: 300 - 5 the winner is 300 - 2. I summed all these bases and compared them with the summed mismatches. I think this is not very useful. It ignores shorter sequences that were found. For example, if the best match in the result file is 289 - 2, it is ignored.
c) The next number also took shorter sequences into account, but the rest of the bases were added. The missing ones were counted as mismatches. For example: best match from the result file 289 - 2 was recalculated to 300 - 13. Not sure if this is right...
d) The next number was taken from the numbers as they were in the result file. Example: best match from the result file 289 - 2 was not changed. In the end it was compared with exactly the same number of bases and mismatches. No changes...
e) The last number was also calculated from all steps in the experiment - matched (300 or longer) and not matched (shorter) sequences. However, if the sequence was marked as not matched (shorter), it was counted as completely wrong. Example: best match 289 - 3 was marked as not aligned and counted in the sum as 300 - 300 (300 bases long with 300 mismatches = 0% similarity).
And here are the results for chromosome Y: 1) Ungapped! 2) Gapped
So... What is right, what is wrong? The only thing I can see is the number 45.77% similarity, which is very close to the 43% reported in the research paper. Of course this number is nonsense - but that is another story. I hope you understand my "methodology". Or is there a better approach? As you can see, gapped was better but not much. I think the most representative number is d), calculated as it was with no recalculation and no penalty. But with the ungapped parameter the results were better (96.03%) than with gaps (95.76%). But both are very close. I would like to do the same experiment for chromosome 1. But it will take much more time, as it is 250 MB large (in contrast to 60 MB for human chromosome Y).
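The five numbers a) to e) described above can be written down compactly. The sketch below is one reading of that description, not the application itself: each query is assumed to contribute one best hit given as (aligned length, mismatches), and the four hits in the example are invented purely to show how far the numbers can drift apart.

```python
from typing import Dict, List, Tuple

QUERY_LEN = 300  # slice length used in the experiment
Hit = Tuple[int, int]  # (aligned length, mismatches) of one query's best hit


def identity(pairs: List[Hit]) -> float:
    """1 - total mismatches / total bases (0.0 if there are no bases)."""
    total_len = sum(length for length, _ in pairs)
    total_mm = sum(mm for _, mm in pairs)
    return 1.0 - total_mm / total_len if total_len else 0.0


def summarize(hits: List[Hit]) -> Dict[str, float]:
    # a/b: only hits covering the whole query count as "matched".
    full = [h for h in hits if h[0] >= QUERY_LEN]
    # c: short hits are padded to 300, missing bases counted as mismatches.
    padded = [h if h[0] >= QUERY_LEN
              else (QUERY_LEN, h[1] + QUERY_LEN - h[0]) for h in hits]
    # e: short hits are counted as 300 bases with 300 mismatches.
    zeroed = [h if h[0] >= QUERY_LEN
              else (QUERY_LEN, QUERY_LEN) for h in hits]
    return {
        "a_match_rate": len(full) / len(hits),
        "b_full_only": identity(full),
        "c_padded": identity(padded),
        "d_as_reported": identity(hits),  # d: numbers taken as they are
        "e_short_as_zero": identity(zeroed),
    }


# Hypothetical best hits for four queries (not real BLAST output).
hits = [(300, 2), (289, 2), (300, 5), (150, 1)]
summary = summarize(hits)
for name, value in summary.items():
    print(f"{name}: {value:.2%}")
```

With these invented hits, e) lands near 49% while d) stays above 99%, even though both are computed from the same alignments. The difference comes entirely from how short hits are scored, not from the sequences.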
|
||||||||||||||||||||||||||||||||||||||||
Taq Member Posts: 8468 Joined: Member Rating: 6.0 |
Not much? You went from 47% to 72% for matches. I would call that a pretty massive jump, especially given that Tomkins is comparing a 70% match to 95% similarity. As cited above, sfs has already run it and he is more familiar with BLAST batch runs than either of us are. He gets results very close to Tomkins for the ungapped alignments, and near 100% results for gapped. I would call that a real problem for Tomkins.
|
||||||||||||||||||||||||||||||||||||||||
Telesto Junior Member (Idle past 2418 days) Posts: 10 From: Zlín Joined: |
Well, for matching, yes. But I hoped for almost 100% according to sfs's results. But I know that the human Y chromosome is the most diverse. So I will wait for the results for other chromosomes. But I am really curious about these numbers. Do you really think that Tomkins compares the number of matches with similarity? Unbelievable... I hoped not, but from my preliminary results it really looks like he did. I will try to contact sfs.
|
Copyright 2001-2018 by EvC Forum, All Rights Reserved