Understanding through Discussion


Topic: DNA similarity between Chimpanzee and Human 70%
Telesto
Junior Member (Idle past 1081 days)
Posts: 10
From: Zlín
Joined: 02-03-2014


Message 1 of 32 (719960)
02-18-2014 4:12 PM


There is an article in Answers Research Journal with title: Comprehensive Analysis of Chimpanzee and Human Chromosomes Reveals Average DNA Similarity of 70%

What do you think about this article? Wouldn't it be great to reproduce the data from publicly available resources using the BLASTN algorithm? I am very interested in reproducing the results. If we obtain the same data, we can apply the same methodology to two species inside one baramin. I think it would be very interesting.

All resources (blastn program, DNA sequences) are freely available.


Replies to this message:
 Message 2 by AdminNosy, posted 02-18-2014 4:33 PM Telesto has responded

    
AdminNosy
Administrator
Posts: 4753
From: Vancouver, BC, Canada
Joined: 11-11-2003


Message 2 of 32 (719961)
02-18-2014 4:33 PM
Reply to: Message 1 by Telesto
02-18-2014 4:12 PM


More Please
I suggest you add a link and discuss this in your own words to some degree. Then we can see about promoting it.
This message is a reply to:
 Message 1 by Telesto, posted 02-18-2014 4:12 PM Telesto has responded

Replies to this message:
 Message 3 by Telesto, posted 02-19-2014 6:56 AM AdminNosy has responded

  
Telesto
Junior Member (Idle past 1081 days)
Posts: 10
From: Zlín
Joined: 02-03-2014


Message 3 of 32 (719962)
02-19-2014 6:56 AM
Reply to: Message 2 by AdminNosy
02-18-2014 4:33 PM


Re: More Please
There are several papers about DNA comparison of Human and Chimpanzee by comparing nucleotide bases:

http://www.answersingenesis.org/articles/arj/v4/n1/blastin
http://www.answersingenesis.org/...n1/human-chimp-chromosome

All of them indicate large differences 70%-89% between Human and Chimpanzee, which is in contrast to generally accepted difference between 94%-98%. This is presented as proof that Humans and Chimpanzees are not as closely related as has been claimed for decades.

I was very interested in these papers. I am a software developer, so I like "playing" with algorithms. And because all resources are free on the Internet, I was wondering if I could reproduce the results.

I think these numbers are taken by different method than previous high similarity results. So I think other reference data is needed. But I have difficulties in obtaining results presented in papers above so I was wondering if there is anybody, who can help.

I have several goals:

1) Opinions: I would like to know what others think about this research and the methodology used.
2) Verification: I would like to verify the data from the papers to be sure my method is exactly the same as the one used in the papers.
3) Test: Because I am suspicious (my preliminary results are far from the numbers presented in the paper), I would like to do a method verification test: e.g. compare Human-Human DNA.
4) Further research: By the same method compare DNA between species inside one baramin (e.g. mouse and rat).

I think sharing information, advice, tips, and hints in this discussion would be beneficial. We can find some interesting results together.

What do you think?


This message is a reply to:
 Message 2 by AdminNosy, posted 02-18-2014 4:33 PM AdminNosy has responded

Replies to this message:
 Message 4 by AdminNosy, posted 02-19-2014 11:50 AM Telesto has not yet responded
 Message 9 by Dr Adequate, posted 02-19-2014 1:55 PM Telesto has responded

    
AdminNosy
Administrator
Posts: 4753
From: Vancouver, BC, Canada
Joined: 11-11-2003


Message 4 of 32 (719963)
02-19-2014 11:50 AM
Reply to: Message 3 by Telesto
02-19-2014 6:56 AM


some details please
That's more but it might not help people. AiG is notorious for doing things badly. You need to tempt people to want to bother to read the papers which they may think is going to be a waste of time. Though some will enjoy just diving in anyway. Maybe they will post to ask that this be promoted without you doing more work.

If you could describe the methods a bit it might help.

However, maybe you are like me and don't know enough about this to unscramble it. If that is the case tell me and I'll promote this to see if others will help you out.


This message is a reply to:
 Message 3 by Telesto, posted 02-19-2014 6:56 AM Telesto has not yet responded

  
AdminNosy
Administrator
Posts: 4753
From: Vancouver, BC, Canada
Joined: 11-11-2003


Message 5 of 32 (719965)
02-19-2014 1:08 PM


Thread Copied from Proposed New Topics Forum
Thread copied here from the DNA similarity between Chimpanzee and Human 70% thread in the Proposed New Topics forum.
  
NosyNed
Member
Posts: 8782
From: Canada
Joined: 04-04-2003
Member Rating: 4.1


Message 6 of 32 (719966)
02-19-2014 1:09 PM


You may thank the good doctor A.
since he suggested promoting this.
  
Coyote
Member
Posts: 5868
Joined: 01-12-2008
Member Rating: 3.9


Message 7 of 32 (719971)
02-19-2014 1:22 PM


Comparing the human and chimpanzee genomes: Searching for needles in a haystack

Ajit Varki and Tasha K. Altheide

From the abstract:

The chimpanzee genome sequence is a long-awaited milestone, providing opportunities to explore primate evolution and genetic contributions to human physiology and disease. Humans and chimpanzees shared a common ancestor ∼5-7 million years ago (Mya). The difference between the two genomes is actually not ∼1%, but ∼4%—comprising ∼35 million single nucleotide differences and ∼90 Mb of insertions and deletions. ...

http://genome.cshlp.org/content/15/12/1746.long

Given the reputation of Answers in Genesis for accurate scholarship (i.e., none to speak of), I would expect a result much closer to the 4% cited in the above article than what AiG cites.


Religious belief does not constitute scientific evidence, nor does it convey scientific knowledge.

Belief gets in the way of learning--Robert A. Heinlein

How can I possibly put a new idea into your heads, if I do not first remove your delusions?--Robert A. Heinlein

It's not what we don't know that hurts, it's what we know that ain't so--Will Rogers

If I am entitled to something, someone else is obliged to pay--Jerry Pournelle

If a religion's teachings are true, then it should have nothing to fear from science...--dwise1


  
Taq
Member
Posts: 6651
Joined: 03-06-2009
Member Rating: 4.0


(4)
Message 8 of 32 (719974)
02-19-2014 1:26 PM


One word: Gaps
The main thesis of the creationist paper falls apart once you understand the underhanded tricks that they use to get their 70%. To the uninitiated, it may seem like a subtle difference, but it isn't. The author uses a non-gapped alignment.

"Gapping was disallowed for a variety of reasons. First, Altschul et al. (1990) determined that the addition of gapping strategies for alignments designed to locate regions of local similarity using BLAST was negligible. Secondly, an objective comparison among all queries negates the use of gapping with the algorithm."
http://www.answersingenesis.org/articles/arj/v4/n1/blastin

Those excuses are just utter BS. If you are going to compare genomes in an objective, fair, and unbiased manner you have to include indels. It just so happens that insertions and deletions really do happen, so they have to be part of any comparison between two genomes. Let's take a look at what a massive difference a gapped and ungapped alignment can make using a random stretch of DNA:

Gapped alignment

Species A: TATA-AGCGTAGGCAAT
Species B: CATAGAGCGTAGGCAAT

With this alignment, there is a one base indel and one substitution mutation at the very beginning for an overall identity of 15/17, or 88%. Now for the ungapped alignment.

Species A: TATAAGCGTAGGCAAT
Species B: CATAGAGCGTAGGCAAT

The overall identity is now 5/17, or 29%.
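
The two identity figures above can be checked with a few lines of Python. This is only a minimal sketch: `column_identity` is a hypothetical helper written for this example, not part of BLAST, and it simply compares the two rows column by column, padding the shorter row with a gap character.

```python
def column_identity(row_a: str, row_b: str) -> float:
    """Fraction of alignment columns where both rows hold the same base.

    A '-' (gap) never counts as a match. The shorter row is padded with
    gaps on the right so both rows have the same width.
    """
    width = max(len(row_a), len(row_b))
    row_a = row_a.ljust(width, "-")
    row_b = row_b.ljust(width, "-")
    matches = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    return matches / width


# Same sequences as in the post, with and without the one-base gap.
gapped = column_identity("TATA-AGCGTAGGCAAT", "CATAGAGCGTAGGCAAT")
ungapped = column_identity("TATAAGCGTAGGCAAT", "CATAGAGCGTAGGCAAT")

print(f"gapped:   {gapped:.0%}")    # 15 of 17 columns match
print(f"ungapped: {ungapped:.0%}")  # 5 of 17 columns match
```

A single missing base shifts every downstream column out of register, which is why the ungapped comparison collapses.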

The author of the creationist paper has rigged the methodology to ignore gaps, and therefore return a false result.

The projection in the rest of the paper is also worth discussing, but this is the one major issue that the paper has and so it should be discussed first.

Edited by Taq, : No reason given.



Replies to this message:
 Message 10 by RAZD, posted 02-19-2014 2:03 PM Taq has responded
 Message 13 by Telesto, posted 02-19-2014 4:03 PM Taq has responded
 Message 14 by NosyNed, posted 02-19-2014 4:14 PM Taq has not yet responded
 Message 25 by saab93f, posted 02-20-2014 1:10 AM Taq has not yet responded

  
Dr Adequate
Member
Posts: 15942
Joined: 07-20-2006
Member Rating: 3.3


Message 9 of 32 (719976)
02-19-2014 1:55 PM
Reply to: Message 3 by Telesto
02-19-2014 6:56 AM


Re: More Please
All of them indicate large differences 70%-89% between Human and Chimpanzee, which is in contrast to generally accepted difference between 94%-98%.

Psst ... you mean similarities, not differences.

Now, is it "in contrast to" the generally accepted figure? Well, if it was, their results would have been meaningful, and they could have published them in a real journal. But what they've actually done is picked a different method of measuring difference. Your weight in kilograms is not in contrast to your weight in ounces, it's just a different metric.

I think these numbers are taken by different method than previous high similarity results.

Yes.

Now the ideal metric for difference would be one that measures the number of mutations needed to get from one genome to the other; this would be biologically meaningful. As Taq points out, the creationists are ignoring the possibility of indels, so they get a different and less meaningful figure. But it's larger, which is what they're aiming for. But by doing that, they've rendered worthless their conclusion that "this defies standard evolutionary time-scales".
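
As a rough illustration of a "number of mutations" metric, here is a minimal Python sketch of the classic edit (Levenshtein) distance, which counts single-base substitutions, insertions and deletions. It is an assumption of this example that edit distance is a fair stand-in; real genome comparisons also have to handle long indels, inversions and duplications, which this toy function does not model. The test sequences are the ones from Taq's Message 8.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-base substitutions, insertions and
    deletions needed to turn sequence a into sequence b."""
    # prev[j] holds the distance between the first i-1 bases of a
    # and the first j bases of b.
    prev = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        curr = [i]
        for j, base_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete base_a
                curr[j - 1] + 1,                   # insert base_b
                prev[j - 1] + (base_a != base_b),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# One substitution plus one single-base indel.
print(edit_distance("TATAAGCGTAGGCAAT", "CATAGAGCGTAGGCAAT"))
```

By this count the two example sequences are two mutations apart, however poorly they line up when gaps are forbidden.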

4) Further research: By the same method compare DNA between species inside one baramin (e.g. mouse and rat).

An excellent idea.

From the data I can find, humans and chimps should be further apart than the two sequenced species of macaques, which belong to the same genus; maybe a little further apart than a domestic cat and a tiger; and closer than a rat and a mouse. If the latter is the case, the creationists would be hoist on their own petard.

---

The paper speaks approvingly of this guy. The web page makes his perl scripts freely available, so it should be easy enough to re-use his techniques on other genomes.


This message is a reply to:
 Message 3 by Telesto, posted 02-19-2014 6:56 AM Telesto has responded

Replies to this message:
 Message 12 by Taq, posted 02-19-2014 2:53 PM Dr Adequate has not yet responded
 Message 15 by Telesto, posted 02-19-2014 4:27 PM Dr Adequate has responded

  
RAZD
Member
Posts: 18669
From: the other end of the sidewalk
Joined: 03-14-2004
Member Rating: 3.8


Message 10 of 32 (719984)
02-19-2014 2:03 PM
Reply to: Message 8 by Taq
02-19-2014 1:26 PM


intentionally misusing science
The main thesis of the creationist paper falls apart once you understand the underhanded tricks that they use to get their 70%. To the uninitiated, it may seem like a subtle difference, but it isn't. The author uses a non-gapped alignment.

That was my first suspicion, the second would be ignoring reversed sequences that still accomplish the same functions.

The author of the creationist paper has rigged the methodology to ignore gaps, and therefore return a false result.

We have seen this type of intentionally misusing science in other areas, such as carbon 14 dating and living animals (seals at McMurdo Sound, etc), and several other dating methodologies.

No surprises.


we are limited in our ability to understand
by our ability to understand
Rebel American Zen Deist
... to learn ... to think ... to live ... to laugh ...
to share.



This message is a reply to:
 Message 8 by Taq, posted 02-19-2014 1:26 PM Taq has responded

Replies to this message:
 Message 11 by Taq, posted 02-19-2014 2:49 PM RAZD has acknowledged this reply

  
Taq
Member
Posts: 6651
Joined: 03-06-2009
Member Rating: 4.0


(2)
Message 11 of 32 (719991)
02-19-2014 2:49 PM
Reply to: Message 10 by RAZD
02-19-2014 2:03 PM


Re: intentionally misusing science
That was my first suspicion, the second would be ignoring reversed sequences that still accomplish the same functions.

As long as the reversed sequence spanned the 300 or 30 base stretches that the author was using, it shouldn't make a difference. Plus/Plus and Plus/Minus strand matches are treated equally.

We have seen this type of intentionally misusing science in other areas, such as carbon 14 dating and living animals (seals at McMurdo Sound, etc), and several other dating methodologies.

No surprises.

There are other "no surprises" moments as well. For example:

"Non-alignable regions are typically omitted and gaps in alignments are often discarded or obfuscated. . .

One of the first publications to compare large regions of the chimpanzee genome with human, was Britten’s lab in 2002 using an in-house Fortran computer program. The study was based on five large DNA fragments (BAC clones) from chimpanzee known to be homologous to human that were thoroughly sequenced. The total length of the DNA sequence for all 5 BACs was 846,016 bases, but only 92% of the DNA aligned to human and the paper reported on only 779,132 bases. The alignment with insertions and deletions (indels) indicated a human-chimp similarity of 95% (Britten 2002). However, when the complete sequence of all 5 BACs is included, a final DNA similarity of 87% is the final figure for the compared homologous regions between chimp and human."
http://www.answersingenesis.org/articles/arj/v4/n1/blastin

The author keeps making a big stink about non-alignable sequence, as if it has higher than normal differences so it is kept out to keep the percentages higher. Of course, that misses the actual truth by a mile. You can't compute the % similarity between two DNA sequences unless you can align them first.

To use an analogy, let's say that you want to find the average weight of a finch on a single island. After 3 months you have snagged 90% of the individuals, and the average weight is 100 grams. Would it be fair to say that the rest of the finches weigh zero grams, so the actual average weight is 90 grams? No. That's not how it works, and yet that is how the author is treating these comparisons. If the sequences can't align he treats them as being 0% similar, which just isn't the case.
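
The numbers in the quoted passage make the same point. The short Python sketch below uses only the figures from the quote (846,016 total bases, 779,132 aligned, 95% similarity in the aligned part). That the 87% was obtained this way is an assumption of the example, but scoring every unaligned base as 0% similar does land on that figure.

```python
# Figures taken from the quoted passage about the five BAC clones.
total_bases = 846_016
aligned_bases = 779_132
identity_in_aligned = 0.95

# Fraction of the sequence that could be aligned at all.
coverage = aligned_bases / total_bases

# Treating every unaligned base as a mismatch (0% similar)
# multiplies the measured identity by the coverage.
identity_if_unaligned_is_zero = identity_in_aligned * coverage

print(f"coverage:                  {coverage:.0%}")
print(f"identity, unaligned as 0%: {identity_if_unaligned_is_zero:.0%}")

# The finch version of the same arithmetic: 90% caught, 100 g average.
finch_average_if_rest_weigh_zero = 0.90 * 100
print(f"finch average if the rest weigh 0 g: {finch_average_if_rest_weigh_zero:.0f} g")
```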

Edited by Taq, : No reason given.


This message is a reply to:
 Message 10 by RAZD, posted 02-19-2014 2:03 PM RAZD has acknowledged this reply

  
Taq
Member
Posts: 6651
Joined: 03-06-2009
Member Rating: 4.0


Message 12 of 32 (719992)
02-19-2014 2:53 PM
Reply to: Message 9 by Dr Adequate
02-19-2014 1:55 PM


Re: More Please
But what they've actually done is picked a different method of measuring difference. Your weight in kilograms is not in contrast to your weight in ounces, it's just a different metric.

That's a good analogy. To carry it a little further, if I buy 16 ounces of ham at the store and then find that I have 1 pound of ham when I get home, did the ham lose 94% of its weight on the way home (15/16=0.94)?

That's the kind of game that creationists are trying to play with these numbers.

Edited by Taq, : No reason given.


This message is a reply to:
 Message 9 by Dr Adequate, posted 02-19-2014 1:55 PM Dr Adequate has not yet responded

  
Telesto
Junior Member (Idle past 1081 days)
Posts: 10
From: Zlín
Joined: 02-03-2014


Message 13 of 32 (720007)
02-19-2014 4:03 PM
Reply to: Message 8 by Taq
02-19-2014 1:26 PM


Re: One word: Gaps
Hi Taq,

Gapped alignment
Species A: TATA-AGCGTAGGCAAT
Species B: CATAGAGCGTAGGCAAT
With this alignment, there is a one base indel and one substitution mutation at the very beginning for an overall identity of 15/17, or 88%. Now for the ungapped alignment.

Species A: TATAAGCGTAGGCAAT
Species B: CATAGAGCGTAGGCAAT

Thanks for the reply. I am not sure that the blastn algorithm computes the sequence as you described. I think the algorithm rather splits the sequence into two parts and compares them separately. No doubt neither case is of much use.

I tried to use the same arguments as they did in their research. I changed only word_size to 50; 11 was too small and took a lot of time and memory. I tried comparing the whole Chimpanzee Y and Human Y chromosomes (they split the chimp chromosome into slices 100-450 bases long; I have no idea why). There should be only 43% similarity. However, after a few minutes I got completely confusing results.
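
For anyone who wants to repeat a run like this, the following Python sketch assembles a blastn command line using only the settings mentioned in this thread (ungapped alignment, word_size 50). The file names are hypothetical, the output columns are a choice made for this example, and any other parameters the paper may have used are not known here, so treat it as a starting point rather than the paper's exact invocation. The script only prints the command; it does not run BLAST.

```python
import shlex

# Hypothetical file names: the thread does not say what the FASTA files were called.
query_fasta = "chimp_chrY.fa"
subject_fasta = "human_chrY.fa"

cmd = [
    "blastn",
    "-query", query_fasta,
    "-subject", subject_fasta,
    "-ungapped",               # disallow gaps, as described in the paper
    "-word_size", "50",        # raised from 11 to save time and memory
    # Tabular output limited to the three columns discussed in this post:
    # percent identity, alignment length, number of mismatches.
    "-outfmt", "6 pident length mismatch",
]

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```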

First of all, there is no ONE result number. The algorithm compared about 650,000 sequences with about 400 million bases in total (Human Y is 60 million bases long, Chimp Y is about 20 million bases long), so the compared sequences overlapped many times. I also got mismatched bases. Each sequence had (in my case) a percentage identity, a sequence length, and a number of mismatched bases. For example:

97.3% 4552 105

The shortest sequence was 50 bases long (according to the word_size attribute). The lowest identity percentage was 69% and the highest 100%.

I really don't know how to get ONE number representing overall similarity from this set of data. The research doesn't say anything about how overall similarity is calculated from all of the data.

I made the only logical move: I summed the number of bases in all sequences (about 400 million) and compared it to the sum of all mismatched bases. With this method I got 93% similarity for chromosome Y.
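
The pooling step described above can be sketched in Python as follows. The three-value row layout (percent identity, length, mismatches) follows the example row in this post; `pooled_similarity` is a hypothetical helper, and only the first row of the sample data comes from the post, the other two are made up for illustration.

```python
from typing import Iterable, Tuple

# (percent identity, alignment length, number of mismatched bases)
Hit = Tuple[float, int, int]


def pooled_similarity(hits: Iterable[Hit]) -> float:
    """1 - (sum of mismatches / sum of aligned lengths) over all hits."""
    total_length = 0
    total_mismatch = 0
    for _pident, length, mismatch in hits:
        total_length += length
        total_mismatch += mismatch
    if total_length == 0:
        raise ValueError("no aligned bases to pool")
    return 1.0 - total_mismatch / total_length


hits = [
    (97.3, 4552, 105),  # the example row quoted in the post
    (100.0, 50, 0),     # made-up row: shortest possible hit, perfect match
    (69.0, 100, 31),    # made-up row: a low-identity hit
]

print(f"pooled similarity: {pooled_similarity(hits):.1%}")
```

Note what this number does and does not cover: bases inside overlapping hits are counted once per hit, and bases that fall in no hit at all are not counted, so it is a summary of the reported hits rather than a chromosome-wide similarity.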

I didn't try this with gaps (indels); I used the -ungapped parameter as they did. I think the number would be similar anyway. More interestingly, I used this method to compare the Human Y chromosome with the Human Y chromosome (yes, exactly the same chromosome) and the overall similarity was about 97%!! Not the real 100%. So... it looks like this method is completely useless.

However, I didn't get 43% similarity. Maybe this is caused by comparing the whole chromosome and not only slices 100-450 bases long. Anyway, I tried comparing Chimp slices 100 bases long and the overall similarity was around 80%-90%. But I didn't try it for the whole chromosome, only a few slices.

Do you have any idea how they got the overall similarity from the blastn algorithm? Do you know this program?


This message is a reply to:
 Message 8 by Taq, posted 02-19-2014 1:26 PM Taq has responded

Replies to this message:
 Message 16 by RAZD, posted 02-19-2014 4:35 PM Telesto has responded
 Message 22 by Taq, posted 02-19-2014 6:28 PM Telesto has responded

    
NosyNed
Member
Posts: 8782
From: Canada
Joined: 04-04-2003
Member Rating: 4.1


(2)
Message 14 of 32 (720008)
02-19-2014 4:14 PM
Reply to: Message 8 by Taq
02-19-2014 1:26 PM


Gaps
You know, you'd think I wouldn't be surprised. Of course being AiG I knew there would be something done to twist the result. But they manage to surprise me that they would do something so obviously fraudulent. You say "subtle" but I don't see it that way.
This message is a reply to:
 Message 8 by Taq, posted 02-19-2014 1:26 PM Taq has not yet responded

  
Telesto
Junior Member (Idle past 1081 days)
Posts: 10
From: Zlín
Joined: 02-03-2014


Message 15 of 32 (720011)
02-19-2014 4:27 PM
Reply to: Message 9 by Dr Adequate
02-19-2014 1:55 PM


Re: More Please
Hi Dr Adequate,

Psst ... you mean similarities, not differences.

Oh... sure. Thanks

As Taq points out, the creationists are ignoring the possibility of indels, so they get a different and less meaningful figure.

In theory, yes. However, I think the blastn program does something else. It does not compare base by base. It searches the reference sequence and tries to find similarity everywhere. The algorithm compares overlapping sequences many times and tries to find the best match, from the longest sequences to the shortest. The only difference between gapped and ungapped alignments would be in the total length of the longest sequences (that is my guess; I didn't try it).

Anyway, I tried comparing ungapped sequences and the result (useless in my opinion) is still far above 43%. So where the hell did they get this number?

An excellent idea.

After a few hours of playing with the blastn program I have a better one: compare two identical chromosomes, where the expected result is 100%. But first of all I need to get the 43% similarity for chromosome Y. Once I verify the methodology I can go on with the other jobs.

From the data I can find, humans and chimps should be further apart than the two sequenced species of macaques, which belong to the same genus; maybe a little further apart than a domestic cat and a tiger; and closer than a rat and a mouse. If the latter is the case, the creationists would be hoist on their own petard.

Yes... rat and mouse are more different than human and chimp. And for both, the genome is already sequenced and freely available.

The paper speaks approvingly of this guy. The web page makes his perl scripts freely available, so it should be easy enough to re-use his techniques on other genomes.

I am not sure if this algorithm was used in this particular research. But I will look at this script and try to reproduce those numbers.


This message is a reply to:
 Message 9 by Dr Adequate, posted 02-19-2014 1:55 PM Dr Adequate has responded

Replies to this message:
 Message 17 by Taq, posted 02-19-2014 4:52 PM Telesto has responded
 Message 19 by Dr Adequate, posted 02-19-2014 5:20 PM Telesto has responded

    
Copyright 2001-2015 by EvC Forum, All Rights Reserved
