Understanding through Discussion




Author Topic:   Sequence comparisons (Bioinformatics?)
mark24
Member (Idle past 3271 days)
Posts: 3857
From: UK
Joined: 12-01-2001


Message 31 of 42 (216456)
06-12-2005 3:58 PM
Reply to: Message 27 by randman
06-11-2005 5:24 PM


Re: here's one with CytoB
randman,

74 and 76 are nearly identical in some respects.

It's the type & location of differences that are informative in deriving trees, not just the percentage of differences.

Mark


There are 10 kinds of people in this world; those that understand binary, & those that don't
This message is a reply to:
 Message 27 by randman, posted 06-11-2005 5:24 PM randman has not yet responded

Replies to this message:
 Message 33 by NosyNed, posted 06-12-2005 5:03 PM mark24 has responded

    
mark24
Member (Idle past 3271 days)
Posts: 3857
From: UK
Joined: 12-01-2001


Message 32 of 42 (216459)
06-12-2005 4:00 PM
Reply to: Message 28 by Wounded King
06-12-2005 9:06 AM


Re: XP compatible version of ClustalX
Hi WK,

I'm having a problem generating a sequence file to load into ClustalX. How do you manage it?

Mark


There are 10 kinds of people in this world; those that understand binary, & those that don't
This message is a reply to:
 Message 28 by Wounded King, posted 06-12-2005 9:06 AM Wounded King has responded

Replies to this message:
 Message 34 by Wounded King, posted 06-13-2005 2:33 AM mark24 has responded

    
NosyNed
Member
Posts: 8838
From: Canada
Joined: 04-04-2003


Message 33 of 42 (216471)
06-12-2005 5:03 PM
Reply to: Message 31 by mark24
06-12-2005 3:58 PM


Other than percentages??
It's the type & location of differences that are informative in deriving trees, not just the percentage of differences.

In what way are all these used to determine a tree? Is there somewhere explaining it in some detail?

Thanks.


This message is a reply to:
 Message 31 by mark24, posted 06-12-2005 3:58 PM mark24 has responded

Replies to this message:
 Message 36 by mark24, posted 06-13-2005 3:44 AM NosyNed has not yet responded

  
Wounded King
Member (Idle past 2170 days)
Posts: 4149
From: Edinburgh, Scotland
Joined: 04-09-2003


Message 34 of 42 (216518)
06-13-2005 2:33 AM
Reply to: Message 32 by mark24
06-12-2005 4:00 PM


Input format for ClustalX
The easiest form of input is simply to make a plain text file with a set of FASTA data like those we have been using previously. You can get GenBank or GenPept to display the DNA/protein sequences in FASTA format and just copy and paste them into a .txt file. You can even get a set of FASTA files through the HomoloGene database.
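For anyone who would rather generate the file than paste it together, here is a minimal sketch of the FASTA layout described above: a ">" label line followed directly by the sequence, with no blank lines in between. The labels and sequences are made up for illustration, and the 60-character line width is just a common convention, not something ClustalX is known to require.

```python
def to_fasta(records, width=60):
    """Format (label, sequence) pairs as FASTA text.

    Each record is a '>' label line followed immediately by the
    sequence, wrapped at `width` characters, with no blank lines.
    """
    lines = []
    for label, seq in records:
        lines.append(">" + label)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Hypothetical labels and sequences, for illustration only.
records = [
    ("seq_a", "ATGACCAACATTCGA"),
    ("seq_b", "ATGACTAACATCCGA"),
]
fasta_text = to_fasta(records)
print(fasta_text, end="")
```

Saving `fasta_text` to a plain .txt file should give something that can be loaded as a sequence file.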

TTFN,

WK


This message is a reply to:
 Message 32 by mark24, posted 06-12-2005 4:00 PM mark24 has responded

Replies to this message:
 Message 35 by mark24, posted 06-13-2005 3:27 AM Wounded King has not yet responded

    
mark24
Member (Idle past 3271 days)
Posts: 3857
From: UK
Joined: 12-01-2001


Message 35 of 42 (216523)
06-13-2005 3:27 AM
Reply to: Message 34 by Wounded King
06-13-2005 2:33 AM


Re: Input format for ClustalX
WK,

Got it, I don't think it would upload because I had 2 line breaks between the label & sequence. Thanks.

Mark


There are 10 kinds of people in this world; those that understand binary, & those that don't
This message is a reply to:
 Message 34 by Wounded King, posted 06-13-2005 2:33 AM Wounded King has not yet responded

    
mark24
Member (Idle past 3271 days)
Posts: 3857
From: UK
Joined: 12-01-2001


Message 36 of 42 (216525)
06-13-2005 3:44 AM
Reply to: Message 33 by NosyNed
06-12-2005 5:03 PM


Re: Other than percentages??
Ned,

>1
aaaaaaaaa
>2
ttttttttt
>3
aaaaaaatt
>4
tttttttaa
>5
ataaatatt

In the above example, we should see 1&3 group together, & 2&4 due to shared similarities. Omit sequence 5 & try it.

With sequence 5 added, you may note that it is overall more similar, site by site, to 1&3 than to 2&4, & so should appear as a sister group to 1&3 rather than 2&4.
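To check the groupings by hand, a small sketch that counts the differing positions between every pair of the five sequences above. This is only a raw difference count (a Hamming distance), assumed here as the simplest possible measure; it is not necessarily what ClustalX computes internally.

```python
from itertools import combinations

# The five example sequences from the post, keyed by their labels.
seqs = {
    "1": "aaaaaaaaa",
    "2": "ttttttttt",
    "3": "aaaaaaatt",
    "4": "tttttttaa",
    "5": "ataaatatt",
}


def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


# Pairwise differences, keyed by (label, label) with labels in sorted order.
dist = {
    (i, j): hamming(seqs[i], seqs[j])
    for i, j in combinations(sorted(seqs), 2)
}

for (i, j), d in dist.items():
    print(f"{i} vs {j}: {d} differences")
```

Sequence 5 differs from 1 and 3 at 4 and 2 positions, but from 2 and 4 at 5 and 7 positions, which is why it sits with the 1&3 group.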

Hope that helps.

Mark


There are 10 kinds of people in this world; those that understand binary, & those that don't
This message is a reply to:
 Message 33 by NosyNed, posted 06-12-2005 5:03 PM NosyNed has not yet responded

    
Wounded King
Member (Idle past 2170 days)
Posts: 4149
From: Edinburgh, Scotland
Joined: 04-09-2003


Message 37 of 42 (216526)
06-13-2005 4:58 AM


If people are really interested in bioinformatics then I would definitely recommend downloading the Phylip suite of programs. It is fairly fiddly and technical to use all of the programs to do exactly what you want but the analyses produced are considerably more powerful and sophisticated than the sort of things we have been doing with Clustal in terms of phylogenetics.

TTFN,

WK


Replies to this message:
 Message 40 by derwood, posted 07-10-2005 4:00 PM Wounded King has not yet responded

    
Wounded King
Member (Idle past 2170 days)
Posts: 4149
From: Edinburgh, Scotland
Joined: 04-09-2003


Message 38 of 42 (220954)
06-30-2005 12:49 PM


Another nice new program has just been released. V2 of Jalview is out now and while it is not the best program for doing your actual alignments it allows you to visualise pre-existing alignments quite nicely and does some quite cool things with the trees, although it won't let me specify an outgroup for some reason. It also lets you run sequences through ClustalW and will display your data as a principal component analysis plot.

Some quite nice features to play around with.

TTFN,

WK


    
derwood
Member
Posts: 1457
Joined: 12-27-2001


Message 39 of 42 (222974)
07-10-2005 3:54 PM


I would recommend the PAUP* package. It is not free ($85 when I purchased mine 2 years ago), but it is easy to use and offers a wide variety of programs and parameters with which to analyze data (nucleotide, protein, or morphological (coded)).

Ned asked how such data is used to generate trees.
In short, parsimony and likelihood algorithms analyze the data for patterns of nucleotide substitution. Indeed, the overall degree of similarity is ignored in such programs, because identical nucleotide/amino acid sites are irrelevant.
Distance methods do use 'similarity', but I do not use such methods much.
Amino acid sequence data is not used in phylogenetic analyses nearly as much as it used to be for a couple of reasons - DNA data can provide at least 3 times the phylogenetic information that amino acid sequence data can (hypothetically, providing we are only using protein coding sequence).
Of course, non-coding DNA is usually much more phylogenetically informative in that it can accumulate more substitutional change than can conserved sequence (such as protein coding sequence).
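To make the point about identical sites concrete, here is a rough sketch of how one might pick out the columns of an alignment that matter to a parsimony analysis. It assumes the usual textbook definition of a parsimony-informative site (at least two different states, each present in at least two sequences) and a made-up four-sequence alignment; it is an illustration, not how PAUP* or any other package is actually implemented.

```python
from collections import Counter


def informative_sites(alignment):
    """Return 0-based indices of parsimony-informative columns.

    A column counts as informative when at least two different
    states each occur in at least two sequences.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for index, column in enumerate(zip(*alignment)):
        counts = Counter(column)
        shared_states = [state for state, n in counts.items() if n >= 2]
        if len(shared_states) >= 2:
            sites.append(index)
    return sites


# Hypothetical toy alignment: columns 0 and 2 are constant,
# column 4 has only one state shared by two or more sequences.
toy = ["ACGTA", "ACGTT", "ATGCA", "ATGCG"]
print(informative_sites(toy))  # [1, 3]
```

Constant columns drop out entirely, so two data sets with the same percentage similarity can carry very different amounts of signal for this kind of method.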

Making up the input files for these programs can be tedious and frustrating. PAUP, for example, will produce an error message if you have misplaced punctuation (certain symbols are used in the input files - e.g., a ";" is used to denote the end of a data block) but it will not tell you where the missing symbol is (at least the earlier versions did not - I think the new one does).
Someone had mentioned making plain text files - that usually works.
I am pretty lazy, so when I am making a new input file, I usually just use an old one that I know works and cut and paste the new data into it.
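In the same spirit as reusing a known-good file, a small sketch that fills a fixed template, so the punctuation only has to be right once. It assumes a bare-bones NEXUS data block of the general shape PAUP* reads; the taxon labels are invented, and a real file may need more settings than this.

```python
def to_nexus(records, datatype="dna"):
    """Build a minimal NEXUS data block from (label, sequence) pairs.

    Every command ends with ';', and the matrix is closed by a ';'
    on its own line before 'end;'.
    """
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    nchar = lengths.pop()
    pad = max(len(label) for label, _ in records) + 2
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(records)} nchar={nchar};",
        f"  format datatype={datatype} missing=? gap=-;",
        "  matrix",
    ]
    for label, seq in records:
        lines.append(f"    {label.ljust(pad)}{seq}")
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"


# Hypothetical labels; the sequences are Mark's five-sequence example.
records = [
    ("seq1", "aaaaaaaaa"),
    ("seq2", "ttttttttt"),
    ("seq3", "aaaaaaatt"),
    ("seq4", "tttttttaa"),
    ("seq5", "ataaatatt"),
]
nexus_text = to_nexus(records)
print(nexus_text, end="")
```

Because the length check runs before anything is written, a sequence that was pasted in short gets caught here rather than as a vague error inside the program.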

As for the supposed anomalous trees using cytochrome C and B, immediately the use of amino acid data tells me not to put much stock in it, plus the fact that, as has been mentioned, they represent only two small loci (mitochondrial loci at that, which are known, in general, to mutate faster than nuclear genes).


    
derwood
Member
Posts: 1457
Joined: 12-27-2001


Message 40 of 42 (222976)
07-10-2005 4:00 PM
Reply to: Message 37 by Wounded King
06-13-2005 4:58 AM


I agree with WK re: the use of Clustal for analysis.

For one thing, we are assuming that the alignment Clustal produced is optimal or at least very good.
In my experience with Clustal, it produces good starting alignments that then need to be re-done by eye. Of course, I am used to doing alignments with 20 to 45 species each with up to 12 thousand nucleotides. It may work perfectly for 30 to 100 amino acids, but when you start tossing in big indels and such in huge nucleotide files, it starts spitting out weird results. I recall once putting in just 2 sequences and the result it gave me was one entire sequence in a row, followed by the second entire sequence - no alignment at all. Whoever wrote it is right - there are all sorts of parameters you can fiddle with that can help avoid problems like that. Even so, with big files, I have found alignment programs of several types only good for getting a starting point.
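A toy illustration of why the parameters matter so much. This is a plain Needleman-Wunsch global alignment score with a single linear gap penalty, which is an assumption made for simplicity: Clustal itself uses a progressive method with separate gap opening and extension penalties, so this is not its algorithm. The scoring values are invented.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b (Needleman-Wunsch).

    Uses a linear gap penalty and keeps only one row of the
    dynamic-programming table at a time.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                prev[j] + gap,        # gap in b
                curr[j - 1] + gap,    # gap in a
            ))
        prev = curr
    return prev[-1]


# With ordinary penalties, mismatched columns beat opening gaps.
print(nw_score("AAAA", "TTTT"))                      # -4
# With free gaps and costly mismatches, the best "alignment" is
# one whole sequence against gaps, then the other: score 0.
print(nw_score("AAAA", "TTTT", mismatch=-5, gap=0))  # 0
```

When gaps are cheap relative to mismatches, laying one sequence end to end after the other is the optimal answer under that scoring, which looks a lot like the no-alignment result described above.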

And, if you are using a questionable alignment, then one should expect any results from any analyses to be odd.


This message is a reply to:
 Message 37 by Wounded King, posted 06-13-2005 4:58 AM Wounded King has not yet responded

    
Brad McFall
Member (Idle past 3108 days)
Posts: 3428
From: Ithaca,NY, USA
Joined: 12-20-2001


Message 41 of 42 (222997)
07-10-2005 6:45 PM
Reply to: Message 1 by NosyNed
06-08-2005 6:45 PM


RNA as demonic unscrambler of comparisons
S. Wolfram, in the book MATHEMATICA, makes the claim, which seems to me somewhat correct, that there will never be a Maxwell of Kant's grass biology able to "unscramble" (Wolfram's scientific term) a volume of gas, but I cannot get out of my mind the possibility that RNA could be a demon "unscrambling" 1-D symmetry projections to effects on molecular motion in the environment of DNA and proteins by means of the analyticity of the other kind of 1-D symmetry under rules from the 2nd law of thermodynamics quantum mechanically.

If this kind of reverse engineering were cognizable, sequence comparisons should indicate configurations of nanotechnology more desirable than those guessed at by starting from random walks.

It would then be interesting to develop means of relating DNA and protein comparisons to a similar distance metric. Perhaps that has already been done. I am not an expert in bioinformatics.


This message is a reply to:
 Message 1 by NosyNed, posted 06-08-2005 6:45 PM NosyNed has not yet responded

    
Wounded King
Member (Idle past 2170 days)
Posts: 4149
From: Edinburgh, Scotland
Joined: 04-09-2003


Message 42 of 42 (232274)
08-11-2005 11:54 AM


That is not dead which can eternal lie....
*Bump*

Just bringing this up to the surface since we have a number of new faces on the board at the moment, and also 'cos Modulous was mentioning not being able to find it with search at the moment.

TTFN,

WK


    

Copyright 2001-2018 by EvC Forum, All Rights Reserved
