Topic: Recent paper with an ID spin? Abel and Trevors (2005).
Halbwertszeit Inactive Member
No, but he did write 'Climbing Mount Improbable'
Yes of course. That was a rhetorical question. And after reading the
Smooth Operator Member (Idle past 5144 days) Posts: 630
quote: Is there something wrong with that?
quote: Why? Because they allowed something to be published that you do not approve of?
quote: No, they do not "replace" information theory. They build on it. Shannon's description of information is the first and the lowest possible description of information there is. It deals only with the statistical aspect of information. There are still syntax, semantics, pragmatics and apobetics to deal with.
quote: As you can clearly see, they published another paper in which they actually do make mathematical calculations for functional biological information. So there is nothing wrong with their work in extending Shannon's notion of information. Which supports ID. Measuring the functional sequence complexity of proteins | Theoretical Biology and Medical Modelling | Full Text
quote: Well, isn't it different?
quote: I don't know how Creation Science is unsuccessful, but I see nothing wrong in defining terms properly. Do you?
quote: Going a bit off topic here, aren't we?
quote: Just like naturalists were waiting for their chance in the early 19th century? Right?
Wounded King Member Posts: 4149 From: Cincinnati, Ohio, USA
Hi Smooth Operator,
That's an interesting extension of the previous work, and I'm glad to say much more understandable than the original Abel and Trevors paper. I think it is a worthy attempt to integrate a bioinformatic approach to biological function into their research into sequence complexity.
The problem as I see it with this approach is that the actual functionality component is so subjective. The functionality variable is pretty much the only thing that differentiates this approach from any other purely sequence comparison/conservation based approach. Unfortunately it is essentially left as an exercise for the user to decide what to do with that variable. There seems to be considerable scope for sequences chosen on the basis of some functional criterion to produce misinterpreted results, because several functional criteria of the types discussed in the paper could be chosen. For instance, if we choose a set of sequences based on their functional ability to bind a specific DNA sequence, how do we know that it isn't simply generic DNA binding ability, or some entirely different commonality of function, that is being identified?
This is both the most important and by far the least satisfactory part of their methodology, since in their own dataset analysis they don't actually tell us what the functionality variables for the different protein families were or what they were derived from. As far as I can tell from looking at the Python program they include, the function variable is simply derived from the sequence analysis. This seems to make the whole thing nothing more than an exercise in saying 'I think all these sequences share some common function', and you have to question how well such an approach would be suited to analysing anything other than exactly the sort of protein sequence sets they look at in the paper, and even then the extent to which you could ascribe any similarities to the specific shared functional criteria.
Isn't it better to leave the imputation of functionality to specific elements to downstream analysis of sequences identified as significant through the initial analysis? If 'Fits' and 'FSC' are an effective way of identifying common functionally significant sites between sequences then that is all to the good. Presupposing exactly what function you are looking for seems an unnecessary additional step, especially when Pfam family membership is apparently the only selection criterion used for discriminating function.
They also leave open the problem they recognise themselves: they only have a best guess at what the whole functional sequence space is, based on the collated functional sequences. The possibility of unknown functional sequences means that an underestimation of the likelihood of any particular function arising is almost unavoidable. That is even before considering functionally equivalent but genetically/structurally distinct proteins.
This definitely seems like a worthwhile endeavour if it could discriminate something more than traditional sequence analysis methods do, but it certainly doesn't offer much support for ID, at least beyond being the prelude to an argument from big numbers based on a calculation of something like the "FSC value for an entire prokaryotic cell where the genome has been sequenced and all translated proteins are known" which they mention in the discussion. Such a calculation would of course serve to compound any errors arising from the other, more subjective elements of the process.
There are other sophisticated information theoretic methods for identifying functional sequences from alignments; see for example Capra and Singh (2007). I'm not sure that this does anything more sophisticated or usable than those methods, or that it would discriminate a specific functionality any more readily.
TTFN, WK
Smooth Operator Member (Idle past 5144 days) Posts: 630
quote: It's not really subjective. Anyone who examines a specific part of the genome will find the same objective function there. Biological functions are objective.
quote: We select them by which biological function they perform. Binding is not a function. Coding for an eye is. Coding for a flagellum is. Since those structures perform biological functions.
quote: I really don't see where the problem is. If you know the protein performs a function, you just need to find its sequence on the genome and you have found the functional part of the DNA sequence for that biological function.
quote: But we are supposed to measure biological functions. If we don't find them in a given sequence then what are we measuring?
quote: This just means we are not measuring nothing and claiming it's information. The more functions there are the better. With more work, we will be able to analyze more functions.
quote: Any information based research is an indirect support for ID, because ID is based on the idea that design is only the product of a designing mind that produces information.
Wounded King Member Posts: 4149 From: Cincinnati, Ohio, USA
"It's not really subjective. Anyone who examines a specific part of the genome will find the same objective function there. Biological functions are objective."
If our knowledge were perfect I might agree, but there is a high possibility of our failing to recognise a function. The function is not subjective, but our knowledge and understanding of it can be.
"We select them by which biological function they perform. Binding is not a function. Coding for an eye is. Coding for a flagellum is. Since those structures perform biological functions."
Have you even read the Durston et al. paper? If you had, surely you would understand that the very data they test their approach with were selected based on their assignment to PFAM families, which is based on similarities of functional structural domains for activities such as protein-protein or protein-DNA interactions. They are certainly not classified as 'Coding for an eye' or 'Coding for a flagellum'.
"If you know the protein performs a function, you just need to find its sequence on the genome and you have found the functional part of the DNA sequence for that biological function."
That isn't information science, that is simply basic biology. You identify that a functional change has occurred from a change in phenotype and then use genomic sequence comparisons or classical genetics to identify where the change occurred, and therefore the functional sequence which was altered leading to the change. That isn't what Durston et al. did.
Maybe we should see if we can get the program up and running and try it out on a whole set of sequences from something like a common developmental pathway? How does that sound? I'm skeptical myself, since their method seems to rely so heavily on sequence alignment. I'm happy to accept that they can identify conserved functional sites within a protein family; so can many sequence analysis methods. I am doubtful their method will work for a heterogeneous set of sequences linked by a function such as 'build an eye' or 'build a flagellum'. Do you think it would work?
"Any information based research is an indirect support for ID, because ID is based on the idea that design is only the product of a designing mind that produces information."
It is only support if that is actually true. You keep missing the point that not everyone already accepts your assumptions; if they did we wouldn't be having a discussion.
TTFN, WK
Smooth Operator Member (Idle past 5144 days) Posts: 630
quote: But if someone does find a function and tells someone, he will also see it. It's not an invention, it's really there. I mean, are you saying that the beating of the heart is subjective?
quote: But those sequences have biological functions by default. Those sequences code for proteins and the flagellum is made from proteins. So where do you see the problem?
quote: They don't have to do it. They already know, everybody does, that protein coding sequences have biological functions.
quote: Why should it not work, since it is well known that protein coding sequences are coding for biological functions?
quote: True, but the problem is, if someone does not accept it, then they are simply having blind faith. I have never seen information arising from a natural process without an intelligence to guide it. Have you?
Wounded King Member Posts: 4149 From: Cincinnati, Ohio, USA
"I mean, are you saying that the beating of the heart is subjective?"
No, I'm saying that different levels of understanding of the functioning of the heart can give rise to subjectivity in discussion of it, and that the same is true of functions at the molecular level.
"So where do you see the problem?"
In you saying that it isn't protein-binding function that is being looked at, when anyone who had read the paper would see that this is exactly the sort of function they were looking at, and 'Coding for a flagellum' was not.
"They already know, everybody does, that protein coding sequences have biological functions."
But they don't know which sequences have what function in all cases. Even when they know what process the protein is involved in, they may still not understand its molecular function.
"Why should it not work, since it is well known that protein coding sequences are coding for biological functions?"
Because only an idiot would think that the point of the paper is to point at protein coding sequences and say 'these have some function'. The point is to identify regions in the proteins with functional information, and from that to be able to derive a value for the functional information of the whole protein, and possibly from there to higher levels of organisation up to whole genomes. If you don't understand the paper that is fine, but if you do understand it then I'm not sure why you are making such facile statements rather than a coherent argument based on the paper's methodology.
The reason I don't think it will work is that their method is heavily reliant on protein sequence alignment. That will obviously work for PFAM families, but all the proteins involved in a specific function will be hugely structurally diverse, making such an alignment virtually impossible. How, therefore, would such an analysis proceed? Durston et al. unfortunately leave this question simply hanging. Perhaps their idea is that you derive a value for each constituent part based on an analysis of its close structural relatives. But then you are bringing into your analysis multiple genes which are not involved in your function of interest at all, and your derived values will be almost totally divorced from that specific function.
I have got the program up and running and recapitulated their results with their test data. Unfortunately they use a very clunky format for the sequence data, and the paper does not make it very clear how it can be adapted for other sets of data.
TTFN, WK
Wounded King Member Posts: 4149 From: Cincinnati, Ohio, USA
Smooth Operator doesn't particularly seem to care about actually discussing the details of the paper and whether or not this is actually research which supports ID, but the Durston et al. paper piqued my interest so I downloaded their programs to give it a go.
The 'program' consists of a series of Python scripts and it is very easy to get those working. It is slightly harder to get the sequence data in quite the right format. The program expects a single line containing all the data; every sequence entry should be the same length and may include a name field of a set length. It took me a while comparing their test data, an alignment of P53 relatives from PFAM, with my own, an alignment of all the Helix-Loop-Helix proteins in PFAM, to work out that the length of the name is included in the length of the whole sequence. Apart from sequence length there are a few variables you can alter. These all need to be changed within the Python script; even the data file needs to be put into the script as it stands. I'm sure it wouldn't take long to make the scripts take command line options and therefore be much more flexible, but as it stands changing the main program is the simplest way. The program runs fine and outputs a few things at the end, including a fitness value for each residue in the sequence. I think a little bit of changing to the script that prints out these values might help make them more suitable for importing into a sequence alignment program, but at the moment it takes a little while to hoik them out and reformat them. The program is also supposed to work with DNA sequences but it would take another little bit of re-writing to set up. The output for my HLH family was that it had 52 amino acid sites which had high enough conservation to be above the program's set cutoff. It has a total Hf of 93; I'm still not sure what this represents. The Hf is supposed to 'quantify the change in functional uncertainty between two biopolymeric states with regard to biological functionality'. It isn't clear how this relates to any specific sequence, however, or to the entire alignment. The Shannon uncertainty for the alignment is measured as 224. The FSC value for the family comes out as 504 fits.
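For anyone who wants a feel for the data layout described above without digging into the original scripts, here is a minimal sketch in Python. It is not the Durston et al. code. The function names, the toy data, the way gaps are skipped, and the choice of log2(20) as the ground state for a site are all assumptions made for illustration, and the program's conservation cutoff is not modelled.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def split_records(blob, record_len, name_len):
    """Cut one long line into (name, sequence) pairs.

    record_len is the full width of each entry INCLUDING the name field,
    which is the quirk described in the post above.
    """
    if len(blob) % record_len != 0:
        raise ValueError("data length is not a multiple of record_len")
    records = []
    for start in range(0, len(blob), record_len):
        entry = blob[start:start + record_len]
        records.append((entry[:name_len].strip(), entry[name_len:]))
    return records

def site_fits(sequences):
    """Per-column value: log2(20) minus the Shannon uncertainty of the column.

    Assumption: the ground state is 20 equiprobable amino acids, and any
    character outside AMINO_ACIDS (for example a gap) is ignored.
    """
    null_state = math.log2(len(AMINO_ACIDS))
    fits = []
    for column in zip(*sequences):
        residues = [r for r in column if r in AMINO_ACIDS]
        if not residues:
            fits.append(0.0)
            continue
        counts = Counter(residues)
        total = len(residues)
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        fits.append(null_state - h)
    return fits

# Toy data (hypothetical): 3 entries, 4-character name + 5 residues = 9.
blob = "seq1ACDEFseq2ACDEYseq3ACDKF"
records = split_records(blob, record_len=9, name_len=4)
fits = site_fits([seq for _, seq in records])
fsc = sum(fits)  # summed per-site values for the whole toy alignment
```

In this sketch a fully conserved column scores the maximum of log2(20), about 4.32, and every extra residue type seen at a site lowers the score, which is why the result depends so heavily on which sequences go into the alignment.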
With a little bit of text manipulation the per residue Fits value can be incorporated as an annotation track in the multiple sequence alignment program Jalview. Jalview runs from Java Web Start and can be accessed at https://www.jalview.org/download/ . My project file for the HLH family can be found here. If you load it in you will see the sequence alignment for the HLH family and the Fits annotation track. The Fits track seems to agree with the other measures, especially 'Quality'. In many ways Jalview's 'Quality' measurement is a more sophisticated method than that of Durston et al. As well as taking into account the variation at the site, 'Quality' is based on the BLOSUM62 matrix, which takes into account the physicochemical make up of the amino acids and weights substitutions accordingly. This gives a similar measure of functional conservation at a site to that from the Fits calculation, but seems more tied into the actual biochemistry of the protein.
I am trying to reconstitute the p53 data set they give as test data to get it into the sequence alignment program. If I have any luck I'll let you know.
TTFN, WK
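The "text manipulation" step can be sketched roughly as follows. This assumes the Jalview annotations file layout as I understand it: a `JALVIEW_ANNOTATION` header line, then one tab-separated row per track with the per-column values joined by `|`. The function name, label, description and numbers are made up for illustration, so check the layout against the Jalview documentation before relying on it.

```python
def jalview_annotation(values, label="Fits", description="Per-site fits"):
    """Return the text of a minimal annotations file holding one bar graph.

    Assumed layout: header line, then
    BAR_GRAPH <tab> label <tab> description <tab> v1|v2|v3...
    """
    cells = "|".join(f"{v:.3f}" for v in values)
    row = "\t".join(["BAR_GRAPH", label, description, cells])
    return "JALVIEW_ANNOTATION\n" + row + "\n"

# Made-up per-site values, one per alignment column.
per_site_fits = [4.322, 3.404, 0.0, 2.15]
annotation_text = jalview_annotation(per_site_fits)
print(annotation_text)
```

Saving that text to a file and loading it onto an open alignment should add the values as an extra track below the sequences, next to the built-in Conservation and Quality tracks.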
Percy Member Posts: 22505 From: New Hampshire Member Rating: 5.4
I am fascinated. I would like to download the program and get into it, but I'm spreading myself a bit too thin these days.
But I do have a question. You said this in Message 38:
Wounded King writes: No I'm saying that different levels of understanding of the functioning of the heart can give rise to subjectivity in discussion of it, and that the same is true of functions at the molecular level.
So given all we don't know about any particular protein's function, how can there realistically be any objective measure of FSC? Possibly I don't understand the proper definition of functional. Maybe it's a bit like information, where the definition within information theory is much more constrained than in general usage. But the Abel/Trevors paper just jumped right in as if we all know what FSC is, and Durston's paper wasn't any better, e.g., this from the "Background" section:
Durston writes: As Abel and Trevors have pointed out, neither RSC nor OSC, or any combination of the two, is sufficient to describe the functional complexity observed in living organisms, for neither includes the additional dimension of functionality, which is essential for life [5]. FSC includes the dimension of functionality [2,3]. Szostak [6] argued that neither Shannon's original measure of uncertainty [7] nor the measure of algorithmic complexity [8] are sufficient. Shannon's classical information theory does not consider the meaning, or function, of a message. Algorithmic complexity fails to account for the observation that 'different molecular structures may be functionally equivalent'.
Not very helpful. I'm reminded of the mousetrap argument by Ken Miller. The function of a mousetrap is to catch mice. But you can also use it as a tie tack. When a window's open in the office you could use one as a paper weight. And how do you quantify these functions?
This is the way it feels to me, and that's why I've paid little attention to Abel and Trevors et al., but if you're looking into this then it tells me you must think there's something to it.
--Percy
Edited by Percy: Fix format.
Smooth Operator Member (Idle past 5144 days) Posts: 630
quote: So you agree that there are objective biological functions?
quote: I still don't get what you are trying to say. Are you saying that sequences do not represent proteins that code for the flagellum, or any other biological machine?
quote: It doesn't matter. If it has the function, it has one. The point is to measure the functions, not to explain what they do in detail.
quote: But I never said that.
quote: Exactly, that's what I've been saying the whole time.
quote: And why exactly should their difference in sequence be an obstacle?
Wounded King Member Posts: 4149 From: Cincinnati, Ohio, USA
"This is the way it feels to me, and that's why I've paid little attention to Abel and Trevors et al., but if you're looking into this then it tells me you must think there's something to it."
The thing is, I don't think there is anything to it that hasn't been done before, and done better, by mainstream bioinformatics researchers, but I'm prepared to have a look in case I'm wrong.
I've just been considering tying this into the other information thread and seeing what their program says about fitness in the DNA gyrases, but I still don't see how you can compare mutant strains when all the calculations are being performed on alignments. Does one swap out the wild type for the mutant in the alignment? That might be a worthwhile experiment, but it still doesn't say anything about a change in function. I worry that Durston et al.'s approach suffers from the same sort of platonic thinking as SO's: that there is some ideal sequence (or set of sequences) and anything outwith that represents a reduction in information, regardless of the actual effect, or lack thereof, on the function of the mutant.
I think they are just re-inventing the wheel of using Shannon entropies and amino acid conservation or characteristics to detect conserved functional structures or residues. But they don't seem to provide anything beyond that, other than the idea that you can add up all the values from various different functional parts of a whole system and come out with some meaningful overall measure of functional complexity, which just sounds like another prelude to an IDist argument from big numbers by multiplying a whole lot of things together and saying, 'see how complex this is!! It couldn't possibly evolve!!'
TTFN, WK
Edited by Wounded King: No reason given.
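The swap experiment mentioned above can be mocked up in a few lines of Python. This is only a toy, not the Durston et al. program: the alignment, the mutant, and the scoring function (log2(20) minus the column's Shannon uncertainty, summed over columns) are assumptions for illustration.

```python
import math
from collections import Counter

def total_fits(sequences, alphabet_size=20):
    """Sum over columns of log2(alphabet_size) minus column Shannon uncertainty."""
    total = 0.0
    for column in zip(*sequences):
        counts = Counter(column)
        n = len(column)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += math.log2(alphabet_size) - h
    return total

# Toy ungapped alignment (hypothetical); entry 0 plays the wild type.
alignment = ["ACDEF", "ACDEF", "ACDEY", "ACDKF"]
mutant = "ACDEW"  # hypothetical single substitution at the last site

swapped = list(alignment)
swapped[0] = mutant  # swap the wild type out for the mutant

delta = total_fits(swapped) - total_fits(alignment)
print(f"change in summed fits after the swap: {delta:.3f}")
```

In this toy the total drops simply because the mutant adds a residue not otherwise seen at that site. The calculation has no input describing whether the mutant protein still works, which is the worry raised in the post above.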
Wounded King Member Posts: 4149 From: Cincinnati, Ohio, USA
"And why exactly should their difference in sequence be an obstacle?"
Because they only show how to measure FSC for a set of aligned sequences with a shared putative function.
TTFN, WK
Smooth Operator Member (Idle past 5144 days) Posts: 630
quote: And you thought that FSC was a perfect method that can measure anything from its first go? You didn't think that the scientists needed time to improve their models to be able to do more?
Wounded King Member Posts: 4149 From: Cincinnati, Ohio, USA
"And you thought that FSC was a perfect method that can measure anything from its first go?"
They claim you can apply their method to similarly functional but molecularly dissimilar proteins. I think that if they make some claims they should be able to support them with something. I can see how you wouldn't think so though, since you absolutely refuse to support any of your own claims beyond bare repetition.
"You didn't think that the scientists needed time to improve their models to be able to do more?"
Well, have they? It's been two and a half years since the paper was published.
TTFN, WK
Copyright 2001-2023 by EvC Forum, All Rights Reserved