Author Topic:   Recent paper with an ID spin? Abel and Trevors (2005).
Wounded King
Member
Posts: 4149
From: Cincinnati, Ohio, USA
Joined: 04-09-2003


Message 58 of 85 (521392)
08-27-2009 10:20 AM
Reply to: Message 57 by Percy
08-27-2009 9:52 AM


Re: Durston et al. program
Since their measurement technique is actually based on sequence comparisons, the question is rather what they would show for:
  • A - A heterogeneous set of randomly generated proteins.
  • B - A data set consisting of 1 randomly generated protein compared to itself multiple times.
  • C - A set of sequences all of which are randomly mutated forms of an initial randomly generated sequence.
    They have done one of these (A) as part of their paper ...
    To illustrate the FSC of sequences that only had RSC, an array of 500 uniformly random sequences were generated, each having 1000 sites. The array was input into the software to compute the value in Fits of the FSC of the set of random sequences. To illustrate OSC, the Fit value of a 50-mer sequence of polyadenosine produced on Montmorillonite clay was calculated according to Eqn.
    So they have looked at a set of randomly generated sequences and a single instance consisting of simply one repeating polyadenosine sequence. It is perhaps important to note that they don't use the same method to compare both; perhaps this is because, if you put multiple copies of the polyadenosine sequence into their program, what actually comes out is that such a set has a maximal Functional Complexity value, as the sketch below illustrates.
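    To make that concrete, here is a minimal sketch of the per-site entropy logic as I understand it (my own reconstruction in Python, not Durston et al.'s actual program; I'm assuming a null state that is uniform over the 20 amino acids):

```python
import math
import random
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 amino acids

def site_entropy(column):
    """Shannon entropy (bits) of a single alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())

def fits(seqs):
    """Sum over sites of (null-state entropy - observed site entropy)."""
    h_null = math.log2(len(ALPHABET))  # ~4.32 bits per site
    return sum(h_null - site_entropy(col) for col in zip(*seqs))

random.seed(1)
random_set = ["".join(random.choice(ALPHABET) for _ in range(50))
              for _ in range(500)]
poly_a_set = ["A" * 50] * 12  # a dozen copies of the 50-mer polyadenosine

print(fits(random_set))  # close to 0: random sequences show almost no FSC
print(fits(poly_a_set))  # 50 * 4.32 = ~216 Fits: the maximum possible value
```

    On this logic a set of identical ordered sequences is indistinguishable from a set of identical functional ones, which is presumably why the OSC example had to be handled separately.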
    They are using Abel and Trevors' divisions of sequence complexity into Ordered (OSC), Random (RSC) and Functional (FSC). But they only really seem to be determining conservation with a slight twist. They aren't actually detecting functionality any differently from all the standard bioinformatic methods. They certainly aren't showing they can make meaningful comparisons of a function on the level of 'eye development' rather than 'specific DNA binding site conserved in PFAM family'.
    TTFN,
    WK

  • This message is a reply to:
     Message 57 by Percy, posted 08-27-2009 9:52 AM Percy has replied

    Replies to this message:
     Message 59 by Percy, posted 08-27-2009 11:05 AM Wounded King has replied

      
    Wounded King
    Member
    Posts: 4149
    From: Cincinnati, Ohio, USA
    Joined: 04-09-2003


    Message 60 of 85 (521408)
    08-27-2009 11:36 AM
    Reply to: Message 59 by Percy
    08-27-2009 11:05 AM


    Re: Durston et al. program
    But this doesn't really apply to single proteins. They do compare sets of real-world proteins, their PFAM analyses, with sets of randomly generated proteins of roughly similar length. Naturally the set of randomly generated proteins has much less FSC/conservation than the PFAM families; see Table 1 in Durston et al. (2007). But this isn't because of the magic of ID, it's because of conservation and all the work PFAM has done categorising and aligning structurally related protein families.
    TTFN,
    WK

    This message is a reply to:
     Message 59 by Percy, posted 08-27-2009 11:05 AM Percy has replied

    Replies to this message:
     Message 61 by Percy, posted 08-27-2009 2:20 PM Wounded King has replied

      
    Wounded King
    Member
    Posts: 4149
    From: Cincinnati, Ohio, USA
    Joined: 04-09-2003


    Message 62 of 85 (521577)
    08-28-2009 4:52 AM
    Reply to: Message 61 by Percy
    08-27-2009 2:20 PM


    Re: Durston et al. program
    From the program, what they do is calculate what they call functional site entropy. This is based on the conservation of not just the most prevalent amino acid at a position but also of conserved alternative amino acids. That is, traditional conservation will score a site which amongst 10 sequences has 8 tryptophans and 2 alanines the same as one with 8 tryptophans, 1 cysteine and 1 alanine, while the Durston et al. program would score the first example higher due to the recurrence of alanine.
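    Checking those two example columns with a quick entropy calculation (a sketch, assuming base-2 Shannon entropy as the conservation measure):

```python
import math
from collections import Counter

def site_entropy(column):
    """Shannon entropy (bits) of one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())

print(site_entropy("WWWWWWWWAA"))  # ~0.72 bits
print(site_entropy("WWWWWWWWCA"))  # ~0.92 bits
# Both columns have the same consensus (8/10 tryptophan), but the first
# has lower entropy, so it scores more Fits.
```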
    The thing is, to calculate the FSC for any sequence you need a 'ground state' sequence to compare it to, which is described as ...
    Durston et al. writes:
    The ground state g (an outcome of F) of a system is the state of presumed highest uncertainty (not necessarily equally probable) permitted by the constraints of the physical system, when no specified biological function is required or present.
    They term a random sequence such as you describe as the 'null state' and state that it can be functionally substituted for the ground state since ...
    Durston et al. writes:
    actual dipeptide frequencies and single nucleotide frequencies in proteins are closer to random than ordered
    You then calculate functional uncertainty for the sequence by comparing the ground or null state to your actual state or states. This is the bit I have trouble understanding; it is where they introduce actual biological function into the mix ...
    Durston et al. writes:
    Xf denotes the conditional variable of the given sequence data (X) on the described biological function f which is an outcome of the variable (F). For example, a set of 2,442 aligned sequences of proteins belonging to the ubiquitin protein family (used in the experiment later) can be assumed to satisfy the same specified function f, where f might represent the known 3-D structure of the ubiquitin protein family, or some other function common to ubiquitin. The entire set of aligned sequences that satisfies that function, therefore, constitutes the outcomes of Xf.
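    For reference, my reading of the quantities involved is roughly the following (my own reconstruction of their notation, so treat it with caution): the functional uncertainty of the aligned set is

```latex
H(X_f(t)) = -\sum_i P\bigl(X_f(t_i)\bigr) \log P\bigl(X_f(t_i)\bigr)
```

    and the FSC is the drop in uncertainty in going from the ground (or null) state g to the functional state f,

```latex
\zeta = \Delta H\bigl(X_g(t_i), X_f(t_j)\bigr) = H\bigl(X_g(t_i)\bigr) - H\bigl(X_f(t_j)\bigr)
```

    summed over the aligned sites to give the total in Fits.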
    I'm still very unclear what sort of variable they are actually deriving for their functions. Is it simply all the variability in a set of sequences which fulfil a specific functional criterion? How can you determine such a figure for a single sequence? The only example they give is for the poly-A sequence, which they say has no functional uncertainty since it is always the same.
    This seems to leave the determination of function so subjective as to allow one to plug in almost anything.
    If you have the time, it would definitely be good for someone with a bit more mathematical/computational knowledge to have a look. I'm only an incidental sort of bioinformatician.
    TTFN,
    WK

    This message is a reply to:
     Message 61 by Percy, posted 08-27-2009 2:20 PM Percy has replied

    Replies to this message:
     Message 63 by Percy, posted 08-28-2009 6:59 AM Wounded King has not replied

      
    Wounded King
    Member
    Posts: 4149
    From: Cincinnati, Ohio, USA
    Joined: 04-09-2003


    Message 66 of 85 (522971)
    09-07-2009 8:00 AM
    Reply to: Message 64 by Smooth Operator
    09-06-2009 9:39 AM


    What do you mean by this?
    I mean other than saying, 'the members of this PFAM family are all likely to share a common function', they give no usable criteria for how one can investigate any other aspect of biological function.
    They give lots of examples of what they think it could be applied to, but none of them are doable the way they describe their process working. I can't derive an FSC value for the FGF signaling pathway by making an alignment of all the genes/proteins in the pathway, because they won't align. Am I supposed to make alignments for every element with whatever relatives I can find? Should they be within the same organism? From different species? They say you can compare functionally similar, structurally distinct proteins, but they don't say how.
    The main problem with this approach is that your measures could be entirely wrong because you just don't know what sequences could perform a particular biological function; you only know which ones you have which putatively do perform that function. So you will always be overestimating the FSC.
    That is why they said that the genome of a certain organism has to be sequenced and studied first, so that you can apply the method to other proteins.
    Except their examples use proteins from multiple different species, not just one genome. And the proteins haven't just been sequenced, they have also been aligned and assigned to families based on structural similarity. It seems like all of the work has already been done here; what is Durston et al.'s program adding to the mix except a slightly varied form of conservation metric, of which several already exist?
    What exactly does the BLOSUM measure? The functionality of the protein, or just the complexity?
    It is a measure of the conservation of amino acids which, like the Durston et al. program, takes into account the nature of substitutions other than the most prevalent one, but which also takes into account the physicochemical/functional properties of the amino acids, by scoring substitutions based on a matrix of amino acid substitution scores derived from a number of highly conserved protein sequences. There are different BLOSUM matrices depending on the similarity of the sequences under investigation. Like Durston et al.'s FSC measure, it is calculated for each residue.
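    To illustrate the difference, here is a toy sum-of-pairs column score using a hand-copied excerpt of BLOSUM62 (just the entries needed for the tryptophan/alanine example from earlier; a real analysis would load the full 20x20 matrix):

```python
from itertools import combinations

# Excerpt of the (symmetric) BLOSUM62 matrix.
B62 = {("W", "W"): 11, ("A", "A"): 4, ("C", "C"): 9,
       ("W", "A"): -3, ("W", "C"): -2, ("A", "C"): 0}

def pair_score(a, b):
    return B62[(a, b)] if (a, b) in B62 else B62[(b, a)]

def column_score(column):
    """Sum-of-pairs BLOSUM score for a single alignment column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))

print(column_score("WWWWWWWWAA"))  # 264
print(column_score("WWWWWWWWCA"))  # 268
# The score depends on *which* amino acids substitute for each other
# (W->C is penalised less than W->A here), not just on their proportions,
# which is exactly the information an entropy-only measure throws away.
```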
    When you add up all the probabilities, you come up with a really small probability, and yes, random chance has to resolve it all.
    No it doesn't. Random chance at worst had to generate all of the genetic variability involved, something that mutational mechanisms do all the time. Be that as it may, it still doesn't get around the grossly mistaken assumption that the way things currently are is the only possible functional conformation for things. So all you are calculating is the probability that exactly this form of biological complexity evolved. This sort of posterior probability calculation is completely meaningless, even if we were to accept that all the values you might wish to plug in were accurate.
    The only thing that we should really be interested in there is that it can tell the difference between OSC and RSC.
    There were already plenty of methods available for distinguishing totally random sequences (RSC) and highly repetitive sequences (OSC), like the examples that were given, from functional coding sequences and functional non-coding sequences. As for the OSC example, they calculate that in a completely different way from all of the other ones, since if you actually run it through their program, i.e. give it a dozen copies of the poly-A sequence from montmorillonite clay, it will give you a maximal FSC result.
    So we are basically estimating where an island of functionality is in a sea of meaninglessness.
    Indeed, with the most conservative estimate possible based on highly structurally similar proteins. And you are then adding up all of these conservative estimates to make a number with essentially no scientific meaning. One has to wonder why. It doesn't show that the system couldn't evolve. It doesn't even show that it is impossibly unlikely. It certainly isn't positive evidence for intelligent design. The only purpose of this seems to be rhetorical: to generate big numbers to impress people with.
    TTFN,
    WK

    This message is a reply to:
     Message 64 by Smooth Operator, posted 09-06-2009 9:39 AM Smooth Operator has replied

    Replies to this message:
     Message 67 by Smooth Operator, posted 09-09-2009 3:20 PM Wounded King has replied

      
    Wounded King
    Member
    Posts: 4149
    From: Cincinnati, Ohio, USA
    Joined: 04-09-2003


    Message 68 of 85 (523418)
    09-10-2009 6:40 AM
    Reply to: Message 67 by Smooth Operator
    09-09-2009 3:20 PM


    SO writes:
    Yes, well, you are supposed to measure the sequences whose functions you do know. What would be the point of measuring a sequence for which you do not know a function? Maybe it's totally useless.
    But while they claim you can derive an FSC value for a single biosequence, they only show how to do it in the context of a pre-existing alignment. I suspect this is because, outwith the modified conservation metric, they have no way of setting the function variable that isn't totally arbitrary. I could identify six non-aligning, structurally diverse proteins with similar functions; would this method let me compare their FSC? The paper seems to claim it would, but the method certainly doesn't, and the paper doesn't make clear how it could be used in such a way.
    SO writes:
    Hmm, well, that basically seems like the same thing as Durston's model. What's the measure of functionality that BLOSUM uses?
    It's derived from the conservation of amino acids across multiple highly conserved proteins; the matrix is weighted so amino acids with similar functional physicochemical properties are scored higher than dissimilar ones. This is why I think it is superior to Durston et al.'s technique. They treat all substitutions as equal, with only the proportions at each individual site affecting the Fits for that site; the BLOSUM score, on the other hand, takes into account the likely functional effects of a particular amino acid substitution. You can go to the PFAM database directly and see an alignment of the Ubiquitin family (amongst others): simply click on 'Alignments' in the sidebar and then on the 'View' button in the first section, 'View options'. This will, after clicking through another window, bring up an alignment with scores for consensus, conservation and quality. Consensus just shows what proportion of the sequences have the most common amino acid at that site and what it is. Conservation shows a measure similar to BLOSUM but based directly on the known physico-chemical properties of the amino acids rather than on substitution rates. If you go here you can get an idea of the matrix of physico-chemical properties used. The Quality track is the one based on the BLOSUM62 matrix (this was derived from looking at substitution rates among aligned proteins with >62% conservation of identity). If you can get the Durston et al. program running you could use that to generate a 'Fits' track as well; I still haven't worked out a good way to distribute the analysis I did.
    SO writes:
    Okay, I know. Now we have a new one, is that bad?
    No, just redundant. Why re-invent the wheel more crudely?
    SO writes:
    Here, look at this presentation by Durston himself. He gives out a formula in which M(Ex) is the number of different configurations that can perform a specific function. It's explained at 01:30 into the video.
    I have to ask, why do you think he used the equation from the Hazen et al. (2007) paper rather than his own? I suggest that it is precisely because Hazen et al. clearly state how they derive their measure of functionality.
    Aside from that, this is exactly what I suggested: simply an argument from big numbers where Durston plugs in lots of assumed values which are highly questionable, i.e. he uses the calculations from his paper for RecA and SecY even though he has no idea what the actual number of possible functional sequences is. He seems to have done a little bait-and-switch between his equation and the, albeit similar, one in the Hazen et al. paper. Durston is eliding what Hazen et al. identify as a crucial step ...
    Hazen writes:
    In the preceding sections we demonstrated that the extension of functional information analysis to one-dimensional systems of letters or Avida computer code is conceptually straightforward, requiring only specification of the degree of function of each possible sequence.
    Hazen writes:
    The analytical challenge remains to determine the degree of function of a statistically significant random fraction of all possible configurations of the system so that the relationship between I(Ex) and Ex can be deduced.
    Durston et al.'s method skips over this step and just takes the conservation of amino acid sites in PFAM alignments as a good enough estimate, which naturally leads them to overestimate the degree of specification. I can't say by how much, because I have no idea what all the sequences which fulfil a specific function are.
    It is the difficulty of doing this that led Hazen et al. to use the Avida artificial life simulation as their main example, a system in which they could know, in much more depth than in an organismal system, what the distribution of functional sequences was. They say ...
    Hazen writes:
    Note, however, that this type of random sampling is not possible with living organisms because the portion of genome space explored in an evolution experiment will be constrained by the topology of the underlying fitness landscape and the particular configuration of the environment maxima
    The Hazen et al. paper is very interesting; thanks for bringing it to my attention.
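    For what it's worth, the core of Hazen et al.'s measure is simple to state; here is a sketch with purely illustrative numbers (the hard part, as they say themselves, is determining M(Ex)):

```python
import math

def functional_information(m_ex, n_total):
    """Hazen et al.: I(Ex) = -log2(F(Ex)), with F(Ex) = M(Ex)/N the fraction
    of all N configurations achieving at least the function level Ex."""
    return -math.log2(m_ex / n_total)

# Illustrative numbers only: if 1 in 10**12 random sequences meets the
# functional threshold, the functional information is about 40 bits.
print(functional_information(1, 10**12))  # ~39.86
```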
    SO writes:
    Do you think one part of genetic variability would come from nowhere?
    Inasmuch as it is a stochastic process, then yes. Genetic variability comes from errors in genetic replication and repair, from crossovers that swap domains around, and from multiple other sources with no apparent cause outside of the statistical nature of biochemistry and its interactions with the environment.
    SO writes:
    But they have not been seen to create new functions.
    You would have to define 'functions' quite clearly before we could even agree to begin to discuss this. There are numerous instances of populations gaining new functions in in vitro experiments. Antibiotic resistance and other similar examples spring immediately to mind, but the RNA polymer experiments the Hazen et al. paper refers to show that random mutation can generate and improve functionality.
    SO writes:
    If you look at the video I posted above you will also notice that there is a limit to what natural processes can do.
    I understand that that is Durston's argument; the question is whether we can accept his estimates of where those limits are, and I don't think we can, given what he presents. Of course, whether the correct response, if this were true, would be to immediately leap to the conclusion of intelligent design is another matter which is still open for discussion.
    TTFN,
    WK

    This message is a reply to:
     Message 67 by Smooth Operator, posted 09-09-2009 3:20 PM Smooth Operator has replied

    Replies to this message:
     Message 69 by Wounded King, posted 09-11-2009 11:15 AM Wounded King has not replied
     Message 70 by Smooth Operator, posted 09-14-2009 4:31 PM Wounded King has replied

      
    Wounded King
    Member
    Posts: 4149
    From: Cincinnati, Ohio, USA
    Joined: 04-09-2003


    Message 69 of 85 (523594)
    09-11-2009 11:15 AM
    Reply to: Message 68 by Wounded King
    09-10-2009 6:40 AM


    Who would live in a functional sequence space like this?
    I'm not sure that I'm entirely convinced by the argument, but Dryden et al. (2008) make an interesting case that the functional sequence space life on Earth actually needed to explore is much smaller than it is generally calculated to be.
    Dryden et al. writes:
    We conclude that rather than life having explored only an infinitesimally small part of sequence space in the last 4 Gyr, it is instead quite plausible for all of functional protein sequence space to have been explored and that furthermore, at the molecular level, there is no role for contingency.
    The areas where they argue for drastic reduction in required possible sequences are ...
    As an extreme method to reduce the size of sequence space, Dill (1999) suggested that only two types of amino acid were needed to form a protein structure, hydrophilic and hydrophobic, and that furthermore it was critical to define only the surface of the protein. These two suggestions reduce the size of sequence space to 2^100 and 2^33, respectively (i.e. approx. 10^30 and approx. 10^10).
    ...
    The assumption that a protein chain needs to be at least 100 amino acids in length also rather inflates the size of sequence space when it is known that many proteins are modular and contain domains of as few as approximately 50 amino acids, thereby reducing the space to 20^50 or approximately 10^65
    This presents an interesting counterpoint to Doug Axe's estimates of the likelihood of the evolution of functional protein folds.
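    As a quick sanity check on those numbers (a sketch):

```python
print(f"{2**100:.2e}")  # 1.27e+30: binary hydrophobic/hydrophilic, 100 residues
print(f"{2**33:.2e}")   # 8.59e+09: approx. 10^10, surface residues only
print(f"{20**50:.2e}")  # 1.13e+65: full 20-letter alphabet, 50-residue domain
```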
    TTFN,
    WK

    This message is a reply to:
     Message 68 by Wounded King, posted 09-10-2009 6:40 AM Wounded King has not replied

      
    Wounded King
    Member
    Posts: 4149
    From: Cincinnati, Ohio, USA
    Joined: 04-09-2003


    Message 71 of 85 (524162)
    09-14-2009 4:55 PM
    Reply to: Message 70 by Smooth Operator
    09-14-2009 4:31 PM


    Hey SO,
    I'll try and address your post in some depth tomorrow. But just to remind you, we already discussed the AIG page about beneficial mutations in some detail in [thread=-1588].
    As far as I can see from looking back there, your 'fine tuning' just appears to be the name you give to beneficial mutations, which can also encompass maintenance and presumably even increases in information. I assume you accept that if a fine-tuning mutation increases the functionality that is being used in the fitness calculation, i.e. if it were to catalyse a specific reaction at an increased rate when that is the function of the particular enzyme in question, then it could be an increase in functional information? Or would you contend that whatever the initial enzymatic rate was was the optimal rate, and would therefore represent the peak possible functional information?
    TTFN,
    WK

    This message is a reply to:
     Message 70 by Smooth Operator, posted 09-14-2009 4:31 PM Smooth Operator has not replied

      
    Wounded King
    Member
    Posts: 4149
    From: Cincinnati, Ohio, USA
    Joined: 04-09-2003


    Message 72 of 85 (524231)
    09-15-2009 6:09 AM
    Reply to: Message 70 by Smooth Operator
    09-14-2009 4:31 PM


    It probably doesn't work any other way.
    I agree, but that isn't what they say. They claim you can apply their metric to individual biosequences and use it to compare functionally similar but structurally distinct proteins.
    Durston et al. writes:
    Consider further, when a biosequence is mutated, the mutating sequence can be compared at two different time states going from ti to tj. For example, ti could represent an ancestral gene and tj a current mutant allele. Different sequences sharing the same function f (as outcomes of the variables denoted respectively as Xf, Yf) can also be compared at the same time t.
    Perhaps what they mean is that you can compare the FSCs of two distinct alignments having a common function. It isn't clear from the paper.
    How exactly is this better than Durston's model?
    It is better because it actually looks at the frequencies of substitutions in amino acids and identifies common substitutions that presumably allow functional conservation, since they are maintained. In contrast, as I said, Durston et al. simply look at the distribution of amino acids at each particular site, treating all amino acids as equal.
    And how do they tell apart functional and non-functional sequences?
    How do Durston et al.? I'm not sure if you are talking here about when the initial alignment is generated, in which case, since I directed you to the PFAM database, it is exactly the same functional criterion as Durston et al. use, i.e. whatever criteria PFAM used to define their structural families. If you mean how do they get the functionality criteria for individual amino acid substitutions, then it is by looking at multiple highly conserved sets of sequences, looking at the distribution of tolerated amino acid substitutions, and using that to infer functional physico-chemical similarities between amino acids.
    Simply because there is time to improve it and to become better.
    But they don't. How is their method an improvement on BLOSUM or other methods which actually consider the biological properties of the amino acids? Both of these generate a metric you could use for calculations similar to the ones Durston et al. perform.
    He is explaining how he got to his equation. His work is based on Hazen's.
    In this video perhaps, but not in the Durston et al. paper; they don't reference Hazen's work at all, which is understandable since it apparently wasn't yet published when their paper was submitted. You might say that they both build on the work of Jack Szostak, but as I said, I think that in that case Hazen did it the right way and Durston et al. failed to make their functional criteria in the least bit useful. Indeed, look at what Szostak has written (Szostak, 2003) ...
    Szostak writes:
    Approaches such as algorithmic complexity further define the amount of information needed to specify sequences with internal order or structure, but fail to account for the redundancy inherent in the fact that many related sequences are structurally and functionally equivalent. This objection is dealt with by physical complexity, a rigorously defined measure of the information content of such degenerate sequences, which is based on functional criteria and is measured by comparing alignable sequences that encode functionally equivalent structures. But different molecular structures may be functionally equivalent. A new measure of information, functional information, is required to account for all possible sequences that could potentially carry out an equivalent biochemical function, independent of the structure or mechanism used.
    His key points chime with my precise concern with Durston's work: the failure to take into account structurally dissimilar but functionally equivalent sequences. The problem is that Durston et al. don't seem to have taken the extra step necessary to move beyond looking at functional complexity over aligned sequences.
    We know the number of proteins the said structures have. We know what RecA does, so there is nothing left to assume.
    Yes there is: we need to assume that we know a high enough proportion of the extant functional sequences of RecA for our estimates based on those we do know to be meaningful.
    Not only that, but he cited Doug Axe, who dealt with the modifications to the proteins. What he actually did was to modify proteins in such a way as to show how much change they can take but still perform the function they did. Now we know that there is a subset that is between 10^-64 and 10^-77 of all possible sequences that will still give you the same function in the modified protein.
    I'm familiar with Axe's work. He is extrapolating from one particular functional fold in one enzyme to all of the possible functional folds of all proteins. Not only that, but he is doing so based on estimates derived from a highly constrained experimental set-up using a protein variant already mutated to put it at the borderline of functionality. As with Durston et al., one of the big flaws with Axe's approach is that it entirely ignores the existence of structurally dissimilar proteins which can perform the same function. The probability of evolving a particular functional fold is not so relevant if there are 10 other folds out there which can perform the same function.
    Didn't Durston actually mention that when they measure the AAs, they specifically apply the cutoff to certain parts of the sequence, so as not to inflate the number of Fits?
    They have a cutoff value to eliminate stretches of indels which produce gaps in the alignment. But that in no way addresses what I am talking about. The sample they analyse is only a small subset of the possible functional variants for the sequence, but they effectively assume it represents the entire functional sequence space.
    You do know that errors may cause variability, but no functional variability.
    This is simply not true, unless you are using the word 'functional' in a highly novel way. Of course they cause functional variability; even producing a loss of function is causing functional variability.
    We haven't actually observed something like ATP synthase arise de novo. Have you got any examples?
    Not of that specifically, but looking at SELEX experiments will show you that randomly generated pools of RNA oligonucleotides produce multiple functional motifs, including binding and catalytic activities. Subsequently, sequences encoding RNAs with similar structures have been found in many organisms.
    I'm not sure how you think one could force the de novo production of a catalytic activity like ATP synthase in the lab. Obviously the answer is that you can't, but I also don't see why you think this is relevant.
    A biological function is a process that takes in an input and gives an output within an organism. For example: the turning of the flagellar motor is a function, energy production from ATP synthase is a function, food degradation is a function.
    And to the extent that we can measure those functions, we can incorporate them into an equation like Hazen's, and maybe Durston's, but I'm still not clear how. So if we found a mutation that improved motility of the flagellum, would that be sufficient? What exact criteria would you use to measure flagellar functionality?
    But the point remains that there is a limit to what nature could have produced. And that is below 10^42.
    Even accepting that as the upper bound, this would still allow the entire sequence space of a simplified amino acid repertoire to be explored for shorter sequence lengths, and once functional sequences are extant their modification and recombination with other functional sequences can occur. Even once we have reached an agreement on the upper bound, we still need some agreement on the actual size of the functional space that needs to be searched; IDists tend to maximise this and perhaps evolutionists to minimise it, and certainly the Dryden paper uses some pretty radical minimisation for its lowest estimates.
    TTFN,
    WK

    This message is a reply to:
     Message 70 by Smooth Operator, posted 09-14-2009 4:31 PM Smooth Operator has replied

    Replies to this message:
     Message 73 by Smooth Operator, posted 09-20-2009 3:14 PM Wounded King has replied

      
    Wounded King
    Member
    Posts: 4149
    From: Cincinnati, Ohio, USA
    Joined: 04-09-2003


    Message 75 of 85 (525202)
    09-22-2009 12:07 PM
    Reply to: Message 73 by Smooth Operator
    09-20-2009 3:14 PM


    The same happens with some mutations. Other mutations which are deleterious reduce the information in the genome. None of them makes a gain.
    This is an assertion with no evidence to support it. A mutation is not a light switch; genes do not exist in a simple binary state of on or off. You are simply making an assertion. Surely you realise that the very equations we are discussing show that mutations can increase functional information, at least in theory? You seem to be denying even the theoretical possibility of an increase in information, but it isn't clear on what basis.
    So no, natural causes do not increase CSI.
    Again, a blank assertion with no evidence.
    They are talking about measuring different sequences with the same function. It can't be mutated enough to either lose or change its function, because then you would be measuring different functions.
    This is why the Durston et al. measure of function is meaningless; it doesn't measure anything as they use it. The Hazen paper gives specific examples of how to measure the function of specific sequences and use that to weight the functional information. For Durston et al., where 'function' is only a proxy measure of conservation in PFAM families, it is true that a single novel de novo point mutation will not increase the FSC, because maximal FSC in their scheme is represented by residues with 100% conservation. So only a mutation in a population which brought the sequence further into line with the consensus sequence, or made it more similar to another sequence in the alignment, could increase the overall FSC of the whole alignment. This is entirely divorced from the actual function of the protein, however, and is merely a measure of conservation, as the sketch below illustrates.
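    A toy illustration of that point, reusing the entropy-per-site logic sketched earlier (again my reconstruction, not their code): mutating one sequence towards the alignment consensus lowers the column entropy and so raises the measured Fits, whatever the protein actually does.

```python
import math
from collections import Counter

def site_entropy(column):
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())

def fits(seqs, h_null=math.log2(20)):
    return sum(h_null - site_entropy(col) for col in zip(*seqs))

alignment = ["MKLV", "MKLV", "MKLV", "MKIV"]   # consensus MKLV
mutated   = ["MKLV", "MKLV", "MKLV", "MKLV"]  # I -> L, towards the consensus

print(fits(alignment))  # ~16.5 Fits
print(fits(mutated))    # ~17.3 Fits: the measure rises, the biology needn't
```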
    It seems strange that the IDists who harp on about the importance of functional information don't want to produce a usable concept of function.
    Are you saying that some amino acids are more important than others?
    Yes, of course they are, both in terms of biochemistry as a whole and in terms of specific amino acid sequences for individual proteins. Surely this is one obvious corollary of Doug Axe's work that we can both agree on: some amino acid substitutions have larger functional effects than others, and some amino acid positions are more sensitive to changes than others. All amino acid substitutions are far from equal. There are arguably some amino acids we could do without entirely; that is one of Dryden et al.'s main points when they say that you could produce the majority of known functional folds from a repertoire of only a handful of amino acids, in extreme cases possibly only two.
    This is the same thing Durston's model does. I see no difference here.
    No it isn't; if you see no difference then you are either blind or can't read. I spelled it out for you right in the paragraph you quoted. Durston et al. only look at the conservation within the PFAM alignment they are studying. BLOSUM is derived from a large number of highly conserved aligned sequences, drawing general functional relationships which are then used to weight specific substitutions.
    But hey, I see no problem with it if it gets improved later on.
    They aren't improving it; they are ignoring the already existing methods which have improved on what they are doing.
    Well, it seems that the other model's metric does not take into account the functional part of the sequence.
    It does, to the same extent as Durston et al. do, by conservation; but it also takes into account the functional effects of the substitutions at the various amino acid positions, measured in two different ways.
    They all may be based on Shannon's information.
    Well, so is Durston et al.'s; look at the program they wrote and you will see that calculating Shannon entropy is part of the algorithm. That doesn't mean they are only looking at Shannon information. As I have argued, these methods have more connection to functionality than the Durston et al. method, and the Hazen method has more again.
    Durston simply builds on Axe's work and plugs Axe's numbers into his equation.
    And as I said, Axe's numbers are no more widely applicable than Durston's, for the reasons I gave previously, which you have yet to address.
    And what Szostak is actually saying here is that the current way in which we are measuring information is not good enough, because it does not take into account functional information.
    I agree that is what he is saying, but he is also saying exactly what it says in my quote. You can't just handwave away the fact that Szostak says that functional information "is required to account for all possible sequences that could potentially carry out an equivalent biochemical function, independent of the structure or mechanism used", which Durston et al.'s approach simply does not give us a framework for.
    What exactly is missing?
    Any knowledge about the full set of possible sequences functionally equivalent to RecA.
    You can extrapolate this to other proteins.
    Maybe you can, but that doesn't mean that such an extrapolation is reliable. Is one protein really a suitable proxy for all the possible proteins in existence? Not to mention its being a protein already mutated to be on the edge of functionality.
    If he mutates the proteins enough, he will show exactly how many different combinations, i.e. different protein sequences, will perform that same function.
    Except he won't; he isn't doing an exhaustive screen of all possible functional sequences even in that one protein. He is pushing some of the limits of functionality in one protein and extrapolating from them to the entire functionality space of all proteins.
    So yes, in this way you can calculate this one function even if there are 10 different proteins that can do the same function.
    Please explain how he does this. We aren't talking about 10 sequence variants of one protein, we are talking about 10 totally distinct primary sequences with functional equivalence. How does Axe's work even begin to address the existence of these functionally equivalent proteins?
    It says in the paper that they apply the cutoff to those parts of the sequence so that they would not be counted as functional information. They don't assume that the whole sequence is functional.
    Saying this over and over again doesn't change anything. They quite explicitly say this is to remove indels because those indel regions could indeed inflate the FSC measure.
    This is completely divorced from the concept of the entire functional sequence space that I am talking about, which would be every single possible sequence, alignable or not, which would fulfill the functionality that is being used as the F criterion in the analysis. Durston et al. assume that the PFAM family alignments are a sufficient proxy for this, but they are making a massive and obviously wrong assumption.
    What I meant to say is that errors, i.e. mutations, will not give you new biological functions. For example, they will not produce a flagellum from something vastly different.
    Well, those are two very distinct ideas. Evolutionary theory would generally hold that a flagellum would be produced gradually from elements that are similar. There are some cases of apparent radical de novo generation of new genes, but those are rare cases. As I said before, your concept of biological function doesn't accord with either Szostak's or Durston's.
    Binding and catalysis is not a biological function, it is a chemical process, and as such, a natural law.
    Again you make your definitions up for yourself. Szostak clearly considers both of these suitable biological functions since he presumably approves the use of them as examples of such in the Hazen paper. Indeed the functional sequence that Durston et al. are happy to have pulled out in their analysis of Ubiquitins is a DNA binding site.
    This is algorithmic information, as Szostak said where you quoted him.
    That isn't what he said; I recommend you read it again.
    Szostak writes:
    Approaches such as algorithmic complexity further define the amount of information needed to specify sequences with internal order or structure
    What does this have to do with catalysis and binding affinity? He talks about ...
    Szostak writes:
    sequences that could potentially carry out an equivalent biochemical function
    But you dismiss biochemistry as mere natural law and not connected to function. You aren't talking about function as any biologist understands the term.
    These kinds of processes do not produce biological information.
    Except they do, and the Hazen paper quantifies how much in at least one instance.
    It's very important if you want to be able to extrapolate changes in biological organisms to account for all the diversity of life we observe today.
    You don't want us to extrapolate, you want us to recapitulate the evolution of one specific functionality.
    I like to use Dembski's CSI.
    Unfortunately this doesn't actually let you measure anything objectively. And it is surely a measure of information rather than functionality? The two things are distinct but related in terms of functional information, but one surely cannot substitute for the other?
    TTFN,
    WK

    This message is a reply to:
     Message 73 by Smooth Operator, posted 09-20-2009 3:14 PM Smooth Operator has replied

    Replies to this message:
     Message 81 by Smooth Operator, posted 09-23-2009 12:08 PM Wounded King has replied

      
    Wounded King
    Member
    Posts: 4149
    From: Cincinnati, Ohio, USA
    Joined: 04-09-2003


    Message 80 of 85 (525438)
    09-23-2009 11:49 AM
    Reply to: Message 79 by Theodoric
    09-23-2009 11:41 AM


    Hmmph!
    You kids get off my thread/lawn! If you are already having this discussion in another thread then why bring it here as well?
    TTFN,
    WK

    This message is a reply to:
     Message 79 by Theodoric, posted 09-23-2009 11:41 AM Theodoric has not replied

      
    Wounded King
    Member
    Posts: 4149
    From: Cincinnati, Ohio, USA
    Joined: 04-09-2003


    Message 82 of 85 (527171)
    09-30-2009 12:37 PM
    Reply to: Message 81 by Smooth Operator
    09-23-2009 12:08 PM


    Losing impetus
    Hi SO,
    Sorry to take so long to get back to you. To be honest, I'm doubting the point of continuing this discussion; we don't seem to be progressing at all. From my side it seems like I raise points which directly address your claims, or rather those of Durston and Doug Axe, and you in turn simply repeat the claims and affirm that they are true because they are what Durston and Doug Axe said.
    Bacteria can alter their diet: when 300 permease proteins are found within the cell, the cell switches its diet with genetic regulation. The cell then gains the ability to digest sugar.
    You are getting confused here between genetic regulation and evolution. These genes aren't expressed in the absence of a certain environmental trigger, but they are present in the genome with all of the genetic sequences required for their expression. These cells always had the ability to digest sugar; they simply weren't expressing those genes when there wasn't any sugar in the environment.
    Now you claim it has been shown that CSI can increase with random mutations and natural selection. Please show me where.
    No I don't. I'm just saying that there isn't any evidence that they can't. At the moment I don't think CSI is a clearly enough defined concept for this to be done.
    That is simply how they do it. Measuring the conservation in different sequences is how they measure functionality.
    Indeed, they measure it by an indirect proxy method which tells us very little about the range of possible functional sequences. Why, when Hazen et al. demonstrate how it can be tied directly to actual measurable functions? Surely the functionality is the whole crux of this approach, so I still fail to see why Durston et al. approached it in such a perfunctory manner.
    That is true only after you have a protein. Before you have a protein all amino acids are equally important, which is to say they are all not important in any way, since you have nothing to do with a bunch of random amino acids.
    You seem to be trying to slip into a discussion of abiogenesis here, which is totally beside the point.
    That's the same thing.
    No it isn't, and I have described the differences in the conservation measures to you at least twice now.
    You know very well that you can't just mutate proteins to infinity.
    And I assume you know that that is a gigantic strawman and very far from being a cogent argument. The fact that you can't limitlessly mutate proteins and have them retain function does not mean that you can't substantially mutate them and retain function. In fact, Doug Axe's own research showed that functionality could be maintained after 20% of a protein's surface residues had been altered.
    Yes it does. It's what Axe's work is used for. He did the work with proteins and changed them to where they stopped functioning as they should. This area of functionality, actually the probability, is the number you need.
    Except, as Hazen and Szostak both emphasise, you need to effectively know all of the possible functional sequences for this to actually work, whereas Axe focuses on a very narrow range of functional states by making the protein minimally functional to begin with.
    Why can't his work be extrapolated to find the modificational possibilities of RecA?
    Because they are totally different proteins with totally different functions. Axe's work didn't even explore all the functional modificational possibilities of the protein he was studying.
    It's the best we got.
    It may be the best that ID has, but there is a wealth of comparative genetic data showing functionally similar proteins with lots of highly divergent genetic and amino acid sequences. I may post something to let us look at this in more depth at some other point.
    Do you think there are some proteins that can be coded for with infinite numbers of sequences?
    There is a very large amount of space between a handful of possible sequences and an infinite number; you are excluding the middle pretty strongly here.
    Do you think there are some proteins that can be coded for with infinite numbers of sequences?
    You clearly don't understand my point. There are totally distinct amino acid structures that can perform equivalent functions. How can Doug Axe's approach ever identify these, since, as you say, they are local modifications of a specific existing protein's structure?
    Well, if they could inflate the FSC, then what are we arguing about? They are not wrong.
    That was never the point, as I have explained before. I didn't say they were using every possible method to inflate the FSC measure; I was pointing out that the fact that they assume that a PFAM alignment is a suitable proxy for all possible sequences with a given function will inflate the FSC measure.
    Durston agrees that there is algorithmic information which is describing natural law. Simple chemical reactions are not biological functions.
    Well he certainly doesn't say so in his paper.
    It is not enough, that is the point. You can also use a random measure of information like RSC and get a measure from some biological function.
    This is arrant nonsense; they are making direct measurements of biological function. To equate that to RSC is to essentially say, 'well, sure you can evolve novel functions like that, but they aren't real novel functions'.
    If that is true, then why does Durston, who does not agree with this, base his work on Hazen's?
    You tell me; you are the one who is saying that this is Durston's position. He avoids giving a usable definition of biological function in his paper, as I complained before.
    I'm just asking you, from what are you going to extrapolate to what?
    From the various existing genetic sequences we are familiar with, from the research into hypothetical ancestral sequences and their functions, and from a growing body of research into the functionality of large-scale artificially generated random and evolved sequences, i.e. SELEX.
    As I say, we currently don't seem to be going anywhere; maybe instead of these mammoth omnibus posts to each other we could focus on one specific point and actually discuss it in enough depth to reach some sort of conclusion. My personal favourite would be the question of how well we can estimate a total functional sequence space and how it aligns with Doug Axe's approach.
    TTFN,
    WK

    This message is a reply to:
     Message 81 by Smooth Operator, posted 09-23-2009 12:08 PM Smooth Operator has replied

    Replies to this message:
     Message 83 by Smooth Operator, posted 09-30-2009 4:24 PM Wounded King has replied

      
    Wounded King
    Member
    Posts: 4149
    From: Cincinnati, Ohio, USA
    Joined: 04-09-2003


    Message 84 of 85 (527382)
    10-01-2009 8:03 AM
    Reply to: Message 83 by Smooth Operator
    09-30-2009 4:24 PM


    Re: Losing impetus
    As for Axe and Durston: I don't claim they're right because they say so, I just say what they wrote in their papers. That's all.
    And when I bring up objections to what they wrote, you just reiterate it; that isn't how a debate is supposed to progress. If you don't think they are right then why continue to repeat their claims? If you do think they are right then why not present further supporting evidence?
    Never, obviously... Therefore, such changes are not evidence for evolution.
    No one ever said they were. I said that mutations weren't like light switches simply turning genes on and off; I wasn't saying anything about changes in gene regulation during an organism's life.
    Mutations certainly can spontaneously disrupt a gene, rendering it null, but such genes tend to degenerate subsequently; they don't just sit in the genome waiting to be turned back on. In line with this, there is no evidence that there are novel functional genes hidden in the genome just waiting for the right regulatory switch to evolve to allow them to be expressed. How would such genes be maintained?
    In the same way, when you put water below 0°C, you will always get ice.
    This is part of your problem: you incorporate so many assumptions into what you consider to be 'natural laws' and ignore the importance of context. There are a number of conditions in which water will not produce ice at 0°C; the obvious one is high pressure. At 13.35 megapascals (MPa) the freezing point of water is -1°C, and at 209.9 MPa it is -21.985°C.
    This actually does not create new information, since "i" will always produce "j" when acted upon by "f". This simply means that the natural law has shifted the same amount of information from "i" to "j". The problem of where the CSI came from is not resolved by this.
    I don't disagree with this. I think that the information is that of the complex environment in which the system is evolving. It is information about the environment as embodied by the environment. In this case the medium really is the message.
    Oh, and you said CSI is not well defined. Please tell me why not?
    Because ID proponents can't agree on a clear usable definition or demonstrate how to calculate it? Because the calculations require several large assumptions or estimates of key probabilities, the reliability of which makes it essentially a guess?
    Their method is not supposed to tell us anything about all the possible functional sequences for that particular function. That is why they rely on Axe's work instead.
    But Doug Axe's work doesn't do this, and certainly not for distinct genetic sequences with equivalent functions.
    No, I'm not. I'm simply saying: who cares about a bunch of amino acids? They are useless on their own. None of them is more important than the other. I mean, important to whom?
    This is meaningless nonsense. The physico-chemical properties of specific amino acids are important to anyone who wants to understand biochemistry or protein evolution at any meaningful level. I pointed out that Durston et al.'s approach is crude because it fails to take the functional similarities of certain amino acids into account and essentially treats them all as interchangeable. Given that there are already several extant methods which do take into account the functional/physicochemical similarities of the amino acids, this makes Durston et al.'s approach a big step backwards in terms of investigating functionality.
    The only thing I can say I understood from you is that you simply disagree with Durston's approach, simply because his method does not tell us about all the possible functions while measuring.
    This is why I think some more focused posts would be beneficial. We are discussing several different issues and you seem to be getting them all mixed up. The differences in measuring conservation between Durston et al.'s method, BLOSUM and the other methods are a quite distinct point from the fact that Durston et al. assume a PFAM alignment is a suitable proxy for all possible sequences with a given function.
    But then again, Axe's work comes into play here, and all gets sorted out.
    No it doesn't, and just keeping on saying that it does won't change that. Sure, Doug Axe gives them a figure to use, but unless that figure is actually meaningful then he might as well have just made it up.
    I said "infinity" just to make a point. I know no sane person thinks you can mutate them to infinity. The point I was trying to make is that there is a limit. Which means not all sequences will work.
    A point hardly worth making, which you nevertheless try to make several times. No one has ever argued otherwise, hence my pointing out that this is a strawman.
    Well, if we extrapolate from what Axe has done, then I think we have no problem in determining the number of possible sequences.
    You may have no problem in coming up with a number, but there is no reason to think that that number has any relevance outside of the specific protein Axe was studying (if that, given the already existing mutations before his work even started). Axe's work, as I have now pointed out multiple times, never addresses distinct structures with similar functions; it doesn't even address the full functional space of the protein it is focused on.
    If he mutated it beyond functionality, then that is a good enough estimate.
    No it isn't. If he had tried single nucleotide substitutions and found one single nucleotide substitution that rendered the protein completely non-functional, would that be sufficient to extrapolate that all proteins must be 100% conserved to maintain functionality? If not, then why is his slightly larger-scale approach sufficient to make sweeping generalisations about the complete functional sequence space for every possible function? The answer is that it isn't. Perhaps we should discuss Axe's work on a separate thread.
    If they show that proteins are even more deformable, and can still perform their function, then that's fine with me. Simply because my point remains, and that is that there is a limit. And if we can estimate the limit, then there is nothing to stop us from measuring the functional information in the genome.
    I agree with you here, although obviously we might disagree on exactly what the criteria of functional information is since we don't agree on what constitutes a usable concept of biological function. I think the main point of contention is that you think we can make these estimates using Doug Axe's approach and I don't, or at least I don't think we can from what he has produced so far, not even for the protein he was studying.
    If his estimate includes those sequences, then it can. Otherwise it can't. That's obvious.
    Well, it doesn't; there is no way for his work to address these sequences as it stands. So it appears you now agree with me that it is insufficient? Yes?
    No, that is just the conservation rate among all the proteins that are found. I can't see this inflating the FSC rate.
    But it isn't; the measure of FSC also implicitly factors in the functionality component, and if you are only using a small fraction of all the possible functionally equivalent sequences then your measure of FSC is going to be inflated, because to get an accurate measure you would need an alignment of all possible sequences that satisfy the functional criteria. Would you not agree that such a data set would constitute the ideal situation, obviating the need for estimates such as Doug Axe's work provides?
    He is basing his work on Abel and Trevors' work, which I think you know about. The one where they talk about three subsets of sequence complexity.
    But that doesn't address the issue either. This still doesn't give any reason why we should discount demonstrable, measurable functional outcomes, such as enzyme catalysis rates or binding affinity, as measures of biological function. You yourself were quite happy to put forward a loss of binding affinity as a change in biological function, albeit a loss, when we were discussing antibiotic resistance.
    I just wanted to say that RSC can measure anything, not that it is good enough to use as a measure of biological functions, simply because it cannot tell apart biological functions and simple natural sequences. It will always give you some measure.
    Then you were talking about something totally irrelevant to this discussion.
    I, for one, would disagree on that, simply because SELEX is actually not so random. The amount of intelligent input is too much to simply call it natural selection in action. As noted below.
    And how surprising it is that you quote Abel and Trevors' agreement with your position. Except of course when they say "All of the impressive selection-amplification-derived ribozymes that have been engineered in the last fifteen years have been exercises in artificial selection, not natural selection.", a sentiment your argument echoes, they miss the point that for such selection to work the functional change must have been produced by the random mutations. No one has ever considered selection to be a random process. Natural selection is no more random than artificial selection, although obviously artificial selection is usually more focused on one specific trait than natural selection, so it may act as a more intense directional pressure.
    The point you miss is that I am not putting SELEX forward as a model for evolution but as an example of a system where you can almost fully explore the functional space of a sequence, or even a function, and produce some real measures of FSC. If you wish to consider it a product of ID that is fine; it still provides a baseline method against which we can compare our attempts to measure FSC in real proteins.
    Long posts are just fine with me. The more we talk, the more we can learn.
    You'd hope so; I still get the feeling that having three concurrent topics running in one set of posts is leading to some confusion.
    TTFN,
    WK

    This message is a reply to:
     Message 83 by Smooth Operator, posted 09-30-2009 4:24 PM Smooth Operator has replied

    Replies to this message:
     Message 85 by Smooth Operator, posted 10-01-2009 6:52 PM Wounded King has not replied

      