EvC Forum active members: 65 (9164 total)
Topic: Information for MPW
Percy | Member | Posts: 22506 | From: New Hampshire | Member Rating: 5.4
Hi, MPW!
I wasn't able to follow your argument, even after looking at your link. It wasn't clear to me how natural selection can increase information. I approach the problem from a slightly different angle.

The total information for any gene in a population of organisms is equal to, keeping it simple, the log base 2 of the number of alleles (the "keeping it simple" part means I'm assuming equal probability for all alleles, and that each allele is a piece of information). The total information in the population for this gene can only change if the number of alleles increases or decreases. The selection by any individual reproductive act of one particular allele neither increases nor decreases the number of alleles in the population. An increase can only happen through mutation, and a decrease can only happen when no offspring in a generation inherits the allele (call this allele death; there's probably a correct term for it, but I don't know what it is). Natural selection without mutation or allele death can neither increase nor decrease information.

Even when you consider multiple genes working in concert permutationally, information cannot be considered to have increased or decreased during the gene mixing of reproduction, because the log base 2 of all possible permutations is the total information in the genome, and individual expressions of these permutations do not affect it.

At least that's the way I see it, but let me know what you think.

--Percy
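To make the bookkeeping concrete, here's a quick sketch in Python. The allele names are made up for illustration, and it assumes the equal-probability simplification described above:

```python
import math

def population_information(alleles):
    """Information for one gene in a population, keeping it simple:
    I = log2(number of distinct alleles), assuming equal probabilities."""
    return math.log2(len(set(alleles)))

# Copies of the same allele don't add information; only new alleles do.
population = ["a1", "a2", "a3", "a4", "a1", "a2"]   # 4 distinct alleles
before = population_information(population)          # 2.0 bits

population.append("a5")                              # a mutation appears
after = population_information(population)           # ~2.32 bits
```

Note how reproduction alone (adding more copies of "a1") would leave the measure unchanged; only mutation or allele death moves it.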
Percy | Member | Posts: 22506 | From: New Hampshire | Member Rating: 5.4
Hi, MPW!
I'm afraid I'm still not reading you.
MPW writes: This all seems fine except for where you state that "each allele is a piece of information".

I was trying to use informal wording. In Shannon information terms (see Shannon's Original Paper), the total alleles for a gene in a population represent the total set of messages that can be copied to offspring. Each allele represents a single member of the total message set of all possible alleles for that gene in the population.
MPW writes: The greater the diversity of alleles, the greater the information gained if one were selected for.

You're thinking of the individual offspring as a receiver of information received from the parents, but it is only relevant to talk about the information in a population, and not of the information flow during the reproductive act - that's just copying. If it were true that the mere act of reproduction increased the information in a population, then a population could increase genomic information simply by increasing the size of the population. But having multiple copies of the same set of alleles doesn't increase the amount of information a population possesses, any more than possessing two copies of Shannon's paper increases the amount of information available to you.
MPW writes: But what if a mutation occurs to an allele that also passes the selection filter? Say, a5 is a mutation of a2 yet also passes the selection filter. Then we have: -log2(2/5) = 1.32 bits

I don't know if this is just too profound for me, or if you're just making it up as you go along, but this makes no sense to me. You're going to have to explain this one.
MPW writes: However, anywhere we have a finite number of alleles at one time, and then at a subsequent time we have a smaller number of alleles which survived based on fitness, selection has operated upon that set of alleles and the remaining ones carry an increase in information.

Clearly wrong, even just intuitively - by your logic you could create information by ripping pages out of a book and burning them, so that when the book is empty it contains more information than it ever did. You can demonstrate this for yourself with a simple example. If your population begins with 64 alleles and then later has only 32 alleles, then just do the math. The information present at the beginning for this gene was 6 bits, and later it was 5 bits.
MPW writes: It is hypothetically possible to have X number of alleles, pass them all along (i.e. do not select for any) and then add, say, 2 mutations to that set of alleles, and the result would be an information decrease.

Again, clearly wrong. Keep in mind that when you add alleles to a gene you're increasing the message set size M, and that information, which is log2(M) (or -log2(1/M), whichever you prefer), increases with increasing M.

--Percy
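The arithmetic in this post is easy to verify; a minimal sketch, using only the numbers already given above:

```python
import math

# Losing alleles shrinks the message set and the information measure:
bits_at_start = math.log2(64)   # 6.0 bits for 64 alleles
bits_later = math.log2(32)      # 5.0 bits for 32 alleles

# Adding mutations grows the set size M, and I = log2(M) grows with M,
# so adding 2 mutations to a set of 8 alleles can only increase I:
bits_grown = math.log2(8 + 2)   # ~3.32 bits, up from log2(8) = 3.0
```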
Percy | Member | Posts: 22506 | From: New Hampshire | Member Rating: 5.4
Hi, Tagless,
I don't think I can be any more clear than what I originally said:
The total information for any gene in a population of organisms is equal to, keeping it simple, the log base 2 of the number of alleles (the "keeping it simple" part means I'm assuming equal probability for all alleles, and that each allele is a piece of information). The total information in the population for this gene can only change if the number of alleles increases or decreases. The selection by any individual reproductive act of one particular allele neither increases nor decreases the number of alleles in the population. An increase can only happen through mutation, and a decrease can only happen when no offspring in a generation inherits the allele (call this allele death; there's probably a correct term for it, but I don't know what it is).

I don't know why you ask "Are you referring to chromosome counts as information?", because I'm clearly talking about alleles for a single gene.

--Percy
Percy | Member | Posts: 22506 | From: New Hampshire | Member Rating: 5.4
1 gene = 1 protein = 1 function = something I didn't say
--Percy
Percy | Member | Posts: 22506 | From: New Hampshire | Member Rating: 5.4
MPW writes: In Shannon information terms (see Shannon's Original Paper), the total alleles for a gene in a population represent the total set of messages that can be copied to offspring. Each allele represents a single member of the total message set of all possible alleles for that gene in the population. This all makes sense. However, I don't see how this supports your labeling each allele a piece of information.

As I already said, I was trying to use non-technical terms. Calling an allele a piece of information was not intended to convey anything more than that the allele is one message of a set of possible messages. While you might not like my word choices when expressing this informally, I was trying to consider the wider audience, and I *do* think they brought the right images to mind when read by those unfamiliar with information theory.

Keep in mind that the Creationist argument is that evolution (meaning in this case reproduction and mutation) cannot increase information. This is clearly wrong. If the allele set size for a gene in a population is 8, then I=3. If there's a mutation and the allele set size grows to 9, then I=3.17. Creationist argument falsified.

But you're making a different argument. You've even moved the argument from one of the amount of genomic information in a population to one of the probability of particular alleles being inherited. While this is certainly extremely relevant to population genetics, it is not the same issue of information usually raised by Creationists. I think I said a couple of times that I was keeping things simple (in keeping with the wider audience again) and assuming equal probabilities for all alleles, but your example uses unequal probabilities. I'll speak to this, but we may lose the Creationist audience:
MPW writes: Now say all of the members of the first set pass along their traits to one descendent and then die, except one member has two offspring and a new allele appears (the offspring are an a1 and an a5). Meaning now we could have 1 of the a5, 10 of a1, 10 of a2, 10 of a3, and 10 of a4. The probabilities shift so that the probability of a5 is 1/41, the probability of a1 is 10/41, a2 is 10/41, and so on. Pass them through the filter, and suppose that again only the a1's pass through. Now we have:

You're measuring the information (number of bits) *after* it has arrived. You must instead look at the total number of messages in the set *before* the message is sent, which is -log2(1/5) = 2.32 bits. This is the minimum number of bits necessary to communicate messages of set size 5. Your answer of 2.04 bits is incorrect because the receiver of the information did not know which allele he would receive, and so there has to be provision for him to receive any of the 5 alleles in the message set.
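The "before the message is sent" point can be made concrete with a small sketch (plain Python; the set size of 5 is taken from the example above):

```python
import math

def min_channel_bits(set_size):
    """Minimum bits to identify one message out of `set_size` equally
    likely possibilities: -log2(1/M), which is just log2(M)."""
    return -math.log2(1.0 / set_size)

# The width is fixed *before* any message is sent: with 5 possible
# alleles the channel needs ~2.32 bits, no matter which allele is
# actually inherited.
width = min_channel_bits(5)
```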
MPW writes: So we can see that even the mere existence and deselection of the odd mutated allele increases the incremental information gain compared to before it was among the population in the previous generation.

An information gain or loss occurs when the allele set grows or shrinks in size. You're incorrectly equating the specific information communicated to offspring with information measures.
MPW writes: Disregarding the meaning of the printed text on the pages, the selection of 1 page out of the entire set of pages would result in an increase in information. It seems intuitively wrong because you're conflating meaning with information.

No, I'm definitely not "conflating meaning with information" - keep in mind I'm assuming Creationists are trying to follow this thread. That I long ago understood the difference between meaning and information can be seen in Message 74 of the old thread Information and Genetics, where I discuss this with Dillan. I think there are any number of ways to consider a book as containing information (not meaning) - let me know if examples would help.
MPW writes: If there were 64 in the message set, and 32 were selected, then the probability that those 32 were selected is (32/64). The equation would thus state:

MPW writes: Or a 1 bit increase in information upon the selection of 32 elements of the message set out of 64.

All you've actually done is calculated the difference between the number of bits needed to communicate messages of set size 64 versus 32:

-log2(32/64) = log2(64) - log2(32) = 6 - 5 = 1 bit

That one bit is in no way a measure of any information actually communicated, or of the information in the population for that gene. The idea that you can just magically place the number selected in your numerator does not appear to have a justification or correspond to any real-world situation that I can see.

--Percy
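The identity being pointed out here is a two-line check, assuming nothing beyond the formulas already in the thread:

```python
import math

# -log2(32/64) is nothing more than the difference between the fixed
# channel widths for set sizes 64 and 32; it measures nothing that
# was actually communicated.
one_bit = -math.log2(32 / 64)
difference = math.log2(64) - math.log2(32)
```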
Percy | Member | Posts: 22506 | From: New Hampshire | Member Rating: 5.4
Hi, MPW!
I'm afraid what you're saying still makes no sense to me. I'll just focus on a small part:
MPW writes: It could be argued, I suppose, that simply defining the message set reduces uncertainty and therefore supplies information, but if we were to define a message set of 5 elements out of the literally infinite number of possible elements that exist in the universe's message set prior to definition, our calculation would be:
When you say, "Simply defining the message set reduces the uncertainty," I think it reflects a fundamental misunderstanding of information theory. The message set is *always* predefined. What the sender wants to communicate to the receiver is not the message set, because they have pre-agreed on that, but individual messages from the message set. It isn't that the receiver doesn't know the message set, because he most certainly does! What he doesn't know, what is uncertain, is which message of the message set will be sent next. You also need to explain why you keep taking the log base 2 of any number that strikes your fancy, in this case 5 over infinity. --Percy
[Minor phrasing improvement. --Percy] [This message has been edited by Admin, 02-12-2004]
Percy | Member | Posts: 22506 | From: New Hampshire | Member Rating: 5.4
It is possible that you're trying to make one point while I'm making another, but most aspects of your position seem wrong to me:
Selecting an allele for a gene during reproduction is equivalent to sending one message. The number of alleles in the population is the size of the message set for that gene. As Shannon says in his paper right on page 1, "The significant aspect is that the actual message is one selected from a set of possible messages." It is significant that Shannon goes on to deny your post facto claim in the very next sentence when he says, "The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design." Common sense tells us the same thing.

In other words, the communication channel is designed to send one message from the message set. The communications channel must be of a size capable of sending one message at a time, and you can send consecutive messages in order to build more complex higher-level messages.

An example: the extended ASCII character code. It contains 256 different characters and requires log2(256) = 8 bits of channel. A message consists of a single character. You can send consecutive messages (characters) in order to send higher-level messages of greater complexity.

The alleles for a gene of a population can be thought of in the same way. The message set size is equal to the number of different alleles in the population, call it a. The minimum number of bits of communications channel necessary to communicate a single message of message set size a is log2(a). That means each organism needs at least log2(a) bits of capacity to store a single allele, and we can think of the minimum bit capacity as a measure of the amount of information present in the entire population for that gene location. We can think of this gene location as a communication channel for transmitting (through the reproductive process) the allele to the next generation. The analogy with the ASCII character code breaks down after that point.
With heredity, more complex messages are sent through the addition of more genes with their own set of alleles, and not by sending more alleles of the same gene. But the principle should still be clear. --Percy
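The character-code analogy can be sketched in a few lines of Python (the four-character message is a made-up example):

```python
import math

# A fixed-width code must be wide enough for any symbol in the set:
char_bits = math.log2(256)       # 8 bits per extended-ASCII character

# Consecutive single-character messages build a higher-level message:
message = "gene"
total_bits = len(message) * char_bits   # 4 characters * 8 bits
```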
Percy | Member | Posts: 22506 | From: New Hampshire | Member Rating: 5.4
Read what I said again. I'm talking about population allele set size:
Percy writes: Keep in mind that the Creationist argument is that evolution (meaning in this case reproduction and mutation) cannot increase information. This is clearly wrong. If the allele set size for a gene in a population is 8, then I=3. If there's a mutation and the allele set size grows to 9, then I=3.17. Creationist argument falsified.

It doesn't matter how many individuals in the population have the "this" allele; it still only counts once toward allele set size.

--Percy
Percy | Member | Posts: 22506 | From: New Hampshire | Member Rating: 5.4
I was thinking about this some more, and I think I see now where I'm having the biggest problem with your approach. This is from your Message 12 (note that I've changed terminology from Shannon's "message" to your link's "symbol", which seems better suited):
MPW writes: Now say all of the members of the first set pass along their traits to one descendent and then die, except one member has two offspring and a new allele appears (the offspring are an a1 and an a5). Meaning now we could have 1 of the a5, 10 of a1, 10 of a2, 10 of a3, and 10 of a4. The probabilities shift so that the probability of a5 is 1/41, the probability of a1 is 10/41, a2 is 10/41, and so on. Pass them through the filter, and suppose that again only the a1's pass through. Now we have:

What you describe in qualitative terms, both here and elsewhere, about the probability of symbols in our symbol set is correct, but your math is nonsense. Check out Shannon's original paper, or reread your own link, especially the section Information, Entropy, and Uncertainty. You won't see anything like the games you're playing with your numerator.

Determining the necessary channel width in bits for a symbol set of unequal probability is not a straightforward operation. It certainly isn't equal to the log base 2 of the probability of the particular symbol just sent. You may be doing this because you're confusing the information actually transmitted (e.g., alleles inherited by an offspring) with communication channel capacity. You are correct that the minimum number of bits is much smaller for a very likely allele, but it is *not* the same thing as the minimum channel width necessary to transmit any allele of the set. Concerning unequal probabilities of symbols, this portion from your link is applicable:
Shannon and R.M. Fano independently developed an efficient encoding method, known as the Shannon-Fano technique, in which codeword length increases with decreasing probability of the source symbol or source word. The basic idea is that frequently used symbols (like the letter E) should be coded shorter than infrequently used symbols (like the letter Z) to make best use of the channel, unlike ASCII coding, where both E and Z require the same seven bits.

Obviously you know this already, but note that heredity doesn't really work this way, at least not at the simple level at which we should initially approach it. Just because a particular allele overwhelmingly dominates the rest and is 99% likely has no impact on the code length necessary to represent that allele. The gene of an individual must be capable of representing the most unlikely allele in the population. The gene of an individual is a static record, not a symbol in a message stream.

We cannot gain the efficiencies of special codings because they cannot be applied across generations: the only way you could completely decode the allele of an organism would be to know the coding for the alleles of its ancestors back one or more generations. Coding is a science in itself - many is the time that Shannon's equations have said that some symbol set with some probabilities can be represented in some incredibly small number of bits, but the equations don't tell you the encoding, only that it's possible, and there's the catch. The gene location doesn't change its size in each individual organism based on the likelihood of the inherited allele (in reality I wouldn't put it past nature to contain examples of doing just this, but we want to stick with a simple model for now).
So we can only conclude that the most reasonable measure of information in a population of organisms is the log base 2 of the number of alleles for each gene, independent of the probability of those alleles because our channel capacity is fixed, at least for the scenarios we are considering at this time. --Percy
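The contrast between average code length and fixed channel width can be sketched as follows (Python; the skewed probabilities are hypothetical, chosen to echo the 99% example above):

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)): the *average* bits per
    symbol achievable by a variable-length (e.g. Shannon-Fano) code."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# One allele at 99%, four rare ones sharing the remaining 1%:
skewed = [0.99, 0.0025, 0.0025, 0.0025, 0.0025]
avg_bits = entropy(skewed)            # well under 1 bit on average

# But the gene locus is a static record, so it must still be wide
# enough to represent any of the 5 alleles:
fixed_bits = math.log2(len(skewed))   # ~2.32 bits
```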
Percy | Member | Posts: 22506 | From: New Hampshire | Member Rating: 5.4
Saviourmachine writes: Then the number of alleles increases from 2 to 3. And 'information' is added. I doubt that, I doubt the use of Shannon's definition of information in this context.

That's your rebuttal - you doubt it?
Saviourmachine writes: So, if you want to disprove this, you've to look at the kind of information for at least the analogy above. Not some pretty simple thing like Shannon's definition.

If you really think Shannon's definition is simple, or that it is insufficiently nuanced to represent genetic information, then I suggest you read his paper. Your approach in Message 21 of that other thread lacks rigour and has a confused presentation, and I'm not convinced that time and energy invested in its decipherment would be repaid.

--Percy
Percy | Member | Posts: 22506 | From: New Hampshire | Member Rating: 5.4
Saviourmachine writes: Now you're talking about genetic information. Doesn't it matter how the genes are decoded? In your claim you were talking about information in general.

I said that each allele of a gene in a population of organisms is a symbol of the symbol set, and the symbol set size a is equal to the number of unique alleles for that gene in the population. The information measure for that gene is I=log2(a). That seems fairly specific.

--Percy
Percy | Member | Posts: 22506 | From: New Hampshire | Member Rating: 5.4
Hi, MPW!
Saviourmachine's messages prompted another thought. Shannon information isn't really a measure of information, but only a measure of the minimum bits required to represent or transmit members of a symbol set. You tend toward examples where the set members have unequal probabilities, and this can yield very small values for I relative to symbol set size. But allele frequency in a population changes over time, while the genetic code for the alleles of a gene is fixed, so we must pessimistically assume equal probabilities for all set members, which always yields the largest values for I. We must also keep in mind that the Shannon equations yield a minimum possible value. Genetic codes are nowhere near so dense, though some of the redundancy probably contributes to error tolerance.

--Percy
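The "pessimistic assumption" in this last post can be checked directly: for a fixed set size, equal probabilities maximize the Shannon measure. A minimal sketch in Python, with made-up probabilities:

```python
import math

def entropy(probs):
    # Shannon entropy in bits: -sum(p * log2(p)).
    return -sum(p * math.log2(p) for p in probs if p > 0)

# For 4 symbols, the uniform distribution gives the largest value,
# equal to log2(4) = 2 bits; any skew gives less.
uniform_bits = entropy([0.25] * 4)
skewed_bits = entropy([0.7, 0.1, 0.1, 0.1])
```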
Copyright 2001-2023 by EvC Forum, All Rights Reserved