Topic: Introduction to Information
DNAunion Inactive Member |
....."Hey, guess what?"
What, you don’t know? Of course you don’t (at least not if this is the first time you’ve read this). You are completely uncertain about what I am going to tell you because I have given you no information at all. In other words, your uncertainty is maximum and the information you have is minimum.

....."Hey, guess what my dog just did?"

Ah, now your uncertainty is not as great. Now you know that I wasn’t going to say something about our solar system, or about earthquakes, or about cars, or about a chair, or about a slew of other things. Your uncertainty has decreased because I have given you more information this time: it’s about my dog. Still, you have little idea exactly what I will say.

....."Hey, my dog just shook hands for the first time!"

Now you know exactly what I was going to say. Your uncertainty is now minimum and the information you have is maximum.

If you noticed, uncertainty decreases as information increases. The opposite holds true: if your uncertainty increases, the amount of information you have must decrease. Therefore, information can be defined as a reduction in uncertainty.

Let’s see another example of uncertainty decreasing as information increases, this time keeping track of the amount of uncertainty (and therefore, information) we have at each step.

Suppose I shuffle a normal deck of cards and you pick a card at random. How many yes/no questions must I ask to find out what card you have picked? 52? 26? You might be surprised.

Let’s suppose you picked the ace of spades (it really doesn’t matter what card you picked: all cards require at most the same number of questions). Before I ask the first question I have complete uncertainty and no information: your card could be any of 52.

Hearts: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A
Diamonds: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A
Clubs: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A
Spades: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A

So the process of eliminating possibilities begins.
Question 1: Is your card a red card? Because you have the ace of spades, your answer is, No, which eliminates all red cards from consideration.

Clubs: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A
Spades: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A

Instead of your card being any one of 52, it is now any one of 26: a reduction in uncertainty due to an increase in information.

Question 2: Is the suit of your card clubs? Your answer is, No, which eliminates all clubs from consideration.

Spades: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A

Now, I know the suit of your card: it must be spades. And, instead of your card being any one of 26, it is now any one of just 13.

Question 3: Is your card greater than 7? Your answer is, Yes, which eliminates all spades from 2 to 7 from consideration.

Spades: 8, 9, 10, J, Q, K, A

Now, instead of your card being any one of 13, there are only 7 possibilities.

Question 4: Is your card greater than 10? Your answer is, Yes, which eliminates the 8, 9, and 10 of spades.

Spades: J, Q, K, A

Now, instead of your card being any one of 7, it is any one of just 4.

Question 5: Is your card greater than a Queen? Your answer is, Yes, which eliminates the Jack and Queen of spades.

Spades: K, A

This leaves only two possibilities: your card is either the King of spades or the Ace of spades.

Question 6: Is your card the King of spades? Your answer is, No, which eliminates the King.

Spades: A

I now know for sure that your card is the Ace of spades. At this point, after just 6 questions, I have no uncertainty whatsoever and complete information.

That’s it: any card can be determined with a maximum of just 6 yes/no questions, if those questions are chosen correctly. The idea here, and in other information gathering processes, is to eliminate half of the remaining possibilities with each question.
That gives you the maximum amount of information, or equivalently, reduces your uncertainty as much as possible, for each question and therefore allows you to zero in on the solution as quickly as possible. For example, randomly guessing cards, Is it the 2 of clubs?, wouldn’t guarantee success until the 51st question, since 51 misses leave only one card (and that is only if you kept track of all your guesses to avoid repeating any).

Measuring Information: Bits

It was stated above that the most information is obtained per question by devising questions that cut the number of remaining possibilities in half. If we use some math, we can see why it takes at most six questions to figure out which card is selected.

If there are two possibilities, such as red or black, then a single question can determine which one of those two possibilities is the case.

1 out of 2 possibilities = 1 question

If there are four possibilities, such as hearts, diamonds, clubs, or spades, then two questions can determine which one is the case.

1 out of 4 possibilities = 2 questions

This is because the first question cuts it down from 1 out of 4 to 1 out of 2, and then the second question narrows it down from there to just one. The same logic applies to larger numbers. For 32 possibilities:

1 out of 32 possibilities = 5 questions

The first question cuts it down from 1 out of 32 to 1 out of 16; the second question cuts it down from 1 out of 16 to 1 out of 8; the third question cuts it down from 1 out of 8 to 1 out of 4; and as we saw above, two more questions will narrow it down to just 1.

There is a mathematical relationship between the number of halving questions needed and the number of beginning possibilities. In simple form, all we do is multiply 2 times itself as many times as needed to end up with a number that is at least as big as the number of possibilities: each 2 used in the multiplication represents one question.
2 = 2
2 x 2 = 4
2 x 2 x 2 = 8
2 x 2 x 2 x 2 = 16
2 x 2 x 2 x 2 x 2 = 32
2 x 2 x 2 x 2 x 2 x 2 = 64

So, as the last equation above indicates, if one out of 64 possibilities is selected, you need a maximum of 6 questions to find which one was chosen. And that is why it takes a maximum of 6 questions to figure out what card was selected: 52 is more than 32 (so 5 questions won’t do), but is less than 64 (so 6 questions will do). If you could somehow ask partial questions, then it wouldn’t take a full 6 questions to figure out a card; but you can’t, so it does.

When gaining information, each division of possibilities in half corresponds to one bit of information (the term bit here being a technical term referring to base two or log2 numbers, not merely a small piece). In other words, each halving question answered cuts the number of remaining possibilities in half and consequently gives you one additional bit of information.

Note that a more mathematical way of expressing the above method of calculating how many bits of information (halving questions) are required to figure something out is as follows:

I = -log2(1/N)

where I represents information (in bits) and N represents the number of equally likely possibilities. If partial questions (that is, fractional bits) are not allowed under the given circumstances, you need to round the calculated value for I up to the nearest whole number.
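The halving strategy and the question count can be sketched in a few lines of Python. This is only an illustration, not part of the original post: the names `DECK`, `questions_needed`, and `find_card` are made up here, and each question is simplified to "is your card in the first half of the remaining list?" rather than the suit and rank questions used in the walkthrough.

```python
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
DECK = [(rank, suit) for suit in SUITS for rank in RANKS]  # 52 cards


def questions_needed(n):
    """Whole yes/no questions needed for n equally likely possibilities.

    Same value as rounding -log2(1/n) up, computed with integers only.
    """
    return (n - 1).bit_length()


def find_card(secret, deck=DECK):
    """Find `secret` by halving the candidates; return (card, questions asked)."""
    candidates = list(deck)
    asked = 0
    while len(candidates) > 1:
        half = len(candidates) // 2
        first_half = candidates[:half]
        asked += 1  # one yes/no question: "is it in the first half?"
        if secret in first_half:
            candidates = first_half
        else:
            candidates = candidates[half:]
    return candidates[0], asked


print(questions_needed(52))           # 6
print(find_card(("A", "Spades")))     # (('A', 'Spades'), 6)
print(max(find_card(card)[1] for card in DECK))  # worst case over the deck: 6
```

Some cards happen to be found in 5 questions with this particular split, but no card ever needs more than 6, which matches the "at most 6" claim.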
DNAunion Inactive Member |
I gave you the tools to do that yourself.
You just need to estimate how many equally probable possibilities there are in the Universal set and how many of those remain after each statement is given. Of course, these will be ballpark figures only.

***********************************************

Shoot, I'll give a rough idea of how it could be done.

....."Hey, guess what?"

It could be any of a trillion different things. Let's assume that's a good number, thus setting the cardinality of the Universal set to 10^12. At this point, with nothing narrowed down yet, if I were to give you the one thing I was talking about I would be handing you I = -log2(1/10^12) bits of information. That comes to about 40 bits of info. But this one statement gives you 0 bits as it eliminates no uncertainty.

....."Hey, guess what my dog just did?"

From the Universal set, we could assume that maybe 1,000 things could relate to my dog (these are ballpark figures). Thus, after this more descriptive, more informative statement, if I were to now give you the one thing I was talking about I would be handing you just I = -log2(1/10^3) bits of information: about 10 bits of info. Therefore, the statement "Hey, guess what my dog just did?" would contain the amount of information associated with the difference, that is, associated with the reduction in uncertainty. Using the previous ballpark figures:

I = 40 bits - 10 bits = 30 bits

The final statement, which actually does give the one thing I was going to say, eliminating all possibilities but that one, gives an additional 10 bits of information.

So under the numerical assumptions used to get these ballpark figures, the statements would give 0, 30, and 10 bits of information respectively. [This message has been edited by DNAunion, 12-13-2003]
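For anyone who wants to redo the ballpark arithmetic, here is a minimal Python sketch. The counts 10^12 and 10^3 are the assumed figures from the post above, not measured values, and the function name `information_gained` is made up for this example.

```python
import math


def information_gained(before, after):
    """Bits gained when `before` equally likely possibilities shrink to `after`."""
    return math.log2(before / after)


UNIVERSAL = 10**12   # assumed size of the Universal set (ballpark)
DOG_RELATED = 10**3  # assumed number of dog-related possibilities (ballpark)

steps = [
    ("Hey, guess what?", UNIVERSAL, UNIVERSAL),
    ("Hey, guess what my dog just did?", UNIVERSAL, DOG_RELATED),
    ("Hey, my dog just shook hands for the first time!", DOG_RELATED, 1),
]

gains = [information_gained(before, after) for _, before, after in steps]
for (statement, _, _), gain in zip(steps, gains):
    print(f"{gain:5.1f} bits  {statement}")
```

The unrounded values come out slightly under 30 and 10 bits, which is why the figures in the post are rounded. Changing the two assumed counts changes the numbers but not the pattern: each statement that removes possibilities adds information.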
DNAunion Inactive Member |
quote: That's a flawed "counter". The person asked me a question for which accurate numbers cannot be assigned. Then, when I assumed certain values to demonstrate the method you "gripe" that I had to make assumptions, implying that therefore the methodology is flawed. Warped logic. (What's the probability that the Earth will be hit by a mile-wide asteroid within the next 5 years? Oh, you can't give me an accurate number so probability theory is invalid???? See how silly that kind of reasoning is). I CAN give accurate information calculations for things to which accurate numbers can be applied. For example, the main demonstrative example I gave involving cards.
quote: Because of the specific question posed to me, which cannot have accurate numbers assigned. It's kind of the old GIGO - if the input into a calculation cannot have accurate numbers assigned, then one shouldn't be at all surprised to learn that the output is not an accurate number. That doesn't mean the processing is invalid.
quote: Which is not what my post was about in the first place. It was about information: an introduction to what it is and how it can be calculated.
quote: Wrong. 1) "My" method of calculating information is correct (a bit simplified, but correct) 2) The specific question someone then asked me about cannot have accurate numbers assigned so assumptions were used. That does not make the method flawed. 3) I made no argument that could be considered circular. You've read more into my statements than were there, then attacked your own interpretation.
quote: Sure there is, but until I can get you to see that the method I used to calculate information is valid, there's no point going off on other tangents that rely upon this foundation. [This message has been edited by DNAunion, 12-14-2003]
DNAunion Inactive Member |
Here's a calculation of information that uses accurate numbers, based on the primary example of calculating information that I gave originally.
quote: To start with, before any cards were eliminated, the one card chosen could have been any one of 52 equally likely possibilities. Thus, if you had told me the card you selected at this point you would have handed me 5.700439718141 bits of information.

I = -log2(1/52) = 5.700439718141 bits of information

The answer to question 1 eliminated half of the possibilities, giving me 1 bit of information according to what I said about halving uncertainties. That checks out. I’ll show you. After question 1 had been answered, the one card you selected could be any one of just 26; so if you were then to tell me what card you chose you would be handing me 4.700439718141 bits of information.

I = -log2(1/26) = 4.700439718141 bits of information

So that halving question dropped the number of bits of information needed to determine the one card selected from its starting value of 5.700439718141 to 4.700439718141. That is, the answer to that question gave me exactly 1 bit of information, which matches what I said above. [This message has been edited by DNAunion, 12-14-2003]
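The same check can be run for all six questions of the card example. The sketch below is illustrative only: the candidate counts 52, 26, 13, 7, 4, 2, 1 are taken from the original walkthrough, and the function name `bits_to_identify` is made up here.

```python
import math


def bits_to_identify(n):
    """I = -log2(1/N): bits needed to single out one of n equally likely cards."""
    return -math.log2(1 / n)


# Candidates remaining before question 1 and after each of the six answers.
remaining = [52, 26, 13, 7, 4, 2, 1]

gains = []
for before, after in zip(remaining, remaining[1:]):
    gain = bits_to_identify(before) - bits_to_identify(after)
    gains.append(gain)
    print(f"{before:2d} -> {after:2d}: {gain:.3f} bits")

print(f"total: {sum(gains):.12f} bits")
```

Questions 3 and 4 cannot split 13 and 7 cards exactly in half, so each yields a little less than 1 bit. The six answers together still add up to -log2(1/52), about 5.70 bits, which is why 6 whole questions are needed even though fewer than 6 bits are gained.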
DNAunion Inactive Member |
For those who want a link to a primer on information...
http://www.lecb.ncifcrf.gov/...paper/primer/latex/index.html [This message has been edited by DNAunion, 12-14-2003]
DNAunion Inactive Member |
quote: quote: It’s NOT a bad example of my methodology. It shows quite clearly the link between a reduction in uncertainty and an increase in information; it does exactly what it was intended to do — introduce the main ideas in a very simple way. Then, as the MAIN example, I dealt with cards and probability, which involved more math and thus was less suited for use as an opening example.
quote: It’s not a bad example; the person who asked the question chose the wrong example of two to ask about in terms of calculating information (at least if one wants numbers other than ballpark figures based on assumptions). The one with the dog shaking hands merely introduces, in a simple manner that even a child could grasp, the link between reduction in uncertainty and gain of information; that’s it. When I discussed calculating information in my original post, I used the other example that deals with cards and probability, the MAIN example I discussed. If one wanted to see accurate numbers calculated the OBVIOUS choice would have been to ask about the cards example.
quote: No. First, it’s not my methodology. Second, it works even when accurate numbers cannot be assigned: it just doesn’t work quantitatively.
quote: It’s not my methodology. And I assumed the person who selected the wrong example of two to ask about did so for a reason. I assumed the idea was to see if the method could be applied to a wide range of situations, instead of to just those that involve obvious numerical values. It can be, but the resulting numeric values can’t be expected to be accurate since the input numbers cannot be accurately assigned.
quote: Wrong, information theory relies upon probability theory.
quote: quote: To reiterate...it’s not my information theory, and it is valid.
quote: It IS a valid example of increasing information linked to a reduction in uncertainty. But I never claimed or implied that it was a good example of how information can be measured quantitatively, and in fact I repeatedly qualified my statements with words such as ballpark and assume, and, also used a different example when discussing calculating information.
quote: It’s not circular at all. It doesn’t matter what reasonable numbers you use for that example. When you know nothing, your uncertainty is maximum and your information is minimum. When I reduce the number of possibilities by giving you information — telling you that it relates specifically to my dog — I have reduced your uncertainty. ANY legitimate numbers you assign to this example will show that uncertainty drops and information rises. Again, this is NOT my methodology: this is straight-forward (though simplified) information theory.
DNAunion Inactive Member |
quote: I think you need to explain exactly how that link shows this very thing, as opposed to it being a case of your interpreting the statements the way you want. Getting off track a bit, but...an organism’s DNA does contain information, stored in the sequence of bases (analogous to the way that these sentences store information in the sequence of characters). This cellular information is encoded instructions used by the cells to direct the synthesis of their constituents and to maintain themselves. That information, stored in the DNA of countless organisms, is there regardless of whether or not humans exist or read it. If it weren’t there, living cells and organisms could not exist.
DNAunion Inactive Member |
quote: quote: You still haven't explained, and in fact, you seem confused.
quote: No, silly, it is up to YOU to support YOUR statements.
quote: I'm getting the distinct impression YOU don't know what the author is saying. [This message has been edited by DNAunion, 12-15-2003]
DNAunion Inactive Member |
quote: Another assertion you must support.
DNAunion Inactive Member |
quote: quote: Not who, what. I understand your desire to put a conscious spin on everything — WHO instead of WHAT - trying to twist my position into something it isn’t. It’s typical — substituting tricks for logic and evidence. Information doesn’t require consciousness to exist, as I have made clear in numerous of my preceding posts.
quote: KNOW what to do? Again with the conscious-distorting of my statements. Information doesn’t require consciousness to exist, as I have made clear in numerous of my preceding posts.
quote: And why are those specific biologically important molecules made instead of others? For example, why do our cells make phosphoglucoisomerase, triose phosphate isomerase, pyruvate decarboxylase, etc., instead of only random, nonfunctional polymers? Are those enzymes molecules that are found in nature outside of cells? Nope. Yet your cells, and my cells, and CrashFrog’s cells, and Rei’s cells, and NosyNed’s cells (in fact, all humans’ cells) make these specific molecules. How? FROM THE INFORMATION STORED IN CELLS’ DNA USING BASE SEQUENCES. We all have genes that encode the instructions needed for making those specific molecules: the genes specify the exact order that amino acids need to be hooked up in order to end up with phosphoglucoisomerase, triose phosphate isomerase, pyruvate decarboxylase, etc.
quote: Do enzymes select out of a myriad of molecules they encounter just the one(s) they interact with? Yes. And how do they do that? By their specific three-dimensional conformation. And what determines their conformation? Ultimately, the DNA (genes) that encodes them: they supply the information needed for the cell to synthesize those specific enzymes.
quote: I’m not. DNA really contains information. You are the one who seems to have problems with models. You seem to be confusing the use of a model with a belief that only the model is real: that that which is being modeled does not exist. Do you believe that atoms exist? Maybe you don’t: after all, scientists use models to explain them.
DNAunion Inactive Member |
quote: quote: By which I mean exactly what I said. It works, just not quantitatively.
quote: By showing the link between a gain of information and a reduction in uncertainty.
quote: quote: No, I don’t have to show that at all since that is not what I said.
quote: I was correct.
quote: Which isn’t what I said.
quote: Maybe, but it can still tell us that something contains information, even if we can’t accurately measure it. And with DNA, we can actually MEASURE the information contained in some regions. [This message has been edited by DNAunion, 12-15-2003]
DNAunion Inactive Member |
quote: quote: Not at all. I’ll explain to you below.
quote: Which is not what you originally said. Look again.
quote: Where does the section you linked to state that information is not an inherent property of DNA, but rather is a clever device for human understanding? Where does it say that accuracy with regard to information is only for human consumption? [This message has been edited by DNAunion, 12-15-2003]
DNAunion Inactive Member |
quote: One reason is probability. Under undirected, nonbiological conditions, the probability that carbon will bond with 4 other atoms or groups of atoms is very high, but the probability that hemoglobin will form is virtually zero. For hemoglobin to exist - not just one copy, but copy after copy, and in cell after cell, and in human after human, and in generation after generation - there has to be some information that stores the instructions needed to produce it. Can you guess where that information is stored?
DNAunion Inactive Member |
quote: I explained why the information stored in DNA is huge compared to that of carbon bonding with 4 other atoms. Think about the probability of hemoglobin forming again, and that it doesn't occur just one time but rather time after time in cell after cell in human after human in generation after generation, and you should be able to grasp this.

Here, let's do a quick BALLPARK calculation (I gotta get back to work so I am throwing this together quickly). What is the probability that a carbon atom will bond to 4 other atoms or groups of atoms? Well, it basically does this all the time except when double or triple bonds are formed. In a hurry, so let's ASSUME that carbon forms 4 bonds 80% of the time. Using the information calculation I = -log2P(e) we get about 0.322 bits of information.

What about hemoglobin? Well, it's over 500 amino acids long and each of 20 amino acids could be found in any position of a polypeptide thrown together randomly. Let's assume - to be very safe - that any 10 of 20 amino acids could suffice at every position (a HUGE concession) and that hemoglobin is just 500 amino acids long. The probability of spontaneous formation, considering only getting a functional sequence, would be about 1 in 2^500, or about 1 in 10^150.

I = -log2(1/2^500) = 500 bits of information.

That's an astronomically huge difference in information content. And that would be for just ONE gene of thousands in a genome.
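The two ballpark figures can be reproduced with a short Python sketch. Everything numeric here is an assumption carried over from the post (80% for carbon, 10 acceptable amino acids out of 20 at each of 500 positions), and `surprisal_bits` is a made-up name for I = -log2 P(e).

```python
import math


def surprisal_bits(probability):
    """I = -log2 P(e): bits of information for an event with probability P(e)."""
    return -math.log2(probability)


# Assumption from the post: carbon forms 4 bonds about 80% of the time.
carbon_bits = surprisal_bits(0.8)

# Assumptions from the post: 500 positions, any 10 of 20 amino acids acceptable
# at each. Independent positions multiply probabilities, so their bits add.
POSITIONS = 500
hemoglobin_bits = POSITIONS * surprisal_bits(10 / 20)

print(f"carbon:     {carbon_bits:.3f} bits")
print(f"hemoglobin: {hemoglobin_bits:.0f} bits")
print(f"2^500 is about 10^{math.log10(2**POSITIONS):.1f}")
```

Summing the per-position bits gives the same 500 bits as -log2(1/2^500), and the last line shows 2^500 is a little over 10^150, in line with the rounded figure in the post.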
DNAunion Inactive Member |
quote: No, that is not what he is saying. What he appears to be saying in the section you linked to is that people shouldn't confuse a CONSENSUS SEQUENCE - which is an abstraction (model) - with actual BINDING SITES - which are actual sequences. In simple terms, if 100 organisms' sequences for a given binding site are aligned and compared, the consensus sequence is the "average" sequence. That is, each site in the consensus sequence has the nucleotide that occurs most frequently at that site in the 100 samples. But it may be that not one of the 100 organisms actually has that sequence for that binding site. So a consensus sequence is an abstraction - a model. Note that the author does NOT use consensus sequences when calculating the information found in DNA binding sites - he uses actual sequences. I don't see any statement - explicit or implicit - by the author in that section you linked to that indicates information in DNA is only for us humans and doesn't actually exist. [This message has been edited by DNAunion, 12-16-2003]
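To make the consensus idea concrete, here is a small Python sketch. The four aligned sequences are invented for illustration (they are not real binding-site data and are not taken from the linked primer), and the `consensus` function is a hypothetical helper, not the primer author's method.

```python
from collections import Counter


def consensus(aligned_sites):
    """Return the most frequent nucleotide at each column of equal-length sequences.

    Ties are broken by whichever nucleotide is seen first in that column.
    """
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_sites)
    )


# Hypothetical aligned binding sites, one per organism.
sites = ["TATGAT", "TACAAT", "GATAAT", "TATACT"]

result = consensus(sites)
print(result)           # TATAAT
print(result in sites)  # False: no sampled organism has the consensus itself
```

Each column is decided by majority vote, yet the resulting string matches none of the four inputs, which is the sense in which a consensus sequence is a model rather than an actual binding site.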
Copyright 2001-2023 by EvC Forum, All Rights Reserved