Topic: On the proportion of Nucleotides in the Genome and what it can tell us about Evolutio
slevesque Member (Idle past 4667 days) Posts: 1456 Joined: |
I love giving long 'old-School' titles
Anyhow, as I hope everyone that will join this discussion knows, statistics is a very interesting subject. One of the properties of a random sequence is that, given enough repetitions, it will always tend to go towards a certain %. I may not be clear, so here is an example: If I flip a coin for a very long time, the amount of heads and tails I should register should be close to 50% each.

Now, if we start with a simplistic model of mutations where they are totally random and apply this fact, then given enough time and mutations, the % of each nucleotide in the genome should tend towards 25%. Natural selection, of course, gives no advantage to either of the nucleotides and therefore should not influence this ratio.

I have searched for data on this ratio in different animals and I have found next to nothing in terms of actual numbers. From my biology book:
- In humans: A = 30.3%, T = 30.3%, G = 19.5%, C = 19.9%
- In E. coli: A = 26% (and so probably T = 26%, but it is not explicitly said)

I intend this discussion to be more of a learning exercise for me, and so it is NOT a debate. Please try to keep this in mind, everyone. I'll start off with a couple of questions:
1- Can mutations be approximated as totally random, and if not, can/does this change the expected proportions of the nucleotides in the genome?
2- Does anyone have more information on such proportions from other species? The more information we have, the more interesting this will be.
3- Has such an analysis of proportions been done and published in the past?
4- In the hypothetical situation that there turns out to be a common trend in the different species to favor the A and T bases, what could be the possible explanations for this, from an evolutionary perspective?

I consider that if there were a natural mechanism that significantly favors some nucleotides over others, then this would be a powerful mechanism analogous to natural selection in a certain way, and that its discovery would merit the Nobel Prize ... haha

Hopefully, with all the brilliant minds on this forum who have worked in biology and genetics all their lives, I will be able to obtain more information on this idea I had while eating my cereal 2 weeks ago.

PS I keep the copyright on this idea and anything that stems from it lol
|
|||||||||||||||||||||||||||||
AdminNosy Administrator Posts: 4754 From: Vancouver, BC, Canada Joined: |
Thread copied here from the On the proportion of Nucleotides in the Genome and what it can tell us about Evolutio thread in the Proposed New Topics forum.
|
|||||||||||||||||||||||||||||
New Cat's Eye Inactive Member |
"Natural selection, of course, gives no advantage to either of the nucleotides and therefore should not influence this ratio."
Why not?
"statistics is a very interesting subject."
barf
|
|||||||||||||||||||||||||||||
Wounded King Member Posts: 4149 From: Cincinnati, Ohio, USA Joined: |
As CS notes, it is a massive assumption to suggest that Natural Selection has not played an active part in forming this distribution.
That aside, there are other factors which affect these ratios. One major factor is that methylated cytosines, specifically 5-methylcytosine, can be readily converted to thymine by deamination. This is one of the most common single-nucleotide substitutions observed. This alone could account for a large proportion of the discrepancy.

TTFN,
WK
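As a rough illustration of why a biased mutation (such as C turning into T more often than the reverse) would move the expected composition away from an even split, here is a minimal sketch. It assumes a deliberately simplified two-state model in which every site is either a G/C site or an A/T site, with a per-generation probability `u` of GC becoming AT and `v` of AT becoming GC. The function names and the rates are hypothetical, chosen only for illustration, not measured values.

```python
def equilibrium_gc(u: float, v: float) -> float:
    """Stationary GC fraction of the two-state model.

    u: assumed per-generation probability that a GC site becomes AT
    v: assumed per-generation probability that an AT site becomes GC
    """
    return v / (u + v)


def iterate_gc(f0: float, u: float, v: float, steps: int) -> float:
    """Follow the expected GC fraction forward in time from a starting value f0."""
    f = f0
    for _ in range(steps):
        # GC sites that stay GC, plus AT sites that mutate to GC
        f = f * (1 - u) + (1 - f) * v
    return f


if __name__ == "__main__":
    u, v = 2e-3, 1e-3  # hypothetical rates: GC->AT twice as likely as AT->GC
    print("predicted equilibrium GC fraction:", equilibrium_gc(u, v))
    print("GC fraction after 10000 steps from 0.5:", iterate_gc(0.5, u, v, 10_000))
```

In this toy model the composition settles at v / (u + v) rather than at 50%, so any asymmetry between the two rates is enough to produce an AT-rich or GC-rich genome without selection being involved at all.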
|
|||||||||||||||||||||||||||||
slevesque Member (Idle past 4667 days) Posts: 1456 Joined: |
I start with the supposition that each nucleotide has the same amount of information, and also that none of them is statistically favored to be a beneficial mutation. Therefore, natural selection, which favors beneficial mutations, has an equal amount of chance to favor one nucleotide or another, and so its effect on the long-term proportion of each in the genome is zero.
Of course, if we were to discover that natural selection favors some nucleotides over others it would probably also merit a Nobel prize ...
|
|||||||||||||||||||||||||||||
slevesque Member (Idle past 4667 days) Posts: 1456 Joined: |
Very interesting, any numbers attached to this ?
Also you seem to be the one most knowledgeable on these subjects on EvC, any chance you have other proportions from other species ?
|
|||||||||||||||||||||||||||||
Wounded King Member Posts: 4149 From: Cincinnati, Ohio, USA Joined: |
You seem to be being obtuse. Your approach would also presumably predict that the frequency of use of all amino acids should also be equal and unaffected by natural selection.
You can't uncouple the fact that functionality is determined by amino acid sequence from the DNA triplets that encode those amino acid sequences. If particular functional domains or structures are persistently co-opted to form other proteins, then that will affect the proportions of nucleotides, depending on the exact amino acid sequence involved. It is worth noting, though, that in vertebrates coding sequences actually have a higher GC content than the genomic background, so natural selection appears to be acting to maintain GC sequence that might otherwise be lost.

TTFN,
WK
|
|||||||||||||||||||||||||||||
New Cat's Eye Inactive Member |
"I start with the supposition that each nucleotide has the same amount of information,"
How much?
"and also that none of them is statistically favored to be a beneficial mutation. Therefore, natural selection, which favors beneficial mutations, has an equal amount of chance to favor one nucleotide or another,"
None of them being statistically favored doesn't necessitate that NS has an equal chance to favor one nucleotide or another. Beneficial mutations are going to be made of different nucleotide sequences, which is where we get the non-random "connection". We can't assume that the nucleotide sequences themselves are not going to have an effect on how beneficial a mutation may be.
"Of course, if we were to discover that natural selection favors some nucleotides over others it would probably also merit a Nobel prize ..."
It seems obvious that it does. A prize should be offered to the ones who find out how that 'connection' can be predicted.
|
|||||||||||||||||||||||||||||
Wounded King Member Posts: 4149 From: Cincinnati, Ohio, USA Joined: |
I don't know other proportions off the top of my head. You can presumably find these out for any organism which has had its genome sequenced, and there are probably rough estimates for many other species.
I'll see if I can find out some more tomorrow, but bed calls for now. In the meantime, doing a search for terms like 'GC content' on Entrez/PubMed could turn some interesting stuff up. Another important consideration is that a lot of the genome is made up of highly repetitive sequences, so the composition of those repeated elements could heavily influence the makeup of the whole genome.

TTFN,
WK
|
|||||||||||||||||||||||||||||
Dr Adequate Member (Idle past 311 days) Posts: 16113 Joined: |
"Another important consideration is that a lot of the genome is made up of highly repetitive sequences, so the composition of those repeated elements could heavily influence the makeup of the whole genome."
I was thinking that, but I couldn't find anything that tells me what the repeated sequences are.
|
|||||||||||||||||||||||||||||
aboutandy Junior Member (Idle past 5333 days) Posts: 1 Joined: |
I don't think that the proportion of nucleotide bases has as much to do with natural selection as the actual sequences themselves. DNA is transcribed into RNA, which then codes for protein. Which amino acid is made at each position is determined by a three-nucleotide sequence called a codon. There are only 20 common amino acids but there are 64 different codons, so there is redundancy in the genetic code. A mutation can be caused by a single nucleotide being deleted, added or changed to another nucleotide. For example, you may have a DNA sequence like...
TACAAAGCGTTGAAACGCCGG. When split into triplets you get TAC-AAA-GCG-TTG-AAA-CGC-CGG. These triplets will determine what amino acids will be made. If we delete a nucleotide, TACAAAGC-TTGAAACGCCGG, then the triplets will be TAC-AAA-GCT-TGA-AAC-GCC-GG. This is a different sequence of codons, and the new codons may code for different amino acids, or they may not, because almost all of the amino acids have more than one codon. Something similar can happen if a nucleotide is added, or if a single nucleotide is changed (although a substitution alters only one codon rather than shifting every triplet after it).

If the mutation causes a change in the protein function, then I would imagine it could impact natural selection. But I'm not sure whether you can learn much about the organism and how it evolves just by looking at the base proportions.

There are some interesting things, though. You guys talked some about G and C. I assume you all know that in DNA, A pairs with T and G pairs with C. Well, what is cool is that G and C have a stronger bond than A and T. So if you were to look at a bacterium that lives in thermal vents, I would expect you to find that its GC content was higher than its AT content, since G-C bonds are more stable.

And there are a lot of repeated sequences in the genome. I forget what they mean, but I remember learning that you can find CG repeating regions before some genes. Wounded King said that cytosines are often methylated, and it's in these regions where a lot of that methylation occurs. Methylation actually causes genes to be inactivated, though.
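The deletion example above can be reproduced with a short sketch. The helper names `codons` and `delete_base` are made up for this illustration; the sequence and the deleted position (the ninth base, a G) are the ones from the example in this post.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets; the last piece may be shorter."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def delete_base(seq: str, index: int) -> str:
    """Return the sequence with the base at the given 0-based index removed."""
    return seq[:index] + seq[index + 1:]


if __name__ == "__main__":
    original = "TACAAAGCGTTGAAACGCCGG"
    shifted = delete_base(original, 8)  # remove the ninth base (G)
    print("-".join(codons(original)))
    print("-".join(codons(shifted)))
```

Every triplet after the deleted base changes, which is why a single-base deletion can have a much larger effect on the protein than its effect on the overall base proportions would suggest.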
|
|||||||||||||||||||||||||||||
Dr Adequate Member (Idle past 311 days) Posts: 16113 Joined: |
"2- Does anyone have more information on such proportions from other species? The more information we have, the more interesting this will be."
I found something that might interest you. Here's a table of 899 bacterial species. The figures in the column on the far right of the table are the AT content of the bacteria, given as a percentage. This ranges from 83.4% to 25.1%. And here are 67 species of archaea. The AT content ranges from 72.4% to 34.1%. They don't seem to have eukaryotes, which is a shame.
|
|||||||||||||||||||||||||||||
Dr Adequate Member (Idle past 311 days) Posts: 16113 Joined: |
"Anyhow, as I hope everyone that will join this discussion knows, statistics is a very interesting subject. One of the properties of a random sequence is that, given enough repetitions, it will always tend to go towards a certain %. I may not be clear, so here is an example: If I flip a coin for a very long time, the amount of heads and tails I should register should be close to 50% each. Now, if we start with a simplistic model of mutations where they are totally random and apply this fact, then given enough time and mutations, the % of each nucleotide in the genome should tend towards 25%."
No, wait a moment. Given your assumptions about mutation and selection (which, as WK and others have pointed out, might be incorrect) you're still doing the statistics wrong, because what you're describing is a random walk.
Think about coin-tossing again. It is true that if you keep tossing the coin, the ratio of heads divided by tails will tend to 1. But it is not true that the difference of heads minus tails will tend to 0. It doesn't particularly tend to anything.
In your selectively-neutral model, then, the AT content will simply go on a random walk as time goes by. We expect its average value over time to be 50%, but we have no grounds for thinking that it will have this value at any particular time (such as now), still less that it will tend to this value, or indeed to any value.
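The coin-tossing distinction can be checked with a small simulation. This is only a sketch: the function name `toss`, the fixed seed and the sample sizes are arbitrary choices made for the illustration.

```python
import random


def toss(n: int, seed: int = 0) -> tuple[int, int]:
    """Toss a fair coin n times; return (heads, tails)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads, n - heads


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        heads, tails = toss(n)
        # The ratio is a relative measure; the difference is an absolute one.
        print(f"n={n}: heads/tails={heads / tails:.4f}, heads-tails={heads - tails}")
```

With more tosses the ratio is expected to settle close to 1, while the difference is under no such pressure to shrink and typically wanders further from 0 in absolute terms.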
|
|||||||||||||||||||||||||||||
Peepul Member (Idle past 5045 days) Posts: 206 Joined: |
I think it does, Dr A: the expected excess of heads or tails is on the order of the square root of n, where n is the number of tosses.
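For reference, the square-root behaviour can be written out as follows, assuming a fair coin and independent tosses. The symbols X_i and D_n are introduced here only for this note: X_i is +1 for heads and -1 for tails on toss i, and D_n is heads minus tails after n tosses.

```latex
D_n = \sum_{i=1}^{n} X_i, \qquad
\mathbb{E}[D_n] = 0, \qquad
\operatorname{Var}(D_n) = \mathbb{E}[D_n^2] = n,
\qquad
\sqrt{\mathbb{E}[D_n^2]} = \sqrt{n}, \qquad
\mathbb{E}\,|D_n| \approx \sqrt{\frac{2n}{\pi}} \quad (n \text{ large}).
```

So the typical size of the excess grows like the square root of n, while the excess as a fraction of all tosses, D_n / n, shrinks like 1 over the square root of n.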
|
|||||||||||||||||||||||||||||
Dr Jack Member Posts: 3514 From: Immigrant in the land of Deutsch Joined: Member Rating: 8.3 |
You'll find G-C figures easier to get hold of, probably, because they're the ones traditionally used. G-C %ages were long used as a means of classifying microbial species, so there's a lot out there about them. They're increasingly being replaced by more sophisticated phylogenies based on actual sequences.
As others have noted, your assumption that the %ages are not adaptive is simply false. A-T pairs are held together by two hydrogen bonds, G-C pairs by three; this means that A-T pairs are easier to separate, and G-C pairs are harder. So, for example, the origins of replication on genomes (of which there are many on eukaryotic chromosomes and some archaeal chromosomes, and one on bacterial and most archaeal chromosomes) have a very high proportion of A-T pairs because these can be easily separated.
Thermophilic bacteria have higher levels of G-C pairings than their mesophilic relatives (that is, bacteria adapted to higher temperatures have more of the triple-bonded G-C pairs, which is thought to aid DNA stability). However, among archaea (the true kings of high-temperature living), this link between temperature and G-C content doesn't hold. Species such as Pyrolobus fumarii (which can survive at up to 113 °C and grows best at 106 °C) use protein and enzyme chaperones to maintain DNA stability instead.
Finally, there are also specific sequences that perform non-coding functions that can be higher or lower in the two, e.g. the TATA box, which indicates gene starts.
So, you see, the picture is much more complicated than a simple random variation between the letters.
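For anyone who wants to compute G-C percentages from sequence data themselves, a minimal sketch follows. The function name `gc_content` is an assumption of this example, and the test sequence is the short one quoted earlier in the thread, not real genome data.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the A/C/G/T characters of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]  # ignore gaps, N, whitespace
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)


if __name__ == "__main__":
    example = "TACAAAGCGTTGAAACGCCGG"  # short sequence from earlier in the thread
    print(f"GC content: {gc_content(example):.1f}%")
```

On a whole genome the same count would be run over each chromosome or contig, and the AT content is simply 100 minus the result.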
|
Copyright 2001-2023 by EvC Forum, All Rights Reserved