Understanding through Discussion

# On the proportion of Nucleotides in the Genome and what it can tell us about Evolution

slevesque
Member (Idle past 2929 days)
Posts: 1456
Joined: 05-14-2009

 Message 1 of 61 (524288) 09-15-2009 5:13 PM

I love giving long 'old-School' titles

Anyhow, as I hope everyone that will join this discussion knows, statistics is a very interesting subject. One of the properties of a random sequence is that, given enough repetitions, it will always tend to go towards a certain %. I may not be clear, so here is an example:

- If I flip a coin for a very long time, the amount of heads and tails I should register should be close to 50% each.

Now, if we start with a simplistic model of mutations where they are totally random and apply this fact, then given enough time and mutations, the % of each nucleotide in the genome should tend towards 25%. Natural selection, of course, gives no advantage to either of the nucleotides and therefore should not influence this ratio.
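
The simplistic model described above can be tried out numerically. What follows is only a toy sketch under stated assumptions: every site is equally likely to be hit, every replacement base is equally likely (so a "mutation" may put back the same base), and there is no selection. The names `mutate_uniformly` and `base_fractions`, the genome length and the mutation count are all made up for the illustration.

```python
import random
from collections import Counter

NUCLEOTIDES = "ACGT"

def mutate_uniformly(genome, n_mutations, rng):
    """Hit random sites and overwrite each with a uniformly chosen base.

    Assumption of the toy model: all sites and all bases are equally likely.
    """
    genome = list(genome)
    for _ in range(n_mutations):
        site = rng.randrange(len(genome))
        genome[site] = rng.choice(NUCLEOTIDES)
    return genome

def base_fractions(genome):
    """Return the fraction of each of A, C, G, T in the genome."""
    counts = Counter(genome)
    return {base: counts[base] / len(genome) for base in NUCLEOTIDES}

rng = random.Random(0)             # fixed seed so the run is repeatable
start = ["A"] * 10_000             # deliberately skewed starting point: all A
end = mutate_uniformly(start, 200_000, rng)
fractions = base_fractions(end)
print(fractions)
```

In this toy setting each fraction ends up near 25%, with small leftover fluctuations. It says nothing about whether real mutation is uniform or whether selection is really neutral with respect to base composition.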

I have searched for data on this ratio in different animals and I have come close to nothing in terms of actual numbers. From my biology book:

- In humans: A = 30.3%, T = 30.3%, G = 19.5%, C = 19.9%

- In E. coli: A = 26% (and so probably T = 26%, but it is not explicitly said).

I intend this discussion to be more of a learning exercise for me, and so it is NOT a debate. Please try to keep this in mind, everyone. I'll start off with a couple of questions:

1- Can mutations be approximated as totally random, and if not, can/does this change the expected proportions of the nucleotides in the genome?

2- Does anyone have more information on such proportions from other species? The more information we have, the more interesting this will be.

3- Has such an analysis of proportions been done and published in the past?

4- In the hypothetical situation that it turns out that there is a common trend in the different species to favor the A and T bases, what could be the possible explanations for this, from an evolutionary perspective?

I figure that if there were a natural mechanism that significantly favors some nucleotides over others, then this would be a powerful mechanism, analogous to natural selection in a certain way, and that its discovery would merit the Nobel Prize ... haha

Hopefully, with all the brilliant minds on this forum who have worked in biology and genetics all their lives, I will be able to obtain more information on this idea I had while eating my cereal 2 weeks ago.

PS I keep the copyright on this idea and anything that stems from it lol

 Replies to this message: Message 3 by New Cat's Eye, posted 09-15-2009 5:44 PM slevesque has responded Message 4 by Wounded King, posted 09-15-2009 5:58 PM slevesque has responded Message 12 by Dr Adequate, posted 09-15-2009 11:15 PM slevesque has not yet responded Message 13 by Dr Adequate, posted 09-15-2009 11:50 PM slevesque has not yet responded Message 15 by Dr Jack, posted 09-16-2009 6:30 AM slevesque has not yet responded

Posts: 4754
Joined: 11-11-2003

 Message 2 of 61 (524293) 09-15-2009 5:40 PM

Thread Copied from Proposed New Topics Forum

New Cat's Eye
Inactive Member

 Message 3 of 61 (524295) 09-15-2009 5:44 PM Reply to: Message 1 by slevesque, 09-15-2009 5:13 PM

 Natural selection, of course, gives no advantage to either of the nucleotides and therefore should not influence this ratio.

Why not?

 statistics is a very interesting subject.

barf

 This message is a reply to: Message 1 by slevesque, posted 09-15-2009 5:13 PM slevesque has responded

 Replies to this message: Message 5 by slevesque, posted 09-15-2009 6:00 PM New Cat's Eye has responded

Wounded King
Member (Idle past 2384 days)
Posts: 4149
From: Edinburgh, Scotland
Joined: 04-09-2003

 Message 4 of 61 (524299) 09-15-2009 5:58 PM Reply to: Message 1 by slevesque, 09-15-2009 5:13 PM

As CS notes, it is a massive assumption to suggest that Natural Selection has not played an active part in forming this distribution.

That aside, there are other factors which affect these ratios. One major factor is the fact that methylated cytosines, specifically 5-methylcytosine, can be readily converted to thymine by deamination. This is one of the most common single nucleotide substitutions observed. This alone could account for a large proportion of the discrepancy.
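
How a one-sided substitution bias of this kind can shift base composition can be illustrated with a toy substitution model. This is a sketch under loud assumptions, not a measured model: `base_rate`, the tenfold `ct_boost` and the function names are invented for the example, and it ignores CpG context, DNA repair and selection. The boosted C→T change is mirrored as G→A because the same event is seen that way on the complementary strand.

```python
BASES = "ACGT"

def transition_matrix(base_rate, ct_boost):
    """Per-step substitution probabilities, indexed as matrix[from][to].

    Every substitution has probability base_rate, except C->T and its
    complementary-strand counterpart G->A, which are multiplied by ct_boost.
    Both numbers are illustrative assumptions.
    """
    rates = {src: {dst: base_rate for dst in BASES if dst != src} for src in BASES}
    rates["C"]["T"] *= ct_boost
    rates["G"]["A"] *= ct_boost
    matrix = {}
    for src in BASES:
        row = dict(rates[src])
        row[src] = 1.0 - sum(rates[src].values())  # probability of no change
        matrix[src] = row
    return matrix

def equilibrium(matrix, steps=20_000):
    """Iterate the base frequencies until they settle (simple power iteration)."""
    freq = {base: 0.25 for base in BASES}
    for _ in range(steps):
        freq = {
            dst: sum(freq[src] * matrix[src][dst] for src in BASES)
            for dst in BASES
        }
    return freq

freq = equilibrium(transition_matrix(base_rate=0.001, ct_boost=10.0))
print({base: round(value, 3) for base, value in freq.items()})
print("AT content:", round(freq["A"] + freq["T"], 3))
```

With these made-up numbers the equilibrium AT content works out to 11/13, roughly 85%, against 50% when `ct_boost` is 1. The point is only that a mutational bias moves the equilibrium away from 25% per base; the real size of the effect is a separate, empirical question.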

TTFN,

WK

 This message is a reply to: Message 1 by slevesque, posted 09-15-2009 5:13 PM slevesque has responded

 Replies to this message: Message 6 by slevesque, posted 09-15-2009 6:04 PM Wounded King has responded

slevesque
Member (Idle past 2929 days)
Posts: 1456
Joined: 05-14-2009

 Message 5 of 61 (524300) 09-15-2009 6:00 PM Reply to: Message 3 by New Cat's Eye, 09-15-2009 5:44 PM

I start with the supposition that each nucleotide has the same amount of information, and also that none of them is statistically favored to be a beneficial mutation. Therefore, natural selection, which favors beneficial mutations, has an equal chance to favor one nucleotide or another, and so its effect on the long-term proportion of each in the genome is zero.

Of course, if we were to discover that natural selection favors some nucleotides over others it would probably also merit a Nobel prize ...

 This message is a reply to: Message 3 by New Cat's Eye, posted 09-15-2009 5:44 PM New Cat's Eye has responded

 Replies to this message: Message 7 by Wounded King, posted 09-15-2009 6:25 PM slevesque has not yet responded Message 8 by New Cat's Eye, posted 09-15-2009 6:47 PM slevesque has not yet responded

slevesque
Member (Idle past 2929 days)
Posts: 1456
Joined: 05-14-2009

 Message 6 of 61 (524301) 09-15-2009 6:04 PM Reply to: Message 4 by Wounded King, 09-15-2009 5:58 PM

Very interesting, any numbers attached to this?

Also, you seem to be the one most knowledgeable on these subjects on EvC; any chance you have proportions from other species?

 This message is a reply to: Message 4 by Wounded King, posted 09-15-2009 5:58 PM Wounded King has responded

 Replies to this message: Message 9 by Wounded King, posted 09-15-2009 7:02 PM slevesque has not yet responded

Wounded King
Member (Idle past 2384 days)
Posts: 4149
From: Edinburgh, Scotland
Joined: 04-09-2003

 Message 7 of 61 (524308) 09-15-2009 6:25 PM Reply to: Message 5 by slevesque, 09-15-2009 6:00 PM

You seem to be being obtuse. Your approach would also presumably predict that the frequency of use of all amino acids should also be equal and unaffected by natural selection.

You can't uncouple the fact that functionality is determined by amino acid sequence from the DNA triplets that encode those amino acid sequences. If particular functional domains or structures are persistently co-opted to form other proteins, then that will affect the proportions of nucleotides, dependent on the exact amino acid sequence involved. It is worth noting, though, that in vertebrates coding sequences actually have a higher GC content than the genomic background, so natural selection appears to be acting to maintain GC sequence that might otherwise be lost.

TTFN,

WK

Edited by Wounded King, : No reason given.

 This message is a reply to: Message 5 by slevesque, posted 09-15-2009 6:00 PM slevesque has not yet responded

New Cat's Eye
Inactive Member

 Message 8 of 61 (524312) 09-15-2009 6:47 PM Reply to: Message 5 by slevesque, 09-15-2009 6:00 PM

 I start with the supposition that each nucleotide has the same amount of information,

How much?

 and also that none of them is statistically favored to be a beneficial mutation. Therefore, natural selection, which favors beneficial mutations, has an equal chance to favor one nucleotide or another,

None of them being statistically favored doesn't necessitate that NS has an equal chance to favor one nucleotide or another. Beneficial mutations are going to be made of different nucleotide sequences, which is where we get the non-random "connection". We can't assume that the nucleotide sequences themselves are not going to have an effect on how beneficial a mutation may be.

 Of course, if we were to discover that natural selection favors some nucleotides over others it would probably also merit a Nobel prize ...

It seems obvious that it does. A prize should be offered to the ones who find out how that 'connection' can be predicted.

 This message is a reply to: Message 5 by slevesque, posted 09-15-2009 6:00 PM slevesque has not yet responded

Wounded King
Member (Idle past 2384 days)
Posts: 4149
From: Edinburgh, Scotland
Joined: 04-09-2003

 Message 9 of 61 (524315) 09-15-2009 7:02 PM Reply to: Message 6 by slevesque, 09-15-2009 6:04 PM

I don't know other proportions off the top of my head. You can presumably find these out for any organism which has had its genome sequenced, and there are probably rough estimates for many other species.

I'll see if I can find out some more tomorrow, but bed calls for now. In the meantime doing a search for terms like 'GC content' on Entrez/pubmed could turn some interesting stuff up.
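
For anyone who gets hold of an actual sequence, the figure being discussed is easy to compute. A minimal sketch, assuming the sequence is a plain string of bases; the function name `gc_content` and the example sequence are invented for illustration, and characters other than A, C, G and T (such as N for unknown bases) are simply left out of the count.

```python
def gc_content(sequence):
    """Return the fraction of G and C among the A, C, G, T characters."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / counted

example = "ATGCGCGTTAAC"  # toy sequence, not real genomic data
print(f"GC content: {gc_content(example):.1%}")
print(f"AT content: {1 - gc_content(example):.1%}")
```

AT content is just one minus GC content, so tables that report one can be converted to the other.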

Another important consideration is that a lot of the genome is made up of highly repetitive sequences, so the composition of those repeated elements could heavily influence the makeup of the whole genome.

TTFN,

WK

Edited by Wounded King, : No reason given.

 This message is a reply to: Message 6 by slevesque, posted 09-15-2009 6:04 PM slevesque has not yet responded

 Replies to this message: Message 10 by Dr Adequate, posted 09-15-2009 8:35 PM Wounded King has not yet responded

Dr Adequate
Member
Posts: 16107
Joined: 07-20-2006
Member Rating: 8.0

 Message 10 of 61 (524322) 09-15-2009 8:35 PM Reply to: Message 9 by Wounded King, 09-15-2009 7:02 PM

 Another important consideration is that a lot of the genome is made up of highly repetitive sequences, so the composition of those repeated elements could heavily influence the makeup of the whole genome.

I was thinking that, but I couldn't find anything that tells me what the repeated sequences are.

 This message is a reply to: Message 9 by Wounded King, posted 09-15-2009 7:02 PM Wounded King has not yet responded

 Replies to this message: Message 11 by aboutandy, posted 09-15-2009 10:01 PM Dr Adequate has not yet responded

aboutandy
Junior Member (Idle past 3595 days)
Posts: 1
Joined: 09-15-2009

 Message 11 of 61 (524326) 09-15-2009 10:01 PM Reply to: Message 10 by Dr Adequate, 09-15-2009 8:35 PM

I don't think that the proportion of nucleotide bases has as much to do with natural selection as the actual sequences themselves. DNA codes for RNA, which then codes for protein. Which amino acid gets made, and when, is determined by the three-nucleotide sequence called the codon. There are only 20 common amino acids but there are 64 different codons, so there is redundancy in the genetic code. A mutation can be caused by a single nucleotide being deleted, added or changed to another nucleotide. For example, you may have a DNA sequence like...

TACAAAGCGTTGAAACGCCGG. When spread out into triplets you get

TAC-AAA-GCG-TTG-AAA-CGC-CGG. These triplets will determine what amino acids will be made. If we delete a nucleotide,

TACAAAGC-TTGAAACGCCGG, then the triplets will be,

TAC-AAA-GCT-TGA-AAC-GCC-GG. This is a different sequence of codons and the new codons may code for a new amino acid, or they may not because almost all of the amino acids have more than one codon. The same thing can happen if a nucleotide is added, or a single nucleotide is changed. If the mutation causes a change in the protein function, then I would imagine it could impact natural selection. But I'm not sure if just by looking at the base proportions you can learn much about the organism and how it evolves.
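
The deletion example above can be reproduced in a few lines. This is just a sketch of the bookkeeping; the helper names `codons` and `delete_base` are made up, and nothing here translates the triplets into amino acids.

```python
def codons(sequence):
    """Split a sequence into consecutive triplets; a trailing partial triplet is kept."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]

def delete_base(sequence, index):
    """Return the sequence with the base at a 0-based index removed."""
    return sequence[:index] + sequence[index + 1:]

original = "TACAAAGCGTTGAAACGCCGG"
shifted = delete_base(original, 8)  # drops the last base of the GCG triplet

print("-".join(codons(original)))   # TAC-AAA-GCG-TTG-AAA-CGC-CGG
print("-".join(codons(shifted)))    # TAC-AAA-GCT-TGA-AAC-GCC-GG
```

Every triplet from the deletion onwards is read in a shifted frame, which is why a single-base deletion can change far more than one codon.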

There are some interesting things though. You guys talked some about G and C. I assume you all know that in DNA, A pairs with T and G pairs with C. Well, what is cool is that G and C have a stronger bond than A and T. So if you were to look at a bacterium that lives in thermal vents, I would expect you to find that its GC content was higher than its AT content, since G-C bonds are more stable.

And there are a lot of repeated sequences in the genome. I forget what they mean, but I remember learning that you can find CG repeating regions before some genes. Wounded King said that cytosines are oftentimes methylated, and it's in these regions where a lot of that methylation occurs. Methylation actually causes genes to be inactivated, though.

 This message is a reply to: Message 10 by Dr Adequate, posted 09-15-2009 8:35 PM Dr Adequate has not yet responded

Dr Adequate
Member
Posts: 16107
Joined: 07-20-2006
Member Rating: 8.0

 Message 12 of 61 (524333) 09-15-2009 11:15 PM Reply to: Message 1 by slevesque, 09-15-2009 5:13 PM

 2- Does anyone have more information on such proportions from other species? The more information we have, the more interesting this will be.

I found something that might interest you. Here's a table of 899 bacterial species. The figures in the column on the far right of the table are the AT content of the bacteria given as a percentage. This ranges from 83.4% to 25.1%.

And here are 67 species of archaea. The AT content ranges from 72.4% to 34.1%.

They don't seem to have eukaryotes, which is a shame.

Edited by Dr Adequate, : No reason given.

 This message is a reply to: Message 1 by slevesque, posted 09-15-2009 5:13 PM slevesque has not yet responded

Dr Adequate
Member
Posts: 16107
Joined: 07-20-2006
Member Rating: 8.0

 Message 13 of 61 (524341) 09-15-2009 11:50 PM Reply to: Message 1 by slevesque, 09-15-2009 5:13 PM

Statistics --- ur doin it rong
 Anyhow, as I hope everyone that will join this discussion knows, statistics is a very interesting subject. One of the properties of a random sequence is that, given enough repetitions, it will always tend to go towards a certain %. I may not be clear, so here is an example:

 - If I flip a coin for a very long time, the amount of heads and tails I should register should be close to 50% each.

 Now, if we start with a simplistic model of mutations where they are totally random and apply this fact, then given enough time and mutations, the % of each nucleotide in the genome should tend towards 25%.

No, wait a moment.

Given your assumptions about mutation and selection (which, as WK and others have pointed out, might be incorrect) you're still doing the statistics wrong. Because what you're describing is a random walk.

Think about coin-tossing again. It is true that if you keep tossing the coin, the ratio of heads divided by tails will tend to 1. But it is not true that the difference of heads minus tails will tend to 0. It doesn't particularly tend to anything.

In your selectively-neutral model, then, the AT content will simply go on a random walk as time goes by. We expect its average value over time to be 50%, but we have no grounds for thinking that it will have this value at any particular time (such as now) still less that it will tend to this value, or indeed to any value.
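
The coin-tossing distinction made above, ratio versus difference, can be checked with a quick simulation. A minimal sketch, assuming a fair coin and independent tosses; `toss_summary`, the seed and the run lengths are arbitrary choices made for the illustration.

```python
import random

def toss_summary(n_tosses, rng):
    """Toss a fair coin n_tosses times and return (heads, tails)."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads, n_tosses - heads

rng = random.Random(1)  # fixed seed so the run is repeatable
for n in (100, 10_000, 1_000_000):
    heads, tails = toss_summary(n, rng)
    print(f"n={n:>9}  heads/tails={heads / tails:.4f}  heads-tails={heads - tails:+d}")
```

Across runs of increasing length the ratio settles close to 1, while the raw difference shows no sign of settling at 0; its typical size grows on the order of the square root of the number of tosses, even as it shrinks relative to the total.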

 This message is a reply to: Message 1 by slevesque, posted 09-15-2009 5:13 PM slevesque has not yet responded

 Replies to this message: Message 14 by Peepul, posted 09-16-2009 6:06 AM Dr Adequate has responded

Peepul
Member (Idle past 3307 days)
Posts: 206
Joined: 03-13-2009

 Message 14 of 61 (524354) 09-16-2009 6:06 AM Reply to: Message 13 by Dr Adequate, 09-15-2009 11:50 PM

Re: Statistics --- ur doin it rong
quote:
Think about coin-tossing again. It is true that if you keep tossing the coin, the ratio of heads divided by tails will tend to 1. But it is not true that the difference of heads minus tails will tend to 0. It doesn't particularly tend to anything.

I think it does, Dr A - the expectation value of the excess of heads or tails is the square root of n, where n is the number of tosses.
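
For reference, the square-root figure can be written out under the usual assumption of independent fair tosses. The symbols below are introduced here for the purpose: $X_i$ is $+1$ for heads and $-1$ for tails on toss $i$, and $D_n$ is heads minus tails after $n$ tosses.

```latex
\begin{align*}
D_n &= H_n - T_n = \sum_{i=1}^{n} X_i, \qquad X_i = \pm 1 \text{ with probability } \tfrac{1}{2} \text{ each},\\
\mathbb{E}[D_n] &= 0, \qquad \operatorname{Var}(D_n) = \sum_{i=1}^{n} \operatorname{Var}(X_i) = n,\\
\sqrt{\mathbb{E}[D_n^2]} &= \sqrt{n}, \qquad \mathbb{E}\,|D_n| \approx \sqrt{\frac{2n}{\pi}} \approx 0.80\sqrt{n} \quad \text{for large } n.
\end{align*}
```

So the signed difference averages to zero, its root-mean-square size is exactly $\sqrt{n}$, and the average absolute excess is about $0.8\sqrt{n}$. Either way the excess grows with $n$ in absolute terms while becoming negligible as a fraction of $n$.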

Edited by Peepul, : No reason given.

 This message is a reply to: Message 13 by Dr Adequate, posted 09-15-2009 11:50 PM Dr Adequate has responded

 Replies to this message: Message 16 by Dr Adequate, posted 09-16-2009 8:03 AM Peepul has not yet responded

Dr Jack
Member (Idle past 394 days)
Posts: 3507
From: Leicester, England
Joined: 07-14-2003

 Message 15 of 61 (524355) 09-16-2009 6:30 AM Reply to: Message 1 by slevesque, 09-15-2009 5:13 PM

You'll find G-C figures easier to get hold of, probably, because they're the ones traditionally used. G-C percentages were long used as a means of classifying microbial species, so there's a lot out there about them. They're increasingly being replaced by more sophisticated phylogenies based on actual sequences.

As others have noted, your assumption that the percentages are not adaptive is simply false. A-T pairs are held together by two hydrogen bonds, G-C pairs by three; this means that A-T pairs are easier to separate, and G-C pairs are harder. So, for example, the origins of replication on genomes (of which there are many on eukaryotic chromosomes and some archaeal chromosomes, and one on bacterial and most archaeal chromosomes) have a very high proportion of A-T pairs because these can be easily separated.

Thermophilic bacteria have higher levels of G-C pairings than their mesophilic relatives (that is, bacteria adapted to higher temperatures have more of the triply bonded G-C pairs, which is thought to aid DNA stability). However, among archaea (the true kings of high-temperature living), this link between temperature and G-C content doesn't hold. Species such as Pyrolobus fumarii (which can survive at up to 113 °C and grows best at 106 °C) use protein and enzyme chaperones to maintain DNA stability instead.

Finally, there are also specific sequences that perform non-coding functions that can be higher or lower in the two - e.g. the TATA box which indicates gene starts.

So, you see, the picture is much more complicated than a simple random variation between the letters.

 This message is a reply to: Message 1 by slevesque, posted 09-15-2009 5:13 PM slevesque has not yet responded

 Date format: mm-dd-yyyy Timezone: ET (US)