I have some questions and would appreciate any non-complex replies (or at least not too complex!). First off: I understand that the majority of genes code for proteins. What do *most* of the non-protein coding genes code for?
Strictly speaking, genes that don't code for proteins code for functional RNA molecules instead (ribosomal RNA and transfer RNA being the best-known examples). The rest of the non-protein DNA doesn't "code" for anything, but it does have a variety of roles. Before each gene there are patches of DNA (promoter regions) that the DNA transcription proteins can attach to and begin transcription. (Transcription is the creation of mRNA from DNA; this mRNA is then translated into an amino acid sequence, which later becomes a protein.) Other sections of DNA act to increase or decrease the quantities in which a gene is transcribed.
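The flow from DNA to mRNA to amino acid sequence mentioned above can be sketched in a few lines of Python. This is only a toy model: the example sequence is made up, the codon table holds just five of the 64 real codons, and the function names are my own choices for illustration.

```python
# Toy model of the DNA -> mRNA -> protein flow.
# Only a handful of codons are listed; the real genetic code has 64.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": None,   # a stop codon
}

def transcribe(coding_strand):
    """Return the mRNA for a coding (sense) strand: same letters, T becomes U."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: the chain ends here
            break
        protein.append(amino_acid)
    return protein

# Hypothetical 15-base "gene", invented for this example.
mrna = transcribe("ATGTTTGGCGCTTAA")
peptide = translate(mrna)
print(mrna, peptide)
```

Promoters and the other control regions are deliberately left out; they decide whether and how often `transcribe` gets "called" at all, which is the point of the paragraph above.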
The centromere consists of particular sequences, whilst at origins of replication (the points where DNA copying begins) there are large numbers of A=T base pairs, which are easier to separate than C≡G base pairs because they are held together by two hydrogen bonds rather than three. At the ends of each chromosome there are telomeres, repetitive sections of DNA that act as a protective buffer and account for the fact that DNA copying misses a few base pairs at the very end each time it makes a copy.
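To put rough numbers on the A-T versus G-C point: an A-T pair is held by two hydrogen bonds and a G-C pair by three. The sketch below simply counts them for two made-up ten-base stretches. Treat it as a crude proxy, since real strand separation also depends on other factors such as how neighbouring bases stack.

```python
# Hydrogen bonds per base pair: A-T pairs have two, G-C pairs have three.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def at_fraction(seq):
    """Fraction of positions in one strand that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)

def hydrogen_bonds(seq):
    """Total hydrogen bonds holding this stretch of double helix together."""
    return sum(H_BONDS[base] for base in seq.upper())

# Two invented sequences of equal length, for comparison only.
at_rich = "ATTATAAATT"
gc_rich = "GCCGCGGGCC"

print(at_fraction(at_rich), hydrogen_bonds(at_rich))
print(at_fraction(gc_rich), hydrogen_bonds(gc_rich))
```

For the same length, the A-T-rich stretch has a third fewer hydrogen bonds to break, which is the intuition behind origins of replication being A-T rich.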
Then there are pseudogenes. Some code for proteins, but those proteins are malformed, never achieve any functional role, and are tagged with ubiquitin and broken down again; others are missing promoter regions and thus never get transcribed. Dead copies of transposons missing their inverted repeat regions, inactive retrovirus copies and the like make up some of the rest. And then there are numerous repeat sequences of differing lengths that pepper the genome.
In summary then: some of the non-coding DNA is functional, in that it performs a vital role in determining how coding DNA operates or fulfils a role during meiosis or DNA copying, but most of it is simply junk. However, if you removed the junk, the DNA would no longer function. Why? Because many of the other functions of DNA require spacing between the elements so that the DNA can be twisted back on itself. This is particularly true of transcription control elements, which can be tens of thousands of base pairs from the gene they control and require the intervening sections of "junk" in order to reach their active position.
My second question is: It seems to me that proteins are far more interesting in trying to get at the heart of life than genes. If proteins are the building blocks of living things (akin to, say, organic Lego bricks), then what makes millions of proteins organise themselves so exquisitely? Whilst I accept that self-organisation according to chemical and physical laws (charge affinities and such) must be crucial here, I cannot for the life of me grasp how conglomerates of proteins further and further organise themselves - and how, on a macroscopic level, it all hangs together so to speak.
The first thing you need to realise is that proteins don't just slop out willy-nilly. In fact, the pathway that a protein takes from gene to destination (so-called protein targeting) is a complex process. In most cases what happens is that the polypeptide chain produced by translation has additional amino acid sequences, usually at one of its ends, that are recognised by other proteins and help guide the protein to its destination. So, for example, a particular amino acid sequence instructs that a protein be exported from a cell; other sequences will ensure its attachment to the outside of the cell, or position it so that it spans the membrane.
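As a very loose analogy, protein targeting works a bit like reading an address label. The sketch below uses two well-known signals, KDEL at the tail end (keeps a protein in the endoplasmic reticulum) and PKKKRKV (a nuclear localisation signal). The protein sequences, the function name and the "default" destination are invented for illustration, and real cells use many more signals and recognise them far less literally than string matching.

```python
def destination(protein):
    """Guess where a protein ends up from a short 'address label' in its sequence.

    Toy rules only: two real signal motifs, checked by plain string matching.
    """
    if protein.endswith("KDEL"):
        # KDEL at the C-terminal end: retained in the endoplasmic reticulum
        return "endoplasmic reticulum"
    if "PKKKRKV" in protein:
        # A classic nuclear localisation signal
        return "nucleus"
    # Assumption for this sketch: no recognised label means it stays put
    return "cytosol (default)"

# Hypothetical amino acid sequences (one-letter codes), made up for the example.
for seq in ["MAAGTKDEL", "MSPKKKRKVED", "MAAGT"]:
    print(seq, "->", destination(seq))
```

In the cell the "lookup" is done by other proteins that physically bind the signal and hand the cargo on, which is why the sorting machinery is itself part of what has to be inherited along with the DNA.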
The second thing you need to realise is that cells are not simple blobs. Inside the cell is a complex array of filaments (the cytoskeleton), which both maintain the shape of the cell and provide "routes" along which other proteins "walk". The cell is a highly ordered structure, and it's this structure that allows the cell to work.
So, while in principle you can understand an organism from its DNA, in practice DNA alone will not work to create an organism.
Finally, returning to your point about proteins being more interesting for getting to the heart of life than genes: there's a certain truth to that but, as it turns out, the two are very tightly interlinked, so you cannot easily understand one without the other. Proteins are required to operate DNA, while DNA controls not just which proteins will be synthesized but often also when and in what numbers. Also, because genes, unlike proteins, come located in one easy-to-access package, they provide a single source from which much can be deduced and from which evolutionary lineages can be studied.