What is the genome and why are we interested?

Put simply, the genome is the genetic material (made of DNA) that provides the instructions for building a new organism. For example, the human genome comprises around six billion pieces of individual chemical information, arranged on 46 separate chromosomes like strings of beads. At any given position each “bead” can have one of four “colours”, and the exact order of these colours can be critical for normal function. Locked within the genome are many secrets, such as how humans evolved many differences from our closest living relatives (the two chimpanzee species), what makes humans differ from each other, and why we may be more or less likely to develop certain diseases.

How has our understanding of ‘junk DNA’ changed in recent years?

The part of the genome that we understand best is the bit that makes proteins. We know of around 21,000 such protein-making genes, but these comprise only about 3% of the human genome. Another 5% shows enough similarity to other mammals that we can conclude that it is doing something, even if we don’t know what that is. But around half our genome consists of repeated sequences that have spread like parasites, so it was previously thought that much of this, along with other “spacer” regions, had no function – a kind of ancestral junk that has built up as the by-product of the haphazard progress of evolution.

What is the importance of collaborative genomic databases (such as ENCODE)?

ENCODE stands for “Encyclopedia of DNA Elements”, and it has recently been in the news because the project has announced the first attempt to peer beyond the genes at the scale of an entire human genome. This involved assigning no fewer than 1,640 measurable properties across the genome as a whole, and then trying to see how these properties correlated with one another to gain an overall picture of how the genome works. ENCODE found that 80% of the genome “lights up” for one or more of these properties, suggesting that a much higher proportion of the genome could be doing something than we previously thought.

How rapidly are research and sequencing techniques developing?

When the draft human genome sequence was announced in 2001, it had taken a 3-billion-dollar international effort over the previous decade to obtain just one complete human genome sequence. Now the same work can be done in a single lab in under a week for around £2,000. Indeed, technologies for human genome sequencing have been developing so rapidly that for several years the rate of decrease in sequencing cost has even outstripped improvements in computer processing power (enshrined in Gordon Moore’s “law”). Being able to understand all these data has become as much of a challenge as generating them, so that computer-literate scientists (called bioinformaticians) now play a leading role in the field.
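To get a rough feel for that comparison, the two cost figures quoted above can be turned into an implied "halving time" and set beside Moore's law. This is only a back-of-the-envelope sketch: it assumes a steady exponential decline, treats the 3-billion-dollar project total as the starting cost of one genome, and uses two figures that are not in the text (a dollar equivalent of about $3,000 for £2,000, and an elapsed time of about 11 years). The function name `halving_time_years` is purely illustrative.

```python
import math


def halving_time_years(start_cost, end_cost, years_elapsed):
    """Years needed for the cost to halve, assuming a steady exponential decline."""
    halvings = math.log2(start_cost / end_cost)  # number of times the cost halved
    return years_elapsed / halvings


# Figures for illustration only (see the assumptions stated above).
start_cost_usd = 3e9   # the "3-billion-dollar" effort quoted in the text
end_cost_usd = 3_000   # ASSUMPTION: rough dollar equivalent of about £2,000
years_elapsed = 11     # ASSUMPTION: time between the two cost figures

sequencing_halving = halving_time_years(start_cost_usd, end_cost_usd, years_elapsed)
moore_doubling_years = 2.0  # commonly quoted doubling period for Moore's law

print(f"Sequencing cost halves roughly every {sequencing_halving:.2f} years")
print(f"Moore's law doubles chip capacity roughly every {moore_doubling_years:.0f} years")
```

Under these assumptions the cost of a genome halves in well under a year, several times faster than the roughly two-year doubling usually attributed to Moore's law. The exact number shifts with the assumed dates and exchange rate, but the conclusion does not.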

What’s next? Can we ever understand the entire genome?

Integrating all this information into a coherent picture is still a massive challenge. Some problems are simpler than others: for example, within 10 years we can anticipate having an extremely comprehensive picture of the inherited disorders that afflict mankind, and of the major genetic changes that lead to different cancers. This information can be used to design new treatments, but the challenges in making sure that these treatments are safe and effective will be the same as ever. We are still much further away from answering other questions, such as why certain individuals get heart disease or diabetes and others don’t – although we know that environmental factors (things like smoking and an unhealthy diet) are at least as important as the genes. And – fortunately, I think – we can peer into the genes as much as we want but they will never fully explain the human spirit!