Analysis and Models of Genomic Variation

Computational Biology Laboratory

The genetic code carried in the DNA of all forms of life is responsible for the enormous variation that we see around us.

The phenotypic variation observed at the scale of the organism, such as the camouflage of fur or the shape of the capsid that forms a viral coat, is ultimately a function of the genomic sequence: changes in the sequence have functional consequences that affect these phenotypes.

The genetic sequence changes largely through errors that occur during DNA replication or through the action of environmental mutagens. This mechanism of change is vital: it allows species to adapt to changes in the environment, as described by Darwinian evolution, and has resulted in the vast complexity of life we see today. It also has negative consequences: species can be outcompeted and ultimately perish, as the extinct Dodo did. Cancer is a further example, in which a growing population of cells out-competes the pressures exerted by the immune system of the host. Analysing the variation in these systems and its correlation with the environment is the aim of genomics.

This variation can take a variety of forms. Firstly, the genome can vary at individual bases, producing single nucleotide variants; the BRAF mutation found in many melanomas, for example, is usually the single amino acid change V600E, caused by a single base substitution. Analysing such variation involves Mendelian statistics. Secondly, there can be larger scale changes, such as changes in the number of copies; bread wheat, for example, is hexaploid, its genome resulting from the hybridisation of three grass genomes. Analysing such behaviour involves stochastic processes such as hidden Markov models (HMMs). Thirdly, the genome can undergo a range of complex rearrangements. These lead to the chaotic genomes observed in cancer, and in viruses the process of reassortment can produce progeny vastly different from the parent clones, leading to catastrophic events such as flu zoonoses (flu strains crossing host species). These events are discrete in nature and their analyses generally involve structures from graph theory. Fourthly, the proteins produced from DNA affect function and can be expressed at varying levels. Fifthly, (eukaryotic) DNA is wound around protein blocks called histones and folded into complex structures. Both the DNA and the histones can be altered by modification processes such as methylation, and the resulting structure also varies temporally and spatially within the cell.
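As an illustration of the HMM approach to copy number mentioned above, the following is a minimal sketch of Viterbi decoding over a sequence of log2 intensity ratios. It is not the method used in any of the group's publications. The function name, the three candidate states, the Gaussian noise level and the transition probability are all assumptions chosen for illustration.

```python
import math


def viterbi_copy_number(log_ratios, states=(1, 2, 3), stay_prob=0.9, sigma=0.2):
    """Return the most likely copy number for each probe (Viterbi path).

    Illustrative assumptions: a diploid baseline, Gaussian noise with
    standard deviation `sigma`, at least two states, and a single
    probability `stay_prob` of remaining in the same state between probes.
    """
    if not log_ratios:
        return []
    n_states = len(states)
    # Expected log2 ratio for each copy number relative to two copies.
    means = [math.log2(c / 2) for c in states]
    switch_prob = (1 - stay_prob) / (n_states - 1)

    def log_emission(x, mean):
        # Gaussian log-density, dropping the constant shared by all states.
        return -((x - mean) ** 2) / (2 * sigma ** 2)

    def log_transition(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    # Uniform prior over the starting state.
    score = [math.log(1 / n_states) + log_emission(log_ratios[0], m) for m in means]
    back = []
    for x in log_ratios[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for arriving in state j at this probe.
            best_i = max(range(n_states), key=lambda i: score[i] + log_transition(i, j))
            pointers.append(best_i)
            new_score.append(
                score[best_i] + log_transition(best_i, j) + log_emission(x, means[j])
            )
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[i] for i in path]


# Synthetic data: a short gained segment inside a two-copy region.
print(viterbi_copy_number([0.0, 0.05, -0.02, 0.6, 0.55, 0.62, 0.01, -0.03]))
```

The transition penalty is what distinguishes this from calling each probe independently: a single noisy probe is not enough to trigger a change of state, whereas a run of consistent probes is.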

All of this variation can be analysed with a range of next-generation sequencing techniques, including paired-end sequencing, RNA-seq, ChIP-seq and Hi-C. Some of the projects we are interested in include:

Within and between host flu dynamics
Rearrangement phylogenies in cancer
Integrative approaches of chromosomal conformation capture
Association of variation with phenotype

Many of these projects are in partnership with The Genome Analysis Centre (www.tgac.ac.uk), a BBSRC institute specialising in plant, animal and microbial genomics.

References

Greenman C, et al. Estimation of rearrangement phylogeny in cancer. Genome Research. Accepted.
Greenman CD, et al. PICNIC: an algorithm to predict absolute allelic copy number variation with microarray cancer data. Biostatistics. 2010 Jan;11(1):164-75.
Bignell GR*, Greenman CD*, et al. Signatures of mutation and selection in the cancer genome. Nature. 2010 Feb 18;463(7283):893-8.
Greenman C, et al. Patterns of somatic mutation in human cancer genomes. Nature. 2007 Mar 8;446(7132):153-8.
Greenman C, et al. Statistical analysis of pathogenicity of somatic mutations in cancer. Genetics. 2006 Aug;173(4):2187-98.

Research Team

Dr. Chris Greenman