A complex genomic jigsaw puzzle

The human genome is not a monolithic entity but has been constantly changing throughout the evolution of the species. The main reason is that, when new copies of an individual's genome are generated during reproduction, the cellular replication machinery can sporadically introduce replication errors (mutations), which can then be inherited from the next generation onwards.

From a genomic point of view, the magnitude of the error can range from a single nucleotide change (called a single nucleotide variant or SNV) to the duplication or deletion of large fragments of the genome (called copy number variants or CNVs), as well as switching the orientation of a genomic fragment or moving it to a new genomic position. The most common type of mutation in the human genome is the SNV. However, humans also show extensive CNVs compared to other species.

From a functional point of view, all these types of changes can have important phenotypic consequences for the offspring and, ultimately, for the fate of the species when they affect functional genomic elements such as genes.

From an evolutionary point of view, new mutations that modify the phenotype of an individual are the substrate of natural selection. In the simplest model of selection, a mutation that confers a higher fitness on its carriers compared to non-carriers will tend to rise non-stochastically in frequency in the population and, ultimately, reach fixation. Conversely, a mutation that confers a lower fitness on its carriers compared to non-carriers will be detrimental and will tend to be removed from the population. Obviously, much more complex evolutionary patterns exist in nature (e.g. multiple genes contributing to a phenotype, ancient ongoing balancing selection, or selection on standing variation, among others). However, detecting the fingerprint of these evolutionary events in the genome is more complex than detecting a simple selective sweep.
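The simplest model described above can be illustrated with a few lines of Python. This is only a sketch under strong assumptions that are not part of the original text: a haploid population of infinite size (so no genetic drift), a constant selection coefficient s, and carriers with relative fitness 1 + s. The function names and the example values are hypothetical and chosen purely for illustration.

```python
def next_frequency(p, s):
    """Allele frequency after one generation of haploid selection.

    p: current frequency of the mutation (between 0 and 1)
    s: selection coefficient; carriers have relative fitness 1 + s
       (assumed constant over time, an illustrative simplification)
    """
    mean_fitness = p * (1 + s) + (1 - p)
    return p * (1 + s) / mean_fitness


def trajectory(p0, s, generations):
    """Deterministic frequency trajectory, ignoring genetic drift."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    # Hypothetical values: a rare beneficial mutation and a common harmful one.
    beneficial = trajectory(p0=0.01, s=0.05, generations=400)
    deleterious = trajectory(p0=0.5, s=-0.05, generations=400)
    print(f"beneficial:  {beneficial[0]:.3f} -> {beneficial[-1]:.3f}")
    print(f"deleterious: {deleterious[0]:.3f} -> {deleterious[-1]:.3f}")
```

Under these assumptions the beneficial mutation approaches fixation and the deleterious one is driven towards loss. Real populations are finite, so drift would add random fluctuations on top of these smooth curves.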

For SNVs, several examples of genomic regions under positive selection have been reported in the literature (e.g. lactose tolerance in adulthood and skin pigmentation, among others). Nevertheless, little is known about the selective pressures acting on genomic rearrangements and CNVs, or about their role in the etiology of current complex phenotypes, including diseases.

At first glance, the last statement seems counter-intuitive. How can something that was selected for increasing reproductive fitness, and is therefore considered beneficial for its carriers, be associated with a disease? However, a closer look at the theory of natural selection shows that positive selection of a variant that is causal for a present-day disease is more than plausible. We must take into account that natural adaptation is the result of genes and environment acting at the individual level, mostly before and during reproductive age. As a consequence, a functional change that increases the reproductive fitness of the carrier but has detrimental effects after reproductive age would still be under positive selective pressure and would increase in frequency in the population. However, this is not a sine qua non condition for a genetic variant that was under positive selection in the past to show detrimental effects in the present. Natural selection does not follow an established master plan; it acts on the genetic diversity and environmental conditions available at the time. This time dimension has a dramatic effect on the interpretation of positive selection:

  • A genetic variant that was favoured in the past in a given environment could be detrimental nowadays due to an environmental change. For example, the thrifty gene hypothesis proposes that genetic variants associated with metabolic efficiency and energy storage increased in frequency across populations in the past as a response to recurrent famines. However, these variants could be harmful at present in the energy-rich food environment of the Western diet, and associated with phenotypes such as obesity or diabetes.


  • Through introgression from other related species such as Neanderthals or Denisovans, our ancestors could have incorporated genetic variants specific to these species. Since these species were well adapted to their environments at the time when anatomically modern humans arrived from Africa, humans could have enhanced their adaptation to the new environments by means of this archaic admixture. Nevertheless, although this scenario has been observed for some loci, archaic hybridization has also had a mainly negative impact on the human genome. Most of the introgressed material has been lost due to purifying selection, and it has been shown that some introgressed genetic variants play a role in complex diseases.


  • A genomic change favoured by natural selection in the past could nowadays promote new disease-associated genomic changes that would be unlikely to happen otherwise. This scenario is particularly important in the case of rearrangements and CNVs. For example, a rearrangement that increased or decreased the dosage of one or more genes could have been selectively advantageous in the past. However, further modification of the gene dosage, facilitated by the pattern of the rearrangement, could have negative side effects later on.


The last point is the scenario that Nuttle et al. reported in http://www.nature.com/nature/journal/v536/n7615/full/nature19075.html

Nuttle and colleagues have recently studied the evolutionary history of the 16p11.2 region in humans and the homologous region in other primate species. In previous studies, it was shown that recurrent copy number variation (CNV) at chromosome 16p11.2 accounts for approximately 1% of cases of autism.

Their analyses show that this region has undergone a large number of complex chromosomal rearrangements and duplications during primate evolution, and particularly in the human lineage. In particular, the authors have shown that these rearrangements in the primate lineage set the genomic stage for further human-specific rearrangements and fragment duplications. Interestingly, these human-specific duplications have provided the substrate for the rise of a CNV region consisting of a 102-kbp cassette that contains a set of genes (BOLA2, SLX1 and SULT1A3) involved in autism. The authors have shown that the number of copies of BOLA2 modifies the expression of the gene and its protein levels, thus providing evidence of functional involvement for the CNV.

While these results are interesting per se for understanding the evolutionary history of this genomic region, even more striking conclusions can be drawn from the genetic variation present at this locus. Based on the number of copies of BOLA2 in current populations (four or more in 99.8% of humans), the presence of an even higher number of copies in an ancient human sample from ~45,000 years ago, the absence of the duplication in Neanderthals and Denisovans, the lack of evidence of archaic introgression in this region and the presence of a high frequency of rare variants, Nuttle and colleagues conclude that such a large number of copies in humans is not the result of a stochastic process, but of the action of positive selection.

What are the implications of these findings for autism risk? According to the authors, human evolution would have directionally promoted the increase in the number of copies of the gene at the expense of creating high-identity genomic regions (breakpoints) flanking the CNV. A side effect of such high-identity breakpoints would be an increased probability of recurrent unequal crossover during the formation of the gametes and, ultimately, of microdeletions at the 16p11.2 region, which have been associated with autism.