Under the neutral model, we can use the observed number of types to estimate which model of drift (i.e., which value of θ) best fits the data. Neutral theory provides a good approximation to the expected number of types given θ (see equation 11 in Ewens 1972), and population genetic theory also lets us generate the haplotype frequency spectrum, that is, the expected number of haplotypes whose frequency lies between \(x_1\) and \(x_2\) (equation 6 in Ewens 1972):
$$ E(k; x_1, x_2) = \theta \int_{x_1}^{x_2} x^{-1}(1-x)^{\theta-1}dx $$ where $$ N^{-1} \le x_1 \le x_2 \le 1 $$
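As a rough illustration of how these two results can be used, the sketch below estimates θ from an observed number of types and then evaluates the integral numerically. It assumes that equation 11 of Ewens (1972) is the standard sum θ/θ + θ/(θ+1) + ... + θ/(θ+n-1); the function names, the bisection bounds, and the midpoint rule (applied after substituting u = ln x) are illustrative choices and are not taken from the original analysis.

```python
import math

def expected_num_types(theta, n):
    """Expected number of distinct types in a sample of n chromosomes."""
    return sum(theta / (theta + i) for i in range(n))

def estimate_theta(k, n, lo=1e-9, hi=1e6, tol=1e-10):
    """Find theta such that expected_num_types(theta, n) equals k (bisection).

    The expectation increases monotonically with theta, from 1 towards n,
    so a solution exists only when 1 < k < n.
    """
    if not 1 < k < n:
        raise ValueError("need 1 < k < n for a finite, positive estimate")
    while hi - lo > tol * max(1.0, lo):
        mid = 0.5 * (lo + hi)
        if expected_num_types(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expected_types_in_range(theta, x1, x2, steps=20000):
    """theta * integral from x1 to x2 of x^-1 (1 - x)^(theta - 1) dx.

    With u = ln x the integrand becomes (1 - e^u)^(theta - 1); the midpoint
    rule never evaluates the end points, so x2 = 1 is allowed.
    """
    a, b = math.log(x1), math.log(x2)
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        u = a + (i + 0.5) * h
        total += (1.0 - math.exp(u)) ** (theta - 1.0)
    return theta * total * h

# Hypothetical sample: 10 types observed among 100 chromosomes.
theta_hat = estimate_theta(k=10, n=100)
print(theta_hat, expected_types_in_range(theta_hat, 0.1, 1.0))
```

For θ below 1 the integrand grows sharply near x = 1, so more steps may be needed there for an accurate value.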
But what happens if the data are not neutral? In short, they look different!
In this simulation, we see the effects of haplotypes that are not neutral but are instead under selection. All haplotypes start with a fitness of 1, and each new mutation has a constant 5% probability of increasing the fitness of the haplotype it appears on by s = 0.1. While neutral haplotypes tend to arise and be lost quickly, haplotypes under selection can sometimes reach much higher frequencies than their neutral counterparts. The oscillations occur because as a high-fitness haplotype reaches higher frequency, the probability of yet another fitness-increasing mutation arising on that haplotype (and hence leading to a new haplotype with even higher fitness) also increases. As before, haplotype traces are colored red if they survive to the present.
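A simulation of this kind could be sketched roughly as follows. This is not the code behind the figure: the Wright-Fisher resampling scheme, the population size, the mutation rate `mu`, the additive fitness gain, and all function and parameter names are assumptions made for illustration. Only the 5% chance of a beneficial mutation and the fitness increase of 0.1 come from the description above.

```python
import random

def simulate_haplotypes(pop_size=200, generations=100, mu=0.01,
                        p_beneficial=0.05, s=0.1, seed=1):
    """Track haplotype counts through time under mutation, drift and selection."""
    rng = random.Random(seed)
    fitness = {0: 1.0}              # haplotype id -> fitness; the founder has fitness 1
    population = [0] * pop_size     # haplotype id carried by each chromosome
    next_id = 1
    history = []
    for _ in range(generations):
        # Selection and drift: parents are drawn in proportion to their fitness.
        weights = [fitness[h] for h in population]
        parents = rng.choices(population, weights=weights, k=pop_size)
        offspring = []
        for h in parents:
            if rng.random() < mu:
                # Every mutation founds a new haplotype; 5% of them also
                # raise fitness by s (assumed additive), the rest are neutral.
                gain = s if rng.random() < p_beneficial else 0.0
                fitness[next_id] = fitness[h] + gain
                h = next_id
                next_id += 1
            offspring.append(h)
        population = offspring
        counts = {}
        for h in population:
            counts[h] = counts.get(h, 0) + 1
        history.append(counts)
    return history, fitness

history, fitness = simulate_haplotypes()
survivors = history[-1]             # haplotypes still present in the final generation
print(len(fitness), "haplotypes arose;", len(survivors), "survive to the present")
```

Setting `p_beneficial=0` gives the corresponding neutral simulation, which makes it easy to compare the two kinds of trajectory.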
These differences in behavior form the basis for testing for deviations from neutrality. We can detect a statistical departure from the haplotype frequency spectrum expected under neutrality (i.e., as described by the equation above) using Slatkin’s exact test. This test asks how unlikely the observed haplotype frequency spectrum is under neutrality, as compared to all other possible haplotype frequency spectra given the sample size n and the number of observed haplotypes k. To do this, we need to know the probability of seeing a given haplotype frequency spectrum under neutrality, which can be calculated using the following equation (from equation 30 in Ewens 1972):
\[ \Pr(n_1, \ldots, n_k \mid k, n, \text{neutrality}) = \frac{n!}{k! \, l_{k,n} \, n_1 \cdot n_2 \cdots n_k} \]
where n is again the number of chromosomes sampled, \(n_1, \ldots, n_k\) are the numbers of copies of each of the k haplotypes in the sample, and \(l_{k,n}\) are (for those with a desire to know) unsigned Stirling numbers of the first kind.
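The equation can be evaluated directly, as in the minimal sketch below. The function names are illustrative, and the Stirling numbers are built with the standard recurrence c(n, k) = (n - 1) c(n - 1, k) + c(n - 1, k - 1). Exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction
from math import factorial, prod

def unsigned_stirling_first(n, k):
    """Unsigned Stirling number of the first kind, c(n, k)."""
    row = [1] + [0] * k                 # c(0, j) for j = 0..k
    for m in range(1, n + 1):
        new = [0] * (k + 1)
        for j in range(1, k + 1):
            new[j] = (m - 1) * row[j] + row[j - 1]
        row = new
    return row[k]

def config_probability(counts):
    """Probability of the haplotype counts (n_1, ..., n_k), given k and n."""
    n, k = sum(counts), len(counts)
    denominator = factorial(k) * unsigned_stirling_first(n, k) * prod(counts)
    return Fraction(factorial(n), denominator)

# Hypothetical sample of 10 chromosomes carrying 4 haplotypes.
print(config_probability([5, 3, 1, 1]))
```

In this form the counts are treated as an ordered list, so (3, 1) and (1, 3) are separate outcomes with equal probability. Summing over all orderings of the same counts gives the probability of the unordered configuration.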
To explore how likely datasets are to be neutral, click through the datasets from the different research fields below ('Simulate', 'Timor', 'Baby Names', 'Tree Species'), or upload your own ('Custom Data').
Lansing JS, Watkins JC, Hallmark B, Cox MP, Karafet TM, Sudoyo H, Hammer MF. 2008. Male dominance rarely skews the frequency distribution of Y chromosome haplotypes in human populations. Proceedings of the National Academy of Sciences USA 105:11645-11650.
Note that the set of possible configurations of the haplotype frequency spectrum is determined only by the number of haplotypes k and the sample size n. Slatkin's exact test works by calculating the probability of each possible configuration and binning these according to whether the probability is smaller or larger than that of the observed configuration. The approximate probability value, describing how unusual the observation would be under the null hypothesis of neutrality (given θ, as estimated from k), is just the share of the total probability contributed by configurations that are less likely than the one observed.
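The procedure described above can be sketched as follows for small samples. This is one possible reading, not the implementation used on this page. It enumerates unordered configurations (partitions of n into exactly k parts) and weights each by its number of orderings. It counts configurations that are exactly as likely as the observed one in the tail. It also drops the shared constant n!/l_{k,n}, which cancels when the tail is divided by the total. All names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod

def partitions_into_k(n, k, max_part=None):
    """Yield every partition of n into exactly k positive parts (largest first)."""
    if max_part is None:
        max_part = n
    if k == 0:
        if n == 0:
            yield ()
        return
    smallest_first = max(1, -(-n // k))      # the largest part is at least ceil(n / k)
    for first in range(min(max_part, n - k + 1), smallest_first - 1, -1):
        for rest in partitions_into_k(n - first, k - 1, first):
            yield (first,) + rest

def config_weight(counts):
    """Neutral probability of an unordered configuration, up to a shared constant."""
    multiplicities = Counter(counts).values()  # how many haplotypes share each count
    return Fraction(1, prod(counts) * prod(factorial(a) for a in multiplicities))

def slatkin_exact_p(observed):
    """Share of probability in configurations no more likely than the observed one."""
    n, k = sum(observed), len(observed)
    w_obs = config_weight(observed)
    total = tail = Fraction(0)
    for config in partitions_into_k(n, k):
        w = config_weight(config)
        total += w
        if w <= w_obs:
            tail += w
    return tail / total

# Hypothetical sample: 12 chromosomes, one very common haplotype and three singletons.
print(float(slatkin_exact_p([9, 1, 1, 1])))
```

Exhaustive enumeration is only practical for modest n and k, because the number of configurations grows quickly.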
You can perform Slatkin's exact test on any appropriate dataset, including your own data if you can structure your dataset in a similar way.
Nearly neutral models and adaptive landscapes
Many more methods for distinguishing neutrality from selective processes followed in the wake of Kimura’s pioneering research. Kimura's colleague Tomoko Ohta investigated 'nearly neutral' mutations that are subject to the effects of both selection and drift. Modeling a continuous distribution of selection coefficients for mutations, she showed that where the selection coefficient is smaller than the probability of a mutation fixing due to drift, the mutation behaves as if it were neutral, even if its effects in a larger population would be deleterious or beneficial (Ohta 1992, Ohta 2003).
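To make the comparison between selection and drift concrete, the sketch below uses Kimura's diffusion approximation for the fixation probability of a single new mutation. It assumes a haploid Wright-Fisher population of `n_chrom` chromosomes, which the text does not specify, and the function name is hypothetical.

```python
import math

def fixation_probability(s, n_chrom):
    """Approximate fixation probability of one new mutant with selection coefficient s."""
    if s == 0:
        return 1.0 / n_chrom            # pure drift: each copy is equally likely to fix
    exponent = -2.0 * n_chrom * s
    if exponent > 700:                  # strongly deleterious: exp() would overflow
        return 0.0
    return math.expm1(-2.0 * s) / math.expm1(exponent)

n_chrom = 1000
neutral = fixation_probability(0.0, n_chrom)
for s in (-1e-2, -1e-4, 1e-4, 1e-2):
    # Ratios close to 1 indicate a mutation that behaves as if it were neutral.
    print(s, fixation_probability(s, n_chrom) / neutral)
```

Under these assumptions, mutations with |s| well below 1/n_chrom fix at nearly the neutral rate, while the same coefficient in a much larger population would be visible to selection.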
This has important consequences for evolution, because it implies that neutral variation can easily accumulate. Neighbors on neutral landscapes have equal fitness and similar phenotypes, while nearly neutral neighbors may vary slightly in sequence space with no fitness effects.
As Fontana and others showed, this gives rise to a distinctive property of multi-dimensional neutral spaces: the formation of networks composed of neighbors with structural similarity and identical or similar fitness. In such nearly neutral landscapes, an evolutionary walk made up of a series of single point mutations can change every nucleotide in a sequence without changing the phenotype or fitness of the entities involved, and the rate of discovery of new configurations on neutral networks is approximately constant (Huynen, Stadler and Fontana 1996). Gavrilets (2004) showed that, under many conditions, local networks connect to form a giant, global network that extends right across the landscape. Thus, on high-dimensional landscapes, "the scenario of a population trapped on a local hilltop vanishes" (Barnett 1998), as populations are, at least theoretically, able to evolve any high-fitness configuration by drifting along neutral networks or along neutral ridges that connect peaks on rugged topography.
Studies of fitness – and the complexity it entails – are an active area of ongoing research.
References:
Barnett L. 1998. Ruggedness and neutrality – the NKp family of fitness landscapes. In Artificial Life VI: Proceedings of the Sixth International Conference on Artificial Life, ed. Adami C, MIT Press: Cambridge, pp 18-27.
Ewens WJ. 1972. The sampling theory of selectively neutral alleles. Theoretical Population Biology 3:87-112.
Gavrilets S. 2010. High-dimensional fitness landscapes and speciation. In Evolution – The Extended Synthesis, eds. Pigliucci M, Müller GB, MIT Press: Cambridge, pp 45-80.
Huynen MA, Stadler PF, Fontana W. 1996. Smoothness within ruggedness: The role of neutrality in adaptation. Proceedings of the National Academy of Sciences USA 93:397-401.
Ohta T. 1992. The nearly neutral theory of molecular evolution. Annual Review of Ecology and Systematics 23:263-286.
Ohta T. 2003. Origin of the neutral and nearly neutral theories of evolution. Journal of Biosciences 28:371-377.