Open Access

Shared Patterns of Genome-Wide Differentiation Are More Strongly Predicted by Geography Than by Ecology

Abstract

Closely related populations often display similar patterns of genomic differentiation, yet it remains an open question which ecological and evolutionary forces generate these patterns. The leading hypothesis is that this similarity in divergence is driven by parallel natural selection. However, several recent studies have suggested that these patterns may instead be a product of the depletion of genetic variation that occurs as a result of background selection (i.e., linked negative selection). To date, there have been few direct tests of these competing hypotheses. To determine the relative contributions of background selection and parallel selection to patterns of repeated differentiation, we examined 24 independently derived populations of freshwater stickleback occupying a variety of niches and estimated genomic patterns of differentiation in each relative to their common marine ancestor. Patterns of genetic differentiation were strongly correlated across pairs of freshwater populations adapting to the same ecological niche, supporting a role for parallel natural selection. In contrast to other recent work, our study comparing populations adapting to the same niche produced no evidence signifying that similar patterns of genomic differentiation are generated by background selection. We also found that overall patterns of genetic differentiation were considerably more similar for populations found in closer geographic proximity. In fact, the effect of geography on the repeatability of differentiation was greater than that of parallel selection. Our results suggest that shared selective landscapes and ancestral variation are the key drivers of repeated patterns of differentiation in systems that have recently colonized novel environments.

Online enhancements: supplemental material. Dryad data: https://doi.org/10.5061/dryad.34q91f0.

Introduction

The evolution of the same phenotypic traits in independent populations inhabiting similar environments, known as parallel or convergent evolution, is generally accepted as evidence for the action of parallel natural selection (Endler 1986; Schluter 2000; Losos 2011; Bolnick et al. 2018) because chance processes are unlikely to yield repeated phenotypic evolution (Schluter and Nagel 1995). Using a similar logic, many recent studies have used genomic scans to search for evidence of parallel genetic differentiation among closely related species or populations adapting to similar environments (e.g., Fraser et al. 2015; Ravinet et al. 2016; Reid et al. 2016; Rougemont et al. 2017; Trucchi et al. 2017). In these studies, shared outliers and/or similar patterns of genetic differentiation (FST or DXY) across the genome have been taken as evidence of parallel adaptation to local ecological conditions.

Similar differentiation landscapes across the genome have also been found to evolve in the absence of ecological or phenotypic parallelism (e.g., Martin et al. 2013; Renaut et al. 2014; Burri et al. 2015; Vijay et al. 2016). This has led to the argument that perhaps parallel natural selection alone does not drive repeatable genomic differentiation (e.g., Bank et al. 2014; Cruickshank and Hahn 2014; Haasl and Payseur 2016; Burri 2017). Rather, shared patterns of genomic differentiation could be generated by long-term linked selection in a heterogeneous recombination landscape that is shared among taxa due to synteny (Bank et al. 2014; Cruickshank and Hahn 2014; Haasl and Payseur 2016; Burri 2017). Linked selection occurs when selection at one locus affects the allele frequencies of loci in linkage disequilibrium with it (reviewed by Barton 2000). Linked selection is referred to as genetic hitchhiking when selection on the focal locus is positive and as background selection when selection on the focal locus is negative (Charlesworth et al. 1993). It has been argued that background selection would more often lead to similar patterns of differentiation than genetic hitchhiking because the opportunity for positive selection to affect the same genomic region multiple times may be limited (Burri 2017). In support of this argument, theoretical work suggests that genomic parallelism may only be seen when the selection landscape is highly parallel (Thompson et al. 2019). However, computer models suggest that the effects of background selection on the divergence landscape may be modest (Zeng and Charlesworth 2011; Matthey-Doret and Whitlock 2018; Zeng and Corcoran 2018). Additionally, background selection has failed to explain some empirical patterns (e.g., Irwin et al. 2016, 2018), and it may be the case that there has not been sufficient time for drift and/or negative selection to influence differentiation in recently diverged species (Burri 2017; Delmore et al. 2018).
Given these differing theoretical and empirical results, we used a comparative approach to disentangle the contributions of background selection and parallel positive selection to the repeatability of genomic differentiation in a recently diverged species.

We furthermore determined whether the source of genetic variation can influence the likelihood of observing shared genomic divergence (MacPherson and Nuismer 2017; Thompson et al. 2019). Shared standing genetic variation and/or introgression between populations experiencing parallel natural selection may more often facilitate the evolution of similar patterns of genomic differentiation compared with de novo mutation (MacPherson and Nuismer 2017), owing to the fixation advantages conferred by higher initial allele frequencies (Innan and Kim 2004). To evaluate the role of shared standing genetic variation, we tested whether closer geographic proximity predicted an increased similarity of genomic differentiation, as nearby populations likely shared more initial standing genetic variation.

The recent diversification of threespine stickleback (Gasterosteus aculeatus) across the globe provides an excellent system to test the hypothesis that parallel selection is the dominant force determining the degree of shared genomic differentiation. At the end of the last ice age (10,000–12,000 years ago), marine stickleback colonized newly formed freshwater habitats (Bell and Foster 1994). Phenotypically, these freshwater populations are more similar to one another than to their marine ancestors (Bell and Foster 1994), despite being independently derived. Among freshwater populations there has been more parallel phenotypic differentiation, with benthic, limnetic, stream, and solitary lake ecotypes arising multiple times independently (Bell and Foster 1994; Taylor and McPhail 2000). These freshwater populations occur in both the Atlantic and Pacific Ocean basins and span distances exceeding 20,000 km, with colonization occurring at similar times across the basins (Bell and Foster 1994).

We leveraged existing genomic data from independent populations of stickleback to assess the contributions of parallel positive selection, background selection, and shared standing genetic variation to patterns of genome-wide differentiation. First, we tested the prediction that population pairs with more similar ecology would exhibit a more similar pattern of genomic differentiation due to parallel positive selection. Second, we tested the prediction that in the absence of background selection, population pairs occupying the same niche, and thus diverging neutrally, would have largely dissimilar patterns of differentiation across the genome. Finally, we tested the prediction that geographically proximate populations that have evolved from more genetically similar marine founders would exhibit more similar patterns of genomic differentiation due to starting with more shared standing genetic variation.

Methods

Data Acquisition

We used a subset of the short-read data set for threespine stickleback compiled by Samuk et al. (2017). The data set consisted of individuals from 24 independent freshwater populations. These populations included solitary populations adapted to lakes (11) or streams (7) and sympatric benthic (3) and limnetic (3) species pairs (fig. 1A). As the Eastern Pacific marine population is generally considered to be panmictic, the marine reference population was an amalgamation of whole genome sequences from nine Pacific Ocean locations collected along the West coast of North America. Additional population details can be found in table S1.

Figure 1. 

A, Sampling locations for the 24 freshwater populations of stickleback. B, Marine-freshwater FST profiles for linkage group 4 for two representative populations of each of the four freshwater ecotypes. Shades of gray correspond between panels A and B.

Data Preparation and Variant Calling

We focused on single-nucleotide polymorphisms (SNPs) in our analysis, which we identified using a standard, reference-based bioinformatics pipeline consisting of custom R 3.2.2 (R Development Core Team 2016) and Perl scripts (all of which are available in the following GitHub repository: https://github.com/ksamuk/gene_flow_linkage; see also Samuk et al. 2017). Briefly, we demultiplexed the reads and used Trimmomatic 0.32 (Bolger et al. 2014) to filter out low-quality sequences and adapter contamination. We then aligned reads to the stickleback reference genome (Jones et al. 2012) using BWA 0.7.10 (Li et al. 2009), followed by realignment with STAMPY 1.0.23 (Lunter and Goodson 2011). The GATK 3.3.0 (McKenna et al. 2010) best practices workflow (DePristo et al. 2011) was followed, except for the MarkDuplicates step, which we skipped when reads were derived from reduced representation libraries (RAD and GBS). We realigned reads around indels (RealignerTargetCreator, IndelRealigner), identified SNPs in individuals using the HaplotypeCaller, and jointly genotyped the entire data set using GenotypeGVCFs. The results were then output as a variant call format (VCF) file containing all genotyped sites (variant and invariant) and converted to tabular format for downstream analyses. For more details on the pipeline, see the scripts referenced above.

Genomic Differentiation and Quantification of Repeatability

We estimated average genetic differentiation (FST) between the marine reference population and each freshwater population for SNPs within 150-kb windows across the genome. A windowed approach was used to facilitate the comparison of FST among populations sequenced using different technologies. We also estimated genetic differentiation between pairs of lake populations using Weir and Cockerham’s (1984) FST. We calculated these estimates by dividing the sum of the numerators of all SNPwise FST estimates within the window by the sum of their denominators. To estimate average FST accurately, we dropped windows if they contained fewer than three SNPs. To estimate repeatability, we correlated FST values of all windows across the genome between population pairs using Pearson’s correlation coefficients. Significance testing of individual correlation coefficients was done using the Hmisc package in R, and correction for multiple testing was done using the BH method (Benjamini and Hochberg 1995) with the p.adjust function. DXY (Nei and Li 1979) was also estimated for marine-freshwater comparisons in 150-kb windows, and these windows were used to estimate pairwise Pearson’s correlation coefficients. The results based on the correlation coefficients of marine-freshwater comparisons using DXY were qualitatively the same as those for FST and are reported in the supplemental material (see fig. S1). In recently diverged populations, segregating ancestral variation is an important determinant of DXY. Correspondingly, shared ancestral variation will result in overestimation of correlations in DXY for population pairs in the absence of divergent selection. Because we expected that the freshwater-freshwater comparisons used for the FST analysis are evolving neutrally, they were not a useful control for estimating the expected correlations in DXY.
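The windowing and repeatability steps described above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' R code: the function names, the input layout (one tuple of position, FST numerator, and FST denominator per SNP), and the choice to correlate only windows retained in both populations are assumptions made for the example. The 150-kb window size, the three-SNP minimum, the ratio-of-sums estimator, Pearson's r, and the Benjamini-Hochberg adjustment come from the text.

```python
from collections import defaultdict
from math import sqrt

WINDOW_BP = 150_000  # window size stated in the text
MIN_SNPS = 3         # windows with fewer SNPs are dropped


def windowed_fst(snps, window_bp=WINDOW_BP, min_snps=MIN_SNPS):
    """Ratio-of-sums FST per window.

    snps: iterable of (position_bp, fst_numerator, fst_denominator),
    a hypothetical layout for per-SNP Weir and Cockerham components.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for pos, num, den in snps:
        acc = sums[pos // window_bp]
        acc[0] += num
        acc[1] += den
        acc[2] += 1
    return {idx: num / den
            for idx, (num, den, n) in sums.items()
            if n >= min_snps and den > 0}


def pearson_r(x, y):
    """Pearson correlation; assumes both inputs have nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def repeatability(fst_a, fst_b):
    """Correlate two FST profiles over the windows retained in both
    (an assumption; the text does not say how unmatched windows are handled)."""
    shared = sorted(set(fst_a) & set(fst_b))
    return pearson_r([fst_a[w] for w in shared], [fst_b[w] for w in shared])


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank when P values are sorted ascending
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Summing numerators and denominators before dividing weights each SNP by its information content, which is why a window average computed this way can differ from the plain mean of per-SNP FST values.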

Correlations of Genome-Wide Differentiation among Marine-Freshwater Population Pairs and Neutrally Evolving Lake Population Pairs

The genome-wide pairwise Pearson’s correlation coefficients of FST values were used to compare the effects of parallel selection (i.e., one marine-freshwater population pair compared with another marine-freshwater population pair) relative to the neutral expectation (i.e., one pair of freshwater lakes compared with another independent pair of freshwater lakes). For a schematic of these two types of comparisons, see figure 2. To test for the effect of the type of selection (positive or background), we used linear models with average pairwise correlations of genome-wide differentiation as the response variable and divergent selection as the predictor (divergent or nondivergent). Significance testing was accomplished by resampling divergent selection categorizations 10,000 times and recalculating the mean correlation to form a null distribution to estimate the significance. For simplicity, this analysis was restricted to populations from the Pacific basin. We also excluded pseudoreplicated population comparisons in which the same lake population was included in both pairs (e.g., if pair 1 = Boot Lake and Roberts Lake and pair 2 = Misty Lake and Roberts Lake, the comparison would be excluded). However, we report a version of the analysis including these pairs in the supplemental material to show that the pattern holds regardless of pruning.
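The resampling test described above amounts to a label-permutation test on the difference in mean correlation between the two comparison types. The sketch below is one way to write it in Python; it is not the authors' code. The function name, the one-sided alternative (divergent greater than nondivergent), the add-one P value correction, and the fixed seed are assumptions made for illustration. Only the 10,000 resamples and the use of the mean correlation come from the text.

```python
import random


def permutation_test(divergent_r, neutral_r, n_perm=10_000, seed=0):
    """One-sided permutation test on the difference in mean correlation.

    divergent_r: correlations from marine-freshwater comparisons.
    neutral_r:   correlations from lake-lake comparisons.
    Returns (observed difference in means, permutation P value).
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = mean(divergent_r) - mean(neutral_r)
    pooled = list(divergent_r) + list(neutral_r)
    k = len(divergent_r)

    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign the selection categories at random
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With 10,000 resamples the smallest P value this estimator can return is about 0.0001, so the number of resamples sets the resolution of the reported significance.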

Figure 2. 

Schematic outlining the structure of pairwise population comparisons.

Contribution of Parallel Natural Selection and Geographic Proximity to Genome-Wide Repeatability in Differentiation

To test the effects of niche similarity and geography on genome-wide repeatability, we again used pairwise Pearson’s correlation coefficients of windowed estimates of FST. We used multiple regression analyses implemented in the ecodist package in R (999 permutations, multiple regression on distance matrices [MRM] function) for this analysis, evaluating the contribution of distance matrices quantifying ecology and geography. First, we quantified ecology and geography using binary variables, as “same freshwater niche” or “different freshwater niche” for ecology and “same ocean basin” (Pacific or Atlantic Ocean) or “different ocean basin” for geography. Second, we quantified ecology and geography using continuous or ordinal variables. Previous work has shown that the ecology and diet of stream populations are more similar to those of benthic populations, while the same factors in solitary lake populations tend to be more similar to those of limnetic populations (Berner et al. 2008, 2009). We gave populations occupying the same niche (e.g., benthic and benthic or stream and stream) a score of 3 (the maximum), populations occupying similar niches a score of 2 (e.g., benthic and stream or limnetic and lake), and populations occupying the most dissimilar niches a score of 1 (e.g., limnetic and stream or benthic and lake). We quantified geography as a continuous estimate of pairwise distance within the same ocean basin, determined by computing the Euclidean distance (square-root transformed for normality). Note that some population pairs included in these analyses come from the same watershed (e.g., stream and lake populations from Boot Lake). Accordingly, we randomly sampled one population from each of these pairs to run our analyses. We repeated this downsampling 512 times, which is all possible combinations of our pairs.
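The ordinal ecology score and the exhaustive downsampling can be illustrated as follows. This is a sketch under stated assumptions, not the authors' code. The scores of 3, 2, and 1 and the example pairings come from the text. Scoring every pairing not named there (for example, benthic and limnetic) as 1 is an assumption. The population labels and the count of six independent populations are hypothetical placeholders. The use of nine same-watershed pairs is inferred only from 512 = 2^9, since the text does not state the number of pairs.

```python
from itertools import product

# Niche pairings the text treats as "similar" (score 2).
SIMILAR_NICHES = {
    frozenset({"benthic", "stream"}),
    frozenset({"limnetic", "lake"}),
}


def ecology_score(niche_a, niche_b):
    """Ordinal ecological similarity: 3 = same, 2 = similar, 1 = dissimilar."""
    if niche_a == niche_b:
        return 3
    if frozenset({niche_a, niche_b}) in SIMILAR_NICHES:
        return 2
    return 1  # assumption: any other pairing counts as most dissimilar


def all_downsamples(independent, same_watershed_pairs):
    """Yield every population set that keeps exactly one member of each
    same-watershed pair, plus all populations that have no partner."""
    for choice in product(*same_watershed_pairs):
        yield list(independent) + list(choice)


# Hypothetical labels; nine pairs is inferred from 512 = 2**9.
pairs = [(f"pop_a_{i}", f"pop_b_{i}") for i in range(1, 10)]
independent = [f"pop_{i}" for i in range(1, 7)]
samples = list(all_downsamples(independent, pairs))
print(len(samples))  # 512
```

Because each pair contributes two choices, enumerating the full Cartesian product visits every combination exactly once, so summary statistics averaged over the 512 sets do not depend on a random seed.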

Results

Repeatability of Genomic Differentiation among Marine-Freshwater Population Pairs

There was considerable variation among marine-freshwater population pairs in the magnitude of genomic differentiation (see fig. S2 for a principal component analysis of the populations); mean genome-wide FST ranged from 0.25 to 0.71 (mean FST=0.47). There was also variation across the genome; genome-wide variance in FST ranged from 0.03 to 0.06 among population pairs (fig. 1B). Despite this variation, correlation coefficients (r) comparing windowed estimates of FST across population pairs ranged from 0.06 to 0.84 (mean r=0.38) and were significantly positive for all population pairs after correction for multiple testing (P<.05). Thus, the locations of genetic differentiation between marine and freshwater populations are often the same between independently derived population pairs.

Contribution of Background Selection to Repeated Genomic Differentiation

There was also considerable variation in the magnitude of genomic differentiation between the independent freshwater lake populations used as our neutral reference populations. Values of FST spanned a range from 0.12 to 0.73 (mean FST=0.48). However, FST was more similar between marine-freshwater population pairs (mean r=0.49) than between the lake-lake population pairs (mean r=0.07). The difference in the average magnitude of correlation between the two types of comparisons was significant using a permutation test (difference in mean r=0.42, P<.0001; fig. 3A). Thus, background selection does not account for the full extent of genomic repeatability.

Figure 3. 

A, Comparison of pairwise correlation coefficients for FST between marine-freshwater population pairs (evolving with parallel selection) or lake-lake population pairs (diverging neutrally). B, Comparison of pairwise correlation coefficients for marine-freshwater FST between population pairs found within the same ocean basins and in different ocean basins. Population pairs with the same ecology are indicated in light gray, and those with different ecology are indicated in dark gray.

Contribution of Parallel Natural Selection and Geographic Proximity to Repeated Genomic Differentiation

Together, ecology and geography explain a significant proportion of the variation in correlation coefficients comparing windowed estimates of FST across marine-freshwater pairs. For example, an MRM including both ecology and geography (quantified as “same” or “different” ecology and geography, respectively) explained 56% of the variation in FST correlation coefficients (R2=0.56, P=.0001). There was a negative relationship between both distance matrices and the FST correlation coefficients, suggesting that genomic differentiation is more repeatable when ecology is more similar (r=−0.07, P=.06) and populations are closer (r=−0.27, P=.0001 [averages over repeated downsampling; ecology was a significant predictor in 57% of samples, geography in 100%]). The regression coefficient for geography was much larger in magnitude than that for ecology, suggesting that geography explains a larger portion of the variation. We obtained similar results when quantifying ecology (fig. S3) and geography (fig. S4) using continuous variables (MRM, geography quantified using Euclidean distance, R2=0.54, P=.0003; for ecology, r=−4.7×10−2, P=.08; for geography, r=−3.7×10−5, P=.0003; ecology was significant in 45% of downsamples and geography in 100%). Ecology and geography are not on the same scale in this analysis, preventing a direct comparison of their coefficients. However, when we reran this analysis excluding ecology, the model R2 was 0.49 (vs. 0.54 when ecology was included), suggesting that geography explains a larger portion of the variation in FST repeatability. When controlling for divergence time from the marine reference by limiting pairwise comparisons to only those within the same ocean basin, we still find that both more similar ecology and closer geographic proximity result in higher FST correlation coefficients (figs. S3, S4).
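For readers unfamiliar with how an MRM yields the model R2 and permutation P value reported here, the sketch below shows the general logic in Python. It is a simplified stand-in, not the ecodist implementation the authors used: it tests only the overall R2 (ecodist also tests individual coefficients), and all function names, the seed, and the add-one P value correction are assumptions made for illustration. The defining step is that rows and columns of the response matrix are permuted together, which preserves the non-independence of pairwise values that share a population.

```python
import random
from itertools import combinations


def unfold(matrix):
    """Unique off-diagonal entries of a symmetric matrix, in a fixed order."""
    return [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)]


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def r_squared(y, predictors):
    """R2 of an ordinary least-squares fit of y on the predictors plus intercept."""
    rows = [[1.0] + [p[k] for p in predictors] for k in range(len(y))]
    m = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(m)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


def mrm(response, predictor_matrices, n_perm=999, seed=0):
    """Return (R2, permutation P value) for a response matrix of pairwise
    values regressed on one or more predictor distance matrices."""
    x = [unfold(p) for p in predictor_matrices]
    observed = r_squared(unfold(response), x)
    rng = random.Random(seed)
    idx = list(range(len(response)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        # Permute rows and columns together so each population keeps its values.
        permuted = [[response[i][j] for j in idx] for i in idx]
        if r_squared(unfold(permuted), x) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Comparing the R2 of a model fitted with and without one predictor matrix, as done above for ecology, gives a scale-free view of that predictor's contribution when the raw coefficients are not comparable.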

Discussion

Repeatability of Genome-Wide Differentiation Is Not due to Background Selection

In contrast to recent work in other taxa, we do not find strong evidence that similarity in genome-wide patterns of differentiation has been driven by background selection, as pairs of populations evolving in the absence of divergent selection show little similarity in the genomic locations of genetic differentiation. This finding was not an artifact of different magnitudes of divergence between population comparisons with and without parallel divergent selection, as average FST was essentially the same in marine-lake and lake-lake comparisons. This pattern also suggests that drift is unlikely to be a key player in generating strong correlations in genomic differentiation. Given our results, we argue that genome-wide correlations in differentiation are truly reflective of shared positive selection and likely parallel genetic evolution in some cases. Our results are in line with recent work suggesting that the effects of background selection on differentiation are modest (Zeng and Charlesworth 2011; Matthey-Doret and Whitlock 2018; Zeng and Corcoran 2018) and cannot explain the extreme patterns of differentiation documented in empirical studies (Irwin et al. 2016, 2018).

The natural history of this species is consistent with our findings. Since all freshwater populations examined are postglacial, and therefore less than 12,000 years old, it is unlikely that there has been sufficient novel mutation in all populations to generate large-scale parallel divergence due to linked background selection alone. The idea that there have been insufficient novel mutations in this system is supported by the key role of standing genetic variation in adaptation to freshwater (e.g., Jones et al. 2012). Other species where there has been rapid adaptation aided by standing genetic variation (e.g., African cichlids) may also have only weakly correlated patterns of genetic differentiation in the absence of parallel selection. Differences in the stage of speciation may explain the conflicting results observed across various taxa. Specifically, the populations of sticklebacks compared here are in the very early stages of speciation, and it was recently suggested that background selection will have a greater effect on genomic differentiation later in the speciation process. This suggestion derives from the fact that drift and negative selection will take time to influence differentiation (Burri 2017; Delmore et al. 2018). Regardless of the age of the study system, control comparisons, such as those we employ here with nondivergent lake-lake comparisons (and see Vijay et al. 2016), will provide an important reference point for researchers interested in measuring the contribution of parallel selection to the generation of genomic parallelism.

Parallel Selection and Geographic Proximity both Contribute to Repeated Genomic Differentiation

Populations of stickleback with more similar ecological niches and in closer geographic proximity were found to have more similar patterns of genomic differentiation. This finding suggests that repeatable genome-wide patterns of genetic differentiation are indeed predicted to some degree by parallel natural selection. Interestingly, we find that geographic proximity is a much better predictor of repeated genome-wide differentiation (e.g., a coefficient of −0.27 for geography vs. −0.07 for ecology in the MRM with both factors coded as “same” or “different”). This finding is consistent with mathematical modeling, which has suggested an important role for the geographic structure of populations in determining the probability of genetic convergence (Ralph and Coop 2015). The finding that ecological similarity explains less variation than geographic proximity is also consistent with previous empirical work emphasizing the importance of factors other than ecology. For example, Renaut et al. (2014) examined genomic repeatability between three sister pairs of sunflowers and found that while FST correlation coefficients were highest for pairs that diverged along the same selective gradient (latitude), this factor explained a relatively small fraction of the variation (4%). Previous work in Littorina has also found limited support for ecological similarity explaining broad patterns of genomic repeatability (Ravinet et al. 2016). However, it is important to consider that our metric of ecological similarity is based only on niche type and is therefore very coarse. Multidimensional variation in the selection landscape would perhaps explain more variation in the magnitude of repeatability exhibited among population pairs.

Similarity of abiotic agents of selection, gene flow, and initial pools of genetic variation could drive the substantial contribution of geographic proximity to repeated genomic differentiation. Geographically proximate populations likely experience more similar abiotic selective pressures (for example, temperature or mineral availability), which would conceivably lead to greater repeatability in genome-wide differentiation. In some watersheds there is also ongoing gene flow between marine and freshwater fish (e.g., Jones et al. 2012; Vines et al. 2016), which could lead to the sharing of adaptive alleles between geographically proximate freshwater populations. However, there is currently no evidence of direct gene flow between watersheds, as freshwater fish are often landlocked and are unlikely to survive migration through high-salinity ocean waters.

More similar pools of initial variation among geographically proximate marine colonizers may also promote increased repeatability given the importance of standing genetic variation in the marine ancestors for adaptation to freshwater in sticklebacks (Colosimo et al. 2005; Schluter and Conte 2009; Jones et al. 2012). Adaptation from standing genetic variation provides a reduced waiting time for fixation of beneficial alleles, relative to novel mutation, because beneficial alleles are immediately available and start at higher initial frequencies (Innan and Kim 2004). Theoretical work has also shown that the probability of fixation is higher for alleles drawn from standing variation (Orr and Betancourt 2001). During rapid adaptation, the fixation advantages conferred by standing genetic variation may lead to biases in the frequency of gene use over the course of evolution, where loci with standing genetic variation contribute to adaptation more frequently than those that require variation to be generated through novel mutation. Thus, when comparing the location of differentiation among independently derived population pairs, there may be greater repeatability when there is a more similar initial pool of variation.

Theoretical work also predicts that when comparing across populations founded by ancestors with similar pools of standing genetic variation, loci with standing genetic variation will exhibit more similar patterns of evolution (MacPherson and Nuismer 2017). Laboratory experiments conducted under parallel selective regimes are also consistent with this prediction, as adaptation from standing genetic variation has been shown to lead to greater genetic parallelism than novel mutation (Teotónio et al. 2009). Thus, shared standing variation likely plays a key role in generating the considerable levels of repeatability seen in the locations of differentiation among freshwater stickleback populations. More broadly, our findings may suggest that parallel selection alone is unlikely to generate strong patterns of genomic parallelism and that other genetic factors that bias evolutionary trajectories, such as the source of variation, may be important determinants of patterns of parallel evolution.

We are very grateful for the threespine stickleback research community, whose body of empirical work made this study possible. We also thank Catherine Peichel for useful comments on the manuscript. K.S., G.L.O., D.J.R., and K.E.D. were supported by Natural Sciences and Engineering Research Council graduate doctoral scholarships. All authors were supported by graduate fellowships from the University of British Columbia. This project received funding from the European Union’s Horizon 2020 research and innovation program under Marie Skłodowska-Curie grant agreement 794277-PLEVOCON (to D.J.R.).

Published genomic data sets: The original study references and accession numbers are listed in table S1. Referenced bioinformatic analysis code: https://github.com/ksamuk/gene_flow_linkage. Additional input files and R scripts can be found in the Dryad Digital Repository (https://doi.org/10.5061/dryad.34q91f0; Rennison et al. 2020).


A freshwater male stickleback in the water. Photo credit: Thor Veen.

Associate Editor: Daniel R. Matute

Editor: Alice A. Winn