Skip to main content
Open AccessE‐Article

Phylogenetic Analysis of Trophic Associations

1. Department of Zoology, University of Wisconsin, Madison, Wisconsin 53706;2. Natural Environment Research Council Centre for Population Biology, Division of Biology, Imperial College London, Silwood Park Campus, Ascot, Berkshire SL5 7PY, United Kingdom


Ecologists frequently collect data on the patterns of association between adjacent trophic levels in the form of binary or quantitative food webs. Here, we develop statistical methods to estimate the roles of consumer and resource phylogenies in explaining patterns of consumer‐resource association. We use these methods to ask whether closely related consumer species are more likely to attack the same resource species and whether closely related resource species are more likely to be attacked by the same consumer species. We then show how to use estimates of phylogenetic signals to predict novel consumer‐resource associations solely from the phylogenetic position of species for which no other (or only partial) data are available. Finally, we show how to combine phylogenetic information with information about species’ ecological characteristics and life‐history traits to estimate the effects of species traits on consumer‐resource associations while accounting for phylogenies. We illustrate these techniques using a food web comprising species of parasitoids, leaf‐mining moths, and their host plants.

Why are some plant species attacked by many more species of herbivore than others (Strong et al. 1984)? And why do some parasite species have a much broader host range than more specialized species (Price 1980; Combes 2001)? These are examples of questions that can be asked of data sets that describe how two adjacent trophic levels in a food web are linked together. Not all food webs have species organized in distinct trophic levels, but many do, or do to a good approximation, and increasingly quantitative data are becoming available that describe large sets of interacting species (van Veen et al. 2005).

Most analyses of these types of questions have either assumed that data from different species are statistically independent or considered the possibility of phylogenetic correlation at one trophic level but not the other. As is now widely appreciated, there is a danger of assuming that data from species are statistically independent; closely related species may share traits simply because they have inherited them from a common ancestor (Felsenstein 1985; Harvey and Pagel 1991). For example, swallows and martins (Hirundinae) reuse their nests year after year and are attacked by a relatively large number of species of insect ectoparasites (Rothschild and Clay 1952). However, this may have nothing to do with nesting habits and just be due to the common lousy ancestor of the clade; some other unmeasured trait might cause the high ectoparasite burden, with all swallows and martins sharing this unfortunate trait due to their common phylogenetic history. As Charles Darwin pointed out, the traits of species often reflect their evolutionary history, and when species traits affect the range of hosts they attack or the range of parasites that attack them, phylogenetic patterns of host‐parasite associations are inevitable.

For analyses restricted to one trophic level, a wide range of statistical techniques is now available to estimate the extent of phylogenetic signal and to accommodate phylogenetic correlation in hypothesis testing. There are, however, few methods that simultaneously allow phylogenetic information from both trophic levels to be incorporated into the analysis of patterns of association. Although Page and colleagues (Page 1994, 2000; Charleston 1998; Page et al. 1998) developed methods to detect cocladogenesis (cospeciation) in ectoparasites of birds and similar problems (Charleston 1998), here the parasites are monophagous or restricted to small clades of hosts, and phylogenetic processes dominate. Their techniques are not designed to explore patterns in more connected assemblages, where ecological determinants of trophic links may be as strong as or stronger than the effects of phylogeny.

The aim of this article is to develop methods for assessing the strength of phylogenetic signal that determines the patterns of association between consumers and resources in simple food webs. Specifically, we ask whether a given resource species is more likely to be used by phylogenetically related species of consumers and whether a given consumer species is more likely to use phylogenetically related resource species. Our objective is to identify the presence of phylogenetic signal (Blomberg et al. 2003) without attempting to ascribe any adaptive explanation; identifying phylogenetic correlations is a statistical issue, whereas ascribing adaptive explanations goes far beyond statistics (Blomberg and Garland 2002). We also ask whether it is possible to predict how a novel introduced species will fit into the food web, based on its phylogenetic position. Our motivation here is to help predict which native resource species will probably be at risk from an invasive consumer and which native consumers may be effective control agents for invasive resource species. Finally, we develop methods for regression analyses in which the strength of consumer‐resource associations is regressed on life‐history or environmental factors of consumer and/or resource species while incorporating phylogenetic signals. Incorporating phylogenetic signals into regression analyses will lead to the most precise estimates of regression coefficients. Throughout, we take a statistical approach, so parameter estimates and predictions are made with confidence intervals.

To illustrate and test the techniques for analyzing the cophylogenies of resources and consumers, we explore the patterns of association in a quantitative host‐parasitoid food web. Parasitoids are insects that lay their eggs in or on the bodies of other insects, their hosts, which provide all the resources the developing parasitoid needs to reach maturity (Godfray 1994; Quicke 1997). Because of parasitoids' importance in biological control, there are abundant data on the parasitoid species complement of different hosts, and fewer but still numerous data on the number of host species attacked by different parasitoids. Analysis, largely without incorporating phylogenetic information, has suggested a variety of possible patterns, such as the effect of host geographical range on the number of parasitoid species that attack them and the effect of parasitoid feeding strategy (internally or externally on the host) on the host species they attack (Hawkins 1994). Recently, a number of quantitative host‐parasitoid food webs have been constructed in which not only are the suitable hosts established for each parasitoid species but the strength of interaction between each host‐parasitoid pair is quantified by the parasitism rates (e.g., Müller et al. 1999; Lewis et al. 2002). These quantitative food webs can be used to explore community structure and the potential for indirect interactions. Because these webs describe the densities of hosts, parasitoids, and their associations in the same units, they are ideal for the quantitative exploration of the strength of phylogenetic signal in determining food web structure.

The food web we study includes 12 leaf‐mining moths and 27 species of parasitoid wasps. We first ask whether there is phylogenetic signal in the observed associations between hosts and parasitoids and whether this signal acts through the host and/or parasitoid phylogeny. In addition to considering the phylogeny of the leaf miners, we also ask whether the host‐parasitoid associations could be explained by the phylogeny of the plant species used by the leaf miners. This would occur if the parasitoid host range and foraging strategy are determined, at least in part, by the plant species used by the leaf miners (Vinson 1997). Once parameters for the phylogenetic signals are estimated, it is possible to use the phylogenies (discounted by the estimates of signal strength) to predict the host range for a novel parasitoid from the parasitoid’s location on its phylogenetic tree and to predict the potential parasitoids of a novel host from the host’s location on its phylogenetic tree. Finally, we consider factors that may influence the susceptibility of hosts to parasitism and factors that may influence the efficacy of parasitoids searching for hosts. Specifically, we ask whether hosts with broader geographical ranges experience parasitism from a greater number of parasitoid species (Hawkins and Lawton 1987) and whether parasitoids that feed externally on hosts (ectoparasitoids) attack a greater number of host species than those that feed internally on still‐living hosts (endoparasitoids; Askew and Shaw 1986; Sheehan and Hawkins 1991).


Comparative Methods for Interacting Trophic Levels

Consider information on the strengths of association between n resource species and m consumer species in a food web or similar data structure. For narrative simplicity, we henceforth call the resources “hosts” and the consumers “parasitoids.” Let the association strengths (whose exact form we need not specify for the moment) be denoted Aik ($$i=1,\:\ldots ,\: n$$; $$k=1,\:\ldots ,\: m$$). The total number of associations (including zeros) is $$N=nm$$.

Assume that there is a host trait Xh and a parasitoid trait Xp that together affect the hosts’ susceptibility to parasitism and the parasitoids’ effectiveness at finding and attacking hosts. These traits are assumed to depend on phylogenetic relatedness so that closely related host or parasitoid species are more likely to have similar values of Xh or Xp, respectively. We assume that phylogenetic relatedness gives rise to covariance matrices V and U, whose elements $$v_{ij}$$ and ukl describe the covariance in the trait values of Xh and Xp for pairs of host and parasitoid species; note that for this indexing system, sequential letters are used for hosts (i and j) and parasitoids (k and l). For our analyses, we standardize the variances in the covariance matrices V and U by setting all diagonal elements equal to the maximum diagonal element and then dividing all elements in the matrix by this maximum; this is equivalent to extending the tip branch lengths for species to make them all contemporaneous in the phylogenetic tree. Contemporaneous tips imply that the variance in trait values is the same for all species in the phylogenetic tree (Grafen 1989; Harvey and Pagel 1991), thereby making V and U correlation matrices. This standardization, however, is not needed for any of the statistical computations; the techniques we present can be applied to any type of covariance matrices V and U. Therefore, if there is good statistical or biological reason to weight species differently or use phylogenetic trees with noncontemporaneous tips, the procedures can be implemented without modification.

We assume that the strength of the host‐parasitoid association is given by the product of the values of the host and parasitoid traits Xh and Xp, so the strength of association between the i‐k and j‐l host‐parasitoid pairs is $$v_{ij}u_{kl}$$. Therefore, for a given host species i, the phylogenetic correlation between attacks from parasitoids k and l is given by the parasitoid phylogeny ukl (since $$v_{ii}=1$$). Similarly, for a given parasitoid species k, the phylogenetic correlation between attacks on hosts i and j is given by the host phylogeny $$v_{ij}$$. If both hosts and parasitoids are phylogenetically related, then attacks between the i‐k and j‐l host‐parasitoid pairs are correlated. However, a positive covariance between the strengths of association of two host‐parasitoid pairs occurs only if both the host and parasitoid species are phylogenetically related ($$v_{ij}> 0$$ and $$u_{kl}> 0$$). This produces a joint evolutionary model because traits Xh and Xp must both evolve down the host and parasitoid phylogenies for strengths of association between different host‐parasitoid pairs to be correlated. Additional models for combining host and parasitoid phylogenies are possible, and the statistical techniques presented below can be deployed for any reasonable model.

The strength of phylogenetic association between the n host and m parasitoid species in the web can be modeled statistically as follows. First, take the $$n\times m$$ matrix with elements Aik and rearrange it into the $$N\times 1$$ vector A by “stacking” columns; $$\mathbf{A}=[ A_{1,\, 1},\: A_{2,\, 1\:},\:\ldots ,\: A_{n,\, 1},\:\ldots ,\: A_{n,\, m}] ^{\prime }$$, where the prime denotes transpose. Let

where b0 is the phylogenetically corrected mean of association strength, and ε is an $$N\times 1$$ vector of zero‐mean random variables (not necessarily normally distributed) with covariance matrix $$\mathrm{E}\,[ \varepsilon \varepsilon ^{\prime }] =\mathbf{W}$$. The phylogenies of hosts and parasitoids are incorporated into the matrix W. Specifically, for the evolutionary model described above, $$\mathbf{W}=\mathbf{U}\otimes \mathbf{V}$$, where $$\otimes $$ is the Kronecker (outer) product.

Measuring phylogenetic signal. The strength of phylogenetic signal depends on the magnitudes of the covariances in the host and parasitoid covariance matrices V and U, with greater values of $$v_{ij}$$ and ukl corresponding to stronger phylogenetic signal through the host and parasitoid phylogenies, respectively. A common assumption in phylogenetic analyses is that the evolution of a trait on a phylogenetic tree can be described as a Brownian motion process in which the value of a trait changes randomly and incrementally through time. In this case, the distribution of trait values among species follows a multivariate normal distribution, with the covariance in trait values between two species being proportional to the branch length of their shared phylogeny (Felsenstein 1985; Martins and Hansen 1997; Garland and Ives 2000). Here we use a more flexible model based on an Ornstein‐Uhlenbeck (OU) process that can incorporate stabilizing selection. Specifically, Blomberg et al. (2003) derived a version of the OU process in which there is a single optimum value of a trait for all species in the phylogeny. The stronger the selection toward the global optimum, the weaker the phylogenetic signal because phylogenetic correlations are degraded through time as species traits are dictated more by selection. This version of the OU process leads to a formula in which a parameter d determines the strength of phylogenetic signal, with $$d=1$$ (no selection) corresponding to the Brownian motion assumption, $$0< d< 1$$ representing stabilizing selection, $$d=0$$ depicting the absence of phylogenetic correlation (a “star” phylogeny), and $$d> 1$$ corresponding to disruptive selection, in which the phylogenetic signal is amplified (app. A). Although we use this formula to estimate phylogenetic signal, we do not mean to ascribe any observed phylogenetic signal to stabilizing selection; instead, we use the OU formulation to give a reasonable statistical description of phylogenetic signal that could be produced by any number of processes.

To allow for differing strengths of signal in the host and parasitoid phylogenies, we estimate dh and dp as separate parameters for host and parasitoid lineages to give the covariance matrices V(dh) and U(dp). While several estimation procedures are possible (e.g., maximum likelihood, restricted maximum likelihood), here we use a version of estimated generalized least squares that does not assume the distribution of association strength among hosts and parasitoids is normally distributed (Judge et al. 1985). We then used bootstrapping (Efron and Tibshirani 1993) to obtain confidence intervals of the parameter estimates (app. A).

Predicting the strength of novel associations. Suppose that the phylogenetic position of a host is known, but no information is available about the strengths of association with parasitoids in the community. In appendix B we show how these strengths can be estimated (with confidence limits), and similar formulas can be derived for novel parasitoids. Furthermore, it is possible to make more informed predictions. Suppose, for example, for a new host species, one knows the parasitoid attack rate by some but not all parasitoid species. The known information can be incorporated into the estimation procedure, and the strength of unknown associations can be estimated.

Analyses with covariates. The statistical model for host‐parasitoid associations in equation (1) depends simply on the phylogenetic signal. However, information about phylogenetically correlated or uncorrelated traits that may help explain the strengths of association can also be incorporated in the model, and their explanatory power may be tested statistically. This leads to a regression analysis in which host‐parasitoid association strength is regressed on independent variables, with phylogenetic signal included in the process error (residual) terms (e.g., Felsenstein 1985, 2004; Grafen 1989; Harvey and Pagel 1991; Garland et al. 1993; Martins and Hansen 1997). The statistical motivation for incorporating phylogenetic signal is that this produces the “best” (minimum variance) estimates of the regression coefficients (Judge et al. 1985; Rohlf 2001).

Suppose information is available about M measured traits. Let

where X is a $$N\times ( M+1) $$ matrix whose first column contains 1's and whose remaining M columns contain values of covariates (independent variables). The vector $$\boldsymbol{b}=[ b_{0},\: b_{1},\:\ldots ,\: b_{M}] ^{\prime }$$ contains regression coefficients, with b0 denoting the intercept and bq ($$q=1,\:\ldots \: ,\: M$$) denoting the coefficients for the M covariates. Parameter estimation and calculation of confidence intervals proceed in the same way as for equation (1) (app. A).

Tests with a Leaf Miner–Parasitoid Web

We explore the use of the comparative methods developed above using data from a food web that describes the associations between 12 leaf‐mining moths in the genus Phyllonorycter (Gracillariidae) and the 27 species of parasitoid wasps (Hymenoptera) that attack them. Full details of the construction of the web and the species involved are given in Rott and Godfray (2000). Briefly, the study site was a 10,000‐m2 area of woodland containing four species of trees. The trees were sampled twice each in 1992 and 1993 at times corresponding to the two annual generations of the leaf miners. A total of 8,242 miners were collected and reared in the laboratory, and the parasitoids were identified. The leaf miners collectively used all four tree species, Alnus glutinosa L., Salix cinerea L., Quercus robur L., and Betula pubescens L., but each leaf miner species was found on only a single tree species.

Phylogenetic data. Phylogenies were available for all three trophic levels in the food web. The four species of host plant belong to four different genera whose phylogenetic relationships are well supported by both morphological and molecular data (Soltis et al. 2000). The 12 host species are all included in a phylogeny of the genus Phyllonorycter constructed by Lopez‐Vaamonde et al. (2003) and based on molecular (28S rDNA) data. The parasitoids include 19 members of the family Eulophidae that were placed on a phylogeny largely using Gautier et al.’s (2000) study based on sequence data (again 28S rDNA) from members of all the genera found in the web. The other parasitoids were incorporated based on long‐standing morphological hypotheses about hymenopteran phylogeny (Gauld and Bolton 1988). We recognize that the topologies of the host and parasitoid phylogenetic trees are not known with complete certainty and that if the uncertainty were large, it might affect results of phylogenetic analyses (Purvis and Garland 1993; Garland and Díaz‐Uriarte 1999; Housworth and Martins 2001; Huelsenbeck and Rannala 2003). Nonetheless, we have high confidence in the topologies of our phylogenetic trees, and our results are unlikely to be affected by topological uncertainties.

Rott and Godfray (2000) noted that hosts attacking the same tree species often tended to have similar parasitoid complexes even if the hosts were not closely related. To test for a signal associated with the phylogeny of the tree species, we created a “pseudophylogeny” for the hosts based on the tree phylogeny. Specifically, we constructed the plant phylogeny with branch lengths selected to produce contemporaneous tips and then attached each host species to these tips to give a polytomy with branches equal to the shortest branch on the tree phylogeny.

A phylogenetic signal from the plant phylogeny on host‐parasitoid associations could be a spurious result: if host‐parasitoid associations have a phylogenetic signal from the host species, and if plant‐host associations have a phylogenetic signal from the plant species, then one would expect host‐parasitoid associations to be apparently related to the plant phylogeny. To guard against this, we compared the explanatory power of host and plant phylogenies. We assumed that the phylogenetic correlation between host species in trait Xh was given by the covariance matrix $$\mathbf{V}( w) =\mathbf{V}_{\mathbf{t}}+w\mathbf{V}_{\mathbf{h}}$$, where Vh and Vt are the covariance matrices derived from host and plant (pseudohost) phylogenies, assuming Brownian motion evolution, and w is a parameter that weights the importance of the two phylogenic signals. We used this base matrix to construct V(dh, w) as done previously under the assumption of an OU process acting on the aggregate phylogeny given by V(w). The relative explanatory power of the two phylogenies can then be tested by estimating w with its 95% confidence interval (app. A).

This procedure for separating the effects of host phylogeny from plant phylogeny is simple for our data set because each host species is only found on a single plant. Nonetheless, this approach can be modified for more general tritrophic systems as follows. Suppose each host was found on multiple plants. Thus, a given host i attacked by parasitoid k occurred on plants q and r. The association between host i and parasitoid k can be partitioned into the association strength of plant q, Aikq, and the association strength of plant r, Aikr, with $$A_{ikq}+A_{ikr}=A_{ik}$$. This partitioning of associations can be incorporated into the plant phylogenetic tree by introducing polytomies to each tip corresponding to a different host that is found on the plant; thus, the number of tips on the tree would equal the number of plant‐host associations. Similarly, the host phylogenetic tree could be modified by introducing polytomies to each tip corresponding to a different plant used by the host. The resulting covariance matrices Vt and Vh could then be used as described above.

Strength of association. We used three different measures of the strength of the association between host and parasitoid species: (1) the overall parasitoid attack rate on each host species, a measure of the total mortality inflicted by a parasitoid influenced by both parasitoid abundance and selectivity; (2) the per capita parasitoid attack rate, simply the first measure divided by the abundance of a parasitoid species and hence a measure of selectivity; and (3) a binary measure of whether the host‐parasitoid association had been observed. Which measure is most appropriate depends on the precise hypothesis that is being tested.

To derive the first measure, let Hi denote the number of hosts of species i (parasitized or not), and let Fik denote the number of hosts of species i parasitized by species k. Assuming that parasitoids search for hosts of a given species randomly, the proportion of hosts escaping parasitism is the zero term of a Poisson distribution, and therefore

where Aik is the net attack rate of parasitoid k on host i. Rearranging equation (3) gives
Thus, Aik is our first measure of association between hosts and parasitoids.

Because Aik depends on both parasitoid selectivity and parasitoid abundance, to obtain the second measure of association aik that depends only on selectivity, we divided Aik by the relative abundance of parasitoid species. We considered two methods of determining parasitoid abundance. The first is to use the average value of Aik from all host species i from which parasitoid k was reared. In this case, the zeros for host species that the parasitoid does not attack are not included in the average. This assumes that the number of host individuals parasitized by a species is a good indicator of the population size of the parasitoid species. The second way to determine parasitoid abundance is to use the average value of Aik calculated for all host species i, thereby including the zeros for host species that are not attacked by parasitoid k. Calculating the average value of Aik in this way would produce higher estimates of the per capita attack rate for parasitoid species that attack fewer host species because factoring zeros into the estimates of the average value of Aik and then dividing each Aik by this average will lead to higher values of aik. These high values of aik would in effect assume that individuals from parasitoid species attacking fewer host species are “lost” when searching for unsuitable hosts. Due to this unrealistic assumption when averaging Aik over all host species, we prefer the former procedure to obtain relative parasitoid abundances and calculate aik, and we present the results for this procedure here. Results for aik calculated by the alternative procedure are given in appendix C. Preliminary diagnostics (not presented) supported applying a square root transform to both Aik and aik at the onset of the analyses.

For the third measure of association, we simply assigned a value of one to host‐parasitoid pairs for which the parasitoid was reared from the host and zero otherwise. Using a binary dependent variable raises some technical problems with bootstrapping to obtain confidence intervals for parameter estimates, and these are discussed in appendix A.

Covariates. We investigated the influence of two covariates on the association between host and parasitoid species. We specifically wanted to test whether the two covariates affected whether parasitoid species attacked host species, and we consequently used the simple binary measure of association. The two covariates we investigated were the geographic range of a host species and whether the parasitoid was ecto‐ or endoparasitic. We anticipated a positive correlation between host geographical range and the number of parasitoids attacking the host. Broader geographical ranges expose host species to a greater diversity of parasitoids, and hence, hosts with broad ranges will probably have a greater number of parasitoids that have evolved to attack them (Godfray 1994; Hawkins 1994). Host geographical range was measured by the number of Watsonian vice‐counties (roughly equal units of area within Great Britain and Ireland) in which the host species was found in the British Isles by using range maps in Emmet et al. (1985). We anticipated that ectoparasitoids would have a broader host species range than endoparasitoids because they do not need to live within their still‐living hosts and therefore require fewer adaptations to their host’s physiology (Askew and Shaw 1986; Sheehan and Hawkins 1991; Godfray 1994).

Computer code written in Matlab (ver. 7.0, MathWorks 2005) for the core analyses in the article (table 1) is given in appendix D.

Table 1:

Phylogenetic signal in parasitoid‐host associations in which




measure the strength of signal from the host and parasitoid phylogenies, respectively

Dependent variable, host phylogenydhdpMSEdMSEstarMSEb
Vh.40 (.09, .77)0 (0, .14).010.011.012
Vt1.30 (.90, 1.78).19 (0, .46).0052.011.0059
Vh.45 (.25 .67).086 (0, .24).24.27.34
Vt.63 (.45, .79).19 (.04, .35).20.27.25

Note: Two phylogenies were used for the hosts: the true host phylogeny Vh and the pseudophylogeny constructed from trees they attack, Vt. Dependent variables are the parasitoid attack rate Aik and the per capita parasitoid attack rate aik, both square root transformed. Mean squared errors (MSE) are given for the cases when dh and dp have estimated values (MSEd), when $$d_{\mathrm{h}\,}=0$$ and $$d_{\mathrm{p}\,}=0$$ (MSEstar), and when $$d_{\mathrm{h}\,}=1$$ and $$d_{\mathrm{p}\,}=1$$ (MSEb). Ranges in parentheses are 95% approximate confidence bounds obtained from bootstrapping.

View Table Image


Phylogenetic Signal

For both the parasitoid attack rate Aik and the per capita parasitoid attack rate aik, we estimated the strengths of phylogenetic signal in the host and parasitoid phylogenies, dh and dp. For the host, we used both the host phylogeny to give the covariance matrix for host phylogenetic relatedness, Vh, and the phylogeny created by grafting host species onto the phylogeny of the plant species they use, Vt.

Both parasitoid attack rates Aik and per capita parasitoid attack rates aik showed moderately strong phylogenetic signals through the host phylogeny, whereas the signal through the parasitoid phylogeny was weak or nonexistent (table 1). The overall strength of the phylogenetic signal can be assessed by comparing the mean squared error (MSE) calculated for the full model (MSEd), the MSE derived under the assumption of no phylogenetic covariances (a “star phylogeny”; MSEstar), and the MSE derived assuming Brownian motion evolution (MSEb; app. A). Lower values of MSE indicate better fit of the specific model to the data, in that the model with the lowest MSE leaves the smallest unexplained variance (app. A). The MSE can also be interpreted heuristically as an estimate of the rate of evolution (Blomberg et al. 2003); the lower the MSE, the less evolutionary change is needed to explain the observed variance in the data.

The values of MSEd derived using the host plant covariance matrix Vt are smaller than those obtained from the host covariance matrix Vh (table 1). This suggests that there is a stronger signal from the plant phylogeny than the host phylogeny in the strength of host‐parasitoid association. In assessing the relative strengths of the two, values of dh cannot be compared directly. However, comparing the covariance matrices Vh(dh) and Vt(dh) at their respective optimal values of dh indicates the strength of covariances estimated from the data. Figure 1 gives a visual depiction of the strength of signal in the value of Aik from the host (fig. 1A) and plant (fig. 1B) phylogenies; the stronger signal from the latter is visible as longer shared branch lengths (higher covariances).

Figure 1:

A, Phylogeny for host species of the genus Phyllonorycter, and B, phylogeny created by grafting host species onto the tips of the phylogeny of the tree species they attack. Branch lengths of the phylogenies were computed from the covariance matrices Vh(dh) and Vt(dh), with values of dh fitted from the model of parasitoid attack rates, Aik (table 1). The longer shared branch lengths of second phylogeny (B) indicate greater covariances and hence stronger phylogenetic signal from the phylogeny of host species grafted onto the phylogeny of trees they attack.

We derived a statistical test to compare the signals from the host phylogeny and the plant phylogeny using the covariance matrix $$\mathbf{V}=\mathbf{V}_{\mathbf{t}}+w\mathbf{V}_{\mathbf{h}}$$ and estimating the value of w along with dh (table 2). For Aik, the estimate of w was 0, and for aik, the 95% confidence interval of w contained 0. These results show no statistical evidence for an effect of host phylogeny on the strength of host‐parasitoid association after accounting for an effect of plant phylogeny.

Table 2:

Phylogenetic signal from the tree pseudophylogeny versus the host phylogeny using the host covariance matrix

Dependent variabledhdpwMSEdMSEstarMSEb
$$\sqrt{A_{ik}}$$1.14 (.96, 1.31).19 (0, .45)0.0052.011.0078
$$\sqrt{a_{ik}}$$.87 (.71, .97).19 (.02, .33).092 (0, .33).20.27.27

Note: Dependent variables are the parasitoid attack rate Aik and the per capita parasitoid attack rate aik, both square root transformed. Mean squared errors are given for the cases when dh and dp have estimated values (MSEd), when $$d_{\mathrm{h}\,}=0$$ and $$d_{\mathrm{p}\,}=0$$ (MSEstar), and when $$d_{\mathrm{h}\,}=1$$ and $$d_{\mathrm{p}\,}=1$$ (MSEb). Ranges in parentheses are 95% approximate confidence bounds.

View Table Image

The presence of a signal from the host phylogeny when plant phylogeny is not included (table 1) could be the result of phylogenetic correlation in association between host and plant species. We analyzed the pattern of plant‐host association using the same approach as we used to analyze the strength of host‐parasitoid association, although we used a simple binary measure of whether the leaf miner fed on the tree species. There was no signal through the plant phylogeny ($$d_{\mathrm{t}\,}=0$$) and a weak signal through the host phylogeny ($$d_{\mathrm{h}\,}=0.12$$), although it was not statistically significantly different from 0. Therefore, there is no statistical evidence that the signal from the host phylogeny in the strength of host‐parasitoid associations is driven solely by a phylogenetic association between hosts and plants. The relatively small sample size (four tree species), however, limits the statistical power of this test.

Predicted Associations for Novel Hosts or Parasitoids

When phylogenetic signals are strong, phylogeny can be used to predict the association between new hosts or parasitoids that are introduced into a community. To illustrate these predictions, consider the case of predicting the attack rate by all parasitoids in a community on a novel host species z, given by the vector Az. Because the signal of the plant phylogeny is stronger than the signal of the host phylogeny, we used the covariance matrix Vt to predict associations for the novel host. To assess the accuracy of the predictions, we let each host species in turn play the role of the novel species by removing it from the data set, fitting the values of dh and dp to the reduced data set, and then computing the estimates of association $$\hat{A}_{\mathrm{z}\,}$$ (eq. B1) using only information about the reduced assemblage of host species. We then computed the correlation between the estimates $$\hat{A}_{\mathrm{z}\,}$$ and the observed attack rates Az. Repeating this by allowing each host in turn to be novel produced 12 sets of predictions and correlations. We then repeated this procedure for novel parasitoids, removing each of the 27 species, computing predicted per capita attack rates $$\hat{a}_{\mathrm{z}\,}$$, and correlating these to the observed per capita attack rates az. Note that we predicted Aik for novel hosts to give the overall effect of parasitism, and we predicted aik for novel parasitoids to give their per capita attack rates.

In figure 2A, 2B we plot Az against $$\hat{A}_{\mathrm{z}\,}$$ for the two host species with the highest (0.96; Phyllonorycter salicicolella) and lowest (0.49; Phyllonorycter kleemanella) correlation coefficients. For all 12 species, the median value of the correlation coefficient was 0.91 (fig. 2C). Thus, predictions of the parasitoid attack rates on novel host species based on the phylogenetic relationship of their host plants are quite good, as is expected from the strong phylogenetic signal of host plants.

Figure 2:

Observed versus predicted values of the parasitoid attack rates $$( A_{ik}) ^{1/2}$$ on the host species Phyllonorycter viminiella (A) and Phyllonorycter stettinensis (B). Predictions were made assuming that the only thing known about the host species was the phylogenetic relatedness of their host plants and the covariance matrix Vt. Correlations between predicted and observed values are listed in each panel. C, Histogram of correlations between predicted and observed values when each of the 12 host species is treated as a new species; the median correlation is 0.91. Predicted versus observed values of the per capita parasitoid attack rates $$( a_{ik}) ^{1/2}$$ for Pnigalio pectinicornis (D) and Achrysocharoides splendens (E). F, Histogram of correlations between predicted and observed values when each of the 27 parasitoid species is treated as a new species; median $$\mathrm{correlation}\,=0.05$$.

We used the same procedure to predict the per capita attack rates of a new parasitoid species z on the existing host species in the community, $$\hat{a}_{\mathrm{z}\,}$$. The correlations between az and $$\hat{a}_{\mathrm{z}\,}$$ ranged from −0.71 to 0.90, with a median of 0.05 (fig. 2D2F). These results reflect the signal of parasitoid phylogeny on the per capita parasitoid attack rates aik being weaker than that found for the plant phylogeny on the parasitoid attack rates Aik (table 1).

Analyses with Covariates

Using the regression equation (2), we analyzed the effects of host geographical range and parasitoid feeding mode (endo‐ vs. ectoparasitism) on the pattern of association between hosts and parasitoids, where the pattern of association was coded by the binary 0 or 1. Contrary to expectation, the analyses failed to reject the null hypothesis that host geographical range had no effect on number of parasitoid species attacking a host species (table 3). Similarly, contrary to expectation, the analyses failed to reject the null hypothesis that ectoparasitoids had a wider host range than endoparasitoids (table 3). However, as anticipated by prior analyses (table 1), there was a strong signal through the plant phylogeny and a weaker signal through the parasitoid phylogeny on the binary measure of association. This is demonstrated by statistically significant positive values of dh and dp and values of MSE considerably lower than those obtained using a star phylogeny with 0 phylogenetic covariance between species.

Table 3:

Phylogenetic regression of host‐parasitoid association on either host geographic range or parasitoid ecto‐ versus endoparasitism

Independent variable, phylogeniesb1dhdpMSE
Host geographic range:    
Vt, U1.04 (−.71, 2.68).923 (.667, 1.18).253 (.080, .403).152
 Stars.602 (−1.19, 2.39)00.242
Parasitoid ecto‐/endoparasitism:    
Vt, U.015 (−.189, .207).764 (.543, .950).258 (.078, .437).153
 Stars−.019 (−.126, .088)00.242

Note: The dependent variable is whether or not (1 or 0) a parasitoid species attacks a host species, and b1 is the regression coefficient. Hosts were assumed to have the pseudophylogeny given by the trees they attack, Vt. Either the phylogenetic signal was estimated or star phylogenies were assumed by setting $$d_{\mathrm{h}\,}=0$$ and $$d_{\mathrm{p}\,}=0$$. Ranges in parentheses are 95% approximate confidence bounds.

View Table Image


Patterns of association between adjacent trophic levels are one of the commonest types of data collected in ecology. Biologists and natural historians have amassed extensive data on the species of herbivores that attack different species of plants, the parasitoids that attack host insects, the parasites and pathogens found on different hosts, and many other types of association. In some cases, the consumer is highly specific, found attacking a single resource species and often with a phylogenetic history that is congruent or nearly congruent with that of the resource (Combes 2001). However, much more commonly, the consumer is polyphagous, and there is poorer matching between the structure of the consumer and resource phylogenies (Price 1980). Previous analyses of patterns of consumer‐resource association have either ignored phylogenetic correlations or considered just one trophic level, and they cannot accommodate information about the strength of association between consumers and resources beyond binary descriptions of whether an association exists. The aim of the work described here is to provide statistical methods to assess the strength of any phylogenetic signal and to account for it in the analysis of consumer‐resource associations.

Our approach is based on established statistics of phylogenetic comparative methods (Felsenstein 1985; Garland et al. 1993; Martins and Hansen 1997; Garland and Ives 2000; Felsenstein 2004). The methods are relatively easy to apply using phylogenetic data that are becoming ever more readily available. The basic approach is to structure the problem of consumer‐resource associations as a statistical model (eqq. [1], [2]) in which phylogenies are used to give the covariance structure of the “error” terms. Structured in this way, standard statistical formulas can be used to predict new values (association strengths between novel species) and estimate the effects of additional independent variables (ecological or life‐history traits). Thus, the approach is flexible and can be modified to perform the range of tasks often demanded of regression analyses.

We emphasize that the statistical toolbox we have derived is intended to identify correlations in data sets that are consistent with patterns expected from phylogenies, but these should not be interpreted as tests of specific hypotheses of evolutionary processes (Martins 2000; Orzack and Sober 2001; Hansen and Orzack 2005). We have used the term “phylogenetic signal” to refer to a description of phylogenetic pattern divorced from any specific mechanism (Blomberg et al. 2003). If a phylogenetic signal is identified, however, this can be used as a license to hunt for possible adaptive (or nonadaptive) explanations of the patterns. In regression analyses with covariates (independent variables), incorporating phylogenetic signals will lead to better estimates of the regression coefficients, and therefore on strictly statistical grounds, phylogenetic signals should be investigated in studies of species associations (Ives and Zhu 2005). Detecting phylogenetic signals will depend on the sample size (number of possible consumer‐resource associations); for example, for analyses of phylogenetic signals in single clades, Blomberg et al. (2003) found that at least 20 species were often needed to detect a phylogenetic signal with convincing statistical power. Our data set included 324 ($$=12\times 27$$) associations and had high statistical power. It is difficult to say how much smaller sample sizes could be while maintaining high statistical power because power will also depend on the strength of the true phylogenetic signal and the topology of phylogenetic trees; we suspect, however, that a minimum of six species in each clade is necessary.

For the specific example of leaf miners and their parasitoids that we analyzed, patterns of association showed a signal from the host phylogeny but only a weak or nonexistent signal from the parasitoid phylogeny. Furthermore, the signal through the phylogeny of tree species attacked by the leaf miners was stronger than that of the host phylogeny itself. This indicates that leaf miner–parasitoid associations are strongly influenced by plants. Moreover, this effect is not just due to the tendency for phylogenetically related leaf miner species to feed on the same host plant because when statistically testing the simultaneous effects of host versus plant phylogenies, no statistically significant effect was found for the host phylogeny (table 2). There are several possible explanations for the strong effect of tree phylogeny. First, parasitoids might find leaf miners using cues that are associated with the tree species rather than the hosts per se, as is well‐known for other host‐parasitoid systems (Vinson 1997). Second, certain tree species might make hosts particularly vulnerable or resistant to specific foraging strategies employed by specific parasitoids. Finally, associations between leaf miners and trees could have created the evolutionary setting for associations between leaf miners and parasitoids to develop. For example, a parasitoid that initially specialized on a single leaf miner species might have evolved host‐range expansion to another leaf miner species that uses the same habitat (tree species); selection on parasitoids for host‐range expansion should be greatest for hosts that are frequently encountered by the parasitoids (Quicke 1997).

The lack of signal through the parasitoid phylogeny surprised us; closely related parasitoid species were at most slightly more likely to attack the same leaf species. Thus, parasitoids in this lineage have evolutionarily labile host ranges, with little phylogenetic constraint on host shifts. This suggests that novel hosts entering the parasitoids’ geographical range might be quickly adopted by one or more of the parasitoid species (Godfray et al. 1995). Similarly, if introduced into novel geographical areas, the parasitoids might readily use hosts that do not occur in their current home ranges (Henneman and Memmott 2001). However, leaf‐mining insects are attacked by a particularly large number of parasitoid species of low specificity, compared with hosts in other feeding guilds (Hawkins and Lawton 1987), and we suspect that other host‐parasitoid systems will show stronger signal through the parasitoid phylogeny.

We also tested whether two covariates, the geographical range of the host and the feeding mode of the parasitoid, affected association strength. The geographic range of the host was not a significant determinant of the number of parasitoid species that attack it. We might have expected an effect of the host geographic range if strict host specificity among parasitoids was common because such associations might have evolved more frequently if they involved widespread hosts (Hawkins and Lawton 1987). However, low specificity by parasitoids is the rule, making the lack of effect of host geographic range unsurprising. More surprising is the absence of a significant effect of endo‐ versus ectoparasitism because ectoparasitoids in general are known to have a broader host range than endoparasitoids (Askew and Shaw 1986; Sheehan and Hawkins 1991; Godfray 1994). Our analyses failed to reject the null hypothesis that ectoparasitoids attack more (or fewer) species of leaf miners.

Beyond the analyses of our specific example, the methods developed here can test hypotheses about general patterns of consumer‐resource association while accounting for phylogenetic relationships. To give a concrete example, there has been much interest in the factors that determine the number of parasitoid species attacking different host species. Hawkins (1994) summarized a long series of studies and concluded that the manner in which the host species feeds on the host plant is the critical determinant, with leaf miners being attacked by the most species, followed by gall formers, species that live in webs and leaf rolls, external feeders, shoot borers, and finally root feeders. However, both feeding niche and the type of host attacked by parasitoids are strongly phylogenetically conserved. Thus, 80% of European leaf miner species occur in four monophyletic clades, and similarly, leaf miner parasitoids tend to come from a relatively small number of taxonomic groups (Askew 1980). There is almost certainly a very strong signal from both host and parasitoid phylogenies across the wide ranges of the hosts and parasitoids examined by Hawkins (1994), in contrast to our data set, which was limited to congeneric leaf miners. Unfortunately, the phylogenetic information is not available to test for phylogenetic signals in the broader collection of hosts and parasitoids, although this is likely to change in the next decade.

Our methods may also assist in applications surrounding intentionally or unintentionally invasive species. When a potential pest or harmful organism is recorded in a new area, it is of great importance to assess the risk of it going unchecked by natural enemies (Mooney and Drake 1986). Being able to predict from phylogenies which native consumers are likely to attack the new resource will help inform this risk assessment. Clearly, however, these predictions will need to be augmented by specific examination of the invading species' natural history and ecology (Williamson 1996). Similarly, biological control agents are frequently released to suppress nonnative pests, and there is growing concern about the possible side effects these releases may have on native organisms that are also attacked by the agent (Howarth 1991; Henneman and Memmott 2001). Careful risk assessments are now mandatory, and the techniques developed here may contribute toward a more objective assessment of the risks.


We thank T. Garland Jr., M. R. Helmus, and two anonymous reviewers for suggestions on the manuscript. This work was started as discussions at the National Science Foundation's National Center for Ecological Analysis and Synthesis, and it was supported in part by the National Science Foundation.

Appendix A Estimating Ornstein‐Uhlenbeck Process Parameters

Under the assumption of an Ornstein‐Uhlenbeck (OU) process, the covariance matrix V(dh) for hosts has elements $$v_{ij}$$(dh) given by

where values of $$v_{ij}$$ on the right‐hand side of equation (A1) are the covariances derived under the assumption of Brownian motion evolution (Blomberg et al. 2003, app. 2). A similar expression holds for the covariance matrix U(dp) for parasitoids under the OU evolutionary process.

Parameters dh and dp can be estimated statistically by using estimated generalized least squares (EGLS). This method does not assume that trait values Xh and Xp are normally distributed or that the strengths of association (eq. [1]) are normally distributed. For the parasitoid attack rate Aik, the EGLS estimates of dh and dp are obtained from equation (1) by constructing W(dh, dp) from V(dh) and U(dp). Letting $$\mathbf{W}( d_{\mathrm{h}\,},\: d_{\mathrm{p}\,}) =\sigma ^{2}\mathbf{C}( d_{\mathrm{h}\,},\: d_{\mathrm{p}\,}) $$, the EGLS estimates of dh and dp are those values that minimize the estimate of σ2 (the mean squared error) under the constraint that $$\mathrm{det}\,\mathbf{C}( d_{\mathrm{h}\,},\: d_{\mathrm{p}\,}) =1$$. Specifically, for given values of dh and dp (Garland and Ives 2000),

The value of $$\hat{\sigma }^{2}( \mathrm{d}\,_{\mathrm{h}\,}\mathrm{,}\,\:\mathrm{d}\,_{\mathrm{p}\,}) $$ can be minimized numerically with respect to dh and dp. If A is normally distributed, this procedure gives the maximum likelihood estimates of dh and dp (Judge et al. 1985). If A is not normally distributed, the EGLS estimates of dh and dp will nonetheless be asymptotically unbiased, of minimum variance, and consistent (Judge et al. 1985).

To obtain statistical confidence bounds for dh and dp, we used a bootstrap procedure. Under the assumption that the model given by equation (1) is correct, standardized residuals can be computed as $$\mathbf{D}( \mathbf{A}-\hat{b}_{0}) $$, where D is the matrix such that $$\mathbf{DCD}^{\prime }=\mathbf{I}$$. Because $$\mathrm{E}\,\{ \mathbf{D}( \mathbf{A}-\hat{b}_{0}) [ \mathbf{D}( \mathbf{A}-\hat{b}_{0}) ] ^{\prime }\} =\sigma ^{2}\mathbf{DCD}^{\prime }=\sigma ^{2}\mathbf{I}$$, the standardized residuals all have the same variance, and the covariance between each pair of values is 0. Therefore, under the hypothesis that the fitted model is correct, the standardized residuals can be resampled (with replacement) to give bootstrap replication data sets with the same statistical properties as the original data set. After fitting values of dh and dp using EGLS estimation, we produced bootstrap replication data sets from the standardized residuals, $$\mathbf{D}( \mathbf{A}-\hat{b}_{0}) $$ and reassembled a data set by back‐transforming over D. We then fit each replication data set by using the same EGLS procedure that we used to fit the original data. Performing this procedure for 2,000 replication data sets gave 2,000 estimates of the paired values of dh and dp, and the joint frequency distribution of these values gives the bootstrap approximation of the probability distributions of the estimators of dh and dp from which we computed approximate confidence intervals (Efron and Tibshirani 1993).

A technical issue arises in our bootstrapping procedure. Although the procedure guarantees that the replicate residuals $$\mathbf{D}( \mathbf{A}-\hat{b}_{0}) $$ have the same mean and variance as the standardized residuals from the original data, the back‐transformed replicate data sets have qualitative differences from the original data. In particular, while the original data set contains many zeros (host‐parasitoid pairs that do not interact), the replicate data sets do not. Similarly, when the binary measure of host‐parasitoid association is used, the back‐transformed data do not contain only zeros and ones. Thus, the bootstrapped data sets have different properties from the original data. Nonetheless, because the EGLS procedure is based on minimizing the variance among weighted residuals in the transformed space, the bootstrap procedure is likely to be insensitive to the distribution of data containing many zeros. Unfortunately, validating the bootstrap procedure is difficult because there is no way to generate simulation data sets having the same statistical properties of the original data. Therefore, we are forced to be cautious about the confidence intervals produced by our bootstrap procedure when many zeros occur in host‐parasitoid associations or when using the binary measure of host‐parasitoid association.

Appendix B Predicted Associations for New Hosts or Parasitoids

Consider a host species z whose phylogenetic position is known. Suppose we seek to estimate the strength of the association Azk between it and an arbitrary parasitoid k already present in the community. Let σ2C denote the $$N\times N$$ covariance matrix fitted to the existing host and parasitoid species in the community. Let σ2Czz be the $$m\times m$$ matrix containing the covariances for the predicted strengths of association between the new host and each of the m parasitoid species in the community. The elements of σ2Czz are $$v_{zz}u_{kl}$$ ($$k,\: l=1,\:\ldots ,\: m$$). Let σ2Cz be the $$m\times N$$ matrix containing the covariances between the new host‐parasitoid pairs and the remaining host‐parasitoid pairs for the new host; the elements of σ2Cz are $$v_{zj}u_{kl}$$ ($$j=1,\:\ldots ,\: n$$; $$k,\: l=1,\:\ldots ,\: m$$). Denote by $$\hat{A}_{\mathrm{z}\,}$$ the $$m\times 1$$ vector of expected values of the parasitoid attack rates by the m parasitoids on species z. Under the assumption that the distribution of A is multivariate normal (e.g., Fuller 1996),

Furthermore, the estimate of the covariance matrix of the estimator $$\hat{A}_{\mathrm{z}\,}$$ is (e.g., Fuller 1996)
from which standard errors of the estimates can be calculated. Although strict normality is unlikely to hold for real data sets, these results derived under the assumption of normality should provide reasonable approximations to the expected values of $$\hat{A}_{\mathrm{z}\,}$$ and their associated standard errors. This is demonstrated in figure 2.

Similar formulas can be derived in a straightforward way for associations involving a new species of parasitoid. Furthermore, partial information about a new species can be incorporated into the predicted parasitoid attack rates by including host‐parasitoid pairs with known attack rates in A (and matrix C) and calculating $$\hat{A}_{\mathrm{z}\,}$$ for the unknown attack rates.

Appendix C Alternative Definition of aik

For the analyses presented in the main text, we calculated the parasitoid attack rate aik by dividing Aik by the average value of Aik from all host species i from which parasitoid k was reared; this excluded zeros for host species from which parasitoids were not reared. Here, we give results for the alternative procedure for calculating aik by dividing Aik by the average value of Aik from all host species.

The alternative values of aik are positively correlated with the values used for the analyses in the text, although they are not extremely highly correlated (Pearson correlation, $$r=0.70$$). Table C1 gives the results corresponding to table 1 for the alternative aik. The alternative aik gives a weaker phylogenetic signal from the host phylogeny (either the true phylogeny, Vh, or the phylogeny constructed by grafting host species onto the tree phylogeny, Vt) and a slightly stronger signal from the parasitoid phylogeny. The increase in importance of the parasitoid phylogenetic signal suggests that there is a signal in the number of host species attacked by parasitoids; the two procedures for calculating aik are sensitive to parasitoid host range because the alternative definition gives higher values of aik for parasitoid species that attack fewer species of hosts. However, this suggestion is refuted by the fact that there is no statistically significant parasitoid phylogenetic signal in the number of host species attacked per parasitoid (analysis not presented). Therefore, we are uncertain why the two procedures for calculating aik give different results, although the differences are minor.

Table C1:

Phylogenetic signal in parasitoid‐host associations using the alternative procedure for calculating


; compare to

table 1
Dependent variable, host phylogenydhdpMSEdMSEstarMSEb
Vh.086 (0, .24).21 (.042, .41).65.70.94
Vt.35 (.19, .52).27 (.075, .44).60.67.77

Appendix D Matlab Code for Table 1

The Matlab code that we used for the core analyses in the article (illustrated in table 1) may be found with this link: Matlab code. The code has not been peer reviewed, and neither the journal nor the authors are able to provide support. Programs for the other analyses in the article are available from A.R.I. on request. The MATLAB files provided include code (IG_AmNat.m [base code] and IGfunct.m [called by IG_AmNat.m]) and data files (hostV.txt, paraV.txt, plantV.txt, and hostparadata.txt).

Literature Cited


Matlab zip file