# Chain Connection, Close-Knitness, and the Difference Principle

## Abstract

When distributing the benefits produced by social cooperation, Rawls’s difference principle targets a specific group (i.e., the least advantaged group) and requires its expectations to be maximized. One natural worry is whether the practical application of the difference principle comes with a significant cost to other groups in society. Rawls was quite aware of this potential worry and made an earnest effort to respond to it. His solution rests on his notions of chain connection and close-knitness. Rawls’s claim was that whenever society satisfies both chain connection and close-knitness, the practical implementation of the difference principle will (*a*) always lead to strict Pareto improvements, and, as a result, (*b*) the final state will be Pareto optimal. In this article, I show that, under close scrutiny, neither of these claims holds even when society is both chain connected and close-knit.

Arguably, the difference principle is the most distinctive feature of John Rawls’s theory of distributive justice (a.k.a. justice as fairness). The difference principle requires social and economic inequalities to be arranged so that they give the greatest benefit to the least advantaged members of society compared to all alternative social arrangements. A natural worry is whether the practical implementation of the difference principle will raise the expectations of not just the least advantaged group but also those of all other groups in society. That is, what if maximizing the expectation of the least advantaged group can be achieved only by significantly lowering the expectations of all other groups? If this is the case, then this would, at the very least, render the difference principle less practically appealing even if its moral justification remains intact. Let us call this the “potential worry” (for the difference principle).

**The Potential Worry (for the Difference Principle):** The practical implementation of the difference principle (which seeks to maximize the expectation of the least advantaged) may come at a cost of lowering the expectations of other groups.^{1}

Rawls’s response to this worry rests on two notions, *chain connection* and *close-knitness*, which he defines as follows:

**Definition (Chain Connection):** A society is “chain-connected … if an advantage has the effect of raising the expectations of the lowest position, it raises the expectations of all positions in between. For example, if the greater expectations for entrepreneurs benefit the unskilled worker, they also benefit the semiskilled” (Rawls 1999, 69–70).

**Definition (Close-Knitness):** A society is “close-knit … [if] it is impossible to raise or lower the expectation of any representative man without raising or lowering the expectation of every other representative man, especially that of the least advantaged. There is no loose-jointedness, so to speak, in the way expectations hang together” (Rawls 1999, 70).

Note that chain connection says nothing about how the expectations of the other groups change when the least advantaged group does not gain; when the least advantaged group does not gain, the expectations of any other group may increase or decrease or stay constant. Hence, chain connection does not say that “all effects move together” (Rawls 1999, 70). Rather, chain connection characterizes the specific direction toward which the expectations of every other group must move when the expectation of the least advantaged group does in fact rise (as a result of raising the expectation of the most advantaged group); the expectations of every other group (in between) must rise as well. In contrast to chain connection, close-knitness does say that all effects move together; in other words, the change in the expectation of one group necessarily triggers changes in the expectations of all other groups. However, unlike chain connection, close-knitness does not characterize the specific direction toward which the expectations of all other groups must change when the expectation of one group changes; they may either rise or fall.

Rawls uses a simple graphical model to illustrate how the combination of chain connection and close-knitness may solve the potential worry. He introduces figures 1 and 2 (which correspond to figs. 9 and 10 in Rawls [1999]) for this purpose. Here, Rawls assumes that there are three representative groups in society denoted *x*_{1}, *x*_{2}, and *x*_{3}. “Let *x*_{1} be the most favored and *x*_{3} the least favored with *x*_{2} in between. Let the expectations of *x*_{1} be marked off along the horizontal axis, the expectations of *x*_{2} and *x*_{3} along the vertical axis. The curves showing the contribution of the most favored to the other groups begin at the origin as the hypothetical position of equality” (Rawls 1999, 70). In other words, the curves of *x*_{3} (the least advantaged group) and *x*_{2} (the middle group) respectively represent the expected gains of the least advantaged group and the middle group as a function of the expected gains of the most advantaged group *x*_{1}. The assumption here is that increasing the benefit of the most favored group *x*_{1} will have a trickle-down effect that affects the expectations of other less favored groups. We may write *x*_{3}(*x*_{1}), *x*_{2}(*x*_{1}) to make this functional relationship more explicit.

When applied to the graphical model, “chain connection means that at any point where the *x*_{3} curve is rising to the right, the *x*_{2} curve is also rising, as in the intervals left of the points *a* and *b* in figures 9 and 10,” while “close-knitness means that there are no flat stretches on the curves for *x*_{2} and *x*_{3}” (Rawls 1999, 70–71). We can see that all the curves illustrated in Rawls’s figures 9 and 10 are close-knit.
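The two graphical conditions can be stated a little more formally. The following is only a sketch: it assumes that the curves $x_2(x_1)$ and $x_3(x_1)$ are differentiable, an assumption that Rawls does not make explicit, and the notation $x_i'$ for the slope of a curve is introduced here for illustration.

$$
\text{Chain connection:}\quad x_3'(x_1) > 0 \;\Longrightarrow\; x_2'(x_1) > 0 \quad \text{for every } x_1 \ge 0 .
$$

$$
\text{Close-knitness:}\quad \text{there is no interval } [u, v] \text{ with } u < v \text{ on which } x_2 \text{ or } x_3 \text{ is constant.}
$$

On this reading, chain connection is a one-directional conditional on the slope of the $x_3$ curve, whereas close-knitness only rules out flat stretches and leaves the sign of each slope open.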

The main difference between Rawls’s figures 9 (our fig. 1) and 10 (our fig. 2) is in whether chain connection (in addition to close-knitness) holds. Figure 1 (Rawls’s fig. 9) illustrates a society that is both chain connected and close-knit. In figure 1 (Rawls’s fig. 9), the expectation of the least advantaged group *x*_{3} rises until point *a* after which it declines. Chain connection only restricts how the expectations of other groups should change when the expectation of the least advantaged group is in fact rising; they all must rise, which is confirmed by all curves in figure 1 (Rawls’s fig. 9). Chain connection does not restrict how the expectations of other groups must change when the expectation of the least advantaged group is either falling or staying constant. The expectation of the least advantaged group is falling after point *a*. So, chain connection does not restrict how the expectations of other groups must change after point *a*; they can rise, decline, or stay constant. Hence, both curves *x*_{2} (which rises after point *a*) and ${x}_{2}^{\prime}$ (which declines after point *a*) are consistent with chain connection. In short, all the curves in figure 1 (Rawls’s fig. 9) display chain connection in addition to close-knitness.

By contrast, figure 2 (Rawls’s fig. 10) illustrates a society that is close-knit but not chain connected. This can be confirmed by observing what is happening at the right-hand side of point *b*; there, the expectation of the least advantaged group *x*_{3} is rising as the expectation of the most favored group *x*_{1} increases, but the expectation of the middle group *x*_{2} is falling. This is inconsistent with chain connection.

One important observation is that when a society is close-knit but not chain connected (as in fig. 2; Rawls’s fig. 10), raising the expectation of the least advantaged group may come at a cost of lowering the expectation of some other group. Figure 2 (Rawls’s fig. 10) illustrates this: the expectation of the middle group (*x*_{2}) starts to decline after point *b*, whereas the expectation of the least advantaged group (*x*_{3}) continues to rise. So, a society that is close-knit but not chain connected falls prey to the potential worry; that is, in such a society, raising the expectation of the least advantaged may come at the expense of lowering the expectation of other groups (e.g., the middle group *x*_{2} in fig. 2; Rawls’s fig. 10).

Rawls’s intention at that point of the book was to demonstrate that the practical implementation of the difference principle does not fall prey to the potential worry *if society is both close-knit and chain connected*. “I shall not examine how likely it is that chain connection and close-knitness hold. The difference principle is not contingent on these relations being satisfied. However, when the contributions of the more favored positions spread generally throughout society and are not confined to particular sectors, it seems plausible that if the least advantaged benefit so do others in between. … Thus it seems probable that if the authority and powers of legislators and judges, say, improve the situation of the less favored, they improve that of citizens generally” (Rawls 1999, 71). Rawls’s basic argument is that given both chain connection and close-knitness, any move toward raising the expectations of the least advantaged will always raise the expectations of all. As a result, each step in the realization of the difference principle under both chain connection and close-knitness will result in Pareto improvements.^{2} And the end state at which the difference principle eventually arrives will be Pareto optimal (or Pareto efficient).^{3}

Let’s go back to figure 1 (Rawls’s fig. 9), which exemplifies a society that is both chain connected and close-knit. There, the difference principle will choose point *a* at which the curve of *x*_{3} reaches its unique maximum—this is the point where the expectation of the least advantaged group is maximized. Starting from the point of origin (which is supposed to represent what Rawls calls the “hypothetical position of equality”; 1999, 70), each incremental move toward point *a* represents each step in the practical realization of the difference principle. We can see that, starting from the point of origin, every group gains in every step moving toward point *a*; that is, each subsequent move toward realizing the difference principle is a Pareto improvement over the previous social state. And once we arrive at point *a*, which is the point that the difference principle prescribes, there can be no more Pareto improvements as the curve of *x*_{3} has reached its unique maximum, and, hence, moving to a different point will necessarily make the value of *x*_{3} strictly lower. In other words, point *a*, which is prescribed by the difference principle, is Pareto optimal/efficient. This solves the potential worry.

So, Rawls’s main response to the potential worry was that given that society is both chain connected and close-knit, the practical implementation of the difference principle will not only maximize the expectation of the least advantaged, but it will also improve the situation of all groups in society resulting in a Pareto optimal/efficient social state.^{4} We can see that both chain connection and close-knitness together serve as the critical cornerstone of Rawls’s response to the potential worry. Then, how likely is it that society would satisfy both chain connection and close-knitness? Rawls thought that under the assumption of close-knitness, “*Chain connection may often be true, provided the other principles of justice are fulfilled*. If this is so, then we may observe that within the region of positive contributions (the region where the advantages of all those in favored positions raise the prospects of the least fortunate), any movement toward the perfectly just arrangement improves everyone’s expectation” (Rawls 1999, 71, emphasis added).

Here “the other principles of justice” refers to the principle of equal basic liberty and the principle of fair equality of opportunity, the other two principles of justice as fairness, which take strict priority over the difference principle in their practical implementation. So, what Rawls is proposing is in effect an internal solution to the potential worry. That is, in order to guarantee that the practical implementation of the difference principle is free of the charge that it maximizes the expectations of the least advantaged at the expense of other groups, society must be both chain connected and close-knit. But this is no problem, because we get both chain connection and close-knitness for free even before we implement the difference principle: once society implements the principle of equal basic liberty and the principle of fair equality of opportunity, whose practical implementations take strict precedence over that of the difference principle, society will already be chain connected and close-knit.

I believe that this internal solution, if successful, is a very powerful response to the potential worry; it, in effect, shows that Rawls’s theory of justice, namely, justice as fairness, preempts the very problem that it may potentially create. The problem is that the internal solution does not really work as Rawls intended. In the remainder of the article, I demonstrate how Rawls’s internal solution fails. What follows is Rawls’s main thesis, which I will demonstrate to be false:

**Rawls’s Main Thesis:** Suppose society is both chain connected and close-knit. Then (starting from the hypothetical position of equality) every step in realizing the difference principle will be a Pareto improvement, and the final social outcome reached through such a process will be Pareto optimal/efficient.

In figure 3, the expectation of the least advantaged group *x*_{3} rises in the two sections (from the origin to point *d* and from point *e* to point *f*). In these two sections, the expectation of every other group is rising as well. Hence, the society is chain connected. We can also see that there are no flat stretches in the curves for *x*_{2} and *x*_{3}. Hence, the society is also close-knit. Starting from the point of origin, the expectation of the least advantaged group *x*_{3} is maximized at point *f*. Hence, *f* is the point that the difference principle selects. But, starting from the point of origin, we can see that not every step in realizing the difference principle is a Pareto improvement over the previous state; there exists a section in which the expectation of some group decreases. In particular, in the section that starts from point *d* and ends at point *e*, the expectation of the least advantaged group *x*_{3} is decreasing. As a result, the moves from point *d* to point *e* do not constitute Pareto improvements even though such transitions are required to eventually reach the social state prescribed by the difference principle (i.e., point *f*). In short, even when society is both chain connected and close-knit, this does not guarantee that each step in the practical realization of the difference principle will be a Pareto improvement.^{5}
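The structure of this counterexample can be checked numerically. The following Python sketch uses hypothetical sample values chosen only to mimic the shape described for figure 3 (they are not taken from Rawls or from the figure itself), and the helper names are introduced here for illustration. In particular, the middle group is assumed to keep rising between *d* and *e*, which chain connection permits but does not require.

```python
# Hypothetical sample values for a figure-3-like society (illustrative only).
# Indices 2, 4, and 6 play the roles of points d, e, and f in the text.
x1 = [0, 10, 20, 30, 40, 50, 60]   # most favored group
x2 = [0, 3, 6, 7, 8, 11, 14]       # middle group
x3 = [0, 2, 4, 3, 2, 4, 6]         # least advantaged group


def steps(curve):
    """Change in a group's expectation between consecutive sample points."""
    return [b - a for a, b in zip(curve, curve[1:])]


def is_chain_connected(middle, least):
    """Wherever the least advantaged curve rises, the middle curve rises too."""
    return all(dm > 0 for dm, dl in zip(steps(middle), steps(least)) if dl > 0)


def is_close_knit(*curves):
    """No curve has a flat stretch between consecutive sample points."""
    return all(d != 0 for curve in curves for d in steps(curve))


def pareto_improving_steps(*curves):
    """For each step: no group loses and at least one group gains."""
    return [
        all(d >= 0 for d in ds) and any(d > 0 for d in ds)
        for ds in zip(*(steps(curve) for curve in curves))
    ]


# The difference principle selects the sample point that maximizes x3.
chosen_index = max(range(len(x3)), key=lambda k: x3[k])

print(is_chain_connected(x2, x3))          # both conditions hold for this data
print(is_close_knit(x2, x3))
print(chosen_index)                        # index of point f
print(pareto_improving_steps(x1, x2, x3))  # the two steps from d to e fail
```

Under these assumed values both conditions hold and the selected point is *f*, yet the path from the origin to *f* contains steps that are not Pareto improvements.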

The existence of sections like “from point *d* to point *e*” is not a mere far-fetched possibility. There could be sections in which the expectations of the least favored group temporarily decrease due to the time lag in the trickle-down effects after implementing a new policy that ultimately aims to maximize the expectations of the least advantaged by giving more incentives to the most favored group. To avoid staying at (what scholars have called) the “local maximum” and reach the “global maximum” prescribed by the difference principle (i.e., what Rawls calls the “perfectly just arrangement”), we may have to endure such temporary setbacks before society as a whole starts to enjoy the long-term benefits of a given policy measure. In any case, chain connection and close-knitness do not guarantee that each step in the practical application of the difference principle will be a Pareto improvement.

Now, somebody might think that even if this is true, chain connection together with close-knitness at least guarantees that the final social outcome reached by the difference principle will be Pareto optimal/efficient. Let’s go back to figure 3. Again, the final outcome that the difference principle prescribes is point *f*. And, point *f* is indeed Pareto optimal/efficient as there is no way to raise the expectations of some group without lowering the expectations of another. Unfortunately, even this weaker claim—namely, that the final social outcome reached by the difference principle will be Pareto optimal/efficient—is false.

The reader can confirm that the society depicted in figure 4 is both chain connected and close-knit. Figure 4 looks similar to figure 3 except that in figure 4 (unlike fig. 3) there exist two points at which the expectation of the least advantaged group *x*_{3}(*x*_{1}) is maximized: point *h* and point *j*. Since the expectation of the least advantaged group is maximized at *h* and *j*, both points are compatible with the prescriptions of the difference principle. Yet, point *h* is not Pareto optimal/efficient. This is because there exists another social state, namely, point *j*, where the expectations of the least advantaged group are no worse but the expectations of the middle and the most favored groups are strictly higher. So, even when society is both chain connected and close-knit, this does not guarantee that the final social state(s) prescribed by the difference principle will be Pareto optimal/efficient.^{6}
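The tie between points *h* and *j* can be made concrete with a small Python sketch. The numbers below are hypothetical values for a figure-4-like society, written as triples (x1, x2, x3); they are assumptions made for illustration and are not read off the figure.

```python
# Hypothetical expectations (x1, x2, x3) at labeled points of a
# figure-4-like diagram; the values are illustrative assumptions.
points = {
    "origin": (0, 0, 0),
    "h": (20, 6, 4),
    "i": (40, 8, 2),
    "j": (60, 11, 4),
}


def pareto_dominates(p, q):
    """p is at least as good as q for every group and strictly better for one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


# The difference principle looks only at the least advantaged group (x3).
best_x3 = max(p[2] for p in points.values())
selected = sorted(name for name, p in points.items() if p[2] == best_x3)

# Among the selected points, keep those that no sampled point Pareto dominates.
undominated = [
    name for name in selected
    if not any(pareto_dominates(q, points[name]) for q in points.values())
]

print(selected)     # both h and j maximize x3
print(undominated)  # only j survives the dominance check
```

Note that the dominance check runs only over the sampled points, so it shows that *h* is dominated by *j* in this toy data rather than establishing optimality over a continuous curve.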

One way out of this conundrum is to find a way to rule out point *h* and make point *j* the unique prescription of the difference principle. Note that point *j* also maximizes the expectation of the least advantaged group, but, unlike point *h*, point *j* is Pareto optimal/efficient. One way to rule out point *h* is to apply the difference principle lexicographically. This is what Rawls calls the “lexical difference principle,” which he defines as follows: “in a basic structure with *n* relevant representatives, first maximize the welfare of the worst off representative man; second, for equal welfare of the worst off representative, maximize the welfare of the second worst off representative man, and so on until the last case which is, for equal welfare of all the preceding $n-1$ representatives, maximize the welfare of the best off representative man. We may think of this as the lexical difference principle” (Rawls 1999, 72).

If we compare point *h* and point *j* in figure 4, we can confirm that even though the expectations of the least advantaged group *x*_{3} are the same in both points, the expectations of the middle group *x*_{2} and the most favored group *x*_{1} are greater in point *j* than those in point *h*. Hence, the lexical difference principle (unlike the difference principle) uniquely prescribes point *j* as the final social outcome. So, if we substitute the lexical difference principle for the difference principle, then it would seem that Rawls’s main thesis at least in the following modified form remains true.
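A minimal sketch of how the lexical difference principle breaks this tie, again with hypothetical triples (x1, x2, x3) for points *h*, *i*, and *j*. The function name `leximin_key` is introduced here for illustration; the sketch relies on the fact that Python compares tuples lexicographically.

```python
# Hypothetical expectations (x1, x2, x3); illustrative values only.
points = {
    "h": (20, 6, 4),
    "i": (40, 8, 2),
    "j": (60, 11, 4),
}


def leximin_key(profile):
    """Order expectations from worst off to best off.

    Comparing these tuples lexicographically first compares the worst off,
    then the second worst off, and so on, as in Rawls's lexical principle.
    """
    return tuple(sorted(profile))


# Plain maximin (the difference principle) cannot separate h from j.
maximin_ties = sorted(
    name for name, p in points.items()
    if min(p) == max(min(q) for q in points.values())
)

# The lexical version picks the unique best profile.
lexical_choice = max(points, key=lambda name: leximin_key(points[name]))

print(maximin_ties)    # h and j tie on the worst-off position
print(lexical_choice)  # j wins on the second comparison
```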

**Rawls’s Main Thesis Modified:** Suppose society is both chain connected and close-knit. Then (starting from the hypothetical position of equality) every step in realizing the *lexical* difference principle will be a Pareto improvement, and the final social outcome reached through such a process will be Pareto optimal/efficient.

However, reaching the outcome that the lexical difference principle prescribes still requires moves that are not Pareto improvements, such as those from *d* to *e* in figure 3 and from *h* to *i* in figure 4. So, replacing the difference principle with the lexical difference principle, although achieving Pareto optimality/efficiency, does not completely solve the problem as it fails to achieve a Pareto improvement in each step of its implementation. Hence, even the modified version of Rawls’s main thesis that replaces the difference principle with the lexical difference principle is only partially correct.

However, even this partially correct solution comes with costs. The first is that the lexical difference principle fails to provide “continuous” ethical judgments over various distributional states. For instance, each distribution ${A}^{n}=\left(1+(1/n),1+(1/n),(1/n)\right)$ is strictly preferred to $B=\left(2,2,0\right)$ for all $n\in \mathbb{N}$ by the lexical difference principle. (Note that for all $n\in \mathbb{N}$ each step from ${A}^{n}$ to ${A}^{n+1}$ satisfies both chain connection and close-knitness.) But, in the limit (when $n\to \infty $), the evaluation over the two distributions abruptly changes, and $B=\left(2,2,0\right)$ is now strictly preferred to ${A}^{\infty}=\left(1,1,0\right)$ by the lexical difference principle. Such a discontinuity prevents the social preferences derived from the lexical difference principle from being represented by a continuous indifference curve. However, this is a rather technical concern that may not convince many people that discontinuous ethical judgments are unattractive. Rather, the intuitive appeal of continuity in our ethical judgments derives from Aristotle’s maxim that says “like cases should be treated alike,” which is widely endorsed by legal theorists as a normative guide for making consistent judicial decisions. The point is that if the difference between any two cases is vanishingly small, then the normative evaluations over the two cases should not widely differ. The lexical difference principle fails to meet this basic standard. To some people, this might not be a deal-breaker, yet discontinuity in ethical judgments is still unattractive as it shows there can be “wild jumps” in our ethical reasoning even by extremely small perturbations of our ethical data, which may seem arbitrary.
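The discontinuity in this example can be verified directly. The sketch below uses exact rational arithmetic; the helper names `leximin_key` and `A` are introduced here for illustration, and the finite range of *n* is of course only a spot check of a claim that holds for every natural number.

```python
from fractions import Fraction


def leximin_key(profile):
    """Expectations ordered from worst off to best off (compared lexicographically)."""
    return tuple(sorted(profile))


def A(n):
    """The distribution A^n = (1 + 1/n, 1 + 1/n, 1/n) from the text."""
    eps = Fraction(1, n)
    return (1 + eps, 1 + eps, eps)


B = (2, 2, 0)
A_limit = (1, 1, 0)  # the limit of A^n as n grows without bound

# For every sampled n, the worst off get 1/n > 0 under A^n but 0 under B.
prefers_A_n = all(leximin_key(A(n)) > leximin_key(B) for n in range(1, 10_001))

# In the limit the worst off tie at 0, and B wins on the next position (2 > 1).
prefers_B_in_limit = leximin_key(B) > leximin_key(A_limit)

print(prefers_A_n, prefers_B_in_limit)
```

Both comparisons come out true, so the ranking of the two distributions flips exactly at the limit even though consecutive members of the sequence differ by vanishingly small amounts.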

Second, there is a sense in which substituting the lexical difference principle would defeat the very purpose of introducing the notion of close-knitness, and would undercut Rawls’s reason for proposing the difference principle rather than the lexical difference principle in the first place. According to Rawls:

Close-knitness is assumed in order to simplify the statement of the difference principle. It is clearly conceivable, however likely or important in practice, that the least advantaged are not affected one way or the other by some changes in expectations of the best off although these changes benefit others. In this sort of case close-knitness fails, and to cover the situation … we may think of … the lexical difference principle. *I think, however, that in actual cases this principle is unlikely to be relevant*. … The general laws governing the institutions of the basic structure insure that cases requiring the lexical principle will not arise. Thus I shall always use the difference principle in the simpler form.

Of course, as explained in the beginning of this article, the mere fact that the difference principle falls prey to the “potential worry” is not a conclusive reason to reject it. The difference principle may be justified on other weightier theoretical or normative considerations. As a matter of fact, although one of Rawls’s earlier arguments for the difference principle heavily relied on considerations of Pareto improvement/efficiency—that is, starting from a default equal distribution of income and wealth, we arrive at the difference principle by successively applying Pareto improvements while giving the least advantaged a veto power to reject any proposed distribution (Rawls 1999, 130–31)—later Rawls (2001) became less reliant on and attached less importance to these kinds of Paretian arguments.^{7} Nonetheless, whether the difference principle can successfully achieve Pareto improvements in each step of its practical application and can eventually reach Pareto efficiency is not an issue that can be ignored so easily because, if maximizing the prospects of the least advantaged group comes at the expense of lowering the prospects of other groups, then the fundamental criticism that Rawls raised against utilitarianism—namely, that “utilitarianism does not take seriously the distinction between persons” (1999, 24)—may equally be raised against his own difference principle as well. We have seen that unlike what Rawls had initially hoped, this problem cannot be preempted by assuming chain connection and close-knitness. 
Suppose it is true that one important reason why Rawls proposed the difference principle and not the lexical difference principle stems from his belief that the lexical difference principle would be redundant under conditions of chain connection and close-knitness (as the difference principle would be sufficient to guarantee Pareto improvements and Pareto optimality/efficiency under these conditions). We have seen that this belief is false: the difference principle implies neither Pareto improvements nor Pareto optimality, and even the lexical difference principle fails to guarantee Pareto improvements under conditions of chain connection and close-knitness. This, at the very least, gives us some reason to reconsider the status of the difference principle (as well as its lexicographic variant) as forming an integral part of justice as fairness.

## Notes

Hun Chung (http://hunchung.com) is an associate professor of the faculty of political science and economics at Waseda University, 1-6-1 Nishiwaseda Shinjuku-ku, Tokyo 169-8050.

1. A more fundamental worry would be that the practical implementation of the difference principle may worsen the situation of (not simply other groups but also) the least advantaged group themselves. For this type of argument, see Chung (2020, forthcoming), Gustafsson (2018), and Haslett (1985). See also Chung (2021) for a criticism of Gustafsson (2018).

2. In Rawls’s graphical model, a point *x* is a Pareto improvement over another point *y* if and only if, for each curve *x*_{1}, *x*_{2}, and *x*_{3}, its value is at least as great at point *x* as it is at point *y*, and there exists a curve *x*_{*i*} ($i=1, 2, 3$) whose value is strictly greater at point *x* than it is at point *y*.

3. In Rawls’s graphical model, a point *x* is Pareto optimal (or Pareto efficient) if and only if any other point will either have at least one curve whose value is strictly lower than what it is at point *x* or the values of all three curves will be no greater than those at point *x*.

4. Others have given a similar interpretation. For instance, Williams (1995) explains that “Rawls himself suggests … that if … the expectations of representative individuals are *chain-connected* as well as close-knit, then the distribution which maximizes the expectations of the least advantaged is the efficient distribution which is the most egalitarian” (260).

5. Note that this result does not depend on assuming that the expectation of each group changes in a continuous manner. For instance, in fig. 3, suppose the expectation of the least advantaged group *x*_{3} abruptly jumps down to *x*_{3}(*e*) at point *d* and then increases afterward. Then, the situation will still satisfy both chain connection and close-knitness, yet the realization of the difference principle will fail to guarantee Pareto improvements in such a case as well.

6. Some scholars have tried to solve the disconnection between the difference principle and Pareto efficiency by fiat, by building the notion of Pareto efficiency into the very definition of the difference principle. For instance, Shenoy and Martin (1983) claim that a more plausible version of the difference principle would state that social and economic inequalities should be arranged so as to produce the most egalitarian Pareto efficient distribution (see 127). Williams (1995) calls this the “revisionist difference principle” and ultimately rejects it.

7. I thank an anonymous reviewer for pointing this out.

## References

Chung, Hun. 2020. “Rawls’s Self-Defeat: A Formal Analysis.” *Erkenntnis* 85:1169–97.

Chung, Hun. 2021. “On Choosing the Difference Principle Behind the Veil of Ignorance: A Reply to Gustafsson.” *Journal of Philosophy* 118 (8): 450–63.

Chung, Hun. Forthcoming. “When Utilitarianism Dominates Justice as Fairness: An Economic Defence of Utilitarianism from the Original Position.” *Economics and Philosophy*.

Gustafsson, Johan E. 2018. “The Difference Principle Would Not Be Chosen Behind the Veil of Ignorance.” *Journal of Philosophy* 115 (11): 588–604.

Haslett, D. W. 1985. “Does the Difference Principle Really Favour the Worst Off?” *Mind* 94 (373): 111–15.

Rawls, John. 1999. *A Theory of Justice*. Rev. ed. Cambridge, MA: Belknap Press of Harvard University Press.

Rawls, John. 2001. *Justice as Fairness*. Cambridge, MA: Belknap Press of Harvard University Press.

Shenoy, Prakash P., and Rex Martin. 1983. “Two Interpretations of the Difference Principle in Rawls’s Theory of Justice.” *Theoria* 49 (3): 113–41.

Williams, Andrew D. 1995. “The Revisionist Difference Principle.” *Canadian Journal of Philosophy* 25 (2): 257–81.