
Uncertain Causation, Regulation, and the Courts

The George Washington University Regulatory Studies Center


Law-and-economics suggests principles for deciding how best to allocate rights, duties, and legal liability for actions that cause harm or that fail to prevent it. The same principles can be applied to suggest how to coordinate the overlapping activities of regulators and the courts in constraining and penalizing individual behaviors and business activities in an effort to increase net social benefits. This paper reviews law-and-economics principles useful for benefit-cost analysis (BCA) and judicial review of regulations and public policies; highlights the crucial roles of causation and uncertainty in applying these principles to choose among competing alternatives; and discusses how net social benefits can be increased by applying these same principles to judicial review of regulations that are based on uncertain assumptions about causation of harm. Real-world examples of air pollution regulation and food safety regulation illustrate that deference by the courts (including administrative law judges) to regulators is not likely to serve the public interest when harmful effects caused by regulated activities are highly uncertain and are assessed primarily by regulatory agencies. In principle, responsibility for increasing net social benefits by insisting on rigorous analysis of causality and remaining uncertainties by both plaintiffs and defendants should rest with the courts. In practice, failure of the courts to shoulder this burden encourages excessive regulation that suppresses socially beneficial economic activities without preventing the harms or producing the benefits that regulators and activists project in advocating for regulation. 
Stronger judicial review of regulations by courts that require sound and explicit reasoning about causality from litigants is needed to reduce arbitrary and capricious regulations, meaning those in which there is no rational connection between the facts found and the choice made, and to promote the public interest by increasing the net benefits from regulated activities.

1. Introduction: Principles of Law and Economics and Benefit-Cost Analysis of Regulations

A common normative principle for defining “good” laws and regulations and institutions to administer and enforce them is that they should be designed and operated to maximize net social benefits, defined as the sum over all individuals of the difference between expected total benefits and expected total costs (including opportunity costs) received (see Watkins, n.d.). If time is important, expected net present values of costs and benefits are used. We will call this the benefit-cost analysis (BCA) principle. It has been widely used (1) in engineering economics to identify the most beneficial project scales and portfolios of projects to undertake (given budget constraints); (2) in public policy to justify proposed regulatory initiatives and policy interventions; and (3) in healthcare to recommend technologies and practices that are most worth adopting. As illustrated in this section, BCA can also be viewed as a framework motivating and unifying various law-and-economics principles for using the law to maximize widely distributed social benefits from economic transactions. Later in this paper, we will propose that it can also fruitfully be applied to the problem of determining how judicial review can and should be used to increase the net social benefits—henceforth called simply the net benefits—of regulations.

More rigorous theoretical formulations often replace expected net benefits with expected social utility (also called social welfare), usually represented as a weighted sum of individual utilities following a theorem of Harsanyi (1955; Hammond 1992). Various proposals have been advanced for scaling, weighting, eliciting, and estimating individual utilities, despite an impressive body of theory on incentive-compatibility constraints and impossibility theorems for voluntary truthful revelation of private information about preferences, and despite progress in behavioral economics showing that elicited preferences do not necessarily reflect rational long-term self-interest or provide a secure normative foundation for decision-making. Whether the expected net benefit or expected social utility is taken as the normative criterion, its maximization is usually understood to be subject to constraints that protect individual rights and freedoms by preventing systematic exploitation of any individual or class of individuals to benefit others. In the simplest cases, risk aversion and other nonlinearities are ignored throughout, and decisions are made simply by comparing the expected net benefits of the alternatives being considered, summed over all the parties involved.

1.1. Example: The Learned Hand Formula for Liability Due to Negligence

The Learned Hand formula in law and economics holds that a defendant should be considered negligent, and hence liable, for harm that his failure to take greater care in his activities caused to a plaintiff, if and only if it would cost the defendant less to take that care than the expected cost of harm to the plaintiff done by not taking that care, namely, the probability of harm to the plaintiff times the magnitude of the harm if it occurs (see United States v. Carroll Towing Co., 159 F.2d 169, 173 [2d Cir. 1947]). In symbols, the defendant should be found negligent if and only if C < pB, where C is the cost to the defendant of taking an expensive action to prevent the harm from occurring; p is the probability of harm occurring if that action is not taken; and B is the cost, or severity, of the harm to the plaintiff expressed in dollars if it does occur and hence, the benefit from preventing it if it would otherwise have occurred. The Learned Hand rule defines negligence as failure to take care not to harm another person when the expected net social benefit of doing so is positive. It encourages each participant in a society of laws to consider, in deciding what to do, the expected cost to others as of equal weight with the costs or benefits to one’s self. This dispassionate, equal weighting of the interests of all is precisely what is required for individual choices to maximize net social benefit—that is, the sum of the gains and losses of all participants.
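As a numerical illustration, the Hand comparison C < pB can be sketched in a few lines of Python. The dollar figures below are hypothetical and are not drawn from the Carroll Towing record:

```python
def negligent(prevention_cost: float, p_harm: float, harm_cost: float) -> bool:
    """Learned Hand test: negligent iff the burden of precaution C is less
    than the expected harm avoided, p * B."""
    return prevention_cost < p_harm * harm_cost

# Hypothetical numbers: a $50 precaution against a 1% chance of $10,000
# in damage, so the expected harm avoided is pB = $100.
print(negligent(50, 0.01, 10_000))   # True: C = 50 < pB = 100, so negligent
print(negligent(150, 0.01, 10_000))  # False: C = 150 > pB = 100
```

The comparison is symmetric in the sense described above: the defendant's own cost C and the plaintiff's expected loss pB enter with equal weight.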

The Learned Hand rule encourages economic agents to act as utilitarians in decision-making, counting the consequences to others as of equal weight with consequences to self. By doing so, it also provides a utilitarian rationale for the calculus of negligence in tort law as promoting the public interest, construed as maximizing net social benefit. In other words, the rationale for individual choices promoted by the rule of negligence law and the rationale for adopting that law itself are the same: to maximize total (and hence average) net benefits from interactions among individuals in society. If each individual is as likely to be a potential defendant as a potential plaintiff in the many transactions and situations where the law of negligence applies, then each person expects to gain on average from the rule of this law.

The BCA principle of deciding what to do by comparing expected net social benefits to alternative choices can be used by individuals to choose among different actions and levels of care, given a legal and regulatory system. It can also be used as a basis for designing institutions—for example, choosing among alternative liability rules and regulations based on the behaviors that they are expected to elicit in response to the incentives they create, with the goal of maximizing the estimated net social benefits arising from those behaviors. The BCA principle can be generalized well beyond the domain of negligence liability. BCA comparisons provide a widely applicable rationale for determining the legal duties of economic actors to take due care in their activities, refrain from excessively hazardous behaviors, provide pertinent information to others about risks, and bear liability for harm arising at least in part from their activities.

1.2. Example: The Cheapest Cost-Avoider Principle When Liability Is Uncertain

Suppose that either of two parties—a producer and a user or consumer of a potentially hazardous product such as a lawnmower, an electric hair dryer, or a medical drug—can choose between taking more care or less care to prevent harm to the consumer from occurring. Taking more care is more costly than taking less care to the agent making that choice. If harm occurs that either party could have prevented by taking more care in manufacture or use, respectively, then how should liability for the harm be allocated between them? A standard answer when the respective costs for each party to have prevented the harm are common knowledge is the cheapest cost avoider principle: the party who should bear liability for the harm is the one who could have prevented it most cheaply. Like the Learned Hand principle, this again requires each agent (each of the two parties) to consider a dollar of cost to another person to have the same importance as a dollar of cost to one’s self in calculating net social benefits. This implies that the agent who can most cheaply avoid a harm or loss should do so. When the relevant costs are not common knowledge, however, this simple principle does not necessarily hold. For example, if each agent assesses a positive probability that the other will be found liable if an accident occurs that harms the consumer using or consuming the product, then both may underinvest in safety, increasing the probability of harm above what it would be if costs and liability were common knowledge and reducing net social benefit. In this case, a regulation that requires the producer to use the higher level of care, combined with credible enforcement of the regulation via random inspections and large penalties for detected violations, can in principle increase the net benefits to both producers and consumers by making it common knowledge that it is in the producer’s best interest to take a high level of care.
Then consumers, who might otherwise have been unwilling to buy the product because of fear that it is unsafe, might be willing to buy it and even to pay more for its higher level of safety, thus increasing both producer surplus and consumer surplus compared to the situation without regulation. A strict liability standard that makes it common knowledge that it is in the producer’s best interest to take a high level of care could create the same benefits. In both cases, however, shifting all responsibility for safety to the producer could make an otherwise beneficial product too costly to produce, even if both producer and consumer would benefit if the consumer could be trusted to take due care in using it. In general, designing liability standards and regulations to maximize total social benefits requires considering who knows what about care levels and product risks, the costs of acquiring such information, possibilities for credibly signaling it, and moral hazards and incentives. In practice, a mix of market, liability, insurance, warranty, and regulatory instruments is used to deal with these realistic complexities.
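The underinvestment problem described above can be sketched with a toy model. All of the care costs, accident probabilities, and liability shares below are hypothetical, chosen only to illustrate the incentive effect:

```python
HARM = 1000.0  # hypothetical dollar loss to the consumer if an accident occurs
# Hypothetical accident probabilities, indexed by (producer care, consumer care).
P_ACCIDENT = {("high", "high"): 0.01, ("high", "low"): 0.05,
              ("low", "high"): 0.05, ("low", "low"): 0.20}
CARE_COST = {"high": 50.0, "low": 0.0}  # cost of care to the party choosing it

def best_response(my_liability_share, other_care, i_am_producer):
    """Care level minimizing this party's own expected cost, given the
    share of the harm it expects to bear if an accident occurs."""
    def my_cost(my_care):
        pair = (my_care, other_care) if i_am_producer else (other_care, my_care)
        return CARE_COST[my_care] + my_liability_share * P_ACCIDENT[pair] * HARM
    return min(("high", "low"), key=my_cost)

# If each side expects to bear only part of the harm (a hypothetical 30%
# share), each best-responds to the other's low care with low care:
print(best_response(0.3, "low", True))   # producer chooses "low"
print(best_response(0.3, "low", False))  # consumer chooses "low"
# A rule making the producer fully liable restores the producer's incentive:
print(best_response(1.0, "low", True))   # producer chooses "high"
```

With diluted liability shares, mutual low care is an equilibrium even though it maximizes the accident probability; making liability unambiguous changes the best response, which is the common-knowledge effect described above.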

More generally, producers and consumers of potentially hazardous consumer products or services, in addition to employers and employees in hazardous occupations, owners and renters or lessees of properties, sellers and buyers of used cars or mortgages or insurance products or collateralized debt obligations, and owners and neighbors of hazardous or noxious facilities, can all be viewed as making choices that jointly affect net social benefits. The obligations and penalties imposed by legal and regulatory institutions can then be viewed from the law-and-economics perspective as seeking to promote choices to maximize widely distributed net social benefits. In turn, these institutions themselves, and the ways in which they interact with each other and with the public, can also be designed and evaluated by this criterion.

A powerful intuition underlying these applications of BCA principles to liability and regulation of economic activities is that a nation of laws serves its citizens best by implementing just those laws and policies for which the total benefits outweigh the total costs, including opportunity cost as well as implementation and enforcement costs. To protect the rights of individuals, costs and benefits should also be distributed so that everyone expects to gain over a lifetime from application of the adopted laws, even though specific individuals lose in specific cases, as when a court determines that one litigant and not the other wins a tort law case. This distribution requirement prevents adoption of laws that systematically exploit one individual or class of individuals to benefit the rest. It encourages laws and regulations that, arguably, might be collectively chosen from behind a Rawlsian veil of ignorance (see Hammond 1992). Identifying collective choices with positive net benefits and acceptable distributions of costs and benefits is a central goal of BCA.

Despite their intuitive appeal, making these principles precise is famously difficult. A vast literature in collective choice theory considers how to aggregate individual preferences, beliefs, and utilities to yield social utility functions that maximize, or at least decision rules for making Pareto-efficient collective decisions that would increase, the net benefits to all individuals. This literature has yielded a rich variety of impossibility results and tradeoffs among proposed criteria for evaluating the performance of aggregation procedures. Criteria that have been considered include voluntary participation, freedom to have and to express any individual preferences or utility functions in a large set, incentive compatibility in revealing the private information needed to make the process work (e.g., willingness-to-pay information), Pareto-efficiency of outcomes, complexity of the procedure (e.g., yielding a result in less than the lifetime of the participants), and balanced budget, if the procedure is costly to administer. For rational participants, only some subsets of such desiderata are mutually consistent. Some must be relinquished to obtain the rest.

As a practical matter, however, real people often do not reason or behave like the idealized rational agents to which such results apply (Thaler 2015). They are often more altruistic and cooperative than purely economic or game-theoretic reasoning would predict (Kahneman 2011). Thus, the question of how well different legal and regulatory institutions elicit choices and behaviors from real people that increase net social benefits is worth empirical as well as theoretical investigation. Likewise, even if economic agents have little or no choice about how to comply with regulations (other than perhaps the option of suing for relief if the regulations seem arbitrary and capricious or otherwise unjust), how well regulatory and legal institutions succeed in developing and enforcing policies that increase net social benefits, and how they might work together to do so better, are questions worthy of empirical and theoretical investigation.

2. Uncertain Causation Encourages Ineffective and Potentially Harmful Regulations

The central problem to be investigated in the remainder of this paper is how to identify socially beneficial choices when causation is uncertain and the sizes of the benefits caused by alternative choices are therefore unknown. In many important applications, the benefits caused by costly regulations, policy interventions, or investments are uncertain. Whether they exceed the costs may then also be uncertain. In such settings, BCA principles must be modified to deal with risk aversion, rather than considering only expected net benefits. Moreover, uncertainty about causation can encourage regulations that are socially harmful and that would not pass a BCA test that accounts for correlated uncertainties about the effects on many individuals. This creates a need for review and accountability that regulatory agencies are not well equipped to provide for themselves. The following paragraphs develop these points.

2.1. Uncertain Causation Encourages Socially Reckless Regulation

To understand how uncertainty about causation can lead to adoption of regulations whose risk-adjusted costs exceed their benefits, suppose that a regulatory ban or restriction on some activity or class of activities, such as emissions of a regulated air pollutant, imposes an expected cost of c on each of N economic agents and yields an uncertain benefit of b for each of them, with the expected value of b denoted by E(b) and its variance denoted by Var(b). If the uncertain benefit b has a normal distribution, then a standard calculation in decision analysis shows that its certainty-equivalent value to a decision-maker with an exponential utility function is

CE(b) = E(b) − kVar(b),

where k is proportional to the decision-maker’s risk aversion. To a risk-averse decision-maker (i.e., one with k > 0), the uncertain benefit is worth less than a certain benefit of size E(b) by the amount of the “risk premium,” kVar(b). If the uncertainty about b is due to uncertainty about the size of the effect that a policy or regulation would cause—that is, the size of the reduction in annual mortality risk that would be achieved by a reduction or ban on a source of exposure—and if the size of this uncertain effect is the same for all N agents, then the net social benefit summed over all N agents is

NE(b) − Nc − kN²Var(b)

(since the variance of N times b is N² times the variance of b). This can be written as

N[E(b) − c − NkVar(b)],

showing that the per capita net benefit after adjusting for risk aversion is E(b) − c − NkVar(b). This will necessarily be negative if N is sufficiently large, since kVar(b) > 0. However, a regulatory agency that implements regulations with expected benefits exceeding their expected costs—that is, with E(b) − c > 0—pays no attention to the risk premium term NkVar(b) or to the size of N. It ignores the fact that individual benefits received are positively correlated, since a regulation that does not cause its intended and expected benefit can result in a net loss for each of the N agents. For all sufficiently large N, the risk of these simultaneous losses will outweigh the positive expected net benefit: E(b) − c − NkVar(b) will be negative even though E(b) − c is positive. (The Arrow-Lind theorem, which implies that government investments should be evaluated in a risk-neutral way, does not apply here because of the correlated losses.) Agencies that focus only on expected benefits can therefore undertake activities or impose regulations that have negative risk-adjusted values. This is most likely for regulations with widely distributed but uncertain benefits (i.e., large N), such as air pollution or food safety regulations. In essence, if the causal hypothesis that the regulation will create its intended benefits turns out to be wrong, ignoring risk aversion to correlated losses encourages reckless expenditures and costly regulations with large net losses.
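A short numerical sketch of the per capita criterion E(b) − c − NkVar(b), with hypothetical values for E(b), c, k, and Var(b), shows how a regulation that passes an expected-value test can fail the risk-adjusted test once N is large:

```python
def per_capita_net_benefit(E_b, c, N, k, var_b):
    """Certainty-equivalent per capita net benefit when the uncertain
    benefits of all N agents are perfectly correlated: E(b) - c - N*k*Var(b)."""
    return E_b - c - N * k * var_b

# Hypothetical values: E(b) = 10, c = 8, k = 1e-6, Var(b) = 100.
# The expected net benefit E(b) - c = 2 > 0, so the regulation passes an
# expected-value BCA test regardless of N; the risk-adjusted value
# nevertheless turns negative as N grows.
for N in (1_000, 10_000, 100_000):
    print(N, per_capita_net_benefit(10.0, 8.0, N, 1e-6, 100.0))
```

With these numbers the risk-adjusted per capita value is positive at N = 1,000 but negative at N = 100,000, even though E(b) − c is unchanged.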

2.2. Warnings from Behavioral Economics and Decision and Risk Psychology: The Tyranny of Misperceptions

Regulatory agencies, as well as courts and corporations, are staffed by human beings with a full array of psychological foibles—heuristics and biases—that shape their beliefs and behaviors in addressing uncertain risks. Well-documented weaknesses in individual and group decision-making under uncertainty include forming opinions and taking actions based on too little information (related to Kahneman’s “what you see is all there is” heuristic); making one decision at a time in isolation rather than evaluating each in the context of the entire portfolio or stream of decisions to which it belongs (narrow framing, [see Guiso 2015]); believing what it pays us to believe, what it feels most comfortable to believe, or what fits our ideological world view (motivated reasoning, affect heuristic); seeking and interpreting information to confirm our existing opinions while failing to seek and use potentially disconfirming information (confirmation bias); and being unjustifiably confident in both our judgments and our level of certainty about them (overconfidence bias) (Kahneman 2011; Schoemaker and Tetlock 2016; Thaler 2015). In short, judgments and choices about how to manage risks when the effects of actions are highly uncertain are often shaped by emotions and psychology (“System 1”) far more than by facts, data, and calculations (“System 2”). Such decisions can be passionately advocated for, strongly felt to be right, and confidently approved of without being causally effective in producing desired results. These common weaknesses of human judgment and opinion formation have most opportunity to shape policy when the consequences caused by different choices are least certain.

Regulatory agencies face additional challenges to learning to act effectively under uncertainty, stemming from social and organizational psychology. Consider the following selection mechanism by which people with similar views might assemble into a regulatory agency with an organizational culture that is prone to groupthink (Coglianese 2001, 106) and that holds more extreme perceptions than most of the population of the risks posed by what it regulates. Suppose that people whose ideologies and beliefs suggest that it is highly worthwhile to regulate a substance, activity, or industry X are more likely to work for agencies that regulate X than people who do not share those beliefs. Then an organizational culture might develop that is inclined to regulate well beyond what most people would consider reasonable or desirable. The result is a tyranny of misperceptions, somewhat analogous to the Winner’s Curse in auctions, in which those whose perceptions are most extreme are most likely to invest the time and effort needed to stimulate regulatory interventions. In this case, when the true hazards caused by a regulated substance or activity are highly uncertain, those who believe them to be worse than most people do may be disproportionately likely to shape the regulations that restrict them. To the extent that average judgments are more likely to be accurate than extreme ones (Tetlock and Gardner 2015), such regulations may tend to reflect the misperceptions of those pushing for regulation—that risks are higher than they actually are.
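The selection mechanism can be illustrated with a small Monte Carlo sketch (the threshold and dispersion below are hypothetical): even when individual risk perceptions are unbiased on average, the subpopulation that self-selects into regulating X perceives the risk as systematically higher than it is.

```python
import random

random.seed(0)
TRUE_RISK = 1.0   # hypothetical true hazard level
SIGMA = 0.5       # dispersion of individual risk perceptions
N_PEOPLE = 100_000

# Individual perceptions are unbiased on average...
perceptions = [random.gauss(TRUE_RISK, SIGMA) for _ in range(N_PEOPLE)]
# ...but only those who perceive the risk as serious (above a hypothetical
# threshold of 1.5) self-select into working to regulate it.
selected = [p for p in perceptions if p > 1.5]

pop_mean = sum(perceptions) / len(perceptions)
agency_mean = sum(selected) / len(selected)
print(f"population mean perception:    {pop_mean:.2f}")    # close to 1.0
print(f"self-selected mean perception: {agency_mean:.2f}")  # well above 1.0
```

The self-selected group's average perception exceeds the true risk by construction of the truncation, which is the sense in which the mechanism resembles the Winner's Curse.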

Of course, the same self-selection bias can function throughout the political economy of a democracy: those who care most about making a change are most likely to work to do so, whether through advocacy and activism, litigation, regulation, legislation, or journalism directed at influencing perceptions and political actions. But regulatory agencies are empowered to take costly actions to promote social benefits even when the benefits caused by actions are highly uncertain, so that decisions are most prone to System 1 thinking. Moreover, empirical evidence suggests that simply recognizing this problem and making an effort to correct for it—for example, by instituting internal review procedures—are likely to have limited value; the same heuristics and biases that give rise to a policy are also likely to affect reviews, making misperceptions about risk and biased judgments about how best to manage them difficult to overcome (Kahneman 2011). External review by people who do not share the same information and worldview can be far more valuable in flagging biases, introducing discordant information to consider, and improving the effectiveness of predictions and decisions (Kahneman 2011; Tetlock and Gardner 2015).

It is distressing, and perhaps not very plausible a priori, to think that people and organizations that devote themselves to promoting the public interest might inadvertently harm it by falling into the familiar pitfalls of System 1 thinking when causation is uncertain. After all, do not well-run organizations anticipate and correct for such limitations by using relatively explicit and objective criteria and rationales for their decisions, well-documented reasoning and data based on peer-reviewed publications, multiple rounds of internal review, and invited critiques and reviews by external experts and stakeholders? Indeed, all of these steps play important roles in modern rulemaking and regulatory procedures in the United States and elsewhere. Yet there is evidence that they are not sufficient to guarantee high-quality regulations or to block regulations for which there is no good reason to expect that the predicted benefits will actually occur. Such regulations are too often “arbitrary and capricious” in the sense that there is no rational connection (although there may be many irrational ones) between the facts presented to support projected benefits of regulations and the belief that these benefits will actually occur, and hence that regulations are worthwhile. The following examples illustrate the real-world importance of these concerns. Possible explanations and remedies will then be explored, including judicial review that insists on more rigorous and trustworthy standards of evidence for causality than regulatory agencies customarily use. We will argue that courts are often the “cheapest misperception corrector” and are best positioned to correct regulatory excesses by enforcing a higher standard of causal inference before uncertain benefits are accepted as justifying costly actions.

2.3. Example: The Irish Coal-Burning Bans

Between 1990 and 2015, coal burning was banned by regulators in many parts of Ireland, based on a belief that

[t]he smoky coal ban allowed significant falls in respiratory problems and premature deaths from the effects of burning smoky coal. … The original ban in Dublin is cited widely as a successful policy intervention and has become something of an icon of best practice within the international clean air community. It is estimated that in the region of 8,000 premature mortalities have been averted in Dublin since the introduction of the smoky coal ban back in 1990. Further health, environmental and economic benefits (estimated at €53m per year) will be realised, when the ban is extended nationwide. (Department of Communications, Climate Action and Environment of Ireland, n.d.)

The underlying scientific studies (Clancy et al. 2002), widely and approvingly cited by regulators and activists, clearly showed that particulate matter levels from coal smoke and mortality rates dropped significantly after the bans. For over a decade, activists, regulators, and the media have celebrated such findings as showing a clear causal link between coal burning and mortality and as providing a clear opportunity to reduce mortality by reducing coal burning, suggesting that extending the bans nationwide would yield substantial health and economic benefits and that delay would cause substantial unnecessary deaths (Kelly 2015). Yet there is a clear logical fallacy at work here. Although the claimed successes of the bans in reducing mortality might appeal to common sense, wishful thinking, and confirmation bias, no potentially disconfirming evidence that might conflict with this causal conclusion was sought or used in the studies that led to the claim. For example, the original study (Clancy et al. 2002) did not examine whether the drop in mortalities following the bans had causes unrelated to the ban, nor did it examine whether the drop also occurred in other countries and in areas unaffected by the bans (see Wittmaack 2007).

When a team including some of the original investigators examined these possibilities a decade later, long after successive waves of bans had already been implemented, they found no evidence that the bans had caused the reductions in total mortality rates that had originally been attributed to them (Dockery et al. 2013). Mortality rates had fallen just as much in areas not affected by the bans as in areas affected by them, and the bans had no detectable effect on reducing total mortality rates (Dockery et al. 2013). Rather, mortality rates had been declining over time throughout Ireland and much of the developed world since long before the bans began, and they continued to do so without interruption during and following them (Dockery et al. 2013). Thus, mortality rates were indeed lower after the bans than before them, even though the bans themselves had no detectable effect on them. (Searching for associations by type of adverse effect, without controlling for multiple testing biases, also failed to confirm the original identification of reductions in cardiovascular mortality associated with the bans, but it turned up associations with other cause-specific mortalities instead, as is typical of retrospective data-dredging.) If the ban left more elderly people cold in the winter, and thereby increased their mortality rates—a possibility that was not investigated—then this effect was masked by the historical trend of improving life expectancy and reduced mortality risks.
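A synthetic example (not the Irish data) shows why the unaffected control areas matter: when mortality is declining everywhere for unrelated reasons, a naive before-and-after comparison attributes the whole trend to the ban, while a difference-in-differences comparison against an unaffected area recovers the true (zero) effect.

```python
# Synthetic illustration: mortality declines everywhere by 2 deaths per
# 100k per year; a ban in the "treated" area at year 10 has zero true effect.
years = range(20)
ban_year = 10
treated = [300 - 2 * t for t in years]   # area with the ban
control = [300 - 2 * t for t in years]   # comparable area, no ban

def mean(xs):
    return sum(xs) / len(xs)

# Naive before/after comparison in the treated area alone:
naive = mean(treated[:ban_year]) - mean(treated[ban_year:])
print(f"naive 'effect' of the ban: {naive:.1f}")   # 20.0: just the trend

# Difference-in-differences: subtract the same change in the control area.
did = naive - (mean(control[:ban_year]) - mean(control[ban_year:]))
print(f"difference-in-differences estimate: {did:.1f}")  # 0.0: no effect
```

The naive comparison here reports a 20-unit "benefit" that is entirely the pre-existing trend, which is exactly the error the follow-up study (Dockery et al. 2013) corrected by comparing areas with and without bans.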

In this example, it appears that confirmation bias led to widespread and enthusiastic misinterpretation of an ongoing historical trend of declining elderly mortality rates—brought about largely by improved prevention, diagnosis, and treatment of cardiovascular diseases and reduced cigarette smoking—as evidence that the coal-burning bans were causally effective in protecting public health. This meme has continued to drive media accounts and regulatory policy up to the present, as Ireland pushes to extend the bans nationwide (see Kelly 2015). The finding that the bans actually had no detectable effect in reducing all-cause or cardiovascular mortality (Dockery et al. 2013) continues to be widely ignored. This example illustrates how regulations can be enthusiastically supported and passed based on unsound reasoning about causality, such as neglecting to use control groups in assessing the effects of bans. It also shows how they can be perceived and evaluated favorably in retrospect by regulators, activists, environmental scientists, and the media as having been highly successful in creating substantial public health benefits, even if they actually had no beneficial effects.

2.4. Example: Estimated Benefits of Fine Particulate Matter (PM2.5) Regulation in the United States

An analogous process is currently unfolding on a much larger scale in the United States. The majority of total estimated benefits from all federal regulations in the United States are attributed to the effects of Clean Air Act regulations in reducing fine particulate matter (PM2.5) air pollution and thus reducing estimated elderly mortality risks. The United States Environmental Protection Agency (EPA) credits its regulation of fine particulate matter with creating nearly two trillion dollars per year of health benefits (EPA 2011a; 2011b). Yet, notwithstanding widespread impressions and many published claims to the contrary in scientific journals and the news media, it has never been established that reducing air pollution actually causes these benefits, as opposed to merely being correlated with them in a historical context in which both air pollution levels and mortality rates have been declining over time. As the EPA’s own benefits assessment states in a table, their “analysis assumes a causal relationship between PM exposure and premature mortality based on strong epidemiological evidence … However, epidemiological evidence alone cannot establish this causal link” (EPA 2011a; EPA 2011b). The reason that the epidemiological evidence cannot establish the assumed causal link is that it deals only with association, not with causation.

In the absence of an established causal relation, historical data showing that both PM2.5 and mortality rates are higher than average in some places and times (e.g., during cold winter days compared with milder days, or in earlier decades compared with later ones), and thus that they are positively correlated, are widely treated as if they were evidence of causation (see Cox and Popken 2015). This again illustrates the practical importance of confirmation bias in shaping perceptions and economic evaluations of the benefits attributed to (but not necessarily caused by) major regulations by those advocating them. Potential disconfirming evidence, such as that mortality risks declined just as much in cities and counties where pollution levels increased as where they decreased (see Cox and Popken 2015), has been neither sought nor used by advocates of PM2.5 reductions in attributing health benefits to such reductions. As pointed out recently, “Many studies have reported the associations between long-term exposure to PM2.5 and increased risk of death. However, to our knowledge, none has used a causal modeling approach” (Wang et al. 2016). The relatively rare exceptions that report positive causal relations rest on unverified modeling assumptions to interpret associations causally, as discussed in greater detail later. Approaches that seek to avoid making such assumptions by using nonparametric analyses of whether changes in exposure concentrations predict changes in mortality rates have concluded that “[a] causal relation between pollutant concentrations and [all-cause or cardiovascular disease] mortality rates cannot be inferred from these historical data, although a statistical association between them is well supported” (Cox and Popken 2015) and that, for one hundred US cities with historical data on PM2.5 and mortality, “we find no evidence that reductions in PM2.5 concentrations cause reductions in mortality rates” (Cox, Popken, and Ricci 2013).
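The distinction between association in levels and evidence from changes can be sketched with synthetic data (not real PM2.5 or mortality series): two series that both trend downward are strongly correlated in levels even when their year-to-year changes, which are what an intervention would act on, are unrelated.

```python
import random

random.seed(1)
T = 200
# Synthetic series: pollution and mortality both trend downward over time,
# but their year-to-year changes are independent noise by construction.
pm = [100.0 - 0.4 * t + random.gauss(0, 2) for t in range(T)]
mort = [900.0 - 1.0 * t + random.gauss(0, 5) for t in range(T)]

def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def diffs(s):
    """First differences (year-to-year changes)."""
    return [b - a for a, b in zip(s, s[1:])]

# Shared downward trends make the levels strongly correlated, while the
# changes, which a regulation would actually act on, are uncorrelated.
print(f"correlation of levels:  {corr(pm, mort):.2f}")   # strongly positive
print(f"correlation of changes: {corr(diffs(pm), diffs(mort)):.2f}")  # near zero
```

This is the logic behind the nonparametric change-on-change analyses quoted above: a strong level correlation is compatible with no detectable effect of changes in exposure on changes in mortality.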

On the other hand, hundreds of peer-reviewed articles and media accounts claim that reducing PM2.5 causes reductions in mortality risks (e.g., Wang et al. 2016). These often present sensational conclusions, such as claiming that “[a]n effective program to deliver clean air to the world’s most polluted regions could avoid several hundred thousand premature deaths each year” (Apte et al. 2015). Similar to the original mistaken claims about effects of coal-burning bans on all-cause mortality risks in Ireland, such conclusions conflate correlation and causation. This confusion is facilitated by the increasing use of computer models to project hypothetical benefits based on assumptions of unknown validity. For example, the EPA provides a free computer program, BenMAP,1 to enable investigators to quantify the human health benefits attributed to further reductions in criteria air pollutants such as ozone (O3) and fine particulate matter (PM2.5) based on embedded expert opinions about their concentration-response correlations. Activist organizations such as the American Thoracic Society (ATS) have used BenMAP simulations to develop impressive-looking and widely publicized estimates of health benefits from further reductions in air pollution, such as this: “Approximately 9,320 excess deaths (69% from O3; 31% from PM2.5), 21,400 excess morbidities (74% from O3; 26% from PM2.5), and 19,300,000 adversely impacted days (88% from O3; 12% from PM2.5) in the United States each year are attributable to pollution exceeding the ATS-recommended standards” (Cromar et al. 2016).

But the concentration-response relations assumed in the computer simulations are not established causal relations. To the contrary, as clearly and repeatedly stated in the technical documentation for BenMAP (RTI International 2015), there is “no causality included” in BenMAP’s summary of health impact functions based on expert judgments. In more detail, the documentation explains,

Experts A, C, and J indicated that they included the likelihood of causality in their subjective distributions. However, the continuous parametric distributions specified were inconsistent with the causality likelihoods provided by these experts. Because there was no way to reconcile this, we chose to interpret the distributions of these experts as unconditional and ignore the additional information on the likelihood of causality. (RTI International 2015, 63)

Similar caveats hold for other instances of the increasingly prevalent practice of predicting reductions in mortality caused by reductions in exposure concentration by applying previously estimated concentration-response associations and slope factors, without any independent effort to establish whether they are causal. For example, Lin et al. (2017) “estimate the number of deaths attributable to PM2.5, using concentration-response functions derived from previous studies” and conclude “that substantial mortality reductions could be achieved by implementing stringent air pollution mitigation measures” without noting that the previous studies referred to only assessed associations, not causation.

In summary, similar to the case of the coal-burning bans in Ireland, substantial health benefits are attributed to tighter Clean Air Act regulations in the United States, with many calls for further reductions being voiced by activists, regulators, public health researchers and advocacy groups, and the media. Yet, it has not been shown that the regulations actually cause the benefits that are being attributed to them, and causal analysis approaches that do not make unverified modeling assumptions do not find any detectable beneficial effect of reducing current ambient concentrations of PM2.5 or ozone in recent decades, despite a voluminous scientific and popular literature projecting substantial health benefits, which should be easily detectable if real (see Cox and Popken 2015).

2.5. Example: Food Safety Regulation Based on Assumed Causation

Between 2000 and 2005, the Food and Drug Administration’s Center for Veterinary Medicine (FDA-CVM), in conjunction with activist and advocacy organizations such as the Alliance for Prudent Use of Antibiotics (APUA) and the Union of Concerned Scientists, successfully pushed to ban enrofloxacin, a fluoroquinolone antibiotic, from use in chickens because its use might select for antibiotic-resistant strains of the common bacterium Campylobacter, potentially causing cases of antibiotic-resistant food poisoning that would be more difficult to treat than non-resistant cases. This concern certainly sounds plausible. It received extensive media coverage via stories that usually linked it to frightening statistics on the tens of thousands of cases per year of “superbug” infections with multidrug-resistant bacteria occurring in the United States. Few stories explained that those cases were from different bacteria, not from Campylobacter; that campylobacteriosis was specifically associated with consuming undercooked chicken in fast food restaurants, not with chicken prepared at home or in hospitals; and that molecular fingerprinting showed that superbug infections overwhelmingly were caused by hospital use of antibiotics in people, rather than animal antibiotics used on farms. A quantitative risk assessment model used by the FDA simply assumed that reducing use of enrofloxacin in chickens would proportionally reduce the prevalence of fluoroquinolone-resistant cases of campylobacteriosis food poisoning:

A linear population risk model used by the U.S. Food and Drug Administration (FDA) Center for Veterinary Medicine (CVM) estimates the risk of human cases of campylobacteriosis caused by fluoroquinolone-resistant Campylobacter. Among the cases of campylobacteriosis attributed to domestically produced chicken, the fluoroquinolone resistance is assumed to result from the use of fluoroquinolones in poultry in the United States. (Bartholomew et al. 2005)

This assumption swiftly made its way into risk numbers cited in activist reports and media headlines, and it was treated as a fact.

Industry and animal safety experts made a number of arguments. First, they argued that real-world data refuted the causal assumption by showing that the strains of fluoroquinolone-resistant Campylobacter found in people were acquired in hospitals and were not the same as those from animals. Additionally, campylobacteriosis was usually a self-limiting disease that caused diarrhea and then resolved itself, with no clear evidence that antibiotic therapy made any difference. And in the rare cases of severe infections, typically among AIDS patients or other immunocompromised people, physicians and hospitals did not treat campylobacteriosis with fluoroquinolones but generally prescribed a different class of antibiotics (macrolides). Moreover, even when fluoroquinolones (specifically, ciprofloxacin) were prescribed as empiric therapy, resistance did not inhibit its effectiveness because therapeutic doses are high enough to overcome the resistance. Evidence from earlier antibiotic bans for farm animals in Europe showed that reducing use in animals increased illnesses in animals (and hence total bacterial loads on meat) but did not benefit human health. Also, fluoroquinolone-resistant strains of Campylobacter occur naturally whether or not enrofloxacin is used, and the main effect of continued use of enrofloxacin was to keep animals healthy and well nourished, reducing risks of foodborne bacteria, both resistant and non-resistant. These arguments were heard by an Administrative Law Judge (ALJ), who was a career FDA employee with a record of deciding cases in favor of his employer. The ALJ found the industry arguments unpersuasive, and the FDA withdrew approval of enrofloxacin use in poultry in 2005. Meanwhile, during the run-up to this decision from 2000 to 2005, consumer advocacy groups scored major successes in persuading large-scale food producers and retailers to reduce or eliminate use of antibiotics in chickens.
Advocates for bans on animal antibiotics, from the Centers for Disease Control and Prevention (CDC), APUA, and elsewhere, many of whom had testified for FDA, quickly declared the enrofloxacin ban a “public health success story” in both the press and in scientific journals (Nelson et al. 2007).

After more than a decade, the causal aspects of this case are easier to see clearly. By 2007, some of the researchers who had most strongly advocated for the ban were beginning to observe that the original FDA assumption—that withdrawing enrofloxacin would reduce fluoroquinolone-resistant Campylobacter proportionally—now appeared to be mistaken (see, e.g., Price et al. 2007). The resistant strains persisted as the industry had warned (see Price et al. 2007). By 2016, it was clear that the dramatic improvements in food safety and reductions in campylobacteriosis risk in the population that had been taking place prior to the voluntary cessations of antibiotic use in farm animals and the enrofloxacin ban, including a nearly 50 percent reduction in risk between 1996 and 2004, had stopped and reversed course, as shown in Figure 1 (see Powell 2016). Advocates who had been vocal between 2000 and 2005 in explaining to Congress and the public why they thought that banning enrofloxacin would protect public health moved on to advocate banning other antibiotics. No post mortem or explanation has yet been offered for the data in Figure 1.

Figure 1. Reductions in Campylobacteriosis Risk Stopped and Reversed around 2005 (Powell 2016)

Yet, understanding why the enrofloxacin ban failed to produce the benefits that had been so confidently predicted for it (or, if benefits did occur, why they are not more apparent) might produce valuable lessons that would help future efforts to protect public health more effectively. Such lessons remain unlearned when the process for passing new regulatory actions relies on unproven causal assumptions for purposes of advocacy and calculation of hypothetical benefits of regulation—essentially, making a prospective case for regulation—with no need to revisit assumptions and results after the fact to assess how accurate they were or why they failed, if they prove inaccurate. In this example, it appears that the FDA’s causal assumption that risk each year is proportional to exposure (see Bartholomew et al. 2005) was simply mistaken (see Price et al. 2007). But there is no formal regulatory process at present for learning from such mistakes, for correcting them, or for preventing them from being made again in future calls for further bans.

2.6. Lessons from the Examples

The foregoing examples illustrate that regulators and administrative law judges sometimes deal with uncertainties about the benefits caused by a regulation by making large, unproven, simplifying assumptions. This may be done with the best of intentions. Uncertainty invites Rorschach-like projection of beliefs and assumptions based on the rich panoply of System 1 (“Gut”) thinking, genuinely felt concerns about the currently perceived situation, and hopes to be able to improve it by taking actions that seem sensible and right to System 1. Such projection often feels like, and is described as, careful and responsible reflection and deliberation followed by formation of considered expert judgments based on careful weighing of the totality of the evidence. The resulting judgments are typically felt to be valuable guides to action under uncertainty, not only by those who provide them, but also by those who receive them (Tetlock and Gardner 2015). Beliefs suggested by System 1 (“Gut”) in the absence of adequate data or opportunity for System 2 (“Head”) analysis are often confidently held, easy to reinforce with confirming evidence, and difficult to dislodge with disconfirming evidence—but they are also often objectively poor guides to what will actually happen (Tetlock and Gardner 2015; Gardner 2009).

For air pollution and food safety alike, one such large assumption about causation is that risk of adverse health effects decreases in proportion to reductions in exposure to a regulated substance or activity. This is easy to understand and appeals to intuition. It leads to readily calculated predictions based on aggregate data. Simply divide estimated cases of adverse health outcomes per year by estimated average annual exposure, assuming exposure is the sole cause of the adverse health effects, as in the FDA example. Alternatively, regress adverse health outcomes against exposure, allowing for an intercept and other hypothesized contributing factors to explain any cases not attributed to exposure, as in air pollution health effects modeling. Either way, there is no need for complex modeling of effects of different combinations of risk factors for different individuals, or of interactions and dependencies among the hypothesized explanatory variables. Such simplicity is often seen as a virtue (see Bartholomew et al. 2005) rather than something that omits the very details essential to correctly understanding and quantifying effects caused specifically by exposure, and not by other factors with which it is associated. Assuming that all-cause or cardiovascular disease mortality risks will decrease in direct proportion to reduction of ambient concentrations of PM2.5 in air, or that drug-resistant foodborne illness counts will decrease in proportion to reduction of antibiotic used on the farm, provides simple, plausible-seeming slope factors for calculating hypothetical benefits of further regulation without the difficulty of building and validating models of a more complex reality.
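The two shortcut calculations just described can be sketched in a few lines. This is a toy numpy example; the region data, the baseline of 40 cases, and the true slope of 2 are invented purely for illustration and come from no cited analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical aggregate data for five regions (invented for illustration):
# the true data-generating process is baseline 40 cases + slope 2 per unit exposure.
exposure = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
cases = 40.0 + 2.0 * exposure + rng.normal(0.0, 3.0, size=5)

# Proportional model ("risk is proportional to exposure"), as in the FDA example:
# the slope absorbs ALL cases, including the baseline that exposure did not cause.
slope_proportional = cases.sum() / exposure.sum()

# Regression with an intercept, as in air pollution health effects modeling:
# baseline cases go into the intercept rather than inflating the slope.
slope_regression, intercept = np.polyfit(exposure, cases, 1)

print(slope_proportional, slope_regression)  # the proportional slope is inflated
```

Even in this tiny example, attributing every case to exposure roughly doubles the slope factor, and hence the hypothetical benefits calculated from it.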

Historically, the resulting numbers have been sensational enough to garner prominent coverage in both scientific journals and popular media, where they are usually presented as if they were facts rather than assumptions (see, e.g., Cromar et al. 2016). Such coverage heightens the anxiety and resolve of activists to take action to reduce exposures and encourages funding from agencies and other stakeholders to support further, similar assumption-driven research on how large the benefits of regulation might be. This cycle, and associated phenomena, such as the social amplification of perceived risks as concern attracts more concern, are well documented in the social science and psychology of risk (see Gardner 2009). They are well served by simple assumptions and large risk numbers. By contrast, more complex and nuanced System 2 calculations suggesting that the quantitative difference in public health made by reducing exposures is at most vanishingly small (e.g., on the order of at most one extra case of compromised treatment of campylobacteriosis per hundred million person-years, and plausibly zero [Hurd and Malladi 2008]) typically attract far less attention, and may be viewed with suspicion because they require more detailed data and calculations (Bartholomew et al. 2005).

The “risk reduction is proportional to exposure reduction” formulation of regulatory benefits encourages another System 1 habit that makes life seem simpler and more manageable (Kahneman 2011), narrowly focusing just on what one cares about and what one can do about it. For example, in banning enrofloxacin, the FDA focused exclusively on preventing cases of drug-resistant food poisoning by controlling what they could control—use of an animal antibiotic. The historical evidence from Europe that such control caused no detectable reductions in human illness risks was irrelevant for this focus, as FDA’s risk-is-proportional-to-exposure model assumes no other possibilities. That adequate cooking of meats prior to consumption is the only known control measure that demonstrably reduces illness risks was likewise irrelevant for an agency that does not oversee food preparation, and was excluded from the FDA risk assessment by considering only the ratio of drug-resistant illnesses to drug use on the farm. Similarly, estimates of human health benefits from reducing PM2.5 have seldom inquired about other effects, such as whether cleaner air promotes warmer temperatures, with consequent manmade climate change implications for economic and human health risks.

In summary, although a sound BCA approach unambiguously requires assessing a regulation’s total costs and benefits, regulatory agencies, like most of us, cope with uncertainty and complexity in the causal impacts of actions by adopting narrowly focused agendas, restricted jurisdictions, and greatly simplified causal models that focus on just a few things. These typically include some actions that we can take (never mind that other, less costly, actions might work much better); the consequences we want them to produce (never mind their unintended, unanticipated, or out-of-scope consequences); and at most a very few other factors (never mind the large, complex, and uncertain outside world that may make the consequences of our actions quite different from what was intended or predicted). It is much easier to understand and make predictions with these simplified models than to develop and validate more complex and realistic models (Kahneman 2011). Disregarding or downplaying most of the causal web in which our potential actions and desired consequences are embedded makes the effects of our own actions, and their potential benefits, loom larger in our calculations than they really are. This very human tendency to substitute simplified causal models for fuller and more realistic ones in the face of uncertainty and complexity (see Kahneman 2011; Thaler 2015) is inconsistent with the requirements of the BCA principle, but may be the best that can be done in the absence of institutions that enforce a higher standard.

3. Better Causal Inferences and Benefits Estimates via More Active Judicial Review

If regulations are sometimes advocated based on overly optimistic and simplistic causal assumptions and models of their effects and the benefits that they cause, what can and should be done about it—and by whom? How might causal inferences and benefits estimates used in regulatory proceedings be made more accurate and trustworthy? This section develops the following points.

Once the most relevant concept of causation has been clearly defined as identifying actions that change the probabilities of preferred outcomes, such improvements in predicting or assessing the benefits caused by regulations are technically possible, based on experience in a variety of other areas.

It is (or should be) well within the competence and jurisdiction of courts to help bring about these improvements by exercising more stringent judicial review of the causal reasoning used to project benefits and advocate for regulations, especially if current forms of deference to regulatory agencies are replaced by a more active role, as urged by the recently proposed Separation of Powers Restoration Act amendment to the Administrative Procedure Act (see Walker 2016).

The organizational culture of many regulatory agencies makes it difficult for them to improve their own causal assumptions and benefits assessments without the compelling presence of an active judiciary engaged in questioning and challenging their reasoning. In part, this is because of a tendency to dismiss as nonscientific or non-expert the concerns of laypeople and other stakeholders that regulations will not produce their intended benefits (Wynne 1993). In part, it arises because regulators use frameworks that treat causality primarily as a matter for expert judgment rather than as a matter of empirically discoverable and verifiable fact.

To overcome these obstacles, it is both necessary and practical to inject more data-driven rather than judgment- and assumption-driven concepts and techniques for assessing causation into deliberations over regulations.

Advances in data science and analytics make it technically possible, and even easy, to test whether necessary conditions for causality, such as that a cause should help to predict its effects, hold in available data sets. They enable the shift from judgment-driven to data-driven analyses of causal impacts and benefits from regulations in many important cases where relevant data are available or readily obtainable, as in the examples of air pollution regulation and food safety regulation. But this shift is unlikely to take place within established regulatory cultures that emphasize the judgments of salaried experts and assumption-based modeling as tools for deciding how the world works (Wynne 1993).

By contrast, an adversarial setting, in which both those who support a proposed regulation and those who oppose it subject their competing causal analyses and resulting benefits estimates to critical review based on rigorous objective standards, provides incentives for production and use of relevant data and analyses. These incentives are lacking when only the regulator is charged with making a credible case for proposed regulations, and when opposing views are addressed only through responses to public comments (which, in practice, can usually be readily dismissed, for example, by citing the contrary views and expertise of those who side with the regulator). Excessive deference to regulatory science by administrative law courts damps the incentives for others to challenge, and perhaps improve, it. Conversely, more active judicial review can stimulate challenges to improve the factual basis for estimated regulatory benefits. Courts are already positioned as the cheapest providers of review and enforcers of rigorous reasoning about the benefits claimed to be caused by proposed regulations. Finding regulations to be arbitrary and capricious when the evidence provides no good reason to expect they will cause the benefits claimed might create incentives to improve the quality of causal inference in regulatory science and reduce the passage of regulations that end up falling short of their projected benefits and perhaps even their costs.

3.1. Distinguishing among Different Types of Causation

In discussing the health and economic benefits caused by regulations, policymakers, regulators, courts, scientists, media, and the public refer to several distinct types of causation, often without clearly distinguishing among them (see Dawid 2008). Each of the following concepts of causality is widely used in discussing the causal implications of associations found in observational data. Each has its own large, specialized technical literature, but they are often conflated.

Associational and attributive causation. This is the concept of causation most commonly used in epidemiology and in regulatory risk assessments and benefits estimates for health and safety regulations. It addresses how much of an observed statistical association between an exposure and an adverse outcome will be attributed to the exposure, and how much will be attributed to other factors. This is often interpreted as showing how much of the causation (or blame or liability in legal applications) for an adverse outcome is attributable to exposure, and how much to each of the other causes or factors that produced it. In epidemiology, etiological fractions, population attributable fractions, population attributable risks, burdens of disease, and probabilities of causation are all examples of attributive causal concepts (see Tian and Pearl 2000). As commonly used and taught, all are derived from relative risk—that is, the ratio of risks in exposed and unexposed populations, or among more- and less-exposed individuals. Hence, they are all based solely on statistical associations.

Predictive causation. In statistics, economics, physics, and neuroscience, among other fields, it is common to define one variable as being a cause of another if and only if the first helps to predict the second (see, e.g., Friston, Moran, and Seth 2013; Furqan and Siyal 2016). For example, if exposure helps to predict an adverse health response, then exposure is considered a (predictive) cause of the response. As an important special case, Granger causality between an exposure time series and a response time series (Kleinberg and Hripcsak 2011) is based on the principle that causes help to predict their effects. (Technically, X is a Granger-cause of Y if the future of Y is dependent on—or, more formally, is not conditionally independent of—the history of X, given the history of Y.) Thus, nicotine-stained fingers can be a Granger cause of lung cancer, helping to predict it, even if cleaning one’s fingers would have no effect on future lung cancer risk (manipulative causality) (see Woodward 2008).
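The idea that a Granger cause must improve prediction of its effect can be illustrated with a minimal numpy-only sketch (the series and coefficients are simulated and hypothetical; a full Granger test would add a formal F-test, which is omitted here for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated series (hypothetical coefficients): x drives y with a one-period lag.
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.3 * rng.normal()

def residual_variance(target, predictors):
    """Residual variance of an OLS regression of target on predictors plus an intercept."""
    X = np.column_stack([np.ones(len(target))] + predictors)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return ((target - X @ beta) ** 2).mean()

# Restricted model: predict y[t] from its own history only.
v_restricted = residual_variance(y[1:], [y[:-1]])
# Full model: also condition on the history of x.
v_full = residual_variance(y[1:], [y[:-1], x[:-1]])

# x Granger-causes y: conditioning on x's history sharply shrinks prediction error.
print(v_restricted, v_full)
```

The comparison of the two residual variances is exactly the data-driven check the definition describes: the future of y is not conditionally independent of the history of x, given the history of y.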

Counterfactual causation (Höfler 2005; Wang et al. 2016) attributes the difference between observed outcomes and predicted outcomes that would have occurred under different conditions, such as if exposure had not been present, to the differences between the real and alternative (“counterfactual”) conditions. This difference in conditions is said to cause the difference in outcomes, in counterfactual causal models.

Structural causation and exogeneity. In constructing a simulation model of a dynamic system, the values of some variables are calculated from the values of others. As the simulation advances, input values may change, and then the values of variables that depend on them may change, and so forth, until exogenous changes in the inputs have propagated through the system, perhaps leading to new steady-state values for the output variables until a further exogenous change in inputs leads to further changes in the values of other variables. The order in which variable values are calculated and updated reflects a concept of causality in which the values of some variables are determined by the values of others that cause them. This computational view of causality considers that the values of effects (or their conditional probabilities, in stochastic models) can be determined from the values of their causes via equations or formulas, representing causal mechanisms, with exogenous changes entering from outside the modeled system propagating through the modeled mechanisms in such a way that values of causes are determined prior to the values of effects that depend on them. It has been formalized in seminal work by Simon (1953) and subsequent work by many others, mainly in economics and econometrics, artificial intelligence, and time series analysis (e.g., Iwasaki 1988; Hendry 2004; Voortman, Dash, and Druzdzel 2010; Hoover 2012).

Manipulative causation is the concept of causality in which changing (“manipulating”) the values of controllable inputs to a system changes the values of outputs of interest (Woodward 2008; Voortman, Dash, and Druzdzel 2010; Hoover 2012). In detailed dynamic models, the changes in inputs might propagate through a system of algebraic and differential equations describing a system to determine the time courses of changes in other variables, including outputs. If such a detailed dynamic model is unavailable, the relation between changes in values of controllable inputs and changes in the values of variables that depend on them may instead be described by more abstract models such as functions (“structural equations”) relating their equilibrium values, or by Bayesian Networks specifying conditional probability distributions of outputs conditioned on values of inputs. Manipulative causation is the type of causality of greatest interest to decision-makers and policymakers seeking to make preferred outcomes more likely by changing the values of variables that they can control.

These different concepts of causality are interrelated, but not equivalent. For example, attributive causality does not imply counterfactual, predictive, or manipulative causality and is not implied by them. There is no guarantee that removing a specific exposure source would have any effect on the risks that are attributed to it, nor is there any requirement that no more than 100 percent of a risk be attributed to the various factors that are said to cause it. For example, in the diagram X1 ← X0 → X2 → X3 → X4 → … → Xn, if exogenously changing the value of a variable at the tail of an arrow from 0 to 1 causes the value of any variable into which it points to change from 0 to 1, and if the value of X0 is changed from 0 (interpreted as “unexposed”) to 1 (interpreted as “exposed”), then not only would these measures attribute 100 percent of the blame for X1 becoming 1 to this change in X0, but also they would attribute the same 100 percent of the blame to changes in each of X2, X3, … and Xn, even though those are simply other effects of the change in X0. Relative risks are the same for all of these variables, and so attributive risk measures derived from relative risk assign the same blame to all (in this case, a “probability of causation” of 1).
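The over-attribution in this diagram can be checked numerically. The following toy simulation (the chain is shortened to X0 → X2 → X3, and the copy probabilities are hypothetical, not from any cited study) treats both the true exposure X0 and its mere downstream effect X2 as candidate “causes” of the outcome X3 and computes the usual relative-risk-based probability of causation for each:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Toy chain X0 -> X2 -> X3 (hypothetical copy probabilities):
# each downstream variable copies its parent with probability 0.95, else is 0.
x0 = rng.random(n) < 0.5                 # exposure
x2 = x0 & (rng.random(n) < 0.95)         # effect of exposure
x3 = x2 & (rng.random(n) < 0.95)         # outcome of interest

def probability_of_causation(candidate, outcome):
    """(RR - 1) / RR, the usual attributive measure derived from relative risk."""
    risk_unexposed = max(outcome[~candidate].mean(), 1e-9)  # guard against divide-by-zero
    rr = outcome[candidate].mean() / risk_unexposed
    return (rr - 1.0) / rr

# Both the true exposure X0 and its downstream effect X2 receive essentially
# 100% of the blame for X3, so the attributed shares sum to well over 100%.
pc_x0 = probability_of_causation(x0, x3)
pc_x2 = probability_of_causation(x2, x3)
print(pc_x0, pc_x2)
```

Because X2 is just another effect of X0, blaming it for X3 is attribution without manipulative causation: intervening on X2 alone would matter here, but in general a variable can inherit a perfect relative risk merely by sitting downstream of the real cause.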

Tort law, by contrast, uses an attributive concept of “but-for” causation that attributes harm to a cause if and only if the harm would not have occurred in its absence—that is, “but for” the occurrence of the cause. This concept would single out the change in X0 as the only but-for cause of the change in X1. On the other hand, X0, X2, X3, X4 … would all be causes of Xn by this criterion. Thus, but-for causation can lead to promiscuous attribution of harm to remote causes in a chain or network, for example, by attributing responsibility for a smoker’s lung cancer not only to the practice of smoking, but also to the retailer who sold the cigarettes, the manufacturer of the cigarettes, the grower of the tobacco, the media that advertised the brand smoked, the friends or family or culture that encouraged smoking, the schools that failed to intervene, genes that predisposed the smoker to addiction, and so on. All can be considered but-for causes of smoking. Tort law also provides standards such as more-likely-than-not and joint-and-several liability for cases where causation is uncertain or is distributed among multiple causes.

Predictive causality does not necessarily imply manipulative causality unless other conditions hold, such as that no omitted confounders are present. This is illustrated by the example of nicotine-stained fingers being a predictive but not a manipulative cause of lung cancer, where smoking is the omitted confounder. Often in public health and safety regulations, it is not known whether these other conditions hold, and hence it is not clear whether predictive causality implies manipulative causality. On the other hand, predictive causation can very often be established or refuted (at a stated level of statistical confidence) based on data by applying statistical tests to determine whether predictions of outcomes are significantly improved by conditioning on information about their hypothesized causes. Such tests examine what did happen to the effects when the hypothesized causes had different values, rather than requiring speculations about what would happen to effects under different conditions, as in counterfactual causal modeling. Therefore, even though predictive causality does not necessarily imply manipulative causality, it provides a useful data-driven screen for potential manipulative causation, insofar as manipulative causation usually implies predictive causation (since changes in inputs help to predict the changes in outputs that they cause).

For counterfactual causation, what the outcomes would have been under different, counterfactual conditions is never observed. Therefore, the estimated difference in outcomes caused by differences between real and counterfactual conditions must be calculated using predictive models or assumptions about what would have happened. These predictions may not be accurate. In practice, they are usually simply assumed, but are difficult or impossible to validate. Counterfactual models of causation are also inherently ambiguous in that the outcomes that would have occurred had exposure been absent usually depend on why exposure would have been absent, which is seldom specified. For example, nicotine-stained fingers would be a counterfactual cause of lung cancer if clean fingers imply no smoking but not if they are clean only because smokers wear gloves when smoking. In the case of air pollution—especially if exposure and income interact in affecting mortality rates—assuming that the counterfactual condition without exposure occurs because everyone becomes wealthy enough to move to unpolluted areas might yield quite different estimates of counterfactual mortality rates than assuming that lack of exposure was caused by the onset of such abject poverty and economic depression that pollution sources no longer operate. Counterfactual models usually finesse any careful exposition of specific assumptions about why counterfactual exposures occur by using statistical models to predict what would have happened if exposures had been different. But these models are silent about why exposures would have been different, and hence the validity of their predictions is unknown. (Economists have noted a similar limitation of macroeconomic models derived from historical data to predict the effects caused by future interventions that change the underlying data-generating process. This is known as the Lucas critique of causal predictions in macroeconomics policy models [see Ljungqvist 2008].)

Although manipulative causality usually implies predictive causality, neither one necessarily implies attributive causality. For example, if consuming aspirin every day reduces risk of heart attack in an elderly population, but only people with high risks take daily aspirin, then there might be both a positive association (and hence a positive etiologic fraction, probability of causation, and population attributable risk) between aspirin consumption and heart attack risk in the population but a negative manipulative causal relationship between them, with aspirin consumption reducing risk. Even if aspirin had no effect on risk, it could still be positively associated with risk if people at high risk were more likely to consume it. Thus, manipulative and associational-attributive causation do not necessarily have any implications for each other.
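The aspirin scenario can be made concrete with a minimal simulation (all rates invented): confounding by indication produces a positive crude association even though the manipulative causal effect of aspirin is protective in every stratum.

```python
import random

random.seed(1)
n = 200_000
rows = []
for _ in range(n):
    high_risk = random.random() < 0.30                          # high baseline risk?
    aspirin = random.random() < (0.80 if high_risk else 0.10)   # high-risk people take it more
    base = 0.40 if high_risk else 0.05
    attack = random.random() < base * (0.70 if aspirin else 1.0)  # aspirin truly cuts risk by 30%
    rows.append((high_risk, aspirin, attack))

def risk(keep):
    sel = [r for r in rows if keep(r)]
    return sum(r[2] for r in sel) / len(sel)

# Crude (associational-attributive) comparison: aspirin takers have HIGHER risk...
assert risk(lambda r: r[1]) > risk(lambda r: not r[1])

# ...yet within each risk stratum, taking aspirin LOWERS risk (manipulative causation):
assert risk(lambda r: r[0] and r[1]) < risk(lambda r: r[0] and not r[1])
assert risk(lambda r: not r[0] and r[1]) < risk(lambda r: not r[0] and not r[1])
```

A positive etiologic fraction or attributable risk computed from the crude comparison would attribute heart attacks to aspirin, the opposite of its manipulative causal effect.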

The following example illustrates some of these important distinctions among causal concepts more quantitatively.

3.2. Example: Associations Do Not Necessarily Provide Valid Manipulative Causal Predictions

Suppose that in a certain city, daily mortality rate, R, and average daily exposure concentration of an air pollutant, C, over an observation period of several years are perfectly described by the following Model 1:

(Model 1)  R = C + 50
That is, each day, the number of deaths is equal to 50 deaths plus the average daily concentration of the air pollutant. What valid inferences, if any, do these observations enable about how changing C would change R? The answer is, none; historical associations do not logically imply anything about predictive, counterfactual, computational, or manipulative causation. One reason is that Model 1 implies that the same data are also described perfectly by the following Model 2, where T is an unmeasured third variable (such as temperature) with values between 0 and 100:
(Model 2)  C = 50 – T/2;  R = 150 – C – T
(The first equation implies that T = 100 – 2C, and substituting this into the second equation to eliminate T yields Model 1.) If the equations in Model 2 are structural equations with the explicit causal interpretation that exogenously changing the value of a variable on the right side of an equation will cause the value of the dependent variable on its left side to change to restore equality, then the second equation reveals that each unit of reduction in C would increase R by one unit. In this case, if Model 1 is only a reduced-form model describing historical associations, then misinterpreting it as a causal model would mistakenly imply that increasing C would increase R. The associational Model 1 is not incorrect as a description of past data. It would be valid for predicting how many deaths would occur on days with different exposure concentrations in the absence of interventions. But only the causal Model 2 can predict how changing C would change R, and there is no way to deduce Model 2 by scrutiny of Model 1.
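A short script makes the point concrete. It generates daily data from the structural equations of Model 2 (C = 50 – T/2 and R = 150 – C – T) and confirms that the associational Model 1 fits every observed day perfectly, even though an intervention on C has the opposite of the effect that a causal misreading of Model 1 would suggest.

```python
def deaths(C, T):
    """Second structural equation of Model 2: R = 150 - C - T."""
    return 150 - C - T

# Observation period: temperature T varies day to day and drives exposure C.
days = []
for T in range(0, 101, 5):
    C = 50 - T / 2                  # first structural equation: T determines C
    days.append((C, deaths(C, T)))

# The associational Model 1 (R = C + 50) describes every observed day perfectly...
assert all(R == C + 50 for C, R in days)

# ...but an intervention that lowers C by one unit on a given day (T unchanged)
# RAISES deaths by one unit under the structural Model 2:
assert deaths(29, 40) == deaths(30, 40) + 1
```

The intervention step uses a day with T = 40, on which C = 30 would be observed; cutting C to 29 while holding T fixed increases R from 80 to 81, exactly the reversal described in the text.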

This review of different concepts of causation has highlighted the following two key conclusions: First, policymakers, regulators, courts, and the general public are primarily interested in manipulative causation, that is, in how regulations or other actions that they might take would affect probabilities of outcomes, and hence the benefits caused by their actions. But second, regulatory science and claims about the causal impacts of regulations usually address only associational-attributive causation, and occasionally other non-manipulative (especially, counterfactual) causal concepts. Judicial review of the causal reasoning and evidence supporting estimates of regulatory benefits can and should close this gap between causal concepts by insisting that arguments and evidence presented must address manipulative causation, and that other forms of causation must not be conflated with it. There is an urgent need to enforce such clarity because current practices in epidemiology, public health, and regulatory science routinely confuse associational-attributive causation with manipulative causation.

3.3. Example: Association Is Currently Routinely Confused with Manipulative Causation

The following examples illustrate how associational-attributive causation and manipulative causation are conflated in the literature on health effects attributed to air pollution. Associational and causal language appears in italics (all emphases added), and brief comments in parentheses note where the stated manipulative causal interpretations do not follow from the associational-attributive findings presented (based on Cox 2017):

Dockery et al. (1993): “We observed statistically significant and robust associations between air pollution and mortality … these results suggest that fine-particulate air pollution, or a more complex pollution mixture associated with fine particulate matter, contributes to excess mortality in certain U.S. cities.” (Associations do not suggest a contribution to excess mortality, or that reducing exposure would reduce excess mortality, unless they are manipulative-causal.)

Schwartz, Laden, and Zanobetti (2002): “The magnitude of the association suggests that controlling fine particle pollution would result in thousands of fewer early deaths per year.” (Associations do not allow prediction of results from changes in exposure concentrations unless they coincide with manipulative causal relations.)

Franklin, Zeka, and Schwartz (2007): “We examined the association between PM(2.5) and both all-cause and specific-cause mortality … . Our findings … suggest that PM(2.5) may pose a public health risk even at or below current ambient levels.” (An association does not suggest that exposure poses a public health risk unless manipulative causation can be shown.)

Hart, Garshick, Dockery et al. (2011): “Residential ambient air pollution exposures were associated with mortality. … [O]ur study is the first to assess the effects of multiple air pollutants on mortality with fine control for occupation within workers from a single industry.” (Associations are not health effects.)

Lepeule et al. (2012): “Each increase in PM2.5 (10 µg/m3) was associated with an adjusted increased risk of all-cause mortality (PM2.5 average on previous year) of 14%… These results suggest that further public policy efforts that reduce fine particulate matter air pollution are likely to have continuing public health benefits.” (Associations do not suggest that public policy efforts that reduce exposure will thereby create public health benefits unless manipulative causation can be demonstrated.)

Fann et al. (2012): “Ground-level ozone (O3) and fine particulate matter (PM2.5) are associated with increased risk of mortality. We quantify the burden of modeled 2005 concentrations of O3 and PM2.5 on health in the United States. … Among populations aged 65–99, we estimate nearly 1.1 million life years lost from PM2.5 exposure. … Among the 10 most populous counties, the percentage of deaths attributable to PM2.5 and ozone ranges from 3.5% in San Jose to 10% in Los Angeles. These results show that despite significant improvements in air quality in recent decades, recent levels of PM2.5 and ozone still pose a nontrivial risk to public health.” (Associations and attributable risks calculated from associations do not quantify burden of disease or life-years lost because of exposure or indicate a risk unless manipulative causation is present.)

Apte et al. (2015): “Ambient fine particulate matter (PM2.5) has a large and well-documented global burden of disease. Our analysis uses high-resolution (10 km, global-coverage) concentration data and cause-specific integrated exposure-response (IER) functions developed for the Global Burden of Disease 2010 to assess how regional and global improvements in ambient air quality could reduce attributable mortality from PM2.5. Overall, an aggressive global program of PM2.5 mitigation in line with WHO interim guidelines could avoid 750 000 (23%) of the 3.2 million deaths per year currently (ca. 2010) attributable to ambient PM2.5.” (The Global Burden of Disease IER functions are based on relative risk measures of association. Such associations do not allow prediction or assessment of “how … improvements on ambient air quality could reduce attributable mortality” or avoid deaths unless the underlying relative risks coincide with causal relations.)

Giannadaki, Lelieveld, and Pozzer (2016): “We use a high-resolution global atmospheric chemistry model combined with epidemiological concentration response functions to investigate premature mortality attributable to PM2.5 in adults ≥30 years and children <5 years. … Based on sensitivity analysis, applying worldwide the EU annual mean standard of 25 μg/m3 for PM2.5 could reduce global premature mortality due to PM2.5 exposure by 17% … Our results reflect the need to adopt stricter limits for annual mean PM2.5 levels globally, like the US standard of 12 μg/m3 or an even lower limit to substantially reduce premature mortality in most of the world.” (Epidemiological exposure concentration-response associations and estimates of PM2.5-attributable mortalities based on them do not imply that reducing PM2.5 would reduce mortality, or allow such reductions to be predicted, unless the associations coincide with manipulative causal relations.)

Lo et al. (2016): “Relative risks were derived from a previously developed exposure-response model. … Nationally, the population attributable mortality fraction of PM2.5 for the four disease causes was 18.6% (95% CI, 16.9-20.3%). … Aggressive and multisectorial intervention strategies are urgently needed to bring down the impact of air pollution on environment and health.” (Relative risks and population attributable mortality fractions are measures of exposure-response associations. Such associations do not imply that interventions to reduce exposures would reduce risks of adverse responses unless there is a manipulative causal relation between them.)

These and many other papers move freely between associational and manipulative causal interpretations of exposure-response associations without showing that the presented associations describe (manipulative) causation. As a consequence, regulatory benefits assessments, and calls for further regulation based on these and similar analyses, do not reveal what consequences, if any, further regulations should actually be expected to cause. In this sense, they are arbitrary and capricious, as they provide no rational basis for identifying the likely consequences of the recommended regulations.

3.4. Can Regulatory Benefits Estimation Be Improved, and, If So, How?

Can more active judicial review truly improve the accuracy of causal inferences and benefits predictions used in deciding which proposed regulatory changes to make and in evaluating their performance? To what extent are improvements constrained by hard limits on what can be reliably predicted and learned from realistically incomplete and imperfect data? The following distinct lines of evidence from very different areas suggest that substantial improvements are indeed possible in practice, but that they are best accomplished with the help of strong external critical review of the evidence and reasoning relied on by regulatory agencies and advocates.

The first line of evidence comes from sociological and organizational design studies. These suggest that the organizational culture and incentives of regulatory agencies usually put weight on authoritative, prospective estimates of benefits, with supporting causal assumptions that reflect the entrenched views of the organization that regulation produces desirable results and that the beliefs of the regulators are scientific and trustworthy (Wynne 1993). However, organizational cultures that foster demonstrably high performance in managing risks and uncertainties function quite differently. They typically acknowledge ignorance and uncertainty about how well current policies and actions are working. They focus on learning quickly and effectively from experience, frequently revisiting past decisions and assumptions, and actively questioning and correcting entrenched assumptions and beliefs of current policies as new data are collected (Dekker and Woods 2009; Weick and Sutcliffe 2001). For example, difficult and complex operations under uncertainty, such as managing air traffic coming and going from nuclear aircraft carriers, operating nuclear power plants or offshore oil platforms safely for long periods under constantly changing conditions, fighting wildfires, landing airplanes successfully under unexpected conditions, or performing complex surgery, are all carried out successfully in hundreds of locations worldwide every day. The disciplines and habits of mind practiced and taught in such high reliability organizations (HROs) have proved useful in helping individuals and organizations plan, act, and adjust more effectively under uncertainty; regulatory agencies dealing with uncertain health and safety risks can profit from these lessons (Dekker and Woods 2009).

Five commonly listed characteristics of HROs are as follows: sensitivity to operations—to what is working and what is not, with a steady focus on empirical data and without making assumptions (Gamble 2013); reluctance to oversimplify explanations for problems, specifically including resisting simplistic interpretations of data and assumptions about causality; preoccupation with failure, meaning constantly focusing on how current plans, assumptions, and practices might fail, rather than on building a case for why they might succeed; deference to expertise rather than to seniority or authority; and commitment to resilience, including willingness to quickly identify and acknowledge when current efforts are not working as expected and to improvise as needed to improve results (see Weick and Sutcliffe 2001). Of course, regulatory processes that unfold over years, largely in the public sphere, are a very different setting from operations performed by specially trained teams. But it is plausible that many of the same lessons apply to regulatory organizations seeking to improve outcomes in a changing and uncertain environment (Dekker and Woods 2009).

A second line of evidence that improvements in predicting the effects of regulations can be achieved in practice comes from research on improving judgment and prediction, recently summarized in the popular book Superforecasting (Tetlock and Gardner 2015). Although most predictions are overconfident and inaccurate, a small minority of individuals display consistent, exceptional performance in forecasting the probabilities of a wide variety of events, from wars to election outcomes to financial upheavals to scientific discoveries (Tetlock and Gardner 2015). These “superforecasters” apply teachable and learnable skills and habits that explain their high performance (Tetlock and Gardner 2015). They remain open-minded, always regarding their current beliefs as hypotheses to be tested and improved by new information. They update their current judgments frequently and precisely, actively seeking and conditioning on new data and widely disparate sources of evidence that might disprove or correct their current estimates. They make fine-grained distinctions in their probability judgments, often adjusting by only one or a few percentage points in light of new evidence, a level of precision that most people cannot bring to their probability judgments (Tetlock and Gardner 2015). The authors offer the following rough recipe for improving probability forecasts: (1) “Unpack” the question to which the forecast provides an answer, such as about the health benefits that a regulation will end up causing, into its components, such as who will receive what kinds of health benefits and under what conditions. (2) Distinguish between what is known and unknown, and scrutinize all assumptions; for example, do not assume that reducing exposure will cause proportional reductions in adverse health effects unless manipulative causation has actually been shown. (3) Consider other, similar cases and the statistics of their outcomes (taking what the authors call “the outside view”), and then (4) consider what is special or unique about this specific case in contradistinction to others (the “inside view”). (5) Exploit what can be learned from the views of others, especially those with contrasting informed predictions, as well as from prediction markets and the wisdom of crowds. (6) Synthesize all of these different views into one (the multifaceted “dragonfly view,” in the authors’ term), and (7) express a final judgment, conditioned on all this information, as precisely as possible using a fine-grained scale of probabilities (Tetlock and Gardner 2015). Skill in making better predictions using this guidance can be built through informed practice and clear, prompt feedback, provided that there is a deliberate focus on tracking results and learning from mistakes (see Tetlock and Gardner 2015).
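Forecasting skill of the kind Tetlock and Gardner describe is conventionally measured with the Brier score, the mean squared error of probability forecasts against observed 0/1 outcomes. A toy computation (invented numbers) shows why fine-grained, calibrated probabilities outperform both blanket hedging and overconfident certainty:

```python
def brier(forecasts, outcomes):
    """Mean squared error of probability forecasts against 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)

outcomes  = [1, 0, 1, 1, 0, 0, 1, 0]                   # what actually happened
hedged    = [0.5] * 8                                  # refuses to commit: score 0.25
confident = [1, 0, 1, 0, 1, 0, 1, 0]                   # always certain, twice badly wrong
graded    = [0.8, 0.2, 0.7, 0.6, 0.4, 0.1, 0.9, 0.3]   # fine-grained and well calibrated

assert brier(graded, outcomes) < brier(hedged, outcomes)
assert brier(graded, outcomes) < brier(confident, outcomes)
```

Tracking a score like this over many forecasts supplies the "clear, prompt feedback" that the recipe's final step depends on.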

A third line of evidence, showing that it is possible to learn to intervene effectively even in uncertain and changing environments, comes from machine learning—specifically, from the design and performance of reinforcement-learning algorithms, which automatically learn decision rules from experience and improve them over time so that preferred outcomes become more likely. A very successful class of algorithms, called “actor-critic” methods (see Konda and Tsitsiklis 2003; Lei 2016; Ghavamzadeh, Engel, and Valko 2016), pairs a policy or “actor,” which decides what actions to take next given currently available information, with one or more reviewers or “critics,” which evaluate the empirical performance of the current policy and suggest changes based on the difference between predicted and observed outcomes. These algorithms have proved successful in learning optimal (net-benefit-maximizing) or near-optimal policies quickly in a variety of settings with probabilistic relations between actions and their consequences and with systems that behave in uncertain ways, so that it is necessary to learn adaptively how best to achieve desired results.
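A deliberately simplified sketch of the actor-critic idea, on a two-armed bandit with illustrative parameters, shows the division of labor: the critic maintains a running estimate of average performance, and the actor shifts its policy toward actions that beat that estimate.

```python
import math, random

random.seed(2)
reward_prob = [0.3, 0.7]         # true success rates of the two actions, unknown to the learner
prefs = [0.0, 0.0]               # actor: action preferences (softmax policy)
baseline = 0.0                   # critic: running estimate of average reward
alpha, beta = 0.1, 0.05          # actor and critic learning rates

def policy():
    z = [math.exp(p) for p in prefs]
    return [v / sum(z) for v in z]

for _ in range(20_000):
    probs = policy()
    a = 0 if random.random() < probs[0] else 1            # actor chooses an action
    r = 1.0 if random.random() < reward_prob[a] else 0.0  # environment responds
    advantage = r - baseline       # critic's evaluation: did the outcome beat expectations?
    baseline += beta * advantage   # critic updates its performance estimate
    for i in range(2):             # actor moves toward actions with positive advantage
        indicator = 1.0 if i == a else 0.0
        prefs[i] += alpha * advantage * (indicator - probs[i])

assert policy()[1] > 0.85          # the policy has learned to prefer the better action
```

Nothing here is assumed known in advance about which action is better; the decision rule is learned purely from the gap between predicted and observed results, the same feedback loop the text attributes to high-performing organizations.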

High-reliability organizations, superforecasters, and successful machine learning algorithms for controlling uncertain systems all apply the following common principles:

(a) Recognize that even the best current beliefs and models for predicting outcomes and for deciding what to do to maximize net benefits will often be mistaken or obsolete. Constantly check, improve, and update them based on empirical data and on gaps between predicted and observed results.

(b) Consider the implications of many plausible alternative models and sets of assumptions rather than relying on any single one for forecasting and decision-making.

(c) Seek and use potential disconfirming data and evidence from many diverse sources to improve current beliefs, predictions, and control policies.

(d) Use informed external critics to improve performance by vigilant review; frequent challenges to current assumptions, predictions and policies; and informed suggestions for changes based on data.

Applying these principles to regulatory agencies suggests that a mindset that seeks to identify and defend a single “best” model, set of assumptions, or consensus judgment about the effects caused by proposed regulations is less likely to maximize uncertain net social benefits than a mindset that treats those effects as uncertain quantities to be learned about through experience and data. In other words, a judgment-driven culture, in which selected experts form and defend judgments about causation and estimated regulatory benefits, is less beneficial than a data-driven culture, in which the actual effects of regulations are regarded as uncertain and possibly changing quantities to be estimated and improved by intelligent trial and error. A data-driven regulatory culture expects to benefit (1) from independent external challenges and reviews of reasoning and assumptions before regulatory changes are approved and (2) from frequent updates of effect estimates based on data collected after the changes are implemented. Strong judicial review can provide the first part, external review of reasoning, by maintaining a high standard for causal reasoning based on data and manipulative causation.

Medicine, public health, and regulatory science have a long tradition of working against the establishment of a data-driven culture by treating causation as a matter of informed judgment that can only be rendered by properly prepared experts, rather than as a matter of empirically discoverable and independently verifiable fact that can be determined from data. The difficulties and skepticism that have faced proponents of evidence-based medicine and evidence-based policies, emanating from traditions that emphasize the special authority of trained experts (Tetlock and Gardner 2015), suggest the barriers that must be overcome to shift more toward data-driven regulatory cultures.

The following sections discuss the contrasting technical methods used by proponents of the causation-as-judgment and causation-as-fact views, and then suggest that a modern synthesis of these methods provides practical principles for defining and using informative evidence of manipulative causation in administrative law to achieve better results from regulations.

3.5. Causation As Judgment: The Hill Considerations for Causality and Some Alternatives

The most influential framework for guiding consideration and judgments about causality is from Sir Austin Bradford Hill, who in 1965 proposed nine aspects of exposure-response associations that “we especially consider before deciding that the most likely interpretation of it is causation” (quoted in Lucas and McMichael 2005). This original formulation reflects a view in which causation is dichotomous: an association is either causal or not. Modern statistics and machine learning approaches to causal inference take a more nuanced view in which the total association between two quantities can be explained by a mix of factors and pathways, including some causal impacts and some confounding, sample selection and model selection biases, coincident historical trends, omitted variables, omitted errors in explanatory variables, model specification errors, overfitting bias, p-hacking, and so forth.

The expressed goal of Hill’s considerations is to help someone make a qualitative judgment, “deciding that the most likely interpretation of [an exposure-response association] is causation,” rather than quantifying how likely this interpretation is and determining the probability that the association is not causal after all, even if causation is judged the most likely interpretation (quoted in Lucas and McMichael 2005). Thus, Hill’s considerations were never intended to provide the quantitative information that is essential for BCA evaluations of uncertain regulatory benefits. Consistent with the culture of many medical and public health organizations over a long history (see Tetlock and Gardner 2015), Hill’s criteria instead portray causality as a matter for informed subjective qualitative judgment by expert beholders, not as a fact to be inferred (or challenged) by rigorous, objective, and independently reproducible analysis of data.

Hill’s criteria themselves—briefly referred to as strength, consistency, specificity, temporality, biological gradient, plausible mechanism, coherence, experimental support (if possible), and analogy for exposure-response associations—are discussed in more detail later in the context of showing how they can be updated and improved using ideas from current data science. Hill himself acknowledged that they are neither necessary nor sufficient for establishing causation, but held that no algorithmic process for mechanically inferring causation from data is possible and suggested that admittedly fallible subjective judgments based on these considerations may be the best that we can hope for. This line of thinking continues to dominate many regulatory approaches to causal inference. For example, the EPA has formulated and adopted modified versions of the Hill considerations as principles for making weight-of-evidence determinations about causation for ecological, carcinogen, and other risks (see EPA 2005). Neither the original Hill considerations nor more recent weight-of-evidence frameworks based on them distinguish between associational-attributive, predictive, manipulative, and other types of causation. Thus, the enormous influence of these considerations has tended to promote judgment-based cultures for making and defending causal assertions while conflating different concepts of causation, without providing a sharp focus on objective evidence and quantification of the manipulative causal relationships needed for rational choice among alternatives based on BCA calculations.

Of course, methodologists have not been blind to the difficulties with associational and attributive methods. The fact that the sizes and signs of associations are often model-dependent and that different investigators can often reach opposite conclusions starting from the same data by making different modeling choices has long been noted by critics of regulatory risk assessments, finally leading some commentators to conclude that associational methods are unreliable in general (e.g., Dominici, Greenstone, and Sunstein 2014). These criticisms have been recognized, and there has been intense effort over the past decade to develop and apply more formal methods of causal analysis within the judgment-oriented tradition. This has resulted in a small but growing body of literature that replaces the relatively crude assumption that associations can simply be judged as causal by appropriately qualified and selected experts with more sophisticated assumptions that imply that associations are causal without directly assuming it. Human judgment still plays a crucial role, however, insofar as the key assumptions are usually unverifiable based on data, and are left to expert judgments to accept. The most important of these assumption-driven causal inference frameworks and their underlying assumptions, thus far, are as follows (Cox 2017).

Intervention studies assume that if health risks change following an intervention, then the change is (probably) caused by the intervention. This assumption is often mistaken, as in the Irish coal-burning ban studies (Dockery et al. 2013): both exposures and responses may be lower after an intervention than before it simply because both are declining over time, even if neither causes the other. Construing such coincidental historical trends as evidence of causation is a form of the post hoc ergo propter hoc logical fallacy.

Instrumental variable (IV) studies assume that a variable (called an “instrument”) is unaffected by unmeasured confounders and that it directly affects exposure but not response (Schwartz et al. 2015). The validity of these assumptions is usually impossible to prove, and the results of the IV modeling can be greatly altered by how the modeler chooses to treat lagged values of variables (O’Malley 2012).

Counterfactual “difference-in-differences” and potential outcomes models assume that differences between observed responses to observed exposure concentrations and unobserved model-predicted responses to different hypothetical “counterfactual” exposure concentrations are caused by the differences between the observed and counterfactual exposures (e.g., Wang et al. 2016). However, these differences might instead be caused by errors in the model or by systematic differences in other factors such as distributions of income, location, and age between the more- and less-exposed individuals. The assumption that these are not the explanations is usually untested and left as a matter for expert judgment to decide.

Regression discontinuity (RD) studies assume that individuals receiving different exposures or treatments based on whether they are above or below a threshold in some variable (e.g., age, income, location), which triggers a government intervention, are otherwise exchangeable, so that one can assume the differences in outcomes for populations of individuals above and below the threshold are caused by differences in the intervention or treatment received. The validity of this assumption is often not known. In addition, as noted by Gelman and Zelizer (2015), RD models “can overfit, leading to causal inferences that are substantively implausible.” For an application to air pollution health effects estimation based on differences in coal burning in China, they conclude that a “claim [of a health impact], and its statistical significance, is highly dependent on a model choice that may have a data-analytic purpose, but which has no particular scientific basis” (Gelman and Zelizer 2015).

As already discussed, associational, attributable-risk, and burden-of-disease studies assume that if responses are greater among people with higher exposures, then this difference is caused by the difference in exposures and could be removed by removing the exposure difference (manipulative causation). Typically, this assumption is made without careful justification: it simply equates association with causation. Conditions such as Hill’s considerations of strong and consistent association are commonly misconstrued as evidence for manipulative causation in such studies (see, e.g., Fedak et al. 2015; Höfler 2005), without testing potential disconfirming alternative hypotheses, such as that shared modeling assumptions, biases, confounders, effects of omitted variables, effects of omitted error terms for estimated values of predictors, model specification errors, model uncertainties, coincident historical trends, or regression to the mean might account for the strong and consistent associations (Greenland 2005).

These methods all make assumptions that, if true, could justify treating associations as if they indicated manipulative causation. Whether they are true, however, is usually not tested based on data, but is left to expert judgment to decide. As succinctly noted by Gelman and Zelizer (2015) in presenting their own critique of regression discontinuity (RD) studies,

One way to see the appeal of RD is to consider the threats to validity that arise with five other methods used for causal inference in observational studies: simple regression, matching, selection modeling, difference in differences, and instrumental variables. These competitors to RD all have serious limitations: regression with many predictors becomes model dependent … ; matching, like linear or nonlinear regression adjustment, leans on the assumption that treatment assignment is ignorable conditional on the variables used to match; selection modeling is sensitive to untestable distributional assumptions; difference in differences requires an additive model that is not generally plausible; and instrumental variables, of course, only work when there happens to be a good instrument related to the causal question of interest.

Something better than unverified assumption-driven methods is needed.

3.6. Causation As Discoverable Empirical Fact: Causal Inference Algorithms and Competitions

At the opposite pole from Hill’s contention that determination of causation cannot be reduced to an algorithm, a rich body of literature on computational approaches to causal inference seeks to do exactly that by providing algorithms that automatically draw reliable causal inferences from observational data (see, e.g., Aliferis et al. 2010; Kleinberg and Hripcsak 2011; Hoover 2012; Rottman and Hastie 2014; Bontempi and Flauder 2015). The best-known modern exponent of causal inference algorithms may be the computer scientist Judea Pearl (see Pearl 2009; 2010). This analytic tradition, however, extends back to work by economists and social statisticians from the 1950s (see, e.g., Simon 1953) and to work by biologists, geneticists, and psychologists since the invention of path analysis by Sewall Wright a century ago (see Joffe et al. 2012). Most causal inference algorithms use statistical tests to determine which variables help predict effects of interest, even after conditioning on the values of other variables (Pearl 2010). Thus, they mainly detect predictive causation, although some also explicitly address implications for causal mechanisms, computational causation, and manipulative causation (see, e.g., Iwasaki 1988; Voortman, Dash, and Druzdzel 2010). Their emphasis on predictive causation allows causal inference algorithms to benefit from well-developed principles and methods for predictive analytics and machine learning (ML).

Key technical ideas of causal inference algorithms can be used more generally to guide human reasoning about causal inference. An idea used in many causal inference algorithms is that in a chain such as X → Y → Z, where arrows denote manipulative or predictive causation (so that changes in the variable at the tail of an arrow change or help to predict changes in the variable that it points into, respectively), each variable should have a statistical dependency on any variable that points into it, but Z should be conditionally independent of X given the value of Y, since Z depends on X only through the effect of X on Y. Algorithms that test for conditional independence and that quantify conditional probability dependencies among variables are now mature (Frey et al. 2003; Aliferis et al. 2010) and are readily available to interested practitioners via free Python and R packages for ML, such as the bnlearn package in R, which learns probabilistic dependencies and independence relations (represented via Bayesian network (BN) structures and conditional probability tables) from data. A second, related idea is that in the chain X → Y → Z, Y should provide at least as much information as X for predicting Z. A third idea, introduced in path analysis for linear relationships among variables and generalized in BNs to arbitrary probabilistic dependencies, is that the effect of changes in X on changes in Z should be a composition of the effect of changes in X on Y and the effect of changes in Y on Z. Such ideas provide constraints and scoring criteria for identifying causal models that are consistent with data.
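The conditional-independence idea for a chain can be checked numerically. The following is a minimal sketch (all variable names and coefficients are hypothetical) that simulates a linear chain X → Y → Z and verifies that X and Z are dependent marginally but nearly independent once Y is conditioned on, using partial correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulate a linear causal chain X -> Y -> Z with independent noise terms.
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
z = 0.5 * y + rng.normal(size=n)

def partial_corr(a, b, given):
    """Correlation of a and b after regressing out `given` from each."""
    g = np.column_stack([np.ones_like(given), given])
    ra = a - g @ np.linalg.lstsq(g, a, rcond=None)[0]
    rb = b - g @ np.linalg.lstsq(g, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

marginal = np.corrcoef(x, z)[0, 1]     # X and Z are clearly dependent...
conditional = partial_corr(x, z, y)    # ...but nearly independent given Y

print(f"corr(X, Z)      = {marginal:.3f}")
print(f"pcorr(X, Z | Y) = {conditional:.3f}")
```

In practice, packages such as bnlearn automate many such tests and search over network structures; the sketch above only illustrates the single constraint that the chain implies.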

Modern causal inference algorithms offer dozens of constructive alternatives for assessing predictive causal relations in observational data without relying on human judgment or unverified modeling assumptions. The field is mature enough that, for over a decade, different causal inference algorithms have been applied to suites of problems, for which the underlying data-generating processes are known, to see how accurately the algorithms can recover correct descriptions of the underlying causal models from observed data. Competitions are now held fairly regularly that quantify and compare the empirical performance of submitted causal inference algorithms on suites of test problems (see, e.g., Hill 2016). Results of recent causal inference competitions suggest the following principles for causal inference from observational data as common components of many of the top-performing algorithms.

Information principle. Causes provide information that helps predict their effects and that cannot be obtained from other variables. This principle creates a bridge between well-developed computational statistical and ML methods for identifying informative variables to improve prediction of dependent variables, such as health effects, and the needs of causal inference (Pearl 2009). To the extent that effects cannot be conditionally independent of their direct manipulative causes, such information-based algorithms provide a useful screen for potential manipulative causation, as well as for predictive causation.

Propagation of changes principle. Changes in causes help explain and predict changes in effects (Friston, Moran, and Seth 2013; Wu, Frye, and Zouridakis 2011). This applies the information principle to changes in variables over time. It can often be visualized in terms of changes propagating along links (representing statistical dependencies) in a BN or other network model.

Nonparametric analyses principle. Multivariate nonparametric methods, most commonly, classification and regression trees (CART) algorithms, can be used to identify and quantify information dependencies among variables without having to make any parametric modeling assumptions (see, e.g., Halliday et al. 2016). CART trees can also be used to test for conditional independence, with the dependent variable being conditionally independent of variables not in the tree given the variables that are in it, at least as far as the tree-growing algorithm can discover (see Frey et al. 2003; Aliferis et al. 2010).
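As a sketch of how a tree can serve as a conditional-independence screen, the example below (hypothetical data and variable names; it assumes scikit-learn's CART implementation is available) fits a regression tree to two candidate predictors of a response. The response depends on exposure only through a biomarker, and the tree allocates essentially all of its splits to the direct cause:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n = 20_000

# Chain: exposure -> biomarker -> response; exposure acts only via the biomarker.
exposure = rng.normal(size=n)
biomarker = exposure + rng.normal(size=n)
response = np.sin(biomarker) + 0.1 * rng.normal(size=n)  # nonlinear, no parametric form assumed

X = np.column_stack([exposure, biomarker])
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, response)

# Feature importances show which variables the tree actually uses.
imp_exposure, imp_biomarker = tree.feature_importances_
print(f"importance(exposure)  = {imp_exposure:.3f}")
print(f"importance(biomarker) = {imp_biomarker:.3f}")
```

Variables that the tree-growing algorithm leaves out, here the indirect cause, are (as far as the algorithm can discover) conditionally independent of the response given the variables it keeps.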

Multiple models principle. Rather than relying on any single statistical model, the top-performing causal analytics algorithms typically fit hundreds of nonparametric models (e.g., CART trees), called model ensembles, to randomly generated subsets of the data (see Furqan and Siyal 2016). Averaging the resulting predictions of how the dependent variable depends on other variables over an ensemble of models usually yields better estimates with lower bias and error variance than any single predictive model. This is reminiscent of the principle in high-reliability organizations and among superforecasters of considering many theories, models, and points of view, rather than committing to a single best one. Computational statistics packages such as the randomForest package in R automate construction, validation, and predictive analytics for such model ensembles and present results in simple graphical forms, especially partial dependence plots that show how a dependent variable is predicted to change as a single predictor is systematically varied while leaving all other variables with their empirical joint distribution of values. If this dependency represents manipulative causality, then the partial dependence plot indicates how the conditional expected value of an output such as mortality in a population is expected to change when a variable such as exposure is manipulated, given the empirical joint distribution of other measured predictors on which the output also depends. Otherwise, it quantifies a predictive relation.
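The values of a partial dependence plot can be computed directly from the definition given above. The following sketch (hypothetical variables and effect sizes; assumes scikit-learn is available) fits a random forest to simulated mortality data with a threshold exposure effect, then varies exposure over a grid while leaving the other predictor at its empirical joint distribution:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical data: mortality rises with exposure above a threshold, and with age.
exposure = rng.uniform(0, 10, size=n)
age = rng.uniform(20, 80, size=n)
mortality = 0.02 * np.maximum(exposure - 5, 0) + 0.001 * age + 0.01 * rng.normal(size=n)

X = np.column_stack([exposure, age])
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, mortality)

# Partial dependence by definition: fix exposure at v, keep the empirical joint
# distribution of all other columns, and average the ensemble's predictions.
def partial_dependence(model, X, col, grid):
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, col] = v
        out.append(model.predict(Xv).mean())
    return np.array(out)

grid = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
pd_curve = partial_dependence(forest, X, col=0, grid=grid)
print(dict(zip(grid, pd_curve.round(4))))
```

The resulting curve is flat below the threshold and rises above it, matching the simulated data-generating process without any parametric assumption having been supplied.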

High-performance causal inference algorithms for observational data usually combine several of these principles. Interestingly, none of them uses Hill’s considerations or associational-attributional methods such as probability of causation or attributable risk formulas from epidemiology. A counterfactual-potential outcomes causal modeling approach was entered in a recent competition (see Hill 2016) but performed relatively poorly, with roughly twenty times larger bias, twenty times larger mean square prediction error for estimated causal effects, and wider uncertainty intervals than tree-based algorithms incorporating the above principles. This presumably reflects the fact that the counterfactual approach depends on models of unknown validity. In short, causal inference and discovery algorithms that assume causal relationships are empirical facts, which can be discovered from data, have made great progress and have yielded encouraging performances in competitive evaluations (see Bontempi and Flauder 2015), but they do not use the methods usually relied on by regulatory agencies in making judgments about causation. Regulatory methods, including weight-of-evidence schemes for evaluating and combining causal evidence, were tried and evaluated as approaches to automated assessment of causality in expert systems research in the 1980s (see, e.g., Spiegelhalter 1986; Todd 1992), but they have been outcompeted by modern causal inference algorithms incorporating the above principles and are no longer used in practical applications. That they continue to play a dominant role in causal inference in many regulatory agencies invites the question of whether these agencies could also dramatically improve their performance in predicting and assessing causal effects of regulations by applying modern causal inference algorithms and principles instead.

3.7. Synthesis: Modernizing the Hill Considerations

The enduring influence and perceived value of the Hill considerations and of judgment-centric methods for causal inference in regulatory agencies show that they fill an important need. Despite Hill’s disclaimers, this is largely the need for a simple, intuitively plausible checklist to use in assessing evidence that reducing exposures will reduce risks of harm. At the same time, the successes of data-centric, algorithmic methods of causal inference and causal discovery in competitive evaluations suggest the desirability of a synthesis that combines the best elements of each. This section describes each of the Hill considerations, their strengths and limitations, and possibilities for improving on Hill’s original 1965 formulation using contemporary ideas.

Strength of association

Hill proposed as the first consideration that larger associations are more likely to be causal than smaller ones (see Lucas and McMichael 2005). One possible underlying intuition to support this is that causal laws always hold, so they should produce large associations, whereas conditions that generate spurious associations only hold sometimes, such as when confounders are present, and thus they tend to generate smaller associations. Whatever the rationale, objections to this consideration are that (1) the existence, direction, and size of an association are often model-dependent (see Dominici, Greenstone, and Sunstein 2014; Gelman and Zelizer 2015). Recall the example of Models 1 and 2 with R = C + 50 and R = 150 – CT, respectively. In Model 1, C is positively associated with R. But in Model 2, C is negatively associated with R. More generally, whether an association is large or small may reflect modeling choices rather than some invariant fact about the real world that does not depend on the modeler’s choices. (2) Associations are not measures of manipulative causation. (3) There is no reason in general to expect that a larger association is more likely to be causal than to expect that it indicates stronger confounding, larger modeling errors or biases, stronger coincident historical trends, or other non-causal explanations. On the other hand, there is a useful insight here that can be formulated more precisely and correctly in more modern terminology. In a causal network such as the chain W → X → Y → Z, where arrows signify predictive or manipulative causation or both, it must be the case that Y provides at least as much information about Z as X or W does, and typically more. (Technically, the information that one random variable provides about another, measured in bits, is quantified as the expected reduction in the entropy of the probability distribution of one variable achieved by conditioning on the value of the other.)
Thus, if Y is a direct manipulative or predictive cause of a dependent variable Z, it will provide as much or more information about Z than indirect causes, such as X, or non-cause variables, such as W, which are further removed from it in the causal network. The same is not necessarily true for correlations: if Y = X^2 and Z = Y^(1/2), then X and Z will be more strongly correlated than Y and Z, even though Z depends directly on Y and not on X. Thus, replacing association in Hill’s formulation with information yields a useful updated principle: the direct cause(s) of an effect provide more information about it than indirect causes and variables to which it is not causally related. Therefore, variables that provide more information about an effect are more likely to be direct causes or consequences of it than are variables that provide less information. Modern causal discovery algorithms incorporate this insight via the information principle, which says effects are not conditionally independent of their direct causes, and via CART tree-growing algorithms, which identify combinations of predictor values that are highly informative about the value of an effect (dependent) variable.
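The correlation example above can be checked numerically. In the following minimal sketch (hypothetical positive-valued data, so that the square root inverts the square exactly), X is perfectly correlated with Z even though Z depends directly only on Y:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, size=50_000)  # positive values, so sqrt(x**2) == x
y = x ** 2                              # Y = X^2
z = np.sqrt(y)                          # Z = Y^(1/2): Z depends directly on Y, not X

corr_xz = np.corrcoef(x, z)[0, 1]
corr_yz = np.corrcoef(y, z)[0, 1]
print(f"corr(X, Z) = {corr_xz:.3f}")  # the indirect cause is more strongly correlated
print(f"corr(Y, Z) = {corr_yz:.3f}")  # than the direct cause
```

Correlation measures linear association only, so it can rank an indirect cause above a direct one; mutual information, by contrast, cannot be larger for X than for Y here.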


Consistency

Hill proposed that if different investigators arrive at consistent estimates of an exposure-response association in different populations, then this reproducibility provides evidence that the consistently found association is causal (see Lucas and McMichael 2005). Against this, as noted by Gelman and Zelizer (2015), is the recognition that, “once researchers know what to expect, they can continue finding it, given all the degrees of freedom available in data processing and analysis.” Modern ensemble-modeling methods for predictive analytics pursue a somewhat similar criterion—but avoid the potential bias of knowing what to expect and using p-hacking to find it—by partitioning the data into multiple randomly selected subsets (“folds”), fitting multiple predictive models (e.g., CART trees) to each subset, and then evaluating their out-of-sample performance on the other subsets. Averaging the predictions from the best-performing models then yields a final prediction, and the distribution of the top predictions characterizes uncertainty around the final prediction. Such computationally intensive methods of predictive analytics provide quantitative estimates of predictive causal relations and uncertainty about them. A principle that consistency of estimates across multiple models and subsets of available data implies less uncertainty about predictive relationships replaces the Hill consideration that consistent associations are more likely to be causal. In addition, conditions and algorithms have been developed for “transporting” causal relations among variables inferred from interventions and observations in one population and setting to a different population and setting for which observational data are available (Bareinboim and Pearl 2013; Lee and Honavar 2013). These transportability algorithms have been implemented in free R packages such as causaleffect (Tikka 2018).
They capture the idea that causal relationships can be applied in different situations, but differences between situations might modify the effects created by a specified cause in predictable ways. This is a powerful generalization of the consistency consideration envisioned by Hill.
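The fold-based consistency idea described above can be sketched as follows (hypothetical data; assumes scikit-learn is available): fit separate trees to random subsets of the data and compare the estimated exposure-response curves across folds.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
n = 20_000
exposure = rng.uniform(0, 1, size=n)
response = 2.0 * exposure + 0.3 * rng.normal(size=n)

# Split the data into 5 random folds and fit a separate tree to each.
folds = np.array_split(rng.permutation(n), 5)
grid = np.linspace(0.05, 0.95, 10).reshape(-1, 1)
curves = []
for idx in folds:
    t = DecisionTreeRegressor(max_depth=3, random_state=0)
    t.fit(exposure[idx].reshape(-1, 1), response[idx])
    curves.append(t.predict(grid))
curves = np.array(curves)

# Agreement across folds (small spread around the averaged curve) indicates
# low uncertainty about the estimated predictive relationship.
spread = curves.std(axis=0).max()
ensemble_estimate = curves.mean(axis=0)
print(f"max cross-fold std of the estimated curve: {spread:.3f}")
```

Here the folds agree closely because the simulated relationship is strong and stable; unstable or p-hacked relationships would show much larger cross-fold spread.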


Specificity

Hill considered that the more specific an association is between an exposure and an effect, the more likely it is causal (see Lucas and McMichael 2005). This consideration is seldom used now because it is recognized that most exposures of interest, such as fine particulate matter, might have more than one effect, and each effect, such as lung cancer, might have multiple causes. Instead, modern causal inference algorithms such as those in the R package bnlearn discover causal networks that allow multiple causes and effects to be modeled simultaneously.


Temporality

Hill considered that causes must precede their effects (see Lucas and McMichael 2005). This was the only one of his nine considerations that he held to be a necessary condition. Modern causal inference algorithms agree, but refine the criterion by adding that causes must not only precede their effects, but also must help predict them. Methods such as Granger causality testing specify that the history (past and present values) of a cause variable must help to predict the future of the effect variable better than the history of the effect variable alone can do.
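The Granger idea (the history of a cause must improve prediction of the effect beyond what the effect's own history provides) can be sketched with two nested autoregressions; all coefficients below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 5_000

# Simulate: x(t) influences y(t+1); y also depends on its own past.
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.6 * x[t - 1] + rng.normal()

def rss(design, target):
    """Residual sum of squares from an OLS fit of target on design."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    return float(resid @ resid)

ones = np.ones(T - 1)
restricted = np.column_stack([ones, y[:-1]])           # y's own history only
augmented = np.column_stack([ones, y[:-1], x[:-1]])    # add x's history

rss_r = rss(restricted, y[1:])
rss_a = rss(augmented, y[1:])
# x "Granger-causes" y if its history appreciably cuts the prediction error.
improvement = (rss_r - rss_a) / rss_r
print(f"relative RSS reduction from adding x's history: {improvement:.3f}")
```

A formal test would compare this reduction to an F or chi-squared reference distribution; the sketch only shows the core nested-model comparison.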

Biological gradient

This consideration states that if larger exposures are associated with larger effects, then their association is more likely to be causal than if such monotonicity does not hold (see Lucas and McMichael 2005). This is closely related to the strength-of-association criterion since many measures of association (such as correlation) assume a monotonic relationship. Just as a strong confounder can explain a strong exposure-response association in the absence of manipulative causation, so it can explain a monotonic relation between exposure and response even in the absence of manipulative causation. Since 1965, research on nonlinear and threshold exposure-response relations has made clear that many important biological processes and mechanisms do not satisfy the biological gradient criterion. Modern methods of causal discovery, including CART trees and Bayesian Networks, can discover and quantify non-monotonic relationships between causes and their effects, so the biological gradient criterion is unnecessary for applying these methods.
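As a sketch of this point, the example below (a hypothetical U-shaped dose-response; assumes scikit-learn is available) shows a near-zero linear correlation coexisting with a strong non-monotonic relation that a CART tree recovers without parametric assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)
n = 20_000
dose = rng.uniform(-3, 3, size=n)
# U-shaped (non-monotonic) response; numbers are purely illustrative.
response = dose ** 2 + 0.2 * rng.normal(size=n)

# A correlation-style "biological gradient" check sees almost nothing...
lin_corr = np.corrcoef(dose, response)[0, 1]

# ...while a CART tree recovers the U-shape from the data.
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(dose.reshape(-1, 1), response)
preds = tree.predict(np.array([[-3.0], [0.0], [3.0]]))
print(f"corr(dose, response) = {lin_corr:.3f}")
print(f"tree predictions at dose -3, 0, +3: {preds.round(2)}")
```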


Plausibility

Hill considered that providing a plausible mechanism by which changes in exposure might change health effects would make a causal relationship between them more likely, but he acknowledged that ignorance of mechanisms did not undermine epidemiological findings of associations (see Lucas and McMichael 2005). The converse is that ignorance of mechanisms can make many proposed mechanisms seem superficially plausible. Fortunately, modern bioinformatics methods allow principles of causal network modeling to be applied to elucidate causal mechanisms and paths, along with describing multivariate dependencies among population level variables. Thus, proposed causal mechanisms and paths linking exposure to harm can now be tested using the principles already discussed and data on the relevant variables in bioinformatics databases. For example, a mechanistic path—such as “exposure X increases biological activity Y, which then increases risk of adverse effect Z”—might sound plausible when proposed but might then be shown to not be plausible if changes in Z turn out to be independent of changes in Y or if changes in Z are still dependent on changes in X even when the value of Y has been conditioned on. The same causal discovery and inference algorithms can be applied to both epidemiological and biological data. No new principles or algorithms are required to develop causal network models and dynamic causal simulation models from data collected at the levels of populations, individuals, organ systems, tissues and cell populations, or intracellular processes, as witnessed by the explosive growth of causal discovery and inference algorithms and network modeling in systems biology.
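The mechanistic-path test just described can be sketched numerically (hypothetical variables and coefficients). Here the simulated data include a direct X → Z effect, so the proposed pure path "X → Y → Z" is correctly rejected: X and Z remain dependent even after conditioning on Y.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

# Proposed mechanism: X -> Y -> Z only. Actual data: Z also depends directly on X.
x = rng.normal(size=n)
y = 0.7 * x + rng.normal(size=n)
z = 0.5 * y + 0.4 * x + rng.normal(size=n)  # direct X -> Z arrow violates the proposal

def partial_corr(a, b, given):
    """Correlation of a and b after regressing out `given` from each."""
    g = np.column_stack([np.ones_like(given), given])
    ra = a - g @ np.linalg.lstsq(g, a, rcond=None)[0]
    rb = b - g @ np.linalg.lstsq(g, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

residual_dep = partial_corr(x, z, y)
print(f"pcorr(X, Z | Y) = {residual_dep:.3f}  (far from 0, so X -> Y -> Z alone is implausible)")
```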


Coherence

Similar to plausibility, coherence of a manipulative causal exposure-response with current scientific understanding, which Hill considered to increase the likelihood that a causal relationship exists, can be addressed by modern causal diagram methods (see Joffe et al. 2012) without introducing any new principles or algorithms. Causal network inference and modeling algorithms can be applied to variables at different levels in the biological hierarchy, allowing coherence among causal networks at different levels to be determined from data. Coherence of knowledge at different levels is then an output from these algorithms, rather than an input to them. Alternatively, if knowledge is sufficient to allow some arrows in a causal diagram to be specified or forbidden, then these knowledge-based constraints can be imposed on the network-learning algorithms in programs such as bnlearn, assuring the coherence of discovered networks with these constraints.


Experiment

If interventions are possible for a subset of controllable variables, then setting them to different values and studying how other variables respond can quickly elucidate manipulative causality (see Voortman, Dash, and Druzdzel 2010). Causal network discovery algorithms add to this consideration, providing specific techniques for designing experimental manipulations to reveal manipulative causal relationships and algorithms for “transporting” the resulting causal knowledge to new settings with different values of some of the variables (see Tikka 2018).


Analogy

The last of Hill’s considerations is that it is more likely that an association is causal if its exposure and response variables are similar to those in a known causal relationship (see Lucas and McMichael 2005). But applying this can be difficult because what constitutes relevant “similarity” may not be known. For example, are two mineral oils “similar” for purposes of predicting causation of dermal carcinogenicity if they have similar viscosities, or similar densities, or similar polycyclic aromatic hydrocarbon (PAH) content, or some other similarities? The theory of transportability of causal relationships across different settings (see Bareinboim and Pearl 2013; Lee and Honavar 2013; Tikka 2018) provides a more precise and rigorous understanding of what conditions must be satisfied for a causal relationship identified and quantified in one system to hold in another. The variables (e.g., viscosity, density, PAH content) that are relevant for letting a causal relationship be transported define the relevant similarities between systems, thus allowing the analogy consideration to be made precise.

This comparison of Hill’s considerations with principles used in current causal network learning algorithms suggests that real progress has been made since 1965. The considerations of strength, consistency, and temporality can be refined and made more precise using modern concepts and terminology. The considerations of specificity, plausibility, and biological gradients incorporate restrictions that are no longer needed to draw sound and useful causal inferences, since current causal inference algorithms can simultaneously handle multiple causes and effects, multiple causal pathways, and nonlinear and non-monotonic relationships. The somewhat vague considerations of coherence and analogy can be made more precise, and experimental and observational data can be combined for purposes of causal inference, using the recent theory of transportability of causal relationships (Bareinboim and Pearl 2013; Tikka 2018). These technical advances suggest that it is now practical to use data-driven causal inference methods and concepts to clarify, refine, and replace earlier judgment-based approaches to causal inference. They provide concrete criteria that can be implemented in software algorithms or applied by courts to make more objective and accurate determinations of manipulative causality than has previously been possible. This provides a technical basis for expanding the role of judicial review to include encouraging and enforcing improved causal inference.

4. Summary and Conclusions: Potential Roles of Judicial Review in Transforming Regulatory Causal Inference and Prediction

This paper has argued that more active and stringent judicial review of the causal reasoning and claims advanced in support of regulations can increase the net social benefits from regulations by correcting problems that currently promote unrealistically large estimates of the benefits caused by regulations. Among these are the following:

1. Ignoring risk aversion and risk premiums for correlated losses. When it is not certain that reducing exposure to a regulated substance or activity will actually cause the expected health, economic, or other benefits attributed to such reductions, and when the regulation affects a large number of economic agents, then the risk-adjusted value of the uncertain benefits can be much less than their expected value. This difference, called the risk premium in decision analysis, is due to risk aversion, which penalizes large numbers of correlated losses. This reduction in benefits due to uncertainty about causation is not accounted for in benefits assessments and BCA calculations that focus on expected net benefits while ignoring risk aversion.

2. Tyranny of extreme perceptions. Regulatory agencies may attract and retain employees who believe that the uncertain net benefits caused by regulation are larger than most other people believe them to be. If so, these relatively extreme perceptions are likely to shape agency beliefs and benefits assessments for regulations.

3. Use of unvalidated and simplistic models of benefits caused by regulations. Confronted with uncertainty and complexity in the causal networks that link regulations to their consequences (both intended and unintended), regulators, like other people, often adopt simplistic, inaccurate, or unproved modeling assumptions, such as that adverse health effects will decrease in proportion to reductions in regulated exposures, or that positive exposure-response associations represent manipulative causation. These assumptions can lead to large but inaccurate predictions of the benefits from regulation. Such estimates are then amplified by media reports and public concerns in which the assumption-based numbers are treated as facts, without discounting them for uncertainty.

4. Failure to focus on manipulative causality. The epidemiological evidence of harm caused by regulated exposures and estimates of the presumed benefits of reducing exposures are based almost entirely on associational-attributive causal findings in important real-world examples such as the Irish coal-burning bans, the EPA Clean Air Act Amendments, and the FDA ban of animal antibiotics. As previously discussed, such findings have no necessary implications for predictive or manipulative causation. They do not provide a logically sound basis for risk assessments or benefits estimates for proposed future changes in regulations to reduce exposures. Moreover, associational causation can almost always be found by making modeling choices and assumptions that create a statistically significant exposure-response association (“p-hacking”), even in the absence of predictive or manipulative causation. Thus, conflating evidence of associative and attributive causation with evidence of manipulative causation can lead to routinely exaggerated estimates of the benefits caused by regulations.

5. Failure to learn effectively from experience. Health, safety, and environmental regulations are usually evaluated during the rulemaking process based on prospective modeling and prediction of the desirable effects that they will cause. This prospective view does not encourage learning from data via retrospective evaluation or designing regulations to be frequently modified and improved in light of experience. But such adaptive learning and policy refinement are essential for effective decision-making and forecasting under uncertainty in other areas, such as high-reliability organizations, superforecasting, and control of systems under uncertainty. As illustrated by the example of the Irish coal-burning bans, the relatively rare retrospective evaluations of the effectiveness of regulatory interventions that are currently conducted are prone to unsound design and confirmation bias. Rigorously designed data collection, evaluation, and modification based on performance feedback are not routinely incorporated into the implementation of most regulations. Thus, estimates of the benefits caused by a costly but ineffective regulation may remain exaggerated for years or decades, leading to widespread perceptions that it was effective and to adoption of similar measures elsewhere, as in the case of the Dublin coal-burning bans that are now being advocated for nationwide adoption.

The preceding problems have a single root cause: reliance on fallible and overconfident human judgments about causation. Such judgments tend to overestimate the benefits of regulations and neglect or underestimate uncertainties about them, thus promoting more regulation than needed to maximize net social benefits. We have argued that, fortunately, more objective and trustworthy data-driven estimates of the effects actually caused by regulations and of uncertainties about those effects are now technically possible, and they are also organizationally possible and practical if judicial review of causal reasoning and claims is strengthened. Advances in data science have yielded demonstrably useful principles and algorithms for assessing and quantifying predictive causation from data. Stronger judicial review that incorporates lessons from these methods into the review and application of causal reasoning used to support contested regulations can help correct the preceding problems and obtain many of the benefits of more accurate and trustworthy estimates of the impacts caused by regulations.

The following recommendations suggest how courts can promote better regulatory benefits assessment, impact evaluation, and adaptive learning to increase the net social benefits of regulations.

1. Insist on evidence of manipulative causation. Rules of evidence used in determining whether it is reasonable to conclude that a proposed regulation will probably cause the benefits estimated should admit only evidence relevant for manipulative causation. This includes evidence of predictive causation, insofar as manipulative causation usually implies predictive causation. It also includes evidence on causal pathways and mechanisms whereby changes in exposures in turn change harm, based on well-validated and demonstrably applicable causal laws, mechanisms, processes, or paths in a causal network. Regulatory actions proposed without evidence of manipulative causation should be rejected. Insofar as they provide no sound reason to believe that the proposed actions will actually bring about the consequences claimed for them, they should be viewed as arbitrary and capricious.

2. Exclude evidence based on associational and attributive causation. Such evidence is not a logically, statistically, or practically sound guide for predicting effects of regulatory interventions.

3. Encourage data-driven challenges to current benefits estimates. Producing relevant (manipulative causation or predictive causation) information about the impacts caused by regulations can improve risk assessments and benefits estimates, but it is expensive for those who undertake to develop such information. To the extent that it can increase the net social benefits of regulation by more accurately revealing the impacts of changes in regulations, such information has a social value—a positive externality—and its production and use to improve regulations should therefore be encouraged. One way to do so might be to grant legal standing to parties who seek to challenge current estimates of regulatory impacts based on new information or analyses of manipulative causation (at least if they also bear either costs or predicted benefits of proposed regulations). A second way might be to emphasize that the burden of proof for changing a regulation can be met by any stakeholder with standing who can show that doing so will increase net social benefits.

4. Discourage reliance on expert judgments of causation. Do not defer to regulatory science and expertise based on professional or expert judgments. Instead, insist on data-driven evidence of manipulative causation (including tests for predictive causation and elucidation of causal pathways or mechanisms) as the sole admissible basis for causal claims and estimates of the impacts caused by regulations.

A joint regulatory and legal system that encourages data-driven challenges to the assumptions and benefits estimates supporting current regulatory policies can create incentives for stakeholders—whether advocates or opponents of a change in regulation—to develop and produce the information needed to improve the effectiveness and net social benefits of regulation. It can also create incentives for regulators to adopt more of the habits of high-reliability organizations, regarding current policies as temporary and subject to frequent improvements based on data. Setting expectations that judicial review will provide independent, external, rigorous review of causal claims in light of data whenever stakeholders with standing insist on it may also encourage development of lighter-weight regulations that are less entrenched and difficult to change and that are more open to learning from experience and revising as needed to maximize net social benefits.

Of course, there is a large overhead cost to changing regulations that makes too-frequent change undesirable (see Stokey 2008). However, the threat of rigorous judicial review and court-enforced revisions when data show that estimates of benefits caused by regulations are either unsound or obsolete would encourage regulators to develop sounder initial causal analyses and more modest and defensible estimates of the benefits of actions when manipulative causality (and hence the true benefit from regulation) is uncertain. This provides a useful antidote to the above factors that currently promote overestimation of uncertain regulatory benefits with little penalty for mistakes and little opportunity for stakeholders to correct or improve estimates based on the judgments of selected experts.

In summary, more active judicial review of causal claims about regulatory impacts, with data-driven evidence about manipulative causation being the price of entry for affecting decisions, creates incentives to expand the socially beneficial role of stakeholders as information collectors. Simultaneously, active and rigorous judicial review of causal claims provides a mechanism to help regulators learn to perform better. It does so both by serving as an external critic and reviewer of causal reasoning and predictions on which contested actions are predicated, and by providing an opportunity for new information and different data-informed views to be brought to bear in assessing the actual effects being caused by current contested policies. An adversarial system allows different stakeholders to produce relevant information, both confirming and disconfirming, for evaluating the hypothesis that current regulatory policies are performing as predicted in causing desired effects. Active judicial review of causal claims supporting contested regulations by a court that is known to apply BCA or law-and-economics principles provides incentives for the stakeholders to produce such information with the intent of reinforcing, revising, or overturning current regulations as needed to increase net social benefits. Doing so always coincides with increasing the net benefits to at least some of the stakeholders, since increasing the sum of net benefits received by all affected individuals implies increasing the net benefits received by at least some of them. Thus, judicial review can promote production and use of causally relevant information and help regulators to learn from experience how to make regulations more beneficial. This is not a role that can easily be played by other institutions.

If courts develop, maintain, and routinely apply expertise in dispassionate, data-driven causal inference, both the threat and the reality of judicial review will help to overcome the significant drawbacks of current judgment-based approaches to causal inference for regulatory benefits assessment. Such review will also provide both regulators and stakeholders affected by regulations with incentives and ability to improve the net social benefits from regulations over time.


*. President of Cox Associates, a Denver-based applied research company specializing in health, safety, and environmental risk analysis; epidemiology; policy analytics; data science; and operations research.

1 The open source software is available here:

2 Available here:

3 For example, the Neural Information Processing Systems 2013 Workshop on Causality:

4 Available here:

5 See Pearson (2018) for a discussion of partial dependence plots.