Recently I wrote a post about decision theory. Let’s recall Newcomb’s Problem:
Our friend Omega is a very talented man (or, well, a superintelligent AI). He’s really good at predicting what people will do. He can just scan you, and then predict what choices you’ll make in whatever circumstances you find yourself in—and he’s only wrong one in a trillion times.
Omega lets you take part in the following experiment. First, he scans you. Now you get the details. Before you are two boxes—Box A and Box B. Box B is transparent, and you see it contains a handsome $1,000. Box A is opaque, and you are informed that it contains either $0 or $1,000,000. Specifically, he put the million in if and only if he predicted you would take only Box A. If he predicted you’d take Box A as well as Box B, he put nothing in Box A. (And those are your only two choices.) But Omega has already made his prediction, the million is already in Box A or not, and now it’s time to choose. Omega is a very good predictor, so all your one-boxer friends are millionaires, and all your two-boxer friends are mere thousandaires. Who would you rather emulate?
Causal Decision Theory (CDT) says you should, keeping your current circumstances fixed, choose the action that would cause the best results to obtain, on expectation. So in Newcomb’s problem, the CDTer says: “There are $x in Box A, and nothing will change the value of x. If I one-box, I get $x. If I two-box, I get $(x+1000). The latter is more money, so I two-box.” Omega predicted all this thinking, so he put nothing in Box A, and the CDTer walks away with a mere thousand. But Omega’s prediction was already out of the CDTer’s hands by the time he made his choice.
Evidential Decision Theory (EDT) says you should choose the action such that your choosing that action is evidence for the best possible outcome. So the EDTer says: “Conditional on one-boxing, Omega will have predicted I’ll one-box, so I’ll almost certainly get a million. Conditional on two-boxing, Omega will have predicted I two-box, so I’ll only get a thousand. Therefore I should take only one box.” Omega predicted all this, so the EDTer walks away with a million.
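For concreteness, here is a minimal sketch of the arithmetic behind the two verdicts (my own illustration, not from the original post), with a hypothetical predictor accuracy of 1 − 10⁻¹² standing in for “wrong one in a trillion times”:

```python
ACCURACY = 1 - 1e-12          # chance Omega's prediction matches your actual choice
MILLION, THOUSAND = 1_000_000, 1_000

# EDT: condition the contents of Box A on your own choice.
edt_one_box = ACCURACY * MILLION                   # ~ $1,000,000
edt_two_box = (1 - ACCURACY) * MILLION + THOUSAND  # ~ $1,000

# CDT: hold the contents of Box A fixed at whatever value x it already has.
def cdt_payoffs(x):
    return x, x + THOUSAND    # (one-box, two-box): two-boxing always adds $1,000

print(f"EDT: one-box ~${edt_one_box:,.0f}, two-box ~${edt_two_box:,.2f}")
print("CDT, x = 0:        ", cdt_payoffs(0))
print("CDT, x = 1,000,000:", cdt_payoffs(MILLION))
```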
CDT and EDT are the two main contenders in academic decision theory. The EDTer says “If you’re so smart, why ain’cha rich?” and accuses the CDTer of acting like they’re an exception. The CDTer says that if you want money then it’s irrational to leave money on the table, and they accuse the EDTer of wishful thinking, as if getting evidence for something actually makes it true. The CDTer says they’re not rich because Newcomb’s problem punishes rational people. A decision theory isn’t false just because a super-predictor might scan people, figure out they have the correct decision theory, and then decide to give them a worse choice. This dialectic will be familiar to anyone who knows the Newcomb literature.
There is a third contender, in my view the correct position, called Functional Decision Theory (FDT). The view is more or less unknown to academia, since it was invented by an outsider named Eliezer Yudkowsky, though it has precedent in David Gauthier (in Morals by Agreement and “Assure and Threaten”) and Douglas Hofstadter (see his notion of “superrationality”).
Basically, in evaluating a choice, FDT tells you to look at how things would transpire if FDT recommended that choice (and you’re an FDTer). Say I’m an FDTer. Then conditional on FDT recommending two-boxing, Omega will predict I two-box, and I’ll two-box, and get a mere $1,000. Conditional on FDT recommending one-boxing, Omega will predict I one-box, and I’ll one-box, and get a million. Therefore FDT recommends one-boxing. Read my previous post linked at the beginning for more details.
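A toy sketch of that evaluation step, under my own gloss (not Yudkowsky and Soares’s formalism): for each possible output of the shared decision procedure, compute how the world goes when both Omega’s model of you and you yourself act on that output, then return the output with the best result.

```python
MILLION, THOUSAND = 1_000_000, 1_000

def outcome_if_policy_outputs(choice):
    # Omega's prediction tracks the policy's output, since he runs the same procedure.
    box_a = MILLION if choice == "one-box" else 0
    return box_a + (THOUSAND if choice == "two-box" else 0)

best = max(["one-box", "two-box"], key=outcome_if_policy_outputs)
print(best, outcome_if_policy_outputs(best))  # one-box 1000000
```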
Importantly, FDT alone says you should one-box even if Box A is transparent and Omega only put the million in if he predicted you’ll one-box no matter what you see. The CDTer in Transparent Newcomb will reason as before, taking two boxes and leaving with a mere $1,000. The EDTer will see either a million or nothing in Box A before making their choice, so their choice gives no evidence as to what’s in Box A, so they take both boxes (that certainly just gives them an extra thousand on top of whatever they see in Box A). Omega will have predicted that, so the EDTer leaves with $1,000. For the FDTer, the reasoning happens just as it did in the normal Newcomb problem, and they alone leave with a million.
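To see the divergence in Transparent Newcomb, here is a toy simulation (my own construction, with each agent’s reasoning hard-coded as described above); Omega fills Box A only if the agent’s procedure one-boxes whatever it sees:

```python
MILLION, THOUSAND = 1_000_000, 1_000

def cdt(box_a_contents):  # dominance reasoning: two-boxing always adds $1,000
    return "two-box"

def edt(box_a_contents):  # contents are visible, so the choice carries no news
    return "two-box"

def fdt(box_a_contents):  # one-box regardless of what is seen
    return "one-box"

def transparent_newcomb(agent):
    # Omega puts in the million only if the agent one-boxes no matter what it sees.
    predicted_one_box = agent(MILLION) == "one-box" and agent(0) == "one-box"
    box_a = MILLION if predicted_one_box else 0
    choice = agent(box_a)
    return box_a + THOUSAND if choice == "two-box" else box_a

for name, agent in [("CDT", cdt), ("EDT", edt), ("FDT", fdt)]:
    print(name, transparent_newcomb(agent))  # CDT 1000, EDT 1000, FDT 1000000
```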
My post mainly concerned making the case for FDT over EDT. In my view, all the considerations in favor of EDT (the pragmatic ones, Arif Ahmed’s arguments concerning choice and world-histories) actually point towards FDT. I think the only reasons the EDT-inclined in academia reject FDT are (i) they haven’t heard of it, (ii) it was invented by Eliezer Yudkowsky, and (iii) it gives weird verdicts in e.g. Transparent Newcomb and MacAskill’s bomb case (see my linked post for discussion). On (iii), I don’t think FDT’s verdicts are any weirder than EDT’s; the only difference is you know what you’re doing, whereas the EDTer’s intuitions in e.g. the normal Newcomb problem are consoled by the fact that it at least seems to them like their choice makes there be a certain amount of money in Box A. And not only does the EDTer act on the most advantageous principle only when they don’t know what they’re doing, but they will pay people to not tell them what they’re doing.
In my post, I said in a footnote:
Actually, saying why CDT is wrong is really hard. The “it makes you poor” argument is, ultimately, question-begging. In reality, I think showing why it’s wrong would have to involve some deep philosophy stuff about the relationship between causation and the will. But if that bores you, then you can simply choose not to be poor.
and one of my mutuals on X (formerly Twitter) replied:
It doesn’t bore me, and I want to read about why (you think) it’s wrong.
I’m not terribly convinced by the “don’t pay to do things you could do” argument, since really it’s paying to be in a different, more favorable situation, which you could not be in without paying
So, guys, get ready to not be bored!
First off, I agree with Olivia above that the pragmatic arguments are question begging or otherwise wrong. Yudkowsky mainly argues for FDT on pragmatic grounds, saying that agents who are FDTers do best on expectation. FDT alone gets the maximum payout in Transparent Newcomb, as well as in every other case, since it basically just tells you “Do the thing such that agents who do that get the best results.” But there’s the issue of punishment for being irrational: if Omega decides to torture agents who would one-box in the standard Newcomb problem, that doesn’t automatically make one-boxing irrational in the normal case. Therefore, Yudkowsky qualifies his claim that a decision theory should on average lead to the best outcome for agents with that decision theory; rather, he says a decision theory should lead to the best outcome when it comes to fair decision problems. He defines a “fair” decision problem as one where the outcome only depends on your choice in that very decision problem. So the case of the predictor who just decides to torture people with a particular decision theory, or torture people who would one-box in a Newcomb case where the torturer isn’t present, is not a fair decision problem. Who does best in those scenarios doesn’t tell you what’s rational.1 In contrast, in Newcomb’s Problem, Omega is only “punishing” you or not based on his prediction of what you will do in Newcomb’s Problem.
But Yudkowsky’s definition of fairness begs the question against CDT, since he hides in his definition an assumption about what counts as that very same decision problem. The CDTer who is predicted to two-box will consider themselves to be in a different decision problem than the EDTer/FDTer who is predicted to one-box, since the circumstances at the time of choice are different between them. So saying the CDTer doesn’t do best given their available options doesn’t help to convince them, since depending on your decision theory, you’ll have a different notion of what your “available options” are. And the CDTer will not admit that Omega only punishes you for your choice in that very decision problem; he will say, rather, that Omega punishes you for his prediction of your choice, which is out of your hands.
Why, then, is CDT wrong? I don’t think pragmatic arguments work. Indeed, I don’t think we have any choice but to get down and dirty and talk about the nature of choice and the will, so we have a better idea of what our “available options” are.
Let’s talk about diachronic choice (choice across time). Thing is, I just call that “choice”: I don’t think there’s any fundamental difference between choosing to raise my arm and choosing right now to raise my arm tomorrow. Maybe the latter has a higher chance of failure, since I’ll be more likely to change my mind or forget in the interim, but the possibility of my choice failing to be executed doesn’t make it a different kind of thing. All choices have a chance of failure; they only differ in degree. I think this is a crux—I discovered in conversation with Quinn White that it is kind of idiosyncratic of me to think of my choices this way. He said (I think I am faithfully representing him, though my memory does err) that I cannot simply choose to move my arm tomorrow; when I do what we would intuitively describe with those words, I am making a decision which merely commits me to raising my arm tomorrow, and it’s the job of later-me to stick with that commitment unless I find myself with new reasons not to. But I think I can just do things tomorrow: I make a decision now, it has a certain effect, and the effect just happens to be farther away and more mediated by some complicating factors that might arise in the interim.
Now, back to normativity. Three views of diachronic choice are nicely illustrated by the famous Wine Case:
You’re going to a dinner party, and you’ll be able to have zero, one, or two glasses of wine. The more wine you have, the more enjoyable your evening will be. But if you have two glasses, you will have a headache tomorrow, and you would prefer having zero glasses to having two glasses and a headache. But one glass won’t leave you with a headache, so that’s your most preferred option. But if you have one glass, you will become uninhibited, and you will then prefer two glasses and a headache to just one glass. What do you do?2
A naive chooser will choose based on their current, local preferences. The naive chooser will have the first glass because they prefer one to none; then their preferences change, so they have a second, leaving them with a hangover. Clearly, going into this situation you don’t want to be a naive chooser: you’ll predictably end up with the worst outcome.
A sophisticated chooser anticipates their future decisions and uses that to inform their earlier choice, assuming their future self will be a sophisticated chooser. The sophisticated chooser will anticipate that if they have the first glass of wine, they’ll then prefer a second, and they’ll end up with a hangover. Now seeing the hangover as an unavoidable consequence of having the first glass, the sophisticated chooser foregoes any wine, leaving them with the second-best outcome.
A resolute chooser makes plans and sticks to them, even if their later preferences make them want to depart from the plan. The resolute chooser will choose from the beginning to stop at one glass. They then have their one glass, and then forego a second one even though they now prefer a second, simply because resolute choosers stick to plans they rationally formed (unless they find new reasons to abandon the plan). The resolute chooser gets the best outcome from the perspective of the agent before having any wine.
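Here is a small sketch of the three choosers, on my own toy encoding of the case (the rankings below are just the preference orderings from the Wine Case, not anything in Buchak’s formal framework):

```python
# Outcomes ranked by the sober, before-dinner self: one glass > no wine > two glasses.
SOBER_RANK = {"one glass": 3, "no wine": 2, "two glasses + headache": 1}
# After the first glass, the uninhibited self prefers a second glass to stopping.
TIPSY_RANK = {"two glasses + headache": 2, "one glass": 1}

def naive():
    # Step 1: the sober self prefers one glass to none, so take the first glass.
    # Step 2: the now-tipsy self chooses by its own ranking, and takes a second.
    options_after_first_glass = ["one glass", "two glasses + headache"]
    return max(options_after_first_glass, key=TIPSY_RANK.get)

def sophisticated():
    # Backward induction: anticipate that the tipsy self will take a second glass,
    # so the only reachable outcomes are no wine or two glasses; pick the better
    # of those by the sober self's lights.
    reachable = ["no wine", "two glasses + headache"]
    return max(reachable, key=SOBER_RANK.get)

def resolute():
    # Adopt the sober self's best plan and stick to it at the later choice node.
    return max(SOBER_RANK, key=SOBER_RANK.get)

print(naive())          # two glasses + headache
print(sophisticated())  # no wine
print(resolute())       # one glass
```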
I think people can just do things, even things in the future. If having one glass of wine is best, then choose to have one glass of wine. “But later me will prefer a second glass, so I’ll unavoidably drink it and end up with the worst outcome!” False. Later me is still me, and nothing prevents me from not having a second glass. No one is forcing me. I can simply choose to have a second glass or not. “But it will be irrational to forego the second glass once I prefer it, and I have to anticipate my future self being rational.” I disagree that foregoing the second glass will be irrational; I decide right now to forego it, and my reasons seem cogent, since I don’t want a hangover.
The sophisticated chooser treats their future self like a different person.3 It’s like “I can’t leave ten pieces of candy on the table, then my kid will eat all of them and get sick.” You can’t just directly will that your child only eat one piece of candy. But you can directly will whether you have a second glass of wine in the future. That’s what willing is, and what makes future you future you.
“But, really, I’m just a series of time-slices of agents that happen to be psychologically connected in some way. That’s why I can’t directly will my future choices like I can my current choices; to do something in the future, my future self has to want to do the same thing I planned.” I think what makes you the same person now and later is that your decisions are decisions made by one and the same will. So, your continued existence is partly constituted by the fact that you make decisions about how to act in the future, and your future self carries them out under the assumption that it was their decision to adopt that plan in the first place. An extended self might change their plans, but the change in plans will be for the same sorts of reasons that would have led the earlier self not to make them in the first place had they known about those reasons.
Maybe you’re unlucky, maybe your decisions do not in fact cause your later self to act under the assumption that the two of you have the same will, and there is (by hypothesis) nothing you can do about that. Then you have no choice but to be a sophisticated chooser. But I am not unlucky in that respect. If you tell me “No, you’ll rationally have to have the second glass of wine, since that becomes instrumentally rational due to your change of preferences. So, based on your current preferences, you rationally shouldn’t have any wine,” then I will reply, “It is within my control whether I’ll have a second glass given those circumstances. I hereby choose not to. I am in fact one agent extended over time; whether I’ll act on my later preferences is currently what is in question, and nothing forces me to act taking for granted that I will later act a certain way.”4
Rationality was made for man, not man for rationality. If you tell me that I’ll have to do something, and if acting under that assumption leaves me with a worse outcome, and your only argument that I’ll have to act that way is that it’s rational, then I conclude it is not in fact rational. People think otherwise, I speculate, because they reify “being rational” and place value on that label. But when it comes to instrumental rationality, nothing says I have to value “rationality”; I value money and having a pleasant evening and not having hangovers. I value those things, and I am rational because I do what promotes those values.
So what does this have to do with Newcomb’s Problem? The discussion of the Wine Case (at least between the resolute and sophisticated choosers) is pretty much parallel to a small modification of Newcomb:
Same situation; Box A is opaque and has a million dollars iff Omega predicted you’ll one-box, and it has nothing otherwise. Box B is transparent and has $1,000. Only now, before Omega makes his prediction, you can choose to pay $100,000 to remove the choice to two-box (the second box gets glued to the table). If you pay, Omega will take that into account in his prediction and almost certainly predict you’ll one-box.
The CDTer will pay the $100,000 and then walk away with a net gain of $900,000. Since Omega hasn’t scanned them yet, paying to commit themselves will cause him to predict they’ll one-box, ultimately leaving them with $900,000 (the million minus the $100,000 fee). If the CDTer doesn’t pay, they will inevitably two-box, since the choice occurs after they’re scanned; Omega will predict this, and the CDTer will leave with $1,000.
The FDTer (and the EDTer, in this case) will just not pay, take one box, and walk away with $1,000,000. The CDTer treats their future self as an obstacle to be overcome; the FDTer just adopts the strategy that will give them the best outcome. The wine case, at least, is a bit confusing, since your preferences do change, meaning different reasons weigh on the decision of whether to stick with the plan than weigh on whether to adopt the plan in the first place. But nothing like that happens in the Newcomb case; you always just want more money. There’s no reason to privilege what makes sense at the time of choosing boxes over what makes sense at the time at which you adopt a plan. So, you should just choose to take one box for free, and then walk away with $1,000,000.
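And the payoff bookkeeping for this pay-to-precommit variant, as a quick sketch (my own toy model of the case described above):

```python
MILLION, THOUSAND, FEE = 1_000_000, 1_000, 100_000

def payout(pays_fee, one_boxes_anyway):
    # Paying glues the second box down before the scan, so the agent one-boxes either way.
    one_boxes = pays_fee or one_boxes_anyway
    box_a = MILLION if one_boxes else 0            # Omega predicts the actual policy
    winnings = box_a if one_boxes else box_a + THOUSAND
    return winnings - (FEE if pays_fee else 0)

print("CDTer who pays the fee:", payout(True, False))   #   900,000
print("CDTer who doesn't pay: ", payout(False, False))  #     1,000
print("FDTer/EDTer, no fee:   ", payout(False, True))   # 1,000,000
```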
The keen among you will note that I have, at best, argued in favor of CDT + resolute choice, which doesn’t always align with FDT, in particular in cases where you don’t get the chance to adopt a plan before being scanned. But why should a prior opportunity to adopt a plan make a difference? By simply acting like an FDTer, you get all the same benefits through the exact same mechanism, even when you don’t get a chance to explicitly adopt a plan first. “But it’s not the same mechanism—in the one case, the mechanism involves my sticking with a commitment I made. If I’m scanned before thinking about the case, then I have no such commitment, so I should just act based on the expected effects of my action.” But why should the mere act of adopting a plan matter here? If a plan is the rational one to adopt, I would think an agent who simply follows it intentionally is being rational, even if there is no particular point in time at which they “adopted the plan.”5
Perhaps, one may say, the relevant difference is that if you didn’t have a chance to make a prior commitment, your agency isn’t what’s causing you to get the best outcome. But what is my agency? It is the principles I adopt, which are manifest in the choices I make in particular cases. That’s what Omega bases his predictions on. But whether it is a fact about my agency that it leads to one-boxing (even in Transparent Newcomb, where you don’t get a chance to precommit) is still an open question when I’m deliberating about how many boxes to pick. Even if I don’t precommit and my principles end up saying I should one-box anyway, then it was a fact about my agency all along that I’m a one-boxer, and that in turn is what causes Omega to put the million in.
To summarize: if you’re a resolute chooser (which you should be, since you can just choose to do things and your future self is you and not an obstacle to be worked around), then you should adopt rational plans and stick to them because you rationally adopted them. But even if you don’t explicitly get to adopt a plan beforehand, whether you were all along the kind of agent who would one-box is an open question, a question you settle by deciding to one-box or not. So if you do in fact decide that one-boxing is what’s justified by your principles, then it will have been the principles you rationally adopted all along that cause Omega to predict you’ll one-box. The causal relationship between your agency and Omega’s prediction is unaffected by whether you sit down and explicitly think about what plan to follow beforehand.
It took me a while to realize I was born a Functional Decision Theorist. I’ve always already had the commitment to one-box, because I’ve always had (at least to some extent) the control over my actions in virtue of which I can simply adopt the best plan and stick to it. I don’t know how to make you similar to myself; maybe you’re just a heap of time-slices making an effort to manipulate your future time-slices in a beneficial manner, in which case you should probably be a Causal Decision Theorist. But you can just stop thinking of yourself that way, and if nothing goes awry, your future time-slices will act under the assumption that they’re doing the things that they themselves already decided to do. Then, you will be born again as a new kind of agent, one whom FDT really does bind, and one who reaps all the benefits.
See Yudkowsky & Soares, “Functional Decision Theory,” pp. 29-30.
Adapted from Lara Buchak, Risk and Rationality, pp. 175-177.
Cf. Buchak, pp. 180-181.
I take this to be Korsgaard’s view of personal identity from her “Personal Identity and the Unity of Agency” (1989).
> I think the only reasons the EDT-inclined in academia reject FDT are (i) they haven’t heard of it, (ii) it was invented by Eliezer Yudkowsky, and (iii) it gives weird verdicts in e.g. Transparent Newcomb and MacAskill’s bomb case (see my linked post for discussion)
It could also be that
- FDT requires counterlogical suppositions
- FDT depends on a notion of “algorithmic similarity” that has yet to be worked out
- FDT seems to assume a controversial “algorithmic agent” ontology (see here [*])
[*] https://www.lesswrong.com/posts/dmjvJwCjXWE2jFbRN/fdt-is-not-directly-comparable-to-cdt-and-edt
I do love how deliciously Kantian your view of FDT and theory of action is. Literally just adopt a maxim, loser. itsnotthathard.
I usually find Newcomb's Problem discussions pretty tiresome / contrived, but both of these posts were really good! Not boring! I'm still a bit skeptical about that MacAskill's bomb case though. I think I'm generally capable of pre-committing to actions and principles in a manner consistent with FDT, such that I could decide as a matter of course to be a true one-boxer for even the Transparent Newcomb.
But if I think about the bomb scenario, I don't think I could actually bring myself to open a box that I know has a bomb in it and which will kill me. Even if I did "pre-commit" to opening the left box, I know that in the 1 in a trillion trillion universe where Omega messes up and I end up in the situation described, I would under no circumstances open the Left box. It's just not an action that I'm capable of taking. So even if I choose to open the Left box ahead of time because doing so would save me $100, my conditional chance of failure in the described situation isn't merely higher; it's ~100% and I know it. Which of course means that Omega would predict** that I'll pick Right box and so I'll end up in this situation regardless.
Basically, this feels like a situation where I really am incapable of credibly committing to a policy / future action even if I want to, and because the prediction can account for that lack of credibility, no amount of pretending or telling myself that I really am committing to it will do anything to change the result.
**Side note: This is only true if we modify the original bomb problem. As currently stated, it's not actually equivalent to the Transparent Newcomb, because if I do commit to opening the Left box and Omega predicts that I will open the Left box and leaves me a note saying that it predicted as such, I will of course just open the left box. So the fact that I know I would open the right box if told there was a bomb in the left box doesn't actually determine Omega's prediction. For my failure to follow FDT to matter, we'd have to instead say that Omega specifically is predicting what I would do in the scenario described (i.e., when told that there is a bomb in the left box). I still don't think this is quite the same as Transparent Newcomb? But at least now the argument goes through.