Our friend Omega is a very talented man (or, well, superintelligent AI). He’s really good at predicting what people will do. He can just scan you, and then predict what choices you’ll make in whatever circumstances you find yourself in—and he’s only wrong one in a trillion times.
Omega lets you take part in the following experiment. First, he scans you. Then you get the details. Before you are two boxes—Box A and Box B. Box B is transparent, and you can see it contains a handsome $1,000. Box A is opaque, and you are informed that it holds either $0 or $1,000,000. Specifically, Omega put the million in if and only if he predicted you would take only Box A. If he predicted you’d take Box A as well as Box B, he put nothing in Box A. (And those are your only two choices.) But Omega has already made his prediction, the million is already in Box A or it isn’t, and now it’s time to choose. Omega is a very good predictor, so all your one-boxer friends are millionaires, and all your two-boxer friends are mere thousandaires. Who would you rather emulate?
The dialectic is familiar. The two-boxer says: “Oh my gosh, just think about what’s rational! Your decision won’t affect what’s in the first box. Either you’re lucky and Omega put the million in, or you’re unlucky and he put nothing in. Your only choice is whether you want an extra $1,000 on top of whatever you’ll get anyway. Obviously take the $1,000.” The one-boxer stops listening halfway through that explanation and simply replies, “If you’re so smart, why ain’cha rich?”
This is already too much decision theory, so I’ll let you guys take a break with an anecdote. I once knew someone (that is a lie—I’m making this up) who needed some new shoes. She was wondering whether to get some random ones from the thrift store or a nice expensive pair. The latter would be higher quality, last longer, and be well worth the price; the only disincentive was that it hurts to buy expensive things, so she said she wouldn’t look at her bank account first. “But you might not have enough money,” I said, and she replied, “No, I almost certainly have at least barely enough, but if it turns out I only have barely enough, I won’t want to buy them.” I replied, “You may as well look at your bank account. If you don’t have much money and it’s not worth the expense, that would be good to know.” “No,” she replied, “it’s definitely worth the expense either way.” Then I said, “Well, then, just get the information: look at your bank account and buy the shoes no matter what it says, since it’s worth the expense either way.” “No,” she said, “if I don’t have much money left, I won’t want to buy the shoes.” Then I said, “It’s strange that you’re assigning a negative value to knowledge, not because the knowledge itself would hurt you, but because it would cause you to act in a way that you now assess to be worse even conditional on the thing being true. Like, would you pay someone $10 to not tell you how much money is in your bank account?” She replied, “I suppose I would. The shoes are so great they’re even worth their price plus $10, even conditional on my account being low.” I said, “That’s nuts. You’d pay someone to not tell you what’s in your bank account, when you could just let them tell you whatever and make the same decision regardless, all for free.” Finally, she said, “I see your point. But if I agree with you that I shouldn’t pay people to just not give me information that would influence my actions, then we could come up with far-out cases where I die horribly from some kind of bomb. That sounds worse, so I won’t check my bank account.” My friend, I think, is quite irrational. If the shoes are worth the expense regardless, then she should just look at her bank account to make sure she won’t overdraft, and then buy them regardless of what it otherwise says. Crazy, right? Now back to decision theory.
The two-boxer, named Casandra, complains to the one-boxer, named Edmund, the latter of whom is about to partake in Omega’s experiment. “The reason I’m not rich is that the scenario is rigged to punish rational people. Omega checks whether you’re rational, then gives you a worse choice if you are,” says Casandra. Edmund replies, “Sounds like cope. If you’re being punished for being rational, meaning you can avoid the punishment by acting differently, then it sounds like your decision wasn’t rational in the first place.” “Come on,” says Casandra, “there are some cases where a person is punished for being rational. Say Omega scans you and finds out you’d one-box in the scenario as originally described; if you would, he punches you in the face.” “Fine,” says Edmund, “agents can be punished for being rational. But this isn’t a case like that. You can become the guy who isn’t punished just by acting differently. That’s different from someone checking whether you’d be rational in some other case and punishing you based on that.” “But I can’t get the better reward just by acting differently,” says Casandra. “If I just act differently, keeping my circumstances the same, Omega will have already made his prediction and I’ll walk away with nothing.” “Yet,” says Edmund, “I’m about to be a millionaire, you’re a thousandaire, and the only difference between us is that I’m choosing one box.”
I will not rehearse the dialectic further—this post is for those of you who want to be millionaires. Unfortunately for you, Omega is about to add a twist.
“What??” exclaims Edmund. “You told me Box A was going to be opaque!” Omega laughs: “What difference does it make? Everything else is the same; the only difference is that now you know more—do you not like knowledge? The conditions for your reward are as you expected. If I predicted you’d take only Box A no matter what you see, I put a million in. If I predicted you’d take both, I put nothing in. It’s the same problem, only now you know from the start whether you’ll be like Casandra or whether you’ll be a millionaire.” “You asshole! You put nothing in Box A!” “Well, was my prediction wrong? Make your choice, though I already know what it’ll be.”
Another decision theorist, named Flo, comes to see Edmund. “Wait,” she says, “I know you’re mad, but you should really pick just Box A.” “Are you crazy?” says Edmund. “That dickwad put nothing in there. No way I’m going to walk away with nothing.” Flo says to him, “And what would you do if you saw the million in there?” Edmund replies, “Obviously I’d take that and the thousand as well; I already know what prediction Omega made. It would be crazy to just leave the thousand there.” Flo replies, “So Omega wasn’t just being a dickwad. No matter what, you’d pick both boxes. So Omega put nothing in Box A—those were the terms of the experiment. Your reasoning is exactly like Casandra’s: no matter what’s in there, you’d rather get the extra thousand. The only difference is that you know what you’ll get. But why does knowledge matter, if all you care about here is getting money? If you’d do something if you knew it was raining, and you’d do something if you knew it wasn’t raining, then surely you should just do that thing, regardless of what you know. States of affairs might be relevant in determining how to act, but your knowledge of states of affairs is only relevant to the extent that it tells you what to do given those states of affairs.” Edmund replies, “This is totally different. Casandra didn’t have a box with nothing in it right in front of her. She didn’t know it had nothing in it, and if she picked just one box, before opening it she would’ve been almost certain it had the million.” Flo replies, “You know, I heard Casandra talking shit about you before. She thought you engaged in magical thinking, as if what actually happens somehow depended on what evidence you have, so you should do things that are evidence of a good outcome. I tried to defend you, saying ‘Edmund doesn’t literally just care about evidence; he wants the most money and the best outcome overall. He does the thing that would be evidence of a good outcome because it’s the policy which will give him the best outcomes over his whole life.’ But it seems I was mistaken.” Edmund finally says, “This is ridiculous. Obviously I’m not just leaving the $1,000 there.” Edmund takes his two boxes and leaves, maybe buying a new laptop afterwards (though not a top-end one).
“Your turn, Flo. Same as with Edmund—both boxes are transparent.” Flo sees the million in Box A, and a thousand in Box B. “That makes no difference to me.” Flo walks away with her mere million dollars, buys a top-end laptop, and still has $995,000 to spare.
Omega feels bad about how dissatisfied Edmund left the experiment, so he lets him participate again. Edmund is happy to see one transparent box with $1,000 next to an opaque box. “But, you know, there is still a catch,” says Omega. Edmund wonders what he could possibly be up to. “My friend Flo here knows what prediction I made. She might tell you what’s in Box A, which would probably lead to a repeat of our last trial. I figure she’ll ask you for the $1,000 you got.” Edmund is highly irritated—“Fine, Flo, I’ll give you the $1,000 to keep your mouth shut.” “No,” Flo says, “how much money is in your bank account?” Edmund is horrified. “Look, I barely have $22,000 there. About $22,150, I think.” “Okay,” Flo says, “$22,150 is my price. Give me that much, and I won’t tell you what’s in Box A.” “How could you? If I empty my savings, that would be devastating. What if I need a car repair? I wouldn’t be able to afford it; I’d have to get rid of my car.” Flo replies, “Don’t give me the money, and I’ll tell you whether there’s nothing or a million in Box A, and either way you’ll take both boxes, meaning Omega will have predicted that, so you’ll probably just get $1,000. Give me the money, and I won’t say anything, and you’ll almost certainly get a million. The choice is clear—$22,150 is nothing compared to a million, so it’s a small price to pay. I don’t feel bad screwing a millionaire out of chump change like that. You will be a millionaire, right?” Edmund just wants the money, so he pays Flo and walks away a near-millionaire, even though he could have been a millionaire if he had just let Flo talk and taken one box regardless of what she said.
Okay, I’m done with the story-telling. Clearly, Edmund is irrational. If there is one certain principle in decision theory, it is this: don’t pay people to do things you can do for free. One-boxing means you’ll be a millionaire; you don’t need to pay people to not give you information that will cause you to two-box, since you can just one-box for free. Even Casandra is irrational. If she could pay Edmund to force her to one-box before Omega scans her, she would, since that would make her a millionaire minus whatever Edmund charges. But Flo can simply get the million without paying Edmund. The scenario is one in which people who one-box almost certainly get a million, and there is nothing but money at stake; Flo alone gets the million without having to pay people to cause her to do what she can simply do for free.
Casandra is a Causal Decision Theorist. She thinks that what you should do is the thing that causes you to get the best outcome, keeping the circumstances at the time of choice fixed. If we keep the circumstances fixed—including how much money Omega put in Box A—then two-boxing always gives an additional $1,000, so Casandra two-boxes. But if, before she is scanned, you offer to force her to one-box for a fee, she will pay you: paying you causes Omega to predict one-boxing, and so causes her to end up with a million minus whatever you charge.
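If you want the dominance reasoning spelled out, here’s a minimal sketch of Casandra’s calculation (the code and its names are my own illustration, nothing more): hold Box A’s contents fixed and compare the two actions within each fixed state.

```python
# A minimal sketch of CDT-style dominance reasoning: hold what Omega already
# put in Box A fixed, and compare actions within each fixed state.

BOX_B = 1_000

def payoff(box_a_contents: int, action: str) -> int:
    """Payout given what is already in Box A and which action you take."""
    return box_a_contents + (BOX_B if action == "two-box" else 0)

for contents in (0, 1_000_000):          # the two possible fixed states
    gain = payoff(contents, "two-box") - payoff(contents, "one-box")
    print(f"Box A holds ${contents:,}: two-boxing gains ${gain:,}")

# In every fixed state two-boxing comes out $1,000 ahead, so CDT two-boxes;
# of course, once Omega's prediction is factored in, that policy nets $1,000.
```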
Edmund is an Evidential Decision Theorist. He thinks what you should do is the thing such that, if you did it, it would be evidence that you get the best outcome. Almost everyone who one-boxes ends up a millionaire, and almost everyone who two-boxes ends up a thousandaire. Edmund picking one box is near-certain evidence he’ll get a million, so he does that. But Edmund will two-box if he knows what’s in Box A, since then his choice won’t be evidence of anything beyond whether he gets the extra $1,000 or not. Thus, if you make him pay you to not tell him what’s in Box A, his current evidence says that if you tell him, then he’ll two-box, so he’ll probably only get $1,000. If he does pay you, he’ll pick one box and end up with a million minus whatever you charge him. Just as long as you charge him less than $999,000, his expected payout conditional on paying you is bigger than conditional on not paying you.
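To make Edmund’s reasoning concrete, here’s a rough sketch of the expected values; the one-in-a-trillion accuracy is from the setup, but the way I’ve carved up the cases is my own simplification.

```python
# A sketch of EDT-style expected values, conditioning on your own action.

ACC = 1 - 1e-12   # Omega's prediction matches your choice except one time in a trillion

# Opaque Newcomb: conditional on one-boxing, Box A almost certainly has $1M;
# conditional on two-boxing, it almost certainly has $0.
ev_one_box = ACC * 1_000_000 + (1 - ACC) * 0
ev_two_box = ACC * 1_000 + (1 - ACC) * 1_001_000
print(ev_one_box > ev_two_box)           # True: EDT one-boxes

# Paying to stay ignorant: if Flo talks, Edmund two-boxes, so he expects ~$1,000;
# if he pays the fee and she stays quiet, he one-boxes and expects ~$1M minus the fee.
def worth_paying(fee: float) -> bool:
    return ACC * 1_000_000 - fee > ACC * 1_000 + (1 - ACC) * 1_001_000

print(worth_paying(22_150))    # True
print(worth_paying(999_500))   # False: past the ~$999,000 ceiling from the text
```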
Flo is a Functional Decision Theorist. She thinks you shouldn’t pay people to do what you can just do for free. More generally, she thinks you should act on the principles that would yield the best outcomes conditional on those principles being correct. People with a principle of one-boxing do best, getting $1,000,000. People with a principle of not paying people to force them to one-box, but just one-boxing voluntarily, do best. People with a principle of one-boxing even if you know what’s in Box A do best—people with that principle get a million, people without it get a mere thousand. Therefore Flo one-boxes in all the cases described above, without having to pay anyone to make that happen.
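Here’s the policy-level version of the same comparison, the way Flo approaches the transparent case. This is just my own sketch of the idea, not a formal statement of FDT: score each policy by what it ends up with when Omega fills Box A according to what that very policy would do.

```python
# A sketch of FDT-style (policy-level) reasoning in the Transparent Newcomb case.

def box_a_contents(policy) -> int:
    # Omega puts in the million iff the policy takes only Box A no matter
    # what it would see inside.
    one_boxes_always = policy(0) == "one-box" and policy(1_000_000) == "one-box"
    return 1_000_000 if one_boxes_always else 0

def payout(policy) -> int:
    contents = box_a_contents(policy)
    action = policy(contents)            # the agent chooses after seeing the contents
    return contents + (1_000 if action == "two-box" else 0)

policies = {
    "always one-box (Flo)":                  lambda seen: "one-box",
    "always two-box (Casandra)":             lambda seen: "two-box",
    "one-box only if you see $1M (Edmund)":  lambda seen: "one-box" if seen else "two-box",
}
for name, policy in policies.items():
    print(f"{name}: ${payout(policy):,}")
# Only the always-one-box policy walks away with the million; the other two get $1,000.
```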
The original case—where Box A is opaque and you don’t pay people to do anything, you just take one box or two—is called the Newcomb Problem. The case where Box A is transparent is the Transparent Newcomb Problem. Causal Decision Theory (CDT) says two-box in both cases. Evidential Decision Theory (EDT) says one-box in the normal Newcomb Problem, but two-box in the Transparent problem. Functional Decision Theory (FDT) alone tells you to be a millionaire in either case. CDT is wrong because it makes you poor.1 EDT is wrong because it makes you poor in the transparent case, and also tells you to do crazy stuff like pay people to not give you information. FDT literally never tells you to do things which will make you worse off overall—by definition, if some action being rationally required would make you worse off, then FDT does not recommend that action.
But, you say: what about Billy’s Barbaric Bomb? The case, given by William MacAskill, is as follows:
You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb; taking this box will set off the bomb, setting you ablaze, and you certainly will burn slowly to death. The Right box is empty, but you have to pay $100 in order to be able to take it.
[Omega] predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then [he] put a bomb in Left. If the predictor predicted that you would choose Left, then [he] did not put a bomb in Left, and the box is empty.
The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, [he] left a note, explaining that [he] predicted that you would take Right, and therefore [he] put the bomb in Left.
You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?2
(We assume, also, that losing $100 is bad for some reason, even though you’re the only person left in the universe, and that burning to death is less than a trillion trillion times worse than losing $100.)
So, the case is very manipulatively framed.3 (By the way, my answer is: burn to death. Obviously.) Note that it’s basically parallel to the Transparent Newcomb Problem, just with more dramatic numbers (EDIT—see footnote).4 The change in numbers does not serve to make you better attuned to the structure of the problem, to whether you should accept an epistemically certain disadvantage because of some weird causal relation between your choice and whether you get put in this unfortunate circumstance in the first place. No, the numbers just make it more dramatic, indeed in a way that you would antecedently expect to cause you to have a less rational intuition.
Let’s talk about numbers. There’s a thing called scope insensitivity (read that post). Basically, if a number is really big or really small, we don’t really take account of its magnitude in our deliberation. We either treat it as “Basically infinity, but not infinity” or “basically zero, but not zero.” Is paying $100 worse than a one-in-100,000 chance of horribly burning to death? Probably not. Is it worse than a one-in-1,000,000 chance? I don’t know. Is it worse than a one-in-1-trillion-trillion chance? Probably. Do my intuitions reliably distinguish between these three cases, just based off the numbers? Almost certainly not.
The thing is, if an event objectively has a one in a trillion trillion chance, and then it happens (which it in fact won’t), my brain does not process it as “thing with super low probability that just happened to obtain,” rather it processes it as “thing whose probability can’t be that low, since it did in fact obtain.” Thus, if it’s part of the stipulation of a decision problem that the super-duper-low probability event actually obtained, my intuitions aren’t going to treat it like that; my intuitions will treat it like a one-in-100,000 chance, or probably as even more likely. And, in fact, if the predictor were inaccurate in as many cases as one in 100,000, not paying the $100 would be irrational, even on FDT.
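To see how much work the error rate is doing here, a rough sketch. The dollar-equivalent disvalue of burning to death is a placeholder of mine; the setup only stipulates that burning is less than a trillion trillion times worse than losing $100.

```python
# Policy comparison for the bomb case: a committed Left-taker only burns when
# the predictor errs, so their expected loss is error_rate * D; a committed
# Right-taker always pays $100.

def take_left(error_rate: float, disvalue_of_burning: float) -> bool:
    return error_rate * disvalue_of_burning < 100

D = 1e9 * 100                  # placeholder: burning is a billion times worse than losing $100
print(take_left(1e-24, D))     # True: at 1-in-a-trillion-trillion, take Left and save the $100
print(take_left(1e-5, D))      # False: at 1-in-100,000, even FDT says pay the $100
```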
If you see a thought experiment and you have certain intuitions about it, take that into account. If someone comes along and says “Okay, but what if we take the same thought experiment but make the numbers super dramatic? Like, so dramatic that we’d antecedently expect your intuitions to become unreliable?” then do not take that into account. Stay with the normal numbers; focus on the structural properties of the decision problem; don’t die on the hill of your intuitions in cases where you have independent reason to expect them to be unreliable.
Actually, saying why CDT is wrong is really hard. The “it makes you poor” argument is, ultimately, question-begging. In reality, I think showing why it’s wrong would have to involve some deep philosophy stuff about the relationship between causation and the will. But if that bores you, then you can simply choose not to be poor.
This is a statement about the case itself, not MacAskill’s intentions. He seems like a good person, indeed better than me.
There are some disanalogies. For one, the numbers are different: in Transparent Newcomb the CDT/EDT strategy has a tiny chance of giving you the best possible outcome ($1M+$1K), and more importantly, the relationships between predictions and outcomes are different. MacAskill’s case is underspecified, since it doesn’t say what happens conditional on all the four possible strategies. I had assumed that, at the very least, Omega puts the bomb in if he predicts you’ll pick Right conditional on finding out there’s a bomb, which is what you need for FDT to say you should burn to death.
Nevertheless, I think they’re parallel enough for the purposes at hand. MacAskill’s case is exactly parallel to a modification of the Transparent Newcomb case, where (i) you can take either just Box A or just Box B, (ii) Box A has a million dollars iff Omega predicted you’ll take A conditional on seeing it empty and nothing otherwise, and (iii) Box B has $1,000 in it for sure. I think everything I said about Transparent Newcomb would carry over just as well here, so my arguments still go through. (The EDTer would still pay to not learn stuff, assuming Omega puts the million in if the agent pays to not learn and then chooses one box).
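If it helps, here is the same policy-style check for that modified case, under assumption (ii); the code is my own sketch.

```python
# Modified Transparent Newcomb: you take exactly one box; Box A holds $1M iff
# Omega predicted you'd take A even conditional on seeing it empty; Box B
# always holds $1,000.

def box_a(policy) -> int:
    return 1_000_000 if policy(0) == "A" else 0   # prediction keys on the empty-box case

def payout(policy) -> int:
    contents = box_a(policy)
    return contents if policy(contents) == "A" else 1_000

print(payout(lambda seen: "A"))                    # take A regardless: $1,000,000
print(payout(lambda seen: "A" if seen else "B"))   # bail to B if A looks empty: $1,000
```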
Previously, I said “exactly parallel,” which is false. Thanks to the commenter who brought this issue up.