If anything, I'd rather we played with Diplomacy-like AI while it's in infancy and we can learn a bunch of stuff. The alternative is to have all the mature tech and BAM, somebody applies it the right way.
I'm an author on the paper. I want to point out what I think are a few mistakes with the blog.
First, Zvi says "The strategic engine, as I evaluated it based on a sample game with six bots and a human, is mediocre at tactics and lousy at strategy." This is not the feedback we have gotten from expert Diplomacy players. The general consensus among expert Diplomacy players is that the strategy/tactics are extremely strong, perhaps expert level, but that there is room for improvement in the dialogue. I'm not familiar with Zvi's experience with the game of Diplomacy, but I did place 3rd in the North American championship this year and have learned a lot about the game of Diplomacy, including how humans play it, over the past 3 years, so I feel somewhat qualified to comment on Diplomacy strategy.
In particular, I disagree with Zvi's opinion on the bot's strategy. Zvi says "I hate France’s tactical play, both its actual plays and the communications with Russia that are based on its tactics, dating back to at least 1903. The move here to Irish Sea needs to be accompanied by a convoy of Picardy into London or Wales, fighting for Belgium here is silly."
It's well-known among experienced Diplomacy players that France needs 3 armies on its mainland in order to defend itself well from a hostile Germany. Moreover, a convoy to London or Wales is unnecessary here. By moving to Irish Sea, France is setting themselves up for the option to convoy directly into Liverpool next turn. England can't block it because they have no army on their mainland. In short, I feel pretty confident saying that France made the right move here. If Zvi still disagrees, we could pull in some consensus expert Diplomacy players to get their opinions on this.
Second, we did a 200-game tournament for no-press (that is, no-dialogue) Diplomacy back in January where players were informed that one of the players in each game was a bot. Our bot, Diplodocus, placed first in this tournament. There's a video on it here: https://www.youtube.com/watch?v=AWQFhYSD7h4&ab_channel=DiploStrats . You can see in the YouTube comments that one expert Diplomacy player (Sploack) described the bot as "currently the best gunboat player in existence, or at least in the top 5." The strategy/tactics in no-press Diplomacy don't match up perfectly with full-press, but they do carry over to some extent.
Third, Zvi criticizes the fixed 1908 end date as working to the bot's advantage. The no-press tournament mentioned above did not have fixed end dates. Also, I think the fact that the full-press games ended in 1908 rather than a later date (say, 1910) actually hurt the bot. The bot handles endgame tactics quite well.
Thank you, that's interesting color. I certainly could be wrong about details.
I will admit that I have only played one tournament game of Diplomacy myself, one time when I had a spare cycle at WBC. I have played a lot of games against both humans and primitive bots, but not against experts. I would be interested to try my hand at the next gunboat championships and see how that goes.
I can believe the bot handles full endgame tactics well, that makes sense, although I would have expected most experts to also handle them well - I'd expect less differentiation in either direction. The difference is that the existence of an endgame changes the nature of the previous parts of the game, rather than that I think they'll screw up the endgame itself.
On France in particular: Long term, yes, you need three armies. However, what is going to fall so soon, before you can build them back? LIV is yours. If you convoy you are basically assured LON, giving you +2. If you support BUR then BUR will hold for 1903 (time stamp 34:00 for map). Thus, the only risk on the continent is on A BEL-PIC, which is crazy risky, you can even lose BEL and there is no particular reason for a France not convoying not to try it - as France I am expecting almost entirely A HOL S A BEL, then either a supported attack into BUR or a supported move into RUH, 90%+. Now you go A BUR-PIC in the fall plus A MAR-GAS, Germany takes BUR, you build A PAR and A MAR. Then in the spring you go A PIC-BEL to cut, A PAR-BUR with 2 supports, you're untouchable and have the army in England to fight over LON/LIV together with your northern fleet while the other guards the channel and you are safely on 7+ centers while also not giving Italy an invitation. You have Russia on your side so at this point Germany has to deal, likely trading LIV to send France's armies south, or it can face a combined FRA-RUS-(ITA) attack with no real path to victory. What am I missing here? If the 10% does get pulled off and Germany gets into PIC, yeah that's not GOOD but BUR is still yours so you're not in danger right away and even if Germany guesses correctly (which is <50% since you can choose to double-guard or to guard only PAR or BRE) then you're still +1 with only two German units inside France, and you always have MAR to build.
The convoy to LIV accomplishes very little, because it means you don't get LON that turn, and then you're left with F IRI that does nothing except IRI-WAL-LON, IRI-ENG which is likely taken and not under attack, or IRI-MAO and then what? Whereas F IRI-LIV-CLY puts pressure on EDI which has no natural guardian, and which you can potentially support with the army, or lets you emerge into NWS and then be a real pain.
I am very curious why this would be wrong; it would be worth quickly asking a few experts. And yes, I've played too many no-press games while also not sharing any of the 'expert consensus' on how to play the mid-game, although I'm familiar with opening theory and stalemate lines.
I checked with a Diplomacy expert and the feedback was:
- France's attack on Belgium is a good move because there are a lot of things for NTH to do, and if NTH doesn't tap ENG then France gets Belgium back, keeps it in the fall, and the German army is probably blown up. (I don't think you do it 100% in this spot, but I think it's good to do it with some probability.)
- Convoying to Wales is a reasonable option too but it doesn't guarantee London in full-press because Germany can support England into London, which is worth it for both of them.
- Between the two, getting Belgium would be preferable because it takes centers away from the main competitor, whereas capturing English dots would probably just lead to England blowing up units in Scandinavia. Those units are heavily committed to Scandinavia and not a threat to France.
Hi Zvi, just following up on this. If you're satisfied with this expert feedback, would you mind issuing a correction? I find it unfortunate that a lot of your readers have already read the article and walked away with what I think is an inaccurate representation of the bot's performance.
If you're not satisfied yet then I'm happy to continue the conversation. But in that case can you add a note at the top of the article pointing readers to my comments?
I will update to make readers aware of the comments here.
OK, I have updated in several places to note your objections and additional information, including a pointer to the comment section.
I'd also be happy to play some no-press games against the bots some time, which would very quickly settle the question of who is better at tactics. Happy to commit to publishing the moves of all games.
You can play our 1v1 bot on https://webdiplomacy.net/ by launching a FvA variant game in the "Start an AI/Bot Game" option. The classic variant bot on the website isn't ours though. It's an earlier bot from MILA. We've open-sourced our code and models so I suspect it's just a matter of time before it ends up on webdip.
You can see the games the no-press bot played in the tournament here under Round 1 of the Meta Speedboat Tournament: https://webdiplomacy.net/tournaments.php?tab=Ongoing .
I tried to start what I thought were 2 no-press games against bots and it turns out they're 2-day periods with press despite being vs. 6 bots? Sigh. I suppose I can try the 1vs1.
You just need to hit the "ready" button. The bots enter orders pretty quickly. You have up to 3 days to enter your orders.
All right, yeah, managed to play a few years of a game, curiosity fully saturated. Saw a bunch of the same patterns from the other game, actually.
Are you talking about the FvA variant or the classic variant? The bot on the site for the classic variant isn't ours. It's a bot from 2019 made by MILA.
Agreeing with Daphne_W at the LW discussion, my main update on this is to reinforce my belief that human intelligence is a much lower bar than it seems, because most people are operating at a low cognitive level for most tasks, most of the time. We don't need uniformly expert-level performance from an AI to pass as human-equivalent, but we do need the General part. Training special-purpose bots therefore seems to be shooting fish in a barrel: we should expect AI to do well at this when the task is well defined enough for optimization approaches to work well. We instead need to assess progress on how AI agents do at generalist tasks where the objectives are fuzzy and feedback is indirect and noisy. Agreeing with Zvi, I'm not going to worry about AGI more as a result of this work.
Published just yesterday (06 Dec): Noam Brown and Lex Fridman had a characteristically deep, thorough conversation:
Noam Brown: AI vs Humans in Poker and Games of Strategic Negotiation:
https://www.youtube.com/watch?v=2oHH4aClJQs
OUTLINE:
0:00 - Introduction
1:09 - No Limit Texas Hold 'em
5:02 - Solving poker
18:12 - Poker vs Chess
24:50 - AI playing poker
58:18 - Heads-up vs Multi-way poker
1:09:08 - Greatest poker player of all time
1:12:42 - Diplomacy game
1:22:33 - AI negotiating with humans
2:04:58 - AI in geopolitics
2:09:43 - Human-like AI for games
2:15:44 - Ethics of AI
2:19:57 - AGI
2:23:57 - Advice to beginners