Superhuman AI for Multiplayer Poker

Transcription

RESEARCH ARTICLE | COMPUTER SCIENCE

Superhuman AI for multiplayer poker

Noam Brown(1,2)* and Tuomas Sandholm(1,3,4,5)*

In recent years there have been great strides in artificial intelligence (AI), with games often serving as challenge problems, benchmarks, and milestones for progress. Poker has served for decades as such a challenge problem. Past successes in such benchmarks, including poker, have been limited to two-player games. However, poker in particular is traditionally played with more than two players. Multiplayer games present fundamental additional issues beyond those in two-player games, and multiplayer poker is a recognized AI milestone. In this paper we present Pluribus, an AI that we show is stronger than top human professionals in six-player no-limit Texas hold’em poker, the most popular form of poker played by humans.

(1) Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. (2) Facebook AI Research, New York, NY 10003, USA. (3) Strategic Machine, Inc., Pittsburgh, PA 15213, USA. (4) Strategy Robot, Inc., Pittsburgh, PA 15213, USA. (5) Optimized Markets, Inc., Pittsburgh, PA 15213, USA.
*Corresponding author. Email: noamb@cs.cmu.edu (N.B.); sandholm@cs.cmu.edu (T.S.)

Brown et al., Science 365, 885–890 (2019), 30 August 2019

Poker has served as a challenge problem for the fields of artificial intelligence (AI) and game theory for decades (1). In fact, the foundational papers on game theory used poker to illustrate their concepts (2, 3). The reason for this choice is simple: No other popular recreational game captures the challenges of hidden information as effectively and as elegantly as poker. Although poker has been useful as a benchmark for new AI and game-theoretic techniques, the challenge of hidden information in strategic settings is not limited to recreational games. The equilibrium concepts of von Neumann and Nash have been applied to many real-world challenges such as auctions, cybersecurity, and pricing.

The past two decades have witnessed rapid progress in the ability of AI systems to play increasingly complex forms of poker (4–6). However, all prior breakthroughs have been limited to settings involving only two players. Developing a superhuman AI for multiplayer poker was the widely recognized main remaining milestone. In this paper we describe Pluribus, an AI capable of defeating elite human professionals in six-player no-limit Texas hold’em poker, the most commonly played poker format in the world.
Theoretical and practical challenges of multiplayer games

AI systems have reached superhuman performance in games such as checkers (7), chess (8), two-player limit poker (4), Go (9), and two-player no-limit poker (6). All of these involve only two players and are zero-sum games (meaning that whatever one player wins, the other player loses). Every one of those superhuman AI systems was generated by attempting to approximate a Nash equilibrium strategy rather than by, for example, trying to detect and exploit weaknesses in the opponent. A Nash equilibrium is a list of strategies, one for each player, in which no player can improve by deviating to a different strategy. Nash equilibria have been proven to exist in all finite games—and many infinite games—though finding an equilibrium may be difficult.

Two-player zero-sum games are a special class of games in which Nash equilibria also have an extremely useful additional property: Any player who chooses to use a Nash equilibrium is guaranteed to not lose in expectation no matter what the opponent does (as long as one side does not have an intrinsic advantage under the game rules, or the players alternate sides). In other words, a Nash equilibrium strategy is unbeatable in two-player zero-sum games that satisfy the above criteria. For this reason, to “solve” a two-player zero-sum game means to find an exact Nash equilibrium. For example, the Nash equilibrium strategy for Rock-Paper-Scissors is to randomly pick Rock, Paper, or Scissors with equal probability. Against such a strategy, the best that an opponent can do in expectation is tie (10). In this simple case, playing the Nash equilibrium also guarantees that the player will not win in expectation. However, in more complex games, even determining how to tie against a Nash equilibrium may be difficult; if the opponent ever chooses suboptimal actions, then playing the Nash equilibrium will indeed result in victory in expectation.

In principle, playing the Nash equilibrium can be combined with opponent exploitation by initially playing the equilibrium strategy and then over time shifting to a strategy that exploits the opponent’s observed weaknesses (for example, by switching to always playing Paper against an opponent that always plays Rock) (11). However, except in certain restricted ways (12), shifting to an exploitative nonequilibrium strategy opens oneself up to exploitation because the opponent could also change strategies at any moment. Additionally, existing techniques for opponent exploitation require too many samples to be competitive with human ability outside of small games. Pluribus plays a fixed strategy that does not adapt to the observed tendencies of the opponents.

Although a Nash equilibrium strategy is guaranteed to exist in any finite game, efficient algorithms for finding one are only proven to exist for special classes of games, among which two-player zero-sum games are the most prominent. No polynomial-time algorithm is known for finding a Nash equilibrium in two-player non-zero-sum games, and the existence of one would have sweeping surprising implications in computational complexity theory (13, 14). Finding a Nash equilibrium in zero-sum games with three or more players is at least as hard (because a dummy player can be added to the two-player game to make it a three-player zero-sum game). Even approximating a Nash equilibrium is hard (except in special cases) in theory (15), and in games with more than two players, even the best complete algorithm can only address games with a handful of possible strategies per player (16). Moreover, even if a Nash equilibrium could be computed efficiently in a game with more than two players, it is not clear that playing such an equilibrium strategy would be wise. If each player in such a game independently computes and plays a Nash equilibrium, the list of strategies that they play (one strategy per player) may not be a Nash equilibrium and players might have an incentive to deviate to a different strategy. One example of this is the Lemonade Stand Game (17), illustrated in Fig. 1, in which each player simultaneously picks a point on a ring and wants to be as far away as possible from any other player. The Nash equilibrium is for all players to be spaced uniformly along the ring, but there are infinitely many ways this can be accomplished and therefore infinitely many Nash equilibria. If each player independently computes one of those equilibria, the joint strategy is unlikely to result in all players being spaced uniformly along the ring. Two-player zero-sum games are a special case where even if the players independently compute and select Nash equilibria, the list of strategies is still a Nash equilibrium.

The shortcomings of Nash equilibria outside of two-player zero-sum games, and the failure of any other game-theoretic solution concept to convincingly overcome them, have raised the question of what the right goal should even be in such games. In the case of six-player poker, we take the viewpoint that our goal should not be a specific game-theoretic solution concept but rather to create an AI that empirically consistently defeats human opponents, including elite human professionals.

Fig. 1. An example of the equilibrium selection problem. In the Lemonade Stand Game, players simultaneously choose a point on a ring and want to be as far away as possible from any other player. In every Nash equilibrium, players are spaced uniformly around the ring. There are infinitely many such Nash equilibria. However, if each player independently chooses one Nash equilibrium to play, their joint strategy is unlikely to be a Nash equilibrium. (Left) An illustration of three different Nash equilibria in this game, distinguished by three different colors. (Right) Each player independently chooses one Nash equilibrium. Their joint strategy is not a Nash equilibrium.
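To make the equilibrium selection problem concrete, here is a minimal Python sketch (not from the paper) of a discretized Lemonade Stand ring: each of three players independently picks one of the uniform-spacing Nash equilibria and plays her own seat in it, and we check how often the resulting joint placement is itself uniformly spaced. The ring size, seat assignment, and random seed are illustrative assumptions.

```python
import random

RING = 12                      # discretized ring positions (illustrative)
PLAYERS = 3
SPACING = RING // PLAYERS      # uniform spacing in every Nash equilibrium

def random_equilibrium():
    """One Nash equilibrium: all players uniformly spaced, at a random rotation."""
    offset = random.randrange(RING)
    return sorted((offset + i * SPACING) % RING for i in range(PLAYERS))

def is_uniformly_spaced(points):
    """Is the joint placement itself uniformly spaced (i.e., an equilibrium)?"""
    pts = sorted(points)
    gaps = [(pts[(i + 1) % PLAYERS] - pts[i]) % RING for i in range(PLAYERS)]
    return all(g == SPACING for g in gaps)

random.seed(0)
trials = 10_000
joint_equilibria = 0
for _ in range(trials):
    # Each player independently picks an equilibrium and plays her own seat in it.
    joint = [random_equilibrium()[i] for i in range(PLAYERS)]
    joint_equilibria += is_uniformly_spaced(joint)

print(f"joint play was an equilibrium in {joint_equilibria / trials:.1%} of trials")
```

Even though every player plays a position from some Nash equilibrium, the independently chosen rotations rarely line up, so the joint strategy is usually not an equilibrium.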

The algorithms that we used to construct Pluribus, discussed in the next two sections, are not guaranteed to converge to a Nash equilibrium outside of two-player zero-sum games. Nevertheless, we observe that Pluribus plays a strong strategy in multiplayer poker that is capable of consistently defeating elite human professionals. This shows that even though the techniques do not have known strong theoretical guarantees on performance outside of the two-player zero-sum setting, they are nevertheless capable of producing superhuman strategies in a wider class of strategic settings.

Description of Pluribus

The core of Pluribus’s strategy was computed through self-play, in which the AI plays against copies of itself, without any data of human or prior AI play used as input. The AI starts from scratch by playing randomly and gradually improves as it determines which actions, and which probability distribution over those actions, lead to better outcomes against earlier versions of its strategy. Forms of self-play have previously been used to generate powerful AIs in two-player zero-sum games such as backgammon (18), Go (9, 19), Dota 2 (20), StarCraft 2 (21), and two-player poker (4–6), though the precise algorithms that were used have varied widely. Although it is easy to construct toy games with more than two players in which commonly used self-play algorithms fail to converge to a meaningful solution (22), in practice self-play has nevertheless been shown to do reasonably well in some games with more than two players (23).

Pluribus’s self-play produces a strategy for the entire game offline, which we refer to as the blueprint strategy. Then during actual play against opponents, Pluribus improves upon the blueprint strategy by searching for a better strategy in real time for the situations in which it finds itself during the game. In subsections below, we discuss both of those phases in detail, but first we discuss abstraction, forms of which are used in both phases to make them scalable.
Abstraction for large imperfect-information games

There are far too many decision points in no-limit Texas hold’em to reason about individually. To reduce the complexity of the game, we eliminate some actions from consideration and also bucket similar decision points together in a process called abstraction (24, 25). After abstraction, the bucketed decision points are treated as identical. We use two kinds of abstraction in Pluribus: action abstraction and information abstraction.

Action abstraction reduces the number of different actions the AI needs to consider. No-limit Texas hold’em normally allows any whole-dollar bet between $100 and $10,000. However, in practice there is little difference between betting $200 and betting $201. To reduce the complexity of forming a strategy, Pluribus only considers a few different bet sizes at any given decision point. The exact number of bets it considers varies between 1 and 14 depending on the situation. Although Pluribus can limit itself to only betting one of a few different sizes between $100 and $10,000, when actually playing no-limit poker, the opponents are not constrained to those few options. What happens if an opponent bets $150 while Pluribus has only been trained to consider bets of $100 or $200? Generally, Pluribus will rely on its search algorithm (described in a later section) to compute a response in real time to such “off-tree” actions.

The other form of abstraction that we use in Pluribus is information abstraction, in which decision points that are similar in terms of what information has been revealed (in poker, the player’s cards and revealed board cards) are bucketed together and treated identically (26–28). For example, a 10-high straight and a 9-high straight are distinct hands but are nevertheless strategically similar. Pluribus may bucket these hands together and treat them identically, thereby reducing the number of distinct situations for which it needs to determine a strategy. Information abstraction drastically reduces the complexity of the game, but it may wash away subtle differences that are important for superhuman performance. Therefore, during actual play against humans, Pluribus uses information abstraction only to reason about situations on future betting rounds, never the betting round that it is actually in. Information abstraction is also applied during offline self-play.
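Pluribus’s actual bucketing features are described in the paper’s supplementary materials; the sketch below only illustrates the general idea of information abstraction, assuming a made-up hand-strength score in [0, 1] and an arbitrary bucket count, so that strategically similar hands such as a 10-high and a 9-high straight end up sharing one strategy entry.

```python
def bucket(strength: float, num_buckets: int = 10) -> int:
    """Map an estimated hand-strength value in [0, 1] to one of num_buckets buckets."""
    return min(int(strength * num_buckets), num_buckets - 1)

# Hypothetical hand-strength estimates; the numbers are made up for illustration.
hands = {"10-high straight": 0.93, "9-high straight": 0.91, "bottom pair": 0.34}

strategies = {}  # one strategy entry per bucket, shared by every hand in that bucket
for name, strength in hands.items():
    b = bucket(strength)
    strategies.setdefault(b, {"fold": 1 / 3, "call": 1 / 3, "raise": 1 / 3})
    print(f"{name} -> bucket {b}")

print(f"{len(hands)} hands, {len(strategies)} strategy entries")
# Both straights share a bucket and therefore share a single strategy entry.
```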

Self-play through improved Monte Carlo counterfactual regret minimization

The blueprint strategy in Pluribus was computed using a variant of counterfactual regret minimization (CFR) (29). CFR is an iterative self-play algorithm in which the AI starts by playing completely at random but gradually improves by learning to beat earlier versions of itself. Every competitive Texas hold’em AI for at least the past 6 years has computed its strategy using some variant of CFR (4–6, 23, 28, 30–34). We use a form of Monte Carlo CFR (MCCFR) that samples actions in the game tree rather than traversing the entire game tree on each iteration (33, 35–37).

On each iteration of the algorithm, MCCFR designates one player as the traverser whose current strategy is updated on the iteration. At the start of the iteration, MCCFR simulates a hand of poker based on the current strategy of all players (which is initially completely random). Once the simulated hand is completed, the AI reviews each decision that was made by the traverser and investigates how much better or worse it would have done by choosing the other available actions instead. Next, the AI reviews each hypothetical decision that would have been made following those other available actions and investigates how much better it would have done by choosing the other available actions, and so on. This traversal of the game tree is illustrated in Fig. 2. Exploring other hypothetical outcomes is possible because the AI knows each player’s strategy for the iteration and can therefore simulate what would have happened had some other action been chosen instead. This counterfactual reasoning is one of the features that distinguishes CFR from other self-play algorithms that have been deployed in domains such as Go (9), Dota 2 (20), and StarCraft 2 (21).

Fig. 2. A game tree traversal via Monte Carlo CFR. In this figure, player P1 is traversing the game tree. (Left) A game is simulated until an outcome is reached. (Middle) For each P1 decision point encountered in the simulation in the left panel, P1 explores each other action that P1 could have taken and plays out a simulation to the end of the game. P1 then updates its strategy to pick actions with higher payoff with higher probability. (Right) P1 explores each other action that P1 could have taken at every new decision point encountered in the middle panel, and P1 updates its strategy at those hypothetical decision points. This process repeats until no new P1 decision points are encountered, which in this case is after three steps but in general may be more. Our implementation of MCCFR (described in the supplementary materials) is equivalent but traverses the game tree in a depth-first manner. (The percentages in the figure are for illustration purposes only and may not correspond to actual percentages that the algorithm would compute.)

The difference between what the traverser would have received for choosing an action versus what the traverser actually achieved (in expectation) on the iteration is added to the counterfactual regret for the action. Counterfactual regret represents how much the traverser regrets having not chosen that action in previous iterations. At the end of the iteration, the traverser’s strategy is updated so that actions with higher counterfactual regret are chosen with higher probability.
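The update described above is commonly implemented with regret matching, in which an action’s probability is made proportional to its positive accumulated counterfactual regret. A minimal sketch of that rule follows; the action values are hypothetical, and Pluribus’s full MCCFR traversal, with sampling and pruning, is specified in the paper’s supplementary materials.

```python
def regret_matching(regrets):
    """Return a strategy in which actions with higher positive regret get higher probability."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # If no action has positive regret, fall back to uniform random play.
    n = len(regrets)
    return [1.0 / n] * n

def update_regrets(regrets, action_values, realized_value):
    """Add (counterfactual value of each action - value actually achieved) to its regret."""
    return [r + (v - realized_value) for r, v in zip(regrets, action_values)]

# Toy example: three actions (fold, call, raise) at one decision point.
regrets = [0.0, 0.0, 0.0]
strategy = regret_matching(regrets)            # starts uniform (random play)
action_values = [-1.0, 0.5, 2.0]               # hypothetical counterfactual values
realized = sum(p * v for p, v in zip(strategy, action_values))
regrets = update_regrets(regrets, action_values, realized)
print(regret_matching(regrets))                # the raise action now gets the most weight
```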
For two-player zero-sum games, CFR guarantees that the average strategy played over all iterations converges to a Nash equilibrium, but convergence to a Nash equilibrium is not guaranteed outside of two-player zero-sum games. Nevertheless, CFR guarantees in all finite games that all counterfactual regrets grow sublinearly in the number of iterations. This, in turn, guarantees in the limit that the average performance of CFR on each iteration that was played matches the average performance of the best single fixed strategy in hindsight. CFR is also proven to eliminate iteratively strictly dominated actions in all finite games (23).

Because the difference between counterfactual value and expected value is added to counterfactual regret rather than replacing it, the first iteration in which the agent played completely randomly (which is typically a very bad strategy) still influences the counterfactual regrets, and therefore the strategy that is played, for iterations far into the future. In the original form of CFR, the influence of this first iteration decays at a rate of 1/T, where T is the number of iterations played. To more quickly decay the influence of these early “bad” iterations, Pluribus uses a recent form of CFR called Linear CFR (38) in early iterations. (We stop the discounting after that because the time cost of doing the multiplications with the discount factor is not worth the benefit later on.) Linear CFR assigns a weight of T to the regret contributions of iteration T. Therefore, the influence of the first iteration decays at a rate of 1 / (sum_{t=1}^{T} t) = 2 / (T(T+1)). This leads to the strategy improving more quickly in practice while still maintaining a near-identical worst-case bound on total regret. To speed up the blueprint strategy computation even further, actions with extremely negative regret are not explored in 95% of iterations.
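A small sketch of the discounting arithmetic above: weighting iteration t’s regret contribution by t means that after T iterations the first iteration accounts for a 2/(T(T+1)) share of the total weight, versus 1/T without discounting.

```python
def first_iteration_influence(T: int, linear: bool) -> float:
    """Fraction of the total regret weight contributed by iteration 1 after T iterations."""
    if linear:
        # Linear CFR: iteration t is weighted by t, so the total weight is T(T+1)/2.
        return 2.0 / (T * (T + 1))
    # Original CFR: every iteration is weighted equally.
    return 1.0 / T

for T in (10, 1_000, 100_000):
    print(f"T={T}: CFR {first_iteration_influence(T, False):.2e}, "
          f"Linear CFR {first_iteration_influence(T, True):.2e}")
```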

The blueprint strategy for Pluribus was computed in 8 days on a 64-core server for a total of 12,400 CPU core hours. It required less than 512 GB of memory. At current cloud computing spot instance rates, this would cost about $144 to produce. This is in sharp contrast to all the other recent superhuman AI milestones for games, which used large numbers of servers and/or farms of graphics processing units (GPUs). More memory and computation would enable a finer-grained blueprint that would lead to better performance but would also result in Pluribus using more memory or being slower during real-time search. We set the size of the blueprint strategy abstraction to allow Pluribus to run during live play on a machine with no more than 128 GB of memory while storing a compressed form of the blueprint strategy in memory.

Depth-limited search in imperfect-information games

The blueprint strategy for the entire game is necessarily coarse-grained owing to the size and complexity of no-limit Texas hold’em. Pluribus only plays according to this blueprint strategy in the first betting round (of four), where the number of decision points is small enough that the blueprint strategy can afford to not use information abstraction and have a lot of actions in the action abstraction. After the first round (and even in the first round if an opponent chooses a bet size that is sufficiently different from the sizes in the blueprint action abstraction), Pluribus instead conducts real-time search to determine a better, finer-grained strategy for the current situation it is in. For opponent bets on the first round that are slightly off the tree, Pluribus rounds the bet to a nearby on-tree size [using the pseudoharmonic mapping (39)] and proceeds to play according to the blueprint as if the opponent had used the latter bet size.
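The pseudoharmonic mapping of reference (39) is not restated in this article; the sketch below assumes its standard randomized form, with bet sizes expressed as fractions of the pot, and is meant only to illustrate how a slightly off-tree bet could be rounded to a nearby on-tree size.

```python
import random

def prob_of_lower_size(x: float, A: float, B: float) -> float:
    """Probability of mapping an off-tree bet x (A <= x <= B) down to the smaller
    on-tree size A. This is the pseudoharmonic form as commonly stated, assuming
    sizes are expressed as fractions of the pot; it is an assumption here, not a
    quotation of Pluribus's implementation."""
    return ((B - x) * (1.0 + A)) / ((B - A) * (1.0 + x))

def map_bet(x: float, sizes) -> float:
    """Map an observed bet to one of the abstraction's bet sizes."""
    sizes = sorted(sizes)
    if x <= sizes[0]:
        return sizes[0]
    if x >= sizes[-1]:
        return sizes[-1]
    # Find the neighboring on-tree sizes around x.
    A = max(s for s in sizes if s <= x)
    B = min(s for s in sizes if s >= x)
    if A == B:
        return A
    return A if random.random() < prob_of_lower_size(x, A, B) else B

# Example: abstraction bets of 0.5x and 1x pot; an opponent bets 0.75x pot.
print(map_bet(0.75, [0.5, 1.0]))
```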
Real-time search has been necessary for achieving superhuman performance in many perfect-information games, including backgammon (18), chess (8), and Go (9, 19). For example, when determining their next move, chess AIs commonly look some number of moves ahead until a leaf node is reached at the depth limit of the algorithm’s lookahead. An evaluation function then estimates the value of the board configuration at the leaf node if both players were to play a Nash equilibrium from that point forward. In principle, if an AI could accurately calculate the value of every leaf node (e.g., win, draw, or loss), this algorithm would choose the optimal next move.

However, search as has been done in perfect-information games is fundamentally broken when applied to imperfect-information games. For example, consider a sequential form of Rock-Paper-Scissors, illustrated in Fig. 3, in which player 1 acts first but does not reveal her action to player 2, followed by player 2 acting. If player 1 were to conduct search that looks just one move ahead, every one of her actions would appear to lead to a leaf node with zero value. After all, if player 2 plays the Nash equilibrium strategy of choosing each action with 1/3 probability, the value to player 1 of choosing Rock is zero, as is the value of choosing Scissors. So player 1’s search algorithm could choose to always play Rock because, given the values of the leaf nodes, this appears to be equally good as any other strategy.

Indeed, if player 2’s strategy were fixed to always playing the Nash equilibrium, always playing Rock would be an optimal player 1 strategy. However, in reality, player 2 could adjust to a strategy of always playing Paper. In that case, the value of always playing Rock would actually be -1.

Fig. 3. Perfect-information game search in Rock-Paper-Scissors. (Top) A sequential representation of Rock-Paper-Scissors in which player 1 acts first but does not reveal her action to player 2, who acts second. The dashed lines between the player 2 nodes signify that player 2 does not know which of those nodes he is in. The terminal values are shown only for player 1. (Bottom) A depiction of the depth-limited subgame if player 1 conducts search (with a depth of one) using the same approach as is used in perfect-information games. The approach assumes that after each action, player 2 will play according to the Nash equilibrium strategy of choosing Rock, Paper, and Scissors with 1/3 probability each. This results in a value of zero for player 1 regardless of her strategy.

This example illustrates that in imperfect-information subgames (the part of the game in which search is being conducted) (40), leaf nodes do not have fixed values. Instead, their values depend on the strategy that the searcher chooses in the subgame (that is, the probabilities that the searcher assigns to his actions in the subgame). In principle, this could be addressed by having the value of a subgame leaf node be a function of the searcher’s strategy in the subgame, but this is impractical in large games.

One alternative is to make the value of a leaf node conditional only on the belief distribution of both players at that point in the game. This was used to generate the two-player poker AI DeepStack (5). However, this option is extremely expensive because it requires one to solve huge numbers of subgames that are conditional on beliefs. It becomes even more expensive as the amount of hidden information or the number of players grows. The two-player poker AI Libratus sidestepped this issue by only doing real-time search when the remaining game was short enough that the depth limit would extend to the end of the game (6). However, as the number of players grows, always solving to the end of the game also becomes computationally prohibitive.

Pluribus instead uses a modified form of an approach that we recently designed—previously only for two-player zero-sum games (41)—in which the searcher explicitly considers that any or all players may shift to different strategies beyond the leaf nodes of a subgame. Specifically, rather than assuming that all players play according to a single fixed strategy beyond the leaf nodes (which results in the leaf nodes having a single fixed value), we instead assume that each player may choose between k different strategies, specialized to each player, to play for the remainder of the game when a leaf node is reached. In the experiments in this paper, k = 4. One of the four continuation strategies that we use in the experiments is the precomputed blueprint strategy; another is a modified form of the blueprint strategy in which the strategy is biased toward folding; another is the blueprint strategy biased toward calling; and the final option is the blueprint strategy biased toward raising. This technique results in the searcher finding a strategy that is more balanced because choosing an unbalanced strategy (e.g., always playing Rock in Rock-Paper-Scissors) would be punished by an opponent shifting to one of the other continuation strategies (e.g., always playing Paper).

Fig. 4. Real-time search in Pluribus. The subgame shows just two players for simplicity. A dashed line between nodes indicates that the player to act does not know which of the two nodes she is in. (Left) The original imperfect-information subgame. (Right) The transformed subgame that is searched in real time to determine a player’s strategy. An initial chance node reaches each root node according to the normalized probability that the node is reached in the previously computed strategy profile (or according to the blueprint strategy profile if this is the first time in the hand that real-time search is conducted). The leaf nodes are replaced by a sequence of new nodes in which each player still in the hand chooses among k actions, with no player first observing what another player chooses. For simplicity, k = 2 in the figure. In Pluribus, k = 4. Each action in that sequence corresponds to a selection of a continuation strategy for that player for the remainder of the game. This effectively leads to a terminal node (whose value is estimated by rolling out the remainder of the game according to the list of continuation strategies that the players chose).
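The paper does not spell out here how the biased continuation strategies are constructed; the sketch below assumes a simple multiplicative re-weighting of a blueprint action distribution, purely to illustrate what “biased toward folding, calling, or raising” could look like at a depth-limit leaf.

```python
def bias_toward(blueprint, action, factor=5.0):
    """Return a copy of a blueprint action distribution with extra weight on one action,
    then renormalized. The multiplicative factor is an assumption for illustration;
    the actual construction used by Pluribus is given in the supplementary materials."""
    biased = {a: p * (factor if a == action else 1.0) for a, p in blueprint.items()}
    total = sum(biased.values())
    return {a: p / total for a, p in biased.items()}

blueprint = {"fold": 0.2, "call": 0.5, "raise": 0.3}   # hypothetical leaf-node distribution

# The four continuation strategies a player may choose from at a leaf node:
continuations = [
    blueprint,                          # 1. the precomputed blueprint itself
    bias_toward(blueprint, "fold"),     # 2. biased toward folding
    bias_toward(blueprint, "call"),     # 3. biased toward calling
    bias_toward(blueprint, "raise"),    # 4. biased toward raising
]
for c in continuations:
    print({a: round(p, 2) for a, p in c.items()})
```

Because the opponent may switch to any of these continuations beyond the leaf, an unbalanced subgame strategy is punished during search, which pushes the searcher toward a more balanced strategy.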

Another major challenge of search in imperfect-information games is that a player’s optimal strategy for a particular situation depends on what the player’s strategy is for every situation the player could be in from the perspective of her opponents. For example, suppose the player is holding the best possible hand. Betting in this situation could be a good action. But if the player bets in this situation only when holding the best possible hand, then the opponents would know that they should always fold in response.

To cope with this challenge, Pluribus keeps track of the probability it would have reached the current situation with each possible hand according to its strategy. Regardless of which hand Pluribus is actually holding, it will first calculate how it would act with every possible hand, being careful to balance its strategy across all the hands so as to remain unpredictable to the opponent. Once this balanced strategy across all hands is computed, Pluribus then executes an action for the hand it is actually holding. The structure of a depth-limited imperfect-information subgame as used in Pluribus is shown in Fig. 4.
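A schematic sketch of this range-based idea (not Pluribus’s search code): compute an action distribution for every hand that could have reached the situation, weighted by its reach probability, and only then look up the hand actually held. The reach probabilities and per-hand strategies below are stand-ins.

```python
import random

def toy_search_strategy(hand):
    """Stand-in for the action distribution that real-time search would compute for this hand."""
    return {"bet": 0.6, "check": 0.4} if hand == "AA" else {"bet": 0.3, "check": 0.7}

def act(reach_prob, strategy_for_hand, actual_hand):
    """Compute a strategy for every hand that could have reached this spot,
    then sample an action only for the hand actually held."""
    full_strategy = {h: strategy_for_hand(h) for h, p in reach_prob.items() if p > 0}
    dist = full_strategy[actual_hand]
    actions, weights = zip(*dist.items())
    return random.choices(actions, weights=weights)[0]

# Illustrative reach probabilities (made up); a hand with zero reach probability
# is no longer part of the range at this decision point.
reach = {"AA": 0.04, "KQs": 0.10, "72o": 0.0}
print(act(reach, toy_search_strategy, actual_hand="KQs"))
```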
Pluribus used one of two different forms of CFR to compute a strategy in the subgame, depending on the size of the subgame and the part of the game. If the subgame is relatively large or it is early in the game, then Monte Carlo Linear CFR is used just as it was for the blueprint strategy computation. Otherwise, Pluribus uses an optimized vector-based form of Linear CFR (38) that samples only chance events (such as board cards) (42).

When playing, Pluribus runs on two Intel Haswell E5-2695 v3 CPUs and uses less than 128 GB of memory. For comparison, AlphaGo used 1,920 CPUs and 280 GPUs for real-time search in its 2016 matches against top Go professional Lee Sedol (43), Deep Blue used 480 custom-designed chips in its 1997 matches against top chess professional Garry Kasparov (8), and Libratus used 100 CPUs in its 2017 matches against top professionals in two-player poker (6). The amount of time that Pluribus takes to conduct search on a single subgame varies between 1 and 33 s, depending on the particular situation. On average, Pluribus plays at a rate of 20 s per hand when playing against copies of itself in six-player poker. This is roughly twice as fast as professional humans tend to play.

Experimental evaluation

We evaluated Pluribus against elite human professionals in two formats: five human professionals playing with one copy of Pluribus (5H+1AI), and one human professional playing with five copies of Pluribus (1H+5AI). Each human participant has won more than $1 million playing poker professionally. Performance was measured by using the standard metric in this field of AI, milli big blinds per game (mbb/game). This measures how many big blinds (the initial money the second player must put into the pot) were won on average per thousand hands of poker. In all experiments, we used the variance-reduction technique AIVAT (44) to reduce the luck factor in the game (45) and measured

Fig. 5. Performance of Pluribus in the 5 humans + 1 AI experiment. The dots show Pluribus’s performance at the end of each day of play. (Top) The lines show the win rate (solid line) plus or minus the standard error (dashed lines). (Bottom) The lines show the cumulative number of mbbs won (solid line) plus or minus the standard error (dashed lines). The relatively steady performance of Pluribus over the course of the 10,000-hand experiment also suggests that the humans were unable to find exploitable weaknesses in the bot.
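For concreteness, the mbb/game metric can be computed as in the sketch below; the dollar amounts are made-up numbers, and in the paper the winnings are additionally adjusted with AIVAT before this conversion.

```python
def mbb_per_game(total_won_dollars: float, big_blind_dollars: float, num_hands: int) -> float:
    """Milli big blinds per game: thousandths of a big blind won per hand, on average."""
    big_blinds_won = total_won_dollars / big_blind_dollars
    return 1000.0 * big_blinds_won / num_hands

# Hypothetical example: winning $4,800 over 10,000 hands at a $100 big blind
# corresponds to 4.8 mbb/game.
print(mbb_per_game(4800, 100, 10_000))
```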
