Poker-AI.org

Poker AI and Botting Discussion Forum
It is currently Mon Nov 13, 2023 5:23 pm

All times are UTC




Post new topic Reply to topic  [ 41 posts ]  Go to page 1, 2, 3  Next
Author Message
 Post subject: Very sub-optimal play
PostPosted: Tue Jul 16, 2013 4:41 pm 
Offline
Regular Member

Joined: Sun Mar 03, 2013 11:55 am
Posts: 64
After many years of exploitive bots, I thought I'd have a play with optimal bots. I have yet to produce a bot worth shouting about, let alone unleash on the world, so this is a hypothetical question.

So, optimal bots are produced by some form of strategy convergence. That is, they devise strategies to beat an existing strategy until they can't improve any more. That produces some form of game tree which tells us what to do in each situation. The theory is that we are unbeatable, so if someone plays our strategy, it's a draw, if not, they lose.

So let's say I've build my 400TB game tree for full ring NL, we sit down in EP, UTG fires off a 200BB open shove.

We take a look at our game tree to work out what we should do, but this action is so sub-optimal, that no node exists. It's not something we'd ever do, and therefore we don't know how to respond to it. How do we deal with this kind of situation if an optimal bot is unleashed on the real world?


Top
 Profile  
 
PostPosted: Tue Jul 16, 2013 5:38 pm 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Well, if you solved the full game, it would account for that opponent action, albeit suboptimal. However, for abstract games, you would do state translation if a branch doesn't exist in your game.


Top
 Profile  
 
PostPosted: Tue Jul 16, 2013 6:58 pm 
Offline
Regular Member

Joined: Sun Mar 03, 2013 11:55 am
Posts: 64
I can't get my head around how we'd solve that, even in the full game tree.

The situation in my head is this:
For us to solve it, we have to be able to put him on a range, but we assume everyone is playing optimally. As no optimal player makes that move, there's no range we can attribute to that action, and therefore it can't be solved.

Please explain where my thought process is going wrong.


Top
 Profile  
 
PostPosted: Tue Jul 16, 2013 7:00 pm 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
In the full game tree there are sub-optimal actions, there are all actions in the full game tree. I doesn't mean your strategy will converge to use them all, though. ;)


Top
 Profile  
 
PostPosted: Tue Jul 16, 2013 7:03 pm 
Offline
Regular Member

Joined: Sun Mar 03, 2013 11:55 am
Posts: 64
I understand that, but if an action is never taken, how do we attribute a range to that action to be able to solve it?


Top
 Profile  
 
PostPosted: Tue Jul 16, 2013 7:09 pm 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
You still probe the action....


Top
 Profile  
 
PostPosted: Tue Jul 16, 2013 8:32 pm 
Offline
Regular Member

Joined: Sun Mar 03, 2013 11:55 am
Posts: 64
But what would the range of that action be?


Top
 Profile  
 
PostPosted: Tue Jul 16, 2013 8:39 pm 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
What algorithm are you using, CS-CFRM?


Top
 Profile  
 
PostPosted: Tue Jul 16, 2013 8:48 pm 
Offline
Regular Member

Joined: Sun Mar 03, 2013 11:55 am
Posts: 64
This is purely hypothetical at the mo. I don't understand how we can solve a node that has no range associated with it.


Top
 Profile  
 
PostPosted: Tue Jul 16, 2013 9:32 pm 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
I can only answer pragmatically -- look at probing CFRM.

If s(i) = 0 then there is no strategy update, but you still traverse the tree and update child nodes.


Top
 Profile  
 
PostPosted: Tue Jul 16, 2013 9:52 pm 
Offline
Veteran Member

Joined: Mon Mar 04, 2013 9:40 pm
Posts: 269
why is that difficult to solve? You assume the worst until told otherwise as an unknown..so your range is something like AA-TT, AQ, AK

Oh..nevermind..if they are playing optimally how to do you solve it. I keep forgetting you guys are going for the NE which is way way over my head...


Top
 Profile  
 
PostPosted: Wed Jul 17, 2013 8:49 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Maybe you could get an answer to your question by using amax's cfrm code to solve one card poker.


Top
 Profile  
 
PostPosted: Wed Jul 17, 2013 9:30 am 
Offline
Regular Member

Joined: Sun Mar 03, 2013 11:55 am
Posts: 64
I've got a 1 card poker solver that I wrote. But in 1 card poker there are no actions that are never taken. Every node therefore has an associated range and can be solved.

I'm working 'outside the box' as they say here. Designing a new method of machine learning that specifically aims to (pseudo) solve complex high variance games with massive game trees. But I can't figure out in my head how to deal with this problem.

My eventual aim is to produce a decent 6 Max optimal bot.


Top
 Profile  
 
PostPosted: Wed Jul 17, 2013 11:10 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
OneDayItllWork wrote:

But in 1 card poker there are no actions that are never taken. Every node therefore has an associated range and can be solved.


I think http://www.cs.cmu.edu/~ggordon/poker/ disagrees. player 1 never bets on the first round if he holds a 6

- Or do you mean actions that are never taken regardless of holding?
- Could you force your algorithm to take all actions at least once at the beginning of training?


Top
 Profile  
 
PostPosted: Wed Jul 17, 2013 11:38 am 
Offline
Regular Member

Joined: Sun Mar 03, 2013 11:55 am
Posts: 64
spears wrote:
Or do you mean actions that are never taken regardless of holding??

This

If I was to force it to take an action, that still wouldn't be a realistic range for that action, so that node still wouldn't be solvable.

The pseudo optimal strategy needs to perform the action for us to have a pseudo optimal range on that node. If that action is never taken by the strategy, that node can't possibly have a range.


Top
 Profile  
 
PostPosted: Wed Jul 17, 2013 12:00 pm 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
OneDayItllWork wrote:
If I was to force it to take an action, that still wouldn't be a realistic range for that action, so that node still wouldn't be solvable.

The pseudo optimal strategy needs to perform the action for us to have a pseudo optimal range on that node. If that action is never taken by the strategy, that node can't possibly have a range.


I think CFRM and Fictitious Play both force actions to be taken at least once in the way the action probabilities are initialised. As the algorithm runs the probability of unrealistic actions drops quickly, but the response action to the unrealistic action is still in place. If your algorithm is significantly different to CFRM or Fictitious Play then I can see you might have a problem, but without knowing more about the algorithm it is difficult to comment.

Anyway, another suggestion: just use the response of a "nearby" node


Top
 Profile  
 
PostPosted: Wed Jul 17, 2013 2:10 pm 
Offline
Regular Member

Joined: Sun Mar 03, 2013 11:55 am
Posts: 64
Ok, so let's say an action is forced, and Mr_X shoves 32o UTG, then everyone folds to us on the BB. Our optimal response to that is to call with everything. If that's the only time the action is forced, then our response to call with everything will remain in place. While their initial move was definitely suboptimal, so it our response, and it'll never be updated.

Would this problem also occur with fictitious play / CFRM?


Top
 Profile  
 
PostPosted: Wed Jul 17, 2013 2:49 pm 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
In CFRM and FP the action probabilities don't jump from their initial values to 0 or 1 in one iteration so I don't think what you describe actually happens. There has to be some advantage to such a slow method :D


Top
 Profile  
 
PostPosted: Wed Jul 17, 2013 2:55 pm 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Quote:
The theory is that we are unbeatable, so if someone plays our strategy, it's a draw, if not, they lose.

I don't know if this is relevant to the discussion, but this isn't quite true. There are lots of strategies that will draw with a NE strategy.

Also, I don't know if having more than 2 players affects things either.


Top
 Profile  
 
PostPosted: Wed Jul 17, 2013 3:21 pm 
Offline
Regular Member

Joined: Sun Mar 03, 2013 11:55 am
Posts: 64
spears wrote:
In CFRM and FP the action probabilities don't jump from their initial values to 0 or 1 in one iteration so I don't think what you describe actually happens. There has to be some advantage to such a slow method :D

Either way, we're still not going to know what the correct response is to a shove UTG.


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 41 posts ]  Go to page 1, 2, 3  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Powered by phpBB® Forum Software © phpBB Group