Poker-AI.org

Poker AI and Botting Discussion Forum

All times are UTC




PostPosted: Sun Mar 17, 2013 4:36 pm 
Junior Member

Joined: Fri Mar 15, 2013 11:55 am
Posts: 24
I was writing this post as a response in another thread, and then suddenly it grew so big I thought it might be better to open a separate thread.

I've often questioned the usefulness of mathematically driven models that try to find the perfect equilibrium of a strategy, especially in No Limit poker. There are several reasons for this. One is that I like an exploitive style where you take advantage of weaknesses in your opponent. The second is that I doubt you can find the perfect equilibrium, as the game is just too dynamic.

I offload the burden of finding the optimal strategy - which can be either exploitive or balanced, depending on the opponent - to the range computation. In most situations, I have a pretty good idea of what ranges should look like based on what we know. I think the strategy is quite close to being non-exploitable (and thus the Nash equilibrium) in the situations it wants/needs to be.

A good example would be pre-flop 3-betting behavior. You can either approach it mathematically, or empirically through statistics & experience. Of course, the latter approach results in faster computations and it's also easier to adjust the behavior to get maximum expected value.

In that context, I want to say a few words on ranges that are "theoretically" operated by some mathematically driven models.

For instance, there is no separate semi-bluff range for an advanced player. Of course, he knows the concept, but a bluff is a bluff; you either raise because you want someone to fold, or because you want him to call. In the case of a semi-bluff, the additional equity just improves your odds in case you get a call instead of the fold you wanted. However, all your semi-bluffs will be in your bluff range as far as the reasoning is concerned. You will choose hands to bluff that are too weak to call/raise, but should still be played according to your overall range.

The same is true for thin value. You either have a value bet, or you don't. Thin value is nothing but the diffuse portion between the checking/calling range and the betting/raising range, where you want to vary your approach randomly to a certain extent, based on many factors. What good players call thin value usually is a situation where they know they can get some more value because of certain reads - it's not a mathematical concept, which makes it seem strange that you would use it in such a mathematically driven approach.

The whole idea behind this is of course the polarization of your range. The more hands you raise from the bottom of your range, the more you will have to raise from the top of your range. If you want to apply a non-polarized strategy, you will have to significantly decrease your aggression and narrow down your range if you're playing against a good player or a balanced strategy.

Either way, there are several points of equilibrium depending on your degree of polarization, and that's where the whole issue seems to hinge in my opinion. The factors for the amount of aggression / polarization are so complex that I doubt you can take all of them into account in an "exhaustive" model. How do you want to find the equilibrium?

Again, take pre-flop 3-betting as an example. There isn't a non-exploitable 3-betting range. A good range is usually around 30%, but this will vary widely depending on your opponent. The response to calls and 4-bets varies even more, depending on your opponent's approach. You either 3-bet very rarely - losing a lot of EV - or you have to adjust and play exploitive and exploitable to an extent.

In the end, I just believe that an unexploitable style can never be a winning one, so what is the point of finding this so much sought-after equilibrium?

I'm looking forward very much to elaborations by more experienced botters / researchers than me to shed some light on what I know to be a very naive view on the domain.


PostPosted: Sun Mar 17, 2013 5:46 pm 
Junior Member

Joined: Tue Mar 05, 2013 1:27 pm
Posts: 16
Heuristics wrote:
I doubt you can find the perfect equilibrium, as the game is just too dynamic.


Do you think that the perfect equilibrium is unique? In Kuhn poker, for example, there are infinitely many equilibria. Furthermore, what do you mean by dynamic? If you mean your opponent can change strategy, that doesn't affect the way you compute an equilibrium. So it cannot be a reason for not finding one.
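To make the Kuhn poker point concrete, here is a small sketch. It assumes the usual three-card game (J, Q, K, ante 1, bet 1) and the commonly cited one-parameter family of first-player strategies: bet the J with probability alpha, bet the K with probability 3*alpha, never bet the Q, and call with the Q with probability alpha + 1/3 after checking. The function names are mine. The code checks, by brute force over all pure second-player strategies, that every alpha between 0 and 1/3 guarantees the same worst-case value.

```python
from fractions import Fraction
from itertools import permutations, product

def p1_value(b1, c1, b2, c2):
    """Expected payoff for player 1 in Kuhn poker (ante 1, bet 1).

    b1[c]: P1 bets with card c.       c1[c]: P1 calls after check-bet.
    b2[c]: P2 bets after a check.     c2[c]: P2 calls a bet.
    Cards are 0=J, 1=Q, 2=K; entries are probabilities.
    """
    total = Fraction(0)
    for card1, card2 in permutations(range(3), 2):
        win = 1 if card1 > card2 else -1
        # P1 bets: P2 calls (showdown for 2) or folds (P1 wins 1).
        after_bet = c2[card2] * 2 * win + (1 - c2[card2]) * 1
        # P1 checks, P2 bets: P1 calls (showdown for 2) or folds (loses 1).
        facing_bet = c1[card1] * 2 * win + (1 - c1[card1]) * -1
        # P1 checks: P2 bets or checks behind (showdown for 1).
        after_check = b2[card2] * facing_bet + (1 - b2[card2]) * win
        total += b1[card1] * after_bet + (1 - b1[card1]) * after_check
    return total / 6

def worst_case_for_p1(alpha):
    """Value P1 is guaranteed against any pure P2 strategy."""
    b1 = [alpha, 0, 3 * alpha]
    c1 = [0, alpha + Fraction(1, 3), 1]
    return min(
        p1_value(b1, c1, list(bits[:3]), list(bits[3:]))
        for bits in product([0, 1], repeat=6)
    )

for alpha in (Fraction(0), Fraction(1, 6), Fraction(1, 3)):
    print(alpha, worst_case_for_p1(alpha))
```

Checking pure strategies is enough here because a best response can always be chosen pure. Every alpha in that interval gives a different strategy with the same guarantee, which is why asking for "the" equilibrium is already the wrong question.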

Heuristics wrote:
I offload the burden of finding the optimal strategy - which can be either exploitive or balanced, depending on the opponent - to the range computation. In most situations, I have a pretty good idea of what ranges should look like based on what we know. I think the strategy is quite close to being non-exploitable (and thus the Nash equilibrium) in the situations it wants/needs to be.


Which range are you talking about? The range of cards your opponent may hold? Or the range of cards you play? Anyway, if you are modeling your opponent and making decisions based on this model, your strategy is fully exploitive but far away from equilibrium, and so exploitable (unless your opponent plays an equilibrium strategy).

Heuristics wrote:
A good example would be pre-flop 3-betting behavior. You can either approach it mathematically, or empirically through statistics & experience. Of course, the latter approach results in faster computations and it's also easier to adjust the behavior to get maximum expected value.


The problem is that your statistics and experience are opponent-dependent. So what you compute this way is not an equilibrium but an exploitive agent.

Heuristics wrote:
The whole idea behind this is of course the polarization of your range. The more hands you raise from the bottom of your range, the more you will have to raise from the top of your range. If you want to apply a non-polarized strategy, you will have to significantly decrease your aggression and narrow down your range if you're playing against a good player or a balanced strategy.


Raising really weak hands to bluff doesn't mean you should raise almost all your hands.

Heuristics wrote:
In the end, I just believe that an unexploitable style can never be a winning one, so what is the point of finding this so much sought-after equilibrium?


An unexploitable style can never be a losing one, so that is the point of finding it ;)
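A toy illustration of that trade-off, not poker: rock-paper-scissors stands in for the game, and the strategies and the rock-heavy opponent are made up for the example. The sketch compares what an equilibrium strategy and a maximally exploitive one earn against a biased opponent, and what each can lose in the worst case.

```python
# Row player's payoff; rows and columns are rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def expected_value(ours, theirs):
    """Our average payoff when both sides play the given mixed strategies."""
    return sum(ours[i] * theirs[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))

def worst_case_value(ours):
    """Our payoff against the opponent's best pure counter-strategy."""
    return min(sum(ours[i] * PAYOFF[i][j] for i in range(3))
               for j in range(3))

equilibrium = [1 / 3, 1 / 3, 1 / 3]
exploitive = [0.0, 1.0, 0.0]        # always paper: best response to a rock-heavy player
rock_heavy = [0.5, 0.25, 0.25]      # hypothetical biased opponent

for name, strategy in (("equilibrium", equilibrium), ("exploitive", exploitive)):
    print(name,
          "vs rock-heavy:", expected_value(strategy, rock_heavy),
          "worst case:", worst_case_value(strategy))
```

The equilibrium strategy never loses but also wins nothing from this opponent, because in rock-paper-scissors no mistake is ever "dominated". In poker, opponents do make mistakes that lose against the equilibrium, which is where its winnings come from.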


PostPosted: Sun Mar 17, 2013 7:08 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
Most of your assumptions are wrong:
1. There is an equilibrium (at least w/o considering rake) - it's proven. The question is "only" how to find it or a near-optimal solution.
2. Yes, exploiting can make you more $$$ but opens you up to exploitation yourself - avoiding that is the beauty of GTO solutions.
3. Assuming that very good players don't have a preflop semi-bluff 3-bet range is just wrong. Why do we bluff with hands like 97s instead of 72o? Because of the equity when called...
4. Thin value bets are just value bets, but when you are called you will lose the pot more often than after a regular value bet (you are still ahead of his calling range more than 50% of the time, otherwise it's not a value bet).
5. GTO-finding algorithms automatically create balanced solutions, i.e., they raise in the right proportion with hands that can continue and those that can't.
6. GTO bots can win because people make mistakes. Even though you don't exploit them to the max, their mistakes are your value. The Limit bot that beat a lot of pro Limit players was GTO, btw...
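The 50% threshold in point 4 can be written out as a one-line calculation. This is a simplified sketch under my own assumptions: a river bet with no raises possible, the hands that fold are hands we would have beaten at showdown anyway, and checking simply goes to showdown. The function name and parameters are made up for the example.

```python
def bet_gain_over_check(bet, call_freq, win_when_called):
    """Extra EV of betting instead of checking back on the river.

    Assumes villain never raises and only folds hands we already beat,
    so the pot itself cancels out and only the bet is at stake:
    we win `bet` when called and ahead, lose `bet` when called and behind.
    """
    return call_freq * bet * (2 * win_when_called - 1)

# Thin value: ahead of the calling range 55% of the time.
print(bet_gain_over_check(bet=1.0, call_freq=0.5, win_when_called=0.55))
# Exactly break-even at 50%, losing below that.
print(bet_gain_over_check(bet=1.0, call_freq=0.5, win_when_called=0.50))
print(bet_gain_over_check(bet=1.0, call_freq=0.5, win_when_called=0.45))
```

A thin value bet is the case where `win_when_called` is only slightly above 0.5: the gain is positive but small, and the variance is larger than for a regular value bet.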


PostPosted: Sun Mar 17, 2013 9:54 pm 
Junior Member

Joined: Fri Mar 15, 2013 11:55 am
Posts: 24
proud2bBot wrote:
Most of your assumptions are wrong:
1. There is an equilibrium (at least w/o considering rake) - it's proven. The question is "only" how to find it or a near-optimal solution.


Yes, but that search is a very theoretical question at this point, and I just don't see the benefit of it applied in practice.

Quote:
2. Yes, exploiting can make you more $$$ but opens you up to exploitation yourself - avoiding that is the beauty of GTO solutions.


I can see that benefit, but I don't think players at the limits where bots are usually run adjust anywhere near well enough to merit it. Besides the fact that you probably won't find a true GTO strategy, you will miss out on many opportunities to increase your EV.

Quote:
3. Assuming that very good players don't have a preflop semi-bluff 3-bet range is just wrong. Why do we bluff with hands like 97s instead of 72o? Because of the equity when called...


I didn't say they don't have a semi-bluff range, I said that in their reasoning, a semi-bluff is exactly the same as a bluff. When you determine your bluff range, you look at all the hands you can neither raise for value nor call with, and then you take the top portion of them, up to how much you want to bluff in that spot. The ranking depends on two factors: card removal effect and implied odds / playability. This means that good and semi-good draws will almost always go into the bluff range as semi-bluffs, but they are NOT a theoretically separate consideration.
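Roughly, in code, the selection I mean looks like the sketch below. The hand list, the 0..1 scores for card removal and playability, and the equal weighting are all invented for illustration; a real bot would derive them from equity calculations and its own range model.

```python
from dataclasses import dataclass

@dataclass
class Hand:
    name: str
    combos: int         # 4 for suited hands, 12 for offsuit, 6 for pairs
    action: str         # "value", "call" or "fold" before bluffs are assigned
    blocker: float      # hypothetical card-removal score, 0..1
    playability: float  # hypothetical implied-odds / playability score, 0..1

def pick_bluffs(hands, target_combos, blocker_weight=0.5):
    """Fill the bluff range from the best hands that can neither value-raise nor call."""
    candidates = [h for h in hands if h.action == "fold"]
    candidates.sort(
        key=lambda h: blocker_weight * h.blocker
        + (1 - blocker_weight) * h.playability,
        reverse=True,
    )
    chosen, used = [], 0
    for hand in candidates:
        if used + hand.combos > target_combos:
            break  # bluffing budget for this spot is exhausted
        chosen.append(hand.name)
        used += hand.combos
    return chosen

hands = [
    Hand("AA", 6, "value", 1.0, 1.0),
    Hand("AQs", 4, "call", 0.7, 0.8),
    Hand("A5s", 4, "fold", 0.8, 0.6),
    Hand("97s", 4, "fold", 0.0, 0.8),
    Hand("K9o", 12, "fold", 0.4, 0.2),
    Hand("72o", 12, "fold", 0.0, 0.05),
]
print(pick_bluffs(hands, target_combos=8))
```

Note that 97s ends up in the bluff range because of its playability score, without any separate "semi-bluff" category in the logic.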

Quote:
4. Thin value bets are just value bets, but when you are called you will lose the pot more often than after a regular value bet (you are still ahead of his calling range more than 50% of the time, otherwise it's not a value bet).


Quote:
5. GTO-finding algorithms automatically create balanced solutions, i.e., they raise in the right proportion with hands that can continue and those that can't.
6. GTO bots can win because people make mistakes. Even though you don't exploit them to the max, their mistakes are your value. The Limit bot that beat a lot of pro Limit players was GTO, btw...


This is probably true, but Limit is a completely different game - it's close to being completely solved in the game-theoretic sense, while No Limit is still very far away from that. Considering rake, the perfect GTO strategy would actually be losing money at a very high rate.

GTO might be a good consideration in some spots, where you want to avoid making big mistakes, but in almost all standard spots, I am convinced that an exploitive strategy serves the goal of making profit better.


PostPosted: Mon Mar 18, 2013 6:50 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
I'm not saying that exploitive strategies are bad or anything. The crucial part of them is the opponent model, and it's very hard to get a solid picture of it with only a few observations available. For example, you are playing against villain in a cash game, you are SB and he is BB. It's folded to you and you have 74s - what do you do? GTO will tell you what's optimal here. An exploitive strategy would need to figure out how villain reacts to a limp or a raise, and with which hands. Say we have 100 hands in this spot and he folded 80% of the time: your exploitive bot would tell you to open-raise any two cards. But if you do that, he will likely adjust after you steal his blinds a couple of times. So for the next 20 hands he 3-bets any two and you have to fold all your crap, losing 2bb each time (assuming you min-raise). After 120 hands he has still folded 80 times, which is about 67%, so to the bot it still seems profitable to steal with any two... This simple example shows how difficult it is to get a good model without the risk of getting owned by opening yourself up to heavy exploitation.
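The arithmetic of that example, as a small sketch. The order of the first 100 hands and the 20-hand window are my own assumptions; the point is only how differently a lifetime average and a recent-window estimate read the same history.

```python
def cumulative_rate(events):
    """Fold frequency over the whole history (1 = fold, 0 = no fold)."""
    return sum(events) / len(events)

def recent_rate(events, window):
    """Fold frequency over only the last `window` observations."""
    tail = events[-window:]
    return sum(tail) / len(tail)

# 100 hands with 80 folds, then 20 hands where villain 3-bets every time.
history = [1] * 80 + [0] * 20 + [0] * 20

print("lifetime fold rate:", round(cumulative_rate(history), 3))
print("last 20 hands:     ", recent_rate(history, window=20))
```

The lifetime number still says "steal everything" while the recent window says villain never folds any more. A short window reacts quickly but is noisy with so few hands, which is exactly the modelling difficulty.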


PostPosted: Mon Mar 18, 2013 8:45 pm 
Junior Member

Joined: Fri Mar 15, 2013 11:55 am
Posts: 24
proud2bBot wrote:
I'm not saying that exploitive strategies are bad or anything. The crucial part of them is the opponent model, and it's very hard to get a solid picture of it with only a few observations available. For example, you are playing against villain in a cash game, you are SB and he is BB. It's folded to you and you have 74s - what do you do? GTO will tell you what's optimal here. An exploitive strategy would need to figure out how villain reacts to a limp or a raise, and with which hands. Say we have 100 hands in this spot and he folded 80% of the time: your exploitive bot would tell you to open-raise any two cards. But if you do that, he will likely adjust after you steal his blinds a couple of times. So for the next 20 hands he 3-bets any two and you have to fold all your crap, losing 2bb each time (assuming you min-raise). After 120 hands he has still folded 80 times, which is about 67%, so to the bot it still seems profitable to steal with any two... This simple example shows how difficult it is to get a good model without the risk of getting owned by opening yourself up to heavy exploitation.


Hey. Well, that's a good example of a poorly designed bot. A smarter bot would only vary within a range that's exploitive, but not too exploitable. Of course, you need experience for this.

Don't get me wrong. I want to hear more about GTO and botting, as I'm trying to learn here, and your arguments are very interesting indeed.

As I see it, a GTO approach would be a good baseline, from which you then deviate once you have a better opponent model. What I do, however, is look at a solid database (several million hands), cluster opponents into a few categories, discard extreme cases, and then assign each new opponent to one of those categories based on their basic stats.

It would seem to me that this approach would be more profitable than just being in the dark with the GTO until you identify ways to exploit?


PostPosted: Mon Mar 18, 2013 8:52 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
Well, that's one area where GTO and exploitive play get mixed. In the recent UofA paper, they use a range of different optimal strategies, e.g. a Nash one, one which is a best response to a calling station, and one which is a best response to a maniac. All these best responses are calculated like the GTO variant, but vs. a fixed strategy. Then in-game you decide how likely it is that the player fits each of the models in your repertoire and select the corresponding strategy.
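The in-game selection step could look something like this sketch. It is not the method from the paper, just a plain Bayesian update under my own assumptions: each opponent model is reduced to made-up fold/call/raise frequencies, observations are treated as independent, and all models start equally likely.

```python
import math

# Hypothetical action frequencies for each opponent model in the repertoire.
MODELS = {
    "balanced": {"fold": 0.5, "call": 0.3, "raise": 0.2},
    "calling_station": {"fold": 0.2, "call": 0.7, "raise": 0.1},
    "maniac": {"fold": 0.2, "call": 0.2, "raise": 0.6},
}

def posterior(observations, models=MODELS):
    """Probability of each model given the observed actions (uniform prior)."""
    log_post = {
        name: sum(math.log(freqs[action]) for action in observations)
        for name, freqs in models.items()
    }
    peak = max(log_post.values())  # subtract the max for numerical stability
    weights = {name: math.exp(v - peak) for name, v in log_post.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def select_model(observations, models=MODELS):
    """Pick the model whose pre-computed counter-strategy we should play."""
    beliefs = posterior(observations, models)
    return max(beliefs, key=beliefs.get)

seen = ["raise", "raise", "call", "raise"]
print(posterior(seen))
print(select_model(seen))
```

With no observations the posterior is uniform, so a sensible default in that case would be the Nash strategy rather than any of the best responses.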


PostPosted: Tue Mar 19, 2013 8:34 am 
Junior Member

Joined: Tue Mar 05, 2013 1:27 pm
Posts: 16
Agreed with proud2bBot! You should read some papers about game-theoretic counter-strategies. I recommend the papers in the following two sections.

Frequentist Best Response and Restricted Nash Response:
http://poker.cs.ualberta.ca/publications/NIPS07-rnash.pdf
http://poker.cs.ualberta.ca/publications/johanson.msc.pdf

Data Biased Response:
http://poker.cs.ualberta.ca/publications/AISTATS09.pdf


PostPosted: Tue Mar 19, 2013 4:40 pm 
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
My approach is to exclude as much "expert" knowledge as possible. As I don't have much expert knowledge, I don't have much of a choice. Generally speaking, removing expert knowledge is the favoured academic approach. Expert systems tend to be hard to maintain and the expert knowledge is often incomplete.

More mathematical approaches need not be confined to equilibrium calculation. As others have said before, there are mathematically based exploitative bots. A bot that calculates the best response to past opponent behaviour is vulnerable when the opponent changes his strategy. So the approach I'm taking is to calculate Nash Equilibria and Restricted Nash Responses. UofA show in one of their papers that playing an exploitative strategy just a little displaced from Nash gives you many of the benefits of a pure exploitative strategy and Nash at the same time. The alternative would be to create an agent that responds very quickly to changes in opponent strategy or anticipates them. I have no clue how to do that.
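For anyone who wants to see the Restricted Nash Response idea in miniature, here is a sketch on rock-paper-scissors rather than poker. The opponent is assumed to follow a fixed model with probability p and to play freely (i.e. adversarially) the rest of the time, and we look for our side of the equilibrium of that restricted game. I use plain regret-matching self-play here because it is short; the function names, the rock-heavy model, and the iteration count are my own choices, and a poker version would need CFR over an abstraction instead.

```python
def regret_matching(regrets):
    """Turn cumulative regrets into a strategy (uniform if none are positive)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [x / total for x in positive] if total > 0 else [1.0 / n] * n

# Row player's payoff; rows and columns are rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def restricted_nash_response(model, p, iterations=20000):
    """Approximate our strategy when the opponent plays `model` with
    probability p and an unrestricted strategy otherwise."""
    n = 3
    our_regret, opp_regret = [0.0] * n, [0.0] * n
    our_sum = [0.0] * n
    for _ in range(iterations):
        ours = regret_matching(our_regret)
        free = regret_matching(opp_regret)
        # Opponent's effective strategy: fixed model mixed with the free part.
        opp = [p * model[j] + (1 - p) * free[j] for j in range(n)]
        our_util = [sum(PAYOFF[i][j] * opp[j] for j in range(n)) for i in range(n)]
        opp_util = [-sum(PAYOFF[i][j] * ours[i] for i in range(n)) for j in range(n)]
        our_ev = sum(ours[i] * our_util[i] for i in range(n))
        opp_ev = sum(free[j] * opp_util[j] for j in range(n))
        for k in range(n):
            our_regret[k] += our_util[k] - our_ev
            opp_regret[k] += opp_util[k] - opp_ev
            our_sum[k] += ours[k]
    # The average strategy is what converges, not the last iterate.
    return [s / iterations for s in our_sum]

rock_heavy = [0.5, 0.25, 0.25]  # hypothetical opponent model
for p in (0.0, 0.8, 1.0):
    print(p, [round(x, 3) for x in restricted_nash_response(rock_heavy, p)])
```

At p = 0 this is just the equilibrium and at p = 1 it is the pure best response to the model; values in between trade exploitation against safety. In a game this small the free part of the opponent can cancel the model's bias completely when p is low, so the result only starts to leave the uniform strategy at larger p.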


PostPosted: Wed Mar 20, 2013 7:21 am 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
spears wrote:
So the approach I'm taking is to calculate Nash Equilibria and Restricted Nash Responses. UofA show in one of their papers that playing an exploitative strategy just a little displaced from Nash gives you many of the benefits of a pure exploitative strategy and Nash at the same time.

With ... your new CFRM modeling method? :D

spears wrote:
The alternative would be to create an agent that responds very quickly to changes in opponent strategy or anticipates them. I have no clue how to do that.

I'm not sure what you're implying by 'quickly' but the U of A just published a paper on using the Exp4 algorithm and 'clones' to adapt to opponents. Exp4 has a time-dependent decay.

I've thought about using machine learning to pick up on certain contexts, such as the opponent losing a big hand, or losing by a narrow margin, etc., as well as stack stability or game length. There are so many nuances that people can infer outside of strategy alone. So, instead of simply using the reward against an opponent in a given hand, with a given strategy, the aforementioned attributes would be added to the model, presented in a time-series fashion, and used to select an RNR/DBR.


PostPosted: Wed Mar 20, 2013 7:52 pm 
Junior Member

Joined: Fri Mar 15, 2013 11:55 am
Posts: 24
Thanks so much for all the input, I'll definitely grind my way through some of the papers.

As far as adjusting opponents are concerned:

As a poker player, modeling your opponent first requires you to have an overall game plan for most standard spots. This is what you get when using several opponent models that should cover the majority of opponents you will face.

However, on the next step of improving your game, you will usually look at specific hands and analyse them for deviations. These anomalies will be a finer level of adjustment to your opponent. They usually correspond to reads/notes that you have on your opponent.

While it is possible for very good players to vary between several types of play, almost all players will usually stick to their basic game, and adjustments will usually take place in the realm of those anomalies. Detecting them is rather straightforward, and you can detect both abrupt ones (i.e. he suddenly plays one hand completely differently from how he used to play it) and continuous ones (the frequencies in certain spots change).

Using the frequency and speed at which your opponent seems to adjust, you can then make an educated guess as to how quickly you should adjust your strategy, i.e. making drastic changes or just slightly adjusting your frequencies.

I know this sounds easier than it is, but it's doable if you use a decision tree with decision nodes that have various weights and a random factor to them.
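A minimal sketch of what I mean by a weighted decision node with a random factor. Everything here is a placeholder: the action names, the weights, and the `rate` parameter, which stands for how quickly we have decided to adjust to this opponent.

```python
import random

def adjust(weights, target, rate):
    """Move the action weights a fraction `rate` of the way toward `target`.

    rate close to 0 means slightly shifting frequencies,
    rate close to 1 means a drastic change of strategy.
    """
    return {a: (1 - rate) * weights[a] + rate * target[a] for a in weights}

def decide(weights, rng=random):
    """Sample one action from the node according to its weights."""
    actions = list(weights)
    return rng.choices(actions, weights=[weights[a] for a in actions], k=1)[0]

baseline = {"fold": 0.5, "call": 0.3, "raise": 0.2}
exploit = {"fold": 0.0, "call": 0.0, "raise": 1.0}  # e.g. villain folds too much

# Villain adjusts slowly, so we shift our frequencies only gradually.
node = adjust(baseline, exploit, rate=0.2)
print(node)
print(decide(node, random.Random(0)))
```

A full tree would chain such nodes, with `rate` itself driven by how often and how abruptly the opponent's frequencies have changed.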

