Poker-AI.org

Poker AI and Botting Discussion Forum
It is currently Mon Nov 13, 2023 5:27 pm

All times are UTC




Post new topic Reply to topic  [ 1 post ] 
Author Message
 Post subject: Regret Based Pruning
PostPosted: Sun Mar 06, 2016 2:24 am 
Offline
New Member

Joined: Sun Mar 06, 2016 1:10 am
Posts: 1
Hello,

I was wondering if anyone has been able to figure out how to implement regret based pruning as described in this paper:

http://www.cs.cmu.edu/~noamb/papers/15-NIPS-Regret-Based-TR.pdf

I understand the basic premise is to temporarily prune nodes the current player reaches with zero probability when the action leading to that node has negative regret. After not traversing the node for some predetermined number of iterations we then look at the average strategy the opponent would have made over those iterations & (I think) update regrets assuming we made a best response over all of the skipped iterations for that node. At this point it is starting to get pretty fuzzy for me and it doesn't help that I can't really understand all of the symbolic jargon in the paper.

If anyone is able to help me out with concrete examples of how this all works, especially if it could be shown in the context of the Java code I am already using, it would be hugely appreciated. In particular, how do we decide when to start skipping a node? (when the negative regret reaches a certain threshold?). How many iterations should a node be skipped? (I think cumulative negative regret / average change in regret, but how to calculate avg change?) Once we revisit a node, how do we calculate the best response to ensure proper regret updates for that node and all descendant nodes? How do we handle it when regret goes below zero for an action leading to a node closer to the root when we are already pruning one (or many) of its descendant nodes?

To give an idea of where I am at so far, I started with this:

http://modelai.gettysburg.edu/2013/cfr/cfr.pdf

Using the Kuhn Poker variant described within I was able to extend it to work for NLHU Holdem. I'm already pruning actions which the opponent reaches with zero probability and have implemented some very simple multi-threading for certain portions of the code. Still, with my limited (and pretty old) hardware I am only averaging approx 600 iterations per minute.

One more thing. I am aware of the thread on average strategy sampling and while it does seem similar to the regret based pruning described in this paper I don't think it is exactly the same. As well, that thread is over 2 1/2 years without an update and I'm thinking my post has higher visibility if I post here. I hope no one is offended by the new post.

Maybe if people weren't previously aware of the paper on Regret based Pruning someone brilliant will come along and show me the way!

Anyway, thanks for reading. :D


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 1 post ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Powered by phpBB® Forum Software © phpBB Group