Thanks for the responses. I'll hopefully be able to upload the paper within the next month.
The agent doesn't perform well in PA. I think an important factor in its current performance is its preflop strategy, which is quite weak (I focused mostly on postflop play and had to settle for a quick preflop strategy in the end).
I made a 6-max table with 5 opponents covering a range of playing styles: LAG, TAG, LAP, TAP and 'strong'. The opponents were based on Xenbot in PA.
I've uploaded a table with the results of around 20k hands per agent.
- GEN uses a generic opponent model for each opponent
- 4k assigns each player to one of four opponent models based on clustered data
- 9k assigns each player to one of nine opponent models based on clustered data
- 4kU and 9kU are identical to 4k and 9k, except that every 500 hands they train a unique opponent model for each player and use it instead of the clustered model.
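
To make the difference between the variants concrete, here is a rough Python sketch of the model-selection logic described in the list above. It is not the agent's actual code: the class name `PlayerTracker`, the two-number stats vector, the centroid values, and the nearest-centroid rule are all assumptions for illustration, and the "training" step is only a placeholder counter. The 500-hand interval is the only number taken from the setup above.

```python
import math
from dataclasses import dataclass

RETRAIN_INTERVAL = 500  # a unique model is (re)trained every 500 hands


@dataclass
class PlayerTracker:
    """Picks which opponent model to use for one player (hypothetical helper)."""
    centroids: list           # one stats tuple per cluster: 4 for 4k, 9 for 9k
    use_unique: bool = False  # True for the 4kU / 9kU variants
    hands_seen: int = 0
    unique_version: int = 0   # 0 means no unique model has been trained yet

    def nearest_cluster(self, stats):
        # Assumed assignment rule: closest centroid by Euclidean distance.
        return min(range(len(self.centroids)),
                   key=lambda i: math.dist(stats, self.centroids[i]))

    def observe_hand(self):
        self.hands_seen += 1
        if self.use_unique and self.hands_seen % RETRAIN_INTERVAL == 0:
            # Placeholder: real code would train a model on this player's hands.
            self.unique_version += 1

    def active_model(self, stats):
        # Once a unique model exists, it replaces the clustered model.
        if self.use_unique and self.unique_version > 0:
            return f"unique-v{self.unique_version}"
        return f"cluster-{self.nearest_cluster(stats)}"


# Made-up centroids over two illustrative stats per player.
centroids_4k = [(0.2, 0.1), (0.5, 0.4), (0.2, 0.6), (0.6, 0.1)]
tracker = PlayerTracker(centroids_4k, use_unique=True)
print(tracker.active_model((0.55, 0.35)))
```

In this sketch, GEN would correspond to a single shared model with no per-player assignment, and 4k/9k to `use_unique=False` with four or nine centroids.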
As the table shows, the agent loses in all cases. However, I highlight the reasons for this in my discussion, and I have a few simple ideas that should drastically improve its performance.