There is a rich literature on how to improve MCTS beyond the different confidence bound values presented in my previous blog post on MCTS Confidence Bounds.


## Parallelisation

A number of parallelisation models have been suggested for MCTS.

### Ensemble UCT

Ensemble UCT [6] runs multiple independent instances of the relevant UCT search and combines the results into a single root policy. This means the tree statistics beyond the root are never shared between the instances, which makes the approach very similar to root parallelisation [].

Fern and Lewis [6] studied different ways of combining the root statistics, including bagging-inspired voting mechanisms. They concluded that a sum weighted by the number of visits consistently performed among the top methods, while other models sometimes performed much worse.
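The visit-weighted combination can be sketched as follows. This is a minimal illustration, not Fern and Lewis's code; the input format, mapping each action to a `(visits, mean_reward)` pair per instance, is an assumption.

```python
def combine_root_statistics(instances):
    """Combine per-action root statistics from independently run UCT
    instances into a single decision via a visit-weighted sum."""
    totals = {}  # action -> [combined visits, visit-weighted reward sum]
    for root_stats in instances:
        for action, (visits, mean_reward) in root_stats.items():
            entry = totals.setdefault(action, [0, 0.0])
            entry[0] += visits
            entry[1] += visits * mean_reward
    # Pick the action whose combined mean reward is highest; instances
    # that visited an action more often get a proportionally larger say.
    return max(totals, key=lambda a: totals[a][1] / totals[a][0])
```

Weighting by visits means a confident instance (many visits, hence a low-variance estimate) dominates an instance that barely explored the same action.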

### AMAF - RAVE

## Single Player Games

Some work discusses the option of discounting the reward value, using a discount factor, $\gamma$, during a backup. This is considered important in two-player games, where the opponent may be unpredictable [2], but less so for single-player games, where action selection and state transitions are more deterministic.
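A discounted backup can be sketched as below. This is a generic illustration rather than the exact procedure from [2]; the `Node` fields are assumptions.

```python
class Node:
    """Minimal tree node; fields are illustrative."""
    def __init__(self):
        self.visits = 0
        self.total_reward = 0.0

def backup(path, reward, gamma=0.95):
    """Propagate a simulation reward along the root-to-leaf path,
    discounting by gamma at each step up the tree so that nodes closer
    to the simulated outcome weigh it more heavily."""
    for node in reversed(path):  # leaf first, then towards the root
        node.visits += 1
        node.total_reward += reward
        reward *= gamma  # discount before crediting the parent
```

With $\gamma < 1$ the root receives a weaker signal from a distant outcome than the leaf does, which favours rewards that are reachable quickly.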

The more deterministic environment in single-player games also makes it more attractive to choose a child based on the *maximum simulation reward* rather than the mean reward. This requires each node to record the maximum reward as well as the mean. In CADIAPLAYER [2] the mean is still used during simulation, but the maximum is used for the final selection of an action to play.
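The bookkeeping this requires is small, as the sketch below shows; the class and method names are illustrative, not taken from CADIAPLAYER.

```python
class Node:
    """Search node recording both the mean and the maximum
    simulation reward, as needed for max-based final selection."""
    def __init__(self):
        self.visits = 0
        self.total_reward = 0.0
        self.max_reward = float('-inf')
        self.children = {}  # action -> Node

    def update(self, reward):
        self.visits += 1
        self.total_reward += reward
        self.max_reward = max(self.max_reward, reward)

    def mean_reward(self):
        return self.total_reward / self.visits

def final_action(root):
    # The mean still guides in-tree selection during search; the move
    # actually played is the child with the best single simulation outcome.
    return max(root.children, key=lambda a: root.children[a].max_reward)
```

In a deterministic single-player game the best observed simulation can be replayed exactly, so its maximum reward is achievable, whereas the mean mixes in weaker lines that need never be played.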

The CADIAPLAYER also adds the complete search path to the tree, rather than a single-node extension, for searches that lead to optimal solutions.

## Recursive Searches

Instead of performing a large number of simulations and selecting the final action based on the observed statistics, algorithms such as *Nested Monte-Carlo Search* (NMCS) [9] launch further searches starting from the best node identified by the previous search. This concentrates the search effort around the most promising path, and on some problems it has been reported to beat plain MCTS.
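The nested scheme can be sketched as below, assuming a deterministic single-player environment object with `actions`, `step`, `is_terminal` and `score` methods (these names are my own, chosen for illustration):

```python
import random

def rollout(state, env):
    """Level-0 search: a plain random playout."""
    seq = []
    while not env.is_terminal(state):
        action = random.choice(env.actions(state))
        seq.append(action)
        state = env.step(state, action)
    return env.score(state), seq

def nested_search(state, level, env):
    """Level-n search: at every step, evaluate each action with a
    level n-1 search and follow the best sequence found so far."""
    if env.is_terminal(state):
        return env.score(state), []
    if level == 0:
        return rollout(state, env)
    best_score, best_seq, seq = float('-inf'), [], []
    while not env.is_terminal(state):
        for action in env.actions(state):
            score, sub = nested_search(env.step(state, action), level - 1, env)
            if score > best_score:
                best_score, best_seq = score, seq + [action] + sub
        action = best_seq[len(seq)]  # next move of the best sequence so far
        seq.append(action)
        state = env.step(state, action)
    return best_score, best_seq
```

Each extra level multiplies the cost by roughly the branching factor times the path length, so in practice the nesting depth stays small.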
Another recursive solution is *Nested Rollout Policy Adaptation* (NRPA) [10], which uses a policy to guide child selection in the otherwise typically random rollout phase. It also updates the given policy using a gradient step after each recursive call.

## Combining MCTS with Learning

MCTS can be effectively combined with RL algorithms that learn policies or value functions. Due to their ability to compress information and generalise across states and actions, such algorithms can provide global, long-term representations that complement the detailed, uncompressed, focused and short-term representation of MCTS.

A typical setup, used by AlphaZero [8], Expert Iteration [1] and Dyna-2 [7], is for MCTS to provide state, policy and/or reward samples for supervised training of long-term policies or value functions, while the learned policy and/or value functions provide heuristics to guide the action selection in MCTS.
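One iteration of this search-and-learn loop might look like the sketch below. This is a schematic common to the cited systems, not the code of any of them; `run_mcts`, `train` and the record format are assumptions.

```python
def training_iteration(game, net, run_mcts, train):
    """Play one self-play game with network-guided MCTS, then use the
    search results as supervised targets for the network."""
    records = []
    state = game.initial_state()
    while not game.is_terminal(state):
        # MCTS uses the network's policy/value as search heuristics and
        # returns a visit-count distribution over actions at the root.
        visit_policy = run_mcts(state, net)
        records.append((state, visit_policy))
        action = max(visit_policy, key=visit_policy.get)
        state = game.step(state, action)
    outcome = game.score(state)
    # Supervised targets: root visit distributions and the final outcome.
    train(net, [(s, pi, outcome) for s, pi in records])
    return outcome
```

The search thus acts as a policy-improvement operator: the network proposes, the tree search refines, and the refined statistics become the next round of training data.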

## References

[1] Thomas Anthony and David Barber (2017) Thinking Fast and Slow with Deep Learning and Tree Search, NIPS 2017, pp5360-5370.

[2] Yngvi Bjornsson and Hilmar Finnsson (2009) CADIAPLAYER: A Simulation-Based General Game Player.

[3] B. Bouzy and B. Helmstetter (2004) Advances in Computer Games, vol 135. Springer.

[4] B. Brügmann (1993) Monte Carlo Go, technical report.

[5] Remi Coulom (2006) Efficient selectivity and backup operators in Monte-Carlo tree search, in the proceedings of the 5th International Conference on Computers and Games, pp72-83.

[6] Alan Fern and Paul Lewis (2011) Ensemble Monte-Carlo planning: an empirical study, in the Proceedings of the Twenty-First International Conference on International Conference on Automated Planning and Scheduling, pp58-65.

[7] David Silver, Richard S. Sutton and Martin Müller (2008) Sample-based learning and search with permanent and transient memories, in the proceedings of the 25th international conference on Machine learning (ICML), New York, USA, pp968-975.

[8] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2017) Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, arXiv:1712.01815 [cs.AI].

[9] Tristan Cazenave (2009) Nested Monte-Carlo Search, in the Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), pp456-461.

[10] Christopher D. Rosin (2011) Nested Rollout Policy Adaptation for Monte Carlo Tree Search, in the Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), pp649-654.