UCTs
UCTs, or Upper Confidence bounds applied to Trees, are a family of algorithms used for decision making in sequential, stochastic environments, implemented as a selection strategy within Monte Carlo Tree Search (MCTS). The core idea is to treat each node as having an estimated value plus an uncertainty bound, and to select among a node’s children by balancing exploration and exploitation. In the common formulation, the child with the highest value of Q_i + c * sqrt(ln N / n_i) is chosen, where Q_i is the average reward of child i, n_i is the number of times that child has been visited, N is the total visits to the parent, and c is an exploration constant. After a simulation (playout) from a leaf, the result is propagated back up the tree, updating visit counts and value estimates along the path.
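The selection rule and the backup step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Node` class, its field names, the default c = sqrt(2), and the convention of giving unvisited children an infinite score are all assumptions made for the example. The backup adds the same reward at every node on the path, which fits a single-agent setting; two-player games usually flip the reward's sign or perspective at alternating levels.

```python
import math


class Node:
    """Hypothetical search-tree node; names are illustrative only."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0            # n_i for a child, N when acting as parent
        self.total_reward = 0.0    # sum of playout rewards seen through this node
        if parent is not None:
            parent.children.append(self)

    @property
    def mean_reward(self):
        # Q_i: average reward; defined as 0.0 before the first visit.
        return self.total_reward / self.visits if self.visits else 0.0


def uct_score(child, parent_visits, c=math.sqrt(2)):
    """Q_i + c * sqrt(ln N / n_i); unvisited children score +inf (assumed convention)."""
    if child.visits == 0:
        return math.inf
    return child.mean_reward + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(node, c=math.sqrt(2)):
    """Return the child of `node` with the highest UCT score."""
    return max(node.children, key=lambda ch: uct_score(ch, node.visits, c))


def backpropagate(leaf, reward):
    """Walk from `leaf` up to the root, updating visit counts and reward sums."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


if __name__ == "__main__":
    root = Node()
    a, b = Node(root), Node(root)
    root.visits = 12
    a.visits, a.total_reward = 10, 6.0   # Q = 0.6, visited often
    b.visits, b.total_reward = 2, 1.0    # Q = 0.5, visited rarely
    print(select_child(root) is b)        # exploration bonus favours b
    print(select_child(root, c=0.0) is a) # pure exploitation favours a
```

With c = 0 the rule reduces to greedy selection on Q_i, while larger values of c push the search toward rarely visited children.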
UCTs were introduced by Levente Kocsis and Csaba Szepesvári in 2006 as a principled way to apply
Variants and extensions of UCTs address practical challenges, such as large branching factors (progressive widening), non-stationary
Limitations include reliance on rollout quality, computational demands, and sensitivity to parameter choices. While powerful, UCTs