Why is RL Different than AI/ML Planning? at 3:26
Exploration starts at 19:27
Evaluating an RL algorithm at 40:18

In the slide at 24:18, the last term is the square root of a fraction whose numerator is (e.g. in the context of a game) "twice the natural logarithm of the number of actions taken since the beginning of the game" and whose denominator is "the number of times this particular action has been chosen this game". It's meant to give rarely chosen actions a bonus that makes them more likely to be chosen. I'm pretty sure that's the correct understanding; does anyone care to disagree?
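For what it's worth, that reading matches the standard UCB1 rule: score(a) = Q(a) + sqrt(2 ln t / n_a). Here t is the total number of choices made so far and n_a is the number of times action a was chosen. Below is a minimal sketch, assuming the slide shows plain UCB1. The names `ucb1_bonus` and `select_action` are hypothetical, Q(a) is taken to be the mean reward observed for each action, and t is computed as the sum of all per-action counts. Note that UCB1 picks the highest score deterministically, so the "boost" is added to the action's score rather than to a probability.

```python
import math


def ucb1_bonus(total_plays: int, action_plays: int) -> float:
    """Exploration term sqrt(2 * ln(t) / n_a) from UCB1."""
    return math.sqrt(2.0 * math.log(total_plays) / action_plays)


def select_action(values: list[float], counts: list[int]) -> int:
    """Return the index of the action with the highest UCB1 score.

    values[a] is assumed to be the mean reward seen so far for action a,
    and counts[a] the number of times action a has been chosen.
    """
    # The bonus is undefined when n_a == 0, so try every action once first.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    t = sum(counts)  # total number of choices made so far
    scores = [v + ucb1_bonus(t, n) for v, n in zip(values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


# The rarely chosen action (index 1) wins despite its lower mean reward,
# because its exploration bonus is much larger.
print(select_action([0.6, 0.5], [90, 10]))
```

As n_a grows the bonus shrinks, so an action only keeps getting picked if its observed reward holds up. The ln t in the numerator grows slowly, so neglected actions get revisited from time to time.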


