Monte Carlo Methods for Solving Reinforcement Learning Problems | by Oliver S | Sep, 2024

Dissecting “Reinforcement Learning” by Richard S. Sutton with Custom Python Implementations, Episode III

We continue our deep dive into Sutton’s great book about RL [1] and here focus on Monte Carlo (MC) methods. These are able to learn from experience alone, i.e., they do not require any kind of model of the environment, as is e.g. required by the Dynamic Programming (DP) methods we introduced in the previous post.
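To give a feel for what “learning from experience alone” looks like, here is a minimal sketch of first-visit MC prediction, the basic algorithm of Chapter 5. It is not the article’s own implementation; `sample_episode` is a hypothetical helper that rolls out one full episode under a policy and returns a list of (state, reward) pairs.

```python
from collections import defaultdict

def first_visit_mc_prediction(sample_episode, policy, num_episodes, gamma=1.0):
    """Estimate the state-value function v_pi from sampled episodes alone."""
    returns = defaultdict(list)  # state -> all first-visit returns seen so far
    V = defaultdict(float)       # state -> current value estimate

    for _ in range(num_episodes):
        episode = sample_episode(policy)  # list of (state, reward) pairs
        G = 0.0
        # Walk the episode backwards, accumulating the discounted return G
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            # First-visit check: only record G at the first occurrence of state
            if state not in [s for s, _ in episode[:t]]:
                returns[state].append(G)
                V[state] = sum(returns[state]) / len(returns[state])
    return V
```

Note that nothing in this loop touches transition probabilities: all knowledge of the environment enters through the sampled episodes.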

This is extremely tempting, as often the model is not known, or it is hard to model the transition probabilities. Consider the game of Blackjack: although we fully understand the game and the rules, solving it via DP methods would be very tedious. We would have to compute all kinds of probabilities, e.g., given the currently played cards, how likely is a “blackjack”, how likely is it that another seven is dealt … Via MC methods, we don’t have to deal with any of this: we simply play and learn from experience.
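As one way to make this concrete (an assumption on my part, not the article’s code): the `gymnasium` package ships a `Blackjack-v1` environment, so the `sample_episode` helper assumed above could be written roughly as follows.

```python
import gymnasium as gym

env = gym.make("Blackjack-v1", sab=True)  # sab=True follows Sutton & Barto's rules

def sample_episode(policy):
    """Play one hand of Blackjack; return a list of (state, reward) pairs."""
    episode = []
    state, _ = env.reset()
    done = False
    while not done:
        action = policy(state)  # 0 = stick, 1 = hit
        next_state, reward, terminated, truncated, _ = env.step(action)
        episode.append((state, reward))
        state = next_state
        done = terminated or truncated
    return episode

# The simple policy from Sutton's Blackjack example: stick only on 20 or 21
# (state[0] is the player's current card sum)
policy = lambda state: 0 if state[0] >= 20 else 1

V = first_visit_mc_prediction(sample_episode, policy, num_episodes=100_000)
```

No card-counting probabilities appear anywhere; the dealer, the deck, and all randomness stay hidden inside the environment.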

Photo by Jannis Lucas on Unsplash

Due to not using a model, MC methods are unbiased. They are conceptually simple and easy to understand, but exhibit high variance and cannot update estimates from other estimates in an iterative fashion (bootstrapping).
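To spell out this contrast with the standard textbook formulas (not taken from this excerpt): an MC update moves the estimate toward the full sampled return,

$$V(S_t) \leftarrow V(S_t) + \alpha \bigl[ G_t - V(S_t) \bigr], \qquad G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots$$

whereas a bootstrapping method such as TD(0) replaces most of $G_t$ with its own estimate of the next state’s value:

$$V(S_t) \leftarrow V(S_t) + \alpha \bigl[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \bigr].$$

$G_t$ is an unbiased sample of $v_\pi(S_t)$, but it sums many random rewards, which is exactly where the high variance comes from.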

As mentioned, here we will introduce these methods following Chapter 5 of Sutton’s book…
