Faster sorting algorithms discovered using deep reinforcement learning – Nature
Background

AlphaZero

AlphaZero [33] is a reinforcement learning (RL) algorithm that leverages Monte Carlo tree search (MCTS) as a policy improvement operator. It consists of (1) a representation network f^rep that outputs a latent representation h_t of the state S_t; and (2) a prediction network…
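The two-network structure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `ToyAgent`, the layer sizes, the random linear layers, and the helper `improved_policy` are all hypothetical. The excerpt is truncated before it describes the prediction network, so the policy and value outputs here follow the general AlphaZero design rather than this text. The last function shows one common way search acts as a policy improvement operator: MCTS visit counts are normalized into a policy target.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Random weight matrix (list of rows) for a toy linear layer."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """Matrix-vector product without a bias term."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyAgent:
    """Hypothetical stand-in for the representation and prediction networks."""

    def __init__(self, state_dim, latent_dim, num_actions, seed=0):
        rng = random.Random(seed)
        self.w_rep = make_linear(state_dim, latent_dim, rng)
        self.w_policy = make_linear(latent_dim, num_actions, rng)
        self.w_value = make_linear(latent_dim, 1, rng)

    def represent(self, state):
        """Representation network: state S_t -> latent h_t."""
        return [math.tanh(z) for z in linear(self.w_rep, state)]

    def predict(self, latent):
        """Prediction network (assumed outputs): h_t -> (policy, value)."""
        policy = softmax(linear(self.w_policy, latent))
        value = math.tanh(linear(self.w_value, latent)[0])
        return policy, value


def improved_policy(visit_counts):
    """Normalize MCTS visit counts into a policy target (temperature 1)."""
    total = sum(visit_counts)
    return [c / total for c in visit_counts]


agent = ToyAgent(state_dim=4, latent_dim=8, num_actions=3)
h_t = agent.represent([1.0, 0.0, 0.5, -0.5])
prior, value = agent.predict(h_t)
target = improved_policy([10, 30, 60])  # hypothetical visit counts from a search
```

In training, a target like this would typically replace the raw network prior as the supervision signal for the policy head, which is the sense in which search improves the policy.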