Reinforcement Learning (RL) in real time paradigm

Michael Sherwin
Posts: 3024
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Reinforcement Learning (RL) in real time paradigm

Post by Michael Sherwin » Tue Jan 15, 2019 7:50 pm

Guenther wrote:
Tue Jan 15, 2019 4:02 pm
Rein Halbersma wrote:
Tue Jan 15, 2019 3:52 pm

A similar idea has been proposed before, I think: https://papers.nips.cc/paper/3722-boots ... search.pdf
It must be over 10 years since I last saw Joel's name mentioned anywhere. Thanks for the paper.
https://www.chessprogramming.org/Bodo
I didn't even know he created 'Meep', because that was shortly before a long hiatus on my side.
https://www.chessprogramming.org/Meep

Here is one of the shorter exchanges between Michael and Bob in the WB forum about Romi's learning.
http://www.open-aurec.com/wbforum/viewt ... f=4&t=4835
Thanks Guenther, that WB forum thread says it pretty well! :D
I hate if statements. Pawns demand if statements. Therefore I hate pawns.

PK
Posts: 803
Joined: Mon Jan 15, 2007 10:23 am
Location: Warsza

Re: Reinforcement Learning (RL) in real time paradigm

Post by PK » Thu Jan 17, 2019 6:05 pm

Michael, I will be glad if you prove me wrong. But even then I would look for ways to reduce game length. You want to feed the transposition table with additional information, trusting that it will help shape the final search. The idea looks good, as long as this information has a chance of being accessed. If you search 20 plies ahead, then entries from ply 40 will be of no use.

Re: Reinforcement Learning (RL) in real time paradigm

Post by Michael Sherwin » Thu Jan 17, 2019 11:43 pm

PK wrote:
Thu Jan 17, 2019 6:05 pm
Michael, I will be glad if you prove me wrong. But even then I would look for ways to reduce game length. You want to feed the transposition table with additional information, trusting that it will help shape the final search. The idea looks good, as long as this information has a chance of being accessed. If you search 20 plies ahead, then entries from ply 40 will be of no use.
There is also backpropagation from ply 40 down toward the root, and that is why RL gets fantastic results. If it gets done, it will have to be done by someone else, for personal health reasons. :( Thanks for your input! :D
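
A minimal sketch of that backpropagation point, under stated assumptions: this is not RomiChess's actual learning code. The function name `backpropagate`, the string position keys, and the `rate` and `decay` parameters are all hypothetical and chosen only for illustration. It walks a played game backward from its last position and nudges a stored value for every position toward the game result, so positions near the root end up carrying information that originated at ply 40.

```python
from typing import Dict, List


def backpropagate(
    table: Dict[str, float],
    game_keys: List[str],
    result: float,
    rate: float = 0.1,    # assumed learning rate, illustrative only
    decay: float = 0.95,  # assumed per-ply discount, illustrative only
) -> None:
    """Nudge stored values along a played game toward its result.

    game_keys[0] is the root position, game_keys[-1] the last position.
    result is from the root side's point of view: +1 win, 0 draw, -1 loss.
    Values are stored from the point of view of the side to move.
    """
    last = len(game_keys) - 1
    for ply in range(last, -1, -1):
        # The side to move alternates each ply, so the target flips sign.
        sign = 1.0 if ply % 2 == 0 else -1.0
        target = sign * result
        # Positions far from the end of the game get a weaker update.
        weight = decay ** (last - ply)
        old = table.get(game_keys[ply], 0.0)
        table[game_keys[ply]] = old + rate * weight * (target - old)


# Usage: one simulated 40-ply game won by the root side.
table: Dict[str, float] = {}
keys = [f"pos{p}" for p in range(41)]
backpropagate(table, keys, result=1.0)
```

In this sketch a 20-ply search from the root never probes `pos40` directly, but it can probe `pos0` through `pos20`, and each of those entries has already been adjusted by the result reached at ply 40.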
