Reinforcement Learning (RL) in real time paradigm


Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Reinforcement Learning (RL) in real time paradigm

Post by Michael Sherwin »

Guenther wrote: Tue Jan 15, 2019 5:02 pm
Rein Halbersma wrote: Tue Jan 15, 2019 4:52 pm
A similar idea has been proposed before I think: https://papers.nips.cc/paper/3722-boots ... search.pdf
It must be over 10 years since I last saw Joel's name mentioned anywhere; thanks for the paper.
https://www.chessprogramming.org/Bodo
I didn't even know he created 'Meep', because that was shortly before a long hiatus on my side.
https://www.chessprogramming.org/Meep

Here is one of the shorter exchanges between Michael and Bob in the WB forum about Romi's learning:
http://www.open-aurec.com/wbforum/viewt ... f=4&t=4835

Thanks Guenther, that WB forum thread says it pretty well! :D
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: Reinforcement Learning (RL) in real time paradigm

Post by PK »

Michael, I will be glad if you prove me wrong. But even then I would look for ways to reduce game length. You want to feed the transposition table with additional information, trusting that it will help to shape the final search. The idea looks good, as long as this information has a chance of being accessed. If you search 20 plies ahead, there will be no use for entries from ply 40, because the search never reaches them.

Re: Reinforcement Learning (RL) in real time paradigm

Post by Michael Sherwin »

PK wrote: Thu Jan 17, 2019 7:05 pm
Michael, I will be glad if you prove me wrong. But even then I would look for ways to reduce game length. You want to feed the transposition table with additional information, trusting that it will help to shape the final search. The idea looks good, as long as this information has a chance of being accessed. If you search 20 plies ahead, there will be no use for entries from ply 40, because the search never reaches them.

There is also backpropagation from ply 40 down toward the root, and that is why RL gets fantastic results. If it gets done, it will have to be done by someone else, for personal health reasons. :( Thanks for your input! :D
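
To make the backpropagation point concrete, here is a minimal sketch of how a result from a 40-ply playout could be backed up so that positions within a 20-ply search horizon also carry learned information. It is only an illustration under stated assumptions, not RomiChess's actual code: `LearnTable`, `backup`, `reachable_bonuses`, and the `step` and `decay` values are hypothetical, and positions are represented by made-up integer hash keys.

```python
from dataclasses import dataclass, field


@dataclass
class LearnTable:
    """Hypothetical stand-in for a transposition table keyed by position hash."""
    bonus: dict = field(default_factory=dict)

    def backup(self, game_keys, result, step=8.0, decay=0.95):
        """Propagate a game result from the last ply back toward the root.

        game_keys: position hashes in the order they occurred (index = ply).
        result: +1 if the side to move at the root won, -1 if it lost, 0 for a draw.
        step, decay: assumed tuning values, chosen only for illustration.
        """
        reward = step * result
        for ply in range(len(game_keys) - 1, -1, -1):
            # Store the bonus from the point of view of the side to move:
            # even plies belong to the root side, odd plies to the opponent.
            sign = 1.0 if ply % 2 == 0 else -1.0
            key = game_keys[ply]
            self.bonus[key] = self.bonus.get(key, 0.0) + sign * reward
            # Positions further from the end of the game get less credit.
            reward *= decay


def reachable_bonuses(table, game_keys, search_depth):
    """Bonuses for the positions a search of the given depth could reach."""
    return [table.bonus.get(k, 0.0) for k in game_keys[:search_depth + 1]]


# One playout of 40 plies (41 positions) that the root side won.
keys = list(range(1000, 1041))  # made-up hash keys
table = LearnTable()
table.backup(keys, result=+1)

# Every position within a 20-ply horizon now carries a non-zero bonus.
within_horizon = reachable_bonuses(table, keys, search_depth=20)
```

In this sketch the entry stored at ply 40 is indeed out of reach for a 20-ply search, but the same result has also been written, in attenuated form, to every earlier position of the game, and those entries are the ones the search can use.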