Guenther wrote: ↑Tue Jan 15, 2019 5:02 pm
Rein Halbersma wrote: ↑Tue Jan 15, 2019 4:52 pm
A similar idea has been proposed before I think: https://papers.nips.cc/paper/3722-boots ... search.pdf

It must be over 10 years since I have seen Joel's name mentioned somewhere, thanks for the paper.
https://www.chessprogramming.org/Bodo
I didn't even know he created 'Meep', because that was shortly before a long hiatus on my side.
https://www.chessprogramming.org/Meep
One of those shorter exchanges between Michael and Bob in the WB forum about Romi's learning.
http://www.open-aurec.com/wbforum/viewt ... f=4&t=4835

Thanks Guenther, that WB forum thread says it pretty well!
Reinforcement Learning (RL) in real time paradigm
Moderators: hgm, Rebel, chrisw
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: Reinforcement Learning (RL) in real time paradigm
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
-
- Posts: 895
- Joined: Mon Jan 15, 2007 11:23 am
- Location: Warsza
Re: Reinforcement Learning (RL) in real time paradigm
Michael, I will be glad if you prove me wrong. But even then I would search for ways to reduce game length. You want to feed the transposition table with additional information, trusting that it will help to shape the final search. The idea looks good, as long as this information has a chance to be accessed. If you search 20 plies ahead, then there will be no use for entries from ply 40.
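The horizon argument can be put as a small arithmetic sketch. This is only an illustration under simplifying assumptions: `probe_hits` is a hypothetical helper, plies are counted from the start of the game, and extensions, quiescence search and transpositions are ignored.

```python
def probe_hits(seeded_plies, root_ply, search_depth):
    """Return the seeded plies a fixed-depth search from root_ply could reach.

    seeded_plies -- game plies at which extra table entries were stored
    root_ply     -- ply of the position the final search starts from
    search_depth -- nominal depth of that search, in plies
    """
    horizon = root_ply + search_depth
    return [ply for ply in seeded_plies if root_ply <= ply <= horizon]


# A 20-ply search from the start position never sees an entry stored at ply 40.
print(probe_hits([40], root_ply=0, search_depth=20))   # []

# The same entry becomes reachable once the root has advanced to ply 20.
print(probe_hits([40], root_ply=20, search_depth=20))  # [40]
```

Under these assumptions, shorter training games keep more of the stored entries inside the window that the final search can actually probe.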
Pawel Koziol
http://www.pkoziol.cal24.pl/rodent/rodent.htm
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: Reinforcement Learning (RL) in real time paradigm
PK wrote: ↑Thu Jan 17, 2019 7:05 pm
Michael, I will be glad if you prove me wrong. But even then I would search for ways to reduce game length. You want to feed the transposition table with additional information, trusting that it will help to shape the final search. The idea looks good, as long as this information has a chance to be accessed. If you search 20 plies ahead, then there will be no use for entries from ply 40.

There is also back propagation from ply 40 down toward the root, and that is why RL gets fantastic results. For personal health reasons, if it gets done it will have to be by someone else. Thanks for your input!
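A minimal sketch of what backing a result up from ply 40 toward the root could look like. This is not taken from RomiChess or any other engine: `back_up_result`, the string position keys and the flat `step` bonus are all assumptions made for illustration.

```python
def back_up_result(line, result, table, step=1.0):
    """Back up a game result along the played line, last ply first.

    line   -- position keys from the root (ply 0) to the final position
    result -- +1 if the side to move at the root won, -1 if it lost, 0 for a draw
    table  -- dict mapping position key -> learned bonus for the side to move
    """
    for ply in range(len(line) - 1, -1, -1):
        # Negamax convention: flip the sign on odd plies so every entry is
        # stored from the point of view of the side to move in that position.
        signed = result if ply % 2 == 0 else -result
        table[line[ply]] = table.get(line[ply], 0.0) + step * signed
    return table


# Hypothetical 40-ply game (positions at plies 0..40) won by the root side.
line = [f"pos{ply}" for ply in range(41)]
table = back_up_result(line, result=1, table={})

# Every position on the line now carries a bonus, including those within
# 20 plies of the root that a 20-ply search can probe.
near_root = [key for key in line[:21] if key in table]
print(len(table), len(near_root))  # 41 21
```

The point of the sketch is only that the update touches every position on the played line, so the information does not stay at ply 40.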