Understanding the power of reinforcement learning

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Understanding the power of reinforcement learning

Post by Michael Sherwin »

trulses wrote:
Michael Sherwin wrote:How reinforcement learning applies to the Dragon position above is that unless black finds the winning move and plays some other move instead black's position is losing. On learning being triggered the entire game is overlaid onto the tree stored on the hard disk. For each node stored on the hard disk there is the reinforcement value. The nodes of the winning side are adjusted upwards and the nodes of the losing side are adjusted downward. This means that bad moves can gain value from this and good moves can lose value. Over time this corrects itself. Since higher nodes are given a larger reward/penalty higher nodes affect the search sooner but eventually the values backpropagate to the root of the current position and when all the alternative moves to the winning move look worse than Qxc3 it will play Qxc3 and win and since those moves then get rewarded it then just plays the winning move as long as it continues to win. But then that line being moved to the hash before each search the winning move starts to affect the search from an earlier node in the game. As long as there is any subtree stored in the learn file no matter how small that subtree might be those nodes with its accumulated scores affect the search.
Michael Sherwin wrote:How reinforcement learning applies to the Dragon position above is that if black does not find the winning move and plays some other move instead, black's position is losing. When learning is triggered the entire game is overlaid onto the tree stored on the hard disk. For each node stored on the hard disk there is a reinforcement value. The nodes of the winning side are adjusted upwards and the nodes of the losing side are adjusted downward. This means that bad moves can gain value from this and good moves can lose value; over time this corrects itself. Since higher nodes are given a larger reward/penalty, higher nodes affect the search sooner, but eventually the values backpropagate to the root of the current position, and when all the alternatives to the winning move look worse than Qxc3 it will play Qxc3 and win. Since those winning moves then get rewarded, it just keeps playing the winning move as long as it continues to win. And with that line being loaded into the hash before each search, the winning move starts to affect the search from an earlier node in the game. As long as there is any subtree stored in the learn file, no matter how small that subtree might be, those nodes with their accumulated scores affect the search.
The learning in RomiChess was intended as a sparring partner for humans. Winboard sends a result command in human versus computer games; Arena does not, or at least did not 11 years ago. So set up a position, or start from the starting position, and play against Romi, and if you beat Romi, Romi will play differently. Change sides and Romi will play your winning moves against you, and then you will have to win and teach Romi better moves. Then switch sides again, and if you win Romi will learn yet more. If Romi wins then Romi is the teacher. It is hard to put into words, but basically the engine and the human teach each other, and it is especially good for learning a chosen opening. Anyway, in the last 11 years I have received zero reports of Romi being used as intended. That is a shame really, because there is no other training system like it in existence as far as I know.
Hey Michael, very interesting stuff; this seems like table-based Monte Carlo policy evaluation. Impressive that you would independently discover such a thing on your own.

Did you ever try self-play in romi-chess using this method?

Did you ever try using one learn file with multiple different engines?

Did you always adjust the table by the same centipawn amount, or did you try lowering the centipawn bonus as you got more visits to each position?

You might experience some issues going from one engine to another, since the evaluation becomes fit not just to romi-chess but also to the opponent's policy. However, this is indeed a first step towards the policy evaluation used in A0.

If you wanted to speed up the learning process, you could look into TD(lambda), which uses a mixture of the episode return (win/loss/draw) and the table values visited over the course of the episode to update the table values.
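
To make that concrete, here is a minimal tabular TD(lambda) sketch under illustrative assumptions: positions are hashable keys, the only reward is the terminal result (1 for a win, 0.5 for a draw, 0 for a loss), and the function name, constants and table layout are invented for this example rather than taken from RomiChess or A0.

Code:

from collections import defaultdict

def td_lambda_update(values, episode, final_reward, alpha=0.1, lam=0.8, gamma=1.0):
    """Tabular TD(lambda) with accumulating eligibility traces over one finished game.

    values: defaultdict(float) mapping position keys to learned values
    episode: list of position keys visited by the learning side, in order
    final_reward: 1.0 for a win, 0.5 for a draw, 0.0 for a loss
    """
    traces = defaultdict(float)
    for i, pos in enumerate(episode):
        terminal = (i + 1 == len(episode))
        # TD target: the next stored value, or the actual game result at the last step.
        target = final_reward if terminal else gamma * values[episode[i + 1]]
        delta = target - values[pos]
        traces[pos] += 1.0                      # this position just became eligible
        for p in list(traces):
            values[p] += alpha * delta * traces[p]
            traces[p] *= gamma * lam            # decay every trace each step
    return values

With lam=0 this is one-step TD(0); with lam=1 and gamma=1 every visited position is pulled toward the final result, which is close in spirit to the straight win/loss adjustment Romi already makes.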
Thanks for the kind words and acknowledgement! I tried self-play, and since draws are minutely penalized Romi will eventually visit all reasonable lines. The problem is I do not have the resources to let Romi do this at long time controls, and short time controls, while helpful, take far too many training games.

Romi does what Romi was intended for very well. Beyond that I was just showing the way, but back 11 years ago only The Baron followed in Romi's footsteps, and that came to naught as well. As for the learning being used as a sparring partner, I hoped back then that a commercial engine like Chessmaster would pick up on it and offer it to the public. If I could, I'd write a chess GUI specializing in just being a sparring partner, as intended.
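
For readers who want to see the flavor of the per-game update described in the quoted post above, here is a small sketch; the dict-of-move-paths layout, field names and bonus schedule are invented for illustration and are not RomiChess's actual learn.dat format.

Code:

def overlay_game(tree, game_moves, winner, base_bonus=4.0):
    """Overlay one finished game onto a stored learn tree.

    tree: dict mapping a tuple of moves-from-the-start to {'bonus': float, 'visits': int}
    game_moves: the game's moves in order, e.g. ['e2e4', 'c7c5', ...]
    winner: 'white' or 'black' (a draw could apply a small penalty to both sides instead)
    """
    n = len(game_moves)
    path = ()
    for ply, move in enumerate(game_moves):
        path = path + (move,)
        node = tree.setdefault(path, {"bonus": 0.0, "visits": 0})
        node["visits"] += 1
        mover = "white" if ply % 2 == 0 else "black"
        # Nodes nearer the root get the larger adjustment, tapering toward the end of the game.
        delta = base_bonus * (n - ply) / n
        node["bonus"] += delta if mover == winner else -delta
    return tree

At search time the accumulated bonus of a stored node would simply be added to (or subtracted from) the evaluation of that position, which is how a move like Qxc3 can eventually outweigh its siblings even before the search itself sees why it wins.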
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
giovanni
Posts: 142
Joined: Wed Jul 08, 2015 12:30 pm

Re: Understanding the power of reinforcement learning

Post by giovanni »

Michael Sherwin wrote:
giovanni wrote:
Michael Sherwin wrote:The following position is a bit dated, as most strong engines will find the best move using normal search. However, 30 years ago, just throwing a dart at the calendar, the best engines could not find the best move. Even RomiChess in 2006 could not find it; Phalanx 22 could. So this example is a bit dated. In 2005 this position was one that I hoped Romi could find with normal search, but that did not happen. After I added reinforcement learning, and before I added MSMD learning, I tested Romi playing the black pieces to see if Romi could find the best move after training a number of games. It took Romi 40 games to find the best move, but when she found it (learned it due to reinforcement learning) she won every game. I know that TSCP could also find this winning move after enough training games in the position. The point is that if TSCP had reinforcement learning and won a game against SF in this position, it would look superhuman. It would look like TSCP thought like a human and did the 'impossible'. It would look as incredible as AlphaZ, except it would have done it on equal hardware.

[d]r5k1/pp1bppbp/3p1np1/q5B1/2r1PP2/1NN5/PPPQ2PP/1K1R3R b - - 1 16
Thanks Michael. Could you elaborate a little bit more on this post? I mean, how does reinforcement learning apply to this position, what is MSMD, etc.?
MSMD is, as the above post indicates, Monkey See Monkey Do learning. It merely plays winning lines from past experience, up to 180 ply in RomiChess. So Romi can play some very deep lines and use virtually no time on the clock.

How reinforcement learning applies to the Dragon position above is that if black does not find the winning move and plays some other move instead, black's position is losing. When learning is triggered the entire game is overlaid onto the tree stored on the hard disk. For each node stored on the hard disk there is a reinforcement value. The nodes of the winning side are adjusted upwards and the nodes of the losing side are adjusted downward. This means that bad moves can gain value from this and good moves can lose value; over time this corrects itself. Since higher nodes are given a larger reward/penalty, higher nodes affect the search sooner, but eventually the values backpropagate to the root of the current position, and when all the alternatives to the winning move look worse than Qxc3 it will play Qxc3 and win. Since those winning moves then get rewarded, it just keeps playing the winning move as long as it continues to win. And with that line being loaded into the hash before each search, the winning move starts to affect the search from an earlier node in the game. As long as there is any subtree stored in the learn file, no matter how small that subtree might be, those nodes with their accumulated scores affect the search.

The learning in RomiChess was intended as a sparring partner for humans. Winboard sends a result command in human versus computer games; Arena does not, or at least did not 11 years ago. So set up a position, or start from the starting position, and play against Romi, and if you beat Romi, Romi will play differently. Change sides and Romi will play your winning moves against you, and then you will have to win and teach Romi better moves. Then switch sides again, and if you win Romi will learn yet more. If Romi wins then Romi is the teacher. It is hard to put into words, but basically the engine and the human teach each other, and it is especially good for learning a chosen opening. Anyway, in the last 11 years I have received zero reports of Romi being used as intended. That is a shame really, because there is no other training system like it in existence as far as I know.
Thanks for the detailed explanation. I tried to start a match between Stockfish and RomiChess starting from the Dragon position in Arena (Linux). Although the match proceeded smoothly, the 'learn.dat' file was never updated and always remained the same size (280000 bytes). Does this mean that there was no learning under these conditions?
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Understanding the power of reinforcement learning

Post by Michael Sherwin »

giovanni wrote:
Michael Sherwin wrote:
giovanni wrote:
Michael Sherwin wrote:The following position is a bit dated, as most strong engines will find the best move using normal search. However, 30 years ago, just throwing a dart at the calendar, the best engines could not find the best move. Even RomiChess in 2006 could not find it; Phalanx 22 could. So this example is a bit dated. In 2005 this position was one that I hoped Romi could find with normal search, but that did not happen. After I added reinforcement learning, and before I added MSMD learning, I tested Romi playing the black pieces to see if Romi could find the best move after training a number of games. It took Romi 40 games to find the best move, but when she found it (learned it due to reinforcement learning) she won every game. I know that TSCP could also find this winning move after enough training games in the position. The point is that if TSCP had reinforcement learning and won a game against SF in this position, it would look superhuman. It would look like TSCP thought like a human and did the 'impossible'. It would look as incredible as AlphaZ, except it would have done it on equal hardware.

[d]r5k1/pp1bppbp/3p1np1/q5B1/2r1PP2/1NN5/PPPQ2PP/1K1R3R b - - 1 16
Thanks Michael. Could you elaborate a little bit more on this post? I mean, how does reinforcement learning apply to this position, what is MSMD, etc.?
MSMD is, as the above post indicates, Monkey See Monkey Do learning. It merely plays winning lines from past experience, up to 180 ply in RomiChess. So Romi can play some very deep lines and use virtually no time on the clock.

How reinforcement learning applies to the Dragon position above is that if black does not find the winning move and plays some other move instead, black's position is losing. When learning is triggered the entire game is overlaid onto the tree stored on the hard disk. For each node stored on the hard disk there is a reinforcement value. The nodes of the winning side are adjusted upwards and the nodes of the losing side are adjusted downward. This means that bad moves can gain value from this and good moves can lose value; over time this corrects itself. Since higher nodes are given a larger reward/penalty, higher nodes affect the search sooner, but eventually the values backpropagate to the root of the current position, and when all the alternatives to the winning move look worse than Qxc3 it will play Qxc3 and win. Since those winning moves then get rewarded, it just keeps playing the winning move as long as it continues to win. And with that line being loaded into the hash before each search, the winning move starts to affect the search from an earlier node in the game. As long as there is any subtree stored in the learn file, no matter how small that subtree might be, those nodes with their accumulated scores affect the search.

The learning in RomiChess was intended as a sparring partner for humans. Winboard sends a result command in human versus computer games; Arena does not, or at least did not 11 years ago. So set up a position, or start from the starting position, and play against Romi, and if you beat Romi, Romi will play differently. Change sides and Romi will play your winning moves against you, and then you will have to win and teach Romi better moves. Then switch sides again, and if you win Romi will learn yet more. If Romi wins then Romi is the teacher. It is hard to put into words, but basically the engine and the human teach each other, and it is especially good for learning a chosen opening. Anyway, in the last 11 years I have received zero reports of Romi being used as intended. That is a shame really, because there is no other training system like it in existence as far as I know.
Thanks for the detailed explanation. I tried to start a match between Stockfish and RomiChess starting from the Dragon position in Arena (Linux). Although the match proceeded smoothly, the 'learn.dat' file was never updated and always remained the same size (280000 bytes). Does this mean that there was no learning under these conditions?
Since Romi stores the tree from the original position, Romi must know the moves that lead to the Dragon position.

[Event "Computer chess game"]
[Site "MASTER"]
[Date "2017.12.14"]
[Round "?"]
[White "Stockfish_8_x64_popcnt"]
[Black "Stockfish_8_x64_popcnt"]
[Result "*"]
[BlackElo "2200"]
[ECO "B78"]
[Opening "Sicilian"]
[Time "16:35:29"]
[Variation "Dragon, Yugoslav, Old Main Line, 11.Bb3 Rfc8"]
[WhiteElo "2200"]
[TimeControl "10+1"]
[Termination "unterminated"]
[PlyCount "31"]
[WhiteType "human"]
[BlackType "human"]

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 g6 6. Be3 Bg7 7. f3 O-O 8.
Qd2 Nc6 9. Bc4 Bd7 10. O-O-O Qa5 11. Bb3 Rfc8 12. Kb1 Ne5 13. Bg5 Rc5 14. f4 Nc4 15. Bxc4 Rxc4 16. Nb3 *

A note on Stockfish. Even if Romi finds the winning move it will most likely still lose to SF, which is 1000+ Elo stronger than Romi. Romi will then revert to playing other moves until the winning move looks good again. I don't know how long this will repeat, but it might take hundreds of games, or maybe thousands, before Romi's learning can find and prove a win in this position against SF. Another complication arises if SF is using multiple threads, because then SF will vary its play, requiring more games as well. Eventually, though, Romi will find the win.

I'm not advertising RomiChess per se, but rather the learning system. SF, K, H, for example, would learn with much more efficiency, needing far fewer training games, just due to the fact that they play better moves to begin with. If you insist on trying this experiment against SF, be prepared to let it run long enough, maybe restricting SF to one thread. My suggestion is to seek proof of concept against a lesser engine and work up towards SF. The sooner Romi learns that Qxc3 is the winning move and wins with it multiple times, locking in Qxc3, the fewer games it will need to beat SF with that move.

Edit: I was thinking back to a time when Romi did not have MSMD learning. With MSMD learning, and letting Romi play both sides, SF will show Romi how to win. But if you let Romi play both sides, then proof of concept for just the RL learning will not be obtainable.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
FWCC
Posts: 117
Joined: Wed Aug 22, 2007 4:39 pm

Re: Understanding the power of reinforcement learning

Post by FWCC »

Is Romi a Winboard engine only? Is there a UCI version? Or must I use a Wb2Uci adapter?
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Understanding the power of reinforcement learning

Post by Michael Sherwin »

FWCC wrote:Is Romi a Winboard engine only? Is there a UCI version? Or must I use a Wb2Uci adapter?
UCI does not send a result command when a game ends, unless that has changed in the last 10 years. Since Romi uses the result command sent by the Winboard protocol to trigger learning, RomiChess is a Winboard engine.

Arena works for computer vs computer games but does not send a result command in human vs computer games, unless that has changed in the last 10 years. For computer vs human games use Winboard. Then either play to checkmate or resign, or, if the human is winning, Romi will resign. Winboard will only send the result command if the game is officially over.
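
For anyone wiring this up themselves, the trigger is just the protocol's result line; below is a minimal sketch of an engine input loop reacting to it. The trigger_learning helper is a hypothetical placeholder, not a RomiChess function, and a real engine would track the game score through the normal move commands of the protocol.

Code:

import sys

def trigger_learning(moves, outcome):
    # Placeholder: a real engine would overlay the finished game onto its learn file here.
    pass

def xboard_loop():
    game_moves = []
    for raw in sys.stdin:
        line = raw.strip()
        if line == "new":
            game_moves = []
        elif line.startswith("usermove "):
            game_moves.append(line.split()[1])        # keep the game score as it is played
        elif line.startswith("result "):
            # Winboard/xboard sends e.g.  result 1-0 {White mates}  when the game is over.
            outcome = line.split()[1]                  # "1-0", "0-1", "1/2-1/2" or "*"
            if outcome in ("1-0", "0-1", "1/2-1/2"):
                trigger_learning(game_moves, outcome)
        # ... the rest of the protocol (xboard, protover, go, the engine's own moves, etc.) is omitted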
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Understanding the power of reinforcement learning

Post by Michael Sherwin »

Michael Sherwin wrote:
giovanni wrote:
Michael Sherwin wrote:
giovanni wrote:
Michael Sherwin wrote:The following position is a bit dated, as most strong engines will find the best move using normal search. However, 30 years ago, just throwing a dart at the calendar, the best engines could not find the best move. Even RomiChess in 2006 could not find it; Phalanx 22 could. So this example is a bit dated. In 2005 this position was one that I hoped Romi could find with normal search, but that did not happen. After I added reinforcement learning, and before I added MSMD learning, I tested Romi playing the black pieces to see if Romi could find the best move after training a number of games. It took Romi 40 games to find the best move, but when she found it (learned it due to reinforcement learning) she won every game. I know that TSCP could also find this winning move after enough training games in the position. The point is that if TSCP had reinforcement learning and won a game against SF in this position, it would look superhuman. It would look like TSCP thought like a human and did the 'impossible'. It would look as incredible as AlphaZ, except it would have done it on equal hardware.

[d]r5k1/pp1bppbp/3p1np1/q5B1/2r1PP2/1NN5/PPPQ2PP/1K1R3R b - - 1 16
Thanks Michael. Could you elaborate a little bit more on this post? I mean, how does reinforcement learning apply to this position, what is MSMD, etc.?
MSMD is, as the above post indicates, Monkey See Monkey Do learning. It merely plays winning lines from past experience, up to 180 ply in RomiChess. So Romi can play some very deep lines and use virtually no time on the clock.

How reinforcement learning applies to the Dragon position above is that if black does not find the winning move and plays some other move instead, black's position is losing. When learning is triggered the entire game is overlaid onto the tree stored on the hard disk. For each node stored on the hard disk there is a reinforcement value. The nodes of the winning side are adjusted upwards and the nodes of the losing side are adjusted downward. This means that bad moves can gain value from this and good moves can lose value; over time this corrects itself. Since higher nodes are given a larger reward/penalty, higher nodes affect the search sooner, but eventually the values backpropagate to the root of the current position, and when all the alternatives to the winning move look worse than Qxc3 it will play Qxc3 and win. Since those winning moves then get rewarded, it just keeps playing the winning move as long as it continues to win. And with that line being loaded into the hash before each search, the winning move starts to affect the search from an earlier node in the game. As long as there is any subtree stored in the learn file, no matter how small that subtree might be, those nodes with their accumulated scores affect the search.

The learning in RomiChess was intended as a sparring partner for humans. Winboard sends a result command in human versus computer games; Arena does not, or at least did not 11 years ago. So set up a position, or start from the starting position, and play against Romi, and if you beat Romi, Romi will play differently. Change sides and Romi will play your winning moves against you, and then you will have to win and teach Romi better moves. Then switch sides again, and if you win Romi will learn yet more. If Romi wins then Romi is the teacher. It is hard to put into words, but basically the engine and the human teach each other, and it is especially good for learning a chosen opening. Anyway, in the last 11 years I have received zero reports of Romi being used as intended. That is a shame really, because there is no other training system like it in existence as far as I know.
Thanks for the detailed explanation. I tried to start a match between Stockfish and RomiChess starting from the Dragon position in Arena (Linux). Although the match proceeded smoothly, the 'learn.dat' file was never updated and always remained the same size (280000 bytes). Does this mean that there was no learning under these conditions?
Since Romi stores the tree from the original position, Romi must know the moves that lead to the Dragon position.

[Event "Computer chess game"]
[Site "MASTER"]
[Date "2017.12.14"]
[Round "?"]
[White "Stockfish_8_x64_popcnt"]
[Black "Stockfish_8_x64_popcnt"]
[Result "*"]
[BlackElo "2200"]
[ECO "B78"]
[Opening "Sicilian"]
[Time "16:35:29"]
[Variation "Dragon, Yugoslav, Old Main Line, 11.Bb3 Rfc8"]
[WhiteElo "2200"]
[TimeControl "10+1"]
[Termination "unterminated"]
[PlyCount "31"]
[WhiteType "human"]
[BlackType "human"]

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 g6 6. Be3 Bg7 7. f3 O-O 8.
Qd2 Nc6 9. Bc4 Bd7 10. O-O-O Qa5 11. Bb3 Rfc8 12. Kb1 Ne5 13. Bg5 Rc5 14. f4 Nc4 15. Bxc4 Rxc4 16. Nb3 *

A note on Stockfish. Even if Romi finds the winning move it will most likely still lose to SF, which is 1000+ Elo stronger than Romi. Romi will then revert to playing other moves until the winning move looks good again. I don't know how long this will repeat, but it might take hundreds of games, or maybe thousands, before Romi's learning can find and prove a win in this position against SF. Another complication arises if SF is using multiple threads, because then SF will vary its play, requiring more games as well. Eventually, though, Romi will find the win.

I'm not advertising RomiChess per se, but rather the learning system. SF, K, H, for example, would learn with much more efficiency, needing far fewer training games, just due to the fact that they play better moves to begin with. If you insist on trying this experiment against SF, be prepared to let it run long enough, maybe restricting SF to one thread. My suggestion is to seek proof of concept against a lesser engine and work up towards SF. The sooner Romi learns that Qxc3 is the winning move and wins with it multiple times, locking in Qxc3, the fewer games it will need to beat SF with that move.

Edit: I was thinking back to a time when Romi did not have MSMD learning. With MSMD learning, and letting Romi play both sides, SF will show Romi how to win. But if you let Romi play both sides, then proof of concept for just the RL learning will not be obtainable.
[Event "Romitest"]
[Site "MASTER"]
[Date "2017.12.14"]
[Round "12"]
[White "Stockfish_8_x64_popcnt"]
[Black "RomiChess64P3n2"]
[Result "0-1"]
[BlackElo "2200"]
[ECO "B79"]
[Opening "Sicilian"]
[Time "19:54:40"]
[Variation "Dragon, Yugoslav, Old Main Line, 12.h4 Ne5 13.Kb1 Nc4"]
[WhiteElo "2200"]
[TimeControl "10+1"]
[Termination "normal"]
[PlyCount "115"]
[WhiteType "human"]
[BlackType "human"]

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 g6 6. Be3 Bg7 7. f3 O-O 8.
Qd2 Nc6 9. Bc4 Bd7 10. O-O-O Qa5 11. Bb3 Rfc8 12. Kb1 Ne5 13. Bg5 Rc5 14.
f4 Nc4 15. Bxc4 Rxc4 16. Nb3 Qxc3 17. bxc3 {(Rh1-e1 Qc3xd2 Nb3xd2 Rc4-d4
c2-c3 Rd4-a4 Bg5xf6 Bg7xf6 Nd2-f3 Bf6-g7 Nf3-d4 Bg7-h6 g2-g3 Bd7-g4 Rd1-d3
Bh6-g7 e4-e5 d6xe5 f4xe5 Ra4-a6 Kb1-a1 h7-h6 Ka1-b1 Ra8-c8) -0.71/20 2}
Nxe4 18. Qe3 {(Bg5xe7 Ne4xd2+ Nb3xd2 Rc4xf4 Be7xd6 Rf4-f2 Rh1-g1 Ra8-c8
Nd2-b3 Bd7-g4 Rd1-f1 Rf2-e2 Rf1-f4 Bg4-e6 Nb3-d4 Bg7xd4 Rf4xd4 Rc8xc3
Rg1-c1 Re2xg2 Kb1-b2 Rc3-c8 Bd6-g3 Rg2-e2) -0.43/20 4} Nxc3+ 19. Kc1
{(Kb1-b2 Nc3xd1+ Kb2-c1 Nd1xe3 Kc1-d2 Ne3-d5 g2-g3 Ra8-c8 Rh1-c1 Bd7-f5
Kd2-e2 Rc4xc2+ Rc1xc2 Rc8xc2+ Ke2-f3 Rc2xh2 g3-g4 Rh2-h3+ Kf3-g2 Bf5xg4
Nb3-a5 Rh3-a3 Na5xb7) -0.47/16} Nxa2+ 20. Kb1 {(Kc1-d2 Bd7-f5 g2-g4 Bf5xg4
Rh1-g1 Bg4xd1 Rg1xd1 e7-e6 Kd2-e1 Na2-c3 Qe3-d3 Ra8-c8 Rd1-d2 h7-h6 Bg5-e7
Nc3-d5 Be7xd6 Bg7-c3 Bd6-e5 Nd5xf4 Be5xf4 Rc4xf4 Qd3-d7 Bc3xd2+ Nb3xd2)
-0.42/20 2} Nc3+ 21. Kc1 {(Kb1-b2 Nc3xd1+ Kb2-c1 Nd1xe3 Kc1-d2 Ne3-d5
Rh1-f1 Ra8-c8 Rf1-f2 h7-h6 Bg5-h4 Nd5xf4 Kd2-d1 Nf4-d5 Bh4xe7 Bd7-g4+
Kd1-c1 Nd5xe7) -0.55/15} Rac8 22. Rd2 {(Rh1-f1 Nc3xd1 Kc1xd1 Rc4xc2 f4-f5
Rc2-b2 f5xg6 Rb2-b1+ Nb3-c1 Bd7-g4+ Kd1-e1 Rc8xc1+ Qe3xc1 Rb1xc1+ Bg5xc1
h7xg6 Bc1-d2 Bg4-e6 Ke1-f2 b7-b5 Rf1-c1 Be6-c4 Rc1-e1 Bg7-f6 h2-h3 Bc4-d5
Kf2-g3 Kg8-f8 Re1-b1 Bf6-e5+ Kg3-f2) 0.00/19 5} Bf5 23. Qxa7 {(Rh1-g1)
-1.56/21 9} Rb4 24. Qa5 {(Bg5xe7 Rb4xb3 c2xb3 Nc3-b5+ Kc1-d1 Nb5xa7 Rd2xd6
Na7-c6 Be7-f6 Bf5-g4+ Kd1-e1 Bg7xf6 Rd6xf6 Rc8-d8 h2-h3 Rd8-d1+ Ke1-f2
Rd1xh1 h3xg4 Rh1-b1 Rf6-d6 Rb1-b2+ Kf2-f3 Rb2xb3+ Kf3-f2 Kg8-f8 Rd6-d7
b7-b5 Rd7-b7 Rb3-b2+ Kf2-f3 b5-b4 g2-g3 Rb2-c2 Kf3-e3 Rc2-c3+ Ke3-f2)
-1.54/22 3} Ra4 25. Qxa4 {(g2-g4 Ra4xa5 Nb3xa5 Rc8-a8 Na5-b3 Bf5-e6 Bg5xe7
Be6xb3 Rd2xd6 Bb3-e6 Rd6-d8+ Ra8xd8 Be7xd8 Be6xg4 Kc1-d2 Bg4-f5 Rh1-e1
Nc3-e4+ Kd2-e2 Ne4-c5 Ke2-d2 Nc5-e6 Bd8-b6 Bg7-f8 Re1-b1 Ne6xf4 Bb6-e3
Nf4-d5 Rb1xb7 Nd5xe3 Kd2xe3 Bf8-d6 Ke3-f3 Bd6xh2) -1.52/21 1} Nxa4 26. Bxe7
{(Kc1-b1 Bg7-c3 Rd2-f2 f7-f6 Bg5-h4 Bc3-b4 Rh1-g1 Bf5-e4 g2-g4 Be4-d5
Nb3-d2 b7-b5 Kb1-c1 Bb4-a3+ Kc1-d1 Na4-b2+ Kd1-c1 Nb2-d3+ Kc1-d1 Nd3xf2+
Bh4xf2 e7-e5 Bf2-e3 Ba3-b4 f4-f5 g6xf5) -1.57/20 3} Bb2+ {(g7b2 c1b1 c8c3
b1a2 f5c2 e7d6 c2b3 a2b1 b3e6 d2b2 e6f5 b1a1 a4b2 a1b2 c3c2 b2b3 c2g2 b3b4
g8g7) +1.86/23 7} 27. Kb1 {(Kc1-d1 Na4-c3+ Kd1-e1 Rc8-e8 Ke1-f1 Re8xe7
Rh1-g1 Bf5-e6 g2-g4 Kg8-g7 f4-f5 Be6-c4+ Kf1-g2 Nc3-e4 Rd2-d1 Bc4-e2 Rd1-e1
Be2xg4 f5xg6 h7xg6) -1.63/16} Rc3 {(c8c3 b1a2 f5c2 e7d6 c2b3 a2b1 b3e6 d2b2
e6f5 b1a1 a4b2 a1b2 c3c2 b2b3 c2g2 b3b4 g8g7) +1.86/24 9} 28. Ka2 {(Nb3-c1)
-1.64/22 2} Bxc2 {(f5c2e7d6 c2b3 a2b1 b3e6 d2b2 e6f5 b1a1 a4b2 a1b2 c3c2
b2b3 c2g2 b3b4 g8g7) +1.86/23 7} 29. Bxd6 {(h2-h4 Bc2xb3+ Ka2-b1 Bb3-a2+
Kb1xa2 Rc3-a3+ Ka2-b1 Ra3-a1+ Kb1-c2 Ra1xh1 Rd2xd6 Kg8-g7 Rd6-d5 Rh1-c1+
Kc2-b3 Na4-b6 Rd5-d8 Nb6-c4 Rd8-d3 Bb2-f6 Be7xf6+ Kg7xf6 Kb3-b4 Kf6-f5
h4-h5 f7-f6 h5xg6 h7xg6) -1.58/18} Bxb3+ {(c2b3a2b1 c3c6 d2b2 a4b2 b1b2
b3d5 d6b4 d5g2 h1d1 b7b5 d1d7 c6e6 b4c3 g8f8 b2b3 f8e8 d7a7 e6e2 b3b4 g2c6
a7c7) +2.16/23 7} 30. Kb1 {(Ka2-b1) -1.64/1} Rc6 {(c3c6 d2b2) +2.11/22 3}
31. Rxb2 {(Rh1-e1 Bb3-e6 Bd6-e5 Bb2xe5 f4xe5 b7-b5 Rd2-d8+ Kg8-g7 Rd8-b8
Be6-f5+ Kb1-a2 b5-b4 Rb8xb4 Rc6-a6 Rb4-d4 Na4-c5+ Ka2-b2 Nc5-d3+ Rd4xd3
Bf5xd3 g2-g4 Ra6-a4 Kb2-c3 Bd3-e4) -1.39/18 2} Nxb2 {(a4b2b1b2 b3d5 d6e5)
+2.15/21 3} 32. Kxb2 {(h2-h4 Nb2-d3 Bd6-a3 Bb3-c2+ Kb1-a2 b7-b5 Rh1-h3
Rc6-c4 Rh3xd3 Bc2xd3 Ba3-d6 Rc4-c2+ Ka2-b3 Rc2xg2 Kb3-b4 Rg2-g4 Kb4-c3
Bd3-c4 Kc3-d4 Rg4xh4 Kd4-e5 Kg8-g7 Bd6-e7 Rh4-h5+ Ke5-e4 Rh5-h2 Be7-b4
h7-h5) -1.43/19 1} Bd5 {(b3d5 d6e7 d5g2 h1d1 g2e4 d1d7 b7b5 b2b3 g8g7 d7d4
c6e6 e7d6 g7f6 b3b4 e4c6 d4d2 e6e1 h2h4) +2.08/18 1} 33. Be5 {(Bd6-b8
Bd5xg2) -1.33/19 1} f6 {(f7f6e5c3 d5g2 h1d1 b7b5 d1d7 h7h5 d7b7 g2f1 b2b3
f1d3 h2h3 g8f8 b7d7 d3e4 c3b4 f8g8) +2.18/18 1} 34. Bc3 {(Rh1-d1 f6xe5
Rd1xd5 e5xf4 Rd5-d4 Rc6-e6 Rd4xf4 Re6-e2+ Kb2-c3 Re2xg2 h2-h4 h7-h5 Rf4-f6
Rg2-g4 Rf6-b6 Kg8-g7 Rb6xb7+ Kg7-h6 Kc3-d3 Rg4xh4 Kd3-e3 g6-g5) -1.29/18 1}
Bxg2 {(d5g2) +2.20/17 1} 35. Rd1 {(Rh1-c1 Rc6-b6+ Kb2-c2 Kg8-f7 Kc2-d3
Rb6-b5 Bc3-e1 Bg2-c6 Rc1-a1 Kf7-e6 Ra1-a5 Rb5-b1 Be1-g3 Rb1-f1 Kd3-e2
Rf1-f3 Ra5-a8 h7-h5 Ra8-a2 Ke6-f5 Ra2-b2 h5-h4 Bg3xh4 Rf3xf4) -1.35/18 1}
Bh3 {(g2h3d1d4 c6b6 b2a3 g8f7 a3a4 h3f5 c3b4 h7h5 d4d2 f5e4 d2d4 b6a6 a4b3
e4f5 d4d5 f7e6) +2.49/16 1} 36. Rd4 {(Rd1-c1 Rc6-d6 Bc3-e1 Rd6-b6+ Kb2-c3
Bh3-g2 Rc1-d1 Rb6-c6+ Kc3-b4 Bg2-h3 Rd1-d3 Bh3-f5 Rd3-c3 Rc6-b6+ Kb4-a3
Kg8-f7 Be1-f2 Rb6-a6+ Ka3-b2 Bf5-e4 Rc3-c7+ Kf7-e6 Rc7xh7 Ke6-f5) -1.23/18
1} Rb6+ {(c6b6 b2a3 g8f7 a3a4 f7e6 a4a5 b6c6 a5b4 h7h5 d4c4 c6c4 b4c4 h3f1)
+2.69/15} 37. Rb4 {(Bc3-b4 Kg8-f7) -1.02/20 1} Rxb4+ {(b6b4 c3b4 g8f7 b2b3
b7b5 b4c5 h3f1 b3c3 f7e6 c5d4 f1c4) +3.42/20} 38. Bxb4 {(Kb2-c1 Rb4xf4
Kc1-d2 Kg8-f7 Bc3-a5 b7-b5 Kd2-c3 Rf4-c4+ Kc3-b2 f6-f5 Ba5-b6 f5-f4 Bb6-g1
f4-f3 Kb2-a3 Bh3-e6 Ka3-b2 Be6-f5 Kb2-b3 Rc4-c2 Kb3-b4 f3-f2 Bg1xf2 Rc2xf2
h2-h3 Bf5xh3 Kb4xb5 Kf7-e6 Kb5-c5 Rf2-f4 Kc5-b5 Ke6-d5) -1.09/22 1} Kf7
{(g8f7 b2c3 f7e6 c3c4 e6f5 b4d2 h3f1 c4c5 b7b5 d2c3 h7h6 h2h4 f1d3 c5b4
d3c4 c3d4 c4d3) +3.12/21} 39. Kc3 {(Bb4-c3 Kf7-e6 Kb2-c2 f6-f5 Kc2-b3
Bh3-f1 Kb3-b4 Ke6-d5 Bc3-f6 Kd5-e4 Bf6-g5 b7-b6 Kb4-c3 Ke4-e3 Bg5-h6 Ke3-f3
Bh6-g5 Kf3-g2 h2-h4 Kg2-g3 Kc3-b2 Kg3-f3 Kb2-c3 Kf3-g4) -1.31/23} Ke6
{(f7e6c3c4 e6f5 b4c3 h3f1 c4c5 b7b5 h2h4 f1d3 c3d4 d3c4 d4b2 h7h6 b2d4 c4d3
c5b4) +3.14/20 1} 40. Bf8 {(Kc3-d3 Ke6-f5 Bb4-c3 h7-h6 Kd3-e3 g6-g5 f4xg5
h6xg5 Bc3-d4 b7-b5 Bd4-b6 Kf5-g4 Ke3-f2 f6-f5 Bb6-c5 f5-f4 Bc5-f8 Kg4-h5
Kf2-f3 Bh3-e6 Kf3-e4 Kh5-g6 Bf8-e7 Be6-f5+ Ke4-e5 Bf5-h3) -1.34/21 1} b5
{(b7b5 c3d4) +2.99/18 1} 41. Kb4 {(Kc3-d2 Ke6-f5 Bf8-e7 h7-h6 Kd2-c3 Bh3-f1
Kc3-d4 g6-g5 f4xg5 h6xg5 Kd4-e3 Bf1-h3 Ke3-d4 Bh3-g2 Kd4-c5 Bg2-f1 Kc5-d4
Kf5-g6 Kd4-e3 Bf1-h3 Ke3-f2 Kg6-f5 Kf2-f3 Kf5-e5 Kf3-g3) -1.52/22 1} Bf1
{(h3f1 h2h4 e6f5 f8e7 f1c4 b4c5 c4e2 c5b4 h7h6 b4c5 h6h5) +2.96/19} 42. Kc5
{(Kb4-b3 Ke6-f5 Bf8-e7 h7-h6 Kb3-c2 g6-g5 Kc2-d2 g5xf4 Kd2-e1 Bf1-c4 Ke1-f2
Bc4-d5 Be7-d6 Kf5-e4 Bd6-e7 h6-h5 Be7-d6 Bd5-e6 Bd6-e7 Ke4-e5 Kf2-f3
Be6-d5+ Kf3-f2 Ke5-f5 Be7-b4 Bd5-c4 Bb4-e7 Kf5-e4) -1.70/27 1} Kf5 {(e6f5
f8d6 h7h6 c5d4 g6g5 f4g5 f6g5 d6b4 g5g4 b4d6 h6h5 d4d5 h5h4 d5d4 f1c4)
+3.03/18} 43. Bd6 {(Bf8-h6 Kf5-g4) -1.77/24} h6 {(h7h6 d6e7 g6g5 f4g5 h6g5
c5d5 f1e2 d5c5 f5e5 e7d6 e5e6 d6c7 e2c4) +3.06/18 1} 44. Be7 {(Bd6-c7
Kf5-e4) -1.59/26 1} g5 {(g6g5 f4g5 h6g5 c5d5 f1e2 d5c5 f5e6 e7d8 e6e5 d8c7
e5f5 h2h3 f5e6 c7d8 e6e5) +3.14/19} 45. fxg5 {(Kc5-d5 g5xf4 Kd5-d4 h6-h5
Be7-d8 Bf1-e2 Bd8-b6 Kf5-g4 Bb6-c7 f6-f5 Bc7-d6 Be2-f1 Bd6-e5 b5-b4 Be5-b8
Bf1-e2 h2-h3+ Kg4xh3 Bb8xf4 Kh3-g4 Bf4-e5 h5-h4 Kd4-e3 Be2-c4 Ke3-f2 f5-f4
Be5-d4) -1.85/22 1} hxg5 {(h6g5 c5d5 f1e2 d5c5 f5e6 e7d8 e6f7 c5b4 f7g6
d8b6 f6f5 b6c7 g6h5 c7e5 f5f4 b4c5 f4f3 c5b4 h5h4) +3.48/19 1} 46. Kd4
{(Be7-d6 Kf5-g4 Bd6-e7 f6-f5 Kc5-d5 f5-f4 Kd5-e4 Bf1-g2+ Ke4-d3 Kg4-h5
Kd3-e2 Bg2-d5 Ke2-f2 Bd5-e6 Be7-b4 Kh5-g4 Kf2-g2 Be6-d5+ Kg2-f2 Kg4-h4
Bb4-e7 Bd5-c4 Be7-b4 Bc4-e6 Kf2-g1 Be6-h3 Bb4-e1+ Kh4-h5) -1.85/24} Bc4
{(f1c4e7d8 f5g6 h2h3 c4f1 d4c5 f6f5 h3h4 g5g4 d8c7 g6h5 c7g3) +3.48/17 1}
47. Bd8 {(Be7-d6 Kf5-g4 Kd4-e3 f6-f5 Bd6-a3 f5-f4+ Ke3-f2 Kg4-h5 Kf2-f3
Bc4-d5+ Kf3-e2 Bd5-e6 Ke2-f2 Kh5-g4 Ba3-b4 Kg4-h4 Kf2-g1 Be6-h3 Bb4-e1+
Kh4-h5 Kg1-f2 Kh5-g6 Be1-b4 Kg6-f5 Bb4-f8 Kf5-g4 Bf8-e7 Kg4-h4 Kf2-e1
Bh3-e6 Ke1-f2) -1.99/28 1} Be2 {(c4e2 d4e3 e2f1 d8a5 f5g4 a5b4 f6f5 e3d4
f5f4 b4e7 f4f3) +3.73/17} 48. Be7 {(Kd4-c5 Kf5-g6 Kc5-d4 Be2-f1 Bd8-e7
f6-f5 Kd4-e3 f5-f4+ Ke3-e4 Bf1-g2+ Ke4-d3 Kg6-f5 Kd3-e2 Bg2-h3 Ke2-d3
Bh3-f1+ Kd3-d2 g5-g4 Kd2-e1 Bf1-h3 Be7-d6 Kf5-e4 Bd6-c5 Ke4-d5 Bc5-e7
Kd5-e5 Ke1-f2 Ke5-f5 Be7-b4 Kf5-e4 Bb4-c3) -2.25/25} Ke6 {(f5e6 e7b4 f6f5
h2h3 f5f4 b4a5 e2h5 d4c5 h5e8 c5d4 e6f5 a5e1 e8f7 d4c5) +3.95/21 1} 49. Bb4
{(Be7-a3 f6-f5 Kd4-e3 Be2-g4 Ke3-f2 Bg4-h3 Ba3-c1 f5-f4 Kf2-f3 Ke6-f5
Bc1-a3 g5-g4+ Kf3-e2 Kf5-e4 Ba3-c5 Ke4-d5 Bc5-f2 Bh3-g2 Bf2-h4 Kd5-e4
Bh4-e1 Bg2-h3 Be1-h4 Ke4-f5 Ke2-d3 Kf5-e5 Kd3-c2 Ke5-e4 Kc2-d2) -1.84/26 1}
f5 {(f6f5) +3.97/20 1} 50. Be1 {(Kd4-e3 Be2-f1 Ke3-f3 Bf1-h3 Bb4-d2 f5-f4
Kf3-e4 Bh3-g2+ Ke4-d4 Ke6-f5 Bd2-b4 Bg2-f1 Kd4-c3 Kf5-e4 Kc3-d2 Ke4-f3
Bb4-e7 g5-g4 Be7-d6 Kf3-e4 Bd6-c5 Bf1-h3 Bc5-b6 Ke4-d5 Kd2-c3 Bh3-f1 Kc3-b3
Kd5-e4 Kb3-a3 Ke4-f3 Bb6-c7) -1.84/26 1} f4 {(f5f4 h2h3 e6f5 d4c5 e2d3 c5d4
d3c4 e1f2 c4f1 h3h4 g5g4 h4h5 g4g3 f2e1 g3g2 e1f2 f1e2 h5h6 f5g6 d4e4 g6h6
e4f4 b5b4) +4.00/19} 51. h4 {(Be1-a5 Ke6-f5 Kd4-c3 Kf5-g4 Kc3-d2 Kg4-f3
Ba5-d8 g5-g4 Bd8-g5 b5-b4 Kd2-e1 Be2-c4 Ke1-d1 Kf3-e4 Bg5-d8 b4-b3 Bd8-b6
Ke4-f3 Bb6-c7 b3-b2 Kd1-c2 Kf3-e3 Bc7-b6+ Ke3-e2 Bb6-a7 f4-f3 Kc2xb2 f3-f2)
-1.84/24 1} gxh4 {(g5h4 e1h4 b5b4 h4e1 b4b3 e1c3 e2f3 c3a1 f3g2 a1c3 e6f5
c3b2 f4f3 d4e3 f5e6 b2c3) +4.46/21} 52. Bxh4 {(Be1-a5 h4-h3 Kd4-c3 h3-h2
Kc3-d2 h2-h1Q Kd2xe2 Qh1-e4+ Ke2-f2 Qe4-d4+ Kf2-g2 b5-b4 Kg2-f3 Qd4-d5+
Kf3-g4 Qd5-f5+ Kg4-h4 Qf5xa5 Kh4-g4 b4-b3 Kg4-h3 b3-b2 Kh3-g4 b2-b1Q
Kg4xf4) -3.87/18 1} b4 {(b5b4 h4e1 b4b3 e1c3 e2f3 d4c4 f3d5 c4d4 f4f3 d4e3
d5e4 c3a1 e6d5 a1g7 d5c4) +4.41/22} 53. Be1 {(Bh4-d8 f4-f3) -3.65/19 1} b3
{(b4b3 e1c3 e2f3 c3b2 f3c6 d4c4 f4f3 c4d3 e6d6 d3e3 c6e4 b2g7 d6d5)
+4.29/22} 54. Kc3 {(Be1-h4 b3-b2 Kd4-c3 b2-b1Q Bh4-g5 Qb1-c1+ Kc3-b3
Qc1-c4+ Kb3-b2 Be2-d1 Kb2-a1 Qc4-b3 Bg5-f6 Bd1-c2 Bf6-b2 Qb3-a4+ Bb2-a3
Qa4xa3+) -47.36/22 1} Bc4 {(e2c4 e1f2 e6d5 f2h4 d5e4 h4f2 c4d5 f2a7 d5c4)
+4.49/21 1} 55. Bf2 {(Kc3-b2) -50.65/21 1} Kd5 {(e6d5c3d2 f4f3 d2c3 d5e4
f2a7 e4f4 a7f2 f4g4) +4.51/20 1} 56. Kb2 {(Bf2-b6 Kd5-e4 Bb6-a7 f4-f3
Ba7-c5 Ke4-f4 Bc5-a3 Kf4-g4 Kc3xc4 f3-f2 Kc4-b4 f2-f1Q Kb4xb3 Qf1-b5+
Kb3-c3 Qb5-e5+ Kc3-b3 Qe5-e6+ Kb3-a4 Qe6-c4+ Ka4-a5 Qc4-d5+ Ka5-a4 Qd5-c6+
Ka4-b3 Qc6-f3+ Kb3-a2 Qf3-d5+ Ka2-b2 Qd5-g2+ Kb2-b3 Qg2-b7+ Kb3-c4 Qb7-e4+
Kc4-b3 Qe4-f5 Ba3-b4 Qf5-d3+ Kb3-b2 Qd3-c4 Bb4-d2) -50.65/26} Ke4
{(d5e4b2c3 e4f3 f2d4 c4d5 d4g1 f3e2 c3b2 f4f3 b2c1 f3f2 g1f2 e2f2) +6.31/19
1} 57. Bb6 {(Bf2-a7 f4-f3 Kb2-c3 Ke4-f4 Ba7-d4 Bc4-d5 Bd4-f2 Kf4-g4 Bf2-e3
Kg4-g3 Kc3-b2 Kg3-g2 Kb2-c3 f3-f2 Be3xf2 Kg2xf2 Kc3-d3 Bd5-c4+ Kd3-c3
Kf2-e3 Kc3xc4 b3-b2 Kc4-d5 Ke3-f4 Kd5-e6 Kf4-f3 Ke6-f7 b2-b1Q Kf7-e8
Qb1-b5+ Ke8-f8 Qb5-b8+ Kf8-f7 Qb8-b7+ Kf7-g6 Qb7-b1+ Kg6-f7 Qb1-f5+ Kf7-g7
Kf3-g4) -119.94/25 1} Kd3 {(e4d3 b6f2 d3e2 f2a7 c4f7 a7d4 f4f3 b2c3 f3f2
d4f2 e2f2 c3d3 f7d5) +7.34/22} 58. Bf2 {(Bb6-c7 f4-f3 Bc7-g3 Kd3-e2 Kb2-c3
f3-f2 Bg3xf2 Ke2xf2 Kc3-b2 Kf2-e3 Kb2-c3 Ke3-e4 Kc3-d2 Ke4-d4 Kd2-d1 Kd4-d3
Kd1-e1 b3-b2 Ke1-f2 Kd3-e4 Kf2-g3 Ke4-f5 Kg3-f2 b2-b1Q Kf2-e3 Qb1-e4+
Ke3-d2 Qe4-d3+ Kd2-c1 Qd3-e2 Kc1-b1 Bc4-b3 Kb1-a1 Qe2-a2+) -M17/27 White
resigns} 0-1
[pgn][Event "Romitest"]
[Site "MASTER"]
[Date "2017.12.14"]
[Round "12"]
[White "Stockfish_8_x64_popcnt"]
[Black "RomiChess64P3n2"]
[Result "0-1"]
[BlackElo "2200"]
[ECO "B79"]
[Opening "Sicilian"]
[Time "19:54:40"]
[Variation "Dragon, Yugoslav, Old Main Line, 12.h4 Ne5 13.Kb1 Nc4"]
[WhiteElo "2200"]
[TimeControl "10+1"]
[Termination "normal"]
[PlyCount "115"]
[WhiteType "human"]
[BlackType "human"]

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 g6 6. Be3 Bg7 7. f3 O-O 8.
Qd2 Nc6 9. Bc4 Bd7 10. O-O-O Qa5 11. Bb3 Rfc8 12. Kb1 Ne5 13. Bg5 Rc5 14.
f4 Nc4 15. Bxc4 Rxc4 16. Nb3 Qxc3 17. bxc3 {(Rh1-e1 Qc3xd2 Nb3xd2 Rc4-d4
c2-c3 Rd4-a4 Bg5xf6 Bg7xf6 Nd2-f3 Bf6-g7 Nf3-d4 Bg7-h6 g2-g3 Bd7-g4 Rd1-d3
Bh6-g7 e4-e5 d6xe5 f4xe5 Ra4-a6 Kb1-a1 h7-h6 Ka1-b1 Ra8-c8) -0.71/20 2}
Nxe4 18. Qe3 {(Bg5xe7 Ne4xd2+ Nb3xd2 Rc4xf4 Be7xd6 Rf4-f2 Rh1-g1 Ra8-c8
Nd2-b3 Bd7-g4 Rd1-f1 Rf2-e2 Rf1-f4 Bg4-e6 Nb3-d4 Bg7xd4 Rf4xd4 Rc8xc3
Rg1-c1 Re2xg2 Kb1-b2 Rc3-c8 Bd6-g3 Rg2-e2) -0.43/20 4} Nxc3+ 19. Kc1
{(Kb1-b2 Nc3xd1+ Kb2-c1 Nd1xe3 Kc1-d2 Ne3-d5 g2-g3 Ra8-c8 Rh1-c1 Bd7-f5
Kd2-e2 Rc4xc2+ Rc1xc2 Rc8xc2+ Ke2-f3 Rc2xh2 g3-g4 Rh2-h3+ Kf3-g2 Bf5xg4
Nb3-a5 Rh3-a3 Na5xb7) -0.47/16} Nxa2+ 20. Kb1 {(Kc1-d2 Bd7-f5 g2-g4 Bf5xg4
Rh1-g1 Bg4xd1 Rg1xd1 e7-e6 Kd2-e1 Na2-c3 Qe3-d3 Ra8-c8 Rd1-d2 h7-h6 Bg5-e7
Nc3-d5 Be7xd6 Bg7-c3 Bd6-e5 Nd5xf4 Be5xf4 Rc4xf4 Qd3-d7 Bc3xd2+ Nb3xd2)
-0.42/20 2} Nc3+ 21. Kc1 {(Kb1-b2 Nc3xd1+ Kb2-c1 Nd1xe3 Kc1-d2 Ne3-d5
Rh1-f1 Ra8-c8 Rf1-f2 h7-h6 Bg5-h4 Nd5xf4 Kd2-d1 Nf4-d5 Bh4xe7 Bd7-g4+
Kd1-c1 Nd5xe7) -0.55/15} Rac8 22. Rd2 {(Rh1-f1 Nc3xd1 Kc1xd1 Rc4xc2 f4-f5
Rc2-b2 f5xg6 Rb2-b1+ Nb3-c1 Bd7-g4+ Kd1-e1 Rc8xc1+ Qe3xc1 Rb1xc1+ Bg5xc1
h7xg6 Bc1-d2 Bg4-e6 Ke1-f2 b7-b5 Rf1-c1 Be6-c4 Rc1-e1 Bg7-f6 h2-h3 Bc4-d5
Kf2-g3 Kg8-f8 Re1-b1 Bf6-e5+ Kg3-f2) 0.00/19 5} Bf5 23. Qxa7 {(Rh1-g1)
-1.56/21 9} Rb4 24. Qa5 {(Bg5xe7 Rb4xb3 c2xb3 Nc3-b5+ Kc1-d1 Nb5xa7 Rd2xd6
Na7-c6 Be7-f6 Bf5-g4+ Kd1-e1 Bg7xf6 Rd6xf6 Rc8-d8 h2-h3 Rd8-d1+ Ke1-f2
Rd1xh1 h3xg4 Rh1-b1 Rf6-d6 Rb1-b2+ Kf2-f3 Rb2xb3+ Kf3-f2 Kg8-f8 Rd6-d7
b7-b5 Rd7-b7 Rb3-b2+ Kf2-f3 b5-b4 g2-g3 Rb2-c2 Kf3-e3 Rc2-c3+ Ke3-f2)
-1.54/22 3} Ra4 25. Qxa4 {(g2-g4 Ra4xa5 Nb3xa5 Rc8-a8 Na5-b3 Bf5-e6 Bg5xe7
Be6xb3 Rd2xd6 Bb3-e6 Rd6-d8+ Ra8xd8 Be7xd8 Be6xg4 Kc1-d2 Bg4-f5 Rh1-e1
Nc3-e4+ Kd2-e2 Ne4-c5 Ke2-d2 Nc5-e6 Bd8-b6 Bg7-f8 Re1-b1 Ne6xf4 Bb6-e3
Nf4-d5 Rb1xb7 Nd5xe3 Kd2xe3 Bf8-d6 Ke3-f3 Bd6xh2) -1.52/21 1} Nxa4 26. Bxe7
{(Kc1-b1 Bg7-c3 Rd2-f2 f7-f6 Bg5-h4 Bc3-b4 Rh1-g1 Bf5-e4 g2-g4 Be4-d5
Nb3-d2 b7-b5 Kb1-c1 Bb4-a3+ Kc1-d1 Na4-b2+ Kd1-c1 Nb2-d3+ Kc1-d1 Nd3xf2+
Bh4xf2 e7-e5 Bf2-e3 Ba3-b4 f4-f5 g6xf5) -1.57/20 3} Bb2+ {(g7b2 c1b1 c8c3
b1a2 f5c2 e7d6 c2b3 a2b1 b3e6 d2b2 e6f5 b1a1 a4b2 a1b2 c3c2 b2b3 c2g2 b3b4
g8g7) +1.86/23 7} 27. Kb1 {(Kc1-d1 Na4-c3+ Kd1-e1 Rc8-e8 Ke1-f1 Re8xe7
Rh1-g1 Bf5-e6 g2-g4 Kg8-g7 f4-f5 Be6-c4+ Kf1-g2 Nc3-e4 Rd2-d1 Bc4-e2 Rd1-e1
Be2xg4 f5xg6 h7xg6) -1.63/16} Rc3 {(c8c3 b1a2 f5c2 e7d6 c2b3 a2b1 b3e6 d2b2
e6f5 b1a1 a4b2 a1b2 c3c2 b2b3 c2g2 b3b4 g8g7) +1.86/24 9} 28. Ka2 {(Nb3-c1)
-1.64/22 2} Bxc2 {(f5c2e7d6 c2b3 a2b1 b3e6 d2b2 e6f5 b1a1 a4b2 a1b2 c3c2
b2b3 c2g2 b3b4 g8g7) +1.86/23 7} 29. Bxd6 {(h2-h4 Bc2xb3+ Ka2-b1 Bb3-a2+
Kb1xa2 Rc3-a3+ Ka2-b1 Ra3-a1+ Kb1-c2 Ra1xh1 Rd2xd6 Kg8-g7 Rd6-d5 Rh1-c1+
Kc2-b3 Na4-b6 Rd5-d8 Nb6-c4 Rd8-d3 Bb2-f6 Be7xf6+ Kg7xf6 Kb3-b4 Kf6-f5
h4-h5 f7-f6 h5xg6 h7xg6) -1.58/18} Bxb3+ {(c2b3a2b1 c3c6 d2b2 a4b2 b1b2
b3d5 d6b4 d5g2 h1d1 b7b5 d1d7 c6e6 b4c3 g8f8 b2b3 f8e8 d7a7 e6e2 b3b4 g2c6
a7c7) +2.16/23 7} 30. Kb1 {(Ka2-b1) -1.64/1} Rc6 {(c3c6 d2b2) +2.11/22 3}
31. Rxb2 {(Rh1-e1 Bb3-e6 Bd6-e5 Bb2xe5 f4xe5 b7-b5 Rd2-d8+ Kg8-g7 Rd8-b8
Be6-f5+ Kb1-a2 b5-b4 Rb8xb4 Rc6-a6 Rb4-d4 Na4-c5+ Ka2-b2 Nc5-d3+ Rd4xd3
Bf5xd3 g2-g4 Ra6-a4 Kb2-c3 Bd3-e4) -1.39/18 2} Nxb2 {(a4b2b1b2 b3d5 d6e5)
+2.15/21 3} 32. Kxb2 {(h2-h4 Nb2-d3 Bd6-a3 Bb3-c2+ Kb1-a2 b7-b5 Rh1-h3
Rc6-c4 Rh3xd3 Bc2xd3 Ba3-d6 Rc4-c2+ Ka2-b3 Rc2xg2 Kb3-b4 Rg2-g4 Kb4-c3
Bd3-c4 Kc3-d4 Rg4xh4 Kd4-e5 Kg8-g7 Bd6-e7 Rh4-h5+ Ke5-e4 Rh5-h2 Be7-b4
h7-h5) -1.43/19 1} Bd5 {(b3d5 d6e7 d5g2 h1d1 g2e4 d1d7 b7b5 b2b3 g8g7 d7d4
c6e6 e7d6 g7f6 b3b4 e4c6 d4d2 e6e1 h2h4) +2.08/18 1} 33. Be5 {(Bd6-b8
Bd5xg2) -1.33/19 1} f6 {(f7f6e5c3 d5g2 h1d1 b7b5 d1d7 h7h5 d7b7 g2f1 b2b3
f1d3 h2h3 g8f8 b7d7 d3e4 c3b4 f8g8) +2.18/18 1} 34. Bc3 {(Rh1-d1 f6xe5
Rd1xd5 e5xf4 Rd5-d4 Rc6-e6 Rd4xf4 Re6-e2+ Kb2-c3 Re2xg2 h2-h4 h7-h5 Rf4-f6
Rg2-g4 Rf6-b6 Kg8-g7 Rb6xb7+ Kg7-h6 Kc3-d3 Rg4xh4 Kd3-e3 g6-g5) -1.29/18 1}
Bxg2 {(d5g2) +2.20/17 1} 35. Rd1 {(Rh1-c1 Rc6-b6+ Kb2-c2 Kg8-f7 Kc2-d3
Rb6-b5 Bc3-e1 Bg2-c6 Rc1-a1 Kf7-e6 Ra1-a5 Rb5-b1 Be1-g3 Rb1-f1 Kd3-e2
Rf1-f3 Ra5-a8 h7-h5 Ra8-a2 Ke6-f5 Ra2-b2 h5-h4 Bg3xh4 Rf3xf4) -1.35/18 1}
Bh3 {(g2h3d1d4 c6b6 b2a3 g8f7 a3a4 h3f5 c3b4 h7h5 d4d2 f5e4 d2d4 b6a6 a4b3
e4f5 d4d5 f7e6) +2.49/16 1} 36. Rd4 {(Rd1-c1 Rc6-d6 Bc3-e1 Rd6-b6+ Kb2-c3
Bh3-g2 Rc1-d1 Rb6-c6+ Kc3-b4 Bg2-h3 Rd1-d3 Bh3-f5 Rd3-c3 Rc6-b6+ Kb4-a3
Kg8-f7 Be1-f2 Rb6-a6+ Ka3-b2 Bf5-e4 Rc3-c7+ Kf7-e6 Rc7xh7 Ke6-f5) -1.23/18
1} Rb6+ {(c6b6 b2a3 g8f7 a3a4 f7e6 a4a5 b6c6 a5b4 h7h5 d4c4 c6c4 b4c4 h3f1)
+2.69/15} 37. Rb4 {(Bc3-b4 Kg8-f7) -1.02/20 1} Rxb4+ {(b6b4 c3b4 g8f7 b2b3
b7b5 b4c5 h3f1 b3c3 f7e6 c5d4 f1c4) +3.42/20} 38. Bxb4 {(Kb2-c1 Rb4xf4
Kc1-d2 Kg8-f7 Bc3-a5 b7-b5 Kd2-c3 Rf4-c4+ Kc3-b2 f6-f5 Ba5-b6 f5-f4 Bb6-g1
f4-f3 Kb2-a3 Bh3-e6 Ka3-b2 Be6-f5 Kb2-b3 Rc4-c2 Kb3-b4 f3-f2 Bg1xf2 Rc2xf2
h2-h3 Bf5xh3 Kb4xb5 Kf7-e6 Kb5-c5 Rf2-f4 Kc5-b5 Ke6-d5) -1.09/22 1} Kf7
{(g8f7 b2c3 f7e6 c3c4 e6f5 b4d2 h3f1 c4c5 b7b5 d2c3 h7h6 h2h4 f1d3 c5b4
d3c4 c3d4 c4d3) +3.12/21} 39. Kc3 {(Bb4-c3 Kf7-e6 Kb2-c2 f6-f5 Kc2-b3
Bh3-f1 Kb3-b4 Ke6-d5 Bc3-f6 Kd5-e4 Bf6-g5 b7-b6 Kb4-c3 Ke4-e3 Bg5-h6 Ke3-f3
Bh6-g5 Kf3-g2 h2-h4 Kg2-g3 Kc3-b2 Kg3-f3 Kb2-c3 Kf3-g4) -1.31/23} Ke6
{(f7e6c3c4 e6f5 b4c3 h3f1 c4c5 b7b5 h2h4 f1d3 c3d4 d3c4 d4b2 h7h6 b2d4 c4d3
c5b4) +3.14/20 1} 40. Bf8 {(Kc3-d3 Ke6-f5 Bb4-c3 h7-h6 Kd3-e3 g6-g5 f4xg5
h6xg5 Bc3-d4 b7-b5 Bd4-b6 Kf5-g4 Ke3-f2 f6-f5 Bb6-c5 f5-f4 Bc5-f8 Kg4-h5
Kf2-f3 Bh3-e6 Kf3-e4 Kh5-g6 Bf8-e7 Be6-f5+ Ke4-e5 Bf5-h3) -1.34/21 1} b5
{(b7b5 c3d4) +2.99/18 1} 41. Kb4 {(Kc3-d2 Ke6-f5 Bf8-e7 h7-h6 Kd2-c3 Bh3-f1
Kc3-d4 g6-g5 f4xg5 h6xg5 Kd4-e3 Bf1-h3 Ke3-d4 Bh3-g2 Kd4-c5 Bg2-f1 Kc5-d4
Kf5-g6 Kd4-e3 Bf1-h3 Ke3-f2 Kg6-f5 Kf2-f3 Kf5-e5 Kf3-g3) -1.52/22 1} Bf1
{(h3f1 h2h4 e6f5 f8e7 f1c4 b4c5 c4e2 c5b4 h7h6 b4c5 h6h5) +2.96/19} 42. Kc5
{(Kb4-b3 Ke6-f5 Bf8-e7 h7-h6 Kb3-c2 g6-g5 Kc2-d2 g5xf4 Kd2-e1 Bf1-c4 Ke1-f2
Bc4-d5 Be7-d6 Kf5-e4 Bd6-e7 h6-h5 Be7-d6 Bd5-e6 Bd6-e7 Ke4-e5 Kf2-f3
Be6-d5+ Kf3-f2 Ke5-f5 Be7-b4 Bd5-c4 Bb4-e7 Kf5-e4) -1.70/27 1} Kf5 {(e6f5
f8d6 h7h6 c5d4 g6g5 f4g5 f6g5 d6b4 g5g4 b4d6 h6h5 d4d5 h5h4 d5d4 f1c4)
+3.03/18} 43. Bd6 {(Bf8-h6 Kf5-g4) -1.77/24} h6 {(h7h6 d6e7 g6g5 f4g5 h6g5
c5d5 f1e2 d5c5 f5e5 e7d6 e5e6 d6c7 e2c4) +3.06/18 1} 44. Be7 {(Bd6-c7
Kf5-e4) -1.59/26 1} g5 {(g6g5 f4g5 h6g5 c5d5 f1e2 d5c5 f5e6 e7d8 e6e5 d8c7
e5f5 h2h3 f5e6 c7d8 e6e5) +3.14/19} 45. fxg5 {(Kc5-d5 g5xf4 Kd5-d4 h6-h5
Be7-d8 Bf1-e2 Bd8-b6 Kf5-g4 Bb6-c7 f6-f5 Bc7-d6 Be2-f1 Bd6-e5 b5-b4 Be5-b8
Bf1-e2 h2-h3+ Kg4xh3 Bb8xf4 Kh3-g4 Bf4-e5 h5-h4 Kd4-e3 Be2-c4 Ke3-f2 f5-f4
Be5-d4) -1.85/22 1} hxg5 {(h6g5 c5d5 f1e2 d5c5 f5e6 e7d8 e6f7 c5b4 f7g6
d8b6 f6f5 b6c7 g6h5 c7e5 f5f4 b4c5 f4f3 c5b4 h5h4) +3.48/19 1} 46. Kd4
{(Be7-d6 Kf5-g4 Bd6-e7 f6-f5 Kc5-d5 f5-f4 Kd5-e4 Bf1-g2+ Ke4-d3 Kg4-h5
Kd3-e2 Bg2-d5 Ke2-f2 Bd5-e6 Be7-b4 Kh5-g4 Kf2-g2 Be6-d5+ Kg2-f2 Kg4-h4
Bb4-e7 Bd5-c4 Be7-b4 Bc4-e6 Kf2-g1 Be6-h3 Bb4-e1+ Kh4-h5) -1.85/24} Bc4
{(f1c4e7d8 f5g6 h2h3 c4f1 d4c5 f6f5 h3h4 g5g4 d8c7 g6h5 c7g3) +3.48/17 1}
47. Bd8 {(Be7-d6 Kf5-g4 Kd4-e3 f6-f5 Bd6-a3 f5-f4+ Ke3-f2 Kg4-h5 Kf2-f3
Bc4-d5+ Kf3-e2 Bd5-e6 Ke2-f2 Kh5-g4 Ba3-b4 Kg4-h4 Kf2-g1 Be6-h3 Bb4-e1+
Kh4-h5 Kg1-f2 Kh5-g6 Be1-b4 Kg6-f5 Bb4-f8 Kf5-g4 Bf8-e7 Kg4-h4 Kf2-e1
Bh3-e6 Ke1-f2) -1.99/28 1} Be2 {(c4e2 d4e3 e2f1 d8a5 f5g4 a5b4 f6f5 e3d4
f5f4 b4e7 f4f3) +3.73/17} 48. Be7 {(Kd4-c5 Kf5-g6 Kc5-d4 Be2-f1 Bd8-e7
f6-f5 Kd4-e3 f5-f4+ Ke3-e4 Bf1-g2+ Ke4-d3 Kg6-f5 Kd3-e2 Bg2-h3 Ke2-d3
Bh3-f1+ Kd3-d2 g5-g4 Kd2-e1 Bf1-h3 Be7-d6 Kf5-e4 Bd6-c5 Ke4-d5 Bc5-e7
Kd5-e5 Ke1-f2 Ke5-f5 Be7-b4 Kf5-e4 Bb4-c3) -2.25/25} Ke6 {(f5e6 e7b4 f6f5
h2h3 f5f4 b4a5 e2h5 d4c5 h5e8 c5d4 e6f5 a5e1 e8f7 d4c5) +3.95/21 1} 49. Bb4
{(Be7-a3 f6-f5 Kd4-e3 Be2-g4 Ke3-f2 Bg4-h3 Ba3-c1 f5-f4 Kf2-f3 Ke6-f5
Bc1-a3 g5-g4+ Kf3-e2 Kf5-e4 Ba3-c5 Ke4-d5 Bc5-f2 Bh3-g2 Bf2-h4 Kd5-e4
Bh4-e1 Bg2-h3 Be1-h4 Ke4-f5 Ke2-d3 Kf5-e5 Kd3-c2 Ke5-e4 Kc2-d2) -1.84/26 1}
f5 {(f6f5) +3.97/20 1} 50. Be1 {(Kd4-e3 Be2-f1 Ke3-f3 Bf1-h3 Bb4-d2 f5-f4
Kf3-e4 Bh3-g2+ Ke4-d4 Ke6-f5 Bd2-b4 Bg2-f1 Kd4-c3 Kf5-e4 Kc3-d2 Ke4-f3
Bb4-e7 g5-g4 Be7-d6 Kf3-e4 Bd6-c5 Bf1-h3 Bc5-b6 Ke4-d5 Kd2-c3 Bh3-f1 Kc3-b3
Kd5-e4 Kb3-a3 Ke4-f3 Bb6-c7) -1.84/26 1} f4 {(f5f4 h2h3 e6f5 d4c5 e2d3 c5d4
d3c4 e1f2 c4f1 h3h4 g5g4 h4h5 g4g3 f2e1 g3g2 e1f2 f1e2 h5h6 f5g6 d4e4 g6h6
e4f4 b5b4) +4.00/19} 51. h4 {(Be1-a5 Ke6-f5 Kd4-c3 Kf5-g4 Kc3-d2 Kg4-f3
Ba5-d8 g5-g4 Bd8-g5 b5-b4 Kd2-e1 Be2-c4 Ke1-d1 Kf3-e4 Bg5-d8 b4-b3 Bd8-b6
Ke4-f3 Bb6-c7 b3-b2 Kd1-c2 Kf3-e3 Bc7-b6+ Ke3-e2 Bb6-a7 f4-f3 Kc2xb2 f3-f2)
-1.84/24 1} gxh4 {(g5h4 e1h4 b5b4 h4e1 b4b3 e1c3 e2f3 c3a1 f3g2 a1c3 e6f5
c3b2 f4f3 d4e3 f5e6 b2c3) +4.46/21} 52. Bxh4 {(Be1-a5 h4-h3 Kd4-c3 h3-h2
Kc3-d2 h2-h1Q Kd2xe2 Qh1-e4+ Ke2-f2 Qe4-d4+ Kf2-g2 b5-b4 Kg2-f3 Qd4-d5+
Kf3-g4 Qd5-f5+ Kg4-h4 Qf5xa5 Kh4-g4 b4-b3 Kg4-h3 b3-b2 Kh3-g4 b2-b1Q
Kg4xf4) -3.87/18 1} b4 {(b5b4 h4e1 b4b3 e1c3 e2f3 d4c4 f3d5 c4d4 f4f3 d4e3
d5e4 c3a1 e6d5 a1g7 d5c4) +4.41/22} 53. Be1 {(Bh4-d8 f4-f3) -3.65/19 1} b3
{(b4b3 e1c3 e2f3 c3b2 f3c6 d4c4 f4f3 c4d3 e6d6 d3e3 c6e4 b2g7 d6d5)
+4.29/22} 54. Kc3 {(Be1-h4 b3-b2 Kd4-c3 b2-b1Q Bh4-g5 Qb1-c1+ Kc3-b3
Qc1-c4+ Kb3-b2 Be2-d1 Kb2-a1 Qc4-b3 Bg5-f6 Bd1-c2 Bf6-b2 Qb3-a4+ Bb2-a3
Qa4xa3+) -47.36/22 1} Bc4 {(e2c4 e1f2 e6d5 f2h4 d5e4 h4f2 c4d5 f2a7 d5c4)
+4.49/21 1} 55. Bf2 {(Kc3-b2) -50.65/21 1} Kd5 {(e6d5c3d2 f4f3 d2c3 d5e4
f2a7 e4f4 a7f2 f4g4) +4.51/20 1} 56. Kb2 {(Bf2-b6 Kd5-e4 Bb6-a7 f4-f3
Ba7-c5 Ke4-f4 Bc5-a3 Kf4-g4 Kc3xc4 f3-f2 Kc4-b4 f2-f1Q Kb4xb3 Qf1-b5+
Kb3-c3 Qb5-e5+ Kc3-b3 Qe5-e6+ Kb3-a4 Qe6-c4+ Ka4-a5 Qc4-d5+ Ka5-a4 Qd5-c6+
Ka4-b3 Qc6-f3+ Kb3-a2 Qf3-d5+ Ka2-b2 Qd5-g2+ Kb2-b3 Qg2-b7+ Kb3-c4 Qb7-e4+
Kc4-b3 Qe4-f5 Ba3-b4 Qf5-d3+ Kb3-b2 Qd3-c4 Bb4-d2) -50.65/26} Ke4
{(d5e4b2c3 e4f3 f2d4 c4d5 d4g1 f3e2 c3b2 f4f3 b2c1 f3f2 g1f2 e2f2) +6.31/19
1} 57. Bb6 {(Bf2-a7 f4-f3 Kb2-c3 Ke4-f4 Ba7-d4 Bc4-d5 Bd4-f2 Kf4-g4 Bf2-e3
Kg4-g3 Kc3-b2 Kg3-g2 Kb2-c3 f3-f2 Be3xf2 Kg2xf2 Kc3-d3 Bd5-c4+ Kd3-c3
Kf2-e3 Kc3xc4 b3-b2 Kc4-d5 Ke3-f4 Kd5-e6 Kf4-f3 Ke6-f7 b2-b1Q Kf7-e8
Qb1-b5+ Ke8-f8 Qb5-b8+ Kf8-f7 Qb8-b7+ Kf7-g6 Qb7-b1+ Kg6-f7 Qb1-f5+ Kf7-g7
Kf3-g4) -119.94/25 1} Kd3 {(e4d3 b6f2 d3e2 f2a7 c4f7 a7d4 f4f3 b2c3 f3f2
d4f2 e2f2 c3d3 f7d5) +7.34/22} 58. Bf2 {(Bb6-c7 f4-f3 Bc7-g3 Kd3-e2 Kb2-c3
f3-f2 Bg3xf2 Ke2xf2 Kc3-b2 Kf2-e3 Kb2-c3 Ke3-e4 Kc3-d2 Ke4-d4 Kd2-d1 Kd4-d3
Kd1-e1 b3-b2 Ke1-f2 Kd3-e4 Kf2-g3 Ke4-f5 Kg3-f2 b2-b1Q Kf2-e3 Qb1-e4+
Ke3-d2 Qe4-d3+ Kd2-c1 Qd3-e2 Kc1-b1 Bc4-b3 Kb1-a1 Qe2-a2+) -M17/27 White
resigns} 0-1
[/pgn]

RomiChess won its first game against SF 8 after just 6 games with the black pieces! SF 8 was using 8 threads. Romi was using both types of learning. What more do you guys at the top want? Start giving your fans what they want! :idea:
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
shrapnel
Posts: 1339
Joined: Fri Nov 02, 2012 9:43 am
Location: New Delhi, India

Re: Understanding the power of reinforcement learning

Post by shrapnel »

Michael Sherwin wrote:
FWCC wrote:Is Romi a Winboard engine only? Is there a UCI version? Or must I use a Wb2Uci adapter?
UCI does not send a result command when a game ends, unless that has changed in the last 10 years. Since Romi uses the result command sent by the Winboard protocol to trigger learning, RomiChess is a Winboard engine.

Arena works for computer vs computer games but does not send a result command in human vs computer games, unless that has changed in the last 10 years. For computer vs human games use Winboard. Then either play to checkmate or resign, or, if the human is winning, Romi will resign. Winboard will only send the result command if the game is officially over.
It would be great if you made a UCI engine, as Komodo and Houdini don't seem very interested, since it doesn't appear that AlphaZero will be commercialized any time soon. So they have no competition, and we the end-users are forced to use bullock-cart-age chess engines.
i7 5960X @ 4.1 Ghz, 64 GB G.Skill RipJaws RAM, Twin Asus ROG Strix OC 11 GB Geforce 2080 Tis
Ozymandias
Posts: 1535
Joined: Sun Oct 25, 2009 2:30 am

Re: Understanding the power of reinforcement learning

Post by Ozymandias »

shrapnel wrote:the end-users are forced to use bullock-cart-age chess engines.
I wouldn't put it like that, but I share the sentiment. Hopefully, some part of the enthusiasm that's taken over this board (almost half of the new threads have something to do with AZ) will trickle down to at least one of the programmers involved in SF, or any of its forks. I'd be surprised if we don't see some sort of learning being adopted by a fish sooner rather than later.
giovanni
Posts: 142
Joined: Wed Jul 08, 2015 12:30 pm

Re: Understanding the power of reinforcement learning

Post by giovanni »

Michael Sherwin wrote:
giovanni wrote:
Michael Sherwin wrote:The following position is a bit dated, as most strong engines will find the best move using normal search. However, 30 years ago, just throwing a dart at the calendar, the best engines could not find the best move. Even RomiChess in 2006 could not find it; Phalanx 22 could. So this example is a bit dated. In 2005 this position was one that I hoped Romi could find with normal search, but that did not happen. After I added reinforcement learning, and before I added MSMD learning, I tested Romi playing the black pieces to see if Romi could find the best move after training a number of games. It took Romi 40 games to find the best move, but when she found it (learned it due to reinforcement learning) she won every game. I know that TSCP could also find this winning move after enough training games in the position. The point is that if TSCP had reinforcement learning and won a game against SF in this position, it would look superhuman. It would look like TSCP thought like a human and did the 'impossible'. It would look as incredible as AlphaZ, except it would have done it on equal hardware.

[d]r5k1/pp1bppbp/3p1np1/q5B1/2r1PP2/1NN5/PPPQ2PP/1K1R3R b - - 1 16
Thanks Michael. Could you elaborate a little bit more on this post? I mean, how does reinforcement learning apply to this position, what is MSMD, etc.?
MSMD is, as the above post indicates, Monkey See Monkey Do learning. It merely plays winning lines from past experience, up to 180 ply in RomiChess. So Romi can play some very deep lines and use virtually no time on the clock.

How reinforcement learning applies to the Dragon position above is that if black does not find the winning move and plays some other move instead, black's position is losing. When learning is triggered the entire game is overlaid onto the tree stored on the hard disk. For each node stored on the hard disk there is a reinforcement value. The nodes of the winning side are adjusted upwards and the nodes of the losing side are adjusted downward. This means that bad moves can gain value from this and good moves can lose value; over time this corrects itself. Since higher nodes are given a larger reward/penalty, higher nodes affect the search sooner, but eventually the values backpropagate to the root of the current position, and when all the alternatives to the winning move look worse than Qxc3 it will play Qxc3 and win. Since those winning moves then get rewarded, it just keeps playing the winning move as long as it continues to win. And with that line being loaded into the hash before each search, the winning move starts to affect the search from an earlier node in the game. As long as there is any subtree stored in the learn file, no matter how small that subtree might be, those nodes with their accumulated scores affect the search.

The learning in RomiChess was intended as a sparring partner for humans. Winboard sends a result command in human versus computer games; Arena does not, or at least did not 11 years ago. So set up a position, or start from the starting position, and play against Romi, and if you beat Romi, Romi will play differently. Change sides and Romi will play your winning moves against you, and then you will have to win and teach Romi better moves. Then switch sides again, and if you win Romi will learn yet more. If Romi wins then Romi is the teacher. It is hard to put into words, but basically the engine and the human teach each other, and it is especially good for learning a chosen opening. Anyway, in the last 11 years I have received zero reports of Romi being used as intended. That is a shame really, because there is no other training system like it in existence as far as I know.
Hi, Mike. Thanks again for your help with RomiChess. I ran a short 92-game match between RomiChess and Stockfish (4 cores) at 90'' + 2'' increment. The programs alternated playing both sides of the Dragon position. RomiChess started to play Qxc3 very early, probably because it saw that the move was successful in Stockfish's hands. Despite this move, however, the first few games were quite unsuccessful for RomiChess, but toward the end of the match things changed drastically:

00=00=00=0=000=000=0=1=0=0=00=01=111=0==11=11=

The sample size is very small, but the performance over the first 15 games (17%) versus the last 15 (73%) already gets a significant p-value (8e-06) when running a t-test between the two distributions, though I haven't corrected for multiple hypotheses. It seems to me that you clearly have got something, and I was wondering how you would recommend extracting knowledge from this kind of match and making it available to chess players. I mean, in this case it is pretty clear that Qxc3 is the starting move, but is there a way to automatically get a tree of the whole plan that Black needs to follow?
Thanks again for your help.
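
As a side note on those numbers, the result string above can be checked directly. Here is a small sketch (assuming SciPy is installed) that scores each character from Romi's side and runs Welch's t-test between the first and last 15 games; it reproduces roughly the 17% versus 73% split quoted above.

Code:

from scipy import stats

results = "00=00=00=0=000=000=0=1=0=0=00=01=111=0==11=11="
scores = [{"1": 1.0, "=": 0.5, "0": 0.0}[c] for c in results]

first, last = scores[:15], scores[-15:]
print(sum(first) / 15, sum(last) / 15)                  # roughly 0.17 vs 0.73

t, p = stats.ttest_ind(first, last, equal_var=False)    # Welch's two-sample t-test
print(t, p)

Whether a t-test is ideal for such small, discrete samples is debatable; a permutation test or a two-proportion test would be a natural cross-check.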
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Understanding the power of reinforcement learning

Post by Michael Sherwin »

giovanni wrote:
Michael Sherwin wrote:
giovanni wrote:
Michael Sherwin wrote:The following position is a bit dated, as most strong engines will find the best move using normal search. However, 30 years ago, just throwing a dart at the calendar, the best engines could not find the best move. Even RomiChess in 2006 could not find it; Phalanx 22 could. So this example is a bit dated. In 2005 this position was one that I hoped Romi could find with normal search, but that did not happen. After I added reinforcement learning, and before I added MSMD learning, I tested Romi playing the black pieces to see if Romi could find the best move after training a number of games. It took Romi 40 games to find the best move, but when she found it (learned it due to reinforcement learning) she won every game. I know that TSCP could also find this winning move after enough training games in the position. The point is that if TSCP had reinforcement learning and won a game against SF in this position, it would look superhuman. It would look like TSCP thought like a human and did the 'impossible'. It would look as incredible as AlphaZ, except it would have done it on equal hardware.

[d]r5k1/pp1bppbp/3p1np1/q5B1/2r1PP2/1NN5/PPPQ2PP/1K1R3R b - - 1 16
Thanks Michael. Could you elaborate a little bit more on this post? I mean, how does reinforcement learning apply to this position, what is MSMD, etc.?
MSMD is, as the above post indicates, Monkey See Monkey Do learning. It merely plays winning lines from past experience, up to 180 ply in RomiChess. So Romi can play some very deep lines and use virtually no time on the clock.

How reinforcement learning applies to the Dragon position above is that if black does not find the winning move and plays some other move instead, black's position is losing. When learning is triggered the entire game is overlaid onto the tree stored on the hard disk. For each node stored on the hard disk there is a reinforcement value. The nodes of the winning side are adjusted upwards and the nodes of the losing side are adjusted downward. This means that bad moves can gain value from this and good moves can lose value; over time this corrects itself. Since higher nodes are given a larger reward/penalty, higher nodes affect the search sooner, but eventually the values backpropagate to the root of the current position, and when all the alternatives to the winning move look worse than Qxc3 it will play Qxc3 and win. Since those winning moves then get rewarded, it just keeps playing the winning move as long as it continues to win. And with that line being loaded into the hash before each search, the winning move starts to affect the search from an earlier node in the game. As long as there is any subtree stored in the learn file, no matter how small that subtree might be, those nodes with their accumulated scores affect the search.

The learning in RomiChess was intended as a sparring partner for humans. Winboard sends a result command in human versus computer games; Arena does not, or at least did not 11 years ago. So set up a position, or start from the starting position, and play against Romi, and if you beat Romi, Romi will play differently. Change sides and Romi will play your winning moves against you, and then you will have to win and teach Romi better moves. Then switch sides again, and if you win Romi will learn yet more. If Romi wins then Romi is the teacher. It is hard to put into words, but basically the engine and the human teach each other, and it is especially good for learning a chosen opening. Anyway, in the last 11 years I have received zero reports of Romi being used as intended. That is a shame really, because there is no other training system like it in existence as far as I know.
Hi, Mike. Thanks again for your help with RomiChess. I ran a short 92-game match between RomiChess and Stockfish (4 cores) at 90'' + 2'' increment. The programs alternated playing both sides of the Dragon position. RomiChess started to play Qxc3 very early, probably because it saw that the move was successful in Stockfish's hands. Despite this move, however, the first few games were quite unsuccessful for RomiChess, but toward the end of the match things changed drastically:

00=00=00=0=000=000=0=1=0=0=00=01=111=0==11=11=

The sample size is very small, but the performance over the first 15 games (17%) versus the last 15 (73%) already gets a significant p-value (8e-06) when running a t-test between the two distributions, though I haven't corrected for multiple hypotheses. It seems to me that you clearly have got something, and I was wondering how you would recommend extracting knowledge from this kind of match and making it available to chess players. I mean, in this case it is pretty clear that Qxc3 is the starting move, but is there a way to automatically get a tree of the whole plan that Black needs to follow?
Thanks again for your help.
I never really thought far enough ahead to consider mining the learn file for plans. But why not! Just write a standalone program that analyses the learn file to extract what you want.
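
As one possible shape for such a standalone tool, here is a toy sketch that walks a learn tree stored as the dict-of-move-paths used in the earlier sketch (not Romi's actual binary learn.dat format) and follows the most-reinforced reply at every node, which yields the main line of the learned plan.

Code:

def most_reinforced_line(tree, start_path=(), max_plies=40):
    """From start_path, follow the child move with the highest accumulated bonus."""
    path, line = tuple(start_path), []
    for _ in range(max_plies):
        children = {p[-1]: node for p, node in tree.items()
                    if len(p) == len(path) + 1 and p[:len(path)] == path}
        if not children:
            break
        best = max(children, key=lambda m: children[m]["bonus"])
        line.append(best)
        path = path + (best,)
    return line

Starting from the move path that reaches the Dragon position, this would print the line Black has been rewarded for; ranking siblings by bonus instead of taking only the maximum would give the alternatives at each branch.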

RomiChess really does serve the purpose it was designed for. If I were a younger man I'd pick up the ball and run for another 20 yards. I got the first down and now it is time for someone else to get the touchdown. From all the clues I think the A-Team already kicked a field goal. :D
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through