On-line engine blitz tourney August
Moderator: Ras
-
Modern Times
- Posts: 3803
- Joined: Thu Jun 07, 2012 11:02 pm
Re: On-line engine blitz tourney August
I don't think your engine is using a book?
-
flok
Re: On-line engine blitz tourney August
Modern Times wrote: I don't think your engine is using a book?
That's correct. I've disabled it while I'm chasing the evaluation bug (if there is one).
-
flok
Re: On-line engine blitz tourney August
hgm wrote: If you want to know how it does it would be much easier for you to just let it play a match against Fairy-Max locally. Then you are not dependent on people visiting the ICS and willing to challenge you...
Yeah, I've already let it play tens of games against Phalanx via your server (guest-something, lost them all), but I need an experienced eye who can judge if it got better or not.
-
hgm
- Posts: 28454
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: On-line engine blitz tourney August
Telling whether you got better or not is not efficiently done by playing against an engine that crushes you, and then using an 'expert eye' to tell you why. It is done by playing against an opponent of similar strength, and seeing whether you can now beat it where before you lost.
In addition, it is not clear to me why you would want to use the ICS to play Phalanx against DeepBrutePos. Is that because you are running them on different computers? It is usually much easier to play them against each other on the same computer (without invoking an ICS). Especially if you need a few thousand games to get a statistically significant result, rather than just determining which engine was more lucky this time.
So forget about Phalanx and Crafty. Can you beat Fairy-Max? Can you beat TSCP? Can you beat HoiChess?
If you play Chess yourself, your own eye should be experienced enough to tell you why DeepBrutePos lost a game (say the one in the blitz tourney against micro-Max), as it quite simply blundered material away in extremely simple ways (e.g. allowing a skewer on K+Q by a Rook). If you really want to improve it, you should figure out why it allowed that to happen. Why didn't it pick one of the many moves that prevented it?
-
flok
Re: On-line engine blitz tourney August
hgm wrote: Telling whether you got better or not is not efficiently done by playing against an engine that crushes you, and then using an 'expert eye' to tell you why. It is done by playing against an opponent of similar strength, and see if you can now beat it where before you lost.
Yes. I have not yet found an engine playing as badly as the program itself. Playing against itself is not a good test, as "invalid code" playing "invalid code" would give totally different results than "invalid code" against "correct code".
hgm wrote: In addition, it is not clear to me why you would want to use the ICS to play Phalanx against DeepBrutePos. Is that because you are running them on different computers?
Exactly that.
hgm wrote: It is usually much easier to play them against each other on the same computer (without invoking an ICS). Especially if you need a few thousand games to get a statistically significant result, rather than just determining which engine was more lucky this time.
Yes. On the other hand: it is so weak right now that it always fails anyway.
hgm wrote: So forget about Phalanx and Crafty. Can you beat Fairy-Max? Can you beat TSCP? Can you beat HoiChess?
Did not try TSCP, but against all the others it fails miserably.
hgm wrote: If you play Chess yourself, your own eye should be experienced enough to tell you why DeepBrutePos lost a game (say the one in the blitz tourney against micro-Max), as it quite simply blundered material away in extremely simple ways (e.g. allowing a skewer on K+Q by a Rook). If you really want to improve it, you should figure out why it allowed that to happen. Why didn't it pick one of the many moves that prevented it?
I've begun with that.
1 core/thread (although I've verified that with multiple threads it behaves the same), maximum depth 1, then 2 and so on. Show score for each move while it goes through them.
By the way, regarding the pseudo-code at https://en.wikipedia.org/wiki/Alpha%E2% ... Pseudocode I've read in some places that the evaluation at depth x is always from the root-color(!) point of view, and in other places that it is from the point of view of the color which is about to move at that depth. What do you think is the right way?
-
hgm
Re: On-line engine blitz tourney August
It depends on if you use minimax or negamax. The most common implementation is negamax (because then you do not need separate code for odd and even plies), and in that case you have to evaluate from the POV of the side to move in the position you evaluate.
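As a minimal sketch of the negamax convention just described: the toy tree below stores each static score from White's point of view, and evaluate_stm flips the sign when Black is to move. The Node layout, the example tree and all names are invented for illustration and are not taken from any engine in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

typedef struct Node {
    int white_score;      /* static evaluation from White's point of view */
    int white_to_move;    /* 1 if White is to move in this position */
    size_t n_children;
    const struct Node *children[MAX_CHILDREN];
} Node;

/* Negamax needs the evaluation from the side to move's point of view. */
static int evaluate_stm(const Node *n) {
    return n->white_to_move ? n->white_score : -n->white_score;
}

/* Plain negamax without alpha-beta: the same code runs at every ply. */
int negamax(const Node *n, int depth) {
    assert(depth >= 0);
    if (depth == 0 || n->n_children == 0)
        return evaluate_stm(n);
    int best = -1000000;
    for (size_t i = 0; i < n->n_children; i++) {
        /* the child's score is from the opponent's view, so negate it */
        int score = -negamax(n->children[i], depth - 1);
        if (score > best) best = score;
    }
    return best;
}

/* Hypothetical 2-ply tree: White moves at the root, Black replies. */
static const Node leaf1 = { 3, 1, 0, {NULL} };
static const Node leaf2 = { 5, 1, 0, {NULL} };
static const Node leaf3 = { -2, 1, 0, {NULL} };
static const Node leaf4 = { 9, 1, 0, {NULL} };
static const Node reply_a = { 0, 0, 2, {&leaf1, &leaf2} };
static const Node reply_b = { 0, 0, 2, {&leaf3, &leaf4} };
static const Node example_root = { 0, 1, 2, {&reply_a, &reply_b} };
```

Calling negamax(&example_root, 2) returns 3: after the first root move Black picks the leaf worth 3 to White, after the second the leaf worth -2, and White prefers the first. If evaluate_stm always returned white_score instead, the sign would be wrong at every node where Black is to move.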
What I found the most efficient way to debug a search is to put in some conditional print statements,
if(PATH) printf(...);
where PATH is a condition that is only true along the path to a certain node. Like
#define PATH level == 1 || path[1] == MOVE1 && (level == 2 || path[2] == MOVE2 && (level == 3 || ...) )
where path holds the move played at level == i. Then in a position where it plays a strange move I first set PATH to only the root (level == 1), so that after every move it searched it prints level, depth, iteration depth (if you do IID), the move, its score and the maximum score so far. That allows you to see which move seems to have a wrong score (the good move could have too low a score, or the played move too high a score). Then you extend PATH with the move with the wrong score, to see how things went in the position after it etc. By following the path that obviously has a wrong score (e.g. it loses you a Queen, but the score is +1), you sooner or later end up in a node that produces the erroneous score (usually by forgetting to search the critical move, e.g. the capture of the Queen, e.g. because you messed up move sorting and it gets pushed out of the move list, or is replaced by a duplicate of another move, or whatever).
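For what it is worth, the PATH technique described above might look roughly like this in a toy search. Everything here is an assumption made for the sketch: moves are just the integers 0..BRANCH-1, toy_eval is a meaningless stand-in for a real evaluation, and MOVE1 is an arbitrary "suspicious" root move. PATH is set so that it prints at the root and in the node after MOVE1, nowhere else.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LEVEL 8
#define BRANCH 3          /* toy move list: moves 0, 1, 2 in every node */
#define MOVE1 2           /* hypothetical root move whose score looks wrong */

static int path[MAX_LEVEL + 1];   /* path[i] = move played at level i */
static int debug_lines;           /* counts the trace lines printed */

/* True at the root, and at level 2 only in the node reached by MOVE1. */
#define PATH (level == 1 || (path[1] == MOVE1 && level == 2))

/* Stand-in evaluation that depends only on the moves leading here. */
static int toy_eval(int level) {
    int s = 0;
    for (int i = 1; i < level; i++)
        s = s * BRANCH + path[i];
    return s % 7 - 3;
}

int search(int level, int depth) {
    assert(level >= 1 && level <= MAX_LEVEL);
    if (depth == 0)
        return toy_eval(level);
    int best = -1000000;
    for (int move = 0; move < BRANCH; move++) {
        path[level] = move;
        int score = -search(level + 1, depth - 1);
        if (score > best) best = score;
        if (PATH) {   /* silent everywhere except along the chosen path */
            printf("level=%d depth=%d move=%d score=%d best=%d\n",
                   level, depth, move, score, best);
            debug_lines++;
        }
    }
    return best;
}
```

A call to search(1, 3) visits 39 positions (leaves included) but prints only 6 lines: three for the root moves and three for the replies to MOVE1. To dig deeper you would extend PATH with a MOVE2 term in the same way.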
-
flok
Re: On-line engine blitz tourney August
hgm wrote: It depends on if you use minimax or negamax. The most common implementation is negamax (because then you do not need separate code for odd and even plies), and in that case you have to evaluate from the POV of the side to move in the position you evaluate.
And for minimax?
hgm wrote: What I found the most efficient way to debug a search is to put in some conditional print statements,
Yeah that's what I do too. printf is for me the way to go.
-
hgm
Re: On-line engine blitz tourney August
flok wrote: And for minimax?
For minimax you would need the score of the root side to move, if you max in the root and min on the next ply, etc.
flok wrote: Yeah that's what I do too. printf is for me the way to go.
Yes, but the important thing is to limit the output to what is relevant. In a recursive search, every printf you put in can be called a million times per second. You only want that printf to print in a few nodes.
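To illustrate the minimax answer above, here is a minimal sketch where every score is stored from the root side's point of view and the search alternates between maximizing and minimizing. The Node layout, the example tree and all names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

typedef struct Node {
    int root_score;       /* static evaluation, always from the root side's POV */
    size_t n_children;
    const struct Node *children[MAX_CHILDREN];
} Node;

/* Plain minimax: max on the root side's plies, min on the opponent's.
   The evaluation is never negated, because it is always root-relative. */
int minimax(const Node *n, int depth, int maximizing) {
    assert(depth >= 0);
    if (depth == 0 || n->n_children == 0)
        return n->root_score;
    int best = maximizing ? -1000000 : 1000000;
    for (size_t i = 0; i < n->n_children; i++) {
        int score = minimax(n->children[i], depth - 1, !maximizing);
        if (maximizing ? score > best : score < best)
            best = score;
    }
    return best;
}

/* Hypothetical 2-ply tree: the root side moves first, the opponent replies. */
static const Node leaf1 = { 3, 0, {NULL} };
static const Node leaf2 = { 5, 0, {NULL} };
static const Node leaf3 = { -2, 0, {NULL} };
static const Node leaf4 = { 9, 0, {NULL} };
static const Node reply_a = { 0, 2, {&leaf1, &leaf2} };
static const Node reply_b = { 0, 2, {&leaf3, &leaf4} };
static const Node example_root = { 0, 2, {&reply_a, &reply_b} };
```

Here minimax(&example_root, 2, 1) returns 3: the opponent minimizes to 3 after the first root move and to -2 after the second, and the root maximizes over those. The price compared with negamax is the extra maximizing flag and the two comparison directions.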
-
jshriver
- Posts: 1371
- Joined: Wed Mar 08, 2006 9:41 pm
- Location: Morgantown, WV, USA
Re: On-line engine blitz tourney August
flok wrote: This version is playing on H.G.Muller's server. So if anyone is willing to give it a try? It still is extremely weak (no idea why) but maybe someone can determine if it became better.
I have Phalanx running against it now. If I had to guess, your eval function is broken in terms of piece value for queen and for attack vs defense.
The games often start with the queen coming out full force with bishop or knight to maintain the center very strongly.
But around 8-12 ply in, it loses your queen in a very unbalanced trade, almost like it values the queen as a pawn or bishop/knight.
Hope that helps!
-Josh
-
flok
Re: On-line engine blitz tourney August
jshriver wrote: I have phalanx running against it now. If I had to guess, your eval function is broken in terms of piece value for queen and for attack vs defense.
The games start often with the queen coming out full force with bishop or knight to maintain the center very strongly.
But around 8-12 ply in loses your queen in a very unbalanced trade almost like it values the queen as a pawn or bishop/knight.
I've added code which shows me the "path" taken while searching. Other engines have this by default, mine did not.
Anyway it seems to follow very strange paths:
1 -230 444 1594556 H7-H5 F1-B5 H8-H7 D1-F3 E7-E6
It does not make sense to me.