On-line engine blitz tourney August

Modern Times · Post by **Modern Times** » Fri Aug 30, 2013 11:14 am

I don't think your engine is using a book?

flok · Post by **flok** » Fri Aug 30, 2013 11:38 am

Modern Times wrote:I don't think your engine is using a book?

That's correct. I've disabled it while I'm chasing a bug that may or may not be in the evaluation.

flok · Post by **flok** » Fri Aug 30, 2013 1:16 pm

hgm wrote:If you want to know how it does it would be much easier for you to just let it play a match against Fairy-Max locally. Then you are not dependent on people visiting the ICS and willing to challenge you...

Yeah, I've already let it play dozens of games against Phalanx via your server (as guest-something; it lost them all), but I need an experienced eye to judge whether it got better or not.

hgm · Post by **hgm** » Fri Aug 30, 2013 1:56 pm

Telling whether you got better or not is not efficiently done by playing against an engine that crushes you, and then using an 'expert eye' to tell you why. It is done by playing against an opponent of similar strength, and see if you can now beat it where before you lost.

In addition, it is not clear to me why you would want to use the ICS to play Phalanx against DeepBrutePos. Is that because you are running them on different computers? It is usually much easier to play them against each other on the same computer (without invoking an ICS). Especially if you need a few thousand games to get a statistically significant result, rather than just determining which engine was more lucky this time.
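To put a rough number on "statistically significant", here is a minimal sketch in C. It assumes the games are independent and that a normal approximation is good enough; the function names `z_squared` and `looks_significant` are made up for this illustration. It asks whether a match score deviates from 50% by more than about two standard errors (roughly the 95% level), and compares squares so that no square root is needed.

```c
/* Squared number of standard errors by which the match score
 * deviates from 50%. Assumes independent games (an assumption
 * of this sketch, not something measured). */
static double z_squared(int wins, int draws, int losses)
{
    double n = wins + draws + losses;
    double s = (wins + 0.5 * draws) / n;            /* mean score per game */
    double var = (wins   * (1.0 - s) * (1.0 - s)
                + draws  * (0.5 - s) * (0.5 - s)
                + losses * (0.0 - s) * (0.0 - s)) / n; /* per-game variance */
    return (s - 0.5) * (s - 0.5) * n / var;
}

/* More than about 2 standard errors away from an even score? */
static int looks_significant(int wins, int draws, int losses)
{
    return z_squared(wins, draws, losses) > 4.0;
}
```

With these definitions, `looks_significant(6, 0, 4)` is false while `looks_significant(600, 0, 400)` is true: the same 60% score means little after 10 games and a lot after 1000.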

So forget about Phalanx and Crafty. Can you beat Fairy-Max? Can you beat TSCP? Can you beat HoiChess?

If you play Chess yourself, your own eye should be experienced enough to tell you why DeepBrutePos lost a game (say the one in the blitz tourney against micro-Max), as it quite simply blundered material away in extremely simple ways (e.g. allowing a skewer on K+Q by a Rook). If you really want to improve it, you should figure out why it allowed that to happen. Why didn't it pick one of the many moves that prevented it?

flok · Post by **flok** » Fri Aug 30, 2013 4:40 pm

hgm wrote:Telling whether you got better or not is not efficiently done by playing against an engine that crushes you, and then using an 'expert eye' to tell you why. It is done by playing against an opponent of similar strength, and see if you can now beat it where before you lost.

Yes. I have not yet found an engine that plays as badly as mine. Playing against itself is not a good test, as "invalid code" playing "invalid code" would give totally different results than "invalid code" against "correct code".

hgm wrote:In addition, it is not clear to me why you would want to use the ICS to play Phalanx against DeepBrutePos. Is that because you are running them on different computers?

Exactly that.

hgm wrote:It is usually much easier to play them against each other on the same computer (without invoking an ICS). Especially if you need a few thousand games to get a statistically significant result, rather than just determining which engine was more lucky this time.

Yes. On the other hand: it is so weak right now that it always fails anyway.

hgm wrote:So forget about Phalanx and Crafty. Can you beat Fairy-Max? Can you beat TSCP? Can you beat HoiChess?

I did not try TSCP, but against all the others it fails miserably.

hgm wrote:If you play Chess yourself, your own eye should be experienced enough to tell you why DeepBrutePos lost a game (say the one in the blitz tourney against micro-Max), as it quite simply blundered material away in extremely simple ways (e.g. allowing a skewer on K+Q by a Rook). If you really want to improve it, you should figure out why it allowed that to happen. Why didn't it pick one of the many moves that prevented it?

I've begun with that.
I run it with 1 core/thread (although I've verified that it behaves the same with multiple threads), at maximum depth 1, then 2 and so on, and let it show the score for each move while it goes through them.

By the way, regarding the pseudo-code at https://en.wikipedia.org/wiki/Alpha%E2% ... Pseudocode: I've read in some places that the evaluation at depth x is always from the root color's(!) point of view, and in other places that it is from the point of view of the color that is about to move at depth x. Which do you think is right?

hgm · Post by **hgm** » Fri Aug 30, 2013 5:15 pm

It depends on if you use minimax or negamax. The most common implementation is negamax (because then you do not need separate code for odd and even plies), and in that case you have to evaluate from the POV of the side to move in the position you evaluate.
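As a minimal sketch of the negamax case (a toy, not any engine's actual code): the tree below is a hypothetical complete binary tree stored in an array, and the leaf scores are assumed to be given from White's point of view. The names `white_score`, `evaluate` and `negamax` are invented for the example.

```c
/* Hypothetical toy tree: node i has children 2i+1 and 2i+2,
 * nodes 3..6 are leaves. Scores are from White's point of view. */
static const int white_score[7] = { 0, 0, 0, 3, -2, 5, 1 };

/* Static evaluation from the point of view of the side to move.
 * side is +1 when White is to move, -1 when Black is to move. */
static int evaluate(int node, int side)
{
    return side * white_score[node];
}

/* Plain negamax without pruning: the same code serves odd and
 * even plies because every score is negated on the way up. */
static int negamax(int node, int depth, int side)
{
    if (depth == 0)
        return evaluate(node, side);

    int best = -1000000;
    for (int child = 2 * node + 1; child <= 2 * node + 2; child++) {
        int score = -negamax(child, depth - 1, -side);
        if (score > best)
            best = score;
    }
    return best;
}
```

Here `negamax(0, 2, +1)` gives 1 (White to move in the root) and `negamax(0, 2, -1)` gives -3 (Black to move in the root). If `evaluate` ignored `side`, the second result would come out wrong.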

What I found the most efficient way to debug a search is to put in some conditional print statements,

if(PATH) printf(...);

where PATH is a condition that is only true along the path to a certain node. Like

#define PATH (level == 1 || path[1] == MOVE1 && (level == 2 || path[2] == MOVE2 && (level == 3 || ...)))

where path[i] holds the move played at level i. Then, in a position where it plays a strange move, I first set PATH to only the root (level == 1), so that after every move it searched it prints level, depth, iteration depth (if you do IID), the move, its score and the maximum score so far. That allows you to see which move seems to have a wrong score (the good move could have too low a score, or the played move too high a score). Then you extend PATH with the move with the wrong score, to see how things went in the position after it, etc. By following the path that obviously has a wrong score (e.g. it loses you a Queen, but the score is +1), you sooner or later end up in a node that produces the erroneous score (usually by forgetting to search the critical move, e.g. the capture of the Queen, for instance because you messed up move sorting and it gets pushed out of the move list, or is replaced by a duplicate of another move, or whatever).
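A minimal sketch of how such a PATH trace could be wired into a search. Everything here is a stand-in: the "game" is a hypothetical binary tree with moves 0 and 1, `toy_eval` is an arbitrary made-up evaluation, and `MOVE1`, `MOVE2` and `trace_lines` are names chosen for the example. The macro relies on a variable called `level` being in scope where it is used.

```c
#include <stdio.h>

#define MAX_PLY 8
#define MOVE1 1   /* hypothetical root move under suspicion    */
#define MOVE2 0   /* hypothetical reply to it under suspicion  */

static int path[MAX_PLY + 1];  /* path[i] = move played at level i */
static int trace_lines;        /* how many trace lines were printed */

/* True in the root, in the node after MOVE1, and in the node
 * after MOVE1 MOVE2; false everywhere else in the tree. */
#define PATH (level == 1 || (path[1] == MOVE1 && \
             (level == 2 || (path[2] == MOVE2 && level == 3))))

/* Arbitrary stand-in for a real evaluation (assumption). */
static int toy_eval(void)
{
    return (path[1] * 4 + path[2] * 2 + path[3]) % 5 - 2;
}

static int search(int level, int depth)
{
    if (depth == 0)
        return toy_eval();

    int best = -1000000;
    for (int move = 0; move < 2; move++) {
        path[level] = move;
        int score = -search(level + 1, depth - 1);
        if (score > best)
            best = score;
        if (PATH) {   /* prints only along the selected path */
            printf("level=%d depth=%d move=%d score=%d best=%d\n",
                   level, depth, move, score, best);
            trace_lines++;
        }
    }
    return best;
}
```

Calling `search(1, 3)` on this toy tree searches 14 moves in total but prints only 6 lines: two in the root, two after MOVE1, and two after MOVE1 MOVE2.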

flok · Post by **flok** » Fri Aug 30, 2013 5:22 pm

hgm wrote:It depends on if you use minimax or negamax. The most common implementation is negamax (because then you do not need separate code for odd and even plies), and in that case you have to evaluate from the POV of the side to move in the position you evaluate.

And for minimax?

hgm wrote:What I found the most efficient way to debug a search is to put in some conditional print statements,

Yeah that's what I do too. printf is for me the way to go.

hgm · Post by **hgm** » Fri Aug 30, 2013 5:44 pm

flok wrote:And for minimax?

For minimax you would need the score from the point of view of the side to move in the root, if you max in the root, min on the next ply, etc.
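A minimal sketch of that minimax variant, on a hypothetical binary toy tree whose leaf scores are assumed to be from White's point of view. The names `white_score`, `evaluate_for_root` and `minimax` are invented for the example; note that the evaluation never looks at who is to move in the leaf, only at who was to move in the root.

```c
/* Hypothetical toy tree: node i has children 2i+1 and 2i+2,
 * nodes 3..6 are leaves. Scores are from White's point of view. */
static const int white_score[7] = { 0, 0, 0, 3, -2, 5, 1 };

/* Evaluation always from the root side's point of view.
 * root_side is +1 if White moves in the root, -1 if Black does. */
static int evaluate_for_root(int node, int root_side)
{
    return root_side * white_score[node];
}

/* Plain minimax without pruning: separate max and min plies. */
static int minimax(int node, int depth, int maximizing, int root_side)
{
    if (depth == 0)
        return evaluate_for_root(node, root_side);

    int best = maximizing ? -1000000 : 1000000;
    for (int child = 2 * node + 1; child <= 2 * node + 2; child++) {
        int score = minimax(child, depth - 1, !maximizing, root_side);
        if (maximizing ? score > best : score < best)
            best = score;
    }
    return best;
}
```

On this tree `minimax(0, 2, 1, +1)` gives 1 and `minimax(0, 2, 1, -1)` gives -3, always expressed from the root side's point of view.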

flok wrote:Yeah that's what I do too. printf is for me the way to go.

Yes, but the important thing is to limit the output to what is relevant. In a recursive search, every printf you put in can be called a million times per second. You only want that printf to print in a few nodes.

jshriver · Post by **jshriver** » Fri Aug 30, 2013 8:57 pm

flok wrote: This version is playing on H.G.Muller's server. So if anyone is willing to give it a try? It still is extremely weak (no idea why) but maybe someone can determine if it became better.

I have phalanx running against it now. If I had to guess, your eval function is broken in terms of piece value for queen and for attack vs defense.

The games often start with the queen coming out in full force, supported by a bishop or knight, to hold the center very strongly.

But around 8-12 plies in, it loses the queen in a very unbalanced trade, almost as if it values the queen as a pawn or a bishop/knight.

Hope that helps!

-Josh

flok · Post by **flok** » Sun Sep 01, 2013 1:22 pm

jshriver wrote:I have phalanx running against it now. If I had to guess, your eval function is broken in terms of piece value for queen and for attack vs defense.

The games often start with the queen coming out in full force, supported by a bishop or knight, to hold the center very strongly.

But around 8-12 plies in, it loses the queen in a very unbalanced trade, almost as if it values the queen as a pawn or a bishop/knight.

I've added code which shows me the "path" taken while searching. Other engines have this by default, mine did not.
Anyway it seems to follow very strange paths:

1 -230 444 1594556 H7-H5 F1-B5 H8-H7 D1-F3 E7-E6

(response to e2-e4).
It does not make sense to me.
