Protector (dev) vs Critter 1.6a, short time control match.

Isaac · Post by **Isaac** » Mon Mar 10, 2014 4:43 pm

I've downloaded the latest dev version of Protector from http://sourceforge.net/p/protector/code/HEAD/tree/ and compiled it with gcc 4.8.1 (with ./configure, make and make install). I tested it on the command line and it seemed to work fine.
I wanted to test it against Critter 1.6a at a short time control (30 seconds per game + 0.05 seconds per move). I used cutechess-cli in the following way:

./cutechess-cli -rounds 1000 -tournament gauntlet -repeat -pgnout crittervsprotector.pgn -resign movecount=3 score=400 -draw movenumber=34 movecount=20 score=20 -concurrency 1 -openings file=/home/isaac/Downloads/fishtest-master/worker/testing/8moves_v3.pgn format=pgn order=random plies=16 -engine name=critter cmd=/usr/games/critter -engine name=protector cmd=/usr/games/protector -each proto=uci option.Threads=1 option.OwnBook=false option.Hash=128  tc=30+0.05

However, the result I am getting seems way off from what it should be: 31 wins, 1 loss, 13 draws for Critter against Protector. This translates to an Elo difference of 279.6 (I don't know how to calculate the error bars yet) and a likelihood of superiority of 100%.
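For reference, the figures above can be reproduced with a few lines of Python. This is only a rough sketch: it assumes the standard logistic Elo formula, a normal approximation for the 95% interval, and the usual draw-ignoring approximation for the likelihood of superiority. The function names are made up for illustration and are not part of cutechess-cli or any rating tool.

```python
import math

def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    # Clamp so that an interval bound touching 0 or 1 does not raise an error.
    score = min(max(score, 1e-9), 1.0 - 1e-9)
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_stats(wins, losses, draws, z=1.96):
    """Return (elo, elo_low, elo_high, los) for one side of a match."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(var / games)
    elo_low = elo_from_score(score - z * stderr)
    elo_high = elo_from_score(score + z * stderr)
    # Likelihood of superiority: normal approximation, draws carry no information.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo_from_score(score), elo_low, elo_high, los

# Critter's result from the match above: 31 wins, 1 loss, 13 draws.
elo, elo_low, elo_high, los = match_stats(31, 1, 13)
print(f"Elo diff: {elo:+.1f}, 95% interval: [{elo_low:+.1f}, {elo_high:+.1f}], LOS: {los:.4%}")
```

With only 45 games the interval is wide, so the point estimate alone says little about how far the true difference is from the rating lists.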
I am wondering whether I've done something wrong, either with the compilation or the cutechess settings.
I'd appreciate it if someone else could try something similar.
My computer is an Intel E6300 @ 1.86 GHz, Core 2 Duo.
You can download all the games in PGN format at http://speedy.sh/AN94h/crittervsprotector.pgn.

felix · Post by **felix** » Mon Mar 10, 2014 11:38 pm

Hi Isaac

The result is ok. Critter is much better than Protector.

BUT seeing your parameters your resign = 400 is way too strict.

Why don't you try with Arena to tune your parameters? This way you can see how the engines evaluate and enjoy the games.

Some time ago I used resign = -650 but, surprise, some games with that score were actually draws, so the adjudication was wrong!! (Some engines tend to be overly optimistic or overly pessimistic.)

Now I am back on resign = -900 and everything is better.

Isaac · Post by **Isaac** » Tue Mar 11, 2014 12:48 am

felix wrote: Hi Isaac

The result is ok. Critter is much better than Protector.

BUT seeing your parameters your resign = 400 is way too strict.

Why don't you try with Arena to tune your parameters? This way you can see how the engines evaluate and enjoy the games.

Some time ago I used resign = -650 but, surprise, some games with that score were actually draws, so the adjudication was wrong!! (Some engines tend to be overly optimistic or overly pessimistic.)

Now I am back on resign = -900 and everything is better.

Ola Félix,
I am not so sure that the result is ok, because of the CCRL "short time control" rating list: Protector 1.5.0 has a rating of 3090 and Critter has a rating of 3228. The difference is less than 150 BayesElo. Hmm, I don't know how this would translate into Elo...
And I am using a stronger version than 1.5.0, so I'd expect a smaller rating difference. But I get a 279 Elo difference!
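To put the two numbers side by side, here is a small sketch that converts a rating gap into an expected score. It assumes the plain logistic Elo curve and treats the CCRL gap as if it were ordinary Elo, which is only an approximation since BayesElo models draws separately. The helper name is hypothetical.

```python
def expected_score(elo_diff):
    """Expected score fraction for the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Rating gap from the CCRL list quoted above (Critter 3228, Protector 1.5.0 3090).
ccrl_gap = 3228 - 3090

# Score Critter actually achieved in the 45 games: 31 wins, 13 draws, 1 loss.
observed = (31 + 0.5 * 13) / 45

print(f"CCRL gap of {ccrl_gap}: expected score {expected_score(ccrl_gap):.3f}")
print(f"Observed score in the match: {observed:.3f}")
```

Under that assumption the list predicts a score of roughly 69% for Critter, against the roughly 83% seen in the match.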

Resign at 400 cp means that both engines must evaluate the position as at least +4 for 3 consecutive moves. Of course, sometimes they are both wrong and the game is adjudicated as a win when it should have been a draw, but this is rare and overall should not change the rating difference noticeably. For instance, fishtest (used for Stockfish improvement) applies the exact same adjudication criteria and the system seems to work fine. Among the 45 games from this match that I've uploaded, I am sure you will find no such wrongly adjudicated games.
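The rule as described in this post can be sketched as follows. This models only the description given here (both engines agreeing for 3 consecutive moves); it is not taken from cutechess-cli's source, and the function name and input format are assumptions for illustration.

```python
def resign_move(evals_a, evals_b, score=400, movecount=3):
    """Return the 1-based move at which the game would be adjudicated, or None.

    evals_a and evals_b hold the centipawn score each engine reports for the
    same move, each from its own point of view (positive = good for itself).
    """
    streak = 0   # consecutive moves on which both engines agree
    leader = 0   # +1 if A is winning, -1 if B is winning, 0 if no agreement
    for move, (a, b) in enumerate(zip(evals_a, evals_b), start=1):
        if a >= score and b <= -score:
            side = 1
        elif a <= -score and b >= score:
            side = -1
        else:
            side = 0
        if side != 0 and side == leader:
            streak += 1
        else:
            # Agreement just started (or was broken): restart the count.
            streak = 1 if side != 0 else 0
        leader = side
        if streak >= movecount:
            return move
    return None

# Both engines agree that A is at least +4 on three moves in a row.
print(resign_move([450, 500, 600], [-420, -480, -700]))
# One engine drops below the threshold on move 2, so the count restarts.
print(resign_move([450, 100, 600, 700], [-420, -480, -700, -800]))
```

A wrong adjudication under this rule needs both engines to misjudge the same position in the same direction several moves running, which is why it should be rare.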

And why don't I use Arena? Because I've read that it's not good for engine vs engine testing, especially at fast time controls (my time control is very fast), because engines can lose on time, and I believe there are other problems as well.
I've heard that cutechess is the best way to test engine vs engine. I think they also use cutechess in the TCEC tournament.
I can still enjoy replaying the games once they are finished, although I don't see what the engines are thinking. (I watch the TCEC so as not to get bored.)

felix · Post by **felix** » Tue Mar 11, 2014 12:57 am

My mistake

You use 30 seconds + 50 milliseconds increment (tc=30+0.05). I use 30 seconds + 500 milliseconds.

Well. Some time ago, to decide what increment to use, I tried 5 milliseconds like fishtest does. Houdini had no problem at all, but Stockfish started making very stupid moves. I never saw a loss on time, but a lot of checkmates in 1!!

I think the problem is not Arena. The engines have some issues.

Anyway, I run a lot of tournaments with my engines all the time and believe me: +200 rating points at fast time controls make a BIG difference.

Seeing the engines play is better than tv!

Good luck on your tests!

Isaac · Post by **Isaac** » Tue Mar 11, 2014 3:19 am

felix wrote: Well. Some time ago, to decide what increment to use, I tried 5 milliseconds like fishtest does. Houdini had no problem at all, but Stockfish started making very stupid moves. I never saw a loss on time, but a lot of checkmates in 1!!

Yes, I've experienced and reported the exact same behavior with Stockfish. It may have been fixed by now, I don't know.
On my tablet, for example (with DroidFish as the interface), I gave Stockfish 15 seconds for the whole game with no increment per move. When it reached 0 seconds, it would not forfeit on time because of the DroidFish interface, but it would start making very stupid moves and eventually get checkmated by.... me.

felix wrote: I think the problem is not Arena. The engines have some issues.

Anyway, I run a lot of tournaments with my engines all the time and believe me: +200 rating points at fast time controls make a BIG difference.

Seeing the engines play is better than tv!

Good luck on your tests!

Arena has some problems. There is at least one reason why they don't use Arena for fast-game rating lists. I am not sure about the CCRL, but I wouldn't be surprised if they avoid Arena.
A +200 Elo difference should not lead to different scores at short or long time controls. Yes, it is a big difference. What I get is a difference of 279 Elo when I was expecting a difference under 100 Elo. There's something wrong for sure, but I cannot figure it out yet.

I also share your opinion about watching tv being more boring than watching engines playing each other. I actually enjoy watching engines playing against each other a lot. The TCEC is really great for me.

Have fun.
