Performances of engines in the Endgame

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Uri Blass
Posts: 11204
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Performances of engines in the Endgame

Post by Uri Blass »

jdart wrote:This is not very surprising. Search in Stockfish is very selective and it reaches extreme depths in the endgame very quickly. Extra depth will probably help it there, compared to other engines, and probably helps more than the loss in precision from the selectivity hurts. (I don't know much about the eval function of Stockfish, certainly not compared to other programs, so I can't say how much of a factor that is. I know it has recognizers for some common endgames, but that is pretty common nowadays.)

--Jon
It is not surprising that Stockfish is relatively better in the endgame, but I do not think that it is because it reaches bigger depths, since Stockfish also reaches bigger depths in the opening.

It seems that for some reason Stockfish has a superior search in the endgame.

It may be interesting to have a test suite of random mates in 40 from tablebases, to see if Stockfish performs better than other programs against tablebases (meaning winning against tablebases more often than other programs).

My guess is that Stockfish is going to perform better, but it may be interesting to find out if I am right.

I would like to see if the advantage of Stockfish in tablebase positions becomes bigger at longer time controls.

My guess may be wrong, but I expect something like this after testing 1000 positions:

1) Stockfish 2.3.1 850 wins (3 minutes per move)
2) Houdini 3 800 wins (3 minutes per move)
3) Houdini 3 500 wins (3 seconds per move)
4) Stockfish 2.3.1 450 wins (3 seconds per move)
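
A minimal sketch of how such a suite of random tablebase wins could be generated, assuming the python-chess library and Syzygy tables (which store WDL/DTZ rather than distance to mate, so "mate in 40" is only approximated here by filtering for won positions; the material mix is a made-up example):

Code: Select all

import random

import chess
import chess.syzygy

def random_position(piece_symbols):
    # Scatter the given pieces on distinct squares; retry until the
    # position is legal (kings not adjacent, no back-rank pawns,
    # side to move not already giving check, ...).
    while True:
        board = chess.Board(None)  # start from an empty board
        for square, symbol in zip(random.sample(chess.SQUARES,
                                                len(piece_symbols)),
                                  piece_symbols):
            piece = chess.Piece.from_symbol(symbol)
            if piece.piece_type == chess.PAWN and \
                    chess.square_rank(square) in (0, 7):
                break  # illegal pawn placement, retry from scratch
            board.set_piece_at(square, piece)
        else:
            if board.is_valid():
                return board

def build_suite(tb_path, material="KQkr", count=1000):
    # Keep only positions where the side to move has a tablebase win.
    # Assumes the tables for the chosen material are present in tb_path.
    suite = []
    with chess.syzygy.open_tablebase(tb_path) as tb:
        while len(suite) < count:
            board = random_position(material)
            if tb.probe_wdl(board) == 2:  # 2 = unconditional win
                suite.append(board.fen())
    return suite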
User avatar
Master Om
Posts: 454
Joined: Wed Nov 24, 2010 10:57 am
Location: INDIA

Re: Performances of engines in the Endgame

Post by Master Om »

Fantastic job!! I had a feeling that Komodo was better in endgames, and now it is confirmed.
After looking at the post I feel Komodo and Hiarcs are the most balanced engines!
Always Expect the Unexpected
User avatar
Ajedrecista
Posts: 2230
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Performances of engines in the endgame.

Post by Ajedrecista »

Hello Kai:
Laskos wrote:
Jouni wrote:Please can you repeat with tablebases, to see which one benefits most from them?
The tablebase effect is a pain to measure. Here is an example:

Code: Select all

    Program                            Score     %     Elo     +   -    Draws

  1 Houdini 3  Scorpio         : 24159.5/48300  50.0   3000    2   2   64.4 %
  2 Houdini 3                  : 24140.5/48300  50.0   3000    2   2   64.4 %
They amount to no more than 2 Elo points, maybe a bit more when playing from late endgame positions, but I am not going to try to measure that effect. Tablebases are mostly good for analysis.
I could not agree more with you: the effect of EGTBs on playing strength is negligible, though they may be useful for GUI adjudications; infinite analysis of positions with few pieces on the board might benefit a little more.

Code: Select all

LOS_and_Elo_uncertainties_calculator, ® 2012-2013.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Maximum number of games supported: 2147483647.

Write down the number of wins (up to 1825361100):

8607

Write down the number of loses (up to 1825361100):

8588

Write down the number of draws (up to 2147466452):

31105

 Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):

95

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

---------------------------------------
Elo interval for 95.00 % confidence:

Elo rating difference:      0.14 Elo

Lower rating difference:   -1.71 Elo
Upper rating difference:    1.99 Elo

Lower bound uncertainty:   -1.85 Elo
Upper bound uncertainty:    1.85 Elo
Average error:        +/-   1.85 Elo

K = (average error)*[sqrt(n)] =  406.31

Elo interval: ]  -1.71,    1.99[
---------------------------------------

Number of games of the match:     48300
Score: 50.02 %
Elo rating difference:    0.14 Elo
Draw ratio: 64.40 %

************************************************************************
        Sample standard deviation:  0.1357 % of the points of the match.
1.9600 sample standard deviations:  0.2661 % of the points of the match.

                 (Corresponding to 95.00 % confidence).
************************************************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS (taking into account draws) is always calculated, if possible.

LOS (not taking into account draws) is only calculated if wins + loses < 16001.

LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
__________________________________________

LOS:  55.76 % (taking into account draws).
__________________________________________

These values of LOS are rounded up to 0.01%

End of the calculations. Approximated elapsed time:   34 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
LOS ~ 55.76% after 48300 games. 1.96 sigma amounts to 128.5 points, more or less, while the difference in scores between the engines is only 19 points. My programme obtains ~ +0.14 ± 1.85 Elo for 95% confidence. Nobody can claim that EGTBs give a dramatic increase in Elo rating... but of course it is nice to know when a position is a draw and when it is mate in 34!
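
For reference, the headline numbers above can be reproduced with a short sketch (assuming the usual Gaussian approximation and the logistic Elo model; this is my reconstruction, not the Fortran source):

Code: Select all

import math

wins, losses, draws = 8607, 8588, 31105
n = wins + losses + draws

def elo(score):
    # Logistic Elo model.
    return 400.0 * math.log10(score / (1.0 - score))

s = (wins + 0.5 * draws) / n                 # 50.02 %
var = (wins + 0.25 * draws) / n - s * s      # per-game score variance
sigma = math.sqrt(var / n)                   # 0.1357 % of the points

print("Elo difference: %+.2f" % elo(s))      # +0.14
print("95%% interval: ] %.2f, %.2f ["
      % (elo(s - 1.96 * sigma), elo(s + 1.96 * sigma)))  # ] -1.71, 1.99 [

# LOS with draws counted: Phi((wins - losses) / sqrt(wins + losses)).
los = 0.5 * (1.0 + math.erf((wins - losses)
                            / math.sqrt(2.0 * (wins + losses))))
print("LOS: %.2f %%" % (100.0 * los))        # 55.76 %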

Regards from Spain.

Ajedrecista.
User avatar
Ajedrecista
Posts: 2230
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Performances of engines in the endgame.

Post by Ajedrecista »

Hello:

I recalled that there was an endgame tournament at the ImmortalChess forum a while ago. I searched the archive of that forum and found it:

Endgame tournament 2011 from Kevin

I know that it is more than one year old and some of the engines have since been updated by their authors. Anyway, here is the final result:

[Image: final standings of the Endgame Tournament 2011]

At that time Komodo 3 won that tournament and Houdini 2 was in the middle of the list. SF did not do well.

I do not have the PGN file of that tournament, but I can calculate the rating performances of that Round Robin tournament with my own Fortran 95 programme; I have assumed that each engine played 414 games.

Code: Select all

Elo_ratings_for_Round_Robin_tournaments, ® 2012.

Write down the full name of the Notepad (including .txt), up to 64 characters:

Points.txt

Write down the number of engines of the Round Robin tournament (up to 64):

7

Write down the number of games of each engine (up to 400000):

414

Write down your desired mean of ratings:

3000

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

End of the calculations. Elo performances have been saved in Elo_rating_list.txt file.

Approximated elapsed time:   20 ms.

Thanks for using Elo_ratings_for_Round_Robin_tournaments. Press Enter to exit.

Code: Select all

Round Robin with  7 engines and    414 games per engine.
Total number of games:      1449 games.
 
 3022.42 (engine 01).
 3003.49 (engine 02).
 3002.79 (engine 03).
 3001.39 (engine 04).
 2993.00 (engine 05).
 2988.80 (engine 06).
 2988.10 (engine 07).
 
Mean of ratings:  3000.00 Elo.
I used an average rating of 3000; these rating performances should be very similar to EloSTAT ratings.
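
The Fortran source is not shown here, but an EloSTAT-style fixed-point iteration along the following lines produces performances anchored to a chosen mean; the score fractions below are made-up placeholders, not Kevin's real results:

Code: Select all

def expected(ra, rb):
    # Expected score under the logistic Elo model.
    return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))

def round_robin_perfs(scores, mean=3000.0, iterations=200):
    n = len(scores)
    ratings = [mean] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Expected score against the field at the current ratings.
            e = sum(expected(ratings[i], ratings[j])
                    for j in range(n) if j != i) / (n - 1)
            # Under-relaxed correction toward the observed score.
            updated.append(ratings[i] + 400.0 * (scores[i] - e))
        shift = mean - sum(updated) / n  # re-anchor the average rating
        ratings = [r + shift for r in updated]
    return ratings

# Hypothetical score fractions for 7 engines (they must average 0.5).
scores = [0.530, 0.505, 0.504, 0.502, 0.490, 0.485, 0.484]
for rank, r in enumerate(sorted(round_robin_perfs(scores), reverse=True), 1):
    print("%8.2f (engine %02d)." % (r, rank))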

Regards from Spain.

Ajedrecista.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Performances of engines in the endgame.

Post by Laskos »

Ajedrecista wrote:
I know that it is more than one year old and some of the engines have since been updated by their authors. Anyway, here is the final result:

[Image: final standings of the Endgame Tournament 2011]

At that time Komodo 3 won that tournament and Houdini 2 was in the middle of the list. SF did not do well.
We have to keep in mind that even if an engine overperforms in the endgame, it can still rank below a generally stronger engine that underperforms in the endgame. The only strange thing is the performance of Komodo 3, but it could be a statistical fluke; all the engines here are within 2 SD error margins.
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Performances of engines in the endgame.

Post by gladius »

Laskos wrote:
Ajedrecista wrote:
I know that it is more than one year old and some of the engines have since been updated by their authors. Anyway, here is the final result:

[Image: final standings of the Endgame Tournament 2011]

At that time Komodo 3 won that tournament and Houdini 2 was in the middle of the list. SF did not do well.
We have to keep in mind that even if an engine overperforms in the endgame, it can still rank below a generally stronger engine that underperforms in the endgame. The only strange thing is the performance of Komodo 3, but it could be a statistical fluke; all the engines here are within 2 SD error margins.
Also, there were definitely changes that affected endgame play between 2.1.1 and 2.3.1, especially a new eval term for king-pawn distance in the endgame, which could have helped significantly.
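
For illustration only, such a term can be as simple as a proximity bonus between the king and its own pawns; this is a toy sketch with made-up scaling, not Stockfish's actual code:

Code: Select all

import chess

def king_pawn_proximity(board, color):
    # Toy endgame term: a small centipawn bonus when the friendly king
    # stands near its own pawns (illustrative numbers only).
    king = board.king(color)
    pawns = board.pieces(chess.PAWN, color)
    if king is None or not pawns:
        return 0
    closest = min(chess.square_distance(king, pawn) for pawn in pawns)
    return (7 - closest) * 5  # maximum king distance on the board is 7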
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Performances of engines in the Endgame

Post by Laskos »

Uri Blass wrote:
jdart wrote:This is not very surprising. Search in Stockfish is very selective and it reaches extreme depths in the endgame very quickly. Extra depth will probably help it there, compared to other engines, and probably helps more than the loss in precision from the selectivity hurts. (I don't know much about the eval function of Stockfish, certainly not compared to other programs, so I can't say how much of a factor that is. I know it has recognizers for some common endgames, but that is pretty common nowadays.)

--Jon
It is not surprising that Stockfish is relatively better in the endgame, but I do not think that it is because it reaches bigger depths, since Stockfish also reaches bigger depths in the opening.

It seems that for some reason Stockfish has a superior search in the endgame.

It may be interesting to have a test suite of random mates in 40 from tablebases, to see if Stockfish performs better than other programs against tablebases (meaning winning against tablebases more often than other programs).

My guess is that Stockfish is going to perform better, but it may be interesting to find out if I am right.

I would like to see if the advantage of Stockfish in tablebase positions becomes bigger at longer time controls.

My guess may be wrong, but I expect something like this after testing 1000 positions:

1) Stockfish 2.3.1 850 wins (3 minutes per move)
2) Houdini 3 800 wins (3 minutes per move)
3) Houdini 3 500 wins (3 seconds per move)
4) Stockfish 2.3.1 450 wins (3 seconds per move)
I ran a rough test of the top engines, without tablebases, on tablebase (3-4-5-piece) wins, playing against a Houdini 3 with 3-4-5-piece tablebases enabled.
1 second/move:

Code: Select all

Komodo 5         935.5/1000
Stockfish 2.3.1  917.0/1000
Critter 1.6      886.0/1000
Houdini 3        882.0/1000
Rybka 4.1        851.0/1000
It seems that Komodo and Stockfish are the best at finding tablebase wins, while Rybka is the worst.
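
The playout loop for such a test is straightforward; here is a sketch with the python-chess engine API (engine paths are placeholders, and the defender is assumed to have its tablebases configured through its own UCI options):

Code: Select all

import chess
import chess.engine

def convert_tb_wins(attacker_path, defender_path, fens, movetime=1.0):
    # Play out each tablebase-won position; the engine under test has
    # the winning side, the tablebase-enabled defender the losing side.
    # Returns the score out of len(fens), counting draws as half points.
    attacker = chess.engine.SimpleEngine.popen_uci(attacker_path)
    defender = chess.engine.SimpleEngine.popen_uci(defender_path)
    score = 0.0
    try:
        for fen in fens:
            board = chess.Board(fen)
            winning_side = board.turn
            while not board.is_game_over(claim_draw=True):
                engine = attacker if board.turn == winning_side else defender
                result = engine.play(board, chess.engine.Limit(time=movetime))
                board.push(result.move)
            outcome = board.outcome(claim_draw=True)
            if outcome.winner == winning_side:
                score += 1.0
            elif outcome.winner is None:
                score += 0.5
    finally:
        attacker.quit()
        defender.quit()
    return score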
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Performances of engines in the Endgame

Post by Laskos »

Laskos wrote:
Uri Blass wrote:
jdart wrote:This is not very surprising. Search in Stockfish is very selective and it reaches extreme depths in the endgame very quickly. Extra depth will probably help it there, compared to other engines, and probably helps more than the loss in precision from the selectivity hurts. (I don't know much about the eval function of Stockfish, certainly not compared to other programs, so I can't say how much of a factor that is. I know it has recognizers for some common endgames, but that is pretty common nowadays.)

--Jon
It is not surprising that Stockfish is relatively better in the endgame, but I do not think that it is because it reaches bigger depths, since Stockfish also reaches bigger depths in the opening.

It seems that for some reason Stockfish has a superior search in the endgame.

It may be interesting to have a test suite of random mates in 40 from tablebases, to see if Stockfish performs better than other programs against tablebases (meaning winning against tablebases more often than other programs).

My guess is that Stockfish is going to perform better, but it may be interesting to find out if I am right.

I would like to see if the advantage of Stockfish in tablebase positions becomes bigger at longer time controls.

My guess may be wrong, but I expect something like this after testing 1000 positions:

1) Stockfish 2.3.1 850 wins (3 minutes per move)
2) Houdini 3 800 wins (3 minutes per move)
3) Houdini 3 500 wins (3 seconds per move)
4) Stockfish 2.3.1 450 wins (3 seconds per move)
I ran a rough test of the top engines, without tablebases, on tablebase (3-4-5-piece) wins, playing against a Houdini 3 with 3-4-5-piece tablebases enabled.
1 second/move:

Code: Select all

Komodo 5         935.5/1000
Stockfish 2.3.1  917.0/1000
Critter 1.6      886.0/1000
Houdini 3        882.0/1000
Rybka 4.1        851.0/1000
It seems that Komodo and Stockfish are the best at finding tablebase wins, while Rybka is the worst.
At 4s/move Stockfish leads, improving significantly over its 1s/move result.

Code: Select all

Stockfish 2.3.1  941.5/1000  +24.5 
Komodo 5         938.0/1000   +2.5
Critter 1.6      902.0/1000  +16.0
Houdini 3        889.5/1000   +7.5
Rybka 4.1        853.0/1000   +2.0
This hints that Stockfish relies on search in the endgame and Komodo on knowledge, keeping in mind that the error margins are still pretty large.
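
As a rough sanity check on those margins, treating each game as an independent win/loss trial (the few half points ignored), the two-sigma band on 1000 games is about ±15-23 points at these score levels, so only the Stockfish jump of +24.5 approaches significance:

Code: Select all

import math

def two_sigma(points, games=1000):
    # Binomial 2-sigma margin on a score of `points` out of `games`.
    p = points / games
    return 2.0 * math.sqrt(games * p * (1.0 - p))

for name, s1, s4 in [("Stockfish 2.3.1", 917.0, 941.5),
                     ("Komodo 5",        935.5, 938.0),
                     ("Critter 1.6",     886.0, 902.0),
                     ("Houdini 3",       882.0, 889.5),
                     ("Rybka 4.1",       851.0, 853.0)]:
    print("%-16s 1s: %6.1f +/- %4.1f   4s: %6.1f +/- %4.1f"
          % (name, s1, two_sigma(s1), s4, two_sigma(s4)))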
Jouni
Posts: 3890
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: Performances of engines in the Endgame

Post by Jouni »

Interesting. This seems to confirm my assumption that Rybka is the engine which needs tablebases the most! Funnily, even though SF is good without them, it drew against Critter with a +5.61 evaluation in TCEC :)
Jouni
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Performances of engines in the Endgame

Post by gladius »

Jouni wrote:Interesting. This seems to confirm my assumption that Rybka is the engine which needs tablebases the most! Funnily, even though SF is good without them, it drew against Critter with a +5.61 evaluation in TCEC :)
Yes, that was due to a missing piece of KBPKP knowledge. The next release will have that solved though :).