Stockfish 2.1 running for the IPON

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish 2.1 running for the IPON

Post by IWB »

Hi Marco
mcostalba wrote: I think it is the final Bayeselo step that takes the score down, ...
Whatever it is, it is not Bayeselo.

If you download individual.7z, you will find a file named "rating.dat". This is the exact Elostat output for the full IPON database. While Bayeselo shows an 8 Elo difference, Elostat shows 9.

First of all, we are discussing a difference of 15 Elo, which is a bit ... . Then we have to decide whether the IPON is low or the CEGT is high, then whether it is 2.1 that is too high/low or 2.01 that is too high/low, and in the end it is impossible to decide whether IPON or CEGT is "right" or "wrong" ...

I can't check to be sure right now, but IIRC 2.01 was a bit higher in the IPON than in the SWCR and CEGT 40/20 (or too low in ... , but always within the error bars).

In the end I will wait for the CEGT 40/20 list to compare, as it fits much better with the IPON than the CEGT Blitz list. :-)

I don't see anything to worry about at the moment.

Bye
Ingo

PS: Today I am operating Komodo against Stockfish in Thuringia ... cross your fingers please :-)
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Stockfish 2.1 running for the IPON

Post by mcostalba »

IWB wrote: I don't see anything to worry about at the moment.
No, I am not worried at all. I didn't intend to raise all this discussion; I really think your and Frank's results are much better than what we did internally.

Actually, we are now trying to change our testing procedures precisely because they are not aligned with yours, and I think yours are better.

Regarding Bayeselo, it was just an observation, mostly because I don't understand very well how it works, but I think it is the best way to measure effective engine strength.

Good luck with Komodo :-)
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish 2.1 running for the IPON

Post by IWB »

mcostalba wrote:
Good luck with Komodo :-)
I sense some irony here ;-) Anyhow, thanks, I need it!

Bye
Ingo
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Stockfish 2.1 running for the IPON

Post by mcostalba »

IWB wrote:
mcostalba wrote:
Good luck with Komodo :-)
I sense some irony here ;-) Anyhow, thanks, I need it!

Bye
Ingo
No!!!!! Believe me, not at all!!

BTW, as you know very well, statistically the chances of Komodo beating SF are far from small. If you add different hardware and a different opening book, the chances are even higher!
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish 2.1 running for the IPON

Post by IWB »

mcostalba wrote:
IWB wrote:
mcostalba wrote:
Good luck with Komodo :-)
I sense some irony here ;-) Anyhow, thanks, I need it!

Bye
Ingo
No!!!!! Believe me, not at all!!

BTW, as you know very well, statistically the chances of Komodo beating SF are far from small. If you add different hardware and a different opening book, the chances are even higher!
Of course we don't know for sure, but I estimate Stockfish's software/hardware advantage at 100 Elo. The book is different, it is Erdo ... while I am happy to have a book :-)
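Ingo's 100 Elo estimate can be turned into an expected score with the usual logistic Elo formula, E = 1 / (1 + 10^(-D/400)). A minimal Python sketch, assuming that model; the function name `expected_score` is ours and not part of any rating tool:

```python
def expected_score(elo_diff: float) -> float:
    """Expected score per game for a side rated elo_diff points above its
    opponent, under the logistic Elo model (a draw counts as 0.5)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Assumption taken from the post: Stockfish (software + hardware) is about
# 100 Elo stronger, so Komodo plays at a -100 Elo difference.
komodo = expected_score(-100.0)
print(f"Komodo expected score per game: {komodo:.3f}")
```

This prints an expected score of about 0.360 for Komodo, i.e. roughly 36% of the points. Expected score counts a draw as half a point, so it is not the probability of a Komodo win; splitting it into win, draw and loss chances would need a draw model, which this sketch leaves out.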

Now the game starts ...

Bye
Ingo
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish 2.1 running for the IPON

Post by IWB »

IWB wrote:
Now the game starts ...
... and ended in a draw:

[Event "20. Thüringer Computerschachturnier"]
[Site "Triebes"]
[Date "2011.05.09"]
[Round "2"]
[White "Komodo"]
[Black "Stockfish 2.1.1"]
[WhiteElo ""]
[BlackElo "2928"]
[Result "1/2-1/2"]

1. d4 {book 0s} d5 {10s (Nf6)} 2. c4 {book 0s} e6 {7s (c6)}
3. Nc3 {book 0s} Nf6 {6s (c6)} 4. Nf3 {book 0s} c6 {8s
(Nc6)} 5. e3 {book 0s} Nbd7 {6s} 6. Qc2 {book 0s} Bd6 {7s}
7. Be2 {book 0s} O-O {7s} 8. O-O {book 0s} dxc4 {7s}
9. Bxc4 {book 0s} b5 {9s} 10. Be2 {book 0s} Bb7 {6s}
11. Rd1 {+0.10/21 2:13m} Qc7 {8s} 12. h3 {+0.16/21 2:16m}
b4 {19s (Rfe8)} 13. Na4 {+0.04/22 2:16m} c5 {15s} 14. dxc5
{+0.05/23 1:54m} Rac8 {7s} 15. Qd2 {+0.24/22 2:18m} Bxc5
{29s} 16. Nxc5 {+0.18/22 47s} Qxc5 {16s} 17. Qe1 {+0.27/24
3:37m} Qc2 {16s (Bxf3)} 18. Bd3 {+0.47/23 2:27m} Qc7 {9s
(Qc5)} 19. Qxb4 {+0.29/22 1:40m} Bxf3 {4:23m} 20. gxf3
{+0.29/27 1:42m} Ne5 {7:10m} 21. Be2 {+0.08/25 3:03m} Qc2
{5:37m} 22. Bd2 {+0.13/26 5s} Qg6+ {8:15m (a5)} 23. Kh1
{+0.30/23 3:50m} Qh5 {6:21m} 24. Qf4 {+0.32/26 2:40m} Rc2
{7:06m} 25. Qg3 {+0.33/27 3:20m} Ne4 {6:34m (Rxb2)}
26. fxe4 {+0.40/23 26s} Qxe2 {15s} 27. Qxe5 {+0.31/25
2:15m} Rxd2 {2:44m} 28. Rxd2 {+0.15/27 5s} Qxd2 {9s}
29. Kg2 {+0.29/27 4:19m} Rc8 {5:22m} 30. Qd4 {+0.17/27
1:58m} Qxd4 {2:40m (Qc2)} 31. exd4 {+0.54/23 32s} Kf8
{13:57m (Rc2)} 32. Kf3 {+0.33/24 1:54m} Rc2 {7:42m} 33. Rb1
{+0.32/28 41s} Ke7 {8:47m} 34. Ke3 {+0.37/28 32s} g5 {8:16m
(Kd6)} 35. a4 {+0.37/25 2:02m} h6 {10:25m} 36. e5 {+0.36/29
0s} Kd7 {5:25m} 37. h4 {+0.17/30 22s} gxh4 {4:49m} 38. Rh1
{+0.13/31 1:00m} Rxb2 {4:26m} 39. Rxh4 {+0.12/31 1s} Ra2
{3:45m} 40. Rxh6 {+0.05/30 4:12m} Rxa4 {9s} 41. Rh8
{+0.05/29 2:05m} Ra1 {1:30m (a5)} 42. Ke4 {+0.05/28 2:04m}
Re1+ {1:10m (a5)} 43. Kd3 {+0.05/29 1:08m} Rd1+ {32s}
44. Kc3 {+0.05/29 14s} Rc1+ {1:48m (a5)} 45. Kb3 {0.00/30
1:38m} Rf1 {51s} 46. Ra8 {0.00/32 41s} Kc6 {8s} 47. Rxa7
{0.00/32 1:17m} Rxf2 {8s (Kd5)} 48. Kc4 {+0.05/32 1:08m}
Rf3 {9s (Rf1)} 1/2-1/2

:-)

Bye
Ingo
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish 2.1 running for the IPON

Post by Laskos »

mcostalba wrote:
Regarding Bayeselo, it was just an observation, mostly because I don't understand very well how it works, but I think it is the best way to measure effective engine strength.
Marco, I am not sure that Bayeselo is the best tool. Here is an example:

Elostat:

    Program                            Score     %    Av.Op.  Elo    +   -    Draws

  1 Rybka 4w32                     : 408.5/720  56.7   3176   3224   21  21   31.2 %
  2 Fire 1.31 w32                  : 311.5/720  43.3   3224   3176   21  21   31.2 %

Bayeselo:

Rank Name            Elo    +    - games score oppo. draws
   1 Rybka 4w32       23   11   11   720   57%   -23   31%
   2 Fire 1.31 w32   -23   11   11   720   43%    23   31%

What is wrong here with Bayeselo are the 95% confidence intervals. In a 2-engine match I can calculate them by hand, or use Monte Carlo for more than 2 engines. The correct margins (+/- 21) are given by Elostat. Bayeselo gives a weird +/- 11 for 2 standard deviations, which is wrong even by a rough calculation in my head. I have also seen rating irregularities with Bayeselo.
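The by-hand calculation for a 2-engine match can be sketched roughly as follows. Assumptions (ours, not from the post): each game's score is an independent variable taking 1, 1/2 or 0; with score fraction p and draw fraction d its variance is p(1 - p) - d/4; the 95% interval is ±1.96 standard errors on the score fraction, mapped to Elo with the logistic formula. The inputs are the Rybka 4 vs Fire 1.31 numbers from the tables above (408.5/720, 31.2% draws); the function names are hypothetical.

```python
import math


def elo_from_score(p: float) -> float:
    """Elo difference implied by a score fraction p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def match_elo_interval(points: float, games: int, draw_ratio: float, z: float = 1.96):
    """Return (low, centre, high) Elo difference for one side of a 2-engine match.

    Per-game score x is 1, 1/2 or 0. With win fraction w, draw fraction d and
    score fraction p = w + d/2:
        Var(x) = E[x^2] - p^2 = (w + d/4) - p^2 = p(1 - p) - d/4.
    """
    p = points / games
    sd = math.sqrt(p * (1.0 - p) - draw_ratio / 4.0)  # per-game standard deviation
    margin = z * sd / math.sqrt(games)                 # z standard errors on p
    return elo_from_score(p - margin), elo_from_score(p), elo_from_score(p + margin)


low, centre, high = match_elo_interval(408.5, 720, 0.312)
print(f"Rybka - Fire: {centre:+.0f} Elo, +{high - centre:.0f} / -{centre - low:.0f} (95%)")
```

With these inputs the sketch prints a difference of about +47 Elo with margins of about +21 / -21, in line with the Elostat table. Setting `draw_ratio` to 0 widens the interval, which is why the draw rate has to be taken into account.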

For Ingo: are you using a general "offset" command for all engines, or a separate one for each engine?

Anyway, for now I will stick with Elostat.

Kai
ernest
Posts: 2053
Joined: Wed Mar 08, 2006 8:30 pm

Re: Stockfish 2.1 running for the IPON

Post by ernest »

Laskos wrote: Bayeselo gives weird +/- 11 for 2 standard deviations,
This is really weird! (if true...)
I wonder how Rémi Coulom explains that...