Stockfish test on 2 cores with PONDER ON is running

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Stockfish test on 2 cores with PONDER ON is running

Post by IWB »

... at the usual place:

http://www.inwoba.de

In case another interesting engine is released, I might postpone this test.

Bye
Ingo
Jouni
Posts: 3656
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: Stockfish test on 2 cores with PONDER ON is running

Post by Jouni »

Very interesting, because in my test Stockfish scales better with 2 CPUs than R3!

Jouni
Uri Blass
Posts: 10892
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish test on 2 cores with PONDER ON is running

Post by Uri Blass »

Interesting.
The only program that has more than 50% against Stockfish 2T at this point in time is
Rybka 3 1T, not Rybka 3 2T.

STOCK171_2T_1

Stockfish 1.7.1 JA 2T - Rybka 3 mp 2T (2942) 10.5 - 10.5 50.00% Perf=2942
Stockfish 1.7.1 JA 2T - Rybka 3 mp (2898) 9.5 - 11.5 45.24% Perf=2865
Stockfish 1.7.1 JA 2T - Naum 4.2 2T (2881) 13.5 - 7.5 64.29% Perf=2983
Stockfish 1.7.1 JA 2T - Deep Shredder 12 2T (2833) 15.0 - 6.0 71.43% Perf=2992
Stockfish 1.7.1 JA 2T - Naum 4.2 (2819) 15.5 - 5.5 73.81% Perf=2998
Stockfish 1.7.1 JA 2T - Deep Shredder 12 (2800) 13.5 - 7.5 64.29% Perf=2902
Stockfish 1.7.1 JA 2T - Komodo64 1.0 JA (2780) 16.5 - 4.5 78.57% Perf=3005
Stockfish 1.7.1 JA 2T - Zappa Mexico II 2T (2773) 14.5 - 6.5 69.05% Perf=2912
Stockfish 1.7.1 JA 2T - Zappa Mexico II (2710) 17.0 - 4.0 80.95% Perf=2961
Stockfish 1.7.1 JA 2T - Protector 1.3.2 JA (2699) 16.0 - 5.0 76.19% Perf=2901
Stockfish 1.7.1 JA 2T - Onno-1-1-1 (2684) 17.5 - 2.5 87.50% Perf=3022
Stockfish 1.7.1 JA 2T - Spark-0.3 VC(a) (2673) 18.0 - 2.0 90.00% Perf=3054
Stockfish 1.7.1 JA 2T - Deep Sjeng WC2008 (2673) 17.0 - 4.0 80.95% Perf=2924
194.0 - 77.0 71.59% Perf=2942
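
For anyone wondering where the Perf column comes from: the per-opponent figures are consistent with the standard logistic Elo formula, i.e. opponent rating plus 400*log10(score/(1-score)). Below is a minimal Python sketch assuming that formula. Whether the Shredder GUI computes it exactly this way is an assumption, and the function name is made up for illustration.

```python
import math

def performance_rating(opponent_elo: float, points: float, games: float) -> float:
    """Performance rating against a single opponent, logistic Elo model (assumed)."""
    score = points / games
    if not 0.0 < score < 1.0:
        # The logistic formula diverges for a 0% or 100% score.
        raise ValueError("performance is undefined for 0% or 100% scores")
    return opponent_elo + 400.0 * math.log10(score / (1.0 - score))

# Rows taken from the table above: (opponent rating, points scored, games played).
print(round(performance_rating(2898, 9.5, 21)))   # Rybka 3 mp row
print(round(performance_rating(2881, 13.5, 21)))  # Naum 4.2 2T row
```

Under that assumption the two calls give 2865 and 2983, the same values as in the Rybka 3 mp and Naum 4.2 2T rows.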
ernest
Posts: 2053
Joined: Wed Mar 08, 2006 8:30 pm

Re: Stockfish test on 2 cores with PONDER ON is running

Post by ernest »

Uri Blass wrote: The only program that has more than 50% against Stockfish 2T
You are getting carried away, Uri!!!
How can you declare anything after matches with 21 games???... :shock:
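
To put a number on how little 21 games say, here is a rough Python sketch of a 95% interval for the Elo difference implied by a single match. It assumes a plain normal approximation on the score and treats every game as a win or a loss (no draws), which is the worst case; with draws the interval gets somewhat narrower. The function names are illustrative only.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by an expected score (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(points: float, games: int, z: float = 1.96):
    """Approximate 95% interval (for z=1.96) of the Elo difference after one match."""
    score = points / games
    # Assumption: no draws, so each game is a Bernoulli trial (largest variance).
    stderr = math.sqrt(score * (1.0 - score) / games)
    # Clamp so the logarithm stays defined for very short or lopsided matches.
    low = max(score - z * stderr, 1e-9)
    high = min(score + z * stderr, 1.0 - 1e-9)
    return elo_diff(low), elo_diff(high)

low, high = elo_interval(9.5, 21)  # Stockfish 2T vs. Rybka 3 mp after 21 games
print(f"{low:+.0f} .. {high:+.0f} Elo")
```

For 9.5 out of 21 the interval reaches more than a hundred Elo in each direction, so such a result is compatible with either engine being clearly stronger.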
Jouni
Posts: 3656
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: Stockfish test on 2 cores with PONDER ON is running

Post by Jouni »

Not quite at Rybka level yet, but close:

449.5 - 189.5 70.34% Perf=2932 (R3 2942)

But if an engine wins all its matches, it is the strongest even without rating calculations!

BTW Ingo, how is your testing done? Do you start all matches manually?
And how are the live scores calculated?

Jouni
Uri Blass
Posts: 10892
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish test on 2 cores with PONDER ON is running

Post by Uri Blass »

ernest wrote:
Uri Blass wrote: The only program that has more than 50% against Stockfish 2T
You are getting carried away, Uri!!!
How can you declare anything after matches with 21 games???... :shock:
I can clearly show the facts

The facts are still the same after 50 games or 51 games against both programs

Stockfish 1.7.1 JA 2T - Rybka 3 mp 2T (2942) 26.0 - 24.0 52.00% Perf=2955
Stockfish 1.7.1 JA 2T - Rybka 3 mp (2898) 24.5 - 26.5 48.04% Perf=2885
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish test on 2 cores with PONDER ON is running

Post by IWB »

Hello
Jouni wrote: Not quite at Rybka level yet, but close:

449.5 - 189.5 70.34% Perf=2932 (R3 2942)

But if an engine wins all its matches, it is the strongest even without rating calculations!
Stockfish 1.7.1 won all SINGLE matches, even vs. Rybka 3, but WITH a proper calculation it is behind R3 ...
Jouni wrote: BTW Ingo, how is your testing done? Do you start all matches manually?
And how are the live scores calculated?
Jouni
Uff, long story short ...: I set up ONE tourney and let 4 (at the moment) quads crunch on it at the same time (for the single test I can even start 2 GUIs simultaneously on each quad). The tourney is stored on a mapped drive that is identical for all the computers (even most engines are installed just once, from one computer, on that mapped drive! All the other computers/GUIs do not need an additional installation, as long as there is no copy protection). The trick is the Shredder Classic 4 GUI: it supports such things, including Elo calculations, which are very basic and much less sophisticated than Bayeselo. It might turn out in the end that a close Elo winner becomes a close Elo loser with a proper Bayes calculation.
The problem is the CB engines. Those I have to play in the CB GUI, which is really painful, as you can only start ONE GUI (2 if you use different users) and you can only play on one computer/user; you cannot use multiple computers for ONE tourney.
If I had to run all of that manually, as in the CB GUI, I would stop testing to such an extent; it would be much too much work! Right now you can see for yourself that I start a tourney and leave it alone for some days (just checking from time to time whether an engine has crashed).

... and that was the short story!

Bye
Ingo
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish test on 2 cores with PONDER ON is running

Post by IWB »

Hello
Uri Blass wrote:
I can clearly show the facts

The facts are still the same after 50 games or 51 games against both programs

Stockfish 1.7.1 JA 2T - Rybka 3 mp 2T (2942) 26.0 - 24.0 52.00% Perf=2955
Stockfish 1.7.1 JA 2T - Rybka 3 mp (2898) 24.5 - 26.5 48.04% Perf=2885
Yes, but keep in mind another fact: R3 and R3 2T are just 44 Elo apart under my conditions. Especially if Stockfish 1.7.1 2T is in between those two, there is a certain likelihood that it will win a short match of just 100 games vs. the presumed stronger one and lose vs. the lower-rated version. No problem here for me ... and it is still not over!
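
As a rough illustration of that "certain likelihood", here is a Python sketch that estimates how often an engine scores above 50% in a match against an opponent a given Elo distance away. It uses the logistic Elo expectation and a normal approximation. The 22 Elo gap (Stockfish sitting exactly halfway between the two Rybkas) and the 30% draw ratio are pure assumptions chosen for the example, not measured values, and the function names are illustrative.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Expected score per game for a rating advantage of elo_diff (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def prob_match_win(elo_diff: float, games: int, draw_ratio: float = 0.3) -> float:
    """Approximate probability of scoring above 50% over `games` games."""
    mean = expected_score(elo_diff)
    # Per-game variance of a win/draw/loss result with the assumed draw ratio.
    variance = mean * (1.0 - mean) - draw_ratio / 4.0
    stderr = math.sqrt(variance / games)
    z = (mean - 0.5) / stderr
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Assumed: the engine is 22 Elo weaker than its opponent, match of 100 games.
print(f"{prob_match_win(-22, 100):.2f}")
```

With these assumptions the weaker engine still comes out ahead in a 100-game match a bit more than 20% of the time, so winning against the stronger Rybka while losing to the weaker one is not a surprising outcome.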

Bye
Ingo
Uri Blass
Posts: 10892
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish test on 2 cores with PONDER ON is running

Post by Uri Blass »

Jouni wrote: Not quite at Rybka level yet, but close:

449.5 - 189.5 70.34% Perf=2932 (R3 2942)

But if an engine wins all its matches, it is the strongest even without rating calculations!

BTW Ingo, how is your testing done? Do you start all matches manually?
And how are the live scores calculated?

Jouni
It is not clear after 1000 games (and the gap is only 2 Elo)

713.5 - 286.5 71.35% Perf=2940

Stockfish 1.7.1 JA 2T - Rybka 3 mp 2T (2942) 41.5 - 35.5 53.90% Perf=2969
Stockfish 1.7.1 JA 2T - Rybka 3 mp (2898) 40.5 - 36.5 52.60% Perf=2916
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish test on 2 cores with PONDER ON is running

Post by IWB »

Hi
Uri Blass wrote: It is not clear after 1000 games (and the gap is only 2 Elo)
These two engines are too close for the ranking to become 'clear'.

Don't look at a single Elo point. It might very well be that Bayeselo later on puts this 'guess' into another order.


Bye
Ingo