Komodo run - Ingo list revisited
Posted: Fri Nov 08, 2013 1:27 pm
by Don
Ingo was kind enough to run a development version of Komodo against his standard list.
To be sure, his test is not very favorable to Komodo, which excels at longer time controls, and this is a blitz time control list.
Komodo beat every single program on the list, including Houdini, but falls just short of Houdini overall because Houdini does slightly better against weak programs at this time control.
Here are his results:
Code:
Final Result one on one of 113300
Rank Name                      Elo    +    -  games  score  oppo.  draws
   1 Houdini 3 STD           3000   10   10   3000    78%   2771    27%
   2 K113300                 2994   10   10   3000    78%   2771    30%
   3 Stockfish 4             2948   10   10   3000    73%   2774    37%
   4 Critter 1.4a            2909    9    9   3000    68%   2776    40%
   5 Gull 2.2                2908    9    9   3000    68%   2776    40%
   6 Deep Rybka 4.1          2882    9    9   3000    64%   2777    42%
   7 Hannibal 1.4a           2799    9    9   3000    52%   2781    43%
   8 Chiron 1.5              2780    9    9   3000    50%   2782    40%
   9 Protector 1.5.0         2773    9    9   3000    49%   2782    45%
  10 Naum 4.2                2770    9    9   3000    49%   2783    40%
  11 HIARCS 14 WCSC 32b      2750    9    9   3000    45%   2784    41%
  12 Deep Shredder 12        2734    9    9   3000    43%   2784    38%
  13 Jonny 6.00              2732    9    9   3000    43%   2785    38%
  14 Deep Sjeng c't 2010 32b 2715    9    9   3000    41%   2785    40%
  15 Spike 1.4 32b           2708    9    9   3000    40%   2786    41%
  16 spark-1.0               2698    9    9   3000    38%   2786    39%
  17 Deep Junior 13.3        2678    9    9   3000    36%   2787    33%
  18 Booot 5.2.0             2676    9    9   3000    35%   2787    37%
  19 Quazar 0.4              2667    9    9   3000    34%   2788    35%
  20 Zappa Mexico II         2656   10   10   3000    33%   2788    35%
  21 Toga II 3.0 32b         2646    9    9   3000    31%   2789    36%
Here are the individual gauntlet results against the top few programs. Note that Komodo wins almost 55% against Houdini 3 even at this blitz time control. We are now quite curious about how Houdini beats weaker programs more decisively than we do.
Code:
150.0 ( 82.0 : 68.0) Houdini 3 STD 3000
150.0 ( 85.0 : 65.0) Stockfish 4 2948
150.0 ( 95.5 : 54.5) Critter 1.4a 2909
150.0 ( 94.0 : 56.0) Gull 2.2 2908
150.0 (107.0 : 43.0) Deep Rybka 4.1 2882
150.0 (117.0 : 33.0) Hannibal 1.4a 2799
150.0 (119.0 : 31.0) Chiron 1.5 2780
150.0 (113.5 : 36.5) Protector 1.5.0 2773
150.0 (120.5 : 29.5) Naum 4.2 2770
150.0 (121.5 : 28.5) HIARCS 14 WCSC 32b 2750
150.0 (119.0 : 31.0) Deep Shredder 12 2734
150.0 (122.5 : 27.5) Jonny 6.00 2732
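As a rough check on the "almost 55%" figure, here is a minimal Python sketch that turns one gauntlet line into a score percentage and an implied rating gap. The function names are made up for illustration, and the conversion assumes the plain logistic Elo formula; the rating tool behind the list may use a different model, so treat the Elo number as a ballpark only.

```python
import math


def score_fraction(points_for: float, points_against: float) -> float:
    """Fraction of the available points that went to the first engine."""
    return points_for / (points_for + points_against)


def elo_diff_from_score(score: float) -> float:
    """Rating gap implied by a score under the logistic Elo model.

    Assumption: score = 1 / (1 + 10 ** (-diff / 400)), solved for diff.
    """
    return 400.0 * math.log10(score / (1.0 - score))


# Komodo vs Houdini 3 from the listing above: 82.0 : 68.0 over 150 games.
komodo_vs_houdini = score_fraction(82.0, 68.0)
print(f"score: {komodo_vs_houdini:.1%}")
print(f"implied Elo gap: {elo_diff_from_score(komodo_vs_houdini):.1f}")
```

With 150 games the error bars on a single pairing are wide, so a head-to-head gap computed this way says little on its own.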
Re: Komodo run - Ingo list revisited.
Posted: Fri Nov 08, 2013 1:37 pm
by Werewolf
Can you estimate how much Elo you've gained since K6, then?
Re: Komodo run - Ingo list revisited.
Posted: Fri Nov 08, 2013 1:39 pm
by Don
Werewolf wrote:Can you estimate how much Elo you've gained since K6, then?
On Ingo's list we have gained 20 Elo.
Re: Komodo run - Ingo list revisited.
Posted: Fri Nov 08, 2013 1:49 pm
by Milos
Don wrote:
Here are the individual gauntlet results against the top few programs. Note that Komodo wins almost 55% against Houdini 3 even at this blitz time control. We are now quite curious about how Houdini beats weaker programs more decisively than we do.
I don't know if you people just pretend or like to stress only the facts that work in your favor.
All the tests so far, including short, medium and moderately long TCs, suggest both Komodo 6 and SFdev have 4% over default H3 in direct matches.
However, H3 runs with contempt 1. Depending on the average rating of the whole field, this contempt brings more overall rating than what H3 loses to Komodo and SF.
If you ran the same matches with contempt 0, you'd see that H3 is 2-3% stronger than both the latest SFdev and K6.
However, you prefer to pretend that your program is the strongest in direct matches with Houdini and then suggest some flaw or whatever in rating list methods, since H3 still has the better rating.
Re: Komodo run - Ingo list revisited.
Posted: Fri Nov 08, 2013 1:53 pm
by Don
Milos wrote:
I don't know if you people just pretend or like to stress only the facts that work in your favor.
All the tests so far, including short, medium and moderately long TCs, suggest both Komodo 6 and SFdev have 4% over default H3 in direct matches.
However, H3 runs with contempt 1. Depending on the average rating of the whole field, this contempt brings more overall rating than what H3 loses to Komodo and SF.
If you ran the same matches with contempt 0, you'd see that H3 is 2-3% stronger than both the latest SFdev and K6.
However, you prefer to pretend that your program is the strongest in direct matches with Houdini and then suggest some flaw or whatever in rating list methods, since H3 still has the better rating.
Thank you.
Re: Komodo run - Ingo list revisited.
Posted: Fri Nov 08, 2013 2:01 pm
by Ajedrecista
Hello:
Don wrote:Ingo was kind enough to run a development version of Komodo against his standard list.
I have translated these charts to the IPON standard offset: 2800 for Deep Shredder 12. I did the sums mentally, so it is possible that I went wrong somewhere:
Code:
Final Result one on one of 113300
Rank Name                      Elo    +    -  games  score  oppo.  draws
   1 Houdini 3 STD           3066   10   10   3000    78%   2837    27%
   2 K113300                 3060   10   10   3000    78%   2837    30%
   3 Stockfish 4             3014   10   10   3000    73%   2840    37%
   4 Critter 1.4a            2975    9    9   3000    68%   2842    40%
   5 Gull 2.2                2974    9    9   3000    68%   2842    40%
   6 Deep Rybka 4.1          2948    9    9   3000    64%   2843    42%
   7 Hannibal 1.4a           2865    9    9   3000    52%   2847    43%
   8 Chiron 1.5              2846    9    9   3000    50%   2848    40%
   9 Protector 1.5.0         2839    9    9   3000    49%   2848    45%
  10 Naum 4.2                2836    9    9   3000    49%   2849    40%
  11 HIARCS 14 WCSC 32b      2816    9    9   3000    45%   2850    41%
  12 Deep Shredder 12        2800    9    9   3000    43%   2850    38%
  13 Jonny 6.00              2798    9    9   3000    43%   2851    38%
  14 Deep Sjeng c't 2010 32b 2781    9    9   3000    41%   2851    40%
  15 Spike 1.4 32b           2774    9    9   3000    40%   2852    41%
  16 spark-1.0               2764    9    9   3000    38%   2852    39%
  17 Deep Junior 13.3        2744    9    9   3000    36%   2853    33%
  18 Booot 5.2.0             2742    9    9   3000    35%   2853    37%
  19 Quazar 0.4              2733    9    9   3000    34%   2854    35%
  20 Zappa Mexico II         2722   10   10   3000    33%   2854    35%
  21 Toga II 3.0 32b         2712    9    9   3000    31%   2855    36%
Code:
150.0 ( 82.0 : 68.0) Houdini 3 STD 3066
150.0 ( 85.0 : 65.0) Stockfish 4 3014
150.0 ( 95.5 : 54.5) Critter 1.4a 2975
150.0 ( 94.0 : 56.0) Gull 2.2 2974
150.0 (107.0 : 43.0) Deep Rybka 4.1 2948
150.0 (117.0 : 33.0) Hannibal 1.4a 2865
150.0 (119.0 : 31.0) Chiron 1.5 2846
150.0 (113.5 : 36.5) Protector 1.5.0 2839
150.0 (120.5 : 29.5) Naum 4.2 2836
150.0 (121.5 : 28.5) HIARCS 14 WCSC 32b 2816
150.0 (119.0 : 31.0) Deep Shredder 12 2800
150.0 (122.5 : 27.5) Jonny 6.00 2798
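The translation above amounts to adding one constant to every rating so that the anchor engine lands on 2800. A small Python sketch of that shift follows; the function name is hypothetical and only three of the listed engines are included to keep it short.

```python
def shift_to_anchor(ratings: dict, anchor_name: str, anchor_value: int) -> dict:
    """Add a single offset to every rating so the anchor engine hits anchor_value."""
    offset = anchor_value - ratings[anchor_name]
    return {name: elo + offset for name, elo in ratings.items()}


# A few ratings taken from the original chart.
original = {
    "Houdini 3 STD": 3000,
    "K113300": 2994,
    "Deep Shredder 12": 2734,
}

shifted = shift_to_anchor(original, "Deep Shredder 12", 2800)
for name, elo in shifted.items():
    print(f"{name:<18} {elo}")
```

Only differences between ratings carry meaning in a list like this, so the shift changes no ordering and no gap; the average-opponent column moves by the same offset.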
It looks like Jonny 6.00 and Hannibal 1.4a are now included, if I am not wrong. Disregarding this issue, this development version of Komodo has gained around 24 Elo, plus or minus uncertainties (around ± 14 Elo, treating the gain as the difference between two normal distributions, 3036 ± 10 and 3060 ± 10; I am writing from memory), since version 5.1r2 or similar. Am I right?
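The ± 14 figure is what you get when the two error bars are combined in quadrature. A short Python sketch, assuming the two rating estimates are independent (they share opponents on the same list, so this is only an approximation) and using a made-up function name:

```python
import math


def difference_with_error(old: float, old_err: float, new: float, new_err: float):
    """Gain between two ratings and its error bar, assuming independent errors."""
    gain = new - old
    combined_err = math.hypot(old_err, new_err)  # sqrt(old_err**2 + new_err**2)
    return gain, combined_err


gain, err = difference_with_error(3036, 10, 3060, 10)
print(f"gain: {gain} Elo, error: about {err:.1f} Elo")
```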
Anyway, well done Komodo team! I wish Don a speedy recovery.
Regards from Spain.
Ajedrecista.
Re: Komodo run - Ingo list revisited.
Posted: Fri Nov 08, 2013 2:18 pm
by Don
Milos wrote:
I don't know if you people just pretend or like to stress only the facts that work in your favor.
All the tests so far, including short, medium and moderately long TCs, suggest both Komodo 6 and SFdev have 4% over default H3 in direct matches.
However, H3 runs with contempt 1. Depending on the average rating of the whole field, this contempt brings more overall rating than what H3 loses to Komodo and SF.
If you ran the same matches with contempt 0, you'd see that H3 is 2-3% stronger than both the latest SFdev and K6.
However, you prefer to pretend that your program is the strongest in direct matches with Houdini and then suggest some flaw or whatever in rating list methods, since H3 still has the better rating.
I think you explained why Houdini does better against weak programs: it's probably the aggressive contempt factor. Komodo probably respects other programs way too much.
And I agree with you that Komodo is way too strong for Houdini to have contempt for it.
I doubt Ingo would run this test again, as it costs him precious electricity, which is expensive where he lives, but if I could convince him to do so, do you believe setting Houdini to contempt zero would increase its overall rating on this list?
Don
Re: Komodo run - Ingo list revisited.
Posted: Fri Nov 08, 2013 2:26 pm
by Milos
Don wrote:
I think you explained why Houdini does better against weak programs: it's probably the aggressive contempt factor. Komodo probably respects other programs way too much.
And I agree with you that Komodo is way too strong for Houdini to have contempt for it.
I doubt Ingo would run this test again, as it costs him precious electricity, which is expensive where he lives, but if I could convince him to do so, do you believe setting Houdini to contempt zero would increase its overall rating on this list?
Don
I don't believe it would help H3 on Ingo's list; on the contrary. There are too many weak opponents (300 Elo weaker), so high contempt there brings more points overall (more wins instead of draws) than what H3 loses against SF and Komodo (there it has roughly 6% of the games as losses that would be draws with contempt 0).
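To put rough numbers on that trade-off, here is a back-of-the-envelope Python sketch. It assumes Houdini's gauntlet has the same layout as the listing in this thread (20 opponents, 150 games each), counts only Stockfish and Komodo as the strong opponents, and treats every affected game as a half-point swing. The constant and function names are invented for the illustration, and the 6% figure is the estimate quoted in this post, not a measurement.

```python
GAMES_PER_OPPONENT = 150        # per-opponent game count from the gauntlet listing
STRONG_OPPONENTS = 2            # Stockfish and Komodo
OTHER_OPPONENTS = 18            # the rest of the 20-engine field (assumed weaker)
DRAW_TO_LOSS_RATE = 0.06        # share of games vs strong engines lost instead of drawn


def points_lost_to_strong() -> float:
    """Points given up against the strong engines; each flipped draw costs 0.5."""
    games = STRONG_OPPONENTS * GAMES_PER_OPPONENT
    return games * DRAW_TO_LOSS_RATE * 0.5


def break_even_conversion_rate() -> float:
    """Share of games vs the other engines that must flip from draw to win
    (0.5 points each) to pay back the points lost to the strong engines."""
    games = OTHER_OPPONENTS * GAMES_PER_OPPONENT
    return points_lost_to_strong() / (games * 0.5)


print(f"points lost vs strong engines: {points_lost_to_strong():.1f}")
print(f"break-even draw-to-win rate vs the rest: {break_even_conversion_rate():.2%}")
```

Under these assumptions contempt pays for itself on the list as soon as it turns well under 1% of the games against the rest of the field from draws into wins, which is the shape of the argument made here. The sketch ignores any draws that contempt turns into losses against weaker engines.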
Re: Komodo run - Ingo list revisited.
Posted: Fri Nov 08, 2013 2:33 pm
by Don
Milos wrote:
I don't believe it would help H3 on Ingo's list; on the contrary. There are too many weak opponents (300 Elo weaker), so high contempt there brings more points overall (more wins instead of draws) than what H3 loses against SF and Komodo (there it has roughly 6% of the games as losses that would be draws with contempt 0).
So it's probably the case that Komodo would actually top this list if Houdini's contempt was zero. Houdini is optimized to do well on lists.
I'll see if Ingo is willing to run another test with contempt = 0 for Houdini.
Re: Komodo run - Ingo list revisited.
Posted: Fri Nov 08, 2013 2:40 pm
by Milos
Don wrote:
So it's probably the case that Komodo would actually top this list if Houdini's contempt was zero. Houdini is optimized to do well on lists.
I'll see if Ingo is willing to run another test with contempt = 0 for Houdini.
I'm pretty sure RH has the same kind of setup, with an average opponent rating as in Ingo's list or CCRL, and after he's satisfied with the contempt 0 strength of the engine, he then optimizes contempt to provide the highest rating and then normalizes that value to 1.