Glaurung epsilon 5 in strategic test: too legit to quit

ozziejoe · Post by **ozziejoe** » Mon Jul 02, 2007 1:16 pm

Tord has described the new glaurung as having weak evaluation. Well, it looks like it might have already passed the old glaurung. This engine gets my vote for most likely to improve in the next year. (comparison of how the new glaurung epsilon did versus the old glaurung is below the rating table)

Time control: 40 moves in 4 minutes, repeating. Strategic opening repertoire (White repertoire built largely around 1. Nf3, 2. c4, 3. g3; Black repertoire built on the Nimzo-Indian, Bogo-Indian, and Caro-Kann). Hardware: Core Duo Centrino 2.13.

Rank Program                   Elo    +    -  Games   Score  Av.Op.   Draws
1    Rybkav2.3.2.mp.w32      : 3041   28   27   496  77.8 %   2823   30.6 %
2    Rybkav2.2n2.mp.w32      : 2975   28   28   434  69.2 %   2834   31.1 %
3    TogaII (1.3x4)          : 2849   24   24   496  50.3 %   2847   36.1 %
4    EngineShredder10UCI     : 2833   25   26   496  47.7 %   2849   30.8 %
5    Strelka                 : 2821   26   26   496  45.9 %   2850   30.8 %
6    Glaurung_win_32 (e5)    : 2813   30   30   372  45.6 %   2844   30.4 %
7    Glaurung-SMP-32bit      : 2778   28   28   434  38.1 %   2862   30.2 %
8    Spike1.2                : 2773   26   26   496  38.2 %   2856   32.1 %
9    Naum                    : 2765   25   25   496  37.0 %   2857   36.9 %

6 Glaurung_win_32 (E5) : 2813 372 (+113,=113,-146), 45.6 %

EngineShredder10UCI : 62 (+ 18,= 17,- 27), 42.7 %
Naum : 62 (+ 27,= 23,- 12), 62.1 %
TogaII : 62 (+ 12,= 24,- 26), 38.7 %
Spike1.2 : 62 (+ 27,= 19,- 16), 58.9 %
Strelka : 62 (+ 23,= 18,- 21), 51.6 %
Rybkav2.3.2.mp.w32 : 62 (+ 6,= 12,- 44), 19.4 %

7 Glaurung-SMP-32bit : 2778 434 (+100,=131,-203), 38.1 %

EngineShredder10UCI : 62 (+ 13,= 18,- 31), 35.5 %
Naum : 62 (+ 19,= 23,- 20), 49.2 %
TogaII : 62 (+ 12,= 27,- 23), 41.1 %
Spike1.2 : 62 (+ 22,= 21,- 19), 52.4 %
Strelka : 62 (+ 23,= 16,- 23), 50.0 %
Rybkav2.2n2.mp.w32 : 62 (+ 7,= 12,- 43), 21.0 %
Rybkav2.3.2.mp.w32 : 62 (+ 4,= 14,- 44), 17.7 %
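For readers who want to check the percentages above, here is a minimal Python sketch that turns a win/draw/loss record into a score fraction and an Elo difference. It assumes the standard logistic Elo model; the rating tool used for the table may compute things slightly differently, and the function names are illustrative only.

```python
import math

def score_fraction(wins, draws, losses):
    """Score as a fraction of games played (a draw counts as half a point)."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games

def elo_difference(score):
    """Elo difference implied by a score fraction under the logistic model.

    Assumption: standard logistic Elo curve; score must be strictly
    between 0 and 1.
    """
    return -400.0 * math.log10(1.0 / score - 1.0)

# Overall records taken from the two summary lines above.
new_score = score_fraction(113, 113, 146)   # Glaurung_win_32 (E5)
old_score = score_fraction(100, 131, 203)   # Glaurung-SMP-32bit

new_elo = elo_difference(new_score)
old_elo = elo_difference(old_score)

print(f"epsilon 5: {new_score:.1%}, {new_elo:+.0f} Elo vs. its average opponent")
print(f"old SMP:   {old_score:.1%}, {old_elo:+.0f} Elo vs. its average opponent")
```

Under this model the two results land within about one Elo point of the gap between each engine's rating and its average opponent rating in the table (2813 vs. 2844 and 2778 vs. 2862).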

Tony Thomas · Post by **Tony Thomas** » Mon Jul 02, 2007 7:54 pm

Is it possible that using the old SMP version (instead of the single-CPU 1.2.1) skewed the results a little? Or did the new version pass the old version running on 2 CPUs?

Tord Romstad · Post by **Tord Romstad** » Mon Jul 02, 2007 9:35 pm

ozziejoe wrote:Tord has described the new glaurung as having weak evaluation.

Well, the evaluation is reasonably bug-free, which is a very significant strength, but it is also simple and poorly tuned, which is evidently a weakness.

Well, it looks like it might have already passed the old glaurung.

Quite possibly, but this is certainly not because of a very good evaluation.

This engine gets my vote for most likely to improve in the next year.

Hard to say. Glaurung 2 certainly has potential to become much stronger than Glaurung 1.2.1, but I am not sure how much time and motivation I will have to work on it.

At any rate, thanks for posting these results, and for your interest and your confidence in Glaurung's abilities.

Tord

ozziejoe · Post by **ozziejoe** » Mon Jul 02, 2007 11:27 pm

Both were running on two CPUs. The results are not quite significant, but you can see that epsilon did better than the old Glaurung in every head-to-head match except against Toga...

Hope you have time to continue developing Glaurung. You are not far from passing Toga.
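The "not quite significant" remark can be checked with a rough two-sample z-test. The sketch below pools each Glaurung version's games against the six opponents both versions faced (the old version's extra match against Rybkav2.2n2 is left out) and compares the mean scores. It assumes the games are independent, which is a simplification for games sharing an opening book, and the helper name is hypothetical.

```python
import math

def mean_and_variance_of_mean(wins, draws, losses):
    """Mean score per game and the variance of that mean.

    Each game is treated as an independent draw from {1, 0.5, 0}
    (an assumption, not something the thread states).
    """
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    second_moment = (wins + 0.25 * draws) / games
    return mean, (second_moment - mean ** 2) / games

# Totals over the six common opponents, summed from the head-to-head lists.
new_mean, new_var = mean_and_variance_of_mean(113, 113, 146)  # epsilon 5
old_mean, old_var = mean_and_variance_of_mean(93, 119, 160)   # old SMP version

z = (new_mean - old_mean) / math.sqrt(new_var + old_var)
print(f"score difference: {new_mean - old_mean:.3f}, z = {z:.2f}")
```

With these totals z comes out at about 1.5, below the 1.96 usually required for a two-sided 95% level, which agrees with the description of the result as suggestive but not conclusive.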

Uri Blass · Post by **Uri Blass** » Tue Jul 03, 2007 9:08 am

Tord Romstad wrote:
ozziejoe wrote:Tord has described the new glaurung as having weak evaluation.
Well, the evaluation is reasonably bug-free, which is a very significant strength, but it is also simple and poorly tuned, which is evidently a weakness.

If you can say that you have no counterproductive code in Glaurung's evaluation, then I guess that is better than what most programmers can say.

I can say that part of the improvement in the latest version of Movei relative to the public version probably comes from removing some evaluation code, not from searching more nodes per second (I tested at a fixed number of nodes to conclude that the knowledge seems to be counterproductive).

I am talking about the improvement from 421 to 423 (estimated at 23 Elo based on CCRL blitz results). 423 has a symmetric evaluation, while 421 does not because of bugs. My first attempt to make 421 symmetric seemed counterproductive in my tests, and only when I removed knowledge did I get positive results.

I may have other counterproductive knowledge: recently I got a result of 101-99 for the version with less knowledge, at a fixed number of nodes, between 2 versions that I tested.

Uri
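To put a 101-99 result in perspective, here is a small sketch of the Elo interval it implies. It assumes that 101-99 means points scored over 200 games and, since the number of draws is not given, uses the no-draw variance as a worst case (draws can only shrink the per-game variance). The logistic Elo formula is also an assumption.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Assumption: 101-99 is the points total over 200 games.
points, games = 101.0, 200
score = points / games

# Worst case with no draws: per-game variance is score * (1 - score).
std_err = math.sqrt(score * (1.0 - score) / games)
low = score - 1.96 * std_err
high = score + 1.96 * std_err

elo = elo_from_score(score)
elo_low = elo_from_score(low)
elo_high = elo_from_score(high)
print(f"{elo:+.1f} Elo, 95% interval [{elo_low:+.0f}, {elo_high:+.0f}]")
```

The interval spans tens of Elo on either side of zero, so a 200-game match like this cannot by itself separate the two versions.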

pichy · Post by **pichy** » Tue Jul 03, 2007 9:23 am

Uri, here is an interesting result: movei423 with progress 10 10 10 versus movei425 with progress 40 40 40.

PS: movei425 with progress 40 40 40 also performed better against Wildcat7.0, Fruit 2.0, and Scorpio 1.9.

http://64.68.157.89/forum/viewtopic.php ... highlight=

Engine             Score    Game-by-game results vs. the other engine           S-B
1: Movei00_8_425   29.0/50  110=111011====1101111==101100=10=011=0==0010100011  609.00
2: Movei00_8_423   21.0/50  001=000100====0010000==010011=01=100=1==1101011100  609.00

50 games played / Tournament finished
Name of the tournament: Arena tournament
Site/ Country: Jorge, United States
Level: Blitz 2/1
Hardware: AMD Athlon(tm) Processor 1202 MHz with 512MB Memory
Operating system: Microsoft Windows XP Professional Service Pack 2 (Build 2600)
PGN-File: C:\Program Files\Arena\Arena.pgn
Website:
E-Mail Address:
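The crosstable row can be tallied mechanically. This sketch counts wins, draws, and losses from the result string for Movei00_8_425 and recomputes the score and Sonneborn-Berger value. It assumes the usual reading of the symbols (1 = win, = = draw, 0 = loss) and the usual Sonneborn-Berger definition; the function name is illustrative.

```python
def tally(results):
    """Count wins, draws, and losses in a crosstable result string.

    Assumption: '1' is a win, '=' a draw, '0' a loss for the row's engine.
    """
    return results.count("1"), results.count("="), results.count("0")

row_425 = "110=111011====1101111==101100=10=011=0==0010100011"
wins, draws, losses = tally(row_425)

score = wins + 0.5 * draws
opponent_score = losses + 0.5 * draws

# Sonneborn-Berger: the opponent's total score for each win,
# plus half of it for each draw.
sonneborn_berger = wins * opponent_score + 0.5 * draws * opponent_score

print(f"+{wins} ={draws} -{losses}, score {score}/{len(row_425)}, S-B {sonneborn_berger}")
```

The tally gives +23 =12 -15 for movei425, matching the 29.0/50 score and the 609.00 S-B value in the table.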

Uri Blass · Post by **Uri Blass** » Tue Jul 03, 2007 10:07 am

Thanks for your results.

I may try it later in my tests, and it would be interesting to know whether 40 40 40 is also good at longer time controls or only at blitz.

Note that 0 0 0 is the basis for using the hash table for pruning, because any other values mean that Movei uses a path-dependent evaluation.

I am still more interested in testing evaluation changes, so I have not started to work on using the hash table for pruning.

Uri
