Tord has described the new Glaurung as having weak evaluation. Well, it looks like it might have already passed the old Glaurung. This engine gets my vote for most likely to improve in the next year. (A comparison of how the new Glaurung Epsilon did versus the old Glaurung is below the rating table.)
Time control: 40 moves in 4 minutes, repeating. Strategic opening repertoire (White repertoire built largely around 1. Nf3 ... 2. c4 ... 3. g3; Black repertoire built on the Nimzo-Indian, Bogo-Indian, and Caro-Kann). Hardware: Core Duo Centrino 2.13 GHz.
Rank Program                 Elo    +    -  Games  Score   Av.Op.  Draws
   1 Rybkav2.3.2.mp.w32    : 3041   28   27   496   77.8 %  2823    30.6 %
   2 Rybkav2.2n2.mp.w32    : 2975   28   28   434   69.2 %  2834    31.1 %
   3 TogaII (1.3x4)        : 2849   24   24   496   50.3 %  2847    36.1 %
   4 EngineShredder10UCI   : 2833   25   26   496   47.7 %  2849    30.8 %
   5 Strelka               : 2821   26   26   496   45.9 %  2850    30.8 %
   6 Glaurung_win_32 (e5)  : 2813   30   30   372   45.6 %  2844    30.4 %
   7 Glaurung-SMP-32bit    : 2778   28   28   434   38.1 %  2862    30.2 %
   8 Spike1.2              : 2773   26   26   496   38.2 %  2856    32.1 %
   9 Naum                  : 2765   25   25   496   37.0 %  2857    36.9 %
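The Elo column can be turned into a predicted score with the standard logistic Elo formula. A minimal Python sketch, assuming the plain logistic model; the tool that produced this list may fit a slightly different model (for example with its own handling of draws), so treat the predictions as approximate:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (0..1) of player A against player B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Ratings copied from the table above.
ratings = {
    "Rybkav2.3.2.mp.w32": 3041,
    "TogaII (1.3x4)": 2849,
    "Glaurung_win_32 (e5)": 2813,
    "Glaurung-SMP-32bit": 2778,
}

e5 = ratings["Glaurung_win_32 (e5)"]
for name in ("Rybkav2.3.2.mp.w32", "TogaII (1.3x4)", "Glaurung-SMP-32bit"):
    print(f"e5 vs {name}: predicted {100 * expected_score(e5, ratings[name]):.1f}%")
```

For example, the 228-point gap between e5 and Rybka 2.3.2 predicts roughly 21%, close to the 19.4% e5 actually scored against it in the breakdown below.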
Individual statistics: Elo, games (+wins,=draws,-losses), score
6 Glaurung_win_32 (E5) : 2813  372 (+113,=113,-146)  45.6 %
    EngineShredder10UCI :  62 (+18,=17,-27)  42.7 %
    Naum                :  62 (+27,=23,-12)  62.1 %
    TogaII              :  62 (+12,=24,-26)  38.7 %
    Spike1.2            :  62 (+27,=19,-16)  58.9 %
    Strelka             :  62 (+23,=18,-21)  51.6 %
    Rybkav2.3.2.mp.w32  :  62 (+6,=12,-44)   19.4 %
7 Glaurung-SMP-32bit : 2778  434 (+100,=131,-203)  38.1 %
    EngineShredder10UCI :  62 (+13,=18,-31)  35.5 %
    Naum                :  62 (+19,=23,-20)  49.2 %
    TogaII              :  62 (+12,=27,-23)  41.1 %
    Spike1.2            :  62 (+22,=21,-19)  52.4 %
    Strelka             :  62 (+23,=16,-23)  50.0 %
    Rybkav2.2n2.mp.w32  :  62 (+7,=12,-43)   21.0 %
    Rybkav2.3.2.mp.w32  :  62 (+4,=14,-44)   17.7 %
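Each percentage in this breakdown is the usual chess score (a win counts 1, a draw 1/2) divided by the number of games. The Python sketch below recomputes the E5 rows and checks that they add up to the +113 =113 -146 summary line; the dictionary is simply the table transcribed by hand, and the function names are illustrative:

```python
def score_percent(wins: int, draws: int, losses: int) -> float:
    """Chess score as a percentage: a win counts 1 point, a draw 1/2."""
    return 100.0 * (wins + 0.5 * draws) / (wins + draws + losses)

# (wins, draws, losses) of Glaurung_win_32 (E5) against each opponent, from the breakdown above.
e5_records = {
    "EngineShredder10UCI": (18, 17, 27),
    "Naum": (27, 23, 12),
    "TogaII": (12, 24, 26),
    "Spike1.2": (27, 19, 16),
    "Strelka": (23, 18, 21),
    "Rybkav2.3.2.mp.w32": (6, 12, 44),
}

def combine(records: dict) -> tuple:
    """Sum per-opponent (W, D, L) tuples into one overall record."""
    wins = sum(w for w, _, _ in records.values())
    draws = sum(d for _, d, _ in records.values())
    losses = sum(loss for _, _, loss in records.values())
    return wins, draws, losses

for name, (w, d, loss) in e5_records.items():
    print(f"{name:20s} {score_percent(w, d, loss):5.1f} %")
total = combine(e5_records)
print("overall", total, f"{score_percent(*total):.1f} %")
```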
glaurang epsilon 5 in strategic test..too legit to quit
-
- Posts: 811
- Joined: Wed Mar 08, 2006 10:07 pm
Re: glaurang epsilon 5 in strategic test..too legit to quit
Is it possible that using the old SMP version (instead of the single-CPU 1.2.1) skewed the results a little? Or did the new version pass the old version running on 2 CPUs?
-
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: glaurang epsilon 5 in strategic test..too legit to quit
ozziejoe wrote: "Tord has described the new Glaurung as having weak evaluation."
Well, the evaluation is reasonably bug-free, which is a very significant strength, but it is also simple and poorly tuned, which is evidently a weakness.
ozziejoe wrote: "Well, it looks like it might have already passed the old Glaurung."
Quite possibly, but this is certainly not because of a very good evaluation.
ozziejoe wrote: "This engine gets my vote for most likely to improve in the next year."
Hard to say. Glaurung 2 certainly has the potential to become much stronger than Glaurung 1.2.1, but I am not sure how much time and motivation I will have to work on it.
At any rate, thanks for posting these results, and for your interest and your confidence in Glaurung's abilities.

Tord
-
- Posts: 811
- Joined: Wed Mar 08, 2006 10:07 pm
Re: glaurang epsilon 5 in strategic test..too legit to quit
Both were running on two CPUs. The results are not quite statistically significant, but you can see that Epsilon did better than the old Glaurung in every head-to-head match except against Toga.
I hope you have time to continue developing Glaurung. You are not far from passing Toga.
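One crude way to put a number on "not quite significant" is a sign test over the six opponents both Glaurung versions faced: if the two versions were equally strong, each head-to-head comparison would be a coin flip. A minimal Python sketch; the sign test is an illustrative choice, not something used in the thread, and it ignores how large each difference is:

```python
from math import comb

def sign_test_p(successes: int, trials: int) -> float:
    """One-sided sign test: P(X >= successes) for X ~ Binomial(trials, 1/2)."""
    return sum(comb(trials, k) for k in range(successes, trials + 1)) / 2 ** trials

# Score percentages against the six opponents that both Glaurung versions played.
epsilon = {"EngineShredder10UCI": 42.7, "Naum": 62.1, "TogaII": 38.7,
           "Spike1.2": 58.9, "Strelka": 51.6, "Rybkav2.3.2.mp.w32": 19.4}
old_smp = {"EngineShredder10UCI": 35.5, "Naum": 49.2, "TogaII": 41.1,
           "Spike1.2": 52.4, "Strelka": 50.0, "Rybkav2.3.2.mp.w32": 17.7}

better = sum(epsilon[name] > old_smp[name] for name in epsilon)
p_value = sign_test_p(better, len(epsilon))
print(f"Epsilon better in {better} of {len(epsilon)} matches, one-sided p = {p_value:.3f}")
```

Five of six gives p = 7/64, about 0.11: suggestive, but consistent with the "not quite significant" remark.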
-
- Posts: 10815
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: glaurang epsilon 5 in strategic test..too legit to quit
Tord Romstad wrote: "Well, the evaluation is reasonably bug-free, which is a very significant strength, but it is also simple and poorly tuned, which is evidently a weakness."
If you can say that you have no counterproductive code in the evaluation of Glaurung, then I guess that is better than what most programmers can say.
I can say that part of the improvement in the latest version of Movei relative to the public version is probably due to removing some evaluation code, not to searching more nodes per second (I tested at a fixed number of nodes to conclude that the knowledge seems to be counterproductive).
I am talking about the improvement from 421 to 423 (estimated at 23 Elo based on CCRL blitz results).
423 has a symmetric evaluation, while 421's evaluation is not symmetric because of bugs. However, my first attempt to make 421 symmetric seemed counterproductive in my tests, and only when I removed knowledge did I get positive results.
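A common way to catch evaluation asymmetries like the ones described here is to score a position and its colour-flipped mirror and require identical results (both from the side to move's point of view). A minimal Python sketch; the toy evaluation and the deliberately buggy variant are hypothetical stand-ins, not Movei's code, and the FEN mirroring assumes standard (non-960) castling rights:

```python
PIECE_VALUES = {"p": 100, "n": 300, "b": 300, "r": 500, "q": 900, "k": 0}

def mirror_fen(fen: str) -> str:
    """Colour-flip a FEN: reverse the ranks, swap piece colours, swap the side to move,
    swap castling rights and mirror the en-passant square (standard castling only)."""
    board, side, castling, ep, *rest = fen.split()
    mirrored_board = "/".join(rank.swapcase() for rank in reversed(board.split("/")))
    new_side = "b" if side == "w" else "w"
    swapped = castling.swapcase()
    new_castling = "".join(c for c in "KQkq" if c in swapped) or "-"
    new_ep = "-" if ep == "-" else ep[0] + str(9 - int(ep[1]))
    return " ".join([mirrored_board, new_side, new_castling, new_ep, *rest])

def evaluate(fen: str) -> int:
    """Hypothetical toy evaluation in centipawns from the side to move's point of view:
    material plus 5 per rank a pawn has advanced from its starting rank."""
    board, side = fen.split()[:2]
    score = 0  # accumulated from White's point of view
    for row, rank in enumerate(board.split("/")):  # row 0 is the 8th rank
        for ch in rank:
            if ch.isdigit():
                continue  # run of empty squares
            value = PIECE_VALUES[ch.lower()]
            if ch == "P":
                value += 5 * (6 - row)  # White pawns start on row 6 (rank 2)
            elif ch == "p":
                value += 5 * (row - 1)  # Black pawns start on row 1 (rank 7)
            score += value if ch.isupper() else -value
    return score if side == "w" else -score

def evaluate_with_bug(fen: str) -> int:
    """Hypothetical buggy variant: a side-to-move bonus that is only applied for White."""
    bonus = 10 if fen.split()[1] == "w" else 0
    return evaluate(fen) + bonus

def is_symmetric(eval_fn, fen: str) -> bool:
    """True if eval_fn gives the same score to a position and its colour-flipped mirror."""
    return eval_fn(fen) == eval_fn(mirror_fen(fen))

after_e4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
print(mirror_fen(after_e4))
print(is_symmetric(evaluate, after_e4), is_symmetric(evaluate_with_bug, after_e4))
```

Running such a check over a large set of test positions is cheap, and any mismatch points directly at an asymmetric term.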
I may have more counterproductive knowledge: recently, at a fixed number of nodes, I got a result of 101-99 for the version with less knowledge in a match between the two versions I tested.
Uri
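To see how little a 101-99 result says on its own, convert the score to an Elo difference and attach a rough error bar. Since each game hands out exactly one point, 101-99 means 200 games. A minimal Python sketch using the logistic Elo model and a normal approximation; the thread gives no draw count, so it assumes no draws, which gives the largest possible per-game variance (draws would only narrow the interval):

```python
import math

def elo_from_score(p: float) -> float:
    """Elo difference implied by a score fraction p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(points: float, games: int, z: float = 1.96):
    """Approximate 95% interval for the Elo difference.
    Uses the no-draw variance p * (1 - p), an upper bound for any draw rate."""
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)
    return elo_from_score(p - z * se), elo_from_score(p), elo_from_score(p + z * se)

low, mid, high = elo_interval(101, 200)
print(f"{mid:+.1f} Elo (95% interval roughly {low:+.0f} to {high:+.0f})")
```

The point estimate is about +3.5 Elo, while the interval spans roughly -45 to +52 Elo, so a single match of this size cannot separate the two versions.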
-
- Posts: 2564
- Joined: Thu Mar 09, 2006 3:04 am
Re: glaurang epsilon 5 in strategic test..too legit to quit
Uri, here are some interesting results: Movei 423 with progress 10 10 10 versus Movei 425 with progress 40 40 40.
PS: Movei 425 with progress 40 40 40 also performed better against Wildcat 7.0, Fruit 2.0, and Scorpio 1.9.

http://64.68.157.89/forum/viewtopic.php ... highlight=
Engine Score Mo Mo S-B
1: Movei00_8_425 29.0/50 ·················································· 110=111011====1101111==101100=10=011=0==0010100011 609.00
2: Movei00_8_423 21.0/50 001=000100====0010000==010011=01=100=1==1101011100 ·················································· 609.00
50 games played / Tournament finished
Name of the tournament: Arena tournament
Site/ Country: Jorge, United States
Level: Blitz 2/1
Hardware: AMD Athlon(tm) Processor 1202 MHz with 512MB Memory
Operating system: Microsoft Windows XP Professional Service Pack 2 (Build 2600)
PGN-File: C:\Program Files\Arena\Arena.pgn
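Each crosstable string lists the games from that engine's point of view ('1' a win, '=' a draw, '0' a loss). A small Python sketch that recovers the 29.0 and 21.0 totals and the S-B column from the two strings; the function names are illustrative. With only two players, Sonneborn-Berger reduces to the opponent's final score per win plus half of it per draw, which works out to 29 × 21 = 609 for both engines:

```python
def count_results(row: str) -> tuple:
    """Count (wins, draws, losses) in a crosstable string: '1' win, '=' draw, '0' loss."""
    return row.count("1"), row.count("="), row.count("0")

def score(row: str) -> float:
    """Total points: 1 per win, 1/2 per draw."""
    wins, draws, _ = count_results(row)
    return wins + 0.5 * draws

def sonneborn_berger(row: str, opponent_score: float) -> float:
    """Sonneborn-Berger against a single opponent: the opponent's final score
    for every win plus half of it for every draw."""
    wins, draws, _ = count_results(row)
    return wins * opponent_score + draws * opponent_score / 2

movei_425 = "110=111011====1101111==101100=10=011=0==0010100011"
movei_423 = "001=000100====0010000==010011=01=100=1==1101011100"

for name, row, other in (("Movei00_8_425", movei_425, movei_423),
                         ("Movei00_8_423", movei_423, movei_425)):
    print(name, count_results(row), score(row), sonneborn_berger(row, score(other)))
```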
-
- Posts: 10815
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: glaurang epsilon 5 in strategic test..too legit to quit
Thanks for your results.
I may try it later in my own tests; it would be interesting to know whether 40 40 40 is also good at longer time controls or only at blitz.
Note that 0 0 0 is a prerequisite for using the hash table for pruning, because other values mean that Movei uses a path-dependent evaluation.
I am still more interested in testing evaluation changes, so I have not started working on using the hash table for pruning.
Uri
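The point about hash pruning can be illustrated with a toy transposition table: the table key identifies only the position, so if the score also depends on how the position was reached, a cached score can be wrong when the same position arrives by another path. The thread does not say what Movei's three progress values actually measure; the per-ply penalty below is a hypothetical stand-in, and all names and values are illustrative:

```python
# Hypothetical illustration only; not Movei's code.

def static_eval(position: str) -> int:
    """Position-only score (hypothetical value)."""
    return {"X": 50}[position]

def path_eval(position: str, ply: int, penalty_per_ply: int) -> int:
    """Adds a hypothetical path-dependent term: the score drops the later
    (in plies from the root) the position is reached."""
    return static_eval(position) - penalty_per_ply * ply

def probe_or_store(tt: dict, position: str, ply: int, penalty_per_ply: int) -> int:
    """Transposition-table lookup keyed on the position alone, as a hash key would be."""
    if position not in tt:
        tt[position] = path_eval(position, ply, penalty_per_ply)
    return tt[position]

def cache_is_consistent(penalty_per_ply: int) -> bool:
    """Reach the same position X at ply 2 and (by transposition) at ply 4, and check
    whether the cached score matches a fresh evaluation both times."""
    tt: dict = {}
    for ply in (2, 4):
        if probe_or_store(tt, "X", ply, penalty_per_ply) != path_eval("X", ply, penalty_per_ply):
            return False
    return True

print(cache_is_consistent(0))  # path-independent evaluation: cached scores stay valid
print(cache_is_consistent(5))  # path-dependent evaluation: the ply-2 score is reused at ply 4
```

With a zero penalty the cached score is always correct; with any nonzero penalty the score stored at ply 2 is reused at ply 4, which is why pruning on hashed scores is only safe when the evaluation depends on the position alone.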