BikJump 1.4 : 1781

Le Fou numerique
Posts: 859
Joined: Wed Apr 05, 2006 9:58 pm
Location: Elsass

BikJump 1.4 : 1781

Post by Le Fou numerique »

Hi,

BikJump 1.4's first rating in the UEL: 1781.

PGN on the Accession & Réserve page.


Tournoi d'entrée HS-Book.abk
Le Fou numerique, 2007.07.22 - 2007.07.23
                              Score       1    2 
--------------------------------------------------
 1: BikJump 1.4              4.5 / 128   58w- 58b-
 2: Adam 3.1                 2.0 /  2    1b+  1w+
 3: Aice 0.99.2              2.0 /  2    1b+  1w+
 4: Alaric 707               2.0 /  2    1b+  1w+
 5: Alfil 7.6                2.0 /  2    1b+  1w+
 6: AliChess 4.08            2.0 /  2    1b+  1w+
 7: Amyan 1.597              2.0 /  2    1b+  1w+
 8: AnMon 5.60               2.0 /  2    1b+  1w+
 9: Arasan 9.5               2.0 /  2    1b+  1w+
10: Aristarch 4.50           2.0 /  2    1b+  1w+
11: Atlas 2.20               2.0 /  2    1b+  1w+
12: Ayito 0.2.994            2.0 /  2    1b+  1w+
13: BigLion 2.23w            2.0 /  2    1b+  1w+
14: Booot 4.13.1             2.0 /  2    1b+  1w+
15: Colossus 2007b           2.0 /  2    1b+  1w+
16: Counter 0.3              2.0 /  2    1b+  1w+
17: Delfi 5.1                2.0 /  2    1b+  1w+
18: Delphil 1.8              2.0 /  2    1b+  1w+
19: Diablo 0.5.1             2.0 /  2    1b+  1w+
20: Dragon 4.6               2.0 /  2    1b+  1w+
21: Flux 2.1                 2.0 /  2    1b+  1w+
22: Fruit 2.1                2.0 /  2    1b+  1w+
23: Gaia 3.5                 2.0 /  2    1b+  1w+
24: Gibbon 2.41a             2.0 /  2    1b+  1w+
25: Glaurung 2 - epsilon5    2.0 /  2    1b+  1w+
26: Hamsters 0.3             2.0 /  2    1b+  1w+
27: Hermann 2.0              2.0 /  2    1b+  1w+
28: Homer 2.0                2.0 /  2    1b+  1w+
29: Ktulu 4.2                2.0 /  2    1b+  1w+
30: Lime 62                  2.0 /  2    1b+  1w+
31: List 5.12                2.0 /  2    1b+  1w+
32: LittleThought 1.00       2.0 /  2    1b+  1w+
33: Madeleine 0.2            2.0 /  2    1b+  1w+
34: Mediocre 0.332           2.0 /  2    1b+  1w+
35: Monarch 1.7              2.0 /  2    1b+  1w+
36: Movei 0.08.403           2.0 /  2    1b+  1w+
37: Muse 0.899b              2.0 /  2    1b+  1w+
38: Naum 2.0                 2.0 /  2    1b+  1w+
39: Nejmet 3.07              2.0 /  2    1b+  1w+
40: Patzer 3.80              2.0 /  2    1b+  1w+
41: Pepito v1.59             2.0 /  2    1b+  1w+
42: Pharaon 3.5.1            2.0 /  2    1b+  1w+
43: Pupsi 0.18               2.0 /  2    1b+  1w+
44: Queen 3.09               2.0 /  2    1b+  1w+
45: Rotor 0.2                2.0 /  2    1b+  1w+
46: Ruffian 1.0.5            2.0 /  2    1b+  1w+
47: Sjeng 12.13              2.0 /  2    1b+  1w+
48: SlowChess Blitz WV2.1    2.0 /  2    1b+  1w+
49: Snitch 1.6.2             2.0 /  2    1b+  1w+
50: Spike 1.2 Turin          2.0 /  2    1b+  1w+
51: SOS 5.1                  2.0 /  2    1b+  1w+
52: Tao 5.6                  2.0 /  2    1b+  1w+
53: Tornado 1.0 Mainz        2.0 /  2    1b+  1w+
54: Twisted Logic 20070706   2.0 /  2    1b+  1w+
55: Ufim 8.02                2.0 /  2    1b+  1w+
56: WJChess v1.64            2.0 /  2    1b+  1w+
57: Yace Paderborn           2.0 /  2    1b+  1w+
58: Zappa 1.1                2.0 /  2    1b+  1w+
59: Anechka 0.08             1.5 /  2    1b+  1w=
60: BBChess 1.1              1.5 /  2    1b=  1w+
61: Arion 1.7                1.5 /  2    1b=  1w+
62: GreKo 5.3                1.5 /  2    1b=  1w+
63: Matheus 2.3              1.5 /  2    1b=  1w+
64: Abrok 5.0                1.0 /  2    1b+  1w-
65: Chispa 4.0.3             1.0 /  2    1b-  1w+
--------------------------------------------------
128 games: +63 =5 -60
Regards,
Patrick
hgm
Posts: 27814
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: BikJump 1.4 : 1781

Post by hgm »

I wonder if there really is any accuracy in determining a rating from games of BikJump against these opponents. At best they are all in the extreme tail of the result probability function underlying Elo theory, making the statistical sampling error have an unduly large effect on the final rating.

But if you look at which engines the wins were scored against, this in itself makes it very unlikely that this distribution function properly describes the results. And if it doesn't, that would make the rating little more than a blind guess.
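
To illustrate the tail effect, here is a minimal sketch that assumes the standard logistic Elo curve and treats all opponents as a single "average opponent" (a simplification; the UEL's actual rating calculation is not given in this thread). The function name `elo_diff_from_score` is made up for the example. It shows how far the estimate moves when one game result changes, first for a 4.5/128 score and then for a score near 50%.

```python
import math

def elo_diff_from_score(score: float, games: int) -> float:
    """Rating difference implied by a score under the logistic Elo model.

    Assumption: expected score E = 1 / (1 + 10 ** (-d / 400)), solved for d.
    """
    p = score / games
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)

games = 128
actual = elo_diff_from_score(4.5, games)        # the gauntlet result above
one_more_win = elo_diff_from_score(5.5, games)  # same gauntlet, one extra win

print(f"4.5/128 -> {actual:+.0f} Elo relative to the average opponent")
print(f"5.5/128 -> {one_more_win:+.0f} Elo")
print(f"shift from a single game in the tail: {one_more_win - actual:.0f} Elo")

# For comparison: the same one-game change near a 50% score.
even = elo_diff_from_score(64.0, games)
even_plus = elo_diff_from_score(65.0, games)
print(f"shift from a single game near 50%: {even_plus - even:.1f} Elo")
```

Under these assumptions a single extra win moves the estimate several times further in the tail than it does near an even score, which is the sensitivity described above.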
Le Fou numerique
Posts: 859
Joined: Wed Apr 05, 2006 9:58 pm
Location: Elsass

Re: BikJump 1.4 : 1781

Post by Le Fou numerique »

hgm wrote: I wonder if there really is any accuracy in determining a rating from games of BikJump against these opponents. At best they are all in the extreme tail of the result probability function underlying Elo theory, making the statistical sampling error have an unduly large effect on the final rating.

But if you look at which engines the wins were scored against, this in itself makes it very unlikely that this distribution function properly describes the results. And if it doesn't, that would make the rating little more than a blind guess.
Hi,

All engines in the UEL play this "tournoi d'entrée" (entry tournament).

It's true that for the weakest ones, the result isn't always outstanding.

But it is at the same time a test for the strongest. They must prove that they are able to win against the weakest. The UEL currently consists of 88 engines. Each one must have the opportunity to play against every other.

Regards,
Patrick
hgm
Posts: 27814
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: BikJump 1.4 : 1781

Post by hgm »

I see your point, and I agree that it is a good idea to have any engine play any other engine at least once (or twice, with both colors). If you only play closely matched engines against each other, the overall rating scale becomes very ill defined. I once calculated that it is the games between engines differing in rating by ~400 that are most important for getting an accurate scale. (As opposed to determining accurately where an engine should be placed on that scale.)

But I noticed there are also many engines on your list with ratings between 1600 and 1900, which seem not to participate in your standard gauntlet. And by your own logic, games against these engines in general should be just as useful as against the top engines. And in this particular case, they would actually be vastly more useful for the rating determination.
Le Fou numerique
Posts: 859
Joined: Wed Apr 05, 2006 9:58 pm
Location: Elsass

Re: BikJump 1.4 : 1781

Post by Le Fou numerique »

hgm wrote: But I noticed there are also many engines on your list with ratings between 1600 and 1900, which seem not to participate in your standard gauntlet. And by your own logic, games against these engines in general should be just as useful as against the top engines. And in this particular case, they would actually be vastly more useful for the rating determination.
Hi,

The UEL is made up of the Divisions from Elite to Poussin.

In Accession & Réserve, there are new engines (until the monthly update of the Divisions) or engines whose rating is not sufficient. These engines are in reserve. They can play tournaments in Accession & Réserve or participate with a wildcard in other tournaments.
But in general, they play only a few games against the engines in the Ligue.

The point of the Ligue is not to have the top engines with an artificial rating around 3000, or only games between engines of the Division Elite, or only the engine that happens to be in fashion (this is not Bild Zeitung or Exclusiv (the rich and the beautiful)).
In each Division, there are 8 engines, and 6 of the 8 must play in a tournament. For the monthly update, there must be 3 tournaments in each Division. So each month, all engines in the UEL play in tournaments.

Sure, the rating is not perfect, but in the Ligue (Elite to Poussin) I think that each engine is in the right place, even if its rating is not exact in absolute terms.

Regards,
Patrick
abik
Posts: 819
Joined: Fri Dec 01, 2006 10:46 pm
Location: Mountain View, CA, USA
Full name: Aart Bik

Re: BikJump 1.4 : 1781

Post by abik »

Thanks, Patrick, as usual, for your very rapid testing of new versions. Your dedication to including weaker engines like mine in your tournaments is very much appreciated.

I took a somewhat risky step by rewriting the evaluation function, as I wanted to incorporate some features that did not fit well in the older one. Nevertheless, the drop from 1907 (v1.3) to 1781 (v1.4) came as somewhat of a surprise, as v1.4 did pretty well in my tests (tactical marathon and various small tournaments). I also tuned some parameters, however, and may have messed something up there (I always make the mistake of eagerly trading too much breadth for nice-looking deeper depths). Overall I am still struggling to find a good testing methodology that makes better strength predictions but also fits the unfortunately somewhat restricted time I can spend on BikJump development.

I still have many ideas for improvements, so I am not giving up yet :-)

Aart Bik
http://www.aartbik.com/
hgm
Posts: 27814
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: BikJump 1.4 : 1781

Post by hgm »

I wouldn't worry about it too much if I were you, as the standard error of the determined rating seems much larger than the rating difference. So the chance that BikJump 1.4 is better than your previous version is still appreciable.

What I found a very useful way of testing is to play gauntlets from the Nunn positions. For microMax I selected 10 engines in the expected rating range, and let it play each engine 20 times (each Nunn position both with black and white). With 200 games in total that gives a reasonable impression of the strength of your trial versions.

You can easily do this in 2+0 or 40/2, so each game takes 5 min. Then you can test a version within a day. With PSWBTM, Winboard_x + Polyglot and a number of downloaded engines this is very easy to set up.
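
As a rough check on the numbers above, the following sketch works out the size and duration of such a gauntlet and a back-of-the-envelope precision estimate. The precision part is an assumption not stated in the post: it uses the logistic Elo curve, evenly matched opponents, and no draws (the worst case for per-game variance), so the real error is likely somewhat smaller. The function names are hypothetical.

```python
import math

def gauntlet_size(opponents: int, positions: int, minutes_per_game: float):
    """Total games and wall-clock hours when every position is played with both colors."""
    games = opponents * positions * 2
    hours = games * minutes_per_game / 60.0
    return games, hours

def elo_standard_error(games: int) -> float:
    """Approximate 1-sigma Elo error near a 50% score.

    Assumptions: logistic Elo curve, independent games, no draws
    (per-game standard deviation of 0.5).
    """
    score_se = 0.5 / math.sqrt(games)
    # Slope of the Elo curve at p = 0.5: 400 / (ln(10) * p * (1 - p))
    slope = 400.0 / (math.log(10) * 0.25)
    return slope * score_se

games, hours = gauntlet_size(opponents=10, positions=10, minutes_per_game=5)
se = elo_standard_error(games)

print(f"{games} games, about {hours:.1f} hours")
print(f"standard error of roughly {se:.0f} Elo")
```

With 10 opponents, 10 positions and 5 minutes per game this comes to 200 games in under 17 hours, consistent with testing a version within a day.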
abik
Posts: 819
Joined: Fri Dec 01, 2006 10:46 pm
Location: Mountain View, CA, USA
Full name: Aart Bik

Re: BikJump 1.4 : 1781

Post by abik »

Thanks, Dr. Muller, for the testing pointers. I just found one very obvious error in ponder time allocation, where I left some experimental code in (probably not relevant for the initial RUEL though), and noticed that I probably introduced a little too much "late move reduction", which is more likely the culprit (if any) than the new evaluation function. I am going to put out a v1.4.1 tonight after some sanity testing, mainly because I am interested in BikJump's performance in the ongoing Chesswars, where pondering is important.
hgm
Posts: 27814
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: BikJump 1.4 : 1781

Post by hgm »

Well, I experienced myself how difficult it is to find opponents in that rating range that are stable enough for automated testing (without bringing the test to a halt by hanging the system...). And the few that were suitable often played reproducibly, so that uMax (which plays reproducibly too) could only play two games against them. The Nunn positions were really helpful in allowing more independent test games. And in PSWBTM I could simply select the file with Nunn positions as opening positions, and it automatically worked.
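
The effect of the opening positions on the number of distinct games can be sketched by enumerating the pairings. This is only an illustration: the engine and position labels are placeholders, the count of 10 positions is inferred from the 20 games per opponent mentioned earlier in the thread, and nothing here reflects PSWBTM's actual file format.

```python
from itertools import product

# Placeholder labels, not real engine or position names.
opponents = [f"engine_{i}" for i in range(1, 11)]
nunn_positions = [f"nunn_{i}" for i in range(1, 11)]
colors = ("white", "black")

# Two deterministic engines repeat the same game from the same start,
# so only the (opening, color) combinations give distinct games.
from_start_only = list(product(opponents, ["startpos"], colors))
with_nunn = list(product(opponents, nunn_positions, colors))

print(len(from_start_only) // len(opponents), "distinct games per opponent from the start position")
print(len(with_nunn) // len(opponents), "distinct games per opponent with the Nunn positions")
print(len(with_nunn), "games in the full gauntlet")
```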