Start EndGame - A matter of fine tuning ???

laoliveirajr · Post by **laoliveirajr** » Thu Feb 14, 2013 5:57 am

.

After some more tests, I published two more versions.

I think the strongest are rated between 2340 and 2400 (CCRL)!

Rating.dat
==========
    Program                     Elo    +   -   Games   Score   Av.Op.  Draws
Capivara LK 0.09a01a 64-bit  :   57   76  74    64    64.1 %    -43   28.1 %
Capivara LK 0.09a02a 64-bit  :   51   69  67    64    63.3 %    -43   39.1 %
Capivara LK 0.09a01 64-bit   :   22   81  80    64    56.2 %    -22   15.6 %
Capivara LK 0.09a02 64-bit   :   22   67  66    64    56.2 %    -22   40.6 %

Programs.dat
============
Capivara LK 0.09a01a 64-bit:   57   64 (+ 32,= 18,- 14), 64.1 %
Capivara LK 0.09a02a 64-bit:   51   64 (+ 28,= 25,- 11), 63.3 %
Capivara LK 0.09a01 64-bit :   22   64 (+ 31,= 10,- 23), 56.2 %
Capivara LK 0.09a02 64-bit :   22   64 (+ 23,= 26,- 15), 56.2 %

.
.
.
The computer is still running ...

.
.
.

laoliveirajr · Post by **laoliveirajr** » Sat Feb 16, 2013 12:00 am

Evert wrote: 64 games is not remotely enough to test anything...

A few ideas led to nine different versions of the Capivara chess engine.
Although I have only one machine, 64 games per version were sufficient to rule out some builds and to choose five to post.
I hope my feeling is right ...

Rating.dat
==========
    Program                     Elo    +   -   Games   Score   Av.Op.  Draws
Capivara LK 0.09a01a 64-bit  :   50   76  74    64    64.1 %    -50   28.1 %
Capivara LK 0.09a02a 64-bit  :   47   69  67    64    63.3 %    -47   39.1 %
Capivara LK 0.09a01b 64-bit  :   36   70  69    64    60.2 %    -36   35.9 %
Capivara LK 0.09a01 64-bit   :   22   81  80    64    56.2 %    -22   15.6 %
Capivara LK 0.09a02 64-bit   :   22   67  66    64    56.2 %    -22   40.6 %

Programs.dat
============
Capivara LK 0.09a01a 64-bit:   50   64 (+ 32,= 18,- 14), 64.1 %
Capivara LK 0.09a02a 64-bit:   47   64 (+ 28,= 25,- 11), 63.3 %
Capivara LK 0.09a01b 64-bit:   36   64 (+ 27,= 23,- 14), 60.2 %
Capivara LK 0.09a01 64-bit :   22   64 (+ 31,= 10,- 23), 56.2 %
Capivara LK 0.09a02 64-bit :   22   64 (+ 23,= 26,- 15), 56.2 %

(Edited after running Arena and ELO-Stat.
The ratings differ from the previous post because this time the run used a separate .pgn file for each engine version.)

Evert · Post by **Evert** » Sat Feb 16, 2013 6:41 am

I understand the problem of limited computational resources, I really do. For some changes you don't need hundreds or thousands of games to refute them (or, more usually, discard them), but for most of them you do.

Two things stand out to me in your posted results. The first is that the error bar is larger than the ratings themselves, let alone the rating differences. This means that, as far as the test shows, all versions are equally strong. The second is that the draw rate varies considerably. Don't know if others have a similar experience, but in my experience it means that the results haven't converged yet...

I'm not trying to get you down, but you do need more games to test things.
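
The error-bar point can be checked with a short calculation. The sketch below is a rough approximation, not ELOStat's exact method: it converts a win/draw/loss record into an Elo difference against the average opponent and a normal-approximation 95% interval. The function names and the clamping constant are illustrative assumptions.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction, clamped away from 0 and 1."""
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return 400.0 * math.log10(score / (1.0 - score))

def elo_with_error(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a W/D/L record; z=1.96 gives about 95%."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(variance / games)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), low, high

# Record of Capivara LK 0.09a01a from the tables above: + 32, = 18, - 14
elo, low, high = elo_with_error(32, 18, 14)
print(f"{elo:+.0f} Elo vs. average opponent, 95% interval [{low:+.0f}, {high:+.0f}]")
```

With 64 games the interval spans roughly 150 Elo, which is in line with the +76/-74 columns in the posted table and far wider than the gaps between versions.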

laoliveirajr · Post by **laoliveirajr** » Sat Feb 16, 2013 3:56 pm

Because of my limited computing resources, I decided to run further tests that favor quality over quantity.

As initial opponents I chose three engines already used in other tests. The opponent engines in the next round of testing will be those whose CCRL rating equals the average performance of the Capivara versions in the round just ended.

(The same ones used previously to determine the ratings of some earlier versions; see the site Viva Xadrez !!! - Testando Engines: http://vivaxadrez.wikispaces.com/TestandoEngines)

The tests have been running since yesterday ...

There is a detail I had not paid attention to: the behavior and performance of the Capivara engines vary when they play against different engines. That strongly influences the test results, which are not as predictable as I had imagined.

Next week's tests will be intensive (as intensive as possible), because I have not yet decided which version will take part in the CCT15 tournament.

laoliveirajr · Post by **laoliveirajr** » Sat Mar 16, 2013 8:07 pm

Tests of new Capivara versions - how to calculate ratings?

I ran tests against 12 engines (it will eventually be 15) and computed average ratings, using the CCRL 40/4 rating list published on 09 Feb 2013 as the reference.

I obtained:
Capivara LK 0.09a01a == 2346.3
Capivara LK 0.09a01b == 2343.9
Capivara LK 0.09a02a == 2356.8
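
One possible reading of "average ratings" is a performance rating computed against each opponent and then averaged. The sketch below assumes that reading; the post does not spell out the exact procedure. The opponent ratings and game counts are hypothetical placeholders, not data from these tests.

```python
import math

def performance_rating(opponent_rating, wins, draws, losses):
    """Opponent rating plus the Elo difference implied by the match score."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Clamp 0% and 100% scores, which would otherwise give an infinite rating.
    margin = 0.5 / games
    score = min(max(score, margin), 1.0 - margin)
    return opponent_rating + 400.0 * math.log10(score / (1.0 - score))

def average_performance(matches):
    """Unweighted mean of the per-opponent performance ratings."""
    ratings = [performance_rating(*match) for match in matches]
    return sum(ratings) / len(ratings)

# Hypothetical matches: (opponent rating, wins, draws, losses)
matches = [(2300, 10, 8, 6), (2400, 6, 8, 10)]
print(round(average_performance(matches), 1))
```

Averaging per-opponent ratings gives each opponent equal weight regardless of game count. Feeding all games into one rating tool with the opponents' ratings fixed is a common alternative, and the two approaches can differ by a few Elo.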

PGN-Download: http://vivaxadrez.wikispaces.com/file/v ... derOFF.zip

Is the method used correct / acceptable?

thanks
Lourenço

laoliveirajr · Post by **laoliveirajr** » Sat Mar 23, 2013 9:43 pm

.

... after playing against 18 opponents

I obtained:
Capivara LK 0.09a01a: 2349.3
Capivara LK 0.09a01b: 2352.6
Capivara LK 0.09a02a: 2357.7

PGN-Download: (1728 games) http://vivaxadrez.wikispaces.com/file/v ... derOFF.zip

See http://vivaxadrez.wikispaces.com/TestandoEngines

No answers, so I ask again:
Is the method used correct / acceptable?

thanks
Lourenço

laoliveirajr · Post by **laoliveirajr** » Tue Apr 23, 2013 4:31 am

laoliveirajr wrote:
lucasart wrote:
laoliveirajr wrote: But there is still a small issue to be discussed and tested: The MaterialPieceValue to start phase EndGame.
The transition between middlegame and endgame should not be discontinuous. So what you're doing is essentially bad, regardless of where you choose to put the discontinuity: this is what engines did in the pre-Fruit era (a.k.a. paleolithic)
The linear transition between opening and endgame was one of the breakthroughs of Fruit (2004). I suppose everyone does something like that nowadays.
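
The linear transition lucasart describes can be sketched as follows. This is a minimal illustration, not Fruit's or Capivara's code: the phase weights (knight 1, bishop 1, rook 2, queen 4, for a maximum of 24) and the function names are assumptions.

```python
# Phase weight per piece type; pawns and kings do not count towards the phase.
PHASE_WEIGHTS = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # 4 knights + 4 bishops + 4 rooks * 2 + 2 queens * 4

def game_phase(piece_counts):
    """24 with all pieces on the board, 0 with only kings and pawns left."""
    phase = sum(PHASE_WEIGHTS.get(piece, 0) * count
                for piece, count in piece_counts.items())
    return min(phase, MAX_PHASE)  # promotions could push the sum above 24

def tapered_score(mg_score, eg_score, piece_counts):
    """Blend middlegame and endgame scores in proportion to remaining material."""
    phase = game_phase(piece_counts)
    return (mg_score * phase + eg_score * (MAX_PHASE - phase)) / MAX_PHASE

# Piece counts are totals for both sides.
print(tapered_score(100, 20, {"N": 4, "B": 4, "R": 4, "Q": 2}))  # all pieces
print(tapered_score(100, 20, {"R": 4, "Q": 1}))                  # half the phase
print(tapered_score(100, 20, {}))                                # bare kings
```

Because the blend changes a little with every capture, there is no single material value at which the evaluation jumps, so the question of where the endgame "starts" no longer needs a threshold.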
In the beginning there were only Capivara LK008b04/02a/03a/04a, with the same PST as the LK007 series and only the KingEndGame table (like TSCP).

The Capivara LK008b05/06/07/08/09 versions were discarded because I found a big bug; they also implemented EndGame, among other improvements ...

Capivara LK008b10/b11/b12 existed only as beta versions, in which many PSTs were tested, with endgames, and some versions also had an EndGameBonus for PieceValues ...

Now, with Capivara LK009, I have gone back to my old CapivaraPST with the recent BonusPST (yes! for Capivara, a PST like the TSCP-PST works better than a "common PST" such as the "Rybka PST" or the "Fruit PST", among the other PSTs tested!)

The TSCP-like PST is part of Evaluate00, which is called by Search00, as shown in the diagram below.
Evaluate22, as a counterpoint, contains elements common to other engines, and elements not allowed in Evaluate00.
The same applies to SearchXX and QuiescenceXX.

[Diagram: How Capivara LK 0.08b0x works...]

The idea has already occurred to me of dividing the game into several stages, in which the engine would use several PSTs and several PieceValues, including a progressive PawsValuesBonus, but this can only happen once "what is a good EndGame status" is very well defined.

...
...
...

.

.

The differences between a01a, a02a, a01b and a02b are relatively significant to me from a conceptual point of view, although there was only a small difference in the results.

Yesterday I compiled four more versions of Capivara LK 0.09, still unpublished: b01a, b02a, b01b and b02b.

Capivara has two different search functions: one evaluates positions from Capivara's point of view, the other from the opponent's point of view.

I made a small change, this time removing the pruning from the function that computes the opponent's side, so the engine will prune considering only Capivara's values.
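
One way to read this description is a minimax search split into two mutually recursive functions, where only the function for the engine's own side cuts off. The sketch below is an assumption made for illustration, not Capivara's actual code: the toy tree (nested lists with numeric leaves), the function names, and the `visited` list are all hypothetical.

```python
import math

visited = []  # leaves actually evaluated, to make the pruning visible

def is_leaf(node):
    return isinstance(node, (int, float))

def search_own(node, beta=math.inf):
    """Own side to move: maximise, and stop once the score reaches beta."""
    if is_leaf(node):
        visited.append(node)
        return node
    best = -math.inf
    for child in node:
        best = max(best, search_opponent(child, beta))
        if best >= beta:  # the opponent already has a better alternative
            break
    return best

def search_opponent(node, beta=math.inf):
    """Opponent to move: minimise, visiting every reply (no pruning here)."""
    if is_leaf(node):
        visited.append(node)
        return node
    best = math.inf
    for child in node:
        # Pass down the tightest bound known so far for the own-side search.
        best = min(best, search_own(child, min(beta, best)))
    return best

tree = [[3, 5], [6, [9, 1]], [1, 2]]
print(search_own(tree), visited)
```

On this tree the search returns the plain minimax value while skipping one leaf. Cutting off in only one of the two functions keeps the result exact but prunes less than full alpha-beta would.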

The next change I intend to make is to give Capivara concurrent processing.
The splits will be made by the function that computes the opponent's side, so one function will launch the threads and the other will do the pruning.

Wait for Deep Capivara !!!
.
