EGTB value

bob · Post by **bob** » Thu Oct 07, 2010 9:38 pm

hgm wrote: The latter does not contain any information. If conditions were the same, only the total contains information on the winning probability.

12 extra points out of 500 (so 2.4%) is a bit more significant than the previous result, as the Standard Deviation is now 1.8%.

In fact the detail results are a bit suspect. The standard deviation in 100 games should be 4 points. That means that typically in 32% of such runs the result should deviate from the average by more than 4 points, i.e. be below 48.5 or above 56.5. But in fact all 5 runs are within 2 points from the average. This has a very low probability.

Of course it is not really a miracle if some programs would benefit where others would not. I am sure that micro-Max would benefit a lot from EGTB, as it is totally stupid, and thinks that KNK is a lot better than KPPK. EGTB would enlighten it in this respect. And I am equally sure Crafty contains a lot of end-game knowledge in its evaluation. So the conclusion of Bob's tests should not be that EGTB can never do any good. Just that coding the same knowledge by special purpose code outperforms storing it in huge tables.

I had mentioned that a good while back, but not in this thread. In fact, Crafty does have a lot of endgame knowledge, such as for KPK (which it plays perfectly), as well as really good rules for things like KRP vs KR, so it knows how to play them quite accurately without having tables.

Clearly EGTBs can fill in for missing knowledge, and the more there is to fill in, the more they might help. I have watched Rybka look utterly foolish without EGTBs, playing vs Crafty on ICC. It seems to be in constant danger of drawing a KR vs K ending, for example, for reasons I don't understand... Probably a combination of hashing hiding repetitions and 50-move draws.

Uri Blass · Post by **Uri Blass** » Thu Oct 07, 2010 10:21 pm

hgm wrote: The latter does not contain any information. If conditions were the same, only the total contains information on the winning probability.

12 extra points out of 500 (so 2.4%) is a bit more significant than the previous result, as the Standard Deviation is now 1.8%.

In fact the detail results are a bit suspect. The standard deviation in 100 games should be 4 points. That means that typically in 32% of such runs the result should deviate from the average by more than 4 points, i.e. be below 48.5 or above 56.5. But in fact all 5 runs are within 2 points from the average. This has a very low probability.

Of course it is not really a miracle if some programs would benefit where others would not. I am sure that micro-Max would benefit a lot from EGTB, as it is totally stupid, and thinks that KNK is a lot better than KPPK. EGTB would enlighten it in this respect. And I am equally sure Crafty contains a lot of end-game knowledge in its evaluation. So the conclusion of Bob's tests should not be that EGTB can never do any good. Just that coding the same knowledge by special purpose code outperforms storing it in huge tables.

The standard deviation in 100 games is unknown and is a function of some unknown parameters.

1) The probability of a draw, which is not given.
The standard deviation can be smaller if the probability of a draw is higher.

2) I am also not sure if the games are independent.
If both programs (Rybka with tablebases and Rybka without tablebases) play white and black with the same openings, then it can reduce the standard deviation.

In an extreme case (for example, if white always wins the game in some opening in a Rybka-Rybka game), it is possible that we have a probability of 100% for a 1-1 result in 2 games.

Uri

jwes · Post by **jwes** » Thu Oct 07, 2010 11:27 pm

Milos wrote:
jwes wrote:I was wondering about that. It seems that the probability of one of two equal programs winning 5 short matches would be 3%. I would think that by running several short tests, you would decrease false positives with a corresponding increase of false negatives.
Nope.
If you run 100 short matches of 5 games, 5 matches of 100 games or 500 games at once the total variance is absolutely the same.

So you are saying that the probability of winning 5 short matches is the same as winning 1 long match? Would you care to make some wagers?

Milos · Post by **Milos** » Fri Oct 08, 2010 12:10 am

jwes wrote:So you are saying that the probability of winning 5 short matches is the same as winning 1 long match? Would you care to make some wagers?

Sure, google bootstrapping or just read this paper:
http://bcs.whfreeman.com/ips5e/content/ ... oore14.pdf.

jwes · Post by **jwes** » Fri Oct 08, 2010 12:39 am

bob wrote:Again, 23.4-1 and 23.4-2 are the normal 23.4 versions. R03-n versions all use EGTBs, but only probe when ply < n. As you can see, no real significant trend other than that in general, deeper probing drags the Elo down just a bit, but not significantly...

There just doesn't seem to be any "setting" where EGTBs help... The very shallow limits don't hurt much at all, but also don't help significantly either...

 Name                  Elo    +    - games score oppo. draws
 Crafty-23.4-2        2645    4    4 30000   61%  2556   21%
 Crafty-23.4R03-1     2644    4    4 30000   61%  2556   20%
 Crafty-23.4R03-3     2642    4    4 30000   61%  2556   19%
 Crafty-23.4R03-4     2642    4    4 30000   61%  2556   19%
 Crafty-23.4-1        2641    4    4 30000   61%  2556   21%
 Crafty-23.4R03-8     2640    4    4 30000   61%  2556   18%
 Crafty-23.4R03-7     2640    4    4 30000   61%  2556   18%
 Crafty-23.4R03-6     2640    4    4 30000   61%  2556   18%
 Crafty-23.4R03-10    2640    4    4 30000   61%  2556   18%
 Crafty-23.4R03-5     2639    4    4 30000   61%  2556   19%
 Crafty-23.4R03-12    2639    4    4 30000   60%  2556   18%
 Crafty-23.4R03-2     2638    4    4 30000   61%  2556   20%

Depth remaining might be better than ply for a limit. Another idea is to only probe if material is within a pawn, or if the side ahead in material has no pawns, within a piece. You could create a table that says when to probe each EGTB, e.g. probe KQPkq deep in the tree, probe KQQkp at the root.
There are two reasons you would want to probe EGTBs: because it is more accurate, or because it is faster. If there is enough depth remaining, it is faster to use EGTBs. There are only a few EGTB tables that are significantly more accurate than a short search.
It might be an interesting experiment to see how much other endgame knowledge helps, e.g. square of the pawn, or B+RP of the wrong color.
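
The two probing ideas above (a material-balance filter and a per-table probe limit) could be sketched roughly as follows. This is only an illustration: the function names, the limits in `PROBE_TABLE`, and the value of 3 pawns for a piece are all assumptions, not anything taken from Crafty.

```python
from dataclasses import dataclass

# Assumed value of a minor piece, in pawns (illustrative only).
PIECE_IN_PAWNS = 3.0


def material_is_close(diff_in_pawns: float, leader_has_pawns: bool) -> bool:
    """Probe only if material is within a pawn, or within a piece
    when the side ahead in material has no pawns."""
    margin = 1.0 if leader_has_pawns else PIECE_IN_PAWNS
    return abs(diff_in_pawns) <= margin


@dataclass(frozen=True)
class ProbeRule:
    max_ply: int  # probe this table only at ply <= max_ply (0 = root only)


# Hypothetical per-table limits: probe KQPkq deep in the tree, KQQkp at the root.
PROBE_TABLE = {
    "KQPkq": ProbeRule(max_ply=12),
    "KQQkp": ProbeRule(max_ply=0),
}
NEVER_PROBE = ProbeRule(max_ply=-1)  # tables without an entry are not probed


def table_allows_probe(egtb: str, ply: int) -> bool:
    """Look up the per-table rule and compare it with the current ply."""
    return ply <= PROBE_TABLE.get(egtb, NEVER_PROBE).max_ply
```

A depth-remaining limit instead of a ply limit would only change what is stored in `ProbeRule` and the direction of the comparison.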

hgm · Post by **hgm** » Fri Oct 08, 2010 10:09 am

Uri Blass wrote: The standard deviation in 100 games is unknown and is a function of some unknown parameters.

1) The probability of a draw, which is not given.
The standard deviation can be smaller if the probability of a draw is higher.

2) I am also not sure if the games are independent.
If both programs (Rybka with tablebases and Rybka without tablebases) play white and black with the same openings, then it can reduce the standard deviation.

In an extreme case (for example, if white always wins the game in some opening in a Rybka-Rybka game), it is possible that we have a probability of 100% for a 1-1 result in 2 games.

The variance I estimated would indeed be for the (typical) 36% draw rate. But it is not a very steep function of that draw rate: the latter would have to increase to 84% to halve the standard deviation. But to get this out of the way, we should indeed inquire: what was the draw rate in these tests?

To catch the second problem, we should know separate results for white and black, because within each of these sets the games should still be independent.

jwes wrote:So you are saying that the probability of winning 5 short matches is the same as winning 1 long match? Would you care to make some wagers?

No, this is not what he is saying. What he is saying is that partial results of 5x 55/100 give you exactly the same likelihood that the average score of A against B is 55% (or that A is stronger than B) as results 51, 53, 55, 56, 60, or as results 49, 49, 49, 49, 79, provided you were indeed doing what you think you were doing (i.e. the conditions in all games were the same, and the games were independent).

It is just that the latter result is extremely unlikely under the presumed conditions, and makes it almost certain that something went wrong. So you would become suspicious with a result like that, and look for errors (e.g. you accidentally used a different executable in the 5th run). Five times 55 is actually slightly more likely than the second series (if you did not reorder the latter), but is still suspect, because it is quite easy to have deviations from the presumed conditions that lead to it (e.g. all 100-game sets are the same, because the randomization of the opening book happened to have a cycle of 100).
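
To put a rough number on how suspect five runs that all land close to the average are, here is a small sketch using the normal approximation and the 4-point standard deviation per 100-game run quoted earlier in the thread. It ignores the half-point granularity of real scores, so the figures are only indicative; the function name is made up for the illustration.

```python
import math


def prob_within(delta: float, sd: float) -> float:
    """P(|X - mean| <= delta) for a normally distributed X with standard deviation sd."""
    return math.erf(delta / (sd * math.sqrt(2.0)))


SD_100_GAMES = 4.0  # points per 100-game run, the estimate used in the thread

# Chance that a single run deviates from the average by more than 4 points.
p_outside_one_sd = 1.0 - prob_within(4.0, SD_100_GAMES)

# Chance that a single run lands within 2 points of the average,
# and that five independent runs all do so.
p_one_run_close = prob_within(2.0, SD_100_GAMES)
p_all_five_close = p_one_run_close ** 5

print(f"outside 4 points: {p_outside_one_sd:.3f}")
print(f"one run within 2 points: {p_one_run_close:.3f}")
print(f"all five within 2 points: {p_all_five_close:.4f}")
```

Under these assumptions a single run is within 2 points a bit less than 40% of the time, so five in a row comes out below 1%.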

Laskos · Post by **Laskos** » Mon Oct 11, 2010 4:04 pm

Uri Blass wrote:
The standard deviation in 100 games is unknown and is a function of some unknown parameters.

1) The probability of a draw, which is not given.
The standard deviation can be smaller if the probability of a draw is higher.

2) I am also not sure if the games are independent.
If both programs (Rybka with tablebases and Rybka without tablebases) play white and black with the same openings, then it can reduce the standard deviation.

In an extreme case (for example, if white always wins the game in some opening in a Rybka-Rybka game), it is possible that we have a probability of 100% for a 1-1 result in 2 games.

Uri

1) The dependence on the ratio of draws is slow. For pretty equal engines, the standard deviation goes as Sqrt(1-DrawRatio). This is a slow function for usual values of DrawRatio, for example [0.2, 0.6].

2) Ain't gonna happen. Even by always playing the same opening (the standard opening position, for example), the distribution is pretty much a trinomial, which goes to a normal distribution for many games irrespective of which opponent you are playing. If one uses a set of balanced opening positions, then the deviation from the normal distribution will go to 0 for a large number of games.
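
The Sqrt(1-DrawRatio) dependence in point 1 can be checked with a few lines. This is a sketch assuming independent games with a fixed win/draw/loss distribution; the function name and its parameters are chosen for the illustration.

```python
import math


def score_sd(games: int, draw_ratio: float, expected_score: float = 0.5) -> float:
    """Standard deviation (in points) of the total score over independent games.

    Each game scores 1, 0.5 or 0. With draw probability d and expected score s,
    the win probability is w = s - d/2 and the per-game variance is w + d/4 - s^2.
    For equal engines (s = 0.5) this reduces to (1 - d) / 4.
    """
    win = expected_score - draw_ratio / 2.0
    loss = 1.0 - draw_ratio - win
    if win < 0.0 or loss < 0.0:
        raise ValueError("draw_ratio and expected_score are inconsistent")
    variance = win + draw_ratio / 4.0 - expected_score ** 2
    return math.sqrt(games * variance)


for d in (0.2, 0.36, 0.6, 0.84):
    print(f"draw ratio {d:.2f}: SD over 100 games = {score_sd(100, d):.2f} points")
```

With a 36% draw rate this gives 4 points per 100 games and about 1.8% over 500 games, and the draw rate has to reach 84% before the standard deviation is halved, which matches the figures quoted earlier in the thread.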

Laskos · Post by **Laskos** » Mon Oct 11, 2010 4:25 pm

to add:

2) ... the same balanced opening always

Jouni · Post by **Jouni** » Mon Oct 11, 2010 8:10 pm

I must admit that you need 1000s of games to prove anything. I also ran 300 games against Stockfish 1.9, with these results:

R3 TB - Stockfish 1.9 147 - 153
R3 NOTB - Stockfish 1.9 156.5 - 143.5 (!)

I don't have the CPU time for more games...

Jouni

Gerard Taille · Post by **Gerard Taille** » Wed Oct 13, 2010 10:22 am

Hi,

I am really upset by this discussion because I cannot imagine having a large EGTB without being able to improve my program. You cannot accept such a failure.
In my draughts program I do use a large EGTB, but I had to work hard in order to use it efficiently. Several conditions seem necessary to get an improvement of your program:
1) The first one was already mentioned: you have to analyze when it is worth accepting an I/O and when you have to avoid it.
2) The second and major point is to keep in your hash table the information relating to the draw result of a position.
This second point is the most important and needs some more explanation. Suppose for example that the tree root position is advantageous for white. As a consequence, black would be happy to find a draw. Now take node1 of the tree, where it is black to play and where there is a black move assuring a draw according to the EGTB. You can and you must keep in your hash table that node1 is “at least a draw” for black. That way, when you reach node1 again (typically with a higher remaining depth), you will be able to stop the search without generating the subtree and without probing the time-consuming EGTB again.
Without this mechanism you can hardly improve your program by using an EGTB.
Has anybody tried this approach?
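
One way to read the “at least a draw” idea is as an ordinary lower-bound entry in the hash table, stored with a depth large enough that it never has to be searched again. The sketch below follows that reading under negamax conventions (scores from the side to move); all names are hypothetical and it is not taken from any actual draughts or chess program.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Bound(Enum):
    EXACT = 0
    LOWER = 1  # true score >= value ("at least")
    UPPER = 2  # true score <= value ("at most")


DRAW = 0
EGTB_DEPTH = 10**6  # assumed sentinel: a tablebase result is valid at any depth


@dataclass
class Entry:
    value: int
    bound: Bound
    depth: int


class HashTable:
    def __init__(self) -> None:
        self._entries: Dict[int, Entry] = {}

    def store_egtb_draw_bound(self, key: int) -> None:
        """The side to move has a move that draws according to the EGTB."""
        self._entries[key] = Entry(DRAW, Bound.LOWER, EGTB_DEPTH)

    def probe(self, key: int, depth: int, alpha: int, beta: int) -> Optional[int]:
        """Return a score if the stored entry allows a cutoff, else None."""
        entry = self._entries.get(key)
        if entry is None or entry.depth < depth:
            return None
        if entry.bound is Bound.EXACT:
            return entry.value
        if entry.bound is Bound.LOWER and entry.value >= beta:
            return entry.value
        if entry.bound is Bound.UPPER and entry.value <= alpha:
            return entry.value
        return None
```

In the scenario described above, white is ahead, so at node1 the window seen from black's side has a negative beta (say -50). The stored lower bound of 0 is then enough for an immediate cutoff at any remaining depth, with no subtree generated and no new EGTB access.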
