WAC test

lucasart · Post by **lucasart** » Sun Jul 24, 2011 3:54 pm

bob wrote:
lucasart wrote:
bob wrote:I tried all sorts of variations on that idea, but discovered that the "check" extension was the only one that produced any positive gain in Crafty. I removed all others. I also found that 1.0 was the right extension for checks, and then got rid of fractional plies as well to simplify the code a bit...
I am currently testing my extensions against only exenting by one ply forced moves and when we are in check.
I'm just running 10 games in 1'+1". I'm curious to see what happens
How does "10 games" test _anything_? You can run that 10 times and get 10 different results...

Yes, I know. I'm just too impatient to wait for many games

Anyway, testing 10 games gave 6/10 to the OnePly extension for forced moves and check, running against my previous extensions (that were PV dependant).
I'm testing (60 games this time)
* A: extend by 1ply checks and foced moves
* B: extend by 1ply checks and foced moves, and by 1/2 ply pawn moves to the 7th rank and recaptures.

Also I'm thinking:
* pawn 7th rank extensions: should I only extend when the SEE is not < 0 ? otherwise it generally means that the pawn goes to the 7th rank only to be taken right away.
* have you ever tried negative extensions for losing captures. for example extend by -1/2 ply if the SEE is < -Pawn, for eg ?

bob · Post by **bob** » Sun Jul 24, 2011 5:27 pm

lucasart wrote:
bob wrote:
lucasart wrote:
bob wrote:I tried all sorts of variations on that idea, but discovered that the "check" extension was the only one that produced any positive gain in Crafty. I removed all others. I also found that 1.0 was the right extension for checks, and then got rid of fractional plies as well to simplify the code a bit...
I am currently testing my extensions against only exenting by one ply forced moves and when we are in check.
I'm just running 10 games in 1'+1". I'm curious to see what happens
How does "10 games" test _anything_? You can run that 10 times and get 10 different results...
Yes, I know. I'm just too impatient to wait for many games
Anyway, testing 10 games gave 6/10 to the OnePly extension for forced moves and check, running against my previous extensions (that were PV dependant).
I'm testing (60 games this time)
* A: extend by 1ply checks and foced moves
* B: extend by 1ply checks and foced moves, and by 1/2 ply pawn moves to the 7th rank and recaptures.

Also I'm thinking:
* pawn 7th rank extensions: should I only extend when the SEE is not < 0 ? otherwise it generally means that the pawn goes to the 7th rank only to be taken right away.
* have you ever tried negative extensions for losing captures. for example extend by -1/2 ply if the SEE is < -Pawn, for eg ?

I don't think that 1,000 games is enough. You need to use something like BayesElo and either look at likelihood of superiority, or at least make sure that the two elo values are separated by more than the 95% confidence error bar...

lucasart · Post by **lucasart** » Sun Jul 24, 2011 9:53 pm

bob wrote:
lucasart wrote:
bob wrote:
lucasart wrote:
bob wrote:I tried all sorts of variations on that idea, but discovered that the "check" extension was the only one that produced any positive gain in Crafty. I removed all others. I also found that 1.0 was the right extension for checks, and then got rid of fractional plies as well to simplify the code a bit...
I am currently testing my extensions against only exenting by one ply forced moves and when we are in check.
I'm just running 10 games in 1'+1". I'm curious to see what happens
How does "10 games" test _anything_? You can run that 10 times and get 10 different results...
Yes, I know. I'm just too impatient to wait for many games
Anyway, testing 10 games gave 6/10 to the OnePly extension for forced moves and check, running against my previous extensions (that were PV dependant).
I'm testing (60 games this time)
* A: extend by 1ply checks and foced moves
* B: extend by 1ply checks and foced moves, and by 1/2 ply pawn moves to the 7th rank and recaptures.

Also I'm thinking:
* pawn 7th rank extensions: should I only extend when the SEE is not < 0 ? otherwise it generally means that the pawn goes to the 7th rank only to be taken right away.
* have you ever tried negative extensions for losing captures. for example extend by -1/2 ply if the SEE is < -Pawn, for eg ?
I don't think that 1,000 games is enough. You need to use something like BayesElo and either look at likelihood of superiority, or at least make sure that the two elo values are separated by more than the 95% confidence error bar...

OK so here's the math: A plays vs B
* N games
* p wins (for A)
* q draws
* therefore N-p-q losses (for A)

X_i is the score of A (values 0,1/2,1) at the i-th game. The X_i are assumed iid (independance is true since I flush the hash every game, and identical distribution is ok if the book is good, otherwise the book may increase the variance).

unbiaised estimators
* mu = (p + q/2) / N, estimates E(X_i)
* sigma^2 = [p(1-mu)^2 + q(1/2-mu)^2 + (N-p-q)(0-mu)^2]/(N-1), estimates V(X_i)

assuming N big enouigh for the Central Limit theorem approximation to be good enough(*), we have
P(score(A) > mu + sigma/sqrt(N)*Normsinv(5%)) = 95%

example: In my 60 games, I got 23-15-22. Therefore
mu = 56.67%
sigma/sqrt(N)*Normsinv(5%) = -8.40%
P(score(A) > 56.67%-8.40% = 48.27%) = 95%
and since 48.27% < 50%... It cannot be concluded at 95% that A beats B!

(*) could be tested with the Kolmogorov-Smirnov test, which is an appropriate normality test, since it is based on the quantile function.

Conclusion: I need to increase N, until it becomes significant. If I don't get anything significant after enough games that I'm fed up with them, I'll do a WAC comparison (just compare them in tactical situations).

That should be a bit more scientific

lucasart · Post by **lucasart** » Mon Jul 25, 2011 8:19 am

lucasart wrote: OK so here's the math: A plays vs B
* N games
* p wins (for A)
* q draws
* therefore N-p-q losses (for A)

X_i is the score of A (values 0,1/2,1) at the i-th game. The X_i are assumed iid (independance is true since I flush the hash every game, and identical distribution is ok if the book is good, otherwise the book may increase the variance).

unbiaised estimators
* mu = (p + q/2) / N, estimates E(X_i)
* sigma^2 = [p(1-mu)^2 + q(1/2-mu)^2 + (N-p-q)(0-mu)^2]/(N-1), estimates V(X_i)

assuming N big enouigh for the Central Limit theorem approximation to be good enough(*), we have
P(score(A) > mu + sigma/sqrt(N)*Normsinv(5%)) = 95%

example: In my 60 games, I got 23-15-22. Therefore
mu = 56.67%
sigma/sqrt(N)*Normsinv(5%) = -8.40%
P(score(A) > 56.67%-8.40% = 48.27%) = 95%
and since 48.27% < 50%... It cannot be concluded at 95% that A beats B!

(*) could be tested with the Kolmogorov-Smirnov test, which is an appropriate normality test, since it is based on the quantile function.

Conclusion: I need to increase N, until it becomes significant. If I don't get anything significant after enough games that I'm fed up with them, I'll do a WAC comparison (just compare them in tactical situations).

That should be a bit more scientific

OK so I confront:
* A: extends checks and forced replies by 1 ply
* B: extends checks and forced replies by 1 ply, SEE > Pawn winning captures by 1/2, and reduce SEE < -Pawn by 1/2 ply, and pawn moves to the 7-th rank (with see >= 0) by 1/2 ply

Result A vs B: 69-65-59. So, using the stuff explained above, I get P(A > B) = 63.49%, which is way below the 95% signifiance level.

Also when I tested A vs:a lot of different variations, including recaptures only at PV nodes. And whenever I got something almost significant it actually said A was better.

Conclusion: Robert's right, these extensions are a waste of time. And let alone having them PV dependant, which is especially suspicious when using in combination with a hash table (it introduces path dependant conditions).

WAC test

Re: WAC test

Re: WAC test

Re: WAC test

Re: WAC test