Per-square MVV/LVS: it's nice but it doesn't work

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: MVV/LVA - SEE - test - final results

Post by michiguel »

bob wrote:
michiguel wrote:
bob wrote:Here are the final results. Again, 22.6R01 is normal, 22.6R02 is the SEE + MVV/LVA capture ordering code.

Code: Select all

Name               Elo    +    - games score oppo. draws
Crafty-22.6R02-101  2600    5    4 31128   49%  2604   20% 
Crafty-22.6R02-102  2600    4    5 31128   49%  2604   21% 
Crafty-22.6R01-101  2593    4    4 31128   48%  2604   21% 
Crafty-22.6R01-102  2592    5    4 31128   48%  2604   21% 
The new version checks in pretty reliably better, although I am still not certain exactly why this would be. The new version hits 2600 twice, the older version is between 2592 and 2593 twice. So 7-8 Elo, again remembering the error bar, but with a total of 62,000 games (which I could combine into two lines above if anyone wants) the error bar would be even smaller while the Elo numbers would not change.

One of those things that makes you go hmmmm.....
I am not surprised at all!

Miguel
I'm not that surprised, because the gain is very small. But I had expected it to be a very small loss, rather than a small gain. I will try this on some longer games as well to see if that changes anything at all... But when I saw WAC2 take 6x longer, I expected to see a loss, so it was a bit of a surprise. SEE is simply more accurate so I am not sure why using SEE to cull moves, and then mvv/lva to actually sort them is of any benefit. I'm going to study it to understand why, however, assuming this is not an artifact of playing fast games (these games were 10 seconds + 10ms increment per move...)

Probably the most interesting thing is that this takes a _lot_ of games to measure the benefit. It is very small. Which leads me to believe that many have depended on superstition and assumed that this works, since it is not so easy to measure...
IMHO, WAC is not good for measuring this type of thing.
I measured the tree sizes of many quiet positions. I think I posted some results here.

I believe it makes sense for the reasons I explained in some other message.

Miguel
User avatar
hgm
Posts: 28426
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Per-square MVV/LVS: it's nice but it doesn't work

Post by hgm »

7 Elo is not so small. To achieve the same by pure speed / extra thinking time, you would need approximately a 7% faster program.

IMO the gain is expected. I do expect my ordering of captures to work even better, though.
mcostalba wrote:
hgm wrote: 3) Upgrade the victim value of captures with the threatened piece or of the piece threatening it by the opponent's SEE score on this piece.
4) Replace this adjusted victim value by SEE score when (adjusted) victim < attacker or victim < highest threatened piece.
Sorry, but I fail to understand points 3 and 4.

Could you please write the algorithm as (simplified) pseudo-code?

Thanks a lot
Marco
The requested pseudocode is:

Code: Select all

Search(level)
{
    MakeNullMove();
    score = -Search(level+1);
    if(score <= beta) {                      // null move failed low: opponent has a threat
        threat = SEE(bestMove[level+1]);     // gain of the opponent's best capture
        threatened = bestMove[level+1].to;   // square of our piece under attack
        threatener = bestMove[level+1].from; // square of the attacking piece
    } else { threat = 0; threatened = threatener = NO_SQUARE; }
    UnMakeNull();
    if(score > beta) return FAIL_HIGH;

    MoveGen();
    for(ALL_MOVES)
    {
        sortKey[currentMove] = pieceValue[victim];        // MVV
        if(currentMove.from == threatened)                // moves the threatened piece
            sortKey[currentMove] += threat;
        if(currentMove.to == threatener)                  // captures the threatener
            sortKey[currentMove] += threat;
        if(sortKey[currentMove] < pieceValue[piece] ||    // apparent H x L
           sortKey[currentMove] < pieceValue[pieceOn(threatened)])
            sortKey[currentMove] += SEE(currentMove) - pieceValue[victim];
        sortKey[currentMove] -= 0.01*pieceValue[piece];   // LVA tie-break
    }
    Sort(ALL_MOVES, sortKey);
}
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: MVV/LVA - SEE - test - final results

Post by bob »

michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:Here are the final results. Again, 22.6R01 is normal, 22.6R02 is the SEE + MVV/LVA capture ordering code.

Code: Select all

Name               Elo    +    - games score oppo. draws
Crafty-22.6R02-101  2600    5    4 31128   49%  2604   20% 
Crafty-22.6R02-102  2600    4    5 31128   49%  2604   21% 
Crafty-22.6R01-101  2593    4    4 31128   48%  2604   21% 
Crafty-22.6R01-102  2592    5    4 31128   48%  2604   21% 
The new version checks in pretty reliably better, although I am still not certain exactly why this would be. The new version hits 2600 twice, the older version is between 2592 and 2593 twice. So 7-8 Elo, again remembering the error bar, but with a total of 62,000 games (which I could combine into two lines above if anyone wants) the error bar would be even smaller while the Elo numbers would not change.

One of those things that makes you go hmmmm.....
I am not surprised at all!

Miguel
I'm not that surprised, because the gain is very small. But I had expected it to be a very small loss, rather than a small gain. I will try this on some longer games as well to see if that changes anything at all... But when I saw WAC2 take 6x longer, I expected to see a loss, so it was a bit of a surprise. SEE is simply more accurate so I am not sure why using SEE to cull moves, and then mvv/lva to actually sort them is of any benefit. I'm going to study it to understand why, however, assuming this is not an artifact of playing fast games (these games were 10 seconds + 10ms increment per move...)

Probably the most interesting thing is that this takes a _lot_ of games to measure the benefit. It is very small. Which leads me to believe that many have depended on superstition and assumed that this works, since it is not so easy to measure...
IMHO, WAC is not good for measuring this type of thing.
I measured the tree sizes of many quiet positions. I think I posted some results here.

I believe it makes sense for the reasons I explained in some other message.

Miguel
I don't use WAC for much of anything. But a couple of positions are useful for simple analysis. WAC2 is one example that is less about tactics and more about positional evaluation, to see if a program will realize how strong two connected passed pawns become when they are advanced... Anything that causes a 6x slowdown in such a position is noteworthy...
User avatar
hgm
Posts: 28426
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: MVV/LVA - SEE - test - final results

Post by hgm »

bob wrote:Probably the most interesting thing is that this takes a _lot_ of games to measure the benefit. It is very small. Which leads me to believe that many have depended on superstition and assumed that this works, since it is not so easy to measure...
This need not come as a surprise either. Move sorting should not have a significant effect on move choice for a given-depth tree, so the only way it affects games is by reducing the tree size. And a 1% reduction in tree size is fairly easy to measure by doing searches from a set of typical game positions, while determining the corresponding 1-Elo difference from playing games requires a horrendous number of games.

So playing games is a very inefficient method of determining this. And there are many people who, in the face of unsuitable methods, resort to clever testing rather than to superstition...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: MVV/LVA - SEE - test - final results

Post by bob »

hgm wrote:
bob wrote:Probably the most interesting thing is that this takes a _lot_ of games to measure the benefit. It is very small. Which leads me to believe that many have depended on superstition and assumed that this works, since it is not so easy to measure...
This need not come as a surprise either. Move sorting should not have a significant effect on move choice for a given-depth tree, so the only way it affects games is by reducing the tree size. And a 1% reduction in tree size is fairly easy to measure by doing searches from a set of typical game positions, while determining the corresponding 1-Elo difference from playing games requires a horrendous number of games.

So playing games is a very inefficient method of determining this. And there are many people who, in the face of unsuitable methods, resort to clever testing rather than to superstition...
Many are "too clever". Testing on test-type (tactical) positions is often misleading. Solving tactical positions faster does not, in and of itself, translate to higher Elo in real games...

Games expose you to _all_ types of positions, which any test set is almost guaranteed to not do...
User avatar
hgm
Posts: 28426
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: MVV/LVA - SEE - test - final results

Post by hgm »

bob wrote:Games expose you to _all_ types of positions, which any test set is almost guaranteed to not do...
They do if you randomly select them from games that you played before, e.g. for another purpose.

This is why I wrote "typical game positions". That is not the same as: "tactical positions".

That you can think of many other poor methods next to the one you used, does not automatically mean that what I proposed above is also poor, or that the people that have been using it in the past to post their conclusions here were "merely superstitious" or must have been "too clever".
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: MVV/LVA - SEE - test - final results

Post by bob »

hgm wrote:
bob wrote:Games expose you to _all_ types of positions, which any test set is almost guaranteed to not do...
They do if you randomly select them from games that you played before, e.g. for another purpose.

This is why I wrote "typical game positions". That is not the same as: "tactical positions".

That you can think of many other poor methods next to the one you used, does not automatically mean that what I proposed above is also poor, or that the people that have been using it in the past to post their conclusions here were "merely superstitious" or must have been "too clever".
I will say it again, _clearly_. I trust complete games to evaluate changes, over positions of any kind. Whether they are tactical, positional, opening, endgame, middlegame, or contrived. Games _are_ different from positions.

enough said, IMHO. Test however you want. I have about 3,000 positions I used to use to test changes that might affect the size of the tree. However, it is _easy_ to reduce the size of the tree _and_ reduce playing strength.

I will stick with what certainly works...
Dann Corbit
Posts: 12808
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: MVV/LVA - SEE - test - final results

Post by Dann Corbit »

bob wrote:
hgm wrote:
bob wrote:Games expose you to _all_ types of positions, which any test set is almost guaranteed to not do...
They do if you randomly select them from games that you played before, e.g. for another purpose.

This is why I wrote "typical game positions". That is not the same as: "tactical positions".

That you can think of many other poor methods next to the one you used, does not automatically mean that what I proposed above is also poor, or that the people that have been using it in the past to post their conclusions here were "merely superstitious" or must have been "too clever".
I will say it again, _clearly_. I trust complete games to evaluate changes, over positions of any kind. Whether they are tactical, positional, opening, endgame, middlegame, or contrived. Games _are_ different from positions.

enough said, IMHO. Test however you want. I have about 3,000 positions I used to use to test changes that might affect the size of the tree. However, it is _easy_ to reduce the size of the tree _and_ reduce playing strength.

I will stick with what certainly works...
Reminds me of a least-squares fit I did with evaluation parameters and a test set of 12,000 tactical positions. After the fit, the engine would solve test positions like a world champion. Unfortunately, it got pounded by the original engine in actual games.
User avatar
hgm
Posts: 28426
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: MVV/LVA - SEE - test - final results

Post by hgm »

bob wrote:I will stick with what certainly works...
Certainty comes at a price, and what "certainly works" might in practice be highly inferior to something that likely works, because the combined statistical and systematic error of a method can still be larger even when its systematic error is zero.

This particular case is a very good example: by sticking to things that certainly work, you lagged several years behind people with approximately 100 times fewer computational resources. And without those people, you would not even have known what to test with your certainly working method.

But it is of course all their fault that you were wrong all that time: they were, after all, just superstitious morons who did not have the slightest idea what they were doing, and happened to pick a quite complicated and non-obvious move ordering by sheer luck. Now that you have proven to the world with certainty that it actually works, they will probably repent and abandon their misguided ways, giving you due credit for your brilliant discovery. :lol:
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: MVV/LVA - SEE - test - final results

Post by mcostalba »

hgm wrote:
bob wrote:Games expose you to _all_ types of positions, which any test set is almost guaranteed to not do...
They do if you randomly select them from games that you played before, e.g. for another purpose.

This is why I wrote "typical game positions". That is not the same as: "tactical positions".

That you can think of many other poor methods next to the one you used, does not automatically mean that what I proposed above is also poor, or that the people that have been using it in the past to post their conclusions here were "merely superstitious" or must have been "too clever".

When it is possible, I agree real games are the best thing. Unfortunately not everybody has a cluster to test on, so trying to be _clever_ is a necessity, not a choice.

I have tried to test, by tree size at a given depth, the ordering algorithm proposed here by H.G. Muller (BTW thanks very much for the pseudo-code).

Ordering, IMHO, is suitable for tree-size testing; I would say it is the poster child of tree-size testing methods. Of course you cannot rely on tree-size methods for futility pruning or evaluation changes, because there the quality of the moves the engine chooses also counts, not only their number. But just changing the search order should not change the results... at least I hope :-)

I have tried different variations of the proposed algorithm on a set of quiet positions, i.e. positions where there is no winning move, but which are quite typical and common in the middle game.

In the end I found a variation that reduced the size of the tree by about 3-6%, depending on the depth. When I double-checked with real games, to my disappointment, the change in ordering failed to give any advantage at all.

I am still investigating....