Move ordering test (2)

Rebel · Post by **Rebel** » Thu May 30, 2013 11:15 pm

New formula as suggested by Don is up, see: http://www.top-5000.nl/motest2.htm

First results at: http://www.top-5000.nl/moresu2.htm

I took the liberty to use Don's and Peter's posted numbers. If they are not good please fill in the form or lemme know in another way.

-------

We can do a third and last test later that deals with nodes that get no A/B at all.

bob · Post by **bob** » Fri May 31, 2013 1:28 am

Rebel wrote:New formula as suggested by Don is up, see: http://www.top-5000.nl/motest2.htm

First results at: http://www.top-5000.nl/moresu2.htm

I took the liberty to use Don's and Peter's posted numbers. If they are not good please fill in the form or lemme know in another way.

-------

We can do a third and last test later that deals with nodes that get no A/B at all.

If we are going to study this, I would suggest a change to the methodology:

(1) fail_high += move_number * move_number;

This tries to differentiate between the case where a program fails high on the 2nd move twice, and the fourth once. In the current test, there is little to differentiate between the two cases, by squaring the number, failing high on the 4th move is way worse...

I suppose that if you want to, you could then compute the average and take the square root of that to approximate the "average" position. The raw number would be just as informative without the sort().

Rebel · Post by **Rebel** » Fri May 31, 2013 12:48 pm

bob wrote:
Rebel wrote:New formula as suggested by Don is up, see: http://www.top-5000.nl/motest2.htm

First results at: http://www.top-5000.nl/moresu2.htm

I took the liberty to use Don's and Peter's posted numbers. If they are not good please fill in the form or lemme know in another way.

-------

We can do a third and last test later that deals with nodes that get no A/B at all.
If we are going to study this, I would suggest a change to the methodology:

(1) fail_high += move_number * move_number;

This tries to differentiate between the case where a program fails high on the 2nd move twice, and the fourth once. In the current test, there is little to differentiate between the two cases, by squaring the number, failing high on the 4th move is way worse...

I suppose that if you want to, you could then compute the average and take the square root of that to approximate the "average" position. The raw number would be just as informative without the sort().

The move_count * move_count makes sense.

There are a couple of more lose ends,

1. If you (like me) have the factor updated every second on the screen you will notice a pattern that after a fail-high at the root during the research and the wider window that comes with it the factor will increase rapidly because many new variants (not in TT) are searched.

2. The same counts for a new iteration searching the best move at the new depth, the factor goes up. While searching the rest of the moves (provided no new best move is found), the factor rapidly goes down.

Meaning, that the 30 seconds limit is highly unreliable, if after 30 seconds you are in the middle of several fail-high's than just 10 positions can influence the overall performance in a negative way. I can only see one good solution, increase the number of positions.

3. There has to be made a decision about the nodes where no A/B is possible. My suggestion using your example code:

Code: Select all

 if ( move_number <= 5 ) fail_high += move_number * move_number;

So first 5 moves only.

And at the end of the move list:

Code: Select all

  fail_high += 36;

Rebel · Post by **Rebel** » Fri May 31, 2013 1:31 pm

8 entries received so for. Results start to make more sense.

Mincho Georgiev · Post by **Mincho Georgiev** » Fri May 31, 2013 1:42 pm

Hi Ed! I apologize if this is a stupid question, or already explained, but should we use the search in it's current state, or we should eliminate everything that could cause more noise to the result. I'll try to explain...
If an engine have a good move ordering scheme in general, but there's a bug somewhere (IID, FP, e.t.c.) this would raise a rather different result than if all the search "add-ons" have been disabled during the measurement.
I understand that the move-picking is influenced by the entire search as it is, but still...
Again sorry if I misunderstood the concept, but I think is a plausible question.

Rebel · Post by **Rebel** » Fri May 31, 2013 2:25 pm

Mincho Georgiev wrote:Hi Ed! I apologize if this is a stupid question, or already explained, but should we use the search in it's current state, or we should eliminate everything that could cause more noise to the result. I'll try to explain...
If an engine have a good move ordering scheme in general, but there's a bug somewhere (IID, FP, e.t.c.) this would raise a rather different result than if all the search "add-ons" have been disabled during the measurement.
I understand that the move-picking is influenced by the entire search as it is, but still...
Again sorry if I misunderstood the concept, but I think is a plausible question.

Hi Mincho,

In an almost perfect world such as the early days of CC when most chess programs were pure brute-force searchers with no reductions, pruning and only check extensions (hence the "almost") this test we are doing now would be conclusive.

So you are right, nowadays a conclusive result is impossible. It doesn't mean there is no value in what we are currently doing. Hope this helps.

Mincho Georgiev · Post by **Mincho Georgiev** » Fri May 31, 2013 2:58 pm

Yes, it does help and it makes sense, the search - as it is...
Probably though, someone would be interested to see his results in bare-bones search, compared to his search - as it is currently.
I would for one.
And finally, should we use the counters at the root as well, I supposed so, but tell me if we shouldn't.

hgm · Post by **hgm** » Fri May 31, 2013 3:05 pm

Counting the square of the move number seems exactly the opposite of what is needed, as late moves are often reduced, and thus hardly count. If per node on average only the first 6 moves are not reduced, having all cutoffs on the 6th move is much worse than having half at the 1st, and half at the 11th ((1+6.5)/2 = 3.75, if we count a reduced move for 0.1). With the square counting you would get sqrt((1 + 121)/2) = sqrt(61) ~ 8.

Anyway, Fairy-Max scores ~2.5 - ~3.5. I guess its main problem is that Pawn double pushes (and castlings) cannot become hash move, as it scores especially bad when there are many 2nd-rank Pawns. That it doesn't have killers of course also doesn't help. This is not counting earlier IID iterations, which score of course much worse (but take very little time compared to the last one).

lucasart · Post by **lucasart** » Fri May 31, 2013 3:06 pm

bob wrote: (1) fail_high += move_number * move_number;

We should also account for the depth. Ordering badly at depth=1 is not a disaster, but at depth 10... And there are much more nodes at depth 1 thant nodes at depth 10 due to the "exponential shape" of the tree. So we need some kind of depth rescaling here.

Anyway, I think this move ordering contest doesn't make much sense, because we are comparing apples and pears, with engines that do a lot of things differently that will pollute these results. For example if I make move count pruning more aggressive, I will improve my score at Ed's "move ordering contest", but in reality I will make my engine weaker.

That's why I am not very interested to participate to this contest. The only contest that makes any real sense is an ELO contest

hgm · Post by **hgm** » Fri May 31, 2013 3:09 pm

Well, this was inherent with adopting a defenition of move-ordering quality that does not correlate (or perhaps even anti-correlates) with search efficiency...

Move ordering test (2)

Move ordering test (2)

Re: Move ordering test (2)

Re: Move ordering test (2)

Re: Move ordering test (2)

Re: Move ordering test (2)

Re: Move ordering test (2)

Re: Move ordering test (2)

Re: Move ordering test (2)

Re: Move ordering test (2)

Re: Move ordering test (2)