Most of modern alpha-beta search logic is just heuristics that work often enough to give an Elo gain. There are many positions where they will fail. I remember when Toga, for example, was one of the early engines making heavy use of late move pruning. That is just throwing away big chunks of the search tree and not even looking at the moves. I found it incredible that that could be good, but in fact it is an effective technique.
--Jon
RMO - Randomized Move Order - yet another Lazy SMP derivate
Moderators: hgm, Rebel, chrisw
-
- Posts: 4366
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
-
- Posts: 97
- Joined: Mon Jun 25, 2012 10:16 pm
- Location: Forks, WA
- Full name: Ben Nye
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
If this gets better game results despite worse time to depth, then it must be searching moves that were incorrectly pruned normally. And it seems that this incorrect pruning has been happening often enough to have a measurable impact. So it seems like it would be worth researching which moves get searched with this method but discarded (or reduced) by other methods, and why that pays off. I would certainly hope that there would be a better method than "shuffle the move order at random" to improve the selection of which moves to search.
Ben
-
- Posts: 2645
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Not sure if you are talking about Lucas' Lazy SMP or RMO.
Angrim wrote: ↑Tue Dec 31, 2019 6:24 pm If this gets better game results despite worse time to depth, then it must be searching moves that were incorrectly pruned normally. And it seems that this incorrect pruning has been happening often enough to have a measurable impact. So it seems like it would be worth researching which moves get searched with this method but discarded (or reduced) by other methods, and why that pays off. I would certainly hope that there would be a better method than "shuffle the move order at random" to improve the selection of which moves to search.
Ben
I benchmarked ABDADA and RMO with time to depth only.
But I have to admit that my engine is quite simple, and it runs on a GPU architecture, so I am not sure how, or if, my results are applicable to production CPU engines.
Maybe when >128-core CPUs are common someone will try it.
--
Srdja
*** edit ***
PS: I guess that with a certain number of threads ABDADA converges to RMO.
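For readers unfamiliar with ABDADA, a minimal sketch of the mechanism that claim rests on (invented names; real ABDADA keys its "currently searching" counters by position and move in the shared hash table): on a first pass each thread skips moves that another thread is already searching, then picks the deferred moves up afterwards.

```cpp
#include <atomic>
#include <vector>

constexpr int kSlots = 1024;
// One atomic "how many threads are on this move" counter per hash slot.
std::atomic<int> gSearching[kSlots]{};

bool beingSearched(unsigned long long key) { return gSearching[key % kSlots].load() > 0; }
void enterMove(unsigned long long key) { gSearching[key % kSlots].fetch_add(1); }
void leaveMove(unsigned long long key) { gSearching[key % kSlots].fetch_sub(1); }

// ABDADA-style two-pass ordering: moves nobody else is searching come first,
// deferred moves (busy on another thread) are appended at the end.
std::vector<unsigned long long> abdadaOrder(const std::vector<unsigned long long>& moves) {
    std::vector<unsigned long long> first, deferred;
    for (unsigned long long m : moves)
        (beingSearched(m) ? deferred : first).push_back(m);
    first.insert(first.end(), deferred.begin(), deferred.end());
    return first;
}
```

With many threads, most moves at a shared node are "busy" at any given moment, so the effective order each thread sees is increasingly driven by thread timing, which is one way to read the convergence claim above.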
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
I wonder where the TTD claim actually comes from (i.e. that lazy is worse in TTD than YBW). One would need an engine that implements both Lazy SMP and YBW to make the comparison. Perhaps the version of SF when they made the switch?
Angrim wrote: ↑Tue Dec 31, 2019 6:24 pm If this gets better game results despite worse time to depth, then it must be searching moves that were incorrectly pruned normally. And it seems that this incorrect pruning has been happening often enough to have a measurable impact. So it seems like it would be worth researching which moves get searched with this method but discarded (or reduced) by other methods, and why that pays off. I would certainly hope that there would be a better method than "shuffle the move order at random" to improve the selection of which moves to search.
Ben
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Back to RMO...
The correct way to test is to proceed step by step, testing each component one at a time, not adding them all together, and then comparing to a wrong benchmark (ABDADA). The benchmark should be the shared HT (or possibly its slightly improved version with alternating depths).
Of course, if you can prove that RMO works better than shared HT with a pencil and paper, you can skip the testing. In that case, just write a paper and have it peer reviewed by Michel
OK, standard shared HT stuff.
- communication of TT move and TT scores between threads via Shared Hash Table
- all threads follow the same Iterative Deepening loop with the same depth
- thread 0 performs a normal search with move ordering etc.
What? Throw away move ordering that is worth hundreds of Elo?
- threads >0 randomize the move order up to QSearch
Yet another seemingly good idea on paper that fails in testing. I used to do this until recently in Demolito, but proper testing showed that it was clearly regressive.
- as soon as one thread finishes its search, all threads should terminate the current ID iteration and start the next one
So now you put in a lot of exceptions where we go back to normal, in an attempt to hide the massive Elo loss from random move ordering. Move ordering is worth many hundreds of Elo in modern engines; throwing that away is a non-starter. It should be tested stand-alone, just to see how bad it is.
RMO - exceptions for threads >0:
There are some cases you may not want to randomize the move order:
- Null move
- TT move
- IID move
- Counter move
- Killer move
and some cases where you may want to perform a normal search:
- LMR
- IID
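Putting the description together, a hedged CPU-side sketch of the helper-thread ordering (names invented; the actual GPU implementation is in the Zeta snippets later in the thread). Thread 0 keeps the engine's normal ordering, while helper threads shuffle everything behind the first few "protected" slots, so the TT/IID/killer/counter moves from the exceptions list keep their places:

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Hypothetical move list where indices [0, keepOrdered) already hold the
// TT/IID/killer/counter moves in their normal, heuristic order.
void rmoOrder(std::vector<int>& moves, int threadId, int keepOrdered, std::mt19937& rng) {
    // Thread 0 searches with the engine's normal move ordering.
    if (threadId == 0) return;
    // Helper threads randomize only the unprotected tail of the list,
    // mirroring the "exceptions" described above.
    if ((int)moves.size() > keepOrdered)
        std::shuffle(moves.begin() + keepOrdered, moves.end(), rng);
}
```

A per-thread RNG (seeded differently per thread) is what actually decorrelates the trees; sharing one generator would serialize the threads on it.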
Even in its diluted form as you propose, I am 100% certain that this will lose to a simple shared HT. And by "lose" I mean in proper testing, measuring elo not TTD, and using a realistic engine, not a toy engine with plain alpha beta and random ordering.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 2645
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
lucasart wrote: ↑Wed Jan 01, 2020 1:25 am Back to RMO...
The only serious thing you mentioned is to test Elo, not time to depth. And you have a point here: I compared the different parallel searches only with ttd.
You can break down the exceptions to TT-Move and Null-Move, and it should be a no-brainer why.
I decommissioned my GPU workstation, so I am no longer able to produce any data.
I can only repeat my observation: with >= 128 threads there is not such a big difference between ABDADA and RMO, at least with my toy engine.
--
Srdja
-
- Posts: 2645
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
...to make my approach reproducible, here are the corresponding code snippets...
activate RMO
Code: Select all
// lazy smp, randomize move order
if (lmove==MOVENONE                          // no Nullmove and no LMR research
    && (
         // single slot, randomize
         ((ttindex1>1 && ttindex2<=1) && gid>0)
         ||
         // RMO, randomize > n
         (RMO && gid>=RANDWORKERS)
       )
    && !(localNodeStates[sd]&QS)             // no Quiescence-Search
    && !(localNodeStates[sd]&KIC)            // no King In Check
    && !(localNodeStates[sd]&EXT)            // no Extension (kic or pawn promo)
    && !(localSearchMode[sd]&NULLMOVESEARCH) // not in Nullmove-Search
    && !(localSearchMode[sd]&LMRSEARCH)      // not in LMR-Search
    && !(localSearchMode[sd]&IIDSEARCH)      // not in IID-Search
    && localDepth[sd]>0                      // remaining search depth >0
    // previous searched moves
    && localTodoIndex[sd]>=RANDBRO           // RANDBRO set to 1, search oldest son first
   )
  {
    brandomize = true;
  }
}
randomize move score during move picking
Code: Select all
// eval move
// wood count and piece square tables, pto-pfrom
tmpscore = EvalPieceValues[GETPTYPE(pto)]
           + EvalTable[GETPTYPE(pto)*64+((stm)?sqto:FLOP(sqto))]
           + EvalControl[((stm)?sqto:FLOP(sqto))];
tmpscore-= EvalPieceValues[GETPTYPE(pfrom)]
           + EvalTable[GETPTYPE(pfrom)*64+((stm)?lid:FLOP(lid))]
           + EvalControl[((stm)?lid:FLOP(lid))];
// MVV-LVA
tmpscore = (GETPTYPE(pcpt)!=PNONE)?
           EvalPieceValues[GETPTYPE(pcpt)]*16-EvalPieceValues[GETPTYPE(pto)]
           :tmpscore;
// check counter move heuristic
if (countermove==tmpmove)
{
  // score as second highest quiet move
  tmpscore = EvalPieceValues[QUEEN]+EvalPieceValues[PAWN];
}
// check killer move heuristic
if (killermove==tmpmove)
{
  // score as highest quiet move
  tmpscore = EvalPieceValues[QUEEN]+EvalPieceValues[PAWN]*2;
}
// lazy smp, randomize move order
if (brandomize)
{
  tmpscore+= (prn%INF);
}
// check iid move
if (localIIDMoves[sd]==tmpmove)
{
  // score as 2nd highest move
  tmpscore = INFMOVESCORE-200;
  // iid move hit counter
  COUNTERS[gid*64+5]++;
}
// check tt move
if (ttmove==tmpmove)
{
  // score as highest move
  tmpscore = INFMOVESCORE-100;
  // TThits counter
  COUNTERS[gid*64+3]++;
}
https://github.com/smatovic/Zeta/blob/v099m/src/zeta.cl
--
Srdja
PS: I named it Randomized Move Order, not Random Move Order
-
- Posts: 1754
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
I agree with that thought, in that I would not just blindly commit something because it's an Elo gainer. But it's not hard to construct a reasonable argument for why Lazy SMP might outperform other methods.
smatovic wrote: ↑Tue Dec 31, 2019 4:00 pm What is an Elo increase worth if you can not figure out why it worked out?
AndrewGrant wrote: ↑Tue Dec 31, 2019 2:26 pm Apart from the "This basically means that your whole world of pencil and paper is an illusion" part, does anyone really disagree with this rationale? It's one of the better justifications for why Lazy SMP is king.
This basically means that your whole world of pencil and paper is an illusion. Perhaps a big part of the reason is the chaotic nature of modern search (with all the path-dependent effects like history ordering and its effect on LMR etc.). That and the chaotic nature of thread scheduling, becoming chaos^2, a chaos feedback loop. Lazy SMP feeds on chaos, which increases with more search time and more threads, and that's why it beats the hell out of YBW on many threads.
--
Srdja
You can argue that YBW is just a cheap way of making a minor effective speedup, and that Lazy SMP is not really lazy, but a more complex and evolved method of multi-threading. YBW increases speed; Lazy SMP increases quality. At least, that's how I've always seen it, as Lazy SMP fails to produce massive increases in depth, but with better move ordering and the existence of Singular Extensions we get far higher quality nodes.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
-
- Posts: 550
- Joined: Tue Nov 19, 2019 8:48 pm
- Full name: Alayan Feh
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
The thought process of people like Bob, who think that YBW ought to beat Lazy SMP, goes something like this:
- The optimal single-core algorithm will need <= CPU time to reach the same strength as the best multi-threaded algorithm (this is obviously true).
- YBW is a pure speed-up of the single-core algorithm, therefore it has the same asymptotic efficiency.
- Lazy SMP leads to a different way of shaping the tree, but if that were the optimal one, then the single-core algorithm should be updated to shape the tree the same way.
This reasoning falls apart once you consider:
- YBW scales terribly to high thread counts. You can't ignore its overhead.
- There are many, many, many ways to shape a search tree, and there is no hard frontier between "very important node to search" and "pure waste of time to search". Hence a different shaping might be worse than the one achieved by the single-core algorithm, but lose much less efficiency than YBW's overhead.
- The best algorithm depends on the resources available. Single-core testing is oriented around bullet TC, where there are massive benefits from increased depth. But if we could feasibly do SPRT testing at much longer TC, we would find plenty of different tricks to both cut down the search tree even more and, importantly, limit the holes that current techniques tend to generate.
Modern engines prune very aggressively, and thus leave plenty of holes to fill by searching somewhat wider with Lazy SMP.
-
- Posts: 201
- Joined: Thu Jun 06, 2019 8:05 pm
- Full name: Percival Tiglao
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
That's an interesting take for sure.
If randomized move order truly benefits an engine (even across a Lazy SMP-like algorithm), that suggests that engines are pruning too much these days, and a more methodical search (or, in the case of RMO Lazy SMP, a less methodical search) could increase Elo.
----------
Another note which confuses me greatly: YBW as written on the Chess Programming Wiki seems suboptimal to me. It misses opportunities for parallelism, while occasionally parallelizing "wasteful" work. In particular, ALL nodes can be parallelized, while CUT nodes should visit their children fully sequentially (not "sequential for the 1st move, and parallel for the 2nd+ moves").
After all, if the 2nd move's "information" causes a beta-cutoff, then any parallel search of moves #3, #4, and #5 (etc.) was in vain. That could have been a huge amount of work wasted. In practice, searching the children sequentially doesn't mean losing parallelism: the first child of any CUT node is an ALL node, so the grandchildren from the 1st child can always be searched in parallel.
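The distinction above can be reduced to a small split predicate. This is a sketch with invented names, contrasting the proposed rule with plain YBW's "first move done" condition:

```cpp
enum class NodeType { PV, CUT, ALL };

// Plain YBW: any node may split once its first (oldest-brother) move has
// been searched. The rule argued for above instead splits only at expected
// ALL nodes; a CUT node's children run sequentially, and parallelism
// reappears one ply down, since a CUT node's first child is an ALL node.
bool maySplit(NodeType node, int movesSearched, bool plainYbw) {
    if (plainYbw)
        return movesSearched >= 1;
    return node == NodeType::ALL;
}
```

In a real engine node types are only predictions, so a practical version would also need a fallback when an expected CUT node fails to cut.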
Other search strategies, such as DTS, "modified YBW", or "modified ABDADA", that I see in Texel and some other engines, do seem to analyze the search tree more carefully and choose better split points and sources of parallelism.
-----------
Another note: I think communication between threads can be optimized, lowering overhead costs. Under an AlphaBeta framework:
1. Younger brothers never have to send a message to their older brother.
2. The older brother broadcasts a single message when complete: its centipawn evaluation, to its younger brothers.
3. Only active threads need to get the message immediately. A blocked thread could receive the message "later". As long as the message is processed before the blocked threads run, you're efficient.
#2 limits our broadcast to the number of nodes covered by younger brothers. #3 means our broadcast mechanism only needs to scale to the number of cores of the system. Surely there is an algorithm that can provide #2 and #3 with low overhead?
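Points 1-3 roughly describe a single-writer bound per split point, which could be approximated with one atomic per split point (a sketch with invented names, not taken from any engine):

```cpp
#include <atomic>

// One shared score per split point. The oldest brother publishes its result
// once, when it completes (point 2); younger brothers poll the value at node
// entry instead of being interrupted (point 3); and no younger-to-older
// messages exist at all (point 1).
struct SplitPoint {
    std::atomic<int> best{-32000};

    void publish(int score) {                 // oldest brother, on completion
        int cur = best.load(std::memory_order_relaxed);
        while (score > cur && !best.compare_exchange_weak(cur, score)) {
            // cur was refreshed by the failed CAS; retry while we improve it
        }
    }
    bool cutoff(int beta) const {             // younger brothers, lazily
        return best.load(std::memory_order_acquire) >= beta;
    }
};
```

Because readers only poll, the broadcast cost is one cache-line transfer per interested core, which scales with core count rather than with node count.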