Parallelization questions, ABDADA or DTS?

ZirconiumX · Post by **ZirconiumX** » Fri Mar 23, 2012 4:54 pm

DTS gives you the best performance - for an iterative search - and generally.

ABDADA (or Jamboree) will probably give you the best performance for a recursive search.

Personally I would recommend ABDADA as Smash uses it - so you have a free reference point.

Matthew:out

bob · Post by **bob** » Sat Mar 24, 2012 3:56 am

BeRo wrote:Hello,

I'm new here in this forum.

I'm working on a chess engine with support for the UCI and XBoard/WinBoard proctols in the moment, where I do want to implement a parallel search now.

I've searched this forum and the wiki many times already, but I didn't find any final answers for my decision problem.

My engine has currently already two separate working search path implementations (each with IID, QS, extensions, reductions, LMR, and so on), one implementation is recursive and the other implementation is manual-stack-iterate.

And it has a already thread-safe on-64bit-x86-SSSE2-cmpxchg16b-128-bit-atomic-based / on-other-targets-PerSingleSlot-MultipleReaderSingleWriter-Lock-based transposition table.

The board representation is bitboard-based together with the kindergarten idea for semi-staged move generation.

The data structures are splitted already in global and per-thread (each with a own copy of the almost complete current board search state except the TT, iterate-state-machine-stack, etc.) structures.

My question is now: Which parallel algorithm I should implement now? ABDADA or DTS (since it is superior to YWBC, so far I've read it right from the CPW wiki and this forum)?

ABDADA seems to be too far simple for me for to have a good scalability in contrast to DTS. Am I right? But at least ABDADA seems to be almostly asyncron without any explict sync-points and split-points for my eyes, while DTS not.

At least my code is prepared for the parallelization, so I only still have to choose between the two algorithms, before I can implement the parallel search main code. But for this I do want to know which from the two algorirhms would be better for a future proof good scalability.

And after that work I'll extend the yet very weak mostly material+PST-only evaluation function to a responsible bit more strong level.

I hope that I get good answers here.

Regards,
Benjamin 'BeRo' Rosseaux

PS. Sorry for my possible bad english. I'm reading it more than I'm writing it

DTS is harder to do, but something in that "family" is the only way to go. DTS, YBW and such are "close cousins". ABDADA is nowhere near useful. It is simple, but performs poorly as a result. It is better than nothing, but not much better. Particularly if you go to more than 2 cores...

bob · Post by **bob** » Sat Mar 24, 2012 3:57 am

ZirconiumX wrote:DTS gives you the best performance - for an iterative search - and generally.

ABDADA (or Jamboree) will probably give you the best performance for a recursive search.

Personally I would recommend ABDADA as Smash uses it - so you have a free reference point.

Matthew:out

ABDADA is not even close to any good PVS/YBW/DTS type algorithm. Sharing stuff through the hash is not a very efficient way of managing a parallel search...

diep · Post by **diep** » Sat Mar 24, 2012 9:19 am

bob wrote:
ZirconiumX wrote:DTS gives you the best performance - for an iterative search - and generally.

ABDADA (or Jamboree) will probably give you the best performance for a recursive search.

Personally I would recommend ABDADA as Smash uses it - so you have a free reference point.

Matthew:out
ABDADA is not even close to any good PVS/YBW/DTS type algorithm. Sharing stuff through the hash is not a very efficient way of managing a parallel search...

Abdada is easy to implement.
PVS is no good at all, it just splits at 1 spot. It will give some sort of a speedup though.

YBW the superior algorithm of course.

Most go for a variation of Abdada nowadays, as you can start dividing the cpu's in an YBW manner yet you simply don't abort them.

So it's not so close to YBW in performance, yet easy to implement bugfree. Doesn't require big mathskills from you to get it going, unlike a full YBW implementation.

Bob is using YBW for crafty.
Diep is using YBW

It's not much of a discussion what performs best.

DTS locks in in the central search tree and is gonna figure out itself where to lock in and get going.

That's not gonna work at todays hardware, that will jam your machine totally.

Abdada + YBW is of course no good in aborting cpu's but that also removes a big bottleneck.

Much depends upon the effort you are willing to do.

In Diep i nowadays do an asynchroneous YBW. So cores that are aborted, might still be busy aborting the previous search meanwhile the new search already is on its way.
On top of all the a4's i needed to prove the first search of Diep, this extension is a page or 40 of a4 to show the insight that it will not give deadlocks nor aborts. Difficult to get to work.
On a big supercomputer or even a small cluster this has a huge advantage though, as waiting there for all cpu's to synchronize is gonna take long, especially in between the searches...

Regards,
Vincent

BeRo · Post by **BeRo** » Sat Mar 24, 2012 11:52 am

Okay, I'll go then for a ABDADA+YBW hybrid approach, but together with cutoff-children-aborting. So in general YBW together with helpful-master-concept and with some additional extended-TT concept from ABDADA.

Do you have any objections?

diep · Post by **diep** » Sat Mar 24, 2012 1:44 pm

BeRo wrote:Okay, I'll go then for a ABDADA+YBW hybrid approach, but together with cutoff-children-aborting. So in general YBW together with helpful-master-concept and with some additional extended-TT concept from ABDADA.

Do you have any objections?

If you do that then it's more of a lossy YBW already basically

It's questionable whether you can abort children by hashtable in a lossless manner. We can prove of course that you might always overwrite that specific hashtable entry.

So only a lossy concept would make sense then.

The question i'd have then is that if you already do the administering work of YBW, then why would you need the overhead of additional stores to hashtable, which only makes things lossy and slower?

By the way are you capable of reading the original abdada paper?
It's encrypted into Anglish by the original French author! Bon.

BeRo · Post by **BeRo** » Sat Mar 24, 2012 2:57 pm

diep wrote: If you do that then it's more of a lossy YBW already basically

Yes, there you're right.

diep wrote: It's questionable whether you can abort children by hashtable in a lossless manner. We can prove of course that you might always overwrite that specific hashtable entry.

So only a lossy concept would make sense then.

The question i'd have then is that if you already do the administering work of YBW, then why would you need the overhead of additional stores to hashtable, which only makes things lossy and slower?

I'm implementing the plain classical YWBC stuff (together with the helpful master concept) as the first step in this moment, and then I'll see, what optimizations I can do as the next step, so the chances, as you already said, that it remains plain YWBC with some small optimizations, are very high. The original origin hybrid idea was just a chain of first-brainstorm-thoughts.

diep wrote: By the way are you capable of reading the original abdada paper?
It's encrypted into Anglish by the original French author! Bon.

No, I'm a german, not a french, so don't confuse through my french-sounding surname.

jdart · Post by **jdart** » Sat Mar 24, 2012 7:06 pm

Bob is correct.

I tried ABDABA a long time ago and found that it performed poorly because its usage of the hash table as a synchronization mechanism interferes with the normal usage of the hash table as storage for scores and best moves from previously visited nodes. Consequently, the hash hit rate went down and this killed performance. Theoretically, maybe if you had an infinite size hash this wouldn't be an issue but with current memory sizes and node rates many programs fill up their hash rapidly.

--Jon

Daniel Shawul · Post by **Daniel Shawul** » Sat Mar 24, 2012 7:06 pm

ABDADA (or Jamboree) will probably give you the best performance for a recursive search.

Jamboree may be one of the better performers but it is not same as ABDADA. It is more closer to YBW than is to ABDADA. The latter relies on the hashtable for parallel speed up. It may perform really well sometimes. Think of embarassingly parallel algorithms such as randomized best-first RBFM and UCT etc. However YBW performs much better for inherently sequential algorithms like alpha-beta.

Suggestion to the OP: Why not implement YBW with a message passing model ?

Dragulic · Post by **Dragulic** » Sat Mar 24, 2012 10:26 pm

With these views I, a permanent beginner, concur. But parallelisation as we see it today, people think of 8 core, 16 thread, that level. OK, maybe 200 cluster.

IBM and Intel have indicated future lies in massive parallelisation. On chip, limited functionality but 1000, 10000 cores. And many chips. Machine with 1,000,000 threads can be common within our working lives.

How will these algorithms cope and scale?

Parallelization questions, ABDADA or DTS?

Re: Parallelization questions, ABDADA or DTS?

Re: Parallelization questions, ABDADA or DTS?

Re: Parallelization questions, ABDADA or DTS?

Re: Parallelization questions, ABDADA or DTS?

Re: Parallelization questions, ABDADA or DTS?

Re: Parallelization questions, ABDADA or DTS?

Re: Parallelization questions, ABDADA or DTS?

Re: Parallelization questions, ABDADA or DTS?

Re: Parallelization questions, ABDADA or DTS?

Re: Parallelization questions, ABDADA or DTS?