Crafty 25.6 search stability

MikeB · Post by **MikeB** » Fri Apr 24, 2020 3:39 am

jhaglund2 wrote: ↑Fri Apr 24, 2020 1:54 am
So it is possible that the hash/memory is being shared between all running Craftys?
Something changes...

Another good example would be the historical Crafty benchmark.

Bob would have the official total nodes number. The only thing that would change is the total time elapsed and NPS. All of the rest of data would be the same.

Today, you can get varied node totals each time you run it.
mt 1 =
mt 2 =
mt 4 =
mt 8 =

To me, they should all add up to the same total, like a calculator, but they don't.

I have never seen the bench command return a different nodes searched signature with mt=1, can you demonstrate that?

jhaglund2 · Post by **jhaglund2** » Fri Apr 24, 2020 7:23 am

I only wish it was deterministic as it would greatly simplify debugging the search. But it will never happen on any existing multi-threaded chess program.

I came across a nice little read on phase-concurrent deterministic hash tables.
https://people.csail.mit.edu/jshun/hash.pdf

I have never seen the bench command return a different nodes searched signature with mt=1, can you demonstrate that?

MT 0/1 are consistent, but if you want to get technical, running a bench back-to-back will return different nodes.

SD=25 (or whatever works without taking too long). Compare the output and they will match exactly. And then when they start pondering, they will also match exactly. Because you are not introducing slight timing variances that can have a much bigger effect that you might expect.

The SD=25 test works like I expect it too.

Yes, they match exactly. That was my point. The depth-> PV lines should always match. The scores should always match. The nodes match. Every time you search a given position, the depth -> PV lines should be the same, and not change due to some other random looking factors.

just play Crafty vs Crafty with 1 second per move, pondering on. Repeat and play a second game. They will not match move for move or score for score. The timing variations caused by interrupts, timer sampling accuracy, etc will change things. In fact, play 100 games. It is likely NONE of the games will be the same. All perfectly normal and expected.

This is where, I don't see how this is "normal"... unless some "learning" is involved with a given position to change the search selection.

If the Bench() is set to SD=16 for each position, there's still 'x' nodes to be searched. MT or not, you should arrive at SD=16, after the same node count. +/- nodes would indicate something isn't working the same as in the previous SD=25 test. You should just arrive to 'x' nodes for each additional thread helping faster.

The same for a "go". The ply PV lines and scores should always give you the same results as the last time you searched the position in any game.

The sd test:

Code: Select all

sd 20
go
20->   7.58/42.00    0.27   1. e4 Nc6 2. d4 d5 3. e5 Bf5 4. Nf3 e6
                                     5. Bb5 Qd7 6. Nc3 O-O-O 7. O-O Be7 8. Bg5
                                     f6 9. Bf4 Nh6 10. exf6 Bxf6
go
20->   9.08/48.59    0.26   1. ... Nf6 2. e5 Nd5 3. Nf3 d6 4. Bc4 Nb6
                                     5. Bb3 e6 6. O-O dxe5 7. Nxe5 Bd6 8. d4
                                     O-O 9. Nc3 N8d7 10. Nf3 Nf6 11. Ne5 Bxe5
                                     12. dxe5

The "go" test:

Code: Select all

go
 20->   7.61/42.00    0.27   1. e4 Nc6 2. d4 d5 3. e5 Bf5 4. Nf3 e6
                                     5. Bb5 Qd7 6. Nc3 O-O-O 7. O-O Be7 8. Bg5
                                     f6 9. Bf4 Nh6 10. exf6 Bxf6

go
 20->   1.84/21.60    0.30   1. ... e5 2. Nf3 Nc6 3. Bb5 Nf6 4. O-O
                                     Nxe4 5. Re1 Nd6 6. Nxe5 Be7 7. Bd3 O-O
                                     8. Nc3 Re8 9. Nd5 Nxe5 10. Rxe5 f6
                                     11. Nxe7+ Rxe7 12. Re3 c6 13. Qg4 Rxe3
                                     14. fxe3 Qe7 15. b3 b6

Why are the SD=20 PV lines able to stay the same, match node for node, but the "go" PV lines are able to change for the same ply?
What is being done different in the "go" search()?

How is the math changing when you have the same math problem?
Why is ply 20-> different for "go" when they both say 20-> for the same position?

You should arrive at the same answer, regardless, of where or what day you do your homework. This is strange to me...

bob · Post by **bob** » Fri Apr 24, 2020 6:31 pm

running bench twice should produce different node counts. The hash table will contain lots of positions from the first run, which will affect the second. You could even see different moves and scores, not just different node counts... Crafty has always been this way as it doesn't clear the hash, ever. All it does, worst case, is to change the current "age" value so that at the start of any new search, it can still use old hash data, but it will overwrite old data before even thinking about overwriting new data...

jhaglund2 · Post by **jhaglund2** » Sat Apr 25, 2020 10:37 pm

bob wrote: ↑Fri Apr 24, 2020 6:31 pm Crafty has always been this way as it doesn't clear the hash, ever. All it does, worst case, is to change the current "age" value so that at the start of any new search, it can still use old hash data, but it will overwrite old data before even thinking about overwriting new data...

Okay. I figured out 1 reason why I was getting variable scores and PV lines, even with MT=1.

I'd get consistent PV lines and scores, same node counts when doing say sd 20 at the start, but eventually it wouldn't produce the same results each time after a few moves. If I did sd 25 at the start, the node count and scores and PV lines wouldn't match with inconsistent results.

Code: Select all

sd 20
mt 1

sd 20
mt 1

Works at start, but eventually inconsistent after each move, including pondering. // No Good.

Code: Select all

sd 25
mt 1

sd 25
mt 1

Different results, scores, PV. // No good.

The default time limit of 30 seconds was terminating the the search so it wouldn't have the same hash info.
If the pondering didn't finish to the set depth, within the set time limit, you'd get different search results.

Code: Select all

st 180
sd 25
mt 1

Code: Select all

st 180
sd 25
mt 1

Same results. //Works

Code: Select all

sd 25
st 180
mt 1

sd 25
st 180
mt 4

Different results // No good.

I still think threads greater than 1, should return the same counts, scores, and PV, but just have less time used. Otherwise, how can you have a reliable search, if it's going to give you different results each time you use it? To me, that's broken by design.

How do you travel in the tree with 1 thread or 4 threads?
1 thread. 1 tree // Works
4 threads. 1 tree
4 threads. 4 trees

bob · Post by **bob** » Sat Apr 25, 2020 11:41 pm

The threads are not synchronized, if they were performance would be terrible.

The problem lies in the underlying non-determinism of the operating system thread scheduling. When an interrupt comes in, which processor/thread gets the interrupt and handles it? While that is happening, that thread is not running. Next time a different thread won't be running.. etc... So there is no real way to solve this other with numerous synchronization points that would make the search perform very poorly.

jhaglund2 · Post by **jhaglund2** » Sun Apr 26, 2020 6:09 am

bob wrote: ↑Sat Apr 25, 2020 11:41 pm The threads are not synchronized, if they were performance would be terrible.

The problem lies in the underlying non-determinism of the operating system thread scheduling. When an interrupt comes in, which processor/thread gets the interrupt and handles it? While that is happening, that thread is not running. Next time a different thread won't be running.. etc... So there is no real way to solve this other with numerous synchronization points that would make the search perform very poorly.

Is there anything that can be useful in thread.c to demonstrate thread synchronization? Also, maybe add a compile flag for a debugging feature.
Is there any working examples of non synchronization vs synchronization thread speed comparisons for chess? I don't care if it's 20x slower.

bob · Post by **bob** » Sun Apr 26, 2020 8:00 pm

About all you can do is allow each thread to search one move and then pause while the other threads do the same, one at a time. And in the same order. Which means no speedup at all, and would actually be slower than a normal alpha/beta search. If you try to let each thread search 2 moves at a time, same thing. All the way to allowing each thread to search one root move at a time. While produces perfect determinism, but zero performance gain.

I remember giving a talk out at the Los Alamos weapons lab back in the early 80's. A 1 hour talk turned into an 8 hour debate about this. Guy named Alex Lubeck (I think that is how it was spelled) had a real problem with the non-determinism (I did a demo that showed this during the talk using one of their Crays). Final conclusion which he eventually convinced himself that non-determinism meant non-performance.

Would certainly be nice. But about the only way to produce it is to do a one-thread search...

jhaglund2 · Post by **jhaglund2** » Mon Apr 27, 2020 10:26 am

bob wrote: ↑Sun Apr 26, 2020 8:00 pm About all you can do is allow each thread to search one move and then pause while the other threads do the same, one at a time. And in the same order. Which means no speedup at all, and would actually be slower than a normal alpha/beta search. If you try to let each thread search 2 moves at a time, same thing. All the way to allowing each thread to search one root move at a time. While produces perfect determinism, but zero performance gain.
<...>
Would certainly be nice. But about the only way to produce it is to do a one-thread search...

These were interesting reads. Maybe they will shed some light on ideas to try.

https://www.bogotobogo.com/cplusplus/mu ... thread.php

http://www.gerald-fahrnholz.eu/sw/onlin ... _sync.html

http://cial.csie.ncku.edu.tw/st2012/pdf ... ilters.pdf

abulmo2 · Post by **abulmo2** » Mon Apr 27, 2020 2:16 pm

jhaglund2 wrote: ↑Mon Apr 27, 2020 10:26 am These were interesting reads. Maybe they will shed some light on ideas to try.

I guess you read this link you post and got some light on the problem:
http://www.gerald-fahrnholz.eu/sw/onlin ... _sync.html and peculiarly the last sentence:
Because of multi threading it remains random behaviour how many reads will result in the same value before it changes.
Multi threading introduces random behaviour. You have to be accustomed to it.

bob · Post by **bob** » Tue Apr 28, 2020 6:15 am

Right. Believe me, there really is NO solution to this, if the goal is to have the parallel program run faster than the serial program. Painful but true. You can try any parallel chess program you can get your hands on, and see the same non-deterministic behavior.

Crafty 25.6 search stability

Re: Crafty 25.6 search stability

Re: Crafty 25.6 search stability

Re: Crafty 25.6 search stability

Re: Crafty 25.6 search stability

Re: Crafty 25.6 search stability

Re: Crafty 25.6 search stability

Re: Crafty 25.6 search stability

Re: Crafty 25.6 search stability

Re: Crafty 25.6 search stability

Re: Crafty 25.6 search stability