adding TT reduces NPS by allot

Discussion of chess software programming and technical issues.

Viz
Posts: 223
Joined: Tue Apr 09, 2024 6:24 am
Full name: Michael Chaly

Re: adding TT reduces NPS by allot

Post by Viz »

He implemented TT probing in QS and his NPS plummeted.
It doesn't in Stormphrax, nor in a ton of other engines.
The solution: don't try to fix the plummeting, just disable TT usage in qsearch. Seems smart.
It is about as effective as treating a wart on your finger by cutting off your hand instead of using something proper to get rid of it.
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: adding TT reduces NPS by allot

Post by connor_mcmonigle »

syzygy wrote: Thu Apr 18, 2024 9:26 pm
connor_mcmonigle wrote: Mon Apr 15, 2024 12:57 am Sure. However, telling someone not to perform the TT probe in quiescent search to avoid the slowdown is just bad advice generally and, when stated from a position of authority, especially unhelpful. It's pretty established at this point that the cost of performing a TT probe in the quiescent search is worth it, though I'd encourage any new authors to test this themselves and re-test as the cost of their evaluation function increases relative to the cost of a TT probe.
Whether it is worth it will depend on the engine. For most engines (that were not derived from SF), it is most likely not worth it to probe in the qsearch. For an engine written from scratch, it will probably take 10+ years of active development to make probing in the qsearch worth it. In the meantime, just don't probe in the qsearch.
What's your basis for claiming that it will "probably take 10+ years of active development"? I agree that it will vary from engine to engine, but the reality is that evaluation functions have become massively more expensive in modern engines (especially if you're using a neural network for position evaluation) relative to the cost of a TT lookup such that, even for relatively new engines, probing in qsearch is usually worth it. Again, I'd encourage any author to test this for themselves and continue to re-test as the cost of their evaluation function increases relative to the cost of a TT probe.
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: adding TT reduces NPS by allot

Post by connor_mcmonigle »

syzygy wrote: Sat Apr 20, 2024 12:33 am ...
This is not about a "really strong" or even a strong engine. Someone is writing his very first chess program, implemented a TT, and saw his nps plummet. Not probing in the qsearch will avoid most of the nps drop, and the engine will almost certainly play stronger (which can simply be tested). Sure, after some years of further development it might be a win to probe the TT in the qsearch again, but that is not a reason to stick to that now. Engine development is an iterative process.
Yes, I completely agree. For an engine just at the stage of adding a transposition table implementation, not performing the TT probe in one's qsearch will likely gain Elo relative to performing the TT probe. However, presenting this as necessarily true for all engines is not particularly helpful.
hgm
Posts: 28265
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: adding TT reduces NPS by allot

Post by hgm »

So why did you think it would be necessarily true for all engines?
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: adding TT reduces NPS by allot

Post by connor_mcmonigle »

hgm wrote: Sat Apr 20, 2024 6:50 am So why did you think it would be necessarily true for all engines?
Okay, that might have been a bit hyperbolic, but all the same, the following seems to suggest that it's some popular solution to this problem:
"Conventional solution to this problem is not probe the TT in the QS/leaf nodes." - It's not a solution to a slow TT implementation nor is it conventional.
hgm
Posts: 28265
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: adding TT reduces NPS by allot

Post by hgm »

Note that if you want to read any recommendation in the posting that you quote, it was not to refrain from probing in QS, but use a separate, small (cache-fitting) table for that.
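A separate, cache-fitting QS table of the kind mentioned above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the names (`QsEntry`, `qs_store`, `qs_probe`), the 4096-entry size, the always-replace policy and the bound encoding are not taken from any engine in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size: 4096 entries of 16 bytes = 64 KB, chosen so the table
   can stay in a typical L2 cache. Tune for the actual machine. */
#define QS_BITS 12
#define QS_SIZE (1u << QS_BITS)

enum { QS_EMPTY = 0, QS_EXACT, QS_LOWER, QS_UPPER };

typedef struct {
    uint64_t key;    /* full hash key, used to verify a hit */
    int16_t  score;  /* stored quiescence score */
    uint8_t  bound;  /* QS_EMPTY, QS_EXACT, QS_LOWER or QS_UPPER */
} QsEntry;           /* 16 bytes with padding on typical 64-bit ABIs */

static QsEntry qs_table[QS_SIZE];

void qs_clear(void)
{
    memset(qs_table, 0, sizeof qs_table);
}

/* Always-replace store into the single slot selected by the low key bits. */
void qs_store(uint64_t key, int score, int bound)
{
    assert(bound == QS_EXACT || bound == QS_LOWER || bound == QS_UPPER);
    QsEntry *e = &qs_table[key & (QS_SIZE - 1)];
    e->key   = key;
    e->score = (int16_t)score;
    e->bound = (uint8_t)bound;
}

/* Returns 1 and sets *score when the stored entry allows a cutoff for the
   window (alpha, beta); returns 0 on a miss or an unusable bound. */
int qs_probe(uint64_t key, int alpha, int beta, int *score)
{
    const QsEntry *e = &qs_table[key & (QS_SIZE - 1)];
    if (e->bound == QS_EMPTY || e->key != key)
        return 0;
    if (e->bound == QS_EXACT
        || (e->bound == QS_LOWER && e->score >= beta)
        || (e->bound == QS_UPPER && e->score <= alpha)) {
        *score = e->score;
        return 1;
    }
    return 0;
}
```

The main search would keep using the large DRAM-resident table, while qsearch calls only `qs_probe` and `qs_store`, so a QS probe does not have to leave the cache.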

I admit that my PC is actually faster than I thought, for DRAM probing; it doesn't appear easy to get any info on the memory controller of Sandy Bridge, though. From combining info from various places I got the impression that it has two 64-bit data busses ('dual channel'), each running at 666MHz. That would make a cache-line fill (8 words) over one bus take 12ns. At the turbo-boost clock of 3.4GHz that would be 40.8 CPU cycles.

Now that I am back in town I could run the following test for simulating random DRAM probing in a 256MB table:

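#include <stdio.h>
#include <windows.h>

int a[1<<26];                 // 64M ints = 256MB, far larger than any cache

int main()
{
  int i=1e9;                  // number of probes
  int t = GetTickCount();     // millisecond tick counter
  int j=0, s=0;
  do {
    s += a[j];                // the probe: almost always a cache miss
    j += 65382561;            // large odd stride scatters the accesses
    j &= 0x3ffffff;           // wrap the index to the 2^26-entry table
  } while(--i);
  t = GetTickCount() - t;
  printf("%9d t = %d ms\n", s, t);  // printing s keeps the loop from being optimized away
}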
This took 16.4 sec, translating to 56 CPU clocks, which is not too far from the calculated value, considering that apart from the 8 data words one also has to transfer addresses.
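For anyone who wants to redo the arithmetic with their own bus speed or timing, here is a small sketch of the two conversions used above. The function names are made up for this example. The inputs quoted in this post are 8 words per cache line, a 666MHz bus, a 3.4GHz clock, and 16.4 sec for 1e9 probes.

```c
#include <assert.h>

/* Time to transfer one cache line over a single bus, in nanoseconds,
   assuming one word per bus clock as in the estimate above. */
double line_fill_ns(int words, double bus_mhz)
{
    assert(words > 0 && bus_mhz > 0.0);
    return words * 1000.0 / bus_mhz;
}

/* Convert a duration in nanoseconds to CPU clocks at the given frequency. */
double ns_to_clocks(double ns, double cpu_ghz)
{
    assert(cpu_ghz > 0.0);
    return ns * cpu_ghz;
}

/* Average time per probe in nanoseconds for a timed loop. */
double ns_per_probe(double seconds, double probes)
{
    assert(probes > 0.0);
    return seconds * 1e9 / probes;
}
```

With those inputs, `line_fill_ns(8, 666.0)` is about 12ns, which is about 40.8 clocks at 3.4GHz, and `ns_per_probe(16.4, 1e9)` is 16.4ns, which is about 56 clocks.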

The result is still a bit suspect, as I did such tests before, and seem to recall the outcome then was 75 clocks. Furthermore, when I add a statement to store a modified a[j] (a[j]=s) in the loop, dirtying the cache, this has absolutely no impact on the speed. Yet for every cache miss the old cache line should now be written back to the DRAM in addition to the new one being loaded, doubling the required bandwidth. Is this the effect of having two 'channels', that it can use one for reading and the other for writing at an unrelated address simultaneously? But why then doesn't it benefit from the two channels when only reading, by parallelizing two accesses?
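To repeat the read-only versus dirtying comparison on other machines, a portable variant of the loop might look like the sketch below. It is not the code that produced the numbers above. The function names and the `dirty` flag are assumptions, it uses standard `clock()` (CPU time) instead of `GetTickCount`, and the sum is unsigned to avoid signed overflow once stored values start to accumulate.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Strided probe loop over a table of (mask + 1) entries, where mask + 1 is a
   power of two. The stride is odd, so every slot is eventually visited.
   With dirty != 0 each probed slot is overwritten, forcing a write-back of
   the cache line when it is evicted. Returns the running sum. */
unsigned probe_loop(unsigned *table, unsigned mask, long iters, int dirty)
{
    assert(table != NULL && iters > 0);
    assert((mask & (mask + 1u)) == 0);   /* mask must be 2^k - 1 */
    unsigned j = 0, s = 0;
    for (long i = 0; i < iters; i++) {
        s += table[j];                   /* the probe */
        if (dirty)
            table[j] = s;                /* dirty the cache line */
        j = (j + 65382561u) & mask;      /* same stride as the original test */
    }
    return s;
}

/* Runs probe_loop and returns the elapsed CPU time in milliseconds. */
double time_probe_loop_ms(unsigned *table, unsigned mask, long iters,
                          int dirty, unsigned *sum)
{
    assert(sum != NULL);
    clock_t t0 = clock();
    *sum = probe_loop(table, mask, iters, dirty);
    return (double)(clock() - t0) * 1000.0 / CLOCKS_PER_SEC;
}
```

For a DRAM-sized run, allocate `1u << 26` unsigned values (256MB), pass `mask = (1u << 26) - 1`, and call `time_probe_loop_ms` once with `dirty = 0` and once with `dirty = 1`. Print the returned sum so the compiler cannot drop the loop.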

Anyway, 56 clocks is still a lot; an i7 can execute more than 200 instructions in that time. On a single core. Having 4 cores probe through the same controller makes it 4 times slower still.
Iketh
Posts: 4
Joined: Fri Oct 28, 2022 6:33 am
Full name: Keith Downes

Re: adding TT reduces NPS by allot

Post by Iketh »

I started my engine from scratch 1.5 years ago. TT probing in QS increases NPS. I'm surprised it is suggested to remove it. The time spent reading RAM every node is less than the time spent sometimes evaluating duplicate positions.

That said, I do have a heavy eval.
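The trade-off between probe cost and eval cost can be put into a rough break-even model. This is a deliberately simplified sketch: it assumes a TT hit saves exactly one evaluation and nothing else, and apart from the 56-clock probe figure measured earlier in the thread, any numbers plugged into it are hypothetical. The function names are invented for the example.

```c
#include <assert.h>

/* Expected cost of one QS node, in CPU clocks, when the TT is probed first.
   Simplifying assumption: a hit replaces exactly one evaluation. */
double qs_cost_with_probe(double probe_clocks, double eval_clocks,
                          double hit_rate)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return probe_clocks + (1.0 - hit_rate) * eval_clocks;
}

/* Non-zero when probing is cheaper on average than always evaluating,
   i.e. when hit_rate * eval_clocks > probe_clocks. */
int probe_pays_off(double probe_clocks, double eval_clocks, double hit_rate)
{
    return qs_cost_with_probe(probe_clocks, eval_clocks, hit_rate)
           < eval_clocks;
}
```

Under this model a 56-clock probe with a 30% hit rate pays off against a 200-clock eval but not against a 100-clock one. A real hit can also cut off a whole capture subtree, so the model understates the benefit of probing. Measuring in the engine itself remains the only reliable test.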

Also, it's possible your node counter isn't firing for TT hits. I think the standard design is to count nodes found in the TT. My engine was not counting TT nodes at first.
pgg106
Posts: 25
Joined: Wed Mar 09, 2022 3:40 pm
Full name: . .

Re: adding TT reduces NPS by allot

Post by pgg106 »

"Standard" design is: MakeMove call -> add a node. That applies to qsearch too.
Edit: by standard I mean what most engines that came out in the last decade do.
To anyone reading this post in the future, don't ask for help on talkchess, it's a dead site where you'll only get led astray, the few people talking sense here come from the Stockfish discord server, just join it and actual devs will help you.
hgm
Posts: 28265
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: adding TT reduces NPS by allot

Post by hgm »

There can of course be engines that refrain from MakeMove when the TT probe would result in a hash cutoff. Comparing node counts of different engines is always tricky.
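A toy illustration of how much the counting convention alone can change the reported node count, for identical search work. The "game" here is an invented stand-in, not chess: position n has the two successors n-1 and n-2, which produces plenty of transpositions. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 64

static int  tt_val[MAX_N];    /* stored score per position */
static char tt_known[MAX_N];  /* 1 if the position is in the table */
static long nodes;

/* Negamax over the toy game. With count_tt_hits != 0 a node is counted on
   entry (one per MakeMove); otherwise it is counted only after the probe
   has failed to produce a cutoff. */
static int search(int n, int count_tt_hits)
{
    if (count_tt_hits)
        nodes++;
    if (tt_known[n])
        return tt_val[n];                 /* hash cutoff */
    if (!count_tt_hits)
        nodes++;

    int best;
    if (n <= 1) {
        best = n ? 1 : -1;                /* arbitrary leaf score */
    } else {
        int a = -search(n - 1, count_tt_hits);
        int b = -search(n - 2, count_tt_hits);
        best = a > b ? a : b;
    }
    tt_known[n] = 1;
    tt_val[n] = best;
    return best;
}

/* Searches from position `start` with an empty table; returns the count. */
long count_nodes(int start, int count_tt_hits)
{
    assert(start >= 0 && start < MAX_N);
    memset(tt_known, 0, sizeof tt_known);
    nodes = 0;
    search(start, count_tt_hits);
    return nodes;
}
```

Starting from position 10, counting on entry gives 19 nodes and counting only past the probe gives 11, although both runs visit exactly the same positions. Divided by the same elapsed time, that is a large NPS difference produced by bookkeeping alone.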
syzygy
Posts: 5661
Joined: Tue Feb 28, 2012 11:56 pm

Re: adding TT reduces NPS by allot

Post by syzygy »

connor_mcmonigle wrote: Sat Apr 20, 2024 4:19 am
syzygy wrote: Thu Apr 18, 2024 9:26 pm
connor_mcmonigle wrote: Mon Apr 15, 2024 12:57 am Sure. However, telling someone not to perform the TT probe in quiescent search to avoid the slowdown is just bad advice generally and, when stated from a position of authority, especially unhelpful. It's pretty established at this point that the cost of performing a TT probe in the quiescent search is worth it, though I'd encourage any new authors to test this themselves and re-test as the cost of their evaluation function increases relative to the cost of a TT probe.
Whether it is worth it will depend on the engine. For most engines (that were not derived from SF), it is most likely not worth it to probe in the qsearch. For an engine written from scratch, it will probably take 10+ years of active development to make probing in the qsearch worth it. In the meantime, just don't probe in the qsearch.
What's your basis for claiming that it will "probably take 10+ years of active development"?
I assume he continues with his engine as a hobby project. It would take over 10 years to surpass Crafty.
I agree that it will vary from engine to engine, but the reality is that evaluation functions have become massively more expensive in modern engines
I'm done with the "modern engine" argument. A regular hobby engine is not a "modern engine".