Pondering and memory bandwidth

MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Pondering and memory bandwidth

Post by MattieShoes »

So I have a Core2 quad 2.5 GHz box sitting here. When I run engine-engine matches, I've left pondering off. It occurs to me that I could turn it on and the non-SMP engines wouldn't be stealing cycles from each other -- they'd be running on different cores. But I have no real idea how hard an engine hits the memory. Obviously it'll vary from engine to engine, but what I'm wondering is: does a "generic single-threaded engine" use enough memory bandwidth that pondering on one engine would significantly impact the search of the other? FSB should be 1.333 GHz in this case.

It also occurred to me that I could run four simultaneous matches with pondering off, but that'd put even more load on the bus... I'd probably have to add some cooling too -- I imagine 4x100% on the CPU for extended periods will let me fry an egg on it.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Pondering and memory bandwidth

Post by bob »

MattieShoes wrote: So I have a Core2 quad 2.5 GHz box sitting here. When I run engine-engine matches, I've left pondering off. It occurs to me that I could turn it on and the non-SMP engines wouldn't be stealing cycles from each other -- they'd be running on different cores. But I have no real idea how hard an engine hits the memory. Obviously it'll vary from engine to engine, but what I'm wondering is: does a "generic single-threaded engine" use enough memory bandwidth that pondering on one engine would significantly impact the search of the other? FSB should be 1.333 GHz in this case.

It also occurred to me that I could run four simultaneous matches with pondering off, but that'd put even more load on the bus... I'd probably have to add some cooling too -- I imagine 4x100% on the CPU for extended periods will let me fry an egg on it.
I test like that all the time and don't see any difference. An easy test is to run one program on a particular test position and see how long it takes. Then run two copies of the same program at once, so both cores are busy, and see if there is any difference. In the case of Crafty, it is minimal...
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: Pondering and memory bandwidth

Post by Gian-Carlo Pascutto »

It's safe to assume that for the majority of engines, you will get one 64-byte cacheline transfer per node searched, and almost no (unpredictable/uncached) access to main memory besides that.

(Engines which don't probe in the quiescence search will be at about 30%-40% of that.)

This is far below the memory speed of most contemporary systems.
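
As a rough back-of-envelope check of the estimate above, the sketch below compares one cacheline per node against the theoretical peak of a 1333 MT/s front-side bus. The node rate of 2 million nodes per second is an assumed figure for illustration and does not come from this thread; the 8-byte bus width is the usual 64-bit FSB data path on Core 2.

```python
# Back-of-envelope: per-node memory traffic vs. theoretical peak FSB bandwidth.
CACHELINE_BYTES = 64                    # one cacheline transfer per node (estimate from the post)
ASSUMED_NPS = 2_000_000                 # ASSUMPTION: nodes/second of one single-threaded engine
FSB_TRANSFERS_PER_SEC = 1_333_000_000   # 1333 MT/s, the FSB figure quoted in the first post
FSB_BYTES_PER_TRANSFER = 8              # 64-bit wide FSB data path


def engine_traffic_bytes_per_sec(nps, bytes_per_node=CACHELINE_BYTES):
    """Main-memory traffic of one engine if every node costs one cacheline."""
    return nps * bytes_per_node


def fsb_peak_bytes_per_sec():
    """Theoretical peak FSB bandwidth; sustained bandwidth is lower in practice."""
    return FSB_TRANSFERS_PER_SEC * FSB_BYTES_PER_TRANSFER


def bus_fraction(nps, engines=1):
    """Fraction of peak bus bandwidth used by `engines` identical engines."""
    return engines * engine_traffic_bytes_per_sec(nps) / fsb_peak_bytes_per_sec()


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} engine(s): {bus_fraction(ASSUMED_NPS, n):.1%} of peak FSB bandwidth")
```

Under these assumptions a single engine uses about 1.2% of the theoretical peak. Real sustained bandwidth is below peak and random accesses are latency-bound, so this is only an order-of-magnitude figure.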
hgm
Posts: 28433
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Pondering and memory bandwidth

Post by hgm »

My engines were only about 1% slower if a Chess game was in progress on the other core. In theory it is possible on C2D to totally wreck the performance of the other core by flushing the shared L2 cache as fast as you can during ponder time. This potentially gives you a much larger advantage than pondering. I don't know of any engines that implement such a 'bugger mode'. Perhaps I will try it one time in micro-Max, which does not ponder anyway. :lol:
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Pondering and memory bandwidth

Post by bob »

Gian-Carlo Pascutto wrote: It's safe to assume that for the majority of engines, you will get one 64-byte cacheline transfer per node searched, and almost no (unpredictable/uncached) access to main memory besides that.

(Engines which don't probe in the quiescence search will be at about 30%-40% of that.)

This is far below the memory speed of most contemporary systems.
There are some other issues, such as the magic move generation tables, that will stress most any cache size prior to the Nehalem boxes with 8 MB L3 and beyond...
CRoberson
Posts: 2094
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: Pondering and memory bandwidth

Post by CRoberson »

You can test this quite simply.

1) Run engine A on some benchmark or position for 30 to 60 seconds.
Note the nps, depth and total nodes. (This should be a single-threaded
program.)

2) exit out of engine A.
3) Run the test again but with 2 copies of engine A simultaneously.
Again, note the same info.

4) exit out of both engines.
5) Run the test again but with 3 copies of engine A simultaneously.
Again, note the same info.

6) exit out of all three engines.
7) Do the test again with 4 copies of engine A and note the same info.

Try an engine that is fairly good. Make sure the combined memory use of
the 4 engines fits in RAM; otherwise the machine will start swapping.

If the notes you took are the same for 1 run vs 4 runs, there are
not any problems. Years ago, I noticed some machines having issues
at 4 engines, but not at 3 engines. In such situations, 2 engines
are not a problem.
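
The procedure above could be scripted roughly as follows. This is a minimal sketch, assuming the engine has some fixed-work benchmark command (for example a fixed-depth search) that exits when done; the command line is whatever you pass in, and `run_copies` is a hypothetical helper name. It measures wall time per copy rather than nps, so a slowdown shows up as longer times when more copies run at once.

```python
import subprocess
import sys
import time


def run_copies(cmd, copies):
    """Start `copies` instances of `cmd` at once; return each one's wall time in seconds."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(copies)
    ]
    elapsed = [0.0] * copies
    pending = dict(enumerate(procs))
    # Poll so each copy's finish time is recorded when it actually exits.
    while pending:
        for i, proc in list(pending.items()):
            if proc.poll() is not None:
                elapsed[i] = time.perf_counter() - start
                del pending[i]
        time.sleep(0.01)
    return elapsed


if __name__ == "__main__":
    # Usage (hypothetical engine command): python bench.py ./engine bench
    cmd = sys.argv[1:]
    if not cmd:
        sys.exit("usage: bench.py <engine command...>")
    for n in (1, 2, 3, 4):
        times = run_copies(cmd, n)
        print(n, "copies:", ", ".join(f"{t:.2f}s" for t in times))
```

If the times for 4 copies are close to the time for 1 copy, memory contention is not a problem for that engine on that machine.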
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: Pondering and memory bandwidth

Post by Gian-Carlo Pascutto »

bob wrote: There are some other issues, such as the magic move generation tables, that will stress most any cache size prior to the Nehalem boxes with 8 MB L3 and beyond...
In real games only a small part of those tables is active at any given time. Rank/File occupation is not random.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Pondering and memory bandwidth

Post by bob »

Gian-Carlo Pascutto wrote:
bob wrote: There are some other issues, such as the magic move generation tables, that will stress most any cache size prior to the Nehalem boxes with 8 MB L3 and beyond...
In real games only a small part of those tables is active at any given time. Rank/File occupation is not random.
It's not random, but in a 20 ply search, it comes pretty close. It would be pretty easy to set up a bitmap for those tables, and during the indexing, set a bit in the bitmap to show which value is used. I'd bet the bitmap ends up almost all 1's after a 60 second move...
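
The bitmap experiment described above might look something like this. It is only a sketch of the bookkeeping: `TouchBitmap` and its methods are hypothetical names, and in a real engine the `mark` call would sit next to the attack-table lookup in the engine's own language.

```python
class TouchBitmap:
    """One bit per table entry; a bit is set the first time that entry is indexed."""

    def __init__(self, entries):
        self.entries = entries
        self.bits = bytearray((entries + 7) // 8)

    def mark(self, index):
        # Call this wherever the table is indexed.
        self.bits[index >> 3] |= 1 << (index & 7)

    def touched(self):
        # Number of distinct entries that were indexed at least once.
        return sum(bin(byte).count("1") for byte in self.bits)

    def fraction(self):
        return self.touched() / self.entries


if __name__ == "__main__":
    bitmap = TouchBitmap(16)
    for idx in (0, 9, 9, 15):
        bitmap.mark(idx)
    print(f"{bitmap.touched()} of {bitmap.entries} entries touched")
```

Note that this only records whether an entry was ever used during the search, not how often or how recently.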
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: Pondering and memory bandwidth

Post by Gian-Carlo Pascutto »

bob wrote:
Gian-Carlo Pascutto wrote: In real games only a small part of those tables is active at any given time. Rank/File occupation is not random.
It's not random, but in a 20 ply search, it comes pretty close. It would be pretty easy to set up a bitmap for those tables, and during the indexing, set a bit in the bitmap to show which value is used. I'd bet the bitmap ends up almost all 1's after a 60 second move...
After 60 seconds, perhaps, but that doesn't say a lot about the effectiveness of the cache. 99% of the accesses could have been local and 1% random all over. You will not stress memory then. You can profile this with Valgrind (cachegrind), I think. I am sure that you will find that even 1 MB of cache will give very high hit rates.
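
To illustrate the point that "every entry gets touched" and "the cache works well" can both be true, here is a toy simulation. All sizes, the 99%/1% split as a literal access pattern, and the fully associative LRU cache are assumptions made for the sketch; it does not model a real engine or a real CPU cache.

```python
import random
from collections import OrderedDict


def simulate(table_lines=1024, cache_lines=128, hot_lines=32,
             accesses=500_000, local_fraction=0.99, seed=1):
    """Return (hit_rate, coverage) for a toy LRU cache in front of a lookup table.

    local_fraction of the accesses go to a small hot set of lines;
    the rest are uniformly random over the whole table.
    """
    rng = random.Random(seed)
    cache = OrderedDict()   # line number -> True, ordered oldest to newest
    touched = set()         # plays the role of the "bitmap" of used lines
    hits = 0
    for _ in range(accesses):
        if rng.random() < local_fraction:
            line = rng.randrange(hot_lines)
        else:
            line = rng.randrange(table_lines)
        touched.add(line)
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / accesses, len(touched) / table_lines


if __name__ == "__main__":
    hit_rate, coverage = simulate()
    print(f"hit rate {hit_rate:.3f}, table coverage {coverage:.3f}")
```

With these parameters almost the whole table ends up touched while the hit rate stays close to the local fraction, even though the cache is an eighth of the table size.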
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: Pondering and memory bandwidth

Post by Gian-Carlo Pascutto »

Gian-Carlo Pascutto wrote: I am sure that you will find that even 1 MB of cache will give very high hit rates.
Put differently: do you really believe that your move generation is hitting main memory each node, or even regularly? That would kill performance pretty badly, and it would show up clearly in profiling.