So I have a Core2 quad 2.5 GHz box sitting here. When I'm running an engine-engine match, I've left pondering off. It occurs to me that I could turn it on and the non-SMP engines wouldn't be stealing cycles from each other -- they'd be running on different cores. But I have no real idea how hard an engine hits the memory. Obviously it'll vary from engine to engine, but I guess what I'm wondering is: does a "generic single-threaded engine" use enough memory bandwidth that pondering on one engine would significantly impact the search for the other? FSB should be 1.333 GHz in this case.
It also occurred to me that I could run four simultaneous matches with pondering off, but that'd put even more load on the bus... I'd probably have to add some cooling too -- I imagine 4x100% on the CPU for extended periods will let me fry an egg on it.
Pondering and memory bandwidth
Moderator: Ras
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Pondering and memory bandwidth
MattieShoes wrote: So I have a Core2 quad 2.5 GHz box sitting here. When I'm making an engine-engine match, I've left pondering off. It occurs to me that I could turn it on and the non-SMP engines wouldn't be stealing cycles from the other -- they'd be running on different cores. But I have no real idea how hard the engine hits the memory. Obviously it'll vary from engine to engine, but I guess what I'm wondering is does a "generic single threaded engine" use enough memory bandwidth that pondering on one engine would significantly impact the search for the other? FSB should be 1.333 GHz in this case. It also occurred to me that I could run four simultaneous matches with pondering off, but that'd put even more load on the bus... I'd probably have to add some cooling too -- I imagine 4x100% on the CPU for extended periods will let me fry an egg on it.
I test like that all the time and don't see any difference. An easy test is to run one program on a particular test position and see how long it takes. Then run two copies of the same program at once, so both cores are busy, and see if there is any difference. In the case of Crafty, it is minimal...
-
Gian-Carlo Pascutto
- Posts: 1260
- Joined: Sat Dec 13, 2008 7:00 pm
Re: Pondering and memory bandwidth
It's safe to assume that for the majority of engines, you will get one 64-byte cacheline transfer per node searched, and almost no (unpredictable/uncached) access to main memory besides that.
(Engines which don't probe the hash table in quiescence search will be at about 30%-40% of that.)
This is far below the memory speed of most contemporary systems.
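To put rough numbers on that claim, here is a back-of-the-envelope sketch. The 64 bytes per node and the 1.333 GHz FSB come from this thread; the node rate and the bus width are assumptions picked only for illustration, so substitute your own engine's nps.

```python
# Back-of-the-envelope estimate, not a measurement.
CACHELINE_BYTES = 64                      # one cacheline transfer per node (from the post above)
NODES_PER_SECOND = 2_000_000              # ASSUMPTION: speed of a single-threaded engine
FSB_TRANSFERS_PER_SECOND = 1_333_000_000  # the "1.333 GHz" FSB from the original question
FSB_WIDTH_BYTES = 8                       # ASSUMPTION: 64-bit wide front-side bus


def engine_bandwidth(nps, bytes_per_node=CACHELINE_BYTES):
    """Bytes per second one engine pulls from main memory."""
    return nps * bytes_per_node


def bus_peak_bandwidth(transfers=FSB_TRANSFERS_PER_SECOND, width=FSB_WIDTH_BYTES):
    """Theoretical peak bytes per second over the bus."""
    return transfers * width


demand = engine_bandwidth(NODES_PER_SECOND)
peak = bus_peak_bandwidth()
print(f"engine demand: {demand / 1e6:.0f} MB/s")
print(f"bus peak:      {peak / 1e9:.2f} GB/s")
print(f"fraction used: {demand / peak:.1%}")
```

Under these assumptions a single engine needs on the order of one percent of the theoretical peak. Real sustained bandwidth is lower than the peak, but the margin is large either way.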
-
hgm
- Posts: 28433
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Pondering and memory bandwidth
My engines were only about 1% slower if a Chess game was in progress on the other core. In theory it is possible on C2D to totally wreck the performance of the other core by flushing the shared L2 cache as fast as you can during ponder time. This potentially gives you a much larger advantage than pondering. I don't know of any engines that implement such a 'bugger mode'. Perhaps I will try it one time in micro-Max, which does not ponder anyway.
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Pondering and memory bandwidth
Gian-Carlo Pascutto wrote: It's safe to assume that for the majority of engines, you will get one 64-byte cacheline transfer per node searched, and almost no (unpredictable/uncached) access to main memory besides that. (Engines which don't probe in quiescent will be about 30%-40% of that) This is far below the memory speed of most contemporary systems.
There are some other issues, such as the magic move generation tables, that will stress most any cache size prior to the Nehalem boxes with 8 MB L3 and beyond...
-
CRoberson
- Posts: 2094
- Joined: Mon Mar 13, 2006 2:31 am
- Location: North Carolina, USA
Re: Pondering and memory bandwidth
You can test this quite simply.
1) Run engine A on some benchmark or position for 30 to 60 seconds.
Note the nps, depth and total nodes. (This should be a single-threaded
program.)
2) Exit out of engine A.
3) Run the test again but with 2 copies of engine A simultaneously.
Again, note the same info.
4) Exit out of both engines.
5) Run the test again but with 3 copies of engine A simultaneously.
Again, note the same info.
6) Exit out of all three engines.
7) Do the test again with 4 copies of engine A and note the same info.
Try an engine that is fairly good. Don't let the combined memory use of
the 4 engines exceed your physical RAM, which would cause swapping.
If the notes you took are the same for 1 run vs 4 runs, there are
not any problems. Years ago, I noticed some machines having issues
at 4 engines, but not at 3 engines. In such situations, 2 engines
are not a problem.
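The procedure above could be automated along these lines. This is only a sketch: it uses the "fixed work, measure the time" variant rather than reading nps from engine output, and the command shown is a stand-in CPU-bound Python loop, not a real engine. `time_copies` and `slowdown_percent` are names made up for this example. Replace the command with whatever fixed-depth benchmark your engine offers, as long as it exits on its own.

```python
import subprocess
import sys
import time


def time_copies(command, copies):
    """Start `copies` instances of `command` at once and return the
    wall-clock seconds until the last one has finished."""
    start = time.perf_counter()
    procs = [subprocess.Popen(command, stdout=subprocess.DEVNULL)
             for _ in range(copies)]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


def slowdown_percent(baseline, elapsed):
    """How much longer `elapsed` is than `baseline`, in percent."""
    return (elapsed / baseline - 1.0) * 100.0


if __name__ == "__main__":
    # ASSUMPTION: stand-in workload; substitute your engine's benchmark command.
    command = [sys.executable, "-c", "sum(i * i for i in range(3_000_000))"]
    baseline = time_copies(command, 1)
    print(f"1 copy:   {baseline:.2f} s")
    for copies in (2, 3, 4):
        elapsed = time_copies(command, copies)
        print(f"{copies} copies: {elapsed:.2f} s "
              f"({slowdown_percent(baseline, elapsed):+.1f}%)")
```

If the time for 4 copies is close to the time for 1 copy, the cores are not getting in each other's way. Run it a few times, since a single timing is noisy.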
-
Gian-Carlo Pascutto
- Posts: 1260
- Joined: Sat Dec 13, 2008 7:00 pm
Re: Pondering and memory bandwidth
bob wrote: There are some other issues, such as the magic move generation tables, that will stress most any cache size prior to the nehalem boxes with 8mb L3 and beyond...
In real games only a small part of those tables is active at any given time. Rank/File occupation is not random.
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Pondering and memory bandwidth
bob wrote: There are some other issues, such as the magic move generation tables, that will stress most any cache size prior to the nehalem boxes with 8mb L3 and beyond...
Gian-Carlo Pascutto wrote: In real games only a small part of those tables is active at any given time. Rank/File occupation is not random.
It's not random, but in a 20 ply search, it comes pretty close. It would be pretty easy to set up a bitmap for those tables, and during the indexing, set a bit in the bitmap to show which value is used. I'd bet the bitmap ends up almost all 1's after a 60 second move...
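The bitmap idea might look roughly like the sketch below. It is written in Python only to show the bookkeeping; in a real engine the marking would sit inside the attack-table lookup itself. `UsageBitmap` and `lookup` are hypothetical names, and the small list stands in for a magic move generation table.

```python
class UsageBitmap:
    """One bit per table entry; a bit is set once that entry has been indexed."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)

    def mark(self, index):
        # Set the bit for this table index.
        self.bits[index >> 3] |= 1 << (index & 7)

    def count(self):
        # Number of distinct entries touched so far.
        return sum(bin(byte).count("1") for byte in self.bits)

    def fraction_used(self):
        return self.count() / self.size


def lookup(table, index, usage):
    """Table access that records which entry was used."""
    usage.mark(index)
    return table[index]


# Stand-in table; a real run would instrument the engine's own tables.
table = list(range(1000))
usage = UsageBitmap(len(table))
for index in (3, 3, 10, 999):
    lookup(table, index, usage)
print(f"{usage.count()} of {usage.size} entries touched "
      f"({usage.fraction_used():.1%})")
```

Note that this records only whether an entry was ever touched, not how often, so it says nothing by itself about cache hit rates.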
-
Gian-Carlo Pascutto
- Posts: 1260
- Joined: Sat Dec 13, 2008 7:00 pm
Re: Pondering and memory bandwidth
Gian-Carlo Pascutto wrote: In real games only a small part of those tables is active at any given time. Rank/File occupation is not random.
bob wrote: It's not random, but in a 20 ply search, it comes pretty close. It would be pretty easy to set up a bitmap for those tables, and during the indexing, set a bit in the bitmap to show which value is used. I'd bet the bitmap ends up almost all 1's after a 60 second move...
After 60 seconds, perhaps, but that doesn't say a lot about the effectiveness of the cache. 99% of the accesses could have been local and 1% random all over. You will not stress memory then. You can profile this with valgrind, I think. I am sure that you will find that even 1M of cache will give very high hit rates.
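A toy simulation can illustrate why a full bitmap and a high hit rate are compatible. Everything here is an assumption made for illustration: the table size, the cache size, the size of the "local" hot set, and the plain LRU replacement policy, which real CPU caches only approximate. It models the 99% local / 1% random split from the post, not any actual engine.

```python
import random
from collections import OrderedDict


def simulate(table_entries, cache_entries, hot_entries, accesses,
             hot_fraction=0.99, seed=1):
    """Return (hit_rate, fraction_of_table_touched) for a mixed access
    pattern run against a small LRU cache."""
    rng = random.Random(seed)
    cache = OrderedDict()          # LRU order: oldest entry first
    touched = set()
    hits = 0
    for _ in range(accesses):
        if rng.random() < hot_fraction:
            index = rng.randrange(hot_entries)    # "local" access
        else:
            index = rng.randrange(table_entries)  # random, all over the table
        touched.add(index)
        if index in cache:
            hits += 1
            cache.move_to_end(index)              # mark as most recently used
        else:
            cache[index] = True
            if len(cache) > cache_entries:
                cache.popitem(last=False)         # evict least recently used
    return hits / accesses, len(touched) / table_entries


# ASSUMPTION: all sizes below are arbitrary illustration values.
hit_rate, coverage = simulate(table_entries=1000, cache_entries=250,
                              hot_entries=50, accesses=500_000)
print(f"cache hit rate:       {hit_rate:.1%}")
print(f"table entries touched: {coverage:.1%}")
```

In this model nearly every table entry ends up touched, which is what the bitmap would show, while the cache still serves the large majority of accesses. For a real engine, a cache profiler is the way to settle it.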
-
Gian-Carlo Pascutto
- Posts: 1260
- Joined: Sat Dec 13, 2008 7:00 pm
Re: Pondering and memory bandwidth
Gian-Carlo Pascutto wrote: I am sure that you will find that even 1M of cache will give very high hitrates.
Put differently: do you really believe that your move generation is hitting main memory each node, or even regularly? That would kill performance pretty badly, and it would show up clearly in profiling.