Pondering and memory bandwidth

Discussion of chess software programming and technical issues.

Moderator: Ras

MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: Pondering and memory bandwidth

Post by MattieShoes »

Thanks for the help. :-) Strange, it never even occurred to me that I could test this in a minute or two without bothering y'all :-P

I started four engines so all memory was allocated but they were idle. Then I threw the first one into analyze mode, waited 30 seconds, threw the second into analyze mode, waited 30 seconds, etc.

The first three are essentially identical -- the second one actually ran marginally faster than the first. Memory alignment, OS overhead, I don't know. The fourth one is about 2-3% slower.

After the first four, I figured I should test a 5th just to verify that it would in fact suck to run 5 engines simultaneously on a 4 core box. It came out about 87% slower than the average of the first four.

And because my life isn't complete if I don't graph everything I can get my hands on:
[graph of the results]
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Pondering and memory bandwidth

Post by wgarvin »

Gian-Carlo Pascutto wrote:
bob wrote: There are some other issues, such as the magic move generation tables, that will stress most any cache size prior to the nehalem boxes with 8mb L3 and beyond...
In real games only a small part of those tables is active at any given time. Rank/File occupation is not random.
Are you sure about that? The magic tables are used to answer questions about all of the positions encountered in the search tree, not just the over-the-board position. I would kind of expect a large fraction of the cachelines making up the table to be touched during any given search?
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: Pondering and memory bandwidth

Post by Gian-Carlo Pascutto »

wgarvin wrote:
Gian-Carlo Pascutto wrote:
bob wrote: There are some other issues, such as the magic move generation tables, that will stress most any cache size prior to the nehalem boxes with 8mb L3 and beyond...
In real games only a small part of those tables is active at any given time. Rank/File occupation is not random.
Are you sure about that? The magic tables are used to answer questions about all of the positions encountered in the search tree, not just the over-the-board position. I would kind of expect a large fraction of the cachelines making up the table to be touched during any given search?
You're repeating what Bob already said and what I rebutted above.

Even if you touch the entire table, as long as temporal locality is good enough for the cache to be effective, you're not hitting main memory a lot.

If you'd be hitting main memory a lot, you'd be slow, and magic movegen wouldn't be interesting.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Pondering and memory bandwidth

Post by bob »

Gian-Carlo Pascutto wrote:I am sure that you will find that even 1M of cache will give very high hitrates.
Put differently: do you really believe that your move generation is hitting main memory each node, or even regularly? That would kill performance pretty badly, and it would show up clearly in profiling.
I certainly believe it hits it every node. Not for every piece, but with 2 bishops, 2 rooks and a queen, that turns into 6 magic move generations, and I'd bet at least one of those goes to memory unless you are talking about the Nehalems with 8mb and beyond of L3. Every node goes to the hash table, which means something is going to get displaced. Ditto for pawn hash. I'd agree that the hitrates are quite high, else the processors would be very slow. But at least 1/2 the hits are instructions, if not more.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Pondering and memory bandwidth

Post by bob »

Gian-Carlo Pascutto wrote:
wgarvin wrote:
Gian-Carlo Pascutto wrote:
bob wrote: There are some other issues, such as the magic move generation tables, that will stress most any cache size prior to the nehalem boxes with 8mb L3 and beyond...
In real games only a small part of those tables is active at any given time. Rank/File occupation is not random.
Are you sure about that? The magic tables are used to answer questions about all of the positions encountered in the search tree, not just the over-the-board position. I would kind of expect a large fraction of the cachelines making up the table to be touched during any given search?
You're repeating what Bob already said and what I rebutted above.

Even if you touch the entire table, as long as temporal locality is good enough for the cache to be effective, you're not hitting main memory a lot.

If you'd be hitting main memory a lot, you'd be slow, and magic movegen wouldn't be interesting.
I'm not following your "rebuttal". I take the occupied squares, mask off just the diagonal occupied bits, multiply by a magic number, and shift right by some number of bits. That value is then used as an index into another big table that contains the resulting moves. It seems likely that a cache miss will happen somewhere in all of that when you repeat it 6 times for every node searched. Not to mention the times the above is used for things besides move generation (mobility calculations, for one thing). So that is probably off by a factor of 2.
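To make the mask/multiply/shift/index pipeline concrete, here is a toy, single-square sketch in Python. It handles only a bishop sitting on a1 and only its long diagonal; real engines build such tables for all 64 squares and both slider types, which is why the full tables run to several hundred kilobytes and compete for cache. The constant used below happens to work as an exact magic for this diagonal (it scatters the six relevant bits into the top byte of the product without carry interference); it is illustrative, not taken from any particular engine.

```python
MASK64 = (1 << 64) - 1

# Interior of the a1-h8 diagonal (b2..g7). Edge squares are excluded
# because a blocker there cannot shorten the ray any further.
DIAG_MASK = sum(1 << s for s in (9, 18, 27, 36, 45, 54))
MAGIC = 0x0202020202020202      # one set bit per rank, on the b-file
SHIFT = 64 - 6                  # keep a 6-bit index

def a1_diag_attacks(occ):
    """Reference attacks for a bishop on a1, computed by ray walk."""
    attacks = 0
    for sq in (9, 18, 27, 36, 45, 54, 63):
        attacks |= 1 << sq
        if occ & (1 << sq):      # first blocker ends the ray
            break
    return attacks

def subsets(mask):
    """Enumerate all subsets of a bitmask (the Carry-Rippler trick)."""
    sub = 0
    while True:
        yield sub
        sub = (sub - mask) & mask
        if sub == 0:
            return

# Build the attack table, verifying the magic produces no destructive
# collisions (two occupancies sharing a slot must share an attack set).
TABLE = [None] * 64
for occ in subsets(DIAG_MASK):
    idx = ((occ * MAGIC) & MASK64) >> SHIFT
    ref = a1_diag_attacks(occ)
    assert TABLE[idx] in (None, ref), "destructive collision"
    TABLE[idx] = ref

def magic_lookup(occ):
    """The three steps Bob describes: mask occupancy, multiply, shift, index."""
    return TABLE[(((occ & DIAG_MASK) * MAGIC) & MASK64) >> SHIFT]
```

With a blocker on d4, for instance, `magic_lookup` returns just b2, c3 and d4; irrelevant occupied squares are stripped by the mask before the multiply, which is exactly why, as Gian-Carlo notes, only the table entries matching positions actually reached in the search get touched.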
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Pondering and memory bandwidth

Post by bob »

MattieShoes wrote:Thanks for the help. :-) Strange, it never even occurred to me that I could test this in a minute or two without bothering y'all :-P

I started four engines so all memory was allocated but they were idle. then i threw the first one into analyze mode, waited 30 seconds, threw the second into analyze mode, waited 30 seconds, etc.

First three are essentially identical -- the second one actually ran marginally faster than the first. memory alignment, OS overhead, I don't know. Fourth one is about 2-3% slower.

After the first four, I figured I should test a 5th just to verify that it would in fact suck to run 5 engines simultaneously on a 4 core box. It came out about 87% slower than the average of the first four.

And because my life isn't complete if I don't graph everything I can get my hands on:
Sounds about right. Note that this isn't a perfect test, because depending on the processor, you have some local cache and some shared cache. On the Nehalem I was testing on, each core had 32K of L1 instruction cache plus 32K of L1 data cache and 256K of L2, with an 8MB L3 shared between the 4 cores. Running the same program 4 times lets the instructions and unmodified data be shared among all cores, with just one copy in the shared cache, whereas running four different programs would change this a bit. So you should probably re-run the test with 4 different programs to see how much that hurts. Hopefully not much.
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: Pondering and memory bandwidth

Post by Gian-Carlo Pascutto »

Bob: Can you do a test with cachegrind?

You can let it simulate various cache configurations. L1, L2, associativity...
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: Pondering and memory bandwidth

Post by MattieShoes »

That makes sense, I'll have to try some more tests later. For now, I don't really care if there's a performance hit as long as it's a consistent one. I've only been running two matches at a time anyway, without pondering.

It's the lower-end Yorkfield with 6 meg of L2 cache. I know next to nothing about low-level processor stuff, like what gets shoved into cache and what doesn't, branch prediction... It's on my to-learn list, along with statistical analysis, LMR, matrix math, how EGTBs work, how the sensors on gas pumps work, how to make honey walnut shrimp, Python, how to integrate results from a PN and A/B search effectively, etc...

So another simple question I can't find an answer to... Bayeselo has the rating command, which lists + and -. I assume this is a confidence interval based on the Bayesian analysis (also on the to-learn list), but I don't know exactly what the confidence interval is (is it 95% here too? one-tailed or two-tailed?), and I can't find it documented anywhere. No doubt this is because the people who actually care already know... :-)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Pondering and memory bandwidth

Post by bob »

Gian-Carlo Pascutto wrote:Bob: Can you do a test with cachegrind?

You can let it simulate various cache configurations. L1, L2, associativity...
I'll have to find it and download it again; I have just upgraded all my linux boxes and the old versions don't run due to new glibc and .so lib versions...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Pondering and memory bandwidth

Post by bob »

MattieShoes wrote:That makes sense, I'll have to try some more tests later. For now, I don't really care if there's a performance hit as long as it's a consistent one. I've only been running two matches at a time anyway, without pondering.

It's the lower end yorkfield with 6 meg L2 cache. I know next to nothing about low level processor stuff, like what gets shoved into cache and what doesn't, branch prediction... It's on my to-learn list, along with statistical analysis, LMR, matrix math, how EGTB's work, how the sensors on gas pumps work, how to make honey walnut shrimp, python, how to integrate results from a PN and A/B search effectively, etc. . .

So another simple question I can't find an answer to... Bayeselo has the rating command, then it lists + and -. I assume this is a confidence interval based on the bayesian analysis (also on the to-learn list) but I don't know exactly what the confidence interval is (is it 95% here too? one-tailed, two tailed?), and I can't find it documented anywhere. No doubt this is because the people that actually care already know... :-)
I believe that when Rémi joined a discussion about this last year, he said it is a two-tailed test with a 95% confidence interval. Using 1 SD really gives too wide a range to be useful for measuring small changes, and 2 SD can be problematic if the change is very small.
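For a rough sanity check on those +/- numbers, here is a sketch of a two-tailed 95% interval computed directly from raw match results using the normal approximation on the score fraction. This is not what Bayeselo does internally (its estimate is Bayesian and handles draws differently), so expect the numbers to differ somewhat, especially for small samples.

```python
import math

def elo_interval(wins, losses, draws, z=1.96):
    """Approximate Elo difference and two-tailed 95% interval
    from a match score, via the normal approximation."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Sample variance of the per-game score (win=1, draw=0.5, loss=0).
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    se = math.sqrt(var / n)          # standard error of the mean score

    def to_elo(s):
        s = min(max(s, 1e-6), 1 - 1e-6)   # clamp away from 0 and 1
        return -400 * math.log10(1 / s - 1)

    return to_elo(score), to_elo(score - z * se), to_elo(score + z * se)
```

For example, `elo_interval(100, 100, 100)` gives an estimate of 0 Elo with an interval of roughly ±30, and the interval shrinks as the square root of the number of games, which is why small improvements need so many games to confirm.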