hardware

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: hardware

Post by Don »

mwyoung wrote: I am not a chess programmer, but it seems to me it is not about who has more cores, but whether you have enough cores to run your tests and ideas.
Even if we had 1000 cores it wouldn't be enough. We have more ideas than we can reasonably test, and in fact we have to take shortcuts. One shortcut we take is to reject an idea that doesn't immediately start hot. For example, if we are down 5 Elo after 1000 games and have 10 more ideas in the queue, we may simply move on to the next test and take a big chance on rejecting a good idea. But 5 Elo is way under the error margin, and the idea still has an excellent chance of succeeding.
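The error-margin claim can be checked with a back-of-envelope sketch. This is a normal approximation, not Komodo's actual test, and the 50% draw ratio is an assumption:

```python
import math

def elo_margin(games, draw_ratio=0.5, z=1.96):
    """Approximate 95% Elo error margin for a match scoring near 50%.

    Per-game score variance around 0.5 is (1 - draw_ratio) / 4;
    the draw ratio is an assumed value, not measured data.
    """
    se = math.sqrt((1 - draw_ratio) / 4 / games)   # std. error of the mean score
    # Slope of the Elo curve at a 50% score: 400 / (ln(10) * 0.25)
    slope = 400 / (math.log(10) * 0.25)
    return z * se * slope

print(round(elo_margin(1000)))    # ~15 Elo, so -5 after 1000 games is inside the noise
print(round(elo_margin(20000)))   # ~3 Elo
```

With these assumptions, a 5 Elo deficit after 1000 games is a third of the 95% margin, which is why the rejected idea "still has an excellent chance."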

Is Mr. Mansoor also planning to hire people and programmers to help with ideas for Houdini?

Will a two-man team need as much CPU power as Stockfish uses for testing?

Stockfish:
Active - 34 machines 191 cores 1.29M nps (245.80M total nps) 346 games/minute

Testers 157
Developers 21
Active testers 22
Tests submitted 1454
Games played 26875732
CPU time 43.02 years
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: hardware

Post by Uri Blass »

Don wrote:
mwyoung wrote: I am not a chess programmer, but it seems to me it is not about who has more cores, but whether you have enough cores to run your tests and ideas.
Even if we had 1000 cores it wouldn't be enough. We have more ideas than we can reasonably test, and in fact we have to take shortcuts. One shortcut we take is to reject an idea that doesn't immediately start hot. For example, if we are down 5 Elo after 1000 games and have 10 more ideas in the queue, we may simply move on to the next test and take a big chance on rejecting a good idea. But 5 Elo is way under the error margin, and the idea still has an excellent chance of succeeding.

Is Mr. Mansoor also planning to hire people and programmers to help with ideas for Houdini?

Will a two-man team need as much CPU power as Stockfish uses for testing?

Stockfish:
Active - 34 machines 191 cores 1.29M nps (245.80M total nps) 346 games/minute

Testers 157
Developers 21
Active testers 22
Tests submitted 1454
Games played 26875732
CPU time 43.02 years
My opinion is that you are too quick to take shortcuts. I guess you could probably make more progress if you simply used SPRT like the Stockfish team.

The damage from rejecting an idea after a small number of games is not only the Elo loss from not accepting the idea in case it is a good one, but also the fact that you have less knowledge with which to suggest good ideas in the future, and you practically do not know whether the idea is good or bad.

I believe that knowing whether an idea is good or bad can help you to suggest better ideas in the future.

The number of ideas is not what is important but the quality of the ideas.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: hardware

Post by Don »

Uri Blass wrote: I am not a chess programmer, but it seems to me it is not about who has more cores, but whether you have enough cores to run your tests and ideas.
Even if we had 1000 cores it wouldn't be enough. We have more ideas than we can reasonably test, and in fact we have to take shortcuts. One shortcut we take is to reject an idea that doesn't immediately start hot. For example, if we are down 5 Elo after 1000 games and have 10 more ideas in the queue, we may simply move on to the next test and take a big chance on rejecting a good idea. But 5 Elo is way under the error margin, and the idea still has an excellent chance of succeeding.

My opinion is that you are too quick to take shortcuts. I guess you could probably make more progress if you simply used SPRT like the Stockfish team.
We are working on something like SPRT now, but it will STILL have the characteristic that it's easier to reject than to accept. Any test that accepts changes too readily will cause your program to regress over time, and running long tests that have little chance of succeeding just wastes CPU power, which is at a premium for us.
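For reference, the core of an SPRT stop rule is small. This sketch is a minimal Wald SPRT over decisive games only (draws dropped), a simplification of fishtest's actual trinomial formulation; the win-share hypotheses p0/p1 and the error rates are illustrative assumptions:

```python
import math

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's accept/reject thresholds on the log-likelihood ratio."""
    lower = math.log(beta / (1 - alpha))   # crossing this rejects the change
    upper = math.log((1 - beta) / alpha)   # crossing this accepts the change
    return lower, upper

def sprt_llr(wins, losses, p0=0.48, p1=0.52):
    """Log-likelihood ratio for H1 (win share p1) vs H0 (p0), decisive games only."""
    return (wins * math.log(p1 / p0)
            + losses * math.log((1 - p1) / (1 - p0)))

lower, upper = sprt_bounds()
llr = sprt_llr(300, 260)
print("accept" if llr > upper else "reject" if llr < lower else "continue")
```

Fishtest expresses the hypotheses as Elo bounds and models draws explicitly; the asymmetry Don wants (easier to reject than accept) falls out of how you place those bounds.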

Generally, if your test accepts many regressions due to sampling error, your progress will depend more on what percentage of the changes you test are actual improvements than on the test results themselves. For example, if only 1 out of 10 changes is actually good but the false-positive rate is not very low, you will begin accepting many of those other 9 regressions. It's a serious problem unless you are testing BIG improvements or BIG regressions.
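The 1-in-10 argument is just Bayes' rule. A tiny sketch, where the prior, the test's power, and the false-positive rate are all assumed numbers rather than measurements from any real framework:

```python
def regressions_among_accepted(prior_good=0.1, power=0.8, fpr=0.2):
    """Share of accepted patches that are actually regressions.

    prior_good: assumed fraction of tested changes that really help
    power:      assumed chance a good change passes the test
    fpr:        assumed chance a bad change passes anyway
    """
    accepted_good = prior_good * power        # true positives
    accepted_bad = (1 - prior_good) * fpr     # false positives
    return accepted_bad / (accepted_good + accepted_bad)

print(round(regressions_among_accepted(fpr=0.20), 2))   # ~0.69: most accepted patches regress
print(round(regressions_among_accepted(fpr=0.01), 2))   # ~0.1: a strict test fixes this
```

With a sloppy 20% false-positive rate, roughly two-thirds of accepted patches would be regressions, which is exactly the "serious problem" described above.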

It's not currently a big problem for Stockfish because they are still catching up - most of the changes in the log are just re-tuning something that should have been done long ago, and they are getting big enough gains from these things not to suffer the regression problem.

The damage from rejecting an idea after a small number of games is not only the Elo loss from not accepting the idea in case it is a good one, but also the fact that you have less knowledge with which to suggest good ideas in the future, and you practically do not know whether the idea is good or bad.
We might reject a test, but not an idea. We are not stupid about these things. We consider most of the ideas that get rejected to be merely unproved, and will often revisit them later. In fact, we think the proper testing procedure is to start with a simple and fast test which rejects changes that have little chance of success, but allows a change to continue on to a more stringent test even if it is somewhat negative. That concept will be built into our new testing infrastructure, which I have wanted to do for years now.
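The two-stage procedure described here can be sketched as a simple gate; the thresholds are hypothetical, not Komodo's actual numbers:

```python
def two_stage_gate(quick_elo, quick_margin, floor=-10.0):
    """Decide after a short, cheap first-stage match.

    quick_elo:    measured Elo difference after the quick test
    quick_margin: its 95% error margin
    floor:        reject outright below this (hypothetical value)

    Reject only when the change is clearly bad; anything that might
    still be positive is promoted to a longer, more stringent stage.
    """
    if quick_elo < floor or quick_elo + quick_margin < 0:
        return "reject"
    return "promote"

print(two_stage_gate(-5, 15))    # promote: -5 Elo is inside the noise
print(two_stage_gate(-20, 15))   # reject: clearly below the floor
```

The point of the gate is exactly the trade-off discussed earlier in the thread: the cheap stage saves CPU on hopeless changes while a somewhat-negative result still gets a second look.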

The Stockfish project has inspired me to get this working.

I believe that knowing whether an idea is good or bad can help you to suggest better ideas in the future.

The number of ideas is not what is important but the quality of the ideas.
We agree on this, and we think we might be able to keep up even with a major hardware disadvantage because of it. But not having the hardware is still a major handicap. As Larry pointed out, we might be able to keep up despite a serious disadvantage, but probably not 10 to 1 or more.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

Re: hardware

Post by kranium »

i agree with Uri and all of the above...

unfortunately one never knows beforehand if an idea is good or bad;
the only way to be sure is to test it,

because sometimes the most 'incongruous' idea can prove beneficial, and what seemed to be a very sound 'logical' idea can fail miserably.

Marco C. made a great decision concerning idea criteria for the Stockfish distributed testing queue...there is none!

"This was another decision that was made: not to pre-filter / veto any patch that is queued for testing. And it worked."
http://talkchess.com/forum/viewtopic.ph ... ht=#533633

i have run many hundreds of 20,000-game tests on my 6-core system during the last year,
and during these tests i've seen results fluctuate significantly - for example, +/-26 Elo after 1000 games ending at +/-0 Elo after 20,000 games (and vice versa).
therefore i think it's a good idea to let them all run to 100%, in an effort to eliminate human error
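The fluctuation described here is what the square-root law predicts; a one-liner under the usual normal approximation:

```python
import math

# Error margins shrink like 1/sqrt(games), so going from 1,000 to
# 20,000 games tightens the margin by a factor of sqrt(20) ~ 4.47.
def margin_shrink(n_small, n_large):
    return math.sqrt(n_large / n_small)

factor = margin_shrink(1000, 20000)
print(round(26 / factor, 1))   # an early +/-26 Elo swing is only ~ +/-5.8 at 20k games
```

So a +/-26 swing at 1000 games collapsing toward 0 at 20,000 games is entirely consistent with a change worth roughly nothing.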

good luck Don and Larry,
we'll all be looking forward to your next release
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: hardware

Post by jdart »

I had similar requirements and went with AMD. There is no question the performance per core is much worse than Intel's. On the other hand, you can put two 16-core AMD chips on a server board... I believe they even support quad-socket boards, although those are costly. If you are only going to use it for testing, then IMO you want the maximum core count. If you are thinking of using it for anything else, like tournament play or analysis, where you are actually using all cores on one search, then I'd go with Intel.

--Jon
Werewolf
Posts: 1796
Joined: Thu Sep 18, 2008 10:24 pm

Re: hardware

Post by Werewolf »

Don / Larry,
Just one small point on the hardware:
Your current box is a dual 2687W, which is 16 cores at 3.1 GHz base and 3.4 GHz turbo.
The new E5-2697 v2 will be 24 cores at 2.7 GHz base and 3.0 GHz turbo.

So it's not 3.4 GHz vs 2.7 GHz as some are saying.

Secondly, with the Intel CPUs, if you run an engine on a single thread, I think it runs at about 0.6x speed with HT on.
That works out to roughly a 20% throughput gain for your testing, since you could run 48 engines at 0.6 x 3.0 = 1.8 GHz each instead of 24 engines at 3.0 GHz.
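The 20% figure checks out under a crude throughput model (thread count times effective clock); the 0.6x per-thread hyper-threading penalty is the poster's assumption, not a measured number:

```python
def aggregate_ghz(threads, ghz_per_thread):
    """Crude testing-throughput proxy: thread count x effective clock."""
    return threads * ghz_per_thread

ht_off = aggregate_ghz(24, 3.0)        # 24 physical cores at full clock
ht_on = aggregate_ghz(48, 0.6 * 3.0)   # 48 HT threads at ~60% speed each
print(f"{ht_on / ht_off - 1:.0%}")     # prints 20%: more games per minute with HT on
```

This proxy only makes sense for many independent single-threaded test games; for one multi-threaded search the picture is different.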

Carl
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: hardware

Post by mwyoung »

Don wrote:
Uri Blass wrote: I am not a chess programmer, but it seems to me it is not about who has more cores, but whether you have enough cores to run your tests and ideas.
Even if we had 1000 cores it wouldn't be enough. We have more ideas than we can reasonably test, and in fact we have to take shortcuts. One shortcut we take is to reject an idea that doesn't immediately start hot. For example, if we are down 5 Elo after 1000 games and have 10 more ideas in the queue, we may simply move on to the next test and take a big chance on rejecting a good idea. But 5 Elo is way under the error margin, and the idea still has an excellent chance of succeeding.

My opinion is that you are too quick to take shortcuts. I guess you could probably make more progress if you simply used SPRT like the Stockfish team.
We are working on something like SPRT now, but it will STILL have the characteristic that it's easier to reject than to accept. Any test that accepts changes too readily will cause your program to regress over time, and running long tests that have little chance of succeeding just wastes CPU power, which is at a premium for us.

Generally, if your test accepts many regressions due to sampling error, your progress will depend more on what percentage of the changes you test are actual improvements than on the test results themselves. For example, if only 1 out of 10 changes is actually good but the false-positive rate is not very low, you will begin accepting many of those other 9 regressions. It's a serious problem unless you are testing BIG improvements or BIG regressions.

It's not currently a big problem for Stockfish because they are still catching up - most of the changes in the log are just re-tuning something that should have been done long ago, and they are getting big enough gains from these things not to suffer the regression problem.

The damage from rejecting an idea after a small number of games is not only the Elo loss from not accepting the idea in case it is a good one, but also the fact that you have less knowledge with which to suggest good ideas in the future, and you practically do not know whether the idea is good or bad.
We might reject a test, but not an idea. We are not stupid about these things. We consider most of the ideas that get rejected to be merely unproved, and will often revisit them later. In fact, we think the proper testing procedure is to start with a simple and fast test which rejects changes that have little chance of success, but allows a change to continue on to a more stringent test even if it is somewhat negative. That concept will be built into our new testing infrastructure, which I have wanted to do for years now.

The Stockfish project has inspired me to get this working.

I believe that knowing whether an idea is good or bad can help you to suggest better ideas in the future.

The number of ideas is not what is important but the quality of the ideas.
We agree on this, and we think we might be able to keep up even with a major hardware disadvantage because of it. But not having the hardware is still a major handicap. As Larry pointed out, we might be able to keep up despite a serious disadvantage, but probably not 10 to 1 or more.

I agree you should be able to do well with less. But you will not have the luxury of throwing spaghetti against the wall to see what sticks, because you will not have the testing CPU power of Stockfish.

And yes, it is also about the kind and quality of the ideas; you will have to be selective.

Innovation is not about copying someone's ideas, but taking that idea to the next level.

I wish you success.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: hardware

Post by Don »

kranium wrote:i agree with Uri and all of the above...

unfortunately one never knows beforehand if an idea is good or bad;
the only way to be sure is to test it,

because sometimes the most 'incongruous' idea can prove beneficial, and what seemed to be a very sound 'logical' idea can fail miserably.

Marco C. made a great decision concerning idea criteria for the Stockfish distributed testing queue...there is none!

"This was another decision that was made: not to pre-filter / veto any patch that is queued for testing. And it worked."
http://talkchess.com/forum/viewtopic.ph ... ht=#533633
What are you comparing this to in order to say it works?

What you have is a massive testing infrastructure and it's going to work unless it's done in a totally incompetent way.

Larry and I avoid testing changes that we know are stupid and that "works" too. So where is your evidence that this is somehow the magic formula?

No disrespect to Marco intended here but in my opinion the post you are talking about was mostly a lot of philosophical double-talk. Marco is a "packager" in that he wrapped everything up in pretty language as if it were some kind of religion when in fact it's simply a powerful testing infrastructure. I tuned out when he went "right brained" on me here. Some people need that, but I don't.

The most relevant part of his post is a real gem however and it's this:

the tests undergo a very strict and demanding process so that we are quite sure that whatever we commit is good. If we have some doubts we don't.


This is exactly what all the successful programs are doing - some better than others - but it captures the important working principles of testing. All the rest is debatable.

I will say this about testing any idea, no matter how stupid: in an open setting such as theirs it may be necessary, if nothing else to avoid discouraging volunteers, or to avoid needing a "boss" who decides what is worthy of testing. But wrapping it up as a virtue, or claiming that it is the key ... I just don't buy it.

i have run many hundreds of 20,000-game tests on my 6-core system during the last year,
and during these tests i've seen results fluctuate significantly - for example, +/-26 Elo after 1000 games ending at +/-0 Elo after 20,000 games (and vice versa).
therefore i think it's a good idea to let them all run to 100%, in an effort to eliminate human error

good luck Don and Larry,
we'll all be looking forward to your next release
Thanks, we are doing our best!
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: hardware

Post by Don »

mwyoung wrote:
Don wrote:
Uri Blass wrote: I am not a chess programmer, but it seems to me it is not about who has more cores, but whether you have enough cores to run your tests and ideas.
Even if we had 1000 cores it wouldn't be enough. We have more ideas than we can reasonably test, and in fact we have to take shortcuts. One shortcut we take is to reject an idea that doesn't immediately start hot. For example, if we are down 5 Elo after 1000 games and have 10 more ideas in the queue, we may simply move on to the next test and take a big chance on rejecting a good idea. But 5 Elo is way under the error margin, and the idea still has an excellent chance of succeeding.

My opinion is that you are too quick to take shortcuts. I guess you could probably make more progress if you simply used SPRT like the Stockfish team.
We are working on something like SPRT now, but it will STILL have the characteristic that it's easier to reject than to accept. Any test that accepts changes too readily will cause your program to regress over time, and running long tests that have little chance of succeeding just wastes CPU power, which is at a premium for us.

Generally, if your test accepts many regressions due to sampling error, your progress will depend more on what percentage of the changes you test are actual improvements than on the test results themselves. For example, if only 1 out of 10 changes is actually good but the false-positive rate is not very low, you will begin accepting many of those other 9 regressions. It's a serious problem unless you are testing BIG improvements or BIG regressions.

It's not currently a big problem for Stockfish because they are still catching up - most of the changes in the log are just re-tuning something that should have been done long ago, and they are getting big enough gains from these things not to suffer the regression problem.

The damage from rejecting an idea after a small number of games is not only the Elo loss from not accepting the idea in case it is a good one, but also the fact that you have less knowledge with which to suggest good ideas in the future, and you practically do not know whether the idea is good or bad.
We might reject a test, but not an idea. We are not stupid about these things. We consider most of the ideas that get rejected to be merely unproved, and will often revisit them later. In fact, we think the proper testing procedure is to start with a simple and fast test which rejects changes that have little chance of success, but allows a change to continue on to a more stringent test even if it is somewhat negative. That concept will be built into our new testing infrastructure, which I have wanted to do for years now.

The Stockfish project has inspired me to get this working.

I believe that knowing whether an idea is good or bad can help you to suggest better ideas in the future.

The number of ideas is not what is important but the quality of the ideas.
We agree on this, and we think we might be able to keep up even with a major hardware disadvantage because of it. But not having the hardware is still a major handicap. As Larry pointed out, we might be able to keep up despite a serious disadvantage, but probably not 10 to 1 or more.

I agree you should be able to do well with less. But you will not have the luxury of throwing spaghetti against the wall to see what sticks, because you will not have the testing CPU power of Stockfish.
Actually, I probably misspoke here. It's easy to assume that if you had 100x more power you could do anything with impunity, but the reality is probably that whatever works with little hardware is going to work with bigger hardware too. Could we then test a lot of really stupid ideas that have only the slightest chance of working? Wouldn't that still be a waste of resources? I agree, throwing more spaghetti at the wall is not necessarily the best approach, and in fact it's not scalable. That would apply even with a lot of hardware.

There is an expression, "spending like a drunken sailor on payday": the temptation is to be stupid with the additional hardware, wasting most of it on frivolous things just because you have it. What would be ideal is to make the ideas become the bottleneck instead of the hardware. We could push a lot more ideas through the queue, including lesser ideas that we might not test now, but we would also be able to do more thorough scalability tests. Some of our ideas appear to work at hyper-blitz but not at longer time controls, and testing those at longer time controls would pretty much KILL our idea throughput, so I do think expanding the long tests would be a wise use of resources.

You cannot have enough hardware for tuning parameters, though. But care would have to be taken not to get lazy just because you have the hardware to throw at the problem.

A general rule of thumb: You never have enough. The richest men in the world are working overtime to get richer - they don't have enough and will never feel that they do. It's the same with CPU power.

It's funny to think about what I had when I developed Rexchess - PC-XT machines at something like 3 MHz. I could not have dreamed back then of having what I have now, but it's still not nearly enough.

And yes, it is also about the kind and quality of the ideas; you will have to be selective.

Innovation is not about copying someone's ideas, but taking that idea to the next level.

I wish you success.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
kgburcham
Posts: 2016
Joined: Sun Feb 17, 2008 4:19 pm

Re: hardware

Post by kgburcham »

Your current box is a dual 2687W, which is 16 cores at 3.1 GHz base and 3.4 GHz turbo.
The new E5-2697 v2 will be 24 cores at 2.7 GHz base and 3.0 GHz turbo.
Carl, so with HT off we have:
dual 2687W: 16 cores at 3.4 GHz = 54.4 GHz aggregate
dual 2697 v2: 24 cores at 3.0 GHz = 72 GHz aggregate

I wonder how much faster the dual 2697 v2 would be running a chess program, compared to the dual 2687W?
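A back-of-envelope answer: the aggregate clock ratio is 72 / 54.4 ~ 1.32, but a single multi-threaded search scales sub-linearly in cores. The 0.85 scaling exponent below is purely an illustrative assumption, not a measured SMP efficiency:

```python
def relative_speed(cores_a, ghz_a, cores_b, ghz_b, scaling_exp=0.85):
    """Estimate machine B's speed relative to machine A on one search.

    Clock scales linearly; core count is discounted by scaling_exp
    because SMP search efficiency drops as cores are added (assumed).
    """
    perf = lambda cores, ghz: cores ** scaling_exp * ghz
    return perf(cores_b, ghz_b) / perf(cores_a, ghz_a)

# dual 2687W (16 cores @ 3.4 GHz) vs dual 2697 v2 (24 cores @ 3.0 GHz):
print(round(relative_speed(16, 3.4, 24, 3.0), 2))   # ~1.25x under these assumptions
```

For running many independent test games rather than one search, the linear 1.32x aggregate figure is the better estimate.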

thanks
kgburcham