Beta for Stockfish distributed testing

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Beta for Stockfish distributed testing

Post by lucasart »

From a technical point of view, the current testing is better than before because:
- many more games can be played. Before, it was only Marco, and testing was a real bottleneck. It is still the bottleneck, but less so now.
- patch validation is done in two steps: 16,000 games at 15+0.05 and 60,000 games at 60+0.05. Different machines are dynamically allocated to different tests, so a patch gets validated by a blend of machines; we are less likely to validate a patch just because it works well on one machine.
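The two-stage validation above boils down to measuring an Elo difference from a large match result. As a rough sketch (the helper names are mine, not fishtest's), the logistic rating model converts a score fraction into Elo:

```python
import math

def elo_from_score(score: float) -> float:
    """Convert a score fraction (0 < score < 1) into an Elo difference
    using the logistic rating model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_elo(wins: int, losses: int, draws: int) -> float:
    """Estimate the Elo difference of a patch from its match result."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return elo_from_score(score)
```

For example, a +110 =100 -90 result scores about 53.3% and maps to roughly +23 Elo; the reason fishtest needs tens of thousands of games is that the error bars shrink only with the square root of the game count.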

From a social point of view, I think this experiment is a breakthrough, at least in the computer chess world.
- anyone can donate CPU time to the Stockfish project. And because there is a page listing all the users and the number of games they have played, it works like a scoreboard: it encourages healthy competition between testers (well, that's my hope).
- anyone can see the patches being tested, and push their own patches to the repo. This also creates a competitive environment for developers, who want to write good patches and see them validated on the fishtest board.
- by being open this way, I hope the project will attract more talented people. It would be great if Richard Vida could join the party.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: Beta for Stockfish distributed testing

Post by Edmund »

lucasart wrote:From a technical point of view, the current testing is better than before because:
- many more games can be played. Before, it was only Marco, and testing was a real bottleneck. It is still the bottleneck, but less so now.
- patch validation is done in two steps: 16,000 games at 15+0.05 and 60,000 games at 60+0.05. Different machines are dynamically allocated to different tests, so a patch gets validated by a blend of machines; we are less likely to validate a patch just because it works well on one machine.

From a social point of view, I think this experiment is a breakthrough, at least in the computer chess world.
- anyone can donate CPU time to the Stockfish project. And because there is a page listing all the users and the number of games they have played, it works like a scoreboard: it encourages healthy competition between testers (well, that's my hope).
- anyone can see the patches being tested, and push their own patches to the repo. This also creates a competitive environment for developers, who want to write good patches and see them validated on the fishtest board.
- by being open this way, I hope the project will attract more talented people. It would be great if Richard Vida could join the party.
Just an idea that came to mind while reading your list: would it be possible to add a ranking list, showing per person the number of supplied patches, the number of accepted patches, the accepted/supplied ratio, the average Elo gain/loss per supplied patch, and the average/total Elo gain of accepted patches?
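Such a ranking would be a simple aggregation over per-patch records. A minimal sketch; the record layout and sample data here are invented for illustration, not fishtest's actual schema:

```python
from collections import defaultdict

# Hypothetical per-patch records: (author, accepted?, measured Elo change).
patches = [
    ("alice", True, 3.1), ("alice", False, -1.2),
    ("bob", True, 1.5), ("bob", True, 0.8), ("bob", False, -2.0),
]

def ranking(patches):
    """Aggregate per-author patch statistics, sorted by total Elo
    gained from accepted patches."""
    stats = defaultdict(lambda: {"supplied": 0, "accepted": 0, "elo": 0.0})
    for author, accepted, elo in patches:
        s = stats[author]
        s["supplied"] += 1
        if accepted:
            s["accepted"] += 1
            s["elo"] += elo
    rows = [
        {"author": a, "supplied": s["supplied"], "accepted": s["accepted"],
         "ratio": s["accepted"] / s["supplied"], "total_elo": s["elo"]}
        for a, s in stats.items()
    ]
    return sorted(rows, key=lambda r: r["total_elo"], reverse=True)
```

The hard part, as the thread notes below, is not the bookkeeping but getting a reliable per-patch Elo number in the first place.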
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Beta for Stockfish distributed testing

Post by Michel »

From a social point of view, I think this experiment is a breakthrough, at least in the computer chess world.
- anyone can donate CPU time to the Stockfish project. And because there is a page listing all the users and the number of games they have played, it works like a scoreboard: it encourages healthy competition between testers (well, that's my hope).
A small social bottleneck, it seems to me, is that installing the client is non-trivial, since it depends on some advanced Python packages. I tried it on CentOS 6 (a widely used server OS) and had some problems. I am sure I can solve them (and eventually will), but if I did not have to, I would be contributing CPU time already.
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Beta for Stockfish distributed testing

Post by gladius »

Michel wrote:
From a social point of view, I think this experiment is a breakthrough, at least in the computer chess world.
- anyone can donate CPU time to the Stockfish project. And because there is a page listing all the users and the number of games they have played, it works like a scoreboard: it encourages healthy competition between testers (well, that's my hope).
A small social bottleneck, it seems to me, is that installing the client is non-trivial, since it depends on some advanced Python packages. I tried it on CentOS 6 (a widely used server OS) and had some problems. I am sure I can solve them (and eventually will), but if I did not have to, I would be contributing CPU time already.
You are right, making the setup easier is definitely a big goal. It turns out it's a bit of a pain to bundle the Python packages locally, but we will get there :).

As a side note, I actually developed it on centos 6, so if it runs well anywhere, it should be there! :) The one hiccup was a python 2.7+ requirement, but that is no longer the case (I will update the docs now).
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Beta for Stockfish distributed testing

Post by Michel »

As a side note, I actually developed it on CentOS 6, so if it runs well anywhere, it should be there! :) The one hiccup was a Python 2.7+ requirement, but that is no longer the case (I will update the docs now).
Ok great. I must confess I did not try very hard since I was in a hurry. If you say it works then I will try again. Thanks.
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Beta for Stockfish distributed testing

Post by lucasart »

Edmund wrote:
lucasart wrote:From a technical point of view, the current testing is better than before because:
- many more games can be played. Before, it was only Marco, and testing was a real bottleneck. It is still the bottleneck, but less so now.
- patch validation is done in two steps: 16,000 games at 15+0.05 and 60,000 games at 60+0.05. Different machines are dynamically allocated to different tests, so a patch gets validated by a blend of machines; we are less likely to validate a patch just because it works well on one machine.

From a social point of view, I think this experiment is a breakthrough, at least in the computer chess world.
- anyone can donate CPU time to the Stockfish project. And because there is a page listing all the users and the number of games they have played, it works like a scoreboard: it encourages healthy competition between testers (well, that's my hope).
- anyone can see the patches being tested, and push their own patches to the repo. This also creates a competitive environment for developers, who want to write good patches and see them validated on the fishtest board.
- by being open this way, I hope the project will attract more talented people. It would be great if Richard Vida could join the party.
Just an idea that came to mind while reading your list: would it be possible to add a ranking list, showing per person the number of supplied patches, the number of accepted patches, the accepted/supplied ratio, the average Elo gain/loss per supplied patch, and the average/total Elo gain of accepted patches?
It's very hard to measure that reliably. But the testing standard is extremely rigorous: a patch needs to be an improvement, beyond all reasonable doubt (statistically), at both short and long time control, and on a blend of machines. Just getting one of your patches committed is something to be proud of. But I don't think developers do it for that reason. Intellectual satisfaction is enough :)
For testers only (non-developers), it may not be as much fun, as they don't get the intellectual satisfaction out of it. But their contribution does get recognized, through a kind of "rating list" sorted by descending number of games played:
http://54.235.120.254:6543/users

Anyone with some big machines who wants to take Gary's top spot on this list: what are you waiting for?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Beta for Stockfish distributed testing

Post by mcostalba »

Michel wrote:
As a side note, I actually developed it on CentOS 6, so if it runs well anywhere, it should be there! :) The one hiccup was a Python 2.7+ requirement, but that is no longer the case (I will update the docs now).
Ok great. I must confess I did not try very hard since I was in a hurry. If you say it works then I will try again. Thanks.
Michel, I'd strongly suggest reading the fishtest README; on Linux the setup is very easy. The biggest prerequisite is an existing gcc installation able to compile the SF sources, so I'd suggest first trying to compile the sources yourself with gcc; if that works, you are a few steps away from a working installation.
jpqy
Posts: 556
Joined: Thu Apr 24, 2008 9:31 am
Location: Belgium

Re: Beta for Stockfish distributed testing

Post by jpqy »

Hi Marco,

I have tried to ask a few times whether it's possible to put the engine settings Aggressiveness & Cowardice back into the compiles!

If I understand well, they don't hurt Stockfish at all, but they give a lot of fun to engine testers.
The last version I have with these settings is SF121227, and it is still my best version with changed Aggr. & Cow. settings.

I download the latest versions from http://abrok.eu/stockfish/
but none of them has these settings anymore. I hope to see them back!

Thanks..Kind regards,
JP.
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Beta for Stockfish distributed testing

Post by lucasart »

jpqy wrote:Hi Marco,

I have tried to ask a few times whether it's possible to put the engine settings Aggressiveness & Cowardice back into the compiles!

If I understand well, they don't hurt Stockfish at all, but they give a lot of fun to engine testers.
The last version I have with these settings is SF121227, and it is still my best version with changed Aggr. & Cow. settings.

I download the latest versions from http://abrok.eu/stockfish/
but none of them has these settings anymore. I hope to see them back!

Thanks.. Kind regards,
JP.
That's precisely the kind of reason to remove these options. If you leave them in, you get people arguing about which values to use, and user requests about them.

The Aggressiveness and Cowardice settings control the asymmetry of the king safety evaluation. Make it more aggressive and you value your opponent's king safety more than your own, and vice versa. The effect on game play of shifting these values far from their Elo optimum is really unclear: increase aggressiveness too much and you just make SF play unsound moves.
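To illustrate the idea (a toy sketch only, not actual Stockfish code): the two percentages scale the two kings' danger scores independently, so the evaluation is no longer symmetric between the sides:

```python
def king_safety_term(own_danger: int, opp_danger: int,
                     aggressiveness: int = 100, cowardice: int = 100) -> int:
    """Net king-safety contribution to the eval, from our point of view.
    Danger scores are nonnegative; aggressiveness scales how much we
    value attacking the opponent's king, cowardice how much we fear
    attacks on our own. 100/100 would make the term symmetric."""
    return (opp_danger * aggressiveness - own_danger * cowardice) // 100
```

With equal danger scores the term is zero at the defaults; raising aggressiveness above 100 makes the engine overvalue its own attack, which is exactly the "unsound moves" failure mode described above.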

I strongly believe that these values should be hard-coded in the program and optimized by the developers, rather than left to the user. The fewer options the user has to worry about, the better.

What is interesting to note about SF is that king safety is asymmetric. This was an idea of Tord Romstad's in Glaurung, back in the day. And still today, removing this asymmetry is a clear regression (I tested it). So SF really benefits from that asymmetry, provided the values are optimized. And the values optimized by Joona are pretty good, I can tell you that: I (and others) have tried to tweak them, and every time it was a regression!
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
jpqy
Posts: 556
Joined: Thu Apr 24, 2008 9:31 am
Location: Belgium

Re: Beta for Stockfish distributed testing

Post by jpqy »

Lucas, I can agree with your explanation, but then where is the fun part in testing engines!?

I test all engines at default, but I also enjoy playing around with engine settings and following those games.
I don't see any problem in having 1 setting or 100 settings: the more I have, the more fun I get with these engines. Why worry about them?

Regression? Yes, so what? I still have one with my own settings as my best Stockfish version, and I still try to beat it with all these new compiles. It hasn't happened yet.

JP.