Hi
I have made a few changes to StockFish 1.7.1 and called it Crab 1.0 beta. Basically it's just a different personality of StockFish, and I assume it is weaker than SF. Now, if you could pick one of the changes from Crab and test it against SF, which one would you pick, and why? Can you suggest improvements for that idea?
Here is the thread:
http://talkchess.com/forum/viewtopic.ph ... ew=threads
Regards.
Testing Crab
Moderator: Ras
-
mcostalba
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Testing Crab
Hi Adam,
please take no offence, but I would guess that nobody is going to do your homework for you.
If you don't test your changes, why should others do it?
And you don't need big hardware for it. I started with a single-core 32-bit machine; you just need time, more or less 1-2 days (as you already know).
I would guess that if you come here with a test result saying Crab is stronger than SF, people will be much more interested in committing time to reviewing your work. Of course the test should be serious; last time there was a guy who claimed a 150 ELO increase based on some "internal test", and of course it was just bullsh...
Seeing published, serious tests is a very important sign for me that I am not facing a mister wannabe, much more than reading his patches... but this is just my opinion.
-
Look
Re: Testing Crab
Well, the point about hardware is what matters, IMO. I already knew that testing at something like 1 min per game is advised. Nevertheless, since this was much slower than the common hardware used by fans, I thought a different approach might be sound. Also, I was not aware of the 150 Elo points case; it seems unrealistic of course, but I did not make such claims.
My point is that there could be an interesting idea, but how can one make sure that, say, there is no bug in it or something similar? I think in these situations a critical look from those who know better what is going on might help.
-
mcostalba
Re: Testing Crab
Well, a lot of people are testing with ultra-fast TC, so I think 1 minute is good even on slow hardware: if you reach search depth 12-13 in middlegame positions I think it is more than enough (it is what Bob reaches on his super-cluster, and he has done it like that for decades).
-
Look
Re: Testing Crab
Hi
Some independent tests (ponder on):
http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=17608
It seems Crab 1.0 beta is just one or two points weaker than SF 1.7.1. This is really surprising to me; I still expect the current Crab to be weaker than this. But it seems promising to me. For now I am trying individual changes to see what happens.
-
mcostalba
Re: Testing Crab
On 600 games you can more or less reliably spot differences of 15-20 ELO, not less.
I suggest you run the test yourself, compiling both Crab and 1.7.1, so as to remove the possible noise from different compiler speeds.
No need to say, the idea of testing single changes is the right way to go.
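The 15-20 ELO figure for 600 games can be sanity-checked with a rough calculation. The sketch below is only an illustration, not anything taken from Stockfish or from a testing tool: the function names, the 40% draw rate and the 95% confidence level (z = 1.96) are all assumptions, and it uses the usual logistic Elo model with a normal approximation for the match score.

```python
import math


def elo_from_score(score):
    """Convert a score fraction in (0, 1) to an Elo difference (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_margin(wins, draws, losses, z=1.96):
    """Return (elo_estimate, half_width) for a match result.

    Hypothetical helper: z=1.96 assumes a ~95% confidence level, and the
    normal approximation assumes the score is not close to 0 or 1.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    low = elo_from_score(score - z * se)
    high = elo_from_score(score + z * se)
    return elo_from_score(score), (high - low) / 2.0


# Assumed example: 600 games between equal engines with 40% draws.
estimate, margin = elo_margin(wins=180, draws=240, losses=180)
print(f"Elo difference: {estimate:+.1f} +/- {margin:.1f}")
```

Under these assumptions the half-width comes out at roughly 20 Elo, which is in line with the estimate above: a one or two point gap measured over a few hundred games is well inside the noise.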
-
Look
Re: Testing Crab
Well, I already know what I have done in the code; in some places I tried some (very) aggressive changes to code which has very likely been tuned. In this regard, the few logical changes, if revised, could make a difference; that's my hope at least.
mcostalba wrote: No need to say the idea of testing single changes is the right way to go
Although not the fastest.
But as a general observation on Stockfish: I remember the Glaurung days; the coding was way above my understanding, but seemed really well established. With Stockfish you tried some novel and aggressive ideas which evidently have worked. But the comments are what has made SF what it is now. I would compare that to K&R in C, which has exceptional comments for highly efficient and brilliant code. Nevertheless, don't forget that despite its great strength, SF has some serious and fundamental weaknesses in several areas.
-
mcostalba
Re: Testing Crab
Look wrote: Nevertheless, dont forget that despite great strength, SF has some serious and fundamental weaknesses in several areas.
I never forget this
...but, simply, I am not able to fix those weaknesses...
-
Look
Re: Testing Crab
Well, in my first post in this section I specifically made two suggestions about eval tuning and run-time data. If you ever decide to address some of those issues, I assume you can pick a long-term project alongside casual ideas and try to improve it. Note that I don't specifically mean you or your team; anyone interested in these areas can give these a try. The point is that there is a lot of novel stuff to go for, but many people I see here want to reinvent the wheel.
-
lech
- Posts: 1170
- Joined: Sun Feb 14, 2010 10:02 pm
Re: Testing Crab
In many areas, the code is dirty speculation. Is it possible to do it differently? Probably, but unfortunately there is pressure for ELO growth. For now, developers are fighting mainly in technical areas, because they feel more comfortable there.