Testing Crab

Look · Post by **Look** » Sun Jun 13, 2010 4:39 pm

Hi

I have made a few changes to StockFish 1.7.1 and called it Crab 1.0 beta. Basically it's just a different personality of StockFish, and I assume it is weaker than SF. Now, if you could pick one of the changes from Crab and test it against SF, which one would you pick, and why? Can you suggest improvements for that idea?

Here is the thread:
http://talkchess.com/forum/viewtopic.ph ... ew=threads

Regards.

mcostalba · Post by **mcostalba** » Sun Jun 13, 2010 5:14 pm

Look wrote:
Hi

I have made a few changes to StockFish 1.7.1 and called it Crab 1.0 beta. Basically it's just a different personality of StockFish, and I assume it is weaker than SF. Now, if you could pick one of the changes from Crab and test it against SF, which one would you pick, and why? Can you suggest improvements for that idea?

Here is the thread:
http://talkchess.com/forum/viewtopic.ph ... ew=threads

Regards.

Hi Adam,

Please take no offence, but I would guess that nobody is going to do your homework for you.

If you don't test your changes, why should others?

And you don't need big hardware for it. I started with a single-core 32-bit machine; you just need time, more or less 1-2 days (as you already know).

I would guess that if you come here with a test result saying Crab is stronger than SF, people will be much more interested in committing time to reviewing your work. Of course the test should be serious; last time there was a guy who claimed a 150 Elo increase based on some "internal test", and of course it was just bullsh...

Seeing published, serious tests is a very important sign for me that I am not facing a mister wannabe, much more than reading his patches... but this is just my opinion.

Look · Post by **Look** » Sun Jun 13, 2010 5:43 pm

Well, the point about hardware is what matters, IMO. I already knew that testing at something like 1 min per game is advised. Nevertheless, since my machine is much slower than the hardware fans commonly use, I thought a different approach might be sound. Also, I was not aware of the 150 Elo points case; it seems unrealistic of course, but I did not make such claims.

My point is that there could be an interesting idea, but how can one make sure that, say, there is no bug in it or something similar? I think in these situations a critical look from those who know better what is going on might help.

mcostalba · Post by **mcostalba** » Sun Jun 13, 2010 5:48 pm

Look wrote:
Well, the point about hardware is what matters, IMO. I already knew that testing at something like 1 min per game is advised. Nevertheless, since my machine is much slower than the hardware fans commonly use, I thought a different approach might be sound. Also, I was not aware of the 150 Elo points case; it seems unrealistic of course, but I did not make such claims.

Well, a lot of people are testing with ultra-fast TC, so I think 1 minute is good even on slow hardware: if you reach search depth 12-13 in middlegame positions, I think it is more than enough (it is what Bob reaches on his super-cluster, and he has done it like that for decades).

Look · Post by **Look** » Mon Jun 14, 2010 6:07 pm

Hi

Some independent tests, with ponder on:
http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=17608

It seems Crab 1.0 beta is just one or two points weaker than SF 1.7.1. This is really surprising to me; I still expect the current Crab to be weaker than this. But it seems promising. For now I am trying individual changes to see what happens.

mcostalba · Post by **mcostalba** » Mon Jun 14, 2010 6:33 pm

Look wrote:

Hi

Some independent tests, with ponder on:
http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=17608

It seems Crab 1.0 beta is just one or two points weaker than SF 1.7.1. This is really surprising to me; I still expect the current Crab to be weaker than this. But it seems promising. For now I am trying individual changes to see what happens.

With 600 games you can more or less reliably spot differences of 15-20 Elo, not less.
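As a rough check on the 600-game figure, here is a minimal sketch of how the resolution of a match can be estimated. It assumes two equally strong engines, the standard logistic Elo model, a 95% confidence level, and a draw ratio of 35%; the draw ratio and the function names are illustrative assumptions, not values taken from the tests discussed in this thread.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert a score fraction (0 < score < 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin(games: int, draw_ratio: float, z: float = 1.96) -> float:
    """Approximate half-width, in Elo, of the confidence interval for an even match.

    draw_ratio is an assumed fraction of drawn games; z = 1.96 corresponds to ~95%.
    """
    # For equal engines the mean score is 0.5. A win or a loss deviates from
    # the mean by 0.5, a draw by 0, so the per-game variance is (1 - d) * 0.25.
    variance = (1.0 - draw_ratio) * 0.25
    std_error = math.sqrt(variance / games)
    # Translate the upper bound of the score interval into Elo.
    return elo_from_score(0.5 + z * std_error)


if __name__ == "__main__":
    for n in (600, 2400, 10000):
        print(f"{n:>6} games: +/- {elo_margin(n, draw_ratio=0.35):.1f} Elo")
```

Under these assumptions the margin for 600 games comes out at roughly 20 Elo, so a measured gap of one or two points cannot be told apart from zero. Because the margin shrinks with the square root of the number of games, halving it takes about four times as many games.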

I suggest you test it yourself, compiling both Crab and 1.7.1, so as to remove the possible noise from different compiler speeds.

Needless to say, the idea of testing single changes is the right way to go, although not the fastest.

Look · Post by **Look** » Mon Jun 14, 2010 7:12 pm

Well, I already know what I have done in the code: in some places I tried some (very) aggressive changes to code that has very likely been tuned. In this regard, the few logical changes, if revised, can make a difference; that's my hope at least.

But as a general observation on Stockfish: I remember the Glaurung days, when the coding was way above my understanding but seemed really well established. With Stockfish you tried some novel and aggressive ideas, which evidently have worked. But the comments are what have made SF what it is now. I may compare that to K&R in C, which has exceptional comments for highly efficient and brilliant code. Nevertheless, don't forget that despite its great strength, SF has some serious and fundamental weaknesses in several areas.

mcostalba · Post by **mcostalba** » Mon Jun 14, 2010 7:25 pm

Look wrote:Nevertheless, don't forget that despite its great strength, SF has some serious and fundamental weaknesses in several areas.

I never forget this

...but, simply, I am not able to fix those weaknesses...

Look · Post by **Look** » Mon Jun 14, 2010 7:35 pm

Well, in my first post in this section I specifically made two suggestions, about eval tuning and run-time data. If you ever decide to address some of those issues, I assume you can pick a long-term project alongside casual ideas and try to improve it. Note that I don't specifically mean you or your team; anyone interested in these areas can give these a try. The point is that there is a lot of novel stuff to go for, but many people I see here want to reinvent the wheel.

lech · Post by **lech** » Tue Jun 15, 2010 12:37 am

mcostalba wrote:
Look wrote:Nevertheless, don't forget that despite its great strength, SF has some serious and fundamental weaknesses in several areas.
I never forget this

...but, simply, I am not able to fix those weaknesses...

In many areas, the code is dirty speculation. Is it possible to do it differently? Probably, but unfortunately there is pressure for Elo growth. For now, developers are fighting mainly in technical areas, because they feel more comfortable there.
