Stockfish testing: one question
Moderator: Ras
-
Jouni
- Posts: 3792
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Stockfish testing: one question
AFAIK all testing so far has been self-play. Has there been any serious consideration of testing against foreign engines? There is no shortage of strong and free opponents. Let's use Rybka, Critter, Houdini, Spike, Protector, etc. Of course it would take some days to get a reference score for SF3. After that, might it be more effective at finding real improvements?
Jouni
-
zamar
- Posts: 613
- Joined: Sun Jan 18, 2009 7:03 am
Re: Stockfish testing: one question
- So far I haven't seen a single case where a patch did well in self-play but failed against other opponents. If such cases exist, they are very rare.
- A gauntlet requires 2x more games, and the error bars are still sqrt(2) times higher. A very bad trade.
- In self-play the Elo change is around 2x what it is in matches against other engines. This is a very good thing for detecting small improvements. To get the same resolution in gauntlets, we would need 4x more games.
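The arithmetic behind these points can be sketched as follows. This is only an illustration, not the method used by the Stockfish testing framework: the function names, the 1.96-sigma threshold, the 5 Elo example patch, and the per-game standard deviation of 0.5 (a no-draw upper bound) are all assumptions made for the example. It assumes the logistic Elo model and that the standard error of a match score shrinks as 1/sqrt(games).

```python
import math

def expected_score(elo_diff):
    """Expected score under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96, per_game_sd=0.5):
    """Games needed before an edge of elo_diff exceeds z standard errors.

    per_game_sd=0.5 is an assumed upper bound (no draws); draws lower it.
    elo_diff must be non-zero.
    """
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((z * per_game_sd / edge) ** 2)

def standard_error(games, per_game_sd=0.5):
    """Standard error of the mean score after the given number of games."""
    return per_game_sd / math.sqrt(games)

def gauntlet_difference_error(games_per_version, per_game_sd=0.5):
    """Error of (new score - old score) when each version plays its own gauntlet."""
    se = standard_error(games_per_version, per_game_sd)
    # Two independent measurements: their variances add.
    return math.sqrt(se ** 2 + se ** 2)

# Hypothetical patch: worth 5 Elo in self-play, about half that in a gauntlet.
self_play_games = games_needed(5.0)
gauntlet_games = games_needed(2.5)
print(gauntlet_games / self_play_games)  # close to 4

# Each version plays n gauntlet games (2n in total) versus n self-play games.
n = 10000
print(gauntlet_difference_error(n) / standard_error(n))  # about 1.414
```

Under these assumptions, halving the measurable Elo difference roughly quadruples the number of games needed for the same confidence, and comparing two separately measured gauntlet scores costs twice the games while leaving an error sqrt(2) times larger.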
Joona Kiiski
-
Uri Blass
- Posts: 11150
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Stockfish testing: one question
Note that I have also seen no example of A>B>C>A in self-testing.
I made an improvement in Stockfish's mobility evaluation by changing the mobility array.
Let's call the mobility vector M.
I simply changed that array; let's call the new vector M+d.
I thought of changing it again in the same direction and testing M+3d against M+d, but got an objection based on the claim that I might fall into the A>B>C>A trap and that they cannot run regression tests for every change.
I would like to know whether there is a single case in computer chess where somebody got a significant result of A beating B, B beating C, and C beating A.
In theory it can happen, but I do not know of a single case.
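One way to look for such a case in match data is sketched below. It is a hedged illustration only: the function names, the 1.96-sigma significance threshold, and the win/loss/draw counts in the usage lines are hypothetical and do not come from any real test.

```python
import math
from itertools import permutations

def significant_win(wins, losses, draws, z=1.96):
    """True if the first side's score exceeds 50% by more than z standard errors."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the observed mean score.
    var = (wins * (1.0 - score) ** 2
           + losses * score ** 2
           + draws * (0.5 - score) ** 2) / n
    return score - 0.5 > z * math.sqrt(var / n)

def find_cycle(results):
    """Return (a, b, c) if a beats b, b beats c and c beats a, all significantly.

    results maps (x, y) -> (wins, losses, draws) counted from x's point of view.
    Returns None when no such triple exists.
    """
    names = sorted({name for pair in results for name in pair})
    for a, b, c in permutations(names, 3):
        pairs = ((a, b), (b, c), (c, a))
        if all(p in results and significant_win(*results[p]) for p in pairs):
            return (a, b, c)
    return None

# Made-up counts, purely to exercise the check.
cyclic = {("A", "B"): (600, 400, 1000),
          ("B", "C"): (600, 400, 1000),
          ("C", "A"): (600, 400, 1000)}
transitive = {("A", "B"): (600, 400, 1000),
              ("B", "C"): (600, 400, 1000),
              ("A", "C"): (600, 400, 1000)}
print(find_cycle(cyclic))
print(find_cycle(transitive))
```

Requiring all three results to be individually significant is a deliberately strict choice: with noisy matches, apparent cycles made of insignificant results are expected by chance and say nothing about real non-transitivity.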
-
zamar
- Posts: 613
- Joined: Sun Jan 18, 2009 7:03 am
Re: Stockfish testing: one question
Uri Blass wrote:
Note that I have also seen no example of A>B>C>A in self-testing.
I made an improvement in Stockfish's mobility evaluation by changing the mobility array.
Let's call the mobility vector M.
I simply changed that array; let's call the new vector M+d.
I thought of changing it again in the same direction and testing M+3d against M+d, but got an objection based on the claim that I might fall into the A>B>C>A trap and that they cannot run regression tests for every change.
I would like to know whether there is a single case in computer chess where somebody got a significant result of A beating B, B beating C, and C beating A.
In theory it can happen, but I do not know of a single case.

I think it's completely logical to schedule M+3d against M+d. But I haven't objected to this on any occasion.
Joona Kiiski
-
Uri Blass
- Posts: 11150
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Stockfish testing: one question
Correct.
It was Marco's opinion, and because Marco has the final word, I decided not even to try it.