YAFTS - Yet Another Fast Testing Scheme

Uri Blass · Post by **Uri Blass** » Sun Apr 19, 2009 12:28 pm

Here is an interesting variant of the test.

Use exactly the same positions but let Rybka 3 search for 1 hour on every position (this will take 4830/24 ≈ 201 days of computer time).

Decide that the programs need to find not the game's move but Rybka's move.
I guess that the test may be good at predicting the rating of chess programs, as long as you leave Rybka itself out.

I also suspect that optimizing the evaluation to find, after one second, as much as possible of what Rybka can find in one hour may improve chess programs, including Rybka itself (4830 positions may be too few, as Don suggests in one of his posts in this thread).

Uri
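
A minimal sketch of how the score proposed above could be computed, assuming each position is keyed by an identifier (for example a FEN string) and each move is a coordinate string such as "e2e4". The name `match_rate` and the dictionary layout are illustrative assumptions, not something given in the thread; the only figures taken from the post are the 4830 positions and the 1 hour of oracle search per position.

```python
from typing import Dict


def match_rate(engine_moves: Dict[str, str], oracle_moves: Dict[str, str]) -> float:
    """Fraction of shared positions where the engine picks the oracle's move."""
    common = [pos for pos in oracle_moves if pos in engine_moves]
    if not common:
        raise ValueError("no positions in common")
    hits = sum(1 for pos in common if engine_moves[pos] == oracle_moves[pos])
    return hits / len(common)


# Cost of building the oracle, using the numbers from the post.
N_POSITIONS = 4830
ORACLE_HOURS_PER_POSITION = 1
oracle_days = N_POSITIONS * ORACLE_HOURS_PER_POSITION / 24  # 201.25 days

# Toy data (hypothetical): the engine agrees with the oracle on one of two positions.
oracle = {"pos-1": "e2e4", "pos-2": "d2d4"}
engine = {"pos-1": "e2e4", "pos-2": "g1f3"}
rate = match_rate(engine, oracle)
```

Whether a higher match rate actually tracks playing strength is exactly what the thread disputes, so this only measures agreement with the oracle.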

bob · Post by **bob** » Sun Apr 19, 2009 5:56 pm

Uri Blass wrote:Here is an interesting variant of the test.

I think that the idea of trying to tune an evaluation for a one second search so that it matches the moves played by a one hour search is so far beyond flawed, it takes sunlight 6 months to get from flawed to that idea. Tactics can _not_ be addressed by positional attributes in today's programs... Tuning to make this appear to happen will completely wreck a chess program.

Uri Blass · Post by **Uri Blass** » Sun Apr 19, 2009 6:33 pm

bob wrote:
I think that the idea of trying to tune an evaluation for a one second search so that it matches the moves played by a one hour search is so far beyond flawed, it takes sunlight 6 months to get from flawed to that idea. Tactics can _not_ be addressed by positional attributes in today's programs... Tuning to make this appear to happen will completely wreck a chess program.

Programs can find better positional moves by deeper search, and this is also true for programs with an untuned evaluation, so I see no reason to assume that the idea does not work.

The only way to know whether the idea works is to try it and compare the results of many games with the result of the test.

I do not say that it gives the correct result for every change, but if it gives the correct result for a big majority of the changes then it can be productive, because it is faster than games (analyzing 100,000 positions at 1 second per move is faster than playing 100,000 games).

Uri
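
The speed argument can be put into rough numbers. This is only a back-of-the-envelope sketch: the 100,000 positions and the 1 second per move come from the post, while the 80 plies per game is a hypothetical figure chosen for illustration, and the game estimate assumes both sides use the same 1 second per move.

```python
def position_test_seconds(n_positions: int, seconds_per_position: float = 1.0) -> float:
    """Total time to analyze every position once."""
    return n_positions * seconds_per_position


def game_test_seconds(n_games: int, plies_per_game: int, seconds_per_move: float) -> float:
    """Total time to play the games, counting every ply by either side."""
    return n_games * plies_per_game * seconds_per_move


positions_time = position_test_seconds(100_000)          # 100,000 s, about 27.8 hours
games_time = game_test_seconds(100_000, 80, 1.0)         # assumes 80 plies per game
speedup = games_time / positions_time                    # equals the assumed plies per game
```

Under these assumptions the position test is faster by a factor equal to the average game length in plies, whatever that length turns out to be.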

MattieShoes · Post by **MattieShoes** » Mon Apr 20, 2009 12:16 am

I was thinking the Deep Thought eval tuning stuff might be applicable here, since they're essentially rating different evaluation functions based on a series of positions, and that seems like mostly what you'd get doing 1 second searches. They were using the GM move as the "oracle" rather than Rybka analysis, obviously.

http://www.tim-mann.org/DT_eval_tune.txt

diep · Post by **diep** » Mon Apr 20, 2009 2:12 am

Uri Blass wrote:Here is an interesting variant of the test.

The problem with your test method is that you favour engines with little chess knowledge over engines with a lot of chess knowledge. Marc is doing it a lot better there.

Rybka as an engine is good at avoiding mistakes; that is not the same as finding the superior positional move that distinguishes 2600-rated correspondence players from the 2400 guys who just take over what their TFT shows.

You should give Marc credit for having found yet another method to prove that chess knowledge works.

bob · Post by **bob** » Mon Apr 20, 2009 3:52 am

Uri Blass wrote:
Programs can find better positional moves by deeper search, and this is also true for programs with an untuned evaluation, so I see no reason to assume that the idea does not work.

The only way to know whether the idea works is to try it and compare the results of many games with the result of the test.

I do not say that it gives the correct result for every change, but if it gives the correct result for a big majority of the changes then it can be productive, because it is faster than games (analyzing 100,000 positions at 1 second per move is faster than playing 100,000 games).

Uri

First question. What percentage of the moves played during a game of chess are made because of a tactical necessity? Those you can't possibly tune an evaluation to find in a 1 second search...

bob · Post by **bob** » Mon Apr 20, 2009 3:53 am

MattieShoes wrote: I was thinking the Deep Thought eval tuning stuff might be applicable here, since they're essentially rating different evaluation functions based on a series of positions, and that seems like mostly what you'd get doing 1 second searches. They were using the GM move as the "oracle" rather than Rybka analysis, obviously.

http://www.tim-mann.org/DT_eval_tune.txt

Yes, but they screened the tactical moves out, so that only positional issues were left to deal with...

MattieShoes · Post by **MattieShoes** » Mon Apr 20, 2009 6:16 am

That's kind of what I was getting at. They went through a lot of work to make their eval tuning work, and the paper details some of the pitfalls, like culling positions where the chosen move is wildly different in score than the "best" move, and how deeper searches yield better results. The functions they were using to measure quality of eval could be used to rank quality of different engines just as easily.

They also point out that the tuning helped but the most "tuned" versions underperformed. I'm guessing the eval was getting the right answers for the wrong reasons then, so even with care, you're likely to get outliers, where their strength is not well represented by their score.
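
The culling step mentioned above can be sketched as a simple filter. This is not the procedure from the Deep Thought paper, only an illustration of the idea: the `ScoredPosition` layout, the centipawn units, and the 100-centipawn threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredPosition:
    fen: str                 # position identifier (placeholder strings below)
    oracle_move_score: int   # search score of the oracle's move, in centipawns
    best_move_score: int     # search score of the engine's own best move


def cull_large_gaps(positions: List[ScoredPosition], max_gap_cp: int = 100) -> List[ScoredPosition]:
    """Keep positions where the oracle's move and the engine's best move
    score within max_gap_cp of each other; a larger gap suggests the
    choice is tactical rather than positional."""
    return [
        p for p in positions
        if abs(p.best_move_score - p.oracle_move_score) <= max_gap_cp
    ]


sample = [
    ScoredPosition("pos-a", oracle_move_score=20, best_move_score=35),
    ScoredPosition("pos-b", oracle_move_score=-50, best_move_score=300),
]
kept = cull_large_gaps(sample)
```

Here only "pos-a" survives, since the 350-centipawn gap on "pos-b" exceeds the assumed threshold.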

Uri Blass · Post by **Uri Blass** » Mon Apr 20, 2009 6:39 am

diep wrote:
The problem with your test method is that you favour engines with little chess knowledge over engines with a lot of chess knowledge. Marc is doing it a lot better there.

Rybka as an engine is good at avoiding mistakes; that is not the same as finding the superior positional move that distinguishes 2600-rated correspondence players from the 2400 guys who just take over what their TFT shows.

You should give Marc credit for having found yet another method to prove that chess knowledge works.

I disagree.

I have an ICCF rating of more than 2600 and I rely mainly on chess engines (of course, I used an average of more than one hour per move).
Most of the moves that helped me to get an ICCF rating above 2600 were the result of long analysis by chess engines, and that was before Rybka.

I do not believe in the theory that Rybka is only good at avoiding mistakes. I believe that Rybka is simply good at finding better positional moves if you let it run for a long time.

I have one example from analyzing a quiet theory position in the Spanish defence, with all 32 pieces on the board, where Rybka changed her mind to the theory move after a long time.

Uri

Marc Lacrosse · Post by **Marc Lacrosse** » Mon Apr 20, 2009 11:18 pm

Don wrote: I am doing something very much like this with Larry Kaufman.

We did a bunch of work to throw out positions that are less relevant such as positions that are too easy. If several programs including weak and strong programs find the move, we consider it too easy.

I just did a little experimentation in the same direction.

In my initial investigation, analysis of 4830 positions led to a rating estimate with a median error of 58 Elo points.

I considered that positions where either all tested engines or none of them were able to find the good move were uninteresting for ranking purposes.

Having discarded these positions, I applied the same linear regression analysis to the remaining positions. This led to a clearly improved estimate: the median error of the estimated Elo dropped from 58 to 47.4 Elo points, with a very good correlation between the CCRL "true" Elo and the new estimate (correlation coefficient r = 0.92).
At the same time, the number of positions (and thus the analysis time) was reduced by a little more than 30 percent (3376 positions versus 4830 initially).

Interesting ...

Marc
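
The filtering and regression described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Marc's actual code: the function names, the boolean results table, and the toy engines with their Elo values are all hypothetical. Only the 4830 and 3376 position counts come from the post.

```python
from statistics import median
from typing import Dict, List, Tuple


def discriminating_positions(results: Dict[str, List[bool]]) -> List[int]:
    """Indices of positions solved by some engines but not all of them.
    results[engine][i] is True if that engine found the move on position i."""
    n_engines = len(results)
    n_positions = len(next(iter(results.values())))
    keep = []
    for i in range(n_positions):
        solved = sum(1 for r in results.values() if r[i])
        if 0 < solved < n_engines:
            keep.append(i)
    return keep


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need two or more paired points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data (hypothetical engines and ratings).
results = {
    "A": [True, True, False, True],
    "B": [True, False, False, True],
    "C": [True, False, False, False],
}
known_elo = {"A": 2900.0, "B": 2800.0, "C": 2700.0}

keep = discriminating_positions(results)  # position 0 (all solve) and 2 (none solve) are dropped
scores = {e: float(sum(r[i] for i in keep)) for e, r in results.items()}
engines = sorted(results)
slope, intercept = fit_line([scores[e] for e in engines], [known_elo[e] for e in engines])
errors = [abs(slope * scores[e] + intercept - known_elo[e]) for e in engines]
median_error = median(errors)

# Reduction in test size reported in the post.
reduction = 1 - 3376 / 4830
```

On real data the median error would not be zero as it is on this perfectly linear toy set; the post reports 47.4 Elo points after filtering.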
