YAFTS - Yet Another Fast Testing Scheme

Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: YAFTS - Yet Another Fast Testing Scheme

Post by Uri Blass »

Here is an interesting variant of the test.

Use exactly the same positions, but let Rybka 3 search for 1 hour on every position (it will take 4830/24 days of computer time, about 201 days).

Decide that the programs need to find not the game's move but Rybka's move.
I guess that the test may be good at predicting the rating of chess programs, as long as you leave Rybka out.

I also suspect that optimizing the evaluation to find after one second as much as possible of what Rybka can find in one hour may improve chess programs, including Rybka itself (4830 positions may be too few, as Don suggests in one of his posts in this thread).

Uri
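
As a rough illustration of the scoring idea above, here is a minimal Python sketch. It assumes each program's one-second move and the oracle's one-hour move are already available as plain strings; the function name `match_rate` and the sample moves are hypothetical and not taken from any real test run.

```python
def match_rate(engine_moves, oracle_moves):
    """Fraction of positions where the engine's move equals the oracle's move."""
    if len(engine_moves) != len(oracle_moves):
        raise ValueError("move lists must have the same length")
    if not oracle_moves:
        raise ValueError("need at least one position")
    hits = sum(1 for e, o in zip(engine_moves, oracle_moves) if e == o)
    return hits / len(oracle_moves)


# Hypothetical data: one move per test position, in coordinate notation.
oracle = ["e2e4", "g1f3", "d2d4", "c2c4"]  # long-search (oracle) choices
engine = ["e2e4", "g1f3", "d2d3", "c2c4"]  # one-second choices of the tested program
score = match_rate(engine, oracle)

# Cost of building the oracle: 4830 positions at one hour each, in days.
oracle_days = 4830 / 24
```

The score is just the share of matching moves, so it ignores how bad a non-matching move is; that is a simplification of this sketch.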
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: YAFTS - Yet Another Fast Testing Scheme

Post by bob »

Uri Blass wrote: Here is an interesting variant of the test.

Use exactly the same positions, but let Rybka 3 search for 1 hour on every position (it will take 4830/24 days of computer time, about 201 days).

Decide that the programs need to find not the game's move but Rybka's move.
I guess that the test may be good at predicting the rating of chess programs, as long as you leave Rybka out.

I also suspect that optimizing the evaluation to find after one second as much as possible of what Rybka can find in one hour may improve chess programs, including Rybka itself (4830 positions may be too few, as Don suggests in one of his posts in this thread).

Uri

I think that the idea of trying to tune an evaluation for a one second search so that it matches the moves played by a one hour search is so far beyond flawed, it takes sunlight 6 months to get from flawed to that idea. Tactics can _not_ be addressed by positional attributes in today's programs... Tuning to make this appear to happen will completely wreck a chess program.
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: YAFTS - Yet Another Fast Testing Scheme

Post by Uri Blass »

bob wrote:
Uri Blass wrote: Here is an interesting variant of the test.

Use exactly the same positions, but let Rybka 3 search for 1 hour on every position (it will take 4830/24 days of computer time, about 201 days).

Decide that the programs need to find not the game's move but Rybka's move.
I guess that the test may be good at predicting the rating of chess programs, as long as you leave Rybka out.

I also suspect that optimizing the evaluation to find after one second as much as possible of what Rybka can find in one hour may improve chess programs, including Rybka itself (4830 positions may be too few, as Don suggests in one of his posts in this thread).

Uri

I think that the idea of trying to tune an evaluation for a one second search so that it matches the moves played by a one hour search is so far beyond flawed, it takes sunlight 6 months to get from flawed to that idea. Tactics can _not_ be addressed by positional attributes in today's programs... Tuning to make this appear to happen will completely wreck a chess program.
Programs can find better positional moves by searching deeper, and this is also true for programs whose evaluation is not tuned, so I see no reason to assume that the idea does not work.

The only way to know whether the idea works is to try it and compare the results of many games with the results of the test.

I do not say that it gives the correct result for every change, but if it gives the correct result for the large majority of changes then it can be productive, because it is faster than playing games (analyzing 100,000 positions at 1 second per move is faster than playing 100,000 games).

Uri
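
The comparison proposed above could be sketched as follows. This is only an illustration under assumptions: each evaluation change is summarized by a pair (change in test score, change in Elo measured from games), the function name `agreement_rate` is hypothetical, and the numbers are invented for the example.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def agreement_rate(changes):
    """Share of changes where the test and the games point in the same direction.

    changes: list of (test_score_delta, game_elo_delta) pairs.
    """
    agree = sum(1 for test_delta, elo_delta in changes
                if sign(test_delta) == sign(elo_delta))
    return agree / len(changes)


# Hypothetical results for four evaluation changes (invented numbers).
changes = [(+0.012, +8.0), (-0.004, -3.0), (+0.006, -5.0), (-0.010, -12.0)]
rate = agreement_rate(changes)  # the third change is the only disagreement

# Wall-clock cost of the position test on one core:
# 100,000 positions at 1 second each, expressed in hours.
positions_hours = 100_000 * 1 / 3600
```

Whether an agreement rate is high enough to be useful depends on how noisy the game results themselves are, which this sketch does not model.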
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: YAFTS - Yet Another Fast Testing Scheme

Post by MattieShoes »

I was thinking the Deep Thought eval tuning stuff might be applicable here, since they're essentially rating different evaluation functions based on a series of positions, and that seems like mostly what you'd get doing 1-second searches. They were using the GM move as the "oracle" rather than Rybka analysis, obviously.

http://www.tim-mann.org/DT_eval_tune.txt
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: YAFTS - Yet Another Fast Testing Scheme

Post by diep »

Uri Blass wrote: Here is an interesting variant of the test.

Use exactly the same positions, but let Rybka 3 search for 1 hour on every position (it will take 4830/24 days of computer time, about 201 days).

Decide that the programs need to find not the game's move but Rybka's move.
I guess that the test may be good at predicting the rating of chess programs, as long as you leave Rybka out.

I also suspect that optimizing the evaluation to find after one second as much as possible of what Rybka can find in one hour may improve chess programs, including Rybka itself (4830 positions may be too few, as Don suggests in one of his posts in this thread).

Uri
The problem with your test method is that you favour engines with little chess knowledge over engines with a lot of chess knowledge. Marc is doing it a lot better there.

Rybka as an engine is good at avoiding mistakes; that is not the same as finding the superior positional move that distinguishes 2600-rated correspondence players from 2400 guys who just take over what their TFT shows.

You should give Marc credit for having found yet another method to prove that chess knowledge works.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: YAFTS - Yet Another Fast Testing Scheme

Post by bob »

Uri Blass wrote:
bob wrote:
Uri Blass wrote: Here is an interesting variant of the test.

Use exactly the same positions, but let Rybka 3 search for 1 hour on every position (it will take 4830/24 days of computer time, about 201 days).

Decide that the programs need to find not the game's move but Rybka's move.
I guess that the test may be good at predicting the rating of chess programs, as long as you leave Rybka out.

I also suspect that optimizing the evaluation to find after one second as much as possible of what Rybka can find in one hour may improve chess programs, including Rybka itself (4830 positions may be too few, as Don suggests in one of his posts in this thread).

Uri

I think that the idea of trying to tune an evaluation for a one second search so that it matches the moves played by a one hour search is so far beyond flawed, it takes sunlight 6 months to get from flawed to that idea. Tactics can _not_ be addressed by positional attributes in today's programs... Tuning to make this appear to happen will completely wreck a chess program.
Programs can find better positional moves by searching deeper, and this is also true for programs whose evaluation is not tuned, so I see no reason to assume that the idea does not work.

The only way to know whether the idea works is to try it and compare the results of many games with the results of the test.

I do not say that it gives the correct result for every change, but if it gives the correct result for the large majority of changes then it can be productive, because it is faster than playing games (analyzing 100,000 positions at 1 second per move is faster than playing 100,000 games).

Uri
First question. What percentage of the moves played during a game of chess are made because of a tactical necessity? Those you can't possibly tune an evaluation to find in a 1 second search...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: YAFTS - Yet Another Fast Testing Scheme

Post by bob »

MattieShoes wrote: I was thinking the Deep Thought eval tuning stuff might be applicable here, since they're essentially rating different evaluation functions based on a series of positions, and that seems like mostly what you'd get doing 1-second searches. They were using the GM move as the "oracle" rather than Rybka analysis, obviously.

http://www.tim-mann.org/DT_eval_tune.txt
Yes, but they screened the tactical moves out, so that all that was left were positional issues to deal with...
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: YAFTS - Yet Another Fast Testing Scheme

Post by MattieShoes »

That's kind of what I was getting at. They went through a lot of work to make their eval tuning work, and the paper details some of the pitfalls, like culling positions where the chosen move is wildly different in score than the "best" move, and how deeper searches yield better results. The functions they were using to measure quality of eval could be used to rank quality of different engines just as easily.

They also point out that the tuning helped but the most "tuned" versions underperformed. I'm guessing the eval was getting the right answers for the wrong reasons then, so even with care, you're likely to get outliers, where their strength is not well represented by their score.
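
The culling step mentioned above could look roughly like this. It is a sketch under assumptions, not the method from the Deep Thought paper: the `Position` record, the `cull` function and the 100-centipawn threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Position:
    pos_id: str             # hypothetical identifier of the test position
    oracle_move_score: int  # centipawns: score the search gives the oracle's move
    best_move_score: int    # centipawns: score of the search's own best move


def cull(positions, max_gap_cp=100):
    """Drop positions where the oracle's move and the search's best move
    differ by more than max_gap_cp centipawns (likely tactical or noisy)."""
    return [p for p in positions
            if abs(p.best_move_score - p.oracle_move_score) <= max_gap_cp]


# Hypothetical data: p2 has a 290 cp gap and is removed.
positions = [
    Position("p1", 20, 35),
    Position("p2", -250, 40),
    Position("p3", 0, 0),
]
kept_ids = [p.pos_id for p in cull(positions)]
```

A tighter threshold leaves fewer but quieter positions; where to set it is a judgment call this sketch does not settle.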
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: YAFTS - Yet Another Fast Testing Scheme

Post by Uri Blass »

diep wrote:
Uri Blass wrote: Here is an interesting variant of the test.

Use exactly the same positions, but let Rybka 3 search for 1 hour on every position (it will take 4830/24 days of computer time, about 201 days).

Decide that the programs need to find not the game's move but Rybka's move.
I guess that the test may be good at predicting the rating of chess programs, as long as you leave Rybka out.

I also suspect that optimizing the evaluation to find after one second as much as possible of what Rybka can find in one hour may improve chess programs, including Rybka itself (4830 positions may be too few, as Don suggests in one of his posts in this thread).

Uri
The problem with your test method is that you favour engines with little chess knowledge over engines with a lot of chess knowledge. Marc is doing it a lot better there.

Rybka as an engine is good at avoiding mistakes; that is not the same as finding the superior positional move that distinguishes 2600-rated correspondence players from 2400 guys who just take over what their TFT shows.

You should give Marc credit for having found yet another method to prove that chess knowledge works.
I disagree.

I have an ICCF rating of more than 2600 and I rely mainly on chess engines (of course, I used an average of more than one hour per move).
Most of the moves that helped me get an ICCF rating above 2600 were the result of long analysis by chess engines, and that was before Rybka.

I do not believe the theory that Rybka is only good at avoiding mistakes; I believe that Rybka is simply good at finding better positional moves if you let it run for a long time.

I have one example from analyzing a quiet theory position in the Spanish, with all 32 pieces on the board, where Rybka changed her mind to the theory move after a long time.

Uri
Marc Lacrosse
Posts: 511
Joined: Wed Mar 08, 2006 10:05 pm

Re: YAFTS - Yet Another Fast Testing Scheme

Post by Marc Lacrosse »

Don wrote: I am doing something very much like this with Larry Kaufman.

We did a bunch of work to throw out positions that are less relevant such as positions that are too easy. If several programs including weak and strong programs find the move, we consider it too easy.
I just did a little experimentation in the same direction.

In my initial investigation, analysis of 4830 positions led to a rating estimate with a median error of 58 Elo points.

I considered that positions where either all tested engines or none of them were able to find the good move were uninteresting for ranking purposes.

Having discarded these positions, I applied the same linear regression analysis to the remaining positions. This led to a clearly improved estimate: the median error of the estimated Elo dropped from 58 to 47.4 Elo points, with a very good correlation between the CCRL "true" Elo and the new estimate (correlation coefficient r = 0.92).
At the same time, the number of positions (and thus the analysis time) was reduced by a little more than 30 percent (3376 positions versus 4830 initially).

Interesting ...

Marc
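
The filtering and regression described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual analysis: the data layout (one list of solved/unsolved flags per position), the helper names and the toy numbers are assumptions, and the regression is a plain one-variable least-squares fit.

```python
from statistics import median


def discriminating(results):
    """Keep positions solved by at least one engine but not by all of them.

    results: dict mapping position id -> list of booleans, one per engine.
    """
    return [pid for pid, solved in results.items()
            if any(solved) and not all(solved)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def median_abs_error(xs, ys, a, b):
    """Median absolute difference between known and estimated ratings."""
    return median(abs(y - (a + b * x)) for x, y in zip(xs, ys))


# Hypothetical results for 4 positions and 3 engines (invented data).
results = {
    "p1": [True, True, True],     # solved by all: discarded
    "p2": [False, False, False],  # solved by none: discarded
    "p3": [True, False, True],
    "p4": [False, False, True],
}
kept = discriminating(results)

# Per-engine score = number of kept positions solved.
scores = [sum(results[pid][i] for pid in kept) for i in range(3)]
known_elo = [2700, 2600, 2800]  # hypothetical reference ratings

a, b = fit_line(scores, known_elo)
error = median_abs_error(scores, known_elo, a, b)

# Share of positions removed in the run reported above.
reduction = 1 - 3376 / 4830
```

The toy data is perfectly linear, so the error here is zero; real engine data would of course leave a residual such as the 47.4 Elo points reported above.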