Questions for the Stockfish team


bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

Daniel Shawul wrote:Oh, so now you are afraid of the result, so you start complaining that it should be 30 minutes per game, so that it would take decades. People are not stupid and will see the light at the end of the tunnel!

Just specify your conditions here so that we will get this crap done and dusted once and for all.
Daniel, I am not afraid of _anything_. So please get that chip off your shoulder, grow up a bit, and stop this nonsense. If you had read Beal's paper before diving into a discussion you knew _absolutely_ nothing about, most of this would not be taking place. But since you did no research, just formed an uninformed opinion and dove in, you are going to make a ton of mistakes.

I am reporting what Beal found. I have also found the same thing. And others who are using Crafty's skill=1 for their rating lists are finding the _same_ thing. So lose the chip, pay attention, do the research, and participate usefully. I doubt, at the moment, that you could even recognize the tunnel, much less the light at the end of it, since you have done no research at all.

My conditions are:

(a) a purely random eval, exactly as is done in Crafty, except that there I can only go down to 1% real / 99% random, while in recent tests I have been using 100% random.

(b) reasonable games, not game-in-one-second or game-in-five-seconds. This effect does depend on reasonable depth, as Beal reported and as I have mentioned several times now. I have even explained that my "best fix" so far has been to add a CPU-burning loop in evaluate() to slow the search down, which reduces the depth, because less depth reduces the "Beal effect." A sketch of both ideas follows.
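For concreteness, here is a minimal sketch of those two conditions, assuming a simple linear blend of real and random scores. This is an illustration only, not Crafty's actual code; real_evaluate(), RANDOM_SPAN, and burn_iterations are hypothetical names.

```cpp
#include <cstdlib>

constexpr int RANDOM_SPAN = 200;     // width of the random term, in centipawns

volatile long burn_sink = 0;         // volatile so the compiler keeps the loop

// Stand-in for the engine's normal evaluation (hypothetical stub).
int real_evaluate() { return 25; }   // e.g. +0.25 pawns for the side to move

// skill = 100 -> pure real eval; skill = 1 -> 1% real / 99% random;
// skill = 0 -> the 100% random eval used in the recent tests.
int evaluate(int skill, long burn_iterations)
{
    // Condition (b): burn CPU cycles so the search reaches less depth.
    for (long i = 0; i < burn_iterations; ++i)
        burn_sink += i;

    // Condition (a): blend the real score with a bounded random term.
    int noise = std::rand() % (RANDOM_SPAN + 1) - RANDOM_SPAN / 2;
    return (skill * real_evaluate() + (100 - skill) * noise) / 100;
}
```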

Given those things, you ought to be able to make this work and run a real test, once you figure out how many games you need. You complain about 67-0, which is in fact an expected result. So until you have some idea of what this is all about, there's no need to run any tests or anything else, except to study Beal's paper and the previous discussions in the general forum, and then set about testing once you understand how to actually run a meaningful test. Hint: 67 games is _not_ meaningful between two opponents this far apart in strength. But most of us understand that already and don't jump into a discussion half-cocked, not knowing what to expect. A quick calculation shows why:
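Under the standard Elo model, the expected score per game is 1 / (1 + 10^(-diff/400)). Plugging in a rating gap of the size this thread suggests (the 900 points below is an assumption for illustration, and draws are ignored), a 67-0 sweep is not just possible but likely:

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    double diff = 900.0;   // assumed Elo gap, e.g. roughly 2700 vs. 1800
    double expected = 1.0 / (1.0 + std::pow(10.0, -diff / 400.0));
    double sweep = std::pow(expected, 67);   // chance the favorite wins all 67

    std::printf("expected score per game: %.4f\n", expected);  // ~0.9944
    std::printf("P(67-0 sweep):           %.2f\n", sweep);     // ~0.69
    return 0;
}
```

In other words, with a gap that large the favorite sweeps roughly two 67-game matches out of three, so a 67-0 result distinguishes nothing.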
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
Joost Buijs wrote:I do understand that with infinite depth you don't need an eval at all. With a perfect evaluation function, a 1-ply search would be sufficient as well. This is just theoretical.

It is my feeling that everything depends on the quality of the evaluation. When I look at my own engine, it has an evaluation function comparable to a 1600 player, but it plays at the 2850 level just because it is very good at tactics. I'm pretty sure that when I'm able to improve the evaluation function, its Elo will go up.
OK, some background. It turns out that if you replace Crafty's evaluation with a pure random number, it plays well above 2,000 Elo. If you also disable all the search extensions and reductions, null-move and such, you still can't get it below 1800. There has been a long discussion about this, something I call "the Beal effect," since Don Beal first reported on this particular phenomenon many years ago. So a basic search + random eval gives an 1800 player. Full search + full eval adds 1,000 to that. How much comes from each? Unknown. But I have watched many, many Stockfish vs. Crafty games, and the deciding issue does not seem to be evaluation. We seem to get hurt by endgame search depth more than anything...
And that is where most (all?) engines have the biggest holes in evaluation... the endgame!

Miguel
I have never heard _anyone_ say that Crafty's endgame evaluation is poor. In fact, several GM players have said exactly the opposite. Most engines ignore candidate passed pawns and such. We don't.
Sorry, the endgame analysis of any engine has huge holes, including Crafty's.

Miguel
The issue is whether you believe Stockfish's endgame eval is far superior to Crafty's. That was the implication I addressed. I do _not_ believe this, and have, in fact, noticed that we are getting out-searched, for whatever reason. As I had mentioned...

I can watch two programs play, and display analysis, and figure out who is searching deeper...

My comment was based on _watching_ games, where we get out-searched and then end up losing something tactically, not positionally.
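As an aside on the Beal effect described above: Beal's explanation is that a random leaf evaluation, once backed up through the search, behaves like a mobility bonus, since a node where the side to move has more legal moves gets more random samples to maximize over. A toy illustration of that single mechanism (a sketch of the idea, not Beal's code):

```cpp
#include <algorithm>
#include <cstdio>
#include <random>

int main()
{
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> leaf_eval(0.0, 1.0);

    for (int moves : {2, 10, 30}) {   // N = number of legal moves at the node
        const int trials = 100000;
        double sum = 0.0;
        for (int t = 0; t < trials; ++t) {
            double best = 0.0;
            for (int m = 0; m < moves; ++m)
                best = std::max(best, leaf_eval(rng));  // one-ply max over N random leaves
            sum += best;
        }
        // E[max of N uniforms] = N / (N + 1): more mobility, higher backed-up score.
        std::printf("N = %2d moves: mean backed-up value %.3f\n", moves, sum / trials);
    }
    return 0;
}
```

Deeper searches compound the effect, which fits the observation that the phenomenon depends on reasonable depth.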
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Questions for the Stockfish team

Post by Daniel Shawul »

Patience, my friend... games are running and I guarantee you will like it ;)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

BubbaTough wrote:
jwes wrote:
bob wrote: My comment was based on _watching_ games, where we get out-searched and then end up losing something tactically, not positionally.
I have been seeing that too. There are some endgame test positions Stockfish solves 100x faster than Crafty.
When it's 100x faster, it's often because the evaluation is giving hints to the search. These two things are not as independent as most people seem to imply.

-Sam
In the cases I see, this is not what is happening. Stockfish has way-wrong evals, but is going significantly deeper, and suddenly, halfway through the root move list, says "aha"... I have not yet spent any time trying to see what the differences are, but there are definitely some differences.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Questions for the Stockfish team

Post by Milos »

jwes wrote: I think the greatest edge that human grandmasters still have over computers is that they can evaluate an endgame as won or drawn, while a computer might give it a score of 1.5; the GM might even sacrifice material to bring about such an endgame.
GMs have an advantage only in the analysis of particular endgames (the bad ones for engines). In gameplay, since engines see so far ahead, they will (statistically speaking) rarely end up in the endgames they play badly and much more often in the endgames they play well.
The problem is that people notice only those endgames where engines perform poorly, not those where they perform great, even though statistically the first category is far less common. But that's a common thing in human nature...
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: Questions for the Stockfish team

Post by BubbaTough »

Milos wrote: GMs have an advantage only in the analysis of particular endgames (the bad ones for engines). In gameplay, since engines see so far ahead, they will (statistically speaking) rarely end up in the endgames they play badly and much more often in the endgames they play well.
That is funny, because I think the opposite is true, at least when it comes to entities (GMs or engines) that disagree on the assessment. The types of positions I have been referring to are rarely solvable via depth of search, particularly when it comes to transposing into them deeper in the tree. If one engine thinks opposite-bishop endgames are pretty drawish, and the other engine does not, then when those two engines play they are MORE likely to end up in that type of position, because both sides are aiming for it (one side thinks it's winning because it's up a pawn, while the other side thinks it's totally drawn, so both are happy).
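For concreteness, the disagreement might look like the following hypothetical sketch; the rule, names, and numbers are illustrative, not taken from any real engine.

```cpp
// Engine A scales pure opposite-colored-bishop endgames toward a draw;
// engine B applies no such rule. Scores are centipawns for the pawn-up side.
int scale_endgame_score(int raw_score, bool opposite_colored_bishops,
                        bool engine_scales_ocb)
{
    if (opposite_colored_bishops && engine_scales_ocb)
        return raw_score / 4;   // engine A: "+1.00 up a pawn" becomes ~+0.25
    return raw_score;           // engine B: keeps the full score for the pawn
}
```

With a raw score of +100 for the side a pawn up, engine B (the pawn-up side) sees +100 and heads for the position, while engine A (the pawn-down side) sees only -25 rather than -100, so it is happy to steer there as well.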

-Sam
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Questions for the Stockfish team

Post by Milos »

BubbaTough wrote:If one engine thinks opposite-bishop endgames are pretty drawish, and the other engine does not, then when those two engines play they are MORE likely to end up in that type of position, because both sides are aiming for it (one side thinks it's winning because it's up a pawn, while the other side thinks it's totally drawn, so both are happy).
Nope, it's not more likely, since the logic is dual.
If in reality it's a draw with engine B a pawn up, and engine A wrongly evaluates it as, e.g., -1.5 while engine B correctly evaluates it as 0.0 (even though it is a pawn up), then engine A will tend to avoid this position and find something better (which won't be hard, since the real evaluation is better than -1.5), and engine B will also try to improve by avoiding the draw.
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: Questions for the Stockfish team

Post by BubbaTough »

Milos wrote:
BubbaTough wrote:If one engine thinks opposite-bishop endgames are pretty drawish, and the other engine does not, then when those two engines play they are MORE likely to end up in that type of position, because both sides are aiming for it (one side thinks it's winning because it's up a pawn, while the other side thinks it's totally drawn, so both are happy).
Nope, it's not more likely, since the logic is dual.
If in reality it's a draw with engine B a pawn up, and engine A wrongly evaluates it as, e.g., -1.5 while engine B correctly evaluates it as 0.0 (even though it is a pawn up), then engine A will tend to avoid this position and find something better (which won't be hard, since the real evaluation is better than -1.5), and engine B will also try to improve by avoiding the draw.
Usually when I see this happen (which is A LOT in my testing), the one that is a pawn up is misevaluating. So engine B thinks its position is worth +1.5, and engine A thinks it is drawish. They get to the position because engine B had a good position (say +0.8) and allows engine A to escape into a drawish position, because engine B incorrectly thinks it is won.

This happens over and over and over in my testing. I am not sure exactly what you do related to computer chess, but I assume you do something, given all your posts and your strong confidence in your conclusions, so I am surprised your observations are so different from mine.

-Sam
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Questions for the Stockfish team

Post by jwes »

Milos wrote:
jwes wrote: I think the greatest edge that human grandmasters still have over computers is that they can evaluate an endgame as won or drawn, while a computer might give it a score of 1.5; the GM might even sacrifice material to bring about such an endgame.
GMs have an advantage only in the analysis of particular endgames (the bad ones for engines). In gameplay, since engines see so far ahead, they will (statistically speaking) rarely end up in the endgames they play badly and much more often in the endgames they play well.
The problem is that people notice only those endgames where engines perform poorly, not those where they perform great, even though statistically the first category is far less common. But that's a common thing in human nature...
I wrote evaluate, not analyze. There is a difference.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Questions for the Stockfish team

Post by Milos »

jwes wrote:I wrote evaluate, not analyze. There is a difference.
Engines don't evaluate; engines search. GMs evaluate. Eventually people will understand this.