GM Kaufman v. Novag Constellation 3.6 QR Odds (Moves 1-40)

Discussion of computer chess matches and engine tournaments.

Moderators: bob, hgm, Harvey Williamson

Steve B
Posts: 3697
Joined: Tue Jul 31, 2007 2:26 pm
Contact:

Re: GM Kaufman v. Novag Constellation 3.6 QR Odds Game

Post by Steve B » Thu Jul 08, 2010 10:52 am

lkaufman wrote:7.O-O (castles). Now I expect Connie to take another pawn, for fear that I might otherwise win one of them back! This is known as the "Compromised" Defense to the Evans Gambit (in normal chess). Although it has a bad reputation, my Rybka/IDeA analysis claims it is good enough for equality, although I think that in practice White wins more often than Black in this line. Of course this has little to do with the present rook-odds situation.
it would be interesting to see if the Evans is even in Connie's book
even if it is.. the book line would not run very deep and would be more than 25 years old, so certainly not current

after Connie replies to your 7th move.. I will start a new game from the normal position (QR in place) to see if she plays the same moves as in this game up to this point
I'm guessing she probably will

Move Later Today Regards
Steve

bob
Posts: 20478
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: GM Kaufman v. Novag Constellation 3.6 QR Odds Game

Post by bob » Thu Jul 08, 2010 3:46 pm

lkaufman wrote:Unlike you, I never play a gambit I don't believe to be sound, and I don't play Evan's Gambit (although I did as a teenager). But I believe it to be "almost sound", i.e. not worse than playing Black normally is.
My thinking, in the case of Valvo, was that I could "go down in flames" with a wild opening, or "get strangled" in a more normal opening. Given the choice of a lingering death or a quick one that gets some good counter-play for a while, my choice was obvious. In real games I didn't play much of that stuff unless I knew something about my opponent that suggested doing so.

lkaufman
Posts: 3680
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA
Contact:

Re: GM Kaufman v. Novag Constellation 3.6 QR Odds Game

Post by lkaufman » Thu Jul 08, 2010 6:53 pm

Bob, here's a question inspired by one of your above comments. You indicated that when Crafty was playing another engine, playing a gambit like the Evans was a bad idea, basically just losing a pawn. This implies that it takes more compensation to offset a pawn in engine vs. engine play than in human vs. human play. However, your scores for mobility and/or piece location must be very high in Crafty, since (for example) after 1.e4 c5 the position is already evaluated as nearly winning for White. Presumably these very high scores for dynamic factors (relative to material) scored well in your testing. So how do you reconcile these apparently contradictory facts? How can it be right to score unclear gambits as sound if the engine usually loses when playing them?

bob
Posts: 20478
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: GM Kaufman v. Novag Constellation 3.6 QR Odds Game

Post by bob » Thu Jul 08, 2010 9:19 pm

lkaufman wrote:Bob, here's a question inspired by one of your above comments. You indicated that when Crafty was playing another engine, playing a gambit like the Evans was a bad idea, basically just losing a pawn. This implies that it takes more compensation to offset a pawn in engine vs. engine play than in human vs. human play. However, your scores for mobility and/or piece location must be very high in Crafty, since (for example) after 1.e4 c5 the position is already evaluated as nearly winning for White. Presumably these very high scores for dynamic factors (relative to material) scored well in your testing. So how do you reconcile these apparently contradictory facts? How can it be right to score unclear gambits as sound if the engine usually loses when playing them?
I think the idea is split into two parts.

(1) Giving up a pawn. We have a lot of evaluation terms that can cause Crafty to sac a pawn, and when we get to king safety, even a piece. But any of these can blow up. Nothing is worse than seeing the eval say +0.1 yet, when I look at just material, seeing -1.00. And then the eval goes +0.1, -0.05, -0.2, -0.4, ... and slowly settles to -1.0. Why? I believe it is because it is really difficult, in a program, to sum up "advantages" equally. Some are pretty permanent (weak pawns, but not always), while some are transient (a knight temporarily trapped on a1, for example, but it can escape). As a human, I automatically look at these things and sort the advantages into three baskets: (1) permanent (say a knight on e5, no enemy pawns on the f- or d-files to chase it away, no enemy knight nor an enemy bishop that can trade itself for that knight, so it is going to be around for a while); (2) fleeting (things like an isolated pawn that can soon be traded away); (3) unclear (things that are hard to categorize as either of the first two). I tend to look at the third basket more closely to try to resolve the issue so that everything ends up a 1 or a 2.

Computer programs seem to have a tough time with this concept. I have seen major positional advantages evaporate over time. If you give up a pawn for such temporary advantages, things go bad.

(2) Not giving up a pawn, but doing something else to get a positional advantage: giving up the bishop pair for lasting pawn weaknesses, assuming that they actually last and that you can exploit them.

From watching a lot of games, I generally feel better being material up and defending, rather than being material down and having the onus on me (my program, actually) to prove that it has sufficient compensation for the material.

Computers are bad about making a "lemon move" here or there when it becomes difficult to improve their positions further. And often, one "passing move" is enough to let the opponent begin to untangle whatever it is that got tangled when it took the pawn. And the bind unravels, and you are left with a missing pawn to deal with.

For years I followed the "Ken Thompson" mantra that said "nothing is worth a pawn." But after 15+ years of playing against GM players on ICC, I discovered that was not the smartest approach. :) Nowadays we may well have gone too far in certain things, but I occasionally run "sanity tests". I take some major positional component (not a single value) such as passed pawns, or king safety, or mobility, and "scale" those values and run cluster matches. I will try things like 90%, 80% and then go on the other side with 110% and 120%.

What I want to see is an Elo "peak" at 100%, with it dropping off on either side. If it actually would improve at 90%, we'd then look at scaling the terms to pick up that lost Elo...

So I _hope_ our stuff is not too big. But here's a risk. Suppose you have some term that would really be best fit by a 3rd degree polynomial. But you don't know that. And you choose to fit it with a linear equation (straight line) by adjusting the slope and constant to get the best Elo. What you end up with is a term that is best for overall Elo, but which is quite bad for some cases. And if you kept the "linear fit" for the places where it worked well, and switched to a polynomial in the other places, you would get an even bigger Elo gain. But not knowing about the polynomial, that's lost. And there is no doubt that some of my positional terms fit *exactly* that description. Tuned for best overall Elo, but could be done better with an alternate formulation. Same is true for other things.

A good case is the LMR stuff. I looked briefly at the Stockfish approach of increasing the reduction as you complete more and more moves. But here's a question. Given two sets of moves, MA(i) and MB(i), where both MA and MB have the same number of moves, but in MA the first couple are good and the rest are lemons, while in MB all but the last couple are good: what is the justification for reducing the middle move in MA the same as the middle move in MB, when the middle move in MA is bad while the middle move in MB is good? No justification at all. Just force-fitting a linear value to a polynomial data set, and finding the best linear approximation you can get, even though it is bad in many cases.

I do not like the idea of reducing (or extending) something just because it comes before or after the Nth move in the list. Surely there must be a better reason. I certainly do not do that kind of nonsense as a human. I believe we can do this stuff pretty well (fit the wrong type of curve, but at least optimize the poor fit so that we arrive at the best "poor fit" we can get.)

So with that level of distrust, I'd think it is pretty obvious why I'd rather have "pawn in hand" and fight to nullify the compensation, as opposed to having the compensation and fighting to eventually convert it to a pawn or more in hand...

Hope that wasn't too much rambling. It's an interesting topic. Particularly when I know we are optimizing something to work best in the general case, rather than modifying the code so that we have different alternatives for different cases so that the overall fit is much better. But we are working on these issues regularly...

Steve B
Posts: 3697
Joined: Tue Jul 31, 2007 2:26 pm
Contact:

Re: GM Kaufman v. Novag Constellation 3.6 QR Odds Game

Post by Steve B » Thu Jul 08, 2010 9:58 pm

Steve B wrote:
lkaufman wrote:7.O-O (castles). Now I expect Connie to take another pawn, for fear that I might otherwise win one of them back! This is known as the "Compromised" Defense to the Evans Gambit (in normal chess). Although it has a bad reputation, my Rybka/IDeA analysis claims it is good enough for equality, although I think that in practice White wins more often than Black in this line. Of course this has little to do with the present rook-odds situation.
it would be interesting to see if the Evans is even in Connie's book
even if it is.. the book line would not run very deep and would be more than 25 years old, so certainly not current

after Connie replies to your 7th move.. I will start a new game from the normal position (QR in place) to see if she plays the same moves as in this game up to this point
I'm guessing she probably will

Move Later Today Regards
Steve
The Evans is not in Connie's book
In playing the game over with NO QR odds, Connie made every move exactly the same as in this QR odds game, except for this 7th move.
In a normal game she would play 7...Bxc3 here.

However, in this game she played...
7...dxc3

[d] r1bqk1nr/pppp1ppp/2n5/b7/2B1P3/2p2N2/P4PPP/1NBQ1RK1 w kq - 0 8

The Feast Continues Regards
Steve

lkaufman
Posts: 3680
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA
Contact:

Re: GM Kaufman v. Novag Constellation 3.6 QR Odds Game

Post by lkaufman » Thu Jul 08, 2010 10:11 pm

(to Bob) You make a lot of good points. I particularly agree with your comment about basing LMR on the move number being rather silly, but it's hard to find something that works better. Anyway, it seems to me that if the program really does better to be up a pawn for what it considers "fair" compensation than the reverse, shouldn't it be true that multiplying ALL non-material terms by some constant less than one (maybe 0.8) should improve self-play results, since the only important consequence should be to avoid questionable pawn sacs (and to grab borderline pawns)? But surely you would have done this if it were so. So where is the fallacy?

bob
Posts: 20478
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: GM Kaufman v. Novag Constellation 3.6 QR Odds Game

Post by bob » Fri Jul 09, 2010 12:19 am

lkaufman wrote:(to Bob) You make a lot of good points. I particularly agree with your comment about basing LMR on the move number being rather silly, but it's hard to find something that works better. Anyway, it seems to me that if the program really does better to be up a pawn for what it considers "fair" compensation than the reverse, shouldn't it be true that multiplying ALL non-material terms by some constant less than one (maybe 0.8) should improve self-play results, since the only important consequence should be to avoid questionable pawn sacs (and to grab borderline pawns)? But surely you would have done this if it were so. So where is the fallacy?
The fallacy is in believing that any of this is logical. :) I suppose I ought to create a few hundred pawn-down positions that are considered playable, and try some cluster testing to get a precise measurement, something I have _not_ done. And as the old computer chess (or regular chess) saying goes, you probably remember your losses much better than your wins, which means my observation may well be prejudiced without my knowing.

I have spent _significant_ time dealing with material imbalances, where you give up a pawn, or an exchange, in return for something that is hopefully tangible. But I eventually gave up on the exchange sacs, as I never found something that worked well enough. Yes, Roman (or someone else) would occasionally say "wow, perfect exchange sac," but more often than not, being an exchange down led to trouble, even knowing that KR vs KB or KN is a draw.

Due to the ease of picking up stray Elo points here and there, we've spent more time tuning what we have than in adding new code to address apparent weaknesses. We have done a good bit of the latter, but way more of the former since cluster testing is quick and accurate.

lkaufman
Posts: 3680
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA
Contact:

Re: GM Kaufman v. Novag Constellation 3.6 QR Odds Game

Post by lkaufman » Fri Jul 09, 2010 12:35 am

I have another theory about this issue. Perhaps high scores for dynamic features work well for all the crazy positions one encounters in a search, but for sensible positions the "proper" scores for dynamic features are much lower. What do you think?

bob
Posts: 20478
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: GM Kaufman v. Novag Constellation 3.6 QR Odds Game

Post by bob » Fri Jul 09, 2010 3:25 am

lkaufman wrote:I have another theory about this issue. Perhaps high scores for dynamic features work well for all the crazy positions one encounters in a search, but for sensible positions the "proper" scores for dynamic features are much lower. What do you think?
That's possible. The enormous trees we search today appear to be self-healing. Particularly if you look at the hash collision experiments Cozzie and I ran a couple of years ago. And while you might not have seen the discussion, I discovered that my "skill" option is not as effective near the "bottom" of the range. It appears to be impossible to get below roughly 1800 on good hardware, if you use a perfectly random evaluation with no material values or anything else, just a random number. Pretty damned amazing, in fact...

lkaufman
Posts: 3680
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA
Contact:

Re: GM Kaufman v. Novag Constellation 3.6 QR Odds Game

Post by lkaufman » Fri Jul 09, 2010 4:24 am

Wow, that is amazing! Can you even begin to explain how this is possible?
