Eric Stock wrote:
Bob, Don and Vincent, thank you for the replies.
Here is an update on my testing. I am currently testing MTD Crafty against regular Crafty with only null-move disabled. I modified lazy eval and futility pruning to work in a fail-soft environment, and I increased the value of the "lazy-cutoff-score" in Crafty. After 366 games, MTD has scored 188 and regular Crafty 178. I will play 1000 games. The time control is 1 second per move. I am using the littlemainbook that comes with Arena to randomize the starting positions. Both programs are searching 9-11 plies in the middlegame.
When I test with WAC at 1 second per move, regular Crafty solves 290 and MTD solves 287. Any test position where Crafty's score changes drastically between iterations favours regular Crafty, as MTD must make many re-searches to reach the new score. However, these test-suite results don't seem to translate when testing with games (at least the way I am doing it).
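(For context, the re-search behaviour comes from the MTD(f) driver itself. Below is a minimal sketch of the standard driver, not Crafty's actual code; ZeroWindowSearch, POSITION and INFINITY_SCORE are placeholder names. Every pass through the loop is one zero-window probe, so a large score swing between iterations forces many probes before the bounds meet.)

/* Minimal MTD(f) driver sketch (illustrative only, not Crafty's code).
 * ZeroWindowSearch() stands for a fail-soft search called with the
 * minimal window (beta - 1, beta); it returns a bound on the true score. */
int MTDf(POSITION *pos, int first_guess, int depth) {
  int guess = first_guess;
  int lower = -INFINITY_SCORE;   /* lower bound on the true score */
  int upper = +INFINITY_SCORE;   /* upper bound on the true score */

  while (lower < upper) {
    /* Probe just above the lower bound if the guess sits on it,
       otherwise probe at the guess itself. */
    int beta = (guess == lower) ? guess + 1 : guess;

    guess = ZeroWindowSearch(pos, beta - 1, beta, depth);

    if (guess < beta)
      upper = guess;   /* failed low: true score <= guess  */
    else
      lower = guess;   /* failed high: true score >= guess */
    /* Each pass through this loop is one re-search; a large jump in the
       score between iterations means many passes before lower == upper. */
  }
  return guess;
}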
Vincent, I am very interested in some of the things you mention.
Perhaps my testing needs to be improved? Could you suggest some test suites I can use? I am aware that Win At Chess is far too easy for a program of Crafty's strength. What is the name of the test suite you use that has 213 positions?
Also, do you have any suggestions for my test matches? Should I increase the time controls?
As far as null-move pruning is concerned, my tests indicate that it works poorly with MTD(f) compared to PVS. When playing 1000 games to a fixed depth of 7, MTD Crafty scored 420 vs. 580 for PVS. At 1 second per move it is a little closer, but regular Crafty is still clearly stronger.
I have an idea for making null-move work with MTD, and I am going to test it next. It makes sense to me, but I try things all the time and they usually don't work; still, I have a good feeling about this one, so we'll see.
To summarize,
1. The only problem I see with MTD is that null-move doesn't work properly with it.
2. I am aware of the problem of MTD(f) starting from a bad guess and having to do many re-searches (several hundred). There are a couple of WAC positions where this happens. However, my test matches do not seem to indicate that this is a problem in real games.
3. My techniques for testing need to be improved.
Eric Stock
Hi Eric,
It is very good that you discuss what you do in public; this is very valuable. How you measure is very important.
Of course, what GCP mentions is most important: playing a lot of games.
Limiting the games by depth is, I feel, not a good idea.
A clear way to start is to run a batch of positions, preferably NOT test-set positions. Those positions are all fail-high positions, so not very relevant. The big problem with MTD is precisely the positions where the program is in doubt, doubts everything, and the score drops a little. Fail-low positions therefore matter more than fail-high positions.
I'm not using test-set positions, but a bunch of positions taken from games.
I'll email them to you if you drop me an email or a message with your address. My email address is easy to find: diep at xs4all dot nl.
Just give each engine a few minutes.
Please don't do tests of X seconds per move. Crafty has built-in commands to run a batch of positions fully automatically. You can run this while you're celebrating Christmas.
For a master's thesis I would find it more than acceptable if you run these batches of positions on the full-blown Crafty 23.0, the normal Crafty 23.1 version, and your MTD version.
So turn on everything that benefits Crafty 23.0's playing strength, and then run the same thing on your modified version. A single-core measurement is enough.
Use the same settings everywhere, and use a time limit of, say, 3 minutes or 7 minutes per position. I'd prefer 7 minutes; that makes it exactly a 24-hour run, and the run then also has significance for the future.
Then, for each position, write down in an Excel sheet the biggest depth that was fully FINISHED. Merely *started* is not enough.
With those two 24-hour runs you can then do the math.
You calculate, fully automatically, two different things:
a) average depth difference
b) worst case depth difference
And from that you can also calculate things like error estimates and so on.
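To make the math concrete, here is a minimal sketch of the two calculations, assuming the finished depths from the two runs have been collected into two arrays; this is hypothetical helper code, not anything taken from Crafty.

/* Sketch of the depth-difference statistics (hypothetical helper, not Crafty code).
 * depth_pvs[i] and depth_mtd[i] hold the deepest fully finished iteration
 * for position i in the PVS run and in the MTD run, respectively. */
#include <stdio.h>

void DepthDifferenceStats(const int *depth_pvs, const int *depth_mtd, int n) {
  int worst = 0;          /* biggest depth deficit of MTD versus PVS */
  long total = 0;         /* sum of per-position depth differences  */

  for (int i = 0; i < n; i++) {
    int diff = depth_pvs[i] - depth_mtd[i];   /* positive: MTD finished shallower */
    total += diff;
    if (diff > worst)
      worst = diff;
  }

  printf("average depth difference:    %.2f\n", (double)total / n);
  printf("worst-case depth difference: %d\n", worst);
}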
All this is a first step. Setting up a method for playing games is much tougher and requires far more time; of course it is much better.
However, my assumption is that the (b) worst-case depth difference alone will already show such a huge worst case for MTD versus PVS that testing with games doesn't even make sense anymore. My experience is that it can quickly run up to a 5-ply depth difference for MTD, which in competitive computer chess is complete suicide.
Vincent