eval blending?

clumma · Post by **clumma** » Fri May 05, 2017 2:25 am

Pardon if this has been discussed before, but has anyone tried blending evals of different engines to create an all-computer centaur?

Blending is a popular technique in machine learning contests (it won the Netflix prize).

We know something about the scaling of engines when adding cores. At what point are we better off using some cores for different engines?

For example, use 2 cores each for Stockfish, Komodo, and Houdini. Always make Stockfish move unless both Komodo and Houdini agree on a different one, in which case make that.
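A minimal sketch of that voting rule, assuming each engine's chosen move has already been obtained as a UCI move string (the function name `vote_move` is made up for illustration):

```python
def vote_move(stockfish: str, komodo: str, houdini: str) -> str:
    """Return Stockfish's move unless Komodo and Houdini agree on a different one."""
    if komodo == houdini and komodo != stockfish:
        return komodo
    return stockfish


# Komodo and Houdini outvote Stockfish only when they agree with each other.
print(vote_move("e2e4", "d2d4", "d2d4"))  # d2d4
print(vote_move("e2e4", "d2d4", "c2c4"))  # e2e4
```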

In another variant, run the engines with multi-pv = 2 and make the choice of the engine that is 'most sure' (shows largest eval difference between first and second choices). Here it may be good to normalize all the evals first to make them more consistent across engines.
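A rough sketch of the 'most sure' rule. The normalization is an assumption: each centipawn score is mapped to an expected score with a logistic curve, and the per-engine `scale` values are hypothetical numbers that would have to be fitted from games.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MultiPVResult:
    engine: str
    best_move: str
    best_cp: int      # centipawn score of the first choice
    second_cp: int    # centipawn score of the second choice
    scale: float = 400.0  # hypothetical per-engine normalization constant


def expected_score(cp: int, scale: float) -> float:
    """Map a centipawn score to a 0..1 expected score (assumed logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))


def confidence_gap(r: MultiPVResult) -> float:
    """Normalized difference between an engine's first and second choices."""
    return expected_score(r.best_cp, r.scale) - expected_score(r.second_cp, r.scale)


def most_sure(results: List[MultiPVResult]) -> MultiPVResult:
    """Pick the engine whose first choice stands out most from its second."""
    return max(results, key=confidence_gap)
```

Comparing gaps in expected-score space rather than raw centipawns keeps a 100 cp gap at +9.00 from outweighing a 100 cp gap near equality.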

Other schemes are possible. Evals of pvs can be averaged across engines (again probably after normalizing). Or we can try to identify phases of game where each engine is best, and simply use the best engine at each phase (or when certain material imbalances are present etc).
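The averaging scheme could look roughly like this, assuming each engine reports a centipawn score for several root moves (for instance via multi-PV) and the blend is used to pick the root move. The logistic normalization and its `scale` are assumptions, and `blended_choice` is a made-up name.

```python
from statistics import mean
from typing import Dict, Tuple


def normalize(cp: int, scale: float = 400.0) -> float:
    """Map a centipawn score to a 0..1 expected score (assumed logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))


def blended_choice(evals: Dict[str, Dict[str, int]]) -> Tuple[str, Dict[str, float]]:
    """evals maps engine name -> {move: centipawns}.

    Only moves scored by every engine are blended, because a missing
    score says nothing about how that engine rates the move.
    """
    common = set.intersection(*(set(moves) for moves in evals.values()))
    if not common:
        raise ValueError("no move was scored by every engine")
    blended = {
        mv: mean(normalize(per_engine[mv]) for per_engine in evals.values())
        for mv in common
    }
    return max(blended, key=blended.get), blended
```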

Has it been done?

Nay Lin Tun · Post by **Nay Lin Tun** » Fri May 05, 2017 3:24 am

Yes: Stockfish search with Komodo evaluation! Your idea is quite similar to asking three people to search the same dark room three times without assigning each person a dedicated area. There will be more repeated searching and a greater chance of missing something. So let only one engine search; having three engines evaluate the most important PVs would be interesting.

tmokonen · Post by **tmokonen** » Fri May 05, 2017 3:41 am

There's a program called ChessCombi that sort of does what you are asking for. It runs as a UCI engine within your chess GUI, and combines the evaluation of two different UCI engines. Unfortunately, it can only combine a maximum of two engines. I've not actually used it, I've only read about it in other forums.

Download link:
http://www.mediafire.com/file/zl6ye8kl4 ... ombiV1.zip

Rybka forum discussion:
http://rybkaforum.net/cgi-bin/rybkaforu ... chesscombi

tmokonen · Post by **tmokonen** » Fri May 05, 2017 4:10 am

Sorry, bad Rybka forum link:

http://rybkaforum.net/cgi-bin/rybkaforu ... order=desc

kbhearn · Post by **kbhearn** » Fri May 05, 2017 4:17 am

In its simplest form the idea has problems: when you have, say, a trio of engines and one disagrees, you really don't know whether it's seeing more than the other two or whether it's blind to something the other two are seeing.

What you'd probably need to do is combine them in an external, IDeA-like framework that maintains a tree with each engine's evaluations. At nodes that are of particular interest to any of the 3 engines, you expand the node, get second opinions on the child nodes from all 3 engines, and minimax the results back to the root for each engine. Eventually time would force you to just go with one of the moves if the disagreement couldn't be resolved, but at least it would give the outlier engine a chance to demonstrate that it knows better, or to realise it was the one missing something.
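A toy sketch of such a shared tree, assuming leaf scores are stored per engine from the side-to-move's point of view and backed up with negamax separately for each engine. The `Node` class and helper names are hypothetical; a real framework would also need to decide which nodes to expand and query the engines for scores.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Node:
    # Per-engine score of this position, from the side-to-move's point of view.
    evals: Dict[str, float] = field(default_factory=dict)
    # Expanded child positions, keyed by the move leading to them.
    children: Dict[str, "Node"] = field(default_factory=dict)


def backed_up(node: Node, engine: str) -> float:
    """Negamax one engine's leaf evaluations back up to this node."""
    if not node.children:
        return node.evals[engine]
    return max(-backed_up(child, engine) for child in node.children.values())


def preferred_move(root: Node, engine: str) -> str:
    """Root move that one engine prefers given the current tree."""
    return max(root.children, key=lambda mv: -backed_up(root.children[mv], engine))


def engines_disagree(root: Node, engines: Iterable[str]) -> bool:
    """True if the engines do not all prefer the same root move."""
    return len({preferred_move(root, e) for e in engines}) > 1
```

Expanding a disputed node and re-running `engines_disagree` is the "second opinion" step: the disagreement either disappears or persists until time runs out.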

clumma · Post by **clumma** » Fri May 05, 2017 6:46 am

tmokonen wrote: There's a program called ChessCombi that sort of does what you are asking for.

Thanks!

Evert · Post by **Evert** » Fri May 05, 2017 8:05 am

Have you ever tried to play a game where multiple people suggest moves? In my experience this leads to erratic and poor play. Of course, computers are not human.

clumma wrote: Pardon if this has been discussed before, but has anyone tried blending evals of different engines to create an all-computer centaur?

Nitpick: mixing evaluations in the GUI does nothing because it does not affect move selection, so as a term this does not capture the essence of what you want to do.

clumma wrote: For example, use 2 cores each for Stockfish, Komodo, and Houdini. Always make Stockfish move unless both Komodo and Houdini agree on a different one, in which case make that.

How often does that situation occur?
You can verify this by taking a game and letting all three engines analyse (with multi-PV).
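One way to put a number on that, assuming the three engines' first choices have already been collected for every position of a game (the function name and the tuple layout are made up for illustration):

```python
from typing import Iterable, Tuple


def override_rate(choices: Iterable[Tuple[str, str, str]]) -> float:
    """Fraction of positions where Komodo and Houdini agree against Stockfish.

    Each item is (stockfish_move, komodo_move, houdini_move) for one position.
    """
    choices = list(choices)
    if not choices:
        return 0.0
    overrides = sum(1 for sf, ko, ho in choices if ko == ho and ko != sf)
    return overrides / len(choices)
```

A rate near zero would mean the voting scheme almost always just plays Stockfish's move.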

clumma wrote: In another variant, run the engines with multi-pv = 2 and make the choice of the engine that is 'most sure' (shows largest eval difference between first and second choices). Here it may be good to normalize all the evals first to make them more consistent across engines.

Bad idea. Multi-pv is not free and costs valuable time. You're also assuming that a large gap means something. You don't know that. Perhaps the engine with the smallest gap sees something that the others don't?

clumma wrote: Other schemes are possible. Evals of pvs can be averaged across engines (again probably after normalizing).

Again, this does not affect move selection.

clumma wrote: Or we can try to identify phases of game where each engine is best, and simply use the best engine at each phase (or when certain material imbalances are present etc).

It would probably be better to improve the respective engines.

What might have some value is having engines ponder the proposed move; sometimes engines see why a move is bad immediately after playing it. You still need to communicate that information somehow, though, and you don't need multiple engines to do it (and even then, a better suggestion is to try to fix it in the engine in the first place).

clumma · Post by **clumma** » Fri May 05, 2017 8:29 am

Evert wrote: Bad idea. Multi-pv is not free and costs valuable time. You're also assuming that a large gap means something. You don't know that. Perhaps the engine with the smallest gap sees something that the others don't?

I'm not assuming anything, other than that it's an interesting idea which is worth trying.

flok · Post by **flok** » Fri May 05, 2017 8:42 am

I think DBPPuppetMaster on freechess does this. With 8 engines that is.

Cardoso · Post by **Cardoso** » Fri May 05, 2017 11:00 am

Carl, that idea simply could never work.
Different engines have different evals and different searches.
Even if one engine gives a score of half a pawn up, how do you know it is right?
A higher score doesn't mean the analysis is better than that of another engine with a lower score.
The score for a move is always an approximation, which can climb or fall as the game progresses.
Maybe another engine which gives a lower eval has the best sequence.
The search part of an engine works by comparing the evals of lines (simply put); how would you compare lines (which are different, or even slightly different) that come from different engines?
No engine has a clue about the others' analysis; they are completely disconnected.
Also, different extensions/reductions give different terminal positions where the eval is applied.
Suppose you have 3 or 4 grandmasters and each gives a different move; whose move are you going to choose?

Alvaro
