Chess programs with the most chess knowledge

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: Chess programs with the most chess knowledge

Post by carldaman »

Henryval wrote:Which chess program(s) have the most chess knowledge implemented? And is there more than a little evidence about this out there in the chess community?

I wonder why we don't have any ranking on this issue.
Diep was supposed to be loaded with knowledge. Too bad it wasn't made publicly available by its author, so it's hard to be more specific.

I wish there were more engines that emphasized knowledge and positional factors, even at the expense of tactics or speed. Even weaker engines are already quite strong tactically compared to a human master, but many are sorely lacking in positional understanding and (especially) endgame knowledge.

A more viable AI-oriented approach would just build up the knowledge and produce a more well-rounded, human-like engine. This would likely serve to de-emphasize tactics, since it would by necessity slow down the search speed.

The prevalent (non-AI) philosophy among developers is that "you can't make any changes, no matter how promising, that weaken the engine - even by 1 Elo point!". This rigid mode of thinking can work very well as far as gaining points against other engines goes, but it tends to leave noticeable knowledge gaps that show up in actual play, IF anyone pays close attention to the games.

Don't get me wrong, the strongest engines of today play marvelous chess for the most part, but then how come we still find common openings and structures (like the KID) that they grossly misevaluate?

At the end of the day, the Elo list-climbing approach will clearly beat out the knowledge-laden one in terms of pure chess strength, and this puts the latter in an undeserved bad light. My take on this is that we should not lose sight of the benefits of having knowledge-based engines as realistic analysis and sparring partners. After all, engines are created for humans' enjoyment.

I wouldn't gripe about a hypothetical program that could play a decent King's Indian on its own but was "only" about 2400-2500 Elo strong due to relative "tactical weakness" induced by all the extra knowledge! Would anyone view it as a "failure" because it's nowhere near Komodo in strength? I know I'd be impressed by such an achievement!

If you are a programmer who would like to add a lot more knowledge to your engine, I'd encourage you to go ahead without hesitation and let the chips fall where they may. Never let anyone cause you to mistake your achievements as being "inferior" to some pre-defined standard.

Regards,
CL
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Chess programs with the most chess knowledge

Post by bob »

Laskos wrote:
bob wrote:
This doesn't work very well. depth=1 does NOT mean the same thing to everyone. To some it is simply a 1 ply search followed by q-search. Some allow check extensions before going to q-search. Some allow check extensions AFTER q-search. Some have a couple of plies of threats between basic search and start of q-search. Some extend at the root if in check, some don't. Some extend if giving check at the root, some don't. Etc. I played a match Crafty vs Stockfish a while back to compare evaluations. It took quite a bit of work to make them search the same basic tree space.

I ended up having to modify code in both programs to reach a consistent meaning for "set depth = 1". After all the work was done, the net conclusion from playing 100K games was "no statistical difference".
Yes, correct, but it seems hard to measure the static eval comprehensively without messing with the source, where available.
I agree, it is quite difficult. In fact, effectively impossible, unless you are willing to get your "hands dirty" and look at the source. And then you run the risk of introducing an unintentional bug that biases the results even further.

My opinion? This is hopeless. One can compare engines in their entirety, but trying to isolate and measure just one part of an engine is a really difficult project. You can't even try the ultimate approach and graft each engine's evaluation onto a common search, because many engines compute some parts of their evaluation incrementally within the search. The ideal would be a chess engine with an evaluation that needs access to no information other than the chess position description. But nobody does that, since everyone does some sort of king safety analysis that depends on attack information, some sort of mobility analysis that needs the same, etc...

It's a desirable goal, but one that is likely impossible to reach.
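To make the contrast concrete, here is a minimal Python sketch of the two interface shapes; the names and the material-only stand-in evaluation are illustrative assumptions, not code from Crafty or any other engine.

Code: Select all

from dataclasses import dataclass, field

# Hypothetical centipawn piece values for the illustration.
PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}

def evaluate_static(fen: str) -> int:
    """Ideal case: needs nothing but the position description.
    Plain material from White's point of view stands in for a full eval."""
    board = fen.split()[0]
    score = 0
    for ch in board:
        if ch.upper() in PIECE_VALUES:
            value = PIECE_VALUES[ch.upper()]
            score += value if ch.isupper() else -value
    return score

@dataclass
class SearchState:
    """Attack information many engines accumulate incrementally during search."""
    white_attacks: set = field(default_factory=set)
    black_attacks: set = field(default_factory=set)

def evaluate_with_search_state(fen: str, state: SearchState) -> int:
    """Realistic case: mobility and king-safety terms need search state, which
    is exactly why grafting one engine's eval onto another's search is hard."""
    mobility = len(state.white_attacks) - len(state.black_attacks)
    return evaluate_static(fen) + 2 * mobility  # 2 cp per square, illustrative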
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Chess programs with the most chess knowledge

Post by Ferdy »

Steve Maughan wrote:It is an interesting question.

I think we can peel the onion a little more and ask the following questions:

1. Which engine has the most knowledge of special positions (like the ones in this thread)?

2. Which engine has the most parameters / factors in its evaluation function?

3. Which engine has the most accurate static evaluation function?

Note that an engine which tops the league in question 2 will not necessarily rank highly in question 3, since the coefficients applied to each factor may be incorrect. And as the number of factors increases and they evaluate more subtle elements of a position, it becomes more difficult to establish the right value for each parameter.

And of course the engine which tops the league in question 3 may not be the strongest engine. Quite apart from search efficiency, the time required to perform the evaluation may be so long that the engine isn't able to search as deeply as others.

Steve
I think I like the idea in (3), but using not the static eval alone, rather a combination of search and eval, so that the two help each other bring the score close to zero. Collect interesting positions that we know will end in a draw, most likely endings, then run engines at 1 sec/pos and grade their performance by how far the score is from zero, as sketched in code after the list below.
1. Engines with a score of (+/-) 50 cp and below will get 100 points; +40 cp or -10 cp does not matter, they all get 100 points.
2. Engines with score 51 to 100 will get 61 to 70 points.
3. Engines with score 101 to 150 will get 51 to 60 points.
4. Engines with score 151 to 200 will get 41 to 50 points.
5. Engines with score 201 to 250 will get 31 to 40 points.
6. Engines with score 251 to 300 will get 21 to 30 points.
7. Engines with score 301 to 350 will get 11 to 20 points.
8. Engines with score 351 to 400 will get 1 to 10 points.
9. Other scores no points.
The engine with the highest points earned will be declared the good one :).
I am not sure about the points allocation; that is just an initial guess.
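Read as bands, that table can be coded directly. A minimal Python sketch, assuming linear interpolation inside each 50 cp band (my reading of ranges like "51 to 100 gets 61 to 70 points"):

Code: Select all

def grade(score_cp: int) -> int:
    """Map an engine's reported score (centipawns) on a known-drawn position
    to points: 100 within +/-50 cp, 10-point bands per 50 cp, 0 beyond 400."""
    s = abs(score_cp)
    if s <= 50:
        return 100
    if s > 400:
        return 0
    band = (s - 51) // 50                 # 0 for 51-100 cp, 1 for 101-150 cp, ...
    band_low = 61 - 10 * band             # bottom of the matching 10-point band
    within = (s - 51) % 50                # position inside the 50 cp band
    return band_low + 9 - round(9 * within / 49)  # nearer zero earns more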
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Chess programs with the most chess knowledge

Post by jdart »

I have been looking at this area recently. As a general rule, I think that if you have a material configuration that would be a draw without pawns, and you have only a one or two pawn advantage, then you probably still have a draw. I am sure there are exceptions, though.
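As a rough sketch of that rule of thumb in Python (the signature table and FEN handling below are illustrative assumptions, not Arasan's actual code, and the exceptions are exactly what such a table misses):

Code: Select all

# Piece signatures (white pieces, black pieces) that are drawn without pawns.
# A small illustrative subset, not an exhaustive table.
DRAWN_WITHOUT_PAWNS = {
    ("B", ""), ("N", ""), ("", "B"), ("", "N"),      # lone minor piece
    ("B", "B"), ("B", "N"), ("N", "B"), ("N", "N"),  # minor vs minor
    ("R", "R"), ("Q", "Q"),                          # equal heavy pieces
}

def probably_drawn(fen: str) -> bool:
    """The heuristic: drawn piece signature plus at most a two-pawn advantage."""
    board = fen.split()[0]
    white_pieces = "".join(sorted(c for c in board if c in "NBRQ"))
    black_pieces = "".join(sorted(c.upper() for c in board if c in "nbrq"))
    pawn_gap = abs(board.count("P") - board.count("p"))
    return (white_pieces, black_pieces) in DRAWN_WITHOUT_PAWNS and pawn_gap <= 2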

--Jon
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Chess programs with the most chess knowledge

Post by Uri Blass »

carldaman wrote:
Henryval wrote:Which chess program(s) have the most chess knowledge implemented? And is there more than a little evidence about this out there in the chess community?

I wonder why we don't have any ranking on this issue.
Diep was supposed to be loaded with knowledge. Too bad it wasn't made publicly available by its author, so it's hard to be more specific.

I wish there were more engines that emphasized knowledge and positional factors, even at the expense of tactics or speed. Even weaker engines are already quite strong tactically compared to a human master, but many are sorely lacking in positional understanding and (especially) endgame knowledge.

A more viable AI-oriented approach would just build up the knowledge and produce a more well-rounded, human-like engine. This would likely serve to de-emphasize tactics, since it would by necessity slow down the search speed.

The prevalent (non-AI) philosophy among developers is that "you can't make any changes, no matter how promising, that weaken the engine - even by 1 Elo point!". This rigid mode of thinking can work very well as far as gaining points against other engines goes, but it tends to leave noticeable knowledge gaps that show up in actual play, IF anyone pays close attention to the games.

Don't get me wrong, the strongest engines of today play marvelous chess for the most part, but then how come we still find common openings and structures (like the KID) that they grossly misevaluate?

At the end of the day, the Elo list-climbing approach will clearly beat out the knowledge-laden one in terms of pure chess strength, and this puts the latter in an undeserved bad light. My take on this is that we should not lose sight of the benefits of having knowledge-based engines as realistic analysis and sparring partners. After all, engines are created for humans' enjoyment.

I wouldn't gripe about a hypothetical program that could play a decent King's Indian on its own but was "only" about 2400-2500 Elo strong due to relative "tactical weakness" induced by all the extra knowledge! Would anyone view it as a "failure" because it's nowhere near Komodo in strength? I know I'd be impressed by such an achievement!

If you are a programmer who would like to add a lot more knowledge to your engine, I'd encourage you to go ahead without hesitation and let the chips fall where they may. Never let anyone cause you to mistake your achievements as being "inferior" to some pre-defined standard.

Regards,
CL
1) I do not see how you get a rating of only 2400-2500 Elo due to relative tactical weakness.
Even an engine 10 times slower than Komodo is clearly stronger than 2400-2500.

2) I think that a good static evaluation should be expressed not in pawns but in probabilities.
The program should give probabilities for a win, for a draw, and for a loss, and only later derive a score from those probabilities.

The probabilities can be used for a better search, because it is better to extend lines when you are unsure about the result and to prune lines when you are sure about the result.

If you are sure that the result is going to be a draw then it does not make sense to continue to search.
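A minimal Python sketch of this idea, assuming an Elo-style logistic mapping between expected score and centipawns; the scale constant and the certainty thresholds are guesses, not anything from an actual engine:

Code: Select all

import math

def wdl_to_score(p_win: float, p_draw: float, p_loss: float) -> int:
    """Collapse (win, draw, loss) probabilities into one centipawn-like
    score via the expected game score and a logistic (Elo) curve."""
    expected = p_win + 0.5 * p_draw
    expected = min(max(expected, 1e-6), 1.0 - 1e-6)  # avoid log of 0
    return round(400.0 * math.log10(expected / (1.0 - expected)))

def search_hint(p_win: float, p_draw: float, p_loss: float) -> str:
    """Extend when unsure about the result, prune when nearly certain."""
    certainty = max(p_win, p_draw, p_loss)
    if certainty > 0.95:
        return "prune"    # e.g. a position we are nearly certain is drawn
    if certainty < 0.50:
        return "extend"   # genuinely unclear: worth searching deeper
    return "normal"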

Note that Stockfish, at least in the past, knew not to continue searching in positions like KB vs K, but for some reason, even with no proven advantage, they decided to remove this knowledge because it passed as a simplification.

Here is the relevant change by Joerg Oster that it would have been better not to allow in the first place:
https://github.com/joergoster/Stockfish ... ...f902b24

Here is the relevant test

http://tests.stockfishchess.org/tests/v ... 2db1a06439
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: Chess programs with the most chess knowledge

Post by carldaman »

Uri Blass wrote:
1) I do not see how you get a rating of only 2400-2500 Elo due to relative tactical weakness.
Even an engine 10 times slower than Komodo is clearly stronger than 2400-2500.
I was being conservative, for the sake of the argument. Mileage can vary here. Diep itself was probably in that rating range.

My comments were not necessarily aimed at the likes of Stockfish, or others vying for the top position. Of course, I'd still prefer to see more flexibility from them, where they could trade a handful of Elo for something really useful, hypothetically.
User avatar
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Chess programs with the most chess knowledge

Post by cdani »

My idea is that if you improve an engine a lot without adding chess knowledge to it, you are overfitting to the current knowledge. The engine will then be less able to profit from new knowledge, or it will require redoing a bigger part of the work already done. And of course the lack of some knowledge will become more evident as new engines reach its level of strength.

http://www.quora.com/What-is-an-intuiti ... verfitting
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Chess programs with the most chess knowledge

Post by Laskos »

hgm wrote:I would expect slow searchers like The Baron to have far more knowledge.
I found Baron 2.23 (Winboard), but it seems not to perform so well at fixed depth=1:

Code: Select all

Rank Name                        ELO   Games   Score   Draws
   1 Komodo                       64    1000     59%     18%
   2 Houdini                      59    1000     58%     29%
   3 Hannibal                     20    1000     53%     28%
   4 SF                           20    1000     53%     21%
   5 Hiarcs                      -46    1000     43%     18%
   6 Baron                      -121    1000     33%     24%
Finished match
Anyway, as Adam and Bob said, depth=1 means different things for each engine, so my test is almost pointless.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Chess programs with the most chess knowledge

Post by Adam Hair »

bob wrote:
Laskos wrote:
bob wrote:
This doesn't work very well. depth=1 does NOT mean the same thing to everyone. To some it is simply a 1 ply search followed by q-search. Some allow check extensions before going to q-search. Some allow check extensions AFTER q-search. Some have a couple of plies of threats between basic search and start of q-search. Some extend at the root if in check, some don't. Some extend if giving check at the root, some don't. Etc. I played a match Crafty vs Stockfish a while back to compare evaluations. It took quite a bit of work to make them search the same basic tree space.

I ended up having to modify code in both programs to reach a consistent meaning for "set depth = 1". After all the work was done, the net conclusion from playing 100K games was "no statistical difference".
Yes, correct, but it seems hard to measure the static eval comprehensively without messing with the source, where available.
I agree, it is quite difficult. In fact, effectively impossible, unless you are willing to get your "hands dirty" and look at the source. And then you run the risk of introducing an unintentional bug that biases the results even further.

My opinion? This is hopeless. One can compare engines in their entirety, but trying to isolate and measure just one part of an engine is a really difficult project. You can't even try the ultimate approach and graft each engine's evaluation onto a common search, because many engines compute some parts of their evaluation incrementally within the search. The ideal would be a chess engine with an evaluation that needs access to no information other than the chess position description. But nobody does that, since everyone does some sort of king safety analysis that depends on attack information, some sort of mobility analysis that needs the same, etc...

It's a desirable goal, but one that is likely impossible to reach.
Here is my idea of trying to roughly compare evaluation functions:

1) Determine a time per move for each engine so that they score ~50% against each other in normal play.

2) For each piece of chess knowledge being tested, run a boatload of games. For example, to judge how well each engine handles a particular material imbalance (such as the Exchange), collect a large number of positions featuring that material imbalance (using some care in selecting positions that are not too biased for a particular side), and use those positions as the starting positions in the testing.

The hope is that something can be discerned from the noise.
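For step (1), something like the following Python helper could estimate the Elo gap from a calibration match, so the time handicap can be adjusted toward a ~50% score. The logistic Elo model is the standard one; the helper itself is just a sketch:

Code: Select all

import math

def elo_diff(wins: int, losses: int, draws: int) -> float:
    """Estimate the Elo difference implied by a match result.
    Positive means the first engine scored above 50%."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    score = min(max(score, 1e-6), 1.0 - 1e-6)  # guard 100% / 0% results
    return -400.0 * math.log10(1.0 / score - 1.0)

Rerun with adjusted time odds until elo_diff() is near zero, then switch to the imbalance positions for step (2).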
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Chess programs with the most chess knowledge

Post by Ferdy »

Henryval wrote:Which chess program(s) have the most chess knowledge implemented? And is there more than a little evidence about this out there in the chess community?

I wonder why we don't have any ranking on this issue.
Here is a sample system for identifying a good engine, based on given positions where we know the result is a draw but engines have trouble showing an even score, even though they pick the best move. I got the idea from Steve.
I collected some UCI engines and let them analyse 8 positions (featuring fortresses) at 1 sec per position. The search score returned determines each engine's points: the closer it is to zero, the more points.

Code: Select all

A. Platform:
System   : Windows
Release  : 7
Version  : 6.1.7601
Machine  : AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel

B. Engine parameters:
Threads  : 1
Hash     : 64mb
Time/pos : 1000ms

C. Test settings:
Total engine count  : 45
Total positions     : 8 (input file: test.fen)
Total max points    : 800
Estimated total time: 8 pos x 1000ms/pos = 8000 ms

D. Summary (higher points are better):
 1 id name Fire 4 x64                         (time 6101 ms, Points 537, ratio  67.1%)
 2 id name Gull 3 x64                         (time 8000 ms, Points 475, ratio  59.4%)
 3 id name Houdini 4 x64                      (time 8000 ms, Points 472, ratio  59.0%)
 4 id name Critter 1.6a 64-bit                (time 6110 ms, Points 466, ratio  58.2%)
 5 id name Strelka 6 w32                      (time 8000 ms, Points 466, ratio  58.2%)
 6 id name Komodo 6 64-bit                    (time 7203 ms, Points 358, ratio  44.8%)
 7 id name Texel 1.04 64-bit                  (time 6930 ms, Points 341, ratio  42.6%)
 8 id name Stockfish 131214 64 POPCNT         (time 7139 ms, Points 296, ratio  37.0%)
 9 id name HIARCS 14 WCSC                     (time 6161 ms, Points 269, ratio  33.6%)
10 id name Bouquet 1.8 x64                    (time 8112 ms, Points 266, ratio  33.2%)
11 id name Hannibal 1.4x64                    (time 5572 ms, Points 260, ratio  32.5%)
12 id name Booot 5.2.0(64)                    (time  140 ms, Points 212, ratio  26.5%)
13 id name Equinox 3.30 x64mp                 (time 6660 ms, Points 204, ratio  25.5%)
14 id name Deuterium v14.4.35.17 64bit POPCNT (time 6902 ms, Points 200, ratio  25.0%)
15 id name Fruit reloaded 2.1                 (time 6099 ms, Points 199, ratio  24.9%)
16 id name Octochess revision 5190            (time 4945 ms, Points 186, ratio  23.2%)
17 id name Amyan 1.72                         (time   80 ms, Points 173, ratio  21.6%)
18 id name Naum 4.6                           (time 6726 ms, Points 163, ratio  20.4%)
19 id name Protector 1.7.0                    (time 7488 ms, Points 159, ratio  19.9%)
20 id name Spike 1.4                          (time 7488 ms, Points 130, ratio  16.2%)
21 id name DiscoCheck 5.2.1                   (time 6705 ms, Points 127, ratio  15.9%)
22 id name Ruffian 1.0.5                      (time 5920 ms, Points 115, ratio  14.4%)
23 id name Yace 0.99.87                       (time 7936 ms, Points 115, ratio  14.4%)
24 id name Gaviota v1.0                       (time 6661 ms, Points 110, ratio  13.8%)
25 id name Andscacs 0.71                      (time 5981 ms, Points  96, ratio  12.0%)
26 id name Maverick 0.51 x64                  (time 6193 ms, Points  94, ratio  11.8%)
27 id name Nebula 2.0                         (time 6490 ms, Points  93, ratio  11.6%)
28 id name cheng4 0.36c                       (time 6666 ms, Points  90, ratio  11.2%)
29 id name Deuterium v14.3.34.130             (time 7316 ms, Points  90, ratio  11.2%)
30 id name Nemo SP64o 1.0.1 Beta              (time 8000 ms, Points  88, ratio  11.0%)
31 id name AnMon 5.75                         (time 7083 ms, Points  85, ratio  10.6%)
32 id name Rybka 2.3.2a mp                    (time 5625 ms, Points  80, ratio  10.0%)
33 id name Rodent 1.6 (build 6)               (time 5835 ms, Points  73, ratio   9.1%)
34 id name Senpai 1.0                         (time 7032 ms, Points  70, ratio   8.8%)
35 id name Arasan 17.4                        (time 8149 ms, Points  67, ratio   8.4%)
36 id name Rhetoric 1.4.1 x64                 (time 5505 ms, Points  66, ratio   8.2%)
37 id name Bobcat 3.25                        (time 6503 ms, Points  49, ratio   6.1%)
38 id name GreKo 12.1                         (time 6255 ms, Points  49, ratio   6.1%)
39 id name Vajolet2 1.45                      (time 6844 ms, Points  41, ratio   5.1%)
40 id name spark-1.0                          (time 8112 ms, Points  37, ratio   4.6%)
41 id name DisasterArea-1.54                  (time 6724 ms, Points  32, ratio   4.0%)
42 id name Daydreamer 1.75 JA                 (time 6030 ms, Points  19, ratio   2.4%)
43 id name GNU Chess 5.60-64                  (time 6357 ms, Points  14, ratio   1.8%)
44 id name Quazar 0.4 x64                     (time 7207 ms, Points   8, ratio   1.0%)
45 id name iCE 2.0 v2240 x64/popcnt           (time 2028 ms, Points   6, ratio   0.8%)

E. Positions:
 1 6k1/8/6PP/3B1K2/8/2b5/8/8 b - - 0 1
 2 8/8/r5kP/6P1/1R3K2/8/8/8 w - - 0 1
 3 7k/R7/7P/6K1/8/8/2b5/8 w - - 0 1
 4 8/8/5k2/8/8/4qBB1/6K1/8 w - - 0 1
 5 8/8/8/3K4/8/4Q3/2p5/1k6 w - - 0 1
 6 8/8/4nn2/4k3/8/Q4K2/8/8 w - - 0 1
 7 8/k7/p7/Pr6/K1Q5/8/8/8 w - - 0 1
 8 k7/p4R2/P7/1K6/8/6b1/8/8 w - - 0 1

F. Point System:
score <= abs(50)  : 100 points
score <= abs(100) : 61 - 70 points
score <= abs(150) : 51 - 60 points
score <= abs(200) : 41 - 50 points
score <= abs(250) : 31 - 40 points
score <= abs(300) : 21 - 30 points
score <= abs(350) : 11 - 20 points
score <= abs(400) :  1 - 10 points
Other scores      :  0 points

G. Engines that do not report time:
 1 id name Gull 3 x64                  
 2 id name Nemo SP64o 1.0.1 Beta     
Sample positions; see section E for the complete set of 8 positions.
[d]8/k7/p7/Pr6/K1Q5/8/8/8 w - - 0 1
[d]8/8/5k2/8/8/4qBB1/6K1/8 w - - 0 1
[d]8/8/r5kP/6P1/1R3K2/8/8/8 w - - 0 1
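For anyone wanting to reproduce this kind of run, here is a rough harness sketch using the python-chess library; the engine path, the FEN list, and grade() (as in the point-system sketch earlier in the thread) are assumptions, not the actual tool used here.

Code: Select all

import chess
import chess.engine

def run_test(engine_path: str, fens: list, ms_per_pos: int = 1000) -> int:
    """Ask a UCI engine for a short search on each known-drawn position
    and sum the points earned; scores closer to zero earn more."""
    total = 0
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        engine.configure({"Threads": 1, "Hash": 64})
        for fen in fens:
            board = chess.Board(fen)
            limit = chess.engine.Limit(time=ms_per_pos / 1000.0)
            info = engine.analyse(board, limit)
            cp = info["score"].relative.score(mate_score=32000)
            total += grade(cp)  # grade() as in the point-system sketch above
    finally:
        engine.quit()
    return total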