This is all true. Another problem worth pointing out is the counterintuitive phenomenon that a new evaluation term which is correct in most or even almost all positions can sometimes reduce the playing strength. Many times I have identified some piece of missing evaluation knowledge, implemented it as well as I could, and seen the playing strength drop by 10 or 15 Elo points. The most frustrating part is that the engine often looks like it is playing better with the new piece of knowledge, and I can easily identify games where the new knowledge helps it win. Nevertheless, when a large number of games are played, the statistics show that the program played better without the new knowledge. Sometimes tuning the weights can help, but not always.

mcostalba wrote:
Uri Blass wrote: I disagree with you, and I believe that strong players can help to improve Stockfish's evaluation based on watching Stockfish's games.

Here is my opinion regarding this point.
The implementation of any evaluation idea in a chess engine is always the combined effect of two contributions:
1 - The actual idea, for instance a new way to evaluate pawn structure or a new way to evaluate passed pawns.
2 - The tuning of the coefficients that _weight_ the idea among the other evaluation terms.
IMHO a strong player could be effective regarding point one, i.e. proposing an interesting idea. The next step is to make the idea work (because normally even a good idea does _not_ work on the first try) by finding the right coefficients bound to the idea in the evaluation code, and that is a task for testing with real games, possibly without human intervention.
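To make the split between the two contributions a bit more concrete, here is a minimal sketch. It is not actual Stockfish code; the term, the names (PassedPawnWeights, passedPawnScore) and the numbers are all hypothetical and only illustrate how an idea is usually kept separate from the coefficients that weight it.

Code:

#include <array>

// Hypothetical coefficients, in centipawns. The starting values are guesses;
// in practice they would be adjusted by playing large numbers of games
// (or by an automated tuner), not by hand.
struct PassedPawnWeights {
    std::array<int, 8> bonusByRank { 0, 5, 10, 20, 40, 70, 120, 0 };
    int supportedMultiplierPercent = 125;   // extra credit if the pawn is defended
};

// The "idea" itself: score one passed pawn given its rank and whether it is
// supported. The shape of this function rarely changes during tuning;
// only the numbers inside 'w' do.
int passedPawnScore(const PassedPawnWeights& w, int rank, bool supported) {
    int score = w.bonusByRank[rank];
    if (supported)
        score = score * w.supportedMultiplierPercent / 100;
    return score;
}

The structure of passedPawnScore() is the idea (point one); the contents of PassedPawnWeights are what gets tuned by game testing (point two).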
The wrong thing to do, IMHO, is to expect a human to be effective at both the first and the second points.
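A side note on the point made at the top about needing a large number of games: a 10 or 15 Elo difference corresponds to only a percent or two in match score, so the statistical noise of a short match completely hides it. Here is a rough sketch (the match result in it is made up) of how the Elo difference and its error margin can be estimated from wins, draws and losses.

Code:

#include <cmath>
#include <cstdio>

int main() {
    // Hypothetical match result for "new knowledge" vs. "old": wins/draws/losses.
    double wins = 430, draws = 1180, losses = 390;
    double n = wins + draws + losses;

    double score = (wins + 0.5 * draws) / n;                        // mean score per game
    double var   = (wins * 1.0 + draws * 0.25) / n - score * score; // per-game variance
    double se    = std::sqrt(var / n);                              // standard error of the mean

    // Convert an expected score into an Elo difference.
    auto toElo = [](double s) { return -400.0 * std::log10(1.0 / s - 1.0); };

    std::printf("Elo: %+.1f  (95%% interval roughly %+.1f .. %+.1f over %.0f games)\n",
                toElo(score), toElo(score - 1.96 * se), toElo(score + 1.96 * se), n);
    return 0;
}

For this made-up result the 95% interval is still roughly 20 Elo wide even after 2000 games, which is why impressions drawn from watching a handful of games are not reliable evidence that a new term helps.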