IPON ratings calculation

lkaufman · Post by **lkaufman** » Thu Dec 29, 2011 9:52 pm

Could you please comment on how Komodo 4 uses its time in your testing, compared to other engines and also compared to what you think is ideal? We did not design our time control for ponder on games, and I'm thinking this was a big mistake. This probably hurts our results in your testing and IPON. Maybe we can correct this.

I still believe ponder on testing is less efficient than giving double the time; even your own statistics agree. But since you and IPON do it, we need to pay attention to it.

Best regards,
Larry

Don · Post by **Don** » Thu Dec 29, 2011 10:02 pm

Jouni wrote:If engine has positive score against all others, isn't it the best and strongest automatically - no need to calculate anything ?!

Jouni

Not if it's better against some and worse against others.

Frank Quisinsky · Post by **Frank Quisinsky** » Thu Dec 29, 2011 10:17 pm

Hi Larry,

I watch a lot of Komodo games live, and in each game I look at the clock before the second or third time control starts (I use 40 in 10, repeating, with ponder = on).

I think in Komodo's case it is 95% perfect. Current versions of Komodo have no games lost on time, and when 35-38 or 75-78 moves have been played Komodo still has enough time on the clock.

From my point of view there is nothing to do here in Komodo. Komodo also plays the first moves after the opening neither too fast nor too slow. Very fine work again, judging by the time controls and pondering in Komodo's games.

BTW:
Hash tables are the topic.
Komodo is very strong in endgames.

Hash tables are more important in endgames than in middlegames. So it makes sense to do the following:

If 32 pieces on the board = x MB for hash, for example 128 MB
If 16 pieces on the board = x MB for hash, for example 256 MB
If 08 pieces on the board = x MB for hash, for example 512 MB

This could give Komodo another small jump of perhaps 2-3 Elo.

Easy to set as a UCI option:
Variable Hash-Tables: yes / no

If yes ...
Hash from beginning = x MB
Hash with 16 pieces = x MB
Hash with 08 pieces = x MB

No program has this, and I can't understand why.

Perhaps the UCI programmers would have to add such an option to the UCI protocol.
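
As a rough illustration of the piece-count idea above, here is a minimal Python sketch. The function name `hash_size_mb` and the `schedule` table are hypothetical, with thresholds taken from the example sizes in this post. It is not how Komodo or the UCI protocol works, and a real engine would also have to decide what happens to existing table entries when the size changes.

```python
def hash_size_mb(piece_count, schedule=((8, 512), (16, 256), (32, 128))):
    """Pick a hash size in MB from the number of pieces on the board.

    `schedule` is a hypothetical table of (max_pieces, size_mb) pairs,
    sorted by ascending max_pieces. The defaults mirror the example
    numbers in the post: fewer pieces, larger hash.
    """
    if not 2 <= piece_count <= 32:
        raise ValueError("piece_count must be between 2 and 32")
    for max_pieces, size_mb in schedule:
        if piece_count <= max_pieces:
            return size_mb
    # Fall back to the last entry if the schedule does not cover 32 pieces.
    return schedule[-1][1]


# Sizes for the opening position, a 16-piece position and an 8-piece endgame.
sizes = [hash_size_mb(n) for n in (32, 16, 8)]
```

Whether the extra hash in the endgame is worth any Elo is exactly the open question of the post; the sketch only shows how simple the selection logic would be.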

2.
The high average number of moves in Komodo games comes from the problem that Komodo doesn't support endgame databases. My wish is support for the Gaviota endgame databases.

At the moment two SWCR games are still running with KR vs. KR. That makes no sense, but I am playing without resignation.

An interesting stat for you:
In 212 of 160,000 SWCR games engines showed an advantage of +6. In most cases these were bishop endgames with the wrong pawn (not winnable). Without resignation the game ended correctly under the 50-move rule. With resignation the game would have ended 1:0 or 0:1.

3.
Open positions are very dangerous for chess engines.
It would make sense to count the possible moves in each position.

If 50 moves are possible, for example after move 24, an engine should get a time bonus and use a bit more time to calculate its reply. If 15 moves are possible, for example after move 50, the engine should play faster. This could again be worth 2-3 Elo, and I think much more in blitz games.

In my analyses I found that engines play too fast in open positions. For a program like Komodo, with all its positional strengths, such an option could be worth 2-3 Elo, though I am not sure. A program should never move immediately after a ponder hit if a lot of moves are possible. It is better to give the engine more time for such complicated positions.
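
A minimal Python sketch of the mobility-based time bonus described above. Everything in it is an assumption for illustration: the name `scaled_move_time`, the reference count of 30 legal moves, and the clamping factors are invented here, not taken from Komodo or any other engine.

```python
def scaled_move_time(base_time_s, legal_moves, reference_moves=30,
                     min_factor=0.5, max_factor=2.0):
    """Scale the time budget for one move by the number of legal moves.

    Positions with more legal moves than `reference_moves` get extra time,
    positions with fewer get less. The factor is clamped so that a single
    move can never use more than `max_factor` times the base budget.
    """
    if legal_moves <= 0:
        raise ValueError("legal_moves must be positive")
    factor = legal_moves / reference_moves
    factor = max(min_factor, min(max_factor, factor))
    return base_time_s * factor


# The two cases from the post: 50 legal moves (open middlegame)
# and 15 legal moves (late in the game), with a 10 second base budget.
open_position_time = scaled_move_time(10.0, 50)
quiet_position_time = scaled_move_time(10.0, 15)
```

The same factor could also be used to delay an instant reply after a ponder hit, but how much time that should cost is a tuning question this sketch does not answer.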

I am not a programmer, only a tester ...
Perhaps your team is interested in thinking about these ideas.

Best
Frank

IWB · Post by **IWB** » Thu Dec 29, 2011 10:17 pm

Hi Larry,

lkaufman wrote:... We did not design our time control for ponder on games, and I'm thinking this was a big mistake. This probably hurts our results in your testing and IPON. Maybe we can correct this. ...

For a very simple reason I am a bit surprised by this statement. Try to think the other way around! Basically engines are used for only two things:

1. Analysis (mainly)
2. To play against (OTB or on a server)

In case 2 the question is: who is playing ponder OFF? The only people who do this are a few rating lists (for historical reasons, and now they don't want to throw away the games). Everyone else (!) is always playing PON (and is losing, so a good method to limit playing strength is important as well)! So good ponder ON time management is much more important than the ponder OFF thing!
I consider ponder OFF completely artificial and useless, sorry. You are right about the number of games, but that argument comes from times when there were only single CPUs. Nowadays it is possible to play a sufficient number of games with ponder on. (I admit that engine development with very short time controls is more practicable with POFF, but that has nothing to do with real game play. The other place where ponder off might be used is on smartphones or other mobile devices, to save energy, but there, against humans, the timing of those ponder off games is less important ...)

Again, any real chess game played by humans, in a computer WC or at a server is Ponder ON. I consider this as real chess and ponder OFF as some kind of subgroup for special purposes.

Regards and a few more "happy holidays"
Ingo

EDIT: If you start to make developments to please the rating list and not the users this will backfire! Someone will come up with a new, better method of testing (it already happened and will happen again imho) and then you have to adapt again, and again ...

Robert Flesher · Post by **Robert Flesher** » Thu Dec 29, 2011 10:19 pm

lkaufman wrote:Could you please comment on how Komodo 4 uses its time in your testing, compared to other engines and also compared to what you think is ideal? We did not design our time control for ponder on games, and I'm thinking this was a big mistake. This probably hurts our results in your testing and IPON. Maybe we can correct this.

I still believe ponder on testing is less efficient than giving double the time; even your own statistics agree. But since you and IPON do it, we need to pay attention to it.

Best regards,
Larry

Regarding time management, I have noticed some strange behavior in SD games in which each engine has 40 min with pondering off. Komodo can have 30 min left on the clock and will sometimes use only 6-7 seconds before moving, which seems like poor time management. However, because it did not lose these games, maybe it knew what it was doing in each particular position and figured it was fine.

zullil · Post by **zullil** » Thu Dec 29, 2011 11:02 pm

Don wrote:
Jouni wrote:If engine has positive score against all others, isn't it the best and strongest automatically - no need to calculate anything ?!

Jouni
Not if it's better against some and worse against others.

I don't understand your response.

Sven · Post by **Sven** » Thu Dec 29, 2011 11:23 pm

lkaufman wrote:
Michel wrote:
Albert Silver wrote:I can only assume it is I who lack the proper understanding of how the ratings are calculated, but watching the IPON results of Critter 1.4, I began to wonder why its performance was 2978 after 2106 games. I took the 22 performances, added them up, and then divided them by 22 and came up with 3000.59, so why is the total performance 2978?
The calculation method of BayesElo is explained here:

http://remi.coulom.free.fr/Bayesian-Elo/#theory

The elo's are the result of a maximum likelihood calculation seeded
with a prior (afaics this can only be theoretically justified in a Bayesian
setting).

The actual algorithm is derived from this paper

http://www.stat.psu.edu/~dhunter/papers/bt.pdf
I think the "prior" may be the problem; it appears to have way too much weight. If an engine performs 3000 against every opponent in over 2000 games, it should get a rating very close to 3000, maybe 2999. But apparently the prior gets way too much weight, because I believe such an engine on the IPON list would get only around 2975.

Part of the problem is that "match performance" is an almost irrelevant number, and also that you can't take the arithmetic average of it due to non-linearity of the percentage expectancy curve. See also the other thread where this has been discussed (link was provided above).

Sven
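
To make the non-linearity point concrete, here is a small Python sketch. It assumes the plain logistic Elo expectancy curve, which is simpler than the BayesElo model discussed in this thread (no draw model, no prior), and the opponent rating and score percentages are made-up numbers. The helper names are hypothetical.

```python
import math


def expected_score(rating_diff):
    """Logistic Elo expectancy for a given rating difference."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def performance_rating(opponent_rating, score_fraction):
    """Rating at which `score_fraction` is the expected score
    against a single opponent rated `opponent_rating`."""
    return opponent_rating + 400.0 * math.log10(
        score_fraction / (1.0 - score_fraction))


# Made-up example: two opponents, both rated 2800, same number of games.
perf_a = performance_rating(2800, 0.90)  # 90% against the first opponent
perf_b = performance_rating(2800, 0.50)  # 50% against the second opponent

# Arithmetic mean of the two match performances.
mean_of_performances = (perf_a + perf_b) / 2

# Performance computed from the pooled 70% score against the same field.
pooled_performance = performance_rating(2800, 0.70)
```

Under these assumptions the mean of the two match performances comes out at about 2991, while the pooled 70% score corresponds to about 2947. Averaging per-opponent performances therefore overstates the result whenever the individual match scores differ, which is the same direction as the 3000.59 versus 2978 gap in the quoted question, although the prior may contribute as well.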

Sven · Post by **Sven** » Thu Dec 29, 2011 11:35 pm

Jouni wrote:If engine has positive score against all others, isn't it the best and strongest automatically - no need to calculate anything ?!

No. Say you have 10 engines playing a round robin, with an equal number of games for each match. Suppose A scores 51% against all others. B scores 60% against all others, except for A of course. Which engine should get the highest rating from that tourney? I think it is clearly B, since it scores substantially better than A against all opponents but one.

Sven
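
The round-robin example can be checked with a few lines of Python. This is only the arithmetic of the example, not a rating calculation: it assumes equal match lengths, so the overall score is the plain mean of the match scores, and that B scores 49% against A. The function name is hypothetical.

```python
def overall_score(match_scores):
    """Overall score fraction across matches of equal length."""
    return sum(match_scores) / len(match_scores)


# 10 engines, so each engine plays 9 matches.
# A scores 51% against every other engine, including B.
a_matches = [0.51] * 9

# B scores 49% against A and 60% against the remaining 8 engines.
b_matches = [0.49] + [0.60] * 8

a_total = overall_score(a_matches)
b_total = overall_score(b_matches)
```

Here A ends on 51% overall and B on roughly 58.8%, even though A has a plus score against every opponent and B does not.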

Don · Post by **Don** » Thu Dec 29, 2011 11:43 pm

zullil wrote:
Don wrote:
Jouni wrote:If engine has positive score against all others, isn't it the best and strongest automatically - no need to calculate anything ?!

Jouni
Not if it's better against some and worse against others.
I don't understand your response.

His statement is correct, but my point is that it's rarely the case that it's so clear. For example, even in the IPON list Komodo 4 won its 100-game match against Houdini 2.0, and yet we know Houdini is still stronger. So in this example not a single program has a positive score against all others.

zullil · Post by **zullil** » Fri Dec 30, 2011 12:13 am

Don wrote:
zullil wrote:
Don wrote:
Jouni wrote:If engine has positive score against all others, isn't it the best and strongest automatically - no need to calculate anything ?!

Jouni
Not if it's better against some and worse against others.
I don't understand your response.
His statement is correct, but my point is that it's rarely the case that it's so clear. For example, even in the IPON list Komodo 4 won its 100-game match against Houdini 2.0, and yet we know Houdini is still stronger. So in this example not a single program has a positive score against all others.

Oh, OK.
