The pure list has been updated. Everything is updated together; it is an automated process. If an engine doesn't have enough games against pure list opponents, then it doesn't go on the list until it has. In the case of Texel:
"Texel 1.05 64-bit 4CPU" out (only 132 games), "Texel 1.04 64-bit 4CPU" in
Texel 1.05 64-bit 4CPU needs more games before it can replace 1.04 on the pure list.
The Pure list is not necessarily better. You reduce the distortion, yes, but you replace it with far greater margins of error because the smaller database has far fewer games.
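As a rough illustration of why fewer games means wider error margins, here is a minimal Python sketch. It is not how bayeselo computes its intervals. It assumes a single pairing of two equally strong engines, a draw ratio of about 32%, and a normal approximation. The function name elo_margin and its parameters are made up for this example.

```python
import math

def elo_margin(num_games, draw_ratio=0.32, z=1.96):
    """Approximate 95% Elo margin of error for one pairing of equal engines.

    Assumptions (not from bayeselo): independent games, equal strength,
    a fixed draw ratio, and a normal approximation of the score.
    """
    # Per-game score variance when win and loss are equally likely:
    # (1 - draw_ratio) / 4
    score_se = math.sqrt((1.0 - draw_ratio) / 4.0 / num_games)
    # Slope of the logistic Elo curve at a 50% score: 400 / (ln(10) * 0.25)
    elo_per_score = 400.0 / (math.log(10) * 0.25)
    return z * score_se * elo_per_score

for n in (132, 528, 2112):
    print(n, round(elo_margin(n), 1))
```

The margin shrinks only with the square root of the number of games, so a list with a quarter of the games has roughly twice the margin.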
Modern Times wrote: The Pure list is not necessarily better. You reduce the distortion, yes, but you replace it with far greater margins of error because the smaller database has far fewer games.
Currently the 40/40 pure list for free engines has a problem. All 4CPU engines seem to be around 600-700 Elo below their expected Elo level. The problem is that even though all participants in this list have played a sufficient number of games against other participants in the list, there are no games between the 4CPU and the 1CPU engines.
petero2 wrote:
Currently the 40/40 pure list for free engines has a problem. All 4CPU engines seem to be around 600-700 Elo below their expected Elo level. The problem is that even though all participants in this list have played a sufficient number of games against other participants in the list, there are no games between the 4CPU and the 1CPU engines.
Testers are very aware of the need to play 1CPU vs 4CPU to connect the engines. If they don't, then there is a problem for sure!
Modern Times wrote: The Pure list is not necessarily better. You reduce the distortion, yes, but you replace it with far greater margins of error because the smaller database has far fewer games.
Currently the 40/40 pure list for free engines has a problem. All 4CPU engines seem to be around 600-700 Elo below their expected Elo level. The problem is that even though all participants in this list have played a sufficient number of games against other participants in the list, there are no games between the 4CPU and the 1CPU engines.
Right. Ordo can be useful to detect this automatically with the switch -g:
Total games 37786
- White wins 14141
- Draws 12204
- Black wins 11441
- Truncated 0
Unique head to head 17.01%
Reference rating 2300.0 (average of the pool)
Loose Anchors = none
Relative Anchors = none
groups=2
Encounters, Total=6427, Main=6427, @ Interface between groups=0
And the output is in the file g.txt, which lists the two groups, so you know exactly which engines need connecting games (see at the bottom).
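For anyone who wants to see the idea behind such a check without running Ordo, here is a minimal Python sketch. It is not Ordo's implementation. It simply treats every game as a link between two engines and collects the connected groups with a union-find structure. The function name find_groups and the engine names are hypothetical.

```python
from collections import defaultdict

def find_groups(games):
    """Return the groups of engines that are linked by at least one chain of games.

    `games` is an iterable of (white, black) engine-name pairs.
    """
    parent = {}

    def find(engine):
        # Follow parent links to the group's representative, shortening the path.
        parent.setdefault(engine, engine)
        while parent[engine] != engine:
            parent[engine] = parent[parent[engine]]
            engine = parent[engine]
        return engine

    for white, black in games:
        root_w, root_b = find(white), find(black)
        if root_w != root_b:
            parent[root_w] = root_b  # merge the two groups

    groups = defaultdict(set)
    for engine in list(parent):
        groups[find(engine)].add(engine)
    # Largest group first, ties broken by name for a stable order.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))

# Hypothetical pairings: the 4CPU and 1CPU engines never meet.
games = [("A 4CPU", "B 4CPU"), ("B 4CPU", "A 4CPU"), ("C 1CPU", "D 1CPU")]
print(len(find_groups(games)))  # 2

# One connecting game between the pools is enough to join them.
games.append(("B 4CPU", "C 1CPU"))
print(len(find_groups(games)))  # 1
```

If the function returns more than one group, ratings in different groups cannot be compared, and any engine from one group played against any engine from another will connect them.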
petero2 wrote: Currently the 40/40 pure list for free engines has a problem. All 4CPU engines seem to be around 600-700 Elo below their expected Elo level. The problem is that even though all participants in this list have played a sufficient number of games against other participants in the list, there are no games between the 4CPU and the 1CPU engines.
Testers are very aware of the need to play 1CPU vs 4CPU to connect the engines. If they don't, then there is a problem for sure!
Yes, and in the full list I have never seen this to be a problem. It may not be a problem in the pure lists most of the time either. This is the first time I have seen the problem. In this particular case the lowest ranked 4CPU engine is Protector 1.7.0 (Elo 3091 in the full list) and the highest ranked 1CPU engine is Spike 1.4 Leiden (Elo 2924 in the full list). It is understandable if they have not been played against each other.
Overall the CCRL list is great. I just wanted to illustrate one more reason why the pure lists are not necessarily better than the full lists.
The scripts will, for example, automatically exclude two new engines that have only played each other, since bayeselo cannot give a rating in that situation. So they do check that the database is connected. But they check in the pre-processing stage on the overall database. There are clearly no such checks when the pure lists are being created. Thanks for pointing this out.
michiguel wrote:
Right. Ordo can be useful to detect this automatically with the switch -g
Very useful to know, but we don't use Ordo.
It does not matter. You can use Ordo just to figure out whether you need connections. The -g switch does not produce a ranking. For the ranking, you can use anything else, of course.
On a side note, I think the default view should be the pure list, not the distorted list. With so many versions of the same engines (or clones) being tested, rating distortions become quite significant. What do you think?
I never really take much notice of the pure list when organising my testing, to be honest. Perhaps I should.
Clones/derivatives are always a tricky subject.
Which engines qualify as clones/derivatives? Who decides? On what basis and what criteria is the decision made? Is there unanimous agreement over such criteria?
Graham, I asked you to continue the test of Sting SF 4.8.3 and I am very happy that you did (and are doing) it. Version 4.8.3 is not stable and can hang. But the result of this version is much more important for me than the next 4.8.4. Believe me, I respect you much more than the prominent authors of the prominent engines.
I know programmers: they feel like gods!