How to rate my engine in CCRL?

No4b · Post by **No4b** » Sun Sep 27, 2020 6:48 pm

mvanthoor wrote: ↑Sun Sep 27, 2020 3:14 pm
maksimKorzh wrote: ↑Fri Sep 25, 2020 1:16 pm ...
Hi Maksim,

I don't know if you've seen this already, but Wukong has been rated on CCRL's Blitz list.

Wukong on CCRL

Rating: 1474

Congratulations

It is not an advanced chess engine (I know it was never supposed to be), but its rating gives me good hopes for my own engine.

Seeing that such a basic engine as Wukong already scores 1474 CCRL, I wonder what one needs to do (or omit) to build an engine that scores in the 1200's...

Well, when I was getting started on my chess variant AI, it was VERY weak. At that time I had yet to watch the VICE video and had never heard of this forum.
So my engine didn't have quiescence search, made/unmade moves directly on the UI board (one can only imagine how slow it was), and the only move ordering technique it used was my pretty strange implementation of killer moves.
I never measured its strength, but I bet it would be pretty low.

Also, I must note that bugs in the code can contribute very significantly to strength loss (I often fixed bugs that cost like ~60 Elo).

Another significant Elo-eater might be an inefficient make/unmake or MoveGen. For example, my current AI for a Unity game has very convoluted make/unmake functions, because there are pieces with very different move rules that all have to be handled (e.g. some pieces do not move when they capture something, some do not disappear when promoting but can promote only once per game, some do not disappear when they are captured, etc.). This brings many additional IFs into the code, and I bet it slows the engine down hard.

mvanthoor · Post by **mvanthoor** » Sun Sep 27, 2020 8:52 pm

No4b wrote: ↑Sun Sep 27, 2020 6:48 pm ... my current AI for unity game has very convoluted Make\Unmake functions, because there are pieces with very different move rules ...

That seems to be quite a complicated chess variant. Can I play this somewhere/somehow?

(I did try some chess variants, but some just have too many different pieces with too many different capabilities; it becomes hard to remember what piece can actually perform which moves, and in what situations.)

Gabor Szots · Post by **Gabor Szots** » Mon Sep 28, 2020 10:53 am

I am just testing BBC. First results do not show +600 compared to TSCP, though.

maksimKorzh · Post by **maksimKorzh** » Mon Sep 28, 2020 10:58 am

Gabor Szots wrote: ↑Mon Sep 28, 2020 10:53 am I am just testing BBC. First results do not show +600 compared to TSCP, though.

Thank you so much, Gabor.
It shouldn't be +600)))

BBC 1.0 should be +100/150 Elo stronger than TSCP.
The current development version is already as strong as VICE (after tuning the evaluation), but I need to do lots of tests before releasing the next version.

So 1.0 should be just a bit stronger than TSCP.
Also, I played too few games.

How many games has BBC played already?
Did it crash?

Gabor Szots · Post by **Gabor Szots** » Mon Sep 28, 2020 11:15 am

maksimKorzh wrote: ↑Mon Sep 28, 2020 10:58 am How many games did BBC already play?
Did it crash?

66 games, no crash.

In another thread you wrote that it beat TSCP 15.5-0.5. Based on that, and staying on the cautious side, I selected opponents rated around 2200. I'm going to change that a bit. I estimate that in the end its rating will be somewhere near 2000.

No4b · Post by **No4b** » Mon Sep 28, 2020 1:50 pm

mvanthoor wrote: ↑Sun Sep 27, 2020 8:52 pm That seems to be quite a complicated chess variant. Can I play this somewhere/somehow?

Well, it's a Unity game and it's currently a work in progress.
I can PM you a link to a previous test version I made for my friends back in June (there has been some progress since, but I haven't done everything I wanted yet). The only problem I can see is that all the text describing the pieces' movesets is currently only in Russian, although I suppose I could briefly describe each one.

No4b · Post by **No4b** » Mon Sep 28, 2020 1:59 pm

maksimKorzh wrote: ↑Mon Sep 28, 2020 10:58 am BBC 1.0 Should be +100/150 greater than TSCP

I decided to run a quick match of BBC 1.0 against Drofa 1.0 (Linux compile vs Linux compile):

Score of Drofa_v.1.0 vs bbc_1.0_64bit_linux: 13 - 3 - 4 [0.750]
Elo difference: 190.85 +/- 172.61

20 of 20 games finished.

It somewhat confirms your ~100-150 suggestion, although many more games are needed for an accurate result (with only 20 games the error margin is almost as large as the difference itself).
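For anyone wondering where the numbers in that summary come from, here is a minimal sketch that recomputes them from the 13 - 3 - 4 result using the usual logistic Elo formula. The function names `elo_diff` and `match_elo` are made up for this post, and the error margin is only my assumption of how such a figure is typically derived (a normal approximation on the mean score), not necessarily what the match tool does internally.

```python
import math
from statistics import NormalDist


def elo_diff(score):
    """Elo difference implied by an expected score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins, losses, draws, confidence=0.95):
    """Return (elo, margin) for a match result, seen from the first engine."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    std_error = math.sqrt(variance / games)
    # Two-sided normal quantile, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    low = score - z * std_error
    high = score + z * std_error
    # Half the width of the interval after mapping both ends to Elo.
    # elo_diff raises ValueError if an end leaves the open interval (0, 1).
    margin = (elo_diff(high) - elo_diff(low)) / 2.0
    return elo_diff(score), margin


if __name__ == "__main__":
    elo, margin = match_elo(wins=13, losses=3, draws=4)
    print(f"Elo difference: {elo:.2f} +/- {margin:.2f}")
```

The Elo difference comes out at the same 190.85 as in the summary. The margin lands close to the reported 172.61 but not exactly on it, presumably because the tool uses its own approximation for the normal quantile.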
As I watched some games unfold, it came to my attention that BBC 1.0 has some sort of bug where it prints a 0.00 score even in completely lost positions (see the game below). I suppose it is either a repetition-detection or a TT issue, but it could be excessive pruning as well. If this is not fixed yet, I have a feeling that such a bug may have a really big negative impact on overall strength. If you want, I can PM you an archive with all the games played.

[pgn][Event "bbc_test"]
[Site "?"]
[Date "2020.09.28"]
[Round "1"]
[White "Drofa_v.1.0"]
[Black "bbc_1.0_64bit_linux"]
[Result "1-0"]
[ECO "A40"]
[Opening "Queen's pawn"]
[PlyCount "65"]
[TimeControl "60+1"]

1. d4 {book} e6 {book} 2. c4 {book} d5 {book} 3. Nc3 {book} Bb4 {book}
4. e3 {book} Nf6 {book} 5. Qb3 {book} Bd6 {book} 6. Bd2 {book} c6 {book}
7. Nf3 {book} O-O {book} 8. Bd3 {book} Nbd7 {book} 9. O-O-O {book} a5 {book}
10. c5 {book} Bc7 {book} 11. e4 {+0.53/8 2.5s} e5 {0.00/10 3.0s}
12. Qa4 {+0.36/8 3.9s} Re8 {0.00/9 2.9s} 13. Bg5 {+0.37/8 2.7s} h6 {0.00/9 2.8s}
14. Bh4 {+0.46/8 3.4s} g5 {0.00/9 2.8s} 15. Bg3 {+0.45/8 2.1s}
exd4 {0.00/10 2.7s} 16. Qxd4 {+0.25/9 3.3s} g4 {0.00/10 2.6s}
17. Nd2 {+0.16/8 3.4s} Bxg3 {+0.52/9 2.6s} 18. hxg3 {+0.45/9 2.7s}
Qe7 {+0.50/9 2.5s} 19. exd5 {+0.49/8 3.2s} Qxc5 {+0.62/9 2.5s}
20. Qf4 {+1.39/9 2.3s} Ne5 {0.00/9 2.5s} 21. Qxh6 {+4.54/8 3.0s}
Nxd3+ {-4.21/9 2.4s} 22. Kb1 {+6.33/9 2.9s} Nh7 {-4.55/9 2.4s}
23. Qxh7+ {+7.25/9 1.9s} Kf8 {0.00/9 2.3s} 24. Qh6+ {+7.25/8 2.8s}
Ke7 {-6.85/9 2.3s} 25. Nde4 {+7.79/8 2.7s} Qb4 {0.00/9 2.3s}
26. Rxd3 {+12.32/8 2.0s} Rg8 {-13.49/9 2.2s} 27. d6+ {+20.71/8 1.6s}
Kd7 {-21.83/9 2.2s} 28. Qf6 {+999.89/8 1.4s} Qxe4 {0.00/10 2.1s}
29. Nxe4 {+999.91/8 2.5s} c5 {-M10/11 2.1s} 30. Qxf7+ {+999.93/8 1.9s}
Kc6 {-M8/11 2.1s} 31. Qc7+ {+999.95/8 2.4s} Kb5 {-M6/11 2.0s}
32. Rb3+ {+999.91/8 2.3s} Ka6 {-M4/12 2.0s}
33. Qb6# {+999.99/9 2.2s, White mates} 1-0[/pgn]
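A quick way to hunt for this kind of thing across many games is to scan the move comments. Below is a rough sketch, assuming the comment layout seen in the game above (`{score/depth time}` after every move, White's comment first) and assuming the engine under suspicion plays Black. The helper names, the `first_move` parameter and the 3.0 pawn threshold are my own choices for illustration.

```python
import re

COMMENT = re.compile(r"\{([^}]*)\}")
SCORE = re.compile(r"^([+-]?\d+\.\d+)/\d+")


def parse_score(comment):
    """Return the score in pawns, or None for book moves and mate scores."""
    match = SCORE.match(comment.strip())
    return float(match.group(1)) if match else None


def suspicious_zero_scores(movetext, threshold=3.0, first_move=1):
    """List move numbers where Black reports 0.00 while White sees a big plus.

    Assumes the movetext starts with a White move and that every move
    carries exactly one comment, so comments alternate White, Black.
    """
    comments = COMMENT.findall(movetext)
    flagged = []
    for i in range(1, len(comments), 2):        # odd indices are Black's
        own = parse_score(comments[i])
        opponent = parse_score(comments[i - 1])  # White, same move number
        if own == 0.0 and opponent is not None and opponent >= threshold:
            flagged.append(first_move + i // 2)
    return flagged


if __name__ == "__main__":
    excerpt = (
        "23. Qxh7+ {+7.25/9 1.9s} Kf8 {0.00/9 2.3s} "
        "24. Qh6+ {+7.25/8 2.8s} Ke7 {-6.85/9 2.3s} "
        "25. Nde4 {+7.79/8 2.7s} Qb4 {0.00/9 2.3s}"
    )
    print(suspicious_zero_scores(excerpt, first_move=23))
```

On that excerpt it flags moves 23 and 25. It only tells you where the two engines disagree wildly, not why, so each flagged position still has to be checked by hand for a repetition or TT problem.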

maksimKorzh · Post by **maksimKorzh** » Mon Sep 28, 2020 5:55 pm

No4b wrote: ↑Mon Sep 28, 2020 1:59 pm ... If you want, i can PM you archive with all games played.

Yeah, send me PGNs please.

maksimKorzh · Post by **maksimKorzh** » Mon Sep 28, 2020 6:03 pm

Gabor Szots wrote: ↑Mon Sep 28, 2020 11:15 am ... I assess in the end its rating will be somewhere near 2000.

Thank you, Gabor. Even 2000 seems a bit too much. I think it should be around 1950, because version 1.0 is weaker than VICE, which is around 2000. I'm now fixing bugs and have also improved the evaluation, so the next version should be much stronger.

Re: result vs TSCP
- that was at 30 sec + 0; at 2 min + 1 sec the result should be worse for BBC. It happens due to the difference in search depth, which is more critical at ultra-short time controls.

Gabor Szots · Post by **Gabor Szots** » Mon Sep 28, 2020 7:54 pm

maksimKorzh wrote: ↑Mon Sep 28, 2020 6:03 pm Re: result vs TSCP
- that was 30 sec + 0, in 2min +1sec result should be worse for bbc. It happens due to the difference in search depth, it's more critical on ultra short time controls.

Christophe Théron, author of Chess Tiger, once said: if an engine is sensitive to the time control, then it is badly written.
