Weird results Tunguska 1.1 vs Jonny 4.0

Necromancer
Posts: 33
Joined: Wed Nov 23, 2016 1:30 am
Location: Brazil

Weird results Tunguska 1.1 vs Jonny 4.0

Post by Necromancer »

So while browsing my engine results at

http://ccrl.chessdom.com/ccrl/404/cgi/e ... 1_1_64-bit

I saw this:
Tunguska 1.1 (2471) vs Jonny 4.0 (2746)
+36−5=8
That is a 275 Elo difference, so Tunguska's expected score is only ~17%. I looked at some of the games and they seem normal. The weird thing is that Jonny appears to play well against engines of its own level. Maybe a bug in the stronger engine?
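For reference, the ~17% figure follows from the usual logistic Elo formula. Below is a minimal sketch; the function name is made up for illustration, and the model CCRL actually uses to compute its ratings may differ from this textbook formula.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win = 1, draw = 0.5) of A against B, logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Ratings as quoted in the post above.
tunguska, jonny = 2471, 2746
expected = expected_score(tunguska, jonny)

# Match result from Tunguska's side: +36 -5 =8.
wins, losses, draws = 36, 5, 8
actual = (wins + 0.5 * draws) / (wins + losses + draws)

print(f"expected {expected:.1%}, actual {actual:.1%}")
```

Note that the formula gives an expected score (draws count half), not a pure win probability. Here it comes out at about 17%, against an actual score of about 82%.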
The truth comes from inside.
https://github.com/fernandotenorio/Tunguska
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Weird results Tunguska 1.1 vs Jonny 4.0

Post by Dann Corbit »

Lots of possible explanations.
Some engines have a nemesis that simply beats them better than one would expect given the Elo difference, even over a large number of trials.
The game count is small. Random fluctuation can cause all sorts of strange-looking things with just a few trials.
Jonny seems to perform well with a giant pile of cores. Was the test single threaded?
Jonny could be misconfigured.

There are many other possibilities.
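To get a feel for how much random fluctuation a match of this size allows, one can put a rough error bar on the score. The sketch below assumes independent games and a normal approximation, which is questionable for only 49 games and would not hold at all if the setup was broken; the function name is hypothetical.

```python
import math

def score_with_error(wins: int, draws: int, losses: int) -> tuple[float, float]:
    """Return (mean score per game, standard error of that mean)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the outcome (1, 0.5 or 0) around the observed mean.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    return mean, math.sqrt(var / n)

# Tunguska's result against Jonny, from the first post: +36 -5 =8.
mean, stderr = score_with_error(wins=36, draws=8, losses=5)

# Expected score of about 17% for the 275 Elo deficit (figure from the first post).
expected = 0.17
z = (mean - expected) / stderr
print(f"score {mean:.3f} +/- {stderr:.3f}, {z:.1f} standard errors above expectation")
```

With these numbers the observed score sits well over ten standard errors above the expectation, which is hard to blame on the small sample alone.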
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Necromancer
Posts: 33
Joined: Wed Nov 23, 2016 1:30 am
Location: Brazil

Re: Weird results Tunguska 1.1 vs Jonny 4.0

Post by Necromancer »

Dann Corbit wrote: Tue May 14, 2019 7:07 am Was the test single threaded?
Jonny could be misconfigured.

There are many other possibilities.
I don't know; it's from CCRL 40/4. Reading about Jonny:
...Jonny uses a 0x88 board representation, and applies a sophisticated distributed and parallel search.
So maybe that was the problem, thanks!
Guenther
Posts: 4605
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Weird results Tunguska 1.1 vs Jonny 4.0

Post by Guenther »

Necromancer wrote: Tue May 14, 2019 7:26 am
Dann Corbit wrote: Tue May 14, 2019 7:07 am Was the test single threaded?
Jonny could be misconfigured.

There are many other possibilities.
I don't know, it's from CCRL 40/4. Reading about Jonny
...Jonny uses a 0x88 board representation, and applies a sophisticated distributed and parallel search.
So maybe that was the problem, thanks!
Normally we would need eval/depth info (which is not available for the 40/4 games) to investigate this properly (misconfigurations or CPU overloads happen...), but in this case it is extremely unlikely that Jonny 4.00 was hit three times by an asteroid in the last two or three months.

I am saying this because Jonny 4.00 suffered the same kind of extreme negative outlier not only vs. Tunguska,
but also vs. Topple and FranMAD. I am convinced something went wrong here.

Code:

–   Topple 0.5.0 64-bit 4CPU    2845    +24 -24 (+99)   0.5 - 44.5 (+0-44=1) 1.1% 0.5/45   0.0%   -590
–   Francesca MAD 0.21 64-bit   2709    +20 -20 (-37)   1.5 - 37.5 (+0-36=3) 3.8% 1.5/39 100.0%   -532
A fishy example game is below.
(Even without eval/depth it should be possible to tell by analysis whether Jonny failed to reach its normal depth here and lost that way.)
I checked a few positions after the opening, and Jonny often played moves that are already discarded here at depth 10-12
after one second (on my slow, 10-year-old quad-core, using 1 CPU of course), e.g. 13. Qa4?? and others.

Code:

[Event "CCRL 40/4"]
[Site "CCRL"]
[Date "2019.02.23"]
[Round "469.4.161"]
[White "Jonny 4.00"]
[Black "Francesca MAD 0.21 64-bit"]
[Result "0-1"]
[ECO "A04"]
[Opening "Reti opening"]
[PlyCount "64"]
[WhiteElo "2746"]
[BlackElo "2709"]

1. Nf3 e6 2. c4 b6 3. d4 Bb7 4. Nc3 Nf6 5. Bf4 Be7 6. h4 O-O 7. h5 d5 8. e3 c5
9. h6 g6 10. dxc5 bxc5 11. cxd5 exd5 12. Bb5 Nc6 13. Qa4 Qb6 14. O-O-O Rfd8 15.
Ne5 Na5 16. f3 a6 17. Bd7 Bd6 18. Bh3 Bxe5 19. Bxe5 Nc4 20. Nxd5 Bxd5 21. Rxd5
Rxd5 22. Bxf6 Nxe3 23. Qf4 c4 24. Re1 Rd3 25. a4 Qb3 26. Kb1 Nd5 27. Bf5 c3 28.
Qc1 Nb4 29. Re8+ Rxe8 30. Be6 c2+ 31. Qxc2 Qxc2+ 32. Ka1 Rd1# 0-1
https://rwbc-chess.de

trollwatch:
Chessqueen + chessica + AlexChess + Eduard + Sylwy
Graham Banks
Posts: 41415
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Weird results Tunguska 1.1 vs Jonny 4.0

Post by Graham Banks »

Sergio ran the Tunguska v Jonny games under Arena 3.51.
Might pay to check with him.
gbanksnz at gmail.com
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Weird results Tunguska 1.1 vs Jonny 4.0

Post by xr_a_y »

Same thing happens with Minic 0.47 on CCRL 40/4.
Guenther
Posts: 4605
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Weird results Tunguska 1.1 vs Jonny 4.0

Post by Guenther »

Guenther wrote: Tue May 14, 2019 9:40 am
Normally we would need eval/depth info (which is not available for the 40/4 games) to investigate this properly (misconfigurations or CPU overloads happen...), but in this case it is extremely unlikely that Jonny 4.00 was hit three times by an asteroid in the last two or three months.

...

Code:

–   Topple 0.5.0 64-bit 4CPU    2845    +24 -24 (+99)   0.5 - 44.5 (+0-44=1) 1.1% 0.5/45   0.0%   -590
–   Francesca MAD 0.21 64-bit   2709    +20 -20 (-37)   1.5 - 37.5 (+0-36=3) 3.8% 1.5/39 100.0%   -532
Around that time Jonny 4.00 had two other results, vs. FranMAD 0.22 and 0.23:
14.5 : 20.5 FranMAD 0.22
16.5 : 15.5 FranMAD 0.23
both inside normal error bars, unlike the result vs. FranMAD 0.21.

After downloading the 40/4 games file for Jonny 4.00, it appears that the problem has been present since at least
February 2019, and there are other completely unlikely bad results and
lots of quick, strange losses in the PGN. Probably all were played on the same quirky setup.
(In between there are normal results.)
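A rough way to screen a PGN file for quick losses like these, assuming the games carry PlyCount tags as the CCRL games quoted in this thread do. The function names and the 70-ply cutoff are arbitrary choices for illustration, not anything CCRL uses, and the sample is just the tag section of the Francesca MAD game from earlier in this thread.

```python
import re

# Matches PGN tag pairs such as [White "Jonny 4.00"].
TAG_RE = re.compile(r'\[(\w+)\s+"([^"]*)"\]')

def read_tags(pgn_text):
    """Return one dict of tags per game; a new game starts at each [Event ...] tag."""
    games = []
    for name, value in TAG_RE.findall(pgn_text):
        if name == "Event":
            games.append({})
        if games:
            games[-1][name] = value
    return games

def quick_losses(games, engine, max_ply=70):
    """Games that `engine` lost within max_ply plies (the cutoff is an arbitrary assumption)."""
    lost_as = {"White": "0-1", "Black": "1-0"}
    hits = []
    for tags in games:
        lost = any(tags.get(side) == engine and tags.get("Result") == result
                   for side, result in lost_as.items())
        ply = tags.get("PlyCount", "")
        if lost and ply.isdigit() and int(ply) <= max_ply:
            hits.append(tags)
    return hits

# Tags of the Francesca MAD game from this thread; movetext omitted for brevity.
sample = '''
[Event "CCRL 40/4"]
[Date "2019.02.23"]
[White "Jonny 4.00"]
[Black "Francesca MAD 0.21 64-bit"]
[Result "0-1"]
[PlyCount "64"]
'''

for tags in quick_losses(read_tags(sample), "Jonny 4.00"):
    print(tags["Date"], tags["White"], "-", tags["Black"], tags["Result"], tags["PlyCount"])
```

For a real check, the contents of the downloaded games file would be passed to `read_tags` instead of `sample`. A short loss is only a hint, of course; the moves still have to be looked at.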

Code:

CCRL 40/4  2019 (Jonny 4.00 games in 2019)

Jonny 4.00    2746 - Winter 0.4a 64-bit           2811   10.0 - 22.0    +6/=8/-18    31.25%
Jonny 4.00    2746 - Topple 0.3.4 64-bit          2664   23.5 - 7.5    +20/=7/-4     75.81%
Jonny 4.00    2746 - Francesca MAD 0.21 64-bit    2709    1.5 - 37.5    +0/=3/-36     3.85% XXX
Jonny 4.00    2746 - Topple 0.3.5 64-bit          2701   15.0 - 15.0    +12/=6/-12   50.00%
Jonny 4.00    2746 - chess22k 1.12 64-bit         3083    0.0 - 1.0     +0/=0/-1      0.00%
Jonny 4.00    2746 - Dirty CUCUMBER 64-bit        2928    0.0 - 1.0     +0/=0/-1      0.00%
Jonny 4.00    2746 - Amyan 1.72                   2604    0.0 - 1.0     +0/=0/-1      0.00%
Jonny 4.00    2746 - Floyd 0.9 64-bit             2585    0.0 - 1.0     +0/=0/-1      0.00%
Jonny 4.00    2746 - Ruffian 2.1.0                2609    0.5 - 0.5     +0/=1/-0     50.00%
Jonny 4.00    2746 - Nebula 2.0 64-bit            2656    0.5 - 0.5     +0/=1/-0     50.00%
Jonny 4.00    2746 - Pharaon 3.5.1                2604    1.0 - 0.0     +1/=0/-0    100.00%
Jonny 4.00    2746 - Ktulu 9                      2782    0.5 - 0.5     +0/=1/-0     50.00%
Jonny 4.00    2746 - BugChess2 1.9 64-bit         2758    1.0 - 0.0     +1/=0/-0    100.00%
Jonny 4.00    2746 - Gaviota 1.0 64-bit           2871    0.0 - 1.0     +0/=0/-1      0.00%
Jonny 4.00    2746 - Gogobello 1.4 64-bit         2756    1.0 - 0.0     +1/=0/-0    100.00%
Jonny 4.00    2746 - Delfi 5.4                    2683    1.0 - 0.0     +1/=0/-0    100.00%
Jonny 4.00    2746 - Gogobello 2.0 64-bit         2834   15.0 - 17.0   +11/=8/-13    46.88%
Jonny 4.00    2746 - Francesca MAD 0.22 64-bit    2730   14.5 - 20.5   +11/=7/-17    41.43%
Jonny 4.00    2746 - Igel 1.4 64-bit              2634   22.0 - 10.0   +19/=6/-7     68.75%
Jonny 4.00    2746 - Fridolin 3.10 64-bit 4CPU    2796    3.0 - 26.0    +0/=6/-23    10.34% XXX
Jonny 4.00    2746 - Topple 0.5.0 64-bit          2829   10.0 - 22.0    +5/=10/-17   31.25%
Jonny 4.00    2746 - Topple 0.5.0 64-bit 4CPU     2845    0.5 - 44.5    +0/=1/-44     1.11% XXX
Jonny 4.00    2746 - Minic 0.47 64-bit 4CPU       2880    5.0 - 42.0    +1/=8/-38    10.64% XXX
Jonny 4.00    2746 - Francesca MAD 0.23 64-bit    2771   16.5 - 15.5   +12/=9/-11    51.56%
Jonny 4.00    2746 - Tunguska 1.1 64-bit          2471    9.0 - 40.0    +5/=8/-36    18.37% XXX


Guenther
Posts: 4605
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Weird results Tunguska 1.1 vs Jonny 4.0

Post by Guenther »

xr_a_y wrote: Tue May 14, 2019 10:23 am Same thing happens with Minic 0.47 on CCRL 40/4.
Yes, that's true. I have now added a bigger list of Jonny 4.00's results for the whole of 2019.
(Irregular results are marked with 'XXX'.)
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Weird results Tunguska 1.1 vs Jonny 4.0

Post by xr_a_y »

I checked some of the games; they are not that bad.
Guenther
Posts: 4605
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Weird results Tunguska 1.1 vs Jonny 4.0

Post by Guenther »

I removed the irrelevant single-game matches (from gauntlets).
Jonny 4.00, CCRL 40/4, 2017-2019:

Again there are two extreme outliers, already in 2017, both in December.

Check this example game: 20. f4? and 21. Qd6?? appear only for a fraction of a second
in Jonny 4.00's search, at depth 6-8 or so... (with the hash filled by stepping through the game quickly)

[pgn]
[Event "CCRL 40/4"]
[Site "CCRL"]
[Date "2017.12.14"]
[Round "148.5"]
[White "Jonny 4.00"]
[Black "Scorpio 2.7.8 64-bit"]
[Result "0-1"]
[ECO "D45"]
[WhiteElo "2746"]
[BlackElo "2861"]
[PlyCount "47"]
[EventDate "2017.??.??"]

1. d4 d5 2. Nf3 c6 3. c4 Nf6 4. e3 e6 5. Nc3 Nbd7 6. Qc2 Bd6 7. Bd3 O-O 8. O-O dxc4
9. Bxc4 b5 10. Bd3 Bb7 11. Bd2 b4 12. Na4 c5 13. dxc5 Nxc5 14. Nxc5 Bxc5 15. Qxc5 Qxd3
16. Bxb4 Bxf3 17. gxf3 Nd5 18. Ba3 Qg6+ 19. Kh1 Qh5 20. f4 Rfc8 21. Qd6 Qf3+ 22. Kg1 Nxe3
23. fxe3 Qg4+ 24. Kh1 0-1
[/pgn]

Or this one:
16. Rxe8+?? and 17. Qxf5??? never appear here in Jonny 4.00's search, at least from depth 4 or so
(this game is from an omitted mini-match in a gauntlet).

[pgn]
[Event "CCRL 40/4"]
[Site "CCRL"]
[Date "2017.09.27"]
[Round "140.6"]
[White "Jonny 4.00"]
[Black "Chronos 1.9.9 64-bit"]
[Result "0-1"]
[ECO "C89"]
[WhiteElo "2746"]
[BlackElo "2739"]
[PlyCount "38"]
[EventDate "2017.??.??"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 O-O 8. c3 d5
9. exd5 Nxd5 10. Nxe5 Nxe5 11. Rxe5 c6 12. Re1 Bd6 13. d4 Bf5 14. Nd2 Qc7 15. Qf3 Rfe8
16. Rxe8+ Rxe8 17. Qxf5 Re1+ 18. Nf1 Bxh2+ 19. Kh1 Rxf1# 0-1
[/pgn]

Code:

CCRL 40/4  2017

Jonny 4.00    2746 - Laser 1.3 64-bit                 2948    9.0 - 28.0     +5/=8/-24    24.32%
Jonny 4.00    2746 - Amoeba 2.1 64-bit                2747   25.5 - 14.5    +20/=11/-9    63.75%
Jonny 4.00    2746 - Tornado 8.0 64-bit               2825   11.5 - 22.5     +7/=9/-18    33.82%
Jonny 4.00    2746 - Zurichess Jura 64-bit            2773   20.0 - 12.0    +16/=8/-8     62.50%
Jonny 4.00    2746 - Carballo 1.7 64-bit              2723   14.5 - 19.5    +10/=9/-15    42.65%
Jonny 4.00    2746 - ChessBrainVB 3.20                2803   21.0 - 29.0    +15/=12/-23   42.00%
Jonny 4.00    2746 - Amoeba 2.3 64-bit                2782   22.5 - 27.5    +13/=19/-18   45.00%
Jonny 4.00    2746 - Cheese 1.9 64-bit                2731   22.5 - 8.5     +19/=7/-5     72.58%
Jonny 4.00    2746 - RuyDos 1.0.2 64-bit              2669   19.5 - 9.5     +14/=11/-4    67.24%
Jonny 4.00    2746 - Zurichess Luzern 64-bit          2842   13.0 - 18.0     +7/=12/-12   41.94%
Jonny 4.00    2746 - ChessBrainVB 3.31                2828   15.5 - 12.5    +10/=11/-7    55.36%
Jonny 4.00    2746 - chess22k 1.4 64-bit              2689   20.0 - 12.0    +12/=16/-4    62.50%
Jonny 4.00    2746 - Gandalf 7 64-bit                 2668   23.0 - 9.0     +18/=10/-4    71.88%
Jonny 4.00    2746 - chess22k 1.5 64-bit              2738   16.5 - 15.5    +12/=9/-11    51.56%
Jonny 4.00    2746 - GNU Chess 6.25 64-bit            2683    6.5 - 23.5     +3/=7/-20    21.67%
Jonny 4.00    2746 - Defenchess (SCTR) 1.0 64-bit     2843   11.5 - 14.5     +7/=9/-10    44.23%
Jonny 4.00    2746 - RuyDos 1.0.27 64-bit             2758   14.0 - 14.0     +9/=10/-9    50.00%
Jonny 4.00    2746 - Fruit 2.3.1                      2780   10.5 - 15.5     +7/=7/-12    40.38%
Jonny 4.00    2746 - Ethereal 8.28 64-bit             2754   30.5 - 19.5    +23/=15/-12   61.00%
Jonny 4.00    2746 - Marvin 2.2.0 64-bit              2697   19.5 - 12.5    +15/=9/-8     60.94%
Jonny 4.00    2746 - ECE X3 64-bit                    2656   20.5 - 11.5    +17/=7/-8     64.06%
Jonny 4.00    2746 - The Baron 3.41 64-bit            2823    5.5 - 24.5     +3/=5/-22    18.33%
Jonny 4.00    2746 - Devel 1.8090                     2707   19.0 - 13.0    +16/=6/-10    59.38%
Jonny 4.00    2746 - Ethereal 8.37 64-bit             2825   12.5 - 19.5     +8/=9/-15    39.06%
Jonny 4.00    2746 - chess22k 1.6 64-bit              2829   10.5 - 21.5     +5/=11/-16   32.81%
Jonny 4.00    2746 - Defenchess (SCTR) 1.1e 64-bit    3035    7.0 - 25.0     +4/=6/-22    21.88%
Jonny 4.00    2746 - Scorpio 2.7.8 64-bit             2861    0.5 - 29.5     +0/=1/-29     1.67% XXX
Jonny 4.00    2746 - Tucano 7.00 64-bit               2871    3.5 - 25.5     +1/=5/-23    12.07% XXX

CCRL 40/4  2018

Jonny 4.00    2746 - Scorpio 2.7.9 64-bit             2883   14.0 - 42.0     +4/=20/-32   25.00%
Jonny 4.00    2746 - GreKo 2017 64-bit                2614   24.0 -  8.0    +21/=6/-5     75.00%
Jonny 4.00    2746 - Karballo 1.8 64-bit              2753   15.0 - 51.0     +5/=20/-41   22.73%
Jonny 4.00    2746 - Shield 2.1 64-bit                2735   18.5 - 12.5    +12/=13/-6    59.68%
Jonny 4.00    2746 - Marvin 3.0.0 64-bit              2706   19.0 - 14.0    +14/=10/-9    57.58%
Jonny 4.00    2746 - RuyDos 1.1.0 64-bit              2777   21.0 - 40.0    +15/=12/-34   34.43%
Jonny 4.00    2746 - Daydreamer 2.0.0-pre2 64-bit     2896    8.5 - 21.5     +4/=9/-17    28.33%
Jonny 4.00    2746 - Devel 2.0000                     2713   32.5 - 29.5    +20/=25/-17   52.42%
Jonny 4.00    2746 - Godel 4.0.7 64-bit               2805   30.5 - 81.5    +20/=21/-71   27.23%
Jonny 4.00    2746 - Marvin 3.1.0 64-bit              2789   24.0 - 56.0    +17/=14/-49   30.00%
Jonny 4.00    2746 - Counter 2.9 64-bit               2715   26.0 - 43.0    +19/=14/-36   37.68%
Jonny 4.00    2746 - RubiChess 1.0 64-bit             2651   42.0 - 18.0    +34/=16/-10   70.00%
Jonny 4.00    2746 - Pirarucu 2.3.8 64-bit            2865   12.0 - 20.0     +7/=10/-15   37.50%
Jonny 4.00    2746 - RofChade 1.0 64-bit              2792   13.5 - 18.5     +9/=9/-14    42.19%
Jonny 4.00    2746 - GreKo 2018.08 64-bit             2717   19.0 - 14.0    +14/=10/-9    57.58%
Jonny 4.00    2746 - RubiChess 1.1 64-bit             2780   13.5 - 18.5     +9/=9/-14    42.19%
Jonny 4.00    2746 - Monolith 1.0 64-bit              2818   15.0 - 17.0     +9/=12/-11   46.88%
Jonny 4.00    2746 - Donna 4.1 64-bit                 2700   36.0 - 26.0    +26/=20/-16   58.06%
Jonny 4.00    2746 - Marvin 3.2.0 64-bit              2826   10.0 - 22.0     +6/=8/-18    31.25%
Jonny 4.00    2746 - Cheese 2.0 64-bit                2765   14.5 - 17.5    +11/=7/-14    45.31%
Jonny 4.00    2746 - Winter 0.3 64-bit                2739   11.5 - 17.5     +8/=7/-14    39.66%
Jonny 4.00    2746 - RofChade 2.0 64-bit              3129    3.0 - 20.0     +0/=6/-17    13.04%
Jonny 4.00    2746 - Arminius 2018-12-23 64-bit       2760   10.0 - 18.0     +9/=2/-17    35.71%

CCRL 40/4  2019

Jonny 4.00    2746 - Winter 0.4a 64-bit           2811   10.0 - 22.0    +6/=8/-18    31.25%
Jonny 4.00    2746 - Topple 0.3.4 64-bit          2664   23.5 - 7.5    +20/=7/-4     75.81%
Jonny 4.00    2746 - Francesca MAD 0.21 64-bit    2709    1.5 - 37.5    +0/=3/-36     3.85% XXX
Jonny 4.00    2746 - Topple 0.3.5 64-bit          2701   15.0 - 15.0    +12/=6/-12   50.00%
Jonny 4.00    2746 - Gogobello 2.0 64-bit         2834   15.0 - 17.0   +11/=8/-13    46.88%
Jonny 4.00    2746 - Francesca MAD 0.22 64-bit    2730   14.5 - 20.5   +11/=7/-17    41.43%
Jonny 4.00    2746 - Igel 1.4 64-bit              2634   22.0 - 10.0   +19/=6/-7     68.75%
Jonny 4.00    2746 - Fridolin 3.10 64-bit 4CPU    2796    3.0 - 26.0    +0/=6/-23    10.34% XXX
Jonny 4.00    2746 - Topple 0.5.0 64-bit          2829   10.0 - 22.0    +5/=10/-17   31.25%
Jonny 4.00    2746 - Topple 0.5.0 64-bit 4CPU     2845    0.5 - 44.5    +0/=1/-44     1.11% XXX
Jonny 4.00    2746 - Minic 0.47 64-bit 4CPU       2880    5.0 - 42.0    +1/=8/-38    10.64% XXX
Jonny 4.00    2746 - Francesca MAD 0.23 64-bit    2771   16.5 - 15.5   +12/=9/-11    51.56%
Jonny 4.00    2746 - Tunguska 1.1 64-bit          2471    9.0 - 40.0    +5/=8/-36    18.37% XXX