MRI - Match Result Inspector

Rebel
Posts: 4894
Joined: Thu Aug 18, 2011 10:04 am

MRI - Match Result Inspector

Post by Rebel » Mon Feb 10, 2020 11:10 am

MRI (Match Result Inspector) is a tool to extract valuable information from the PGN of an engine-engine match, provided the PGN was created by ChessBase, Cutechess or Arena.

It's currently in alpha stage (not downloadable yet) and I would welcome feedback on improvements and more ideas before a beta release.

Functions:
1. Suspect opening lines overview
2. Crazy games (incompatible scores)
3. Games that should have been won
4. Drop in score (horizon effects mostly)
5. Double opening detector
6. Lost games analysis


Examples:

1. Suspect opening lines overview

Time to remove opening lines like these from your test set?

2. Crazy games (incompatible scores)
When both engines show a clearly positive score at the same time, usually one of them is wrong.

Obviously ProDeo overestimated its passed pawns evaluation.


Ethereal too optimistic.
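
A minimal sketch of how such a check could be done, assuming Cutechess-style move comments such as {+0.54/17 7.5s} in which each engine reports the score from its own point of view. The function names, the 1.50 threshold and the sample movetext are illustrative assumptions, not MRI's actual code. Mate scores and other comment formats are not handled here.

```python
import re

# Matches comments such as "{+0.54/17 7.5s}"; "{book}" comments are skipped.
SCORE_RE = re.compile(r"\{([+-]?\d+\.\d+)/(\d+)")

def extract_scores(movetext):
    """Return the engine scores (in pawns) in the order they appear."""
    return [float(m.group(1)) for m in SCORE_RE.finditer(movetext)]

def incompatible_plies(scores, threshold=1.5):
    """Indices i where two consecutive plies both claim >= threshold.

    Consecutive scores come from opposite sides, each from its own
    point of view, so both being clearly positive is contradictory.
    """
    return [i for i in range(len(scores) - 1)
            if scores[i] >= threshold and scores[i + 1] >= threshold]

# Hypothetical movetext, made up for this example.
sample = ("30. Rd1 {+1.80/20 2.1s} Qe7 {+2.10/19 1.9s} "
          "31. h4 {+1.95/21 2.0s} Qf6 {-0.30/18 2.2s}")
print(incompatible_plies(extract_scores(sample)))
```

The sketch relies on the scores strictly alternating between the two engines, which holds once both sides are out of book.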

3. Games that should have been won
When an engine shows a score of +9.92, one would expect it to win the game. Not so for Stockfish 8.


Whoops, a bug.
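
One possible way to flag such games, as a rough sketch: look for the first move where an engine's own score reaches some "winning" level in a game it did not win. The function name, the result strings and the 5.00 threshold are assumptions for illustration only.

```python
def should_have_won(own_scores, result, winning_score=5.0):
    """own_scores: one engine's scores in pawns, from its own point of view.
    result: that engine's game result, one of "win", "draw", "loss".

    Returns the index of the first score >= winning_score if the game
    was not won, otherwise None.
    """
    if result == "win":
        return None
    for i, score in enumerate(own_scores):
        if score >= winning_score:
            return i
    return None

# Hypothetical score list; only the 9.92 is taken from the example above.
print(should_have_won([0.3, 1.2, 9.92, 0.2], "draw"))
```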

4. Drop in score (horizon effects mostly)


5. Double opening detector

1. e4 {book} e6 {book} 2. d4 {book} d5 {book} 3. e5 {book} c5 {book}
4. c3 {book} Nc6 {book} 5. Nf3 {book} Bd7 {book} 6. Be2 {book} f6 {book}
7. O-O {book} Qc7 {book} 8. Re1 {book} O-O-O {book} 9. Bb5 {+0.54/17 7.5s}
and:

1. e4 {book} e6 {book} 2. d4 {book} d5 {book} 3. e5 {book} c5 {book}
4. c3 {book} Nc6 {book} 5. Nf3 {book} f6 {book} 6. Bd3 {book} Qc7 {book}
7. O-O {book} Bd7 {book} 8. Re1 {book} O-O-O {book} 9. Bb5 {+0.61/17 7.3s}
Both lines end in the same starting position at move 9.
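
Note that the two lines differ in move order and even in the bishop's route (6. Be2 versus 6. Bd3), so comparing move lists is not enough; the positions have to be compared. A minimal sketch of that idea, assuming the FEN at the end of the book is already available for each game (producing it needs a move generator, not shown here). The function names and the toy FENs are assumptions, not MRI's implementation.

```python
from collections import defaultdict

def position_key(fen):
    """Keep piece placement, side to move, castling rights and en passant
    square; drop the halfmove clock and fullmove number."""
    return " ".join(fen.split()[:4])

def find_double_openings(book_exits):
    """book_exits: iterable of (game_id, fen_at_end_of_book).
    Returns lists of game ids that leave book in the same position."""
    groups = defaultdict(list)
    for game_id, fen in book_exits:
        groups[position_key(fen)].append(game_id)
    return [ids for ids in groups.values() if len(ids) > 1]

# Toy data: g1 and g2 are the same position with different move counters.
exits = [
    ("g1", "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"),
    ("g2", "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 4 3"),
    ("g3", "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"),
]
print(find_double_openings(exits))
```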

6. Lost games analysis
This option tries to find the moment where an engine starts to lose. It's not perfect, but it is valuable.
90% of coding is debugging, the other 10% is writing bugs.

Rebel
Posts: 4894
Joined: Thu Aug 18, 2011 10:04 am

Re: MRI - Match Result Inspector

Post by Rebel » Mon Feb 10, 2020 11:17 am

MRI output of a 1000-game match between Stockfish 8 and Komodo 10 at 40/80.

http://rebel13.nl/output.htm

Ratosh
Posts: 75
Joined: Mon Apr 16, 2018 4:56 pm

Re: MRI - Match Result Inspector

Post by Ratosh » Tue Feb 11, 2020 8:32 pm

Great tool! Some things I would like to see:
Report:
  • Window for `Drop` (able to find an X score drop in Y plies).
  • Show FEN positions in html report (easier to see/copy the FEN).
  • Phase overview in functions (Like the phase overview, but for functions - e.g. number of score drops per phase)
Functions:
  • Reverse games with different outcome (Show first diverged move and score).
Really like pgn output files.
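
The windowed drop could look roughly like this, as a sketch under my own assumptions: the input is one engine's score list (one entry per own move, in pawns), and the window is counted in entries of that list rather than in plies. The name find_drops and the sample numbers are made up.

```python
def find_drops(scores, drop, window):
    """Return (start, end) index pairs where the score fell by at least
    `drop` within at most `window` entries, reporting the earliest end
    for each start."""
    hits = []
    for start in range(len(scores)):
        last = min(start + window, len(scores) - 1)
        for end in range(start + 1, last + 1):
            if scores[start] - scores[end] >= drop:
                hits.append((start, end))
                break
    return hits

# Hypothetical scores: a drop of 1.5 or more within 2 entries.
print(find_drops([0.5, 0.6, 0.4, -1.2, -1.5], drop=1.5, window=2))
```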

Leo
Posts: 868
Joined: Fri Sep 16, 2016 4:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Re: MRI - Match Result Inspector

Post by Leo » Tue Feb 11, 2020 8:49 pm

Looks interesting and useful.
Advanced Micro Devices fan.

Dann Corbit
Posts: 10305
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: MRI - Match Result Inspector

Post by Dann Corbit » Tue Feb 11, 2020 9:08 pm

Suggestion:
Emit fully decorated EPD (with all the analysis from the logs) for any of the unusual data points found (inverted score, sudden drop, etc.)
It would be useful for building test suites that are tuned to an engine's specific problems.
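
As a rough sketch of what one such record could look like, using the standard EPD opcodes ce (centipawn evaluation), acd (analysis count depth) and id. The helper name, the label and the sample values are hypothetical, and this assumes the logged score is from the side to move's point of view, as ce requires. A fully decorated record would carry more opcodes than shown.

```python
def to_epd(fen, score_pawns, depth, label):
    """Build one EPD record from a FEN and the engine's logged analysis."""
    position = " ".join(fen.split()[:4])  # EPD omits the two move counters
    centipawns = round(score_pawns * 100)
    return f'{position} ce {centipawns}; acd {depth}; id "{label}";'

# Hypothetical data point on the starting position.
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(to_epd(start, 0.54, 17, "drop-001"))
```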
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Rebel
Posts: 4894
Joined: Thu Aug 18, 2011 10:04 am

Re: MRI - Match Result Inspector

Post by Rebel » Tue Feb 11, 2020 11:53 pm

Ratosh wrote:
Tue Feb 11, 2020 8:32 pm
Great tool! Some things I would like to see:
Report:
  • Window for `Drop` (able to find an X score drop in Y plies).
Click on view, it's all in the created PGN.
Ratosh wrote:
Tue Feb 11, 2020 8:32 pm
  • Show FEN positions in html report (easier to see/copy the FEN)
Makes sense.
Ratosh wrote:
Tue Feb 11, 2020 8:32 pm
  • Phase overview in functions (Like the phase overview, but for functions - e.g. number of score drops per phase)
I rewrote the Phase Overview. Here is an example from a match played while developing Benjamin against Fruit 2.3:

Phase          Won Games (numbers)           Late Endgame           Match
Overview       MIDG END1 END2 END3      QUEEN ROOK LIGHT PAWN       Score
Fruit 2.3      1065   56  699  281         24   79    62    4   2724.5 (54.5%)
Benjamin        991   24  458  184         21   48    39    2   2275.5 (45.5%)

Phase              Won games %               Late Endgame           Match
Overview       MIDG END1 END2 END3      QUEEN ROOK LIGHT PAWN       Score
Fruit 2.3      51.8 70.0 60.4 60.4       53.3 62.2  61.4 66.7   2724.5 (54.5%)
Benjamin       48.2 30.0 39.6 39.6       46.7 37.8  38.6 33.3   2275.5 (45.5%)

Depths         MIDG END1 END2 END3     BOOK             TIME
Benjamin       11.5 12.0 13.6 15.3     8.0 (moves)      0:00
Fruit 2.3      11.6 12.1 15.8 18.7     8.0 (moves)      0:00
What immediately springs to mind is Benjamin's weak point, the endgame, and looking at the depths it is likely being outsearched there.
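
For readers checking the numbers: the percentages in the second table appear to be each engine's share of the won games per phase (draws excluded), which is why they differ from the match score. A small sketch that reproduces them from the counts in the first table; the function name is my own.

```python
def win_shares(wins_a, wins_b):
    """Percentage of the won games taken by each side, to one decimal."""
    total = wins_a + wins_b
    return round(100 * wins_a / total, 1), round(100 * wins_b / total, 1)

# MIDG column: Fruit 2.3 won 1065 games, Benjamin 991.
print(win_shares(1065, 991))
# END1 column: Fruit 2.3 won 56 games, Benjamin 24.
print(win_shares(56, 24))
```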
Ratosh wrote:
Tue Feb 11, 2020 8:32 pm
Functions:
  • Reverse games with different outcome (Show first diverged move and score).
Nice idea, but I am afraid that list will be long.
Ratosh wrote:
Tue Feb 11, 2020 8:32 pm
Really like pgn output files.
Thank you.

Rebel
Posts: 4894
Joined: Thu Aug 18, 2011 10:04 am

Re: MRI - Match Result Inspector

Post by Rebel » Tue Feb 11, 2020 11:56 pm

Dann Corbit wrote:
Tue Feb 11, 2020 9:08 pm
Suggestion:
Emit fully decorated EPD (with all the analysis from the logs) for any of the unusual data points found (inverted score, sudden drop, etc.)
It would be useful for building test suites that are tuned to an engine's specific problems.
Will do.

Ferdy
Posts: 4189
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: MRI - Match Result Inspector

Post by Ferdy » Wed Feb 12, 2020 8:59 am

Rebel wrote:
Mon Feb 10, 2020 11:10 am
6. Lost games analysis
All good, and I like that feature, showing the FEN where it first made a suboptimal move.
Two things:
1. A single blunder that cost the game
2. An initial small mistake that leads to defeat

Guenther
Posts: 3278
Joined: Wed Oct 01, 2008 4:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: MRI - Match Result Inspector

Post by Guenther » Wed Feb 12, 2020 9:27 am

Ratosh wrote:
Tue Feb 11, 2020 8:32 pm
Great tool! Some things I would like to see:
Report:
  • Window for `Drop` (able to find an X score drop in Y plies).
  • Show FEN positions in html report (easier to see/copy the FEN).

    ...
The two points above exist already in Toms Game Analyser and it even has a GUI and eval graphs.

viewtopic.php?f=7&t=66554&p=750330&hili ... er#p750330
viewtopic.php?t=62066&highlight=game+analyser
Last edited by Guenther on Wed Feb 12, 2020 9:53 am, edited 1 time in total.
https://rwbc-chess.de/chronology.htm
--------------------------------------------------
The troll explosion at talkchess:
https://docs.google.com/spreadsheets/d/ ... KSptBx9AUs

Guenther
Posts: 3278
Joined: Wed Oct 01, 2008 4:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: MRI - Match Result Inspector

Post by Guenther » Wed Feb 12, 2020 9:47 am

Rebel wrote:
Mon Feb 10, 2020 11:10 am
MRI (Match Result Inspector) is a tool to extract valuable information from the PGN of an engine-engine match, provided the PGN was created by ChessBase, Cutechess or Arena.

It's currently in alpha stage (not downloadable yet) and I would welcome feedback on improvements and more ideas before a beta release.

...

Functions:
...
3. Games that should have been won
...
The title of 3. is a bit misleading: some of these games were never actually won but misevaluated, so there never was a win, only a fata morgana.
The example in your post might be even worse: a problem on the user side, as also indicated by an incomprehensible sudden loss of depth.
(Of course I won't completely rule out a hash bug in SF8.)

Single outliers with strangely low depth would be an interesting thing to find anyway ;-)
(I have been doing this for a long time with some stats sheets, to sanity-check PGN files found on talkchess from time to time.)

115... Nc2 +9.92/29
It is impossible for me to reproduce that incredible score with SF8, neither with 5-men Syzygy nor without tablebases at all. The score is always between 0.17 and 0.33 at all depths up to 60.
