The problem with adjudication based on score

Joerg Oster
Posts: 689
Joined: Fri Mar 10, 2006 3:29 pm
Location: Germany

Re: The problem with adjudication based on score

Post by Joerg Oster » Wed Apr 17, 2019 4:26 pm

MikeB wrote:
Wed Apr 17, 2019 3:18 pm
Joerg Oster wrote:
Wed Apr 17, 2019 10:01 am
konsolas wrote:
Wed Apr 17, 2019 9:48 am
Do any engines correctly score
[diagram not preserved]
to be a draw?
Yes. :D
I just recently added a first blockade detection rule to my SF fork.
Patch is here: https://github.com/joergoster/Stockfish ... 020293f1ef

Code:

position fen 4k3/rr6/2p1p1p1/1pPpPpPp/pP1P1P1P/P7/8/3K4 w - - 0 1
d

 +---+---+---+---+---+---+---+---+
 |   |   |   |   | k |   |   |   |
 +---+---+---+---+---+---+---+---+
 | r | r |   |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+
 |   |   | p |   | p |   | p |   |
 +---+---+---+---+---+---+---+---+
 |   | p | P | p | P | p | P | p |
 +---+---+---+---+---+---+---+---+
 | p | P |   | P |   | P |   | P |
 +---+---+---+---+---+---+---+---+
 | P |   |   |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+
 |   |   |   | K |   |   |   |   |
 +---+---+---+---+---+---+---+---+

Fen: 4k3/rr6/2p1p1p1/1pPpPpPp/pP1P1P1P/P7/8/3K4 w - - 0 1
PositionKey: 00FEC3309217ABF3
MaterialKey: FB26979F6BE2A3DB
PawnKey:     A90DE04509EB6C76
Checkers: 
eval
     Term    |     White     |     Black     |     Total    
             |   MG     EG   |   MG     EG   |   MG     EG  
 ------------+---------------+---------------+--------------
    Material |   4.69   6.52 |  14.57  16.83 |  -9.88 -10.31
   Imbalance |   0.57   0.57 |   0.52   0.52 |   0.05   0.05
  Initiative |   ----   ---- |   ----   ---- |   0.00  -0.07
       Pawns |   0.79   0.03 |   0.69  -0.06 |   0.10   0.09
     Knights |   0.00   0.00 |   0.00   0.00 |   0.00   0.00
     Bishops |   0.00   0.00 |   0.00   0.00 |   0.00   0.00
       Rooks |   0.00   0.00 |   0.00   0.00 |   0.00   0.00
      Queens |   0.00   0.00 |   0.00   0.00 |   0.00   0.00
    Mobility |   0.00   0.00 |   0.22   1.03 |  -0.22  -1.03
 King safety |  -0.36  -0.18 |  -0.41  -0.12 |   0.06  -0.06
     Threats |   0.05   0.05 |   0.00   0.00 |   0.05   0.05
      Passed |   0.00   0.00 |   0.00   0.00 |   0.00   0.00
       Space |   0.00   0.00 |   0.00   0.00 |   0.00   0.00
 ------------+---------------+---------------+--------------
       Total |   ----   ---- |   ----   ---- |  -9.83 -11.26

Total evaluation: 0.10 (white side)

info depth 31 seldepth 32 multipv 1 score cp -10 nodes 26051077 nps 2604847 hashfull 152 tbhits 0 time 10001 pv d1e2 b7g7 e2f1 a7f7 f1g1 f7b7 g1f2 e8f8 f2g2 b7f7 g2h2 f7c7 h2g2 f8g8 g2f1 c7f7 f1g1 g8h8 g1g2 f7b7 g2f1 h8h7 f1g1 g7d7 g1h2 d7c7 h2g2 c7f7 g2h1 h7g8 h1g1
bestmove d1e2 ponder b7g7

Nice. I'm trying to merge it into the current SF code and it's not working as expected. Any suggestions on what I have below?

Code:

template<Tracing T>
ScaleFactor Evaluation<T>::scale_factor(Value eg) const {

    Color strongSide = eg > VALUE_DRAW ? WHITE : BLACK;
    int sf = me->scale_factor(pos, strongSide);

    // Try to handle a fully blocked position with all pawns still
    // on the board and directly blocked by their counterpart,
    // and all remaining pieces on their respective side.
    // Test position r7/1b1r4/k1p1p1p1/1p1pPpPp/p1PP1P1P/PP1K4/8/4Q3 w - - bm Qa5+
    if (   pos.count<PAWN>() == 16
        && popcount(shift<NORTH>(pos.pieces(WHITE, PAWN)) & pos.pieces(BLACK, PAWN)) == 8)
    {
        Bitboard b, Camp[COLOR_NB];

        for (Color c : { WHITE, BLACK })
        {
            b = pos.pieces(c, PAWN);
            Camp[c] = 0;

            while (b)
            {
                Square s = pop_lsb(&b);
                Camp[c] |= forward_file_bb(~c, s);
            }
        }

        if (   !(pos.pieces(WHITE) & Camp[BLACK])
            && !(pos.pieces(BLACK) & Camp[WHITE]))
            return SCALE_FACTOR_DRAW;
    }

    // If scale is not already specific, scale down the endgame via general heuristics
    if (sf == SCALE_FACTOR_NORMAL)
    {
        if (   pos.opposite_bishops()
            && pos.non_pawn_material(WHITE) == BishopValueMg
            && pos.non_pawn_material(BLACK) == BishopValueMg)
            sf = 16 + 4 * pe->passed_count();
        else
            sf = std::min(40 + (pos.opposite_bishops() ? 2 : 7) * pos.count<PAWN>(strongSide), sf);
    }

    return ScaleFactor(sf);
}
Thanks in advance!
At a quick glance, it seems your blockade code runs without checking for sf == SCALE_FACTOR_NORMAL.
In my mod I slightly changed the behavior to return the ScaleFactor directly if sf has already been changed, so I don't need this check.

Try moving the blockade detection code directly above the if (pos.opposite_bishops()) part,
so that it is inside the check for SCALE_FACTOR_NORMAL.

if (sf == SCALE_FACTOR_NORMAL)
{
Blockade detection code
Opposite bishops code
}
Jörg Oster
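
Editor's sketch: the blockade rule and the suggested ordering can be tried outside the engine. The following is a minimal, self-contained approximation on plain 64-bit bitboards, not Stockfish code. The Board struct, parse_placement, count_bits, fill_south, fill_north and is_full_blockade are hypothetical helpers written for this illustration; materialSf stands in for the result of me->scale_factor(pos, strongSide), and the opposite-bishop branch is left out. Only the two ScaleFactor values (0 and 64) are taken from Stockfish.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

using Bitboard = std::uint64_t;

// Values match Stockfish's ScaleFactor enum; everything else here is a stand-in.
constexpr int SCALE_FACTOR_DRAW = 0;
constexpr int SCALE_FACTOR_NORMAL = 64;

struct Board {
    Bitboard pawns[2] = {0, 0};   // [0] = White, [1] = Black
    Bitboard pieces[2] = {0, 0};  // every piece of that colour, pawns included
};

// Parse the piece-placement field of a FEN string (a1 = bit 0, h8 = bit 63).
Board parse_placement(const std::string& fen) {
    Board b;
    int rank = 7, file = 0;
    for (char ch : fen) {
        if (ch == ' ') break;
        if (ch == '/') { --rank; file = 0; continue; }
        if (std::isdigit(static_cast<unsigned char>(ch))) { file += ch - '0'; continue; }
        assert(rank >= 0 && file < 8);  // guards against a malformed placement field
        int colour = std::isupper(static_cast<unsigned char>(ch)) ? 0 : 1;
        Bitboard sq = Bitboard(1) << (rank * 8 + file);
        b.pieces[colour] |= sq;
        if (ch == 'P' || ch == 'p') b.pawns[colour] |= sq;
        ++file;
    }
    return b;
}

int count_bits(Bitboard b) { return static_cast<int>(std::bitset<64>(b).count()); }

// All squares strictly south of / strictly north of the given squares, file by file.
Bitboard fill_south(Bitboard b) { b >>= 8; b |= b >> 8; b |= b >> 16; b |= b >> 32; return b; }
Bitboard fill_north(Bitboard b) { b <<= 8; b |= b << 8; b |= b << 16; b |= b << 32; return b; }

// The rule from the posted code: 16 pawns, every white pawn directly blocked
// by a black pawn, and no piece standing in the opponent's camp.
bool is_full_blockade(const Board& b) {
    if (count_bits(b.pawns[0] | b.pawns[1]) != 16) return false;
    if (count_bits((b.pawns[0] << 8) & b.pawns[1]) != 8) return false;
    Bitboard whiteCamp = fill_south(b.pawns[0]);  // squares behind White's pawns
    Bitboard blackCamp = fill_north(b.pawns[1]);  // squares behind Black's pawns
    return !(b.pieces[0] & blackCamp) && !(b.pieces[1] & whiteCamp);
}

// Suggested ordering: the blockade test runs only while the material-specific
// scale factor is still NORMAL.
int scale_factor(const Board& b, int materialSf, int strongSidePawns) {
    int sf = materialSf;
    if (sf == SCALE_FACTOR_NORMAL) {
        if (is_full_blockade(b)) return SCALE_FACTOR_DRAW;
        // Opposite-bishop handling omitted; only the general pawn-count scaling is kept.
        sf = std::min(40 + 7 * strongSidePawns, sf);
    }
    return sf;
}
```

With the position from the "position fen" command above, is_full_blockade holds and scale_factor returns the draw value only when materialSf is still 64. The test position from the code comment (bm Qa5+) is not recognised as a blockade, because the b- and c-pawns are not directly blocked.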

MikeB
Posts: 3461
Joined: Thu Mar 09, 2006 5:34 am
Location: Pen Argyl, Pennsylvania

Re: The problem with adjudication based on score

Post by MikeB » Wed Apr 17, 2019 4:33 pm

Joerg Oster wrote:
Wed Apr 17, 2019 4:26 pm
At a quick glance, it seems your blockade code runs without checking for sf == SCALE_FACTOR_NORMAL.
In my mod I slightly changed the behavior to return the ScaleFactor directly if sf has already been changed, so I don't need this check.

Try moving the blockade detection code directly above the if (pos.opposite_bishops()) part,
so that it is inside the check for SCALE_FACTOR_NORMAL.

if (sf == SCALE_FACTOR_NORMAL)
{
Blockade detection code
Opposite bishops code
}
Thx, will try it. In the middle of a test run, so it will be later.

Dann Corbit
Posts: 10112
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: The problem with adjudication based on score

Post by Dann Corbit » Wed Apr 17, 2019 6:24 pm

jp wrote:
Wed Apr 17, 2019 12:42 pm
Dann Corbit wrote:
Wed Apr 17, 2019 12:26 am
7r/6p1/7P/p1r1p1k1/Pp1pPp2/1PpP1Pp1/2P3P1/K6R w - -
Dann, who first came up with this position?
Gurgenidse, Dawid
Die Schwalbe (10693) 2000

From Yacpdb:
http://www.yacpdb.org/#search/N3I2cDE3U ... LzEvMA==/1
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Uri Blass
Posts: 8586
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: The problem with adjudication based on score

Post by Uri Blass » Thu Apr 18, 2019 4:03 am

gordonr wrote:
Wed Apr 17, 2019 11:19 am
Raphexon wrote:
Wed Apr 17, 2019 11:02 am
While I agree with you, these situations are very rare and work both ways.
Adjudication massively speeds up the process of testing, and if only 1% of the games produce wrong results it's still worth adjudicating.

Speed vs accuracy.
I agree with your point that these situations are so rare that it's not worth slowing down the testing. However, I'm not even sure if it is a wrong result. If the White player wrongly evaluates the position as "clearly lost", enough to effectively resign, then I think a win for Black is the correct result. An engine's rating should factor in positions where it thinks it is clearly lost even when it is a draw. Then if the engine improves its evaluation of such positions, its rating will increase.
Or not increase, because when it is on the better side of this draw, the position is not going to be adjudicated as a win for it.

I do not think that it is logical that the rating is based on the evaluation and not on the moves of the engine.

hgm
Posts: 23718
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: The problem with adjudication based on score

Post by hgm » Thu Apr 18, 2019 4:15 am

The simplest solution is to never let the engine print very negative scores. Always divide the score by 10 when it is negative before printing. This should gain you some Elo.
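
Editor's sketch: read literally, the remark above shows why a score-based rule can be gamed. The following minimal example assumes a hypothetical adjudication rule, in which a side is adjudicated lost once it has reported a score at or below -thresholdCp for a given number of consecutive moves. The threshold, the move count and all function names are illustrative assumptions, not the behaviour of any particular GUI or tournament manager.

```cpp
#include <cassert>
#include <vector>

// Hypothetical adjudication rule (threshold and move count are assumptions):
// a side is adjudicated lost once it has reported a score at or below
// -thresholdCp for `moves` consecutive moves.
bool adjudicated_lost(const std::vector<int>& reportedCp, int thresholdCp, int moves) {
    assert(moves > 0);
    int streak = 0;
    for (int cp : reportedCp) {
        streak = (cp <= -thresholdCp) ? streak + 1 : 0;
        if (streak >= moves) return true;
    }
    return false;
}

// The suggestion taken literally: shrink negative scores before printing them.
// Integer division truncates toward zero, so -983 becomes -98.
int printed_score(int cp) { return cp < 0 ? cp / 10 : cp; }

// Apply printed_score to a whole sequence of internal scores.
std::vector<int> printed_scores(const std::vector<int>& internalCp) {
    std::vector<int> out;
    out.reserve(internalCp.size());
    for (int cp : internalCp) out.push_back(printed_score(cp));
    return out;
}
```

Under these assumptions, an engine whose internal scores stay around -1000 centipawns would be adjudicated lost at a 900 cp threshold, while the same engine printing the divided scores never reaches the threshold and the game is played out.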

mclane
Posts: 16170
Joined: Thu Mar 09, 2006 5:40 pm
Location: US of Europe, germany
Full name: Thorsten Czub
Contact:

Re: The problem with adjudication based on score

Post by mclane » Thu Apr 18, 2019 5:40 am

gordonr wrote:
Wed Apr 17, 2019 11:19 am
Raphexon wrote:
Wed Apr 17, 2019 11:02 am
While I agree with you, these situations are very rare and work both ways.
Adjudication massively speeds up the process of testing, and if only 1% of the games produce wrong results it's still worth adjudicating.

Speed vs accuracy.
I agree with your point that these situations are so rare that it's not worth slowing down the testing. However, I'm not even sure if it is a wrong result. If the White player wrongly evaluates the position as "clearly lost", enough to effectively resign, then I think a win for Black is the correct result. An engine's rating should factor in positions where it thinks it is clearly lost even when it is a draw. Then if the engine improves its evaluation of such positions, its rating will increase.
A “rare” position. I do remember these rare positions turning up on the board at championships. E.g. in earlier years we had these rare KBNK endings on the board, which the programmers had not implemented because they are so “rare”. Or the software had underpromotion bugs, and suddenly in championship games exactly the “rare” situation was on the board.
You lost the important point. And c’est la vie.
A good program that relies on knowledge and not on search should handle that “rare” stuff.
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....

jp
Posts: 818
Joined: Mon Apr 23, 2018 5:54 am

Re: The problem with adjudication based on score

Post by jp » Thu Apr 18, 2019 2:25 pm

Dann Corbit wrote:
Wed Apr 17, 2019 6:24 pm
Gurgenidse, Dawid
Die Schwalbe (10693) 2000

From Yacpdb:
http://www.yacpdb.org/#search/N3I2cDE3U ... LzEvMA==/1
Thanks. That site looks like a very valuable resource.
