Interpretation of a result for time and node count

sje · Post by **sje** » Tue Apr 12, 2011 10:18 am

Interpretation of a result for time and node count:

When reporting the result of a search, many programs (maybe nearly all of them) report the total time used and total node count. But are these an accurate description of the effort needed to produce the result?

I suggest that a better report would use the time/nodes numbers at the point in the search when the result PV and score were established. Alternatively, the time/nodes numbers reported would be from the point in the search after the point at which any PV could be established; i.e., the last time a score/bound was determined for a root move selection. The idea is to not credit any effort in which all data generated was thrown away when a resource limit (e.g., allocated elapsed time) was encountered.

The change is easy to implement. A program need only capture the time/nodes numbers at the time a PV is established and pass these along with the final PV when the result report is generated.

rbarreira · Post by **rbarreira** » Tue Apr 12, 2011 11:15 am

sje wrote: When reporting the result of a search, many programs (maybe nearly all of them) report the total time used and total node count. But are these an accurate description of the effort needed to produce the result?

Depends on what you mean by "result". If result is just the PV or the best move, then no. If the result is that plus the number of examined root moves at that depth, then yes.

Sven · Post by **Sven** » Tue Apr 12, 2011 11:18 am

I don't think that following your proposal would be accurate enough. When a PV is established you don't know exactly whether this PV will be kept until the end of an iteration. In many cases it will but sometimes it won't. So the time needed to *confirm* the PV by proving that all other moves are not better is part of the time needed to produce the result.

You may argue that this "confirmation" is often wasted. But there you are using a posteriori knowledge.

You are right in general about the timeout issue. Here you have these cases:
A. timeout during search of first root move (no new PV),
B. timeout during search of second root move (could produce new PV but no new best move),
C. timeout after search of second root move (could produce new PV and sometimes also new best move).

What would you report in these cases? Would that match the expectations? According to your proposal it seems you would like to report the time/nodes needed to produce the PV of the previous iteration in case A, and that of the current iteration in case B+C, which might give surprising results for case A at least.

Let's assume an engine needs 60 seconds to complete all iterations until depth X-1, then 30 seconds until the PV of iteration X pops up, then another 30 seconds to confirm that the remaining moves in X are not better, then starts searching the best move from X to depth X+1, and gets a timeout after another 45 seconds right before completing that search.

Total time elapsed: 2:45 minutes
Your reported time: 1:30 minutes

So what would you think as a user about what the engine did in the remaining 1:15 minutes?

Engines that do not even start a new iteration if completing it is unlikely might avoid this problem and can also save a lot of time, especially in cases A and B.

Another issue, more from the statistical viewpoint: it is hard to compare reported times and node counts which are not referring to fully completed iterations. This also affects calculating the EBF.

Sven

rbarreira · Post by **rbarreira** » Tue Apr 12, 2011 11:21 am

There is another possible case: the search of another root move failed high in the zero window search, but the search times out when searching it with a full window.

In this case you have a new best move but no PV.

hgm · Post by **hgm** » Tue Apr 12, 2011 11:29 am

sje wrote:The change is easy to implement. A program need only capture the time/nodes numbers at the time a PV is established and pass these along with the final PV when the result report is generated.

This change should be considered a protocol violation. In their thinking output engines are expected to print the true time and number of nodes upto that point.

If you want to let the user know how much time and nodes were needed to find a certain PV, you should simply have the engine emit a line of thinking output at the moment it finds that PV. You can then repeat that (with new times and nod counts) at the end of the iteration, if the PV did not change.

sje · Post by **sje** » Tue Apr 12, 2011 12:36 pm

A few points:

1) What exactly is the interpretation of the result time? It could be wall clock time, CPU time, the sum of all CPU time for multiple CPUs, the sum of all core time (physical or virtual), or maybe something else.

2) What exactly is an iteration? Not all programs use an iterative search. Also, an iteration in an iterated best-first search is likely something quite different from that in an iterated depth-first search.

3) What exactly is a node? Is it the same as the number of static evaluations? Is it the same as the number of execute/retract pairs? Is it the count of unique positions created?

4) What exactly is the reported analysis draft? Is it the number of completed iterations? Or the number of started iterations? Is it the length of the PV?

Few will agree on answers for all of the above.

sje · Post by **sje** » Tue Apr 12, 2011 12:42 pm

Code: Select all

* &#40;simpleton &#40;string->pos "2kr3r/pp1q1ppp/5n2/1Nb5/2Pp1B2/7Q/P4PPP/1R3RK1 w - - 0 1"))

&#91;si&#93; Starting iteration one
&#91;pv&#93; 1 Bc1 Qxh3 2 gxh3   score&#58; -1.348
&#91;pv&#93; 1 Bd2 Qxh3 2 gxh3   score&#58; -1.068
&#91;pv&#93; 1 Kh1 Qxh3 2 gxh3   score&#58; -0.978
&#91;pv&#93; 1 Qd3   score&#58; -0.934
&#91;pv&#93; 1 Rfe1 Qxh3 2 gxh3   score&#58; -0.858
&#91;si&#93; Starting iteration two
&#91;pv&#93; 1 Rfe1 d3   score&#58; -0.972
&#91;pv&#93; 1 Qxd7+ Rxd7 2 Rfe1   score&#58; -0.874
&#91;si&#93; Starting iteration three
&#91;pv&#93; 1 Qxd7+ Rxd7 2 Rfe1 d3   score&#58; -1.092
&#91;pv&#93; 1 Rfe1 d3 2 Nc3 Qxh3 3 gxh3   score&#58; -0.936
&#91;pv&#93; 1 Qg3 Qf5 2 Rfe1   score&#58; -0.904
&#91;si&#93; Starting iteration four
&#91;pv&#93; 1 Qg3 g6 2 Bg5 Ne4   score&#58; -1.018
&#91;pv&#93; 1 Qxd7+ Rxd7 2 Rfe1 d3 3 Nc3   score&#58; -0.952
&#91;si&#93; Starting iteration five
&#91;pv&#93; 1 Qxd7+ Rxd7 2 Rfe1 d3 3 Rb2 h5   score&#58; -1.124
&#91;pv&#93; 1 Qg3 g6 2 Rfe1 Rde8 3 h4   score&#58; -0.956
&#91;pv&#93; 1 Rfe1 Qxh3 2 gxh3 d3 3 Nc3   score&#58; -0.936
&#91;st&#93; Resource exhaustion &#40;time&#41;   Elapsed&#58; 000.00&#58;05&#58;00.000

Search statistics summary report at 2011.04.12 06&#58;23&#58;51
  Root position FEN&#58; 2kr3r/pp1q1ppp/5n2/1Nb5/2Pp1B2/7Q/P4PPP/1R3RK1 w - - 0 1
  Root moves &#40;forty-seven&#41;&#58;
    &#40;Bb8 Bc1 Bc7 Bd2 Bd6 Be3 Be5 Bg3 Bg5 Bh6 Kh1 Na3 Nc3 Nc7 Nd6+ Nxa7+ Nxd4
     Qa3 Qb3 Qc3 Qd3 Qe3 Qe6 Qf3 Qf5 Qg3 Qg4 Qh4 Qh5 Qh6 Qxd7+ Qxh7 Ra1 Rb2 Rb3
     Rb4 Rbc1 Rbd1 Rbe1 Rfc1 Rfd1 Rfe1 a3 a4 f3 g3 g4&#41;
  PV&#58; 1 Rfe1 Qxh3 2 gxh3 d3 3 Nc3
  Expectation&#58; -0.936
  Analysis scalars&#58;
    Draft&#58; 5   Nodes&#58; 498,502   Time&#58; 000.00&#58;01&#58;16.067
    Frequency&#58; 6.5534596 KHz   Period&#58; 152.59117 us
  Total node counts&#58;
    All&#58; 2,040,525   Capture&#58; 1,495,178   Evasion&#58; 292,019   General&#58; 79,660   PlyZero&#58; 5
  Total elapsed time&#58; 000.00&#58;05&#58;00.000
  Termination status&#58; Resource exhaustion &#40;time&#41;
  Transposition table statistics &#40;general position tables by color&#41;&#58;
    &#40;white&#41; size&#58; 65,536   usage&#58; 65,526   match&#58; 108,175   probe&#58; 806,434   store&#58; 753,412
    &#40;black&#41; size&#58; 65,536   usage&#58; 65,536   match&#58; 166,725   probe&#58; 1,233,190   store&#58; 1,113,436
  Transposition table statistics &#40;pawn structure tables by color&#41;&#58;
    &#40;white&#41; size&#58; 4,096   usage&#58; 3,574   match&#58; 665,879   probe&#58; 688,627   store&#58; 22,748
    &#40;black&#41; size&#58; 4,096   usage&#58; 3,727   match&#58; 783,963   probe&#58; 806,551   store&#58; 22,588
  Transposition pruning counts&#58;
    Attempt&#58; 2,039,624   Match&#58; 274,900   Sufficient draft&#58; 213,905   Success&#58; 172,767

&#91;-0.936/5/76.067/498,502/0&#93; 1 Rfe1 Qxh3 2 gxh3 d3 3 Nc3

hgm · Post by **hgm** » Tue Apr 12, 2011 1:00 pm

sje wrote:A few points:

1) What exactly is the interpretation of the result time? It could be wall clock time, CPU time, the sum of all CPU time for multiple CPUs, the sum of all core time (physical or virtual), or maybe something else.

2) What exactly is an iteration? Not all programs use an iterative search. Also, an iteration in an iterated best-first search is likely something quite different from that in an iterated depth-first search.

3) What exactly is a node? Is it the same as the number of static evaluations? Is it the same as the number of execute/retract pairs? Is it the count of unique positions created?

4) What exactly is the reported analysis draft? Is it the number of completed iterations? Or the number of started iterations? Is it the length of the PV?

Few will agree on answers for all of the above.

In WB protocol the time issue at least is defined: engines should report wall-clock time, unless a previous nps 0 command has instructed them to use user CPU time.

I guess the specs can use some tightening up for the SMP case. In a future release I will specify the nps 0 reporting should be the sum of the user CPU time of all threads.

The other details are indeed implementation dependent. You can report what you think is most useful for your implementation. As long as you do it the same way every time, it does not matter.

Sven · Post by **Sven** » Tue Apr 12, 2011 1:09 pm

sje wrote:A few points:

1) What exactly is the interpretation of the result time? It could be wall clock time, CPU time, the sum of all CPU time for multiple CPUs, the sum of all core time (physical or virtual), or maybe something else.

Wall clock time. Never seen anything different.

sje wrote:2) What exactly is an iteration? Not all programs use an iterative search. Also, an iteration in an iterated best-first search is likely something quite different from that in an iterated depth-first search.

I would guess that 98% of all chess engines use iterative deepening, and I think I can assume everyone knows what an iteration is in that context. It is better to have a notion for something that is used by 98% of all programs, than to have nothing just because 2% are different.

sje wrote:3) What exactly is a node? Is it the same as the number of static evaluations? Is it the same as the number of execute/retract pairs? Is it the count of unique positions created?

Making a move during search enters a new node of the search tree, regardless whether that node is being evaluated or not, and whether it is unique or has already been searched before.

I agree, though, that there are probably too many different ways of node counting around. A common definition that is used by *everyone* (i.e., 98%) might certainly help. Some do not count illegal positions, some do not count hash hits as a node, and so on. This has already been discussed a couple of times.

sje wrote:4) What exactly is the reported analysis draft? Is it the number of completed iterations? Or the number of started iterations? Is it the length of the PV?

Usually the number of started iterations, since it is reported together with the PV which can pop up at any time during an iteration. Surely it is not the length of the PV, due to things like search extensions, qsearch, or extending PV from hash.

Sven

sje · Post by **sje** » Tue Apr 12, 2011 2:27 pm

rbarreira wrote:There is another possible case: the search of another root move failed high in the zero window search, but the search times out when searching it with a full window.

In this case you have a new best move but no PV.

You'd have a PV, but it would be only one move in length.

Interpretation of a result for time and node count

Interpretation of a result for time and node count

Re: Interpretation of a result for time and node count

Re: Interpretation of a result for time and node count

Re: Interpretation of a result for time and node count

Re: Interpretation of a result for time and node count

A few points

Sample statistics report form the new CIL Toolkit

Re: A few points

Re: A few points

Re: Interpretation of a result for time and node count