Error bar calculations - help!

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

asanjuan
Posts: 214
Joined: Thu Sep 01, 2011 5:38 pm
Location: Seville, Spain

Error bar calculations - help!

Post by asanjuan »

Does anybody know where I can find a paper that describes how to calculate the error bars for Elo in a tournament?
I'm trying to improve my GA with tournament play, taking into account the error margin to build a more precise fitness function.

Maybe Rémi or Jesús Muñoz can give me more info.
Still learning how to play chess...
knights move in an "L" shape, right?
User avatar
Kempelen
Posts: 620
Joined: Fri Feb 08, 2008 10:44 am
Location: Madrid - Spain

Re: Error bar calculations - help!

Post by Kempelen »

Fermin Serrano
Author of 'Rodin' engine
http://sites.google.com/site/clonfsp/
User avatar
Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Error bar calculations - help!

Post by Ajedrecista »

Hello Alberto!
asanjuan wrote:Does anybody know where I can find a paper that describes how to calculate the error bars for Elo in a tournament?
I'm trying to improve my GA with tournament play, taking into account the error margin to build a more precise fitness function.

Maybe Rémi or Jesús Muñoz can give me more info.
First of all, thanks for thinking of me! :) I am totally surprised, because I am not comparable with Rémi Coulom, who is the author of BayesElo. I am sure that BayesElo (or even EloStat, another programme) is better than mine, although I get similar results.

Fermín has done a good job (thanks!) posting those two links: the first was unknown to me and seems good and well explained (although I do things a little differently, mainly because I take into account the draw ratio in the standard deviation); I already knew the second, which is not exact but serves as a guide: that table is calculated for a draw ratio of 32% (as can be read on the site). If I am not wrong, the table of the second link provides the minimum score that one must get in order not to have regressions (I mean, negative gains of Elo) at the given confidence (95%, 99% and 99.9% respectively). The most typical value is 95% (~1.96-sigma confidence), so it is the value you should keep your eyes on.

Here are two links to posts of mine from the same thread earlier this year:

http://talkchess.com/forum/viewtopic.ph ... 69&t=41773

http://talkchess.com/forum/viewtopic.ph ... 43&t=41773

I tried to be clear in those posts but I do not know if I achieved that goal:

· The first link explains the source from which I extracted the formula of the standard deviation (the only link of that post) and the method that I use ('invented' by me, but I see that it is very similar to the method of the first link given by Fermín, so I feel that I started doing things quite well, or at least not badly).

· The second link contains other links to papers, which is what you were looking for. I also demonstrated there (hopefully without errors) that both versions of the standard deviation (the one I use, extracted from a well-known Russian forum, and the other, extracted from some papers) are the same, except for dividing by n or n - 1 respectively (n is the number of games of the match). Bearing in mind that n will be large, both values of the standard deviation are almost equal.
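
Just to illustrate that last point, here is a rough sketch (in C, and only an illustration of the formulas, not the code of any of the linked programmes) of the two variants; for a match of a few hundred games they print almost the same value:

Code: Select all

#include <math.h>
#include <stdio.h>

/* Illustration only: the two variants of the standard deviation of the score,
   differing solely in the n vs. n - 1 divisor.                               */
int main(void)
{
    double wins = 100.0, losses = 80.0, draws = 60.0;   /* an arbitrary example */
    double n = wins + losses + draws;
    double score = (wins + 0.5*draws)/n;
    double draw_ratio = draws/n;
    double var = score*(1.0 - score) - draw_ratio/4.0;  /* per-game variance    */

    printf("sigma (divide by n)   = %.6f\n", sqrt(var/n));
    printf("sigma (divide by n-1) = %.6f\n", sqrt(var/(n - 1.0)));
    return 0;
}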

When you refer to a tournament, I understand that you let Rhetoric play against a bunch of different engines (which gives a better idea of rating performance than self-play, i.e. the current development version of Rhetoric versus the last released version). Unfortunately, my humble and possibly buggy software can only handle a match between two opponents (for example, the self-play test), but at least it is very easy to use; I honestly recommend BayesElo for your purposes.

I am uploading my clumsy programme (Elo_uncertainties_calculator), written in Fortran 95; I suppose that the link will die in a few days, so please hurry to download it in case you are interested:

Elo_uncertainties_calculator.rar (0.6 MB)

I post the code, which is also included in the file:

Code: Select all

! Program for calculating uncertainties (± Elo) in a tournament between two engines.

program Elo

implicit none

integer :: wins, loses, draws, n, k
real(KIND=3) :: score, draw_ratio, sigma, rd
real(KIND=3) :: lower_rd(1:3), upper_rd(1:3), error_menos(1:3), error_mas(1:3), error(1:3), Kn(1:3)

write(*,*)
write(*,*) 'Elo_uncertainties_calculator, © 2012.'
write(*,*)
write(*,*) 'Calculation of Elo uncertainties in a match between two engines:'
write(*,*) '----------------------------------------------------------------'
write(*,*)
write(*,*) '(The input and output data is referred to the first engine).'
write(*,*)
write(*,*) 'Please write down non-negative integers.'
write(*,*)
write(*,*) 'Write down the number of wins:'
write(*,*)
read(*,*) wins
write(*,*)
if (wins < 0) then
  write(*,*) 'Incorrect number of wins.'
  write(*,*)
  write(*,*) 'Please close and try again. Press Enter to exit.'
  read(*,'()')
  stop
end if

write(*,*) 'Write down the number of losses:'
write(*,*)
read(*,*) loses
write(*,*)
if (loses < 0) then
  write(*,*) 'Incorrect number of losses.'
  write(*,*)
  write(*,*) 'Please close and try again. Press Enter to exit.'
  read(*,'()')
  stop
end if

write(*,*) 'Write down the number of draws:'
write(*,*)
read(*,*) draws
write(*,*)
if (draws < 0) then
  write(*,*) 'Incorrect number of draws.'
  write(*,*)
  write(*,*) 'Please close and try again. Press Enter to exit.'
  read(*,'()')
  stop
end if

n = wins + loses + draws

if (n == 0) then
  write(*,*) 'There are no games.'
  write(*,*)
  write(*,*) 'Please close and try again. Press Enter to exit.'
  read(*,'()')
  stop
end if

score = (wins + draws/2d0)/n
if ((score < 1d-1) .or. (score > 9d-1)) then
  write(*,*) 'Score: ', 1d2*score, '%'
  write(*,*)
  write(*,*) 'The mathematical model used in this programme is not good for scores smaller than 10% or greater than 90%'
  write(*,*)
  write(*,*) 'Please close and try again. Press Enter to exit.'
  read(*,'()')
  stop
end if

rd = 4d2*log(score/(1d0-score))/log(1d1)  ! Elo rating difference by definition: 400*log10[score/(1 - score)].
draw_ratio = 1d0*draws/n
sigma = sqrt((score*(1d0 - score) - draw_ratio/4d0)/n)  ! Standard deviation of the score, taking the draw ratio into account.

write(*,*) '***************************************'
write(*,*) '1-sigma confidence ~ 68.27% confidence.'
write(*,*) '2-sigma confidence ~ 95.45% confidence.'
write(*,*) '3-sigma confidence ~ 99.73% confidence.'
write(*,*) '***************************************'
write(*,*)
write(*,*) '-----------------------------------------------------------------------'
write(*,*)

do k = 1, 3
  lower_rd(k) = 4d2*log((score - k*sigma)/(1d0-score + k*sigma))/log(1d1)
  upper_rd(k) = 4d2*log((score + k*sigma)/(1d0-score - k*sigma))/log(1d1)
  error_menos(k) = lower_rd(k) - rd
  error_mas(k) = upper_rd(k) - rd
  error(k) = (error_mas(k) - error_menos(k))/2d0
  Kn(k) = error(k)*sqrt(n*1d0)
  write(*,*) 'Confidence interval for ', k, '-sigma:'
  write(*,*)
  write(*,*) 'Elo rating difference: ', rd, 'Elo'
  write(*,*)
  write(*,*) 'Lower rating difference: ', lower_rd(k), 'Elo'
  write(*,*) 'Upper rating difference: ', upper_rd(k), 'Elo'
  write(*,*)
  write(*,*) 'Lower bound uncertainty: ', error_menos(k), 'Elo'
  write(*,*) 'Upper bound uncertainty: ', error_mas(k), 'Elo'
  write(*,*) 'Average error: +-', error(k), 'Elo'
  write(*,*)
  write(*,*) 'K = (average error)*[sqrt(n)] = ', Kn(k)
  write(*,*)
  write(*,*) 'Elo interval: ]', lower_rd(k), ',', upper_rd(k), '['
  write(*,*) '-----------------------------------------------------------------------'
  write(*,*)
end do

write(*,*) 'Number of games of the match: ', n
write(*,*) 'Score: ', 1d2*score, '%'
write(*,*) 'Elo rating difference: ', rd, 'Elo'
write(*,*) 'Draw ratio: ', 1d2*draw_ratio, '%'
write(*,*)
write(*,*) '*****************************************************************'
write(*,*) '1-sigma: ', 1d2*sigma, '% of the points of the match.'
write(*,*) '2-sigma: ', 2d2*sigma, '% of the points of the match.'
write(*,*) '3-sigma: ', 3d2*sigma, '% of the points of the match.'
write(*,*) '*****************************************************************'
write(*,*)
write(*,*) 'End of the calculations.'
write(*,*)
write(*,*) 'Thanks for using Elo_uncertainties_calculator. Press Enter to exit.'
read(*,'()')

end program Elo
(Some extra commas appeared in various write(*,*) statements and I do not know why; I copied it from Notepad.)

The use is simple: double-click the executable (please note that the programme can be slow (~6 seconds on my PC) at start-up, due to a DLL that is needed), follow the instructions and you will obtain the results. For example, if your development version scores +380 -320 =300 against Rhetoric 0.10d, you type 380 (press Enter), 320 (press Enter), 300 (press Enter) and you will get the results (in less than a second) in an ugly format, but they are fully understandable (here is an example of the horrible output).
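
If you prefer to check the numbers by hand, here is a minimal sketch (in C, just to illustrate the same formulas; it is not the programme itself) for that +380 -320 =300 example; it should print a rating difference of roughly +21 Elo with a 2-sigma error of about ±18 Elo:

Code: Select all

#include <math.h>
#include <stdio.h>

/* Minimal sketch reproducing the formulas of Elo_uncertainties_calculator
   for the example match +380 -320 =300.                                   */
int main(void)
{
    double wins = 380.0, losses = 320.0, draws = 300.0;
    double n = wins + losses + draws;
    double score = (wins + 0.5*draws)/n;                          /* 0.53    */
    double draw_ratio = draws/n;                                  /* 0.30    */
    double sigma = sqrt((score*(1.0 - score) - draw_ratio/4.0)/n);
    double rd = 400.0*log10(score/(1.0 - score));                 /* ~ +20.9 */

    int k = 2;                                                    /* 2-sigma */
    double lower = 400.0*log10((score - k*sigma)/(1.0 - score + k*sigma));
    double upper = 400.0*log10((score + k*sigma)/(1.0 - score - k*sigma));

    printf("rating difference: %+.1f Elo\n", rd);
    printf("2-sigma interval : ]%+.1f, %+.1f[ Elo\n", lower, upper);
    printf("average error    : +-%.1f Elo\n", (upper - lower)/2.0);
    return 0;
}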

It calculates 1-sigma, 2-sigma and 3-sigma confidence intervals... you should look at the 2-sigma results, which mean ~95.45% confidence (very close to 95% confidence). Those results are more accurate in an even match (near 50%-50%) than in a very unbalanced one (near 90%-10%).

The results are given with many decimals, but do not be afraid: just round them! Elo_uncertainties_calculator can be used even while compressed (I uploaded it compressed with WinRAR).

I hope you find my little bit of info useful. I am still surprised that you thought of me, because I am less than a newbie. Please do not hesitate to ask if you have more questions. Good luck with Rhetoric!

Regards from Spain.

Ajedrecista.
User avatar
Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Error bar calculations - help! (Little lapsus).

Post by Ajedrecista »

Hello:
(Some extra commas appeared in various write(*,*) statements and I do not know why; I copied it from Notepad.)
My mind went crazy; it was just a silly slip: all the code is OK. Otherwise, it would not work. Sorry for my wrong statement about the commas. I missed the window to edit my post by just one minute!

Regards from Spain.

Ajedrecista.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Error bar calculations - help!

Post by Don »

asanjuan wrote:Does anybody know where I can find a paper that describes how to calculate the error bars for Elo in a tournament?
I'm trying to improve my GA with tournament play, taking into account the error margin to build a more precise fitness function.

Maybe Rémi or Jesús Muñoz can give me more info.
The sources to bayeselo are available if you are willing to dive into that.

For a fitness function, I have discovered a pretty good method through tests I have done in the past. The problem with playing a large number of games is that the effort is enormous for a one-off calculation to be used as a single step in a GA. However, I don't believe in anything other than playing games as a reasonable fitness function.

So here are my 2 cents: use divide and conquer. The purpose of the fitness function is to produce one parent template (in the case of PBIL) or two or more (in the case of GAs). So you can use a probabilistic divide-and-conquer algorithm to do this, which is relatively cheap. The algorithm I suggest is the well-known elimination tournament, or KO tournament. You spend VERY LITTLE time on first round losers, and as the tournament progresses you continue to lop off work, but in a methodical and efficient way. And it turns out that late round players are MUCH more likely to be stronger than early round losers. You can also play more rounds late in the tournament with only a small impact on the amortized cost. In my experiments with PBIL I increased the number of games per match when I got close to the final round in order to improve the quality of the final player.

I have not proved this, but I believe that if your goal is to "find" the strongest possible player with the least amount of effort given an arbitrary population of players (which is EXACTLY what you need) then K/O is very close to optimal.

The intuition here is that if you are in a hurry, it's easier to find a strong player with superficial tests of a large population than it is to carefully measure a tiny population. For example, in a population of 256 players, the K/O is with very high probability going to find a stronger player than if you put the same effort into a much more thorough test of just 16 players. It's like cherry picking: don't obsess over every cherry, just quickly grab the ones that look good and get a lot more cherries to choose from.

Anyway, this is just a suggestion. A KO or knockout tournament is trivial to implement recursively:

Code: Select all

player_t  ko( int round )
{
  player_t  a;
  player_t  b;

  if (round == 1) {
    a = generatePlayer();
    b = generatePlayer();
  } else {
    a = ko( round - 1 );
    b = ko( round - 1 );
  }

  player_t  winner = playMatch( a, b );

  return winner;
}
generatePlayer() creates a player randomly, so they are created on the fly or on demand as needed.

so if you call it like this:

player_t goodPlayer = ko(6);

you would get a 6 round (64 player) k/o tournament, and the winner will be one of the very best of the 64 with a high probability. It does depend of course on the strength distribution of the players.

If you want 2 players or more, you can just grab the intermediate results; for example, there are four round-4 winners (or eight round-4 players) in a 6-round tournament.

You can study the behavior by running simulations, which is what I did. You can generate simulated players with fake ELO strengths drawn from a realistic bell curve, and play games between them using the expected result derived from their ELO strengths. I experimented with K/O, round robin and other methods with the goal of discovering which method produced the strongest players with the least effort. Round robin was absolutely horrible in terms of effort. K/O was orders of magnitude more efficient for non-trivial population sizes.
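
If you want to try the same kind of simulation, here is a rough sketch of the idea (illustration only, not the code of my tester; the helper names are made up and draws are ignored for simplicity):

Code: Select all

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

static double rand01(void) { return (rand() + 1.0)/(RAND_MAX + 2.0); }

/* Fake ELO drawn from an approximate bell curve (sum of 12 uniforms). */
static double fake_elo(double mean, double sd)
{
    double s = 0.0;
    for (int i = 0; i < 12; i++) s += rand01();
    return mean + sd*(s - 6.0);
}

/* Expected score of A against B from the usual Elo formula. */
static double expected_score(double eloA, double eloB)
{
    return 1.0/(1.0 + pow(10.0, (eloB - eloA)/400.0));
}

/* 1 if A wins the simulated game, 0 if B wins. */
static int play_game(double eloA, double eloB)
{
    return rand01() < expected_score(eloA, eloB);
}

int main(void)
{
    srand(12345);
    double a = fake_elo(2800.0, 100.0), b = fake_elo(2800.0, 100.0);
    int winsA = 0, games = 100;
    for (int g = 0; g < games; g++) winsA += play_game(a, b);
    printf("ELO A = %.0f, ELO B = %.0f, A scored %d / %d\n", a, b, winsA, games);
    return 0;
}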

For whatever it's worth that is my suggestion .....
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
asanjuan
Posts: 214
Joined: Thu Sep 01, 2011 5:38 pm
Location: Seville, Spain

Re: Error bar calculations - help!

Post by asanjuan »

The sources to bayeselo are available if you are willing to dive into that.
I wasn't looking for sources, but for a simple explanation of the equations. So the links provided by Fermín and Jesús were very helpful.

But on the other hand... Don, I'm very impressed with your response.
The algorithm I suggest is the well-known elimination tournament, or KO tournament. You spend VERY LITTLE time on first round losers, and as the tournament progresses you continue to lop off work, but in a methodical and efficient way. And it turns out that late round players are MUCH more likely to be stronger than early round losers.
I was really thinking of a round-robin tournament (and was afraid of the time needed, indeed), but your explanation has opened my mind, because I only need a fast way to rank my population, and the winner of a KO tournament is clearly a good indicator.
In fact, the fitness function can be just the sum of the points (+1 for a win, +0.5 for a draw) gained in the entire tournament, so the weaker individuals will only accumulate points from one match and the stronger ones will accumulate points from N matches.
Then, after ranking the population, I can build the next generation using roulette selection, elitism, or something else, as I was doing.
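
Something like this is the bookkeeping I have in mind (only a sketch, the names are made up):

Code: Select all

/* Sketch of the "points over the whole tournament" fitness: every game result
   is credited to both players, so an individual eliminated in round 1 only
   collects points from that single match.                                    */
typedef struct {
    /* ... engine parameters ... */
    double points;   /* +1 per win, +0.5 per draw */
    int    games;
} individual_t;

void credit_game(individual_t *a, individual_t *b, double result_for_a)
{
    a->points += result_for_a;
    b->points += 1.0 - result_for_a;
    a->games++;
    b->games++;
}

/* fitness(x) is then simply x->points (optionally normalised by x->games). */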

Thanks a lot!!
Still learning how to play chess...
knights move in an "L" shape, right?
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Error bar calculations - help!

Post by Don »

asanjuan wrote:
The sources to bayeselo are available if you are willing to dive into that.
I wasn't looking for sources, but for a simple explanation of the equations. So the links provided by Fermín and Jesús were very helpful.

But on the other hand... Don, I'm very impressed with your response.
The algorithm I suggest is the well-known elimination tournament, or KO tournament. You spend VERY LITTLE time on first round losers, and as the tournament progresses you continue to lop off work, but in a methodical and efficient way. And it turns out that late round players are MUCH more likely to be stronger than early round losers.
I was really thinking of a round-robin tournament (and was afraid of the time needed, indeed), but your explanation has opened my mind, because I only need a fast way to rank my population, and the winner of a KO tournament is clearly a good indicator.
In fact, the fitness function can be just the sum of the points (+1 for a win, +0.5 for a draw) gained in the entire tournament, so the weaker individuals will only accumulate points from one match and the stronger ones will accumulate points from N matches.
Then, after ranking the population, I can build the next generation using roulette selection, elitism, or something else, as I was doing.

Thanks a lot!!
I thought you might like it. The light bulb went on in my head when I realized that the goal is not to identify the best player or rank them perfectly in some limited sample - a GA calls for getting two "good" parents in a probabilistic way for the sake of diversity, not for obsessing over a perfect ranking or worrying about somehow missing the best player.

One caveat is that for diversity K/O might work TOO well. You can control how much diversity it generates by the number of games played in each match and the number of rounds. In general, if the players are really close together in strength, such as in late generations, you need more games to discriminate between them. For example, if they were all within 5 or 10 ELO of each other, the winner of a K/O where only 1 game was played per round would be almost random. You still cannot get around the error margin! In early generations of a GA there is probably a HUGE difference in the ELO of the players, and a weak player is unlikely to survive even one or two rounds.

What I did was play N games per match in early rounds, where N is an even number such as 2 or 4 or more, depending on the requirements. If a match ended in a tie, I chose one player randomly. There is no crime in doing this, as the evidence so far indicates approximate equality anyway, and if it's that important, N should be higher. You can of course stop some of these matches early if one side already has enough points to win the match. In the last 2 or 3 rounds I increased the number of games per match, as my simulations proved that this was a very cheap improvement - once you filter out the weak players, you can spend more time evaluating the remaining stronger players.
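
For what it's worth, such a match function might look like this (just a sketch; player_t and playGame are placeholders in the spirit of the ko() snippet above, not real code from my tester, and the number of games is passed in explicitly):

Code: Select all

#include <stdlib.h>

typedef int player_t;                       /* placeholder, as in the ko() sketch */
double playGame(player_t a, player_t b);    /* hypothetical: 1, 0.5 or 0 for a    */

/* Sketch of a match with early stopping and a random tiebreak. */
player_t playMatch(player_t a, player_t b, int n_games)
{
    double pa = 0.0, pb = 0.0;

    for (int g = 0; g < n_games; g++) {
        double r = playGame(a, b);          /* 1 = a wins, 0.5 = draw, 0 = b wins */
        pa += r;
        pb += 1.0 - r;

        /* Stop early once the leader can no longer be caught. */
        double remaining = n_games - (g + 1);
        if (pa > pb + remaining || pb > pa + remaining)
            break;
    }

    if (pa > pb) return a;
    if (pb > pa) return b;
    return (rand() & 1) ? a : b;            /* tie: pick one side at random */
}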

You will have fun with this I'm sure!
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Antonio Torrecillas
Posts: 90
Joined: Sun Nov 02, 2008 4:43 pm
Location: Barcelona

Re: Error bar calculations - help!

Post by Antonio Torrecillas »

Just a few questions about GAs.

I always thought that the relevant point is that you choose the poorest individuals to die,
and choose among the best to reproduce.
This way we improve the average strength of the population.

Is this true? Or is only the correct choice of the best individuals relevant?


The second question is about the concept of a generation. Is this concept really needed for a GA?

My idea is to use a UCB formula to rank the population (winning percentage together with the number of games played).
Then choose the top two for the next game, then choose the bottom two for the next game, and so on.
After each game, take a decision: is the top individual eligible to reproduce? Should the bottom one die?
This way the dominant individuals fight to reproduce, and the weakest fight to survive.
We concentrate the effort on the relevant part of the population and let the rest stay relatively quiet.
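
Something like this is what I have in mind for the ranking score (only a sketch of the UCB idea, not tested code):

Code: Select all

#include <math.h>

/* Sketch of a UCB1-style ranking score: win rate plus an exploration bonus
   that shrinks as an individual accumulates games.                         */
double ucb_score(double points, int games, int total_games, double c)
{
    if (games == 0)
        return 1.0e9;                       /* unplayed individuals go first */
    double win_rate = points/games;         /* points: +1 win, +0.5 draw     */
    return win_rate + c*sqrt(log((double)total_games)/games);
}
For choosing the bottom two, one would use the lower bound instead (win rate minus the bonus).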

Sorry if this is a bit off topic.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Error bar calculations - help!

Post by Don »

Antonio Torrecillas wrote:Just a few questions about GAs.

I always thought that the relevant point is that you choose the poorest individuals to die,
and choose among the best to reproduce.
This way we improve the average strength of the population.

Is this true? Or is only the correct choice of the best individuals relevant?
There are many ways to do a GA, and in fact the GA is just one of many different types of algorithms that are generally called population-based.

The basic idea is that you have some "population" of individuals and you produce a new population from them, using the fittest from the previous generation as "genetic material." In simulated annealing there are similar principles involved.

So there is no selective killing off of unfit individuals (in most of these algorithms), as the entire generation is discarded when the next generation is produced. It's all about which players are used to produce the next generation.

My personal favorite is called PBIL. There is no strict population of players; instead, an INFINITE population is represented by a probability vector: one floating-point value per bit that gives the probability that the bit is on in the "genetic material" of an individual. Several individuals are generated from this vector, and the fittest of this sample is used to make a slight modification to the vector for the next pass. The vector starts out at 0.5 everywhere, meaning a 0 or 1 is generated with equal probability. If it turns out that a bit really needs to be on (or off) to generate a strong individual, most of the fittest individuals will have that bit set, and the vector will progressively converge to having that bit always on. It's very simple compared to GAs, has fewer parameters to tune, and for many problems it's superior to a GA, but unfortunately there is no single algorithm that works best on all problems.
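
To make the idea concrete, here is a toy sketch of the PBIL loop (illustration only; the fitness function here is a dummy bit count, not a chess match):

Code: Select all

#include <stdio.h>
#include <stdlib.h>

#define BITS        32
#define SAMPLES     20
#define GENERATIONS 200
#define LEARN_RATE  0.1

static double rand01(void) { return (rand() + 1.0)/(RAND_MAX + 2.0); }

static int fitness(const int *bits)          /* dummy fitness: number of 1 bits */
{
    int f = 0;
    for (int i = 0; i < BITS; i++) f += bits[i];
    return f;
}

int main(void)
{
    double p[BITS];
    int sample[SAMPLES][BITS];

    for (int i = 0; i < BITS; i++) p[i] = 0.5;           /* start unbiased */

    for (int g = 0; g < GENERATIONS; g++) {
        int best = 0;
        /* Generate a sample of individuals from the probability vector. */
        for (int s = 0; s < SAMPLES; s++)
            for (int i = 0; i < BITS; i++)
                sample[s][i] = rand01() < p[i];
        /* Find the fittest individual of the sample. */
        for (int s = 1; s < SAMPLES; s++)
            if (fitness(sample[s]) > fitness(sample[best])) best = s;
        /* Nudge the probability vector toward the fittest individual. */
        for (int i = 0; i < BITS; i++)
            p[i] += LEARN_RATE*(sample[best][i] - p[i]);
    }

    printf("final probabilities:");
    for (int i = 0; i < BITS; i++) printf(" %.2f", p[i]);
    printf("\n");
    return 0;
}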


The second question is about the concept of a generation. Is this concept really needed for a GA?

My idea is to use a UCB formula to rank the population (winning percentage together with the number of games played).
Then choose the top two for the next game, then choose the bottom two for the next game, and so on.
After each game, take a decision: is the top individual eligible to reproduce? Should the bottom one die?
This way the dominant individuals fight to reproduce, and the weakest fight to survive.
We concentrate the effort on the relevant part of the population and let the rest stay relatively quiet.

Sorry if this is a bit off topic.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Drawbacks of Elo_uncertainties_calculator.

Post by Ajedrecista »

Hi again:

I am glad that you find all the links useful.

My programme has some drawbacks, as everybody can expect. First of all, the error bars I calculate depend only on the number of wins, losses and draws, so I do not take into account whether one engine has played with white many more times than the other. Obviously, I assume that the match is fair in the sense that each engine plays half of the games with white and half with black.

I use this formula for the standard deviation:

Code: Select all

sigma = sqrt((score*(1d0 - score) - draw_ratio/4d0)/n)
There is a special (and highly unlikely) case of a match between two engines: all games are draws. Then score = 1/2 and draw_ratio = 1... substituting into the formula:

Code: Select all

sigma = sqrt{[0.5*(1 - 0.5) - 1/4]/n} = sqrt[(1/4 - 1/4)/n] = 0
It leads to error bars of ±0 Elo, which does not seem right (I have just tried +0 -0 =20 for verification). Taking a look at a link I gave in this topic:

ELOStat algorithm?

Rémi raised that problem (the formula given in the second post of that thread also gives sigma = 0 and also fails). I suppose that BayesElo handles it without problems because it is an excellent programme. OTOH EloStat fails (as reported in that thread) in the same way I fail, so maybe I implemented a kind of simplified EloStat algorithm in my programme without knowing it! Very smart of me (I should say lucky), but I am not clever enough to deal with extreme cases (this is why my programme refuses to do its job when the score of either engine is less than 10% or greater than 90%; the user who posted the formula of the standard deviation that I use wrote that these are the critical bounds for this mathematical model). I do not know how accurate the error bars of my programme are for very high draw ratios, which is again a non-typical result of a match. I remember that I ran lots of samples from Stockfish tests provided by its main author (Marco Costalba) on GitHub, and I always got (rounding to 0.1 Elo) the same error bars as Marco. Those tests often had draw ratios up to 70% (and scores near 50%-50%), so I suppose that Marco and I are not failing at the same things.

Another problem I found when I was programming Elo_uncertainties_calculator was an error happening with a very low number of games. Here is the part of the code that produces this nasty error:

Code: Select all

lower_rd(k) = 4d2*log((score - k*sigma)/(1d0-score + k*sigma))/log(1d1)
upper_rd(k) = 4d2*log((score + k*sigma)/(1d0-score - k*sigma))/log(1d1)
The problem arises when score - k*sigma or 1d0 - score - k*sigma becomes negative (or zero in the extreme case), and the logarithm breaks. This error is not important for me because a normal match needs a significant number of games to narrow the error bars anyway. The most extreme case I have found is the following: no draws in the match (which raises sigma in the model I use), combined with the most extreme allowed score of 90% (it could equally be 10%). Then, substituting sigma = 0.3/sqrt(n) and requiring score ± k*sigma to stay inside (0, 1):

Code: Select all

score - k*sigma = 0.1 - 0.3k/sqrt(n) > 0 ; n > 9k² ; (k_max = 3) ; n > 81 (in theory).
So, +72 -8 =0 (or +8 -72 =0) is the match with the largest number of games (n = 80, which is not very high) where Elo_uncertainties_calculator breaks... and that is a very unusual result for a match.
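
A simple guard that could be added (just a sketch of the idea, in C, not something the programme currently does) is to check the argument of the logarithm before computing each bound:

Code: Select all

#include <math.h>

/* Sketch of a guarded bound: return NaN when score +/- k*sigma leaves the
   open interval (0, 1) instead of feeding the logarithm a bad argument.   */
double elo_bound(double score, double sigma, int k, int upper)
{
    double s = upper ? score + k*sigma : score - k*sigma;
    if (s <= 0.0 || s >= 1.0)
        return NAN;              /* the bound is undefined for this k and n */
    return 400.0*log10(s/(1.0 - s));
}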

Another thing I discovered a few weeks ago is this:

Code: Select all

integer :: wins, loses, draws, n, k
I did not know that the default integer of my Fortran 95 compiler, integer(KIND=3), supports a maximum of 2^31 - 1 = 2,147,483,647... so this is the maximum number of games (n) that Elo_uncertainties_calculator can handle if the score of an engine is within 10%-90%. With this code:

Code: Select all

integer(KIND=4) :: wins, loses, draws, n, k
The maximum n (number of games) supported would then be 2^63 - 1 = 9,223,372,036,854,775,807... but I guess that 2,147,483,647 games are enough! ;)

The intention of this post is to warn about the use of Elo_uncertainties_calculator: everybody knows that it is not infallible (of course not!), but I give some details of where it can fail more easily. Just be careful with the results! AFAIK, the error bars are more or less accurate in the vast majority of cases, but not always! I hope there are no typos in the calculations I have done in this post.

Good luck with the development of genetic algorithms, fitness functions and all this stuff! It is far beyond my knowledge and understanding.

Regards from Spain.

Ajedrecista.