Amazing results of LeelaKnightOdds-Dev

Father
Posts: 1661
Joined: Sun Mar 19, 2006 4:39 am
Location: Colombia
Full name: Pablo Ignacio Restrepo

Re: Amazing results of LeelaKnightOdds-Dev

Post by Father »

Uri Blass wrote: Thu Nov 21, 2024 7:50 pm
lkaufman wrote: Thu Nov 21, 2024 7:08 pm
Marcus91 wrote: Thu Nov 21, 2024 5:12 pm
Uri Blass wrote: Thu Nov 21, 2024 4:21 pm
Father wrote: Thu Nov 21, 2024 4:01 pm
Marcus91 wrote: Thu Nov 21, 2024 3:00 pm
Brunetti wrote: Thu Nov 21, 2024 11:05 am
Marcus91 wrote: Thu Nov 21, 2024 10:40 am
Maybe I should add an arbitrary rule, like halving 'K' for all draws?
Doing this is like saying that a win is worth 4 draws, which means you're changing the rules of chess and, consequently, the consistency of the Elo formula. I'm not saying this because I'm currently drawing games, but because that's just how it is :)

Alex
The Elo formula remains consistent if you give draws half the weight. Four draws are not equal to one win, but rather to one win plus one loss. I'm not treating any players differently. If you draw against a bot rated higher than you, your rating will continue to rise, just more slowly. Giving draws half the weight makes the model fit better, which means that those who drew a lot were overestimated and those who drew little were underestimated, given that the Lichess blitz rating is the reference here.
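The "four draws equal one win plus one loss" arithmetic can be checked with a minimal sketch. It assumes the standard logistic Elo expectation, a hypothetical K of 32, hypothetical ratings (1900 against 2000), and that all games are rated in one batch against the same pre-event rating; none of these values come from the thread.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard logistic Elo expectation for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def rating_change(score: float, expected: float, k: float = 32.0) -> float:
    """Elo change for one game, with K halved when the game is a draw."""
    k_eff = k / 2.0 if score == 0.5 else k
    return k_eff * (score - expected)


# Hypothetical ratings: a 1900 human against a 2000 bot.
e = expected_score(1900.0, 2000.0)

four_draws = 4 * rating_change(0.5, e)
win_plus_loss = rating_change(1.0, e) + rating_change(0.0, e)

# Both expressions reduce to K * (1 - 2 * e), so the two values match.
print(round(four_draws, 3), round(win_plus_loss, 3))
```

Because the expected score against a higher-rated bot is below 0.5, each draw still adds rating points under this rule, only half as many as with the full K.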
Thank you, Marcus91, for your time and your message. You know that I am not a mathematician, so I would like you to explain it clearly and clear up my doubts. Let's take an example: I find myself in a man-versus-machine challenge in which I and the hundred best-ranked human and machine players in the world participate, all fighting against the machines in the Harvard Cup style, as a group of coffee-house players, not federated: some of us are self-taught, others have an education obtained outside of schools, and others are excellent representatives of schools. We all know that the competition is a race for knowledge, to obtain the highest score. If in such a hypothetical 20-game competition I obtain 20 draws and Carlsen obtains 9 victories and one draw, does it mean that I would be the champion of the hypothetical tournament and the one with the highest rating, given my better performance? If your answer is yes, then my duty is to continue playing; otherwise I don't see any point in doing so. Thank you in advance, Marcus91. I remind everyone: "Any protectorate that closes the path to runners for the sole reason of competing without shoes in the marathon is an absurdity."

As I understand it, scoring 50% will basically give you a rating equal to the machine's if you play enough games.
20 games are not enough to get a stable rating, but I think that 200 games are enough.

The main problem is when you score less than 50%.
If you get a draw and a loss in every 2 games, which is 25%, your rating is going to be lower than that of a player who gets 20% from a win and 4 losses in every 5 games, assuming a lot of games by both sides.

I think that it is not fair even if I get a better place in the table relative to humans by the new rule.
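The 25% versus 20% comparison can be worked out in closed form. The sketch below assumes a bot with a fixed rating and long-run result frequencies w, d, l; setting the average change w·K(1−E) + d·(K/2)(½−E) − l·K·E to zero gives E = (w + d/4) / (w + d/2 + l). The function names are illustrative, not taken from any rating software.

```python
import math


def equilibrium_expected(win: float, draw: float, loss: float) -> float:
    """Expected score at which the average rating change is zero
    when draws are rated with K/2 and decisive games with K."""
    return (win + draw / 4.0) / (win + draw / 2.0 + loss)


def rating_gap(expected: float) -> float:
    """Rating difference (player minus bot) implied by an expected score."""
    return 400.0 * math.log10(expected / (1.0 - expected))


drawer = equilibrium_expected(win=0.0, draw=0.5, loss=0.5)  # 25% score
winner = equilibrium_expected(win=0.2, draw=0.0, loss=0.8)  # 20% score

print(f"draw + loss:    E = {drawer:.3f}, gap = {rating_gap(drawer):.0f}")
print(f"win + 4 losses: E = {winner:.3f}, gap = {rating_gap(winner):.0f}")
```

Under these assumptions the player scoring 25% through draws settles at an effective expected score of 1/6, about 280 points below the bot, while the player scoring 20% through wins settles at 0.2, about 241 points below. Any result mix with equal wins and losses still gives E = 0.5.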
It's an arbitrary rule, like in soccer where a win earns three points and a draw earns one. The rationale behind this system is as follows:

1. Discourages excessively drawish playstyles

2. Promotes balanced time controls – This system pushes players toward time controls where they perform closer to their true skill level, discouraging the trend of favoring ultra-fast formats simply because they make it easier to grind out draws and rating points.

3. Improves rating accuracy – By adjusting the rating system to reflect more dynamic play, it aligns better with the Lichess blitz rating, ensuring a more accurate representation of a player's actual strength.

It's a thoughtful adjustment designed to balance the competitive landscape and encourage more engaging play, even if it might feel arbitrary.
Although this is not a rule that I advocated, it does have a certain rationale as a sort of compromise between two extreme positions. The problem here is that the engine doesn't consider the human's rating, whereas the human knows the engine's rating at any given time control. You could say that the engine should use the opponent's Lichess rating, but that is problematic because many players are unrated (nominally "1500"), at least at a specific time control; many strong players have no Rapid rating there. So the human can aim for a draw based on relative ratings, whereas the bot, having no other information, assumes that the ratings are equal. The normal way of rating works fine if both players know the ratings of both players. But it is also possible to have a rating system that simply tosses out all draws, called WILO (for win/loss, a pun on ELO), which is especially appropriate when one player doesn't know the opponent's rating. This is consistent with running tournaments in which draws are simply replayed rather than scored as 1/2, as was common around the year 1900 and as is done today in Norway Chess. Cutting the K factor in half is a compromise between ELO and WILO, so it is quite easy to justify for rating engine games, although difficult to "sell" to non-mathematicians. It doesn't punish players who score 50% with lots of draws relative to those who score 50% with few draws, but it does reward players who score, say, 10% by wins rather than by draws. Perhaps it wouldn't be needed if in the future the bot is updated in a way that makes it less inclined to take draws when still down a queen.
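One way to picture the compromise is a single draw-weight parameter: 1.0 for ordinary Elo, 0.5 for the halved K, and 0.0 for WILO. The sketch below is an illustration under that assumption (the `draw_weight` parameter and the helper names are hypothetical, and the bot's rating is taken as fixed); it compares two players who both score 10%.

```python
import math


def equilibrium_expected(win, draw, loss, draw_weight):
    """Long-run expected score when draws are rated with draw_weight * K.

    draw_weight = 1.0 is ordinary Elo, 0.5 is the halved-K rule,
    and 0.0 discards draws entirely (the WILO idea).
    """
    return (win + draw_weight * draw / 2.0) / (win + draw_weight * draw + loss)


def rating_gap(expected):
    """Rating difference implied by an expected score; -inf when it is 0."""
    if expected <= 0.0:
        return -math.inf
    return 400.0 * math.log10(expected / (1.0 - expected))


profiles = {
    "10% from wins": (0.1, 0.0, 0.9),
    "10% from draws": (0.0, 0.2, 0.8),
}
for weight, label in [(1.0, "Elo"), (0.5, "half K"), (0.0, "WILO")]:
    for name, (w, d, l) in profiles.items():
        gap = rating_gap(equilibrium_expected(w, d, l, weight))
        print(f"{label:7s} {name:15s} gap = {gap:.0f}")
```

With ordinary Elo both profiles land at the same rating; with the halved K the player who wins lands higher; with WILO a player who never wins has no finite rating at all, which is the extreme the compromise avoids.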
I understand the reason for cutting the K factor in half for draws, but I think that it is unfair.
I think the new rule also rewards players who score more than 50% without losing, relative to players who reach the same percentage with some losses.

I think that the engine could use its history against the specific player, rather than the player's rating, to decide on contempt.
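The point about scores above 50% can be illustrated by replaying results game by game. This is a rough simulation under stated assumptions: a bot fixed at a hypothetical 2000, K = 16 (8 for draws), a deterministic repeating result pattern, and both players starting at 2000. It is not the actual rating code used for the bots.

```python
from itertools import cycle, islice


def expected_score(rating, opponent):
    """Standard logistic Elo expectation."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def simulate(pattern, games=4000, start=2000.0, bot=2000.0, k=16.0):
    """Replay a repeating result pattern against a fixed-rating bot,
    halving K for draws, and return the player's final rating."""
    rating = start
    for score in islice(cycle(pattern), games):
        k_eff = k / 2.0 if score == 0.5 else k
        rating += k_eff * (score - expected_score(rating, bot))
    return rating


no_losses = simulate([1.0, 0.5])              # 75% from wins and draws
with_losses = simulate([1.0, 1.0, 1.0, 0.0])  # 75% from wins and losses

print(round(no_losses), round(with_losses))
```

Both players score 75%, but the one who never loses settles noticeably higher, since halving K for draws is equivalent to discounting the half-points that pull a strong player's rating down.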
I think it might be useful to narrate something from my own experience. In approximately 2003, a player who at that time was the fifth-best human chess player in the world played a 20-game blitz match against an elite computer and obtained 4 points; I played against that same computer in the same place, at the same time controls, and at the same historical moment, and I scored 12 points against 8 for the computer. To top it all off, if you beat the computers you were persecuted for "alleged foul play on your part." In this way, a lodge protecting the Elo ratings of computers and of elite human players was founded by force of habit. I suffered that injustice firsthand. Instead of giving up, I continued the fight, and then accusations came against me in which I was told that the cause of the computers' defeats was bugs. Then I was told that winning on time is not winning, then that with an increment I could not win, and so on; a carrot-and-stick methodology was developed against me. Still, I didn't care. In this way there were no other paths for me than to be a street fighter, a hunter, and an anti-machine player. All the evidence did not seem to matter, and in the meantime some serious strategic errors were made by the elites of chess, science, and the media: doors that should never have been closed were closed, such as the dismantling of Deep Blue, closing AlphaZero to public use, and the like. But the world has changed just a little, and I am getting older, though my soul is not. The doors must be open.
I am thinking chess is in a coin. Human beings for ever playing in one face. Now I am playing in the other face: "Antichess". Computers are as a fortress where owner forgot to close a little door behind. You must enter across this door. Forget the front.
Father

Re: Amazing results of LeelaKnightOdds-Dev

Post by Father »

[pgn][Event "Casual bullet game"]
[Site "https://lichess.org/LpK8uPfg"]
[Date "2024.11.21"]
[White "LeelaQueenForKnight"]
[Black "Catecan"]
[Result "1/2-1/2"]
[UTCDate "2024.11.21"]
[UTCTime "20:47:02"]
[WhiteElo "2000"]
[BlackElo "2030"]
[WhiteTitle "BOT"]
[Variant "From Position"]
[TimeControl "60+0"]
[ECO "?"]
[Opening "?"]
[Termination "Normal"]
[FEN "r1bqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNB1KBNR w KQkq - 0 1"]
[SetUp "1"]
[Annotator "lichess.org"]

1. Nf3 d5 2. Nc3 e6 3. e4 c6 4. e5 f5 5. d4 a6 6. Ne2 Bd7 7. Nf4 Be7 8. h4 g6 9. Ng5 Bxg5 10. hxg5 Qxg5 11. Nxe6 Qxc1+ 12. Rxc1 Bxe6 13. f4 Ne7 14. a4 h5 15. Be2 Kf7 16. Kf2 b5 17. Ra1 Rhb8 18. a5 Rh8 19. b4 Rag8 20. Rh4 Rh7 21. Rah1 Rgh8 22. R1h3 Kg7 23. Rc3 Kf7 24. Ke1 Rh6 25. Kd2 R6h7 26. Rch3 Rh6 27. Rc3 R6h7 28. Ke1 Rh6 29. Kf2 R6h7 30. Rch3 Rh6 31. Rg3 R6h7 32. Rc3 Rh6 33. Ke1 R6h7 34. Rg3 Rh6 35. Rc3 R6h7 36. Kd2 Rh6 37. Rh2 R6h7 38. Rch3 Rh6 39. Rh1 R6h7 40. Rg3 Rh6 41. Rc3 R6h7 42. Rd3 Rh6 43. Rc3 R6h7 44. Rg3 Rh6 45. Rd3 R6h7 46. Rc3 Rh6 47. Rh2 R6h7 48. Rh4 Rh6 49. g3 R6h7 50. Ra3 Rh6 51. Ra1 R6h7 52. Rah1 Rh6 53. R4h2 Kg7 54. Rh4 R6h7 55. Ke3 Rh6 56. Kf2 R6h7 57. Ke3 Rh6 58. Kd2 R6h7 59. Bf3 Rh6 60. Ke2 R6h7 61. Ke3 Rh6 62. Kd2 R6h7 63. Ke3 Rh6 64. Ke2 R6h7 65. Kd2 Rh6 66. Ke1 R6h7 67. Ke2 Rh6 68. Kd2 R6h7 69. Kd3 Rh6 70. c3 R6h7 71. Ke2 Rh6 72. Kd2 R6h7 73. Ke2 Rh6 74. Kd3 R6h7 75. Kd2 Rh6 76. Ke3 R6h7 77. Be2 Rh6 78. Bf3 R6h7 79. Bd1 Rh6 80. Be2 R6h7 81. Bf3 Rh6 82. R4h2 R6h7 83. Be2 Rh6 84. Bf3 R6h7 85. Kf2 Rh6 86. Be2 R6h7 87. Bd1 Rh6 88. Bf3 R6h7 89. Be2 Rh6 90. Bd1 R6h7 91. Bf3 Rh6 92. Be2 R6h7 93. Bd1 Rh6 94. Bf3 R6h7 95. Rh4 Rh6 96. Be2 R6h7 97. Bd1 Rh6 98. Bf3 R6h7 99. Be2 Rh6 100. Rf1 R6h7 101. Rfh1 Rh6 102. Re1 R6h7 103. Rf1 Rh6 104. Rfh1 R6h7 105. Bd1 Rh6 106. Re1 R6h7 107. Rf1 Rh6 108. Kf3 R6h7 109. Kf2 Rh6 110. Ke1 R6h7 111. Bf3 Rh6 112. Be2 R6h7 113. Bd1 Rh6 114. Kd2 R6h7 115. Ke1 Rh6 116. Ke2 R6h7 117. Kd2 Rh6 118. Be2 R6h7 119. Bd1 Rh6 120. Be2 { The game is a draw. } 1/2-1/2[/pgn]
lkaufman
Posts: 6108
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Amazing results of LeelaKnightOdds-Dev

Post by lkaufman »

Note how in the game you posted Leela consistently refuses to make a draw by repetition, despite being a piece down, until the fifty-move rule finally terminates the game. This is because LeelaQueenForKnight uses the Queen odds net, which assumes you are a queen-odds-level player, so it "thinks" that it is winning when the game starts with it "only" down queen for knight! So I can say that any draws earned against LeelaQueenForKnight are fully deserved, which is not yet the case for the other three bots.
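The fifty-move observation can be checked by counting plies since the last pawn move or capture. The sketch below is a simplified reading of SAN movetext, not a full PGN parser: it assumes move numbers are separate tokens ("12."), that pawn moves begin with a file letter, that captures contain "x", and that the tag pairs and the [pgn] wrapper have been stripped beforehand. The function name is illustrative.

```python
import re


def halfmove_clock(movetext: str) -> int:
    """Count plies since the last pawn move or capture in SAN movetext."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)        # drop {comments}
    clock = 0
    for token in text.split():
        if re.fullmatch(r"\d+\.+", token):             # move numbers such as "12."
            continue
        if token in {"1-0", "0-1", "1/2-1/2", "*"}:    # game result marker
            continue
        is_pawn_move = token[0] in "abcdefgh"          # SAN pawn moves start with a file
        is_capture = "x" in token
        clock = 0 if (is_pawn_move or is_capture) else clock + 1
    return clock


sample = "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 { comment } *"
print(halfmove_clock(sample))  # 2: Ba4 and Nf6 follow the last pawn move
```

In the game above the last pawn move is 70. c3 and no captures follow, so 120. Be2 is the 100th ply without a pawn move or capture, which is exactly the fifty-move threshold.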
Komodo rules!
Father

Re: Amazing results of LeelaKnightOdds-Dev

Post by Father »

lkaufman wrote: Thu Nov 21, 2024 10:39 pm
Father wrote: Thu Nov 21, 2024 9:53 pm
Father wrote: Thu Nov 21, 2024 8:40 pm
Uri Blass wrote: Thu Nov 21, 2024 7:50 pm
lkaufman wrote: Thu Nov 21, 2024 7:08 pm
Marcus91 wrote: Thu Nov 21, 2024 5:12 pm
Uri Blass wrote: Thu Nov 21, 2024 4:21 pm
Father wrote: Thu Nov 21, 2024 4:01 pm
Marcus91 wrote: Thu Nov 21, 2024 3:00 pm
Brunetti wrote: Thu Nov 21, 2024 11:05 am
Marcus91 wrote: Thu Nov 21, 2024 10:40 am maybe I should add an arbitrary rule, like halve 'K' for all draws?
Doing this is like saying that a win is worth 4 draws, which means you're changing the rules of chess and, consequently, the consistency of the Elo formula. I'm not saying this because I'm currently drawing games, but because that's just how it is :)

Alex
The Elo formula remains consistent if you consider draws as half the weight. Four draws are not equal to one win, but rather one win and one loss. I'm not making any difference between players. If you draw against a bot rated higher than you, you will continue to rise, just slowly. By giving half the weight to draws the model assumes a better fit, this means that those who drew a lot were overestimated and those who drew little were under estimated, given that lichess blitz rating is the reference here
Thank you Marcus91 for your time and your message. You know that I am not a mathematician, so I would like you to explain it well and clear up my doubts. Let's take an example context: I find myself in a man versus machine challenge in which myself and the hundred best ranked human and machine players in the world participate, all fighting against the machines in the Harvard Cup style, of a group of coffee and coffee players. not federated, some of us are self-taught and others with non-school education, obtained outside of schools, others excellent representatives of schools. We all know in the competition that it is a race for knowledge, to obtain the
higher score. If in such a hypothetical 20-game competition I obtain 20 draws and Carlsen obtains 9 victories and one draw, does it mean that I would be the champion of the hypothetical tournament and the one with the highest rating given my best performance? If your answer is yes, then my duty is to continue playing, otherwise I don't see any point in doing so. Thank you in advance Marcus91. I remind everyone: "Any protectorate that closes the path to runners for the sole reason of competing without shoes in the marathon is an absurdity."

I understand that basically 50% will give you equal rating to the machine if you play enough games.
20 games are not enough to get a stable rating but I think that 200 games are enough.

The main problem is when you get less than 50%
If you get a draw and a loss in every 2 games that is 25% your rating is going to be smaller than a player who get 20% by a win and 4 losses every 5 games assuming a lot of games by both sides.

I think that it is not fair even if I get a better place in the table relative to humans by the new rule.
It's an arbitrary rule, like in soccer where a win earns three points and a draw earns one. The rationale behind this system is as follows:

1. Discourages excessively drawish playstyles

2. Promotes balanced time controls – This system pushes players toward time controls where they perform closer to their true skill level, discouraging the trend of favoring ultra-fast formats simply because they make it easier to grind out draws and rating points.

3. Improves rating accuracy – By adjusting the rating system to reflect more dynamic play, it aligns better with the Lichess blitz rating, ensuring a more accurate representation of a player's actual strength.

It's a thoughtful adjustment designed to balance the competitive landscape and encourage more engaging play, even if it might feel arbitrary
Although this is not a rule that I advocated, it does have a certain rationale as a sort of compromise between two extreme positions. The problem here is that the engine doesn't consider the human's rating, whereas the human knows the engine's rating at any given time control. You can say that the engine could use the opponent's LiChess rating, but that is problematic because many players are unrated (nominally "1500"), at least at a specific time control. Many strong players have no Rapid rating there. So the human can aim for a draw based on relative ratings, whereas the bot assumes that the ratings are equal, having no other information. The normal way of rating works fine if both players know the ratings of both players. But it is also possible to have a rating system that simply tosses out all draws, which is called WILO (for win/loss, a pun on ELO), which is especially appropriate when one player doesn't know the opponent's rating. This is consistent with running tournaments in which draws are simply replayed rather than scored as 1/2, as was common around the year 1900 and as is done today in Norway chess. Cutting the K factor in half is a compromise between ELO and WILO, so quite easy to justify for rating engine games, although difficult to "sell" to non-mathematicians. It doesn't punish players who score 50% with lots of draws vs few draws, but it does reward players who score say 10% by wins rather than draws. Perhaps it wouldn't be needed if in the future the bot is updated in a way that makes it less inclined to take draws when still down a queen.
I understand the reason for cutting the K factor in half for draws, but I think that it is unfair.
I think the new rule also rewards players who score more than 50% without losing, relative to players who reach the same percentage with some losses.

I think the engine could use its history against the specific player, instead of the player's rating, to decide on contempt.
I think it might be useful to share something from my own experience. Around 2003, the player who was then ranked fifth in the world played a 20-game blitz match against an elite computer and scored 4 points. I played that same computer in the same place, in the same period, and at the same time control, and I scored 12 points against 8 for the computer. To top it all off, anyone who beat the computers was persecuted for "alleged foul play." In this way, by force of habit, a lodge formed that protected the Elo ratings of computers and of elite human players. I suffered that injustice firsthand. Instead of giving up, I continued the fight, and then I was told that the computers' defeats were caused by bugs. Then I was told that winning on time is not winning, then that I could not win with an increment, and so on; a carrot-and-stick approach was used against me. Still, I didn't care. There was no path left for me but to be a street fighter, a hunter, an anti-machine player. The evidence did not seem to matter, and meanwhile the elites of chess, science, and the media made some serious strategic errors: doors that should never have been closed were closed, such as the dismantling of Deep Blue and the closing of AlphaZero to public use. The world has changed just a little, and I am getting older, though my soul is not. The doors must be open.
[pgn][Event "Casual bullet game"]
[Site "https://lichess.org/LpK8uPfg"]
[Date "2024.11.21"]
[White "LeelaQueenForKnight"]
[Black "Catecan"]
[Result "1/2-1/2"]
[UTCDate "2024.11.21"]
[UTCTime "20:47:02"]
[WhiteElo "2000"]
[BlackElo "2030"]
[WhiteTitle "BOT"]
[Variant "From Position"]
[TimeControl "60+0"]
[ECO "?"]
[Opening "?"]
[Termination "Normal"]
[FEN "r1bqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNB1KBNR w KQkq - 0 1"]
[SetUp "1"]
[Annotator "lichess.org"]

1. Nf3 d5 2. Nc3 e6 3. e4 c6 4. e5 f5 5. d4 a6 6. Ne2 Bd7 7. Nf4 Be7 8. h4 g6 9. Ng5 Bxg5 10. hxg5 Qxg5 11. Nxe6 Qxc1+ 12. Rxc1 Bxe6 13. f4 Ne7 14. a4 h5 15. Be2 Kf7 16. Kf2 b5 17. Ra1 Rhb8 18. a5 Rh8 19. b4 Rag8 20. Rh4 Rh7 21. Rah1 Rgh8 22. R1h3 Kg7 23. Rc3 Kf7 24. Ke1 Rh6 25. Kd2 R6h7 26. Rch3 Rh6 27. Rc3 R6h7 28. Ke1 Rh6 29. Kf2 R6h7 30. Rch3 Rh6 31. Rg3 R6h7 32. Rc3 Rh6 33. Ke1 R6h7 34. Rg3 Rh6 35. Rc3 R6h7 36. Kd2 Rh6 37. Rh2 R6h7 38. Rch3 Rh6 39. Rh1 R6h7 40. Rg3 Rh6 41. Rc3 R6h7 42. Rd3 Rh6 43. Rc3 R6h7 44. Rg3 Rh6 45. Rd3 R6h7 46. Rc3 Rh6 47. Rh2 R6h7 48. Rh4 Rh6 49. g3 R6h7 50. Ra3 Rh6 51. Ra1 R6h7 52. Rah1 Rh6 53. R4h2 Kg7 54. Rh4 R6h7 55. Ke3 Rh6 56. Kf2 R6h7 57. Ke3 Rh6 58. Kd2 R6h7 59. Bf3 Rh6 60. Ke2 R6h7 61. Ke3 Rh6 62. Kd2 R6h7 63. Ke3 Rh6 64. Ke2 R6h7 65. Kd2 Rh6 66. Ke1 R6h7 67. Ke2 Rh6 68. Kd2 R6h7 69. Kd3 Rh6 70. c3 R6h7 71. Ke2 Rh6 72. Kd2 R6h7 73. Ke2 Rh6 74. Kd3 R6h7 75. Kd2 Rh6 76. Ke3 R6h7 77. Be2 Rh6 78. Bf3 R6h7 79. Bd1 Rh6 80. Be2 R6h7 81. Bf3 Rh6 82. R4h2 R6h7 83. Be2 Rh6 84. Bf3 R6h7 85. Kf2 Rh6 86. Be2 R6h7 87. Bd1 Rh6 88. Bf3 R6h7 89. Be2 Rh6 90. Bd1 R6h7 91. Bf3 Rh6 92. Be2 R6h7 93. Bd1 Rh6 94. Bf3 R6h7 95. Rh4 Rh6 96. Be2 R6h7 97. Bd1 Rh6 98. Bf3 R6h7 99. Be2 Rh6 100. Rf1 R6h7 101. Rfh1 Rh6 102. Re1 R6h7 103. Rf1 Rh6 104. Rfh1 R6h7 105. Bd1 Rh6 106. Re1 R6h7 107. Rf1 Rh6 108. Kf3 R6h7 109. Kf2 Rh6 110. Ke1 R6h7 111. Bf3 Rh6 112. Be2 R6h7 113. Bd1 Rh6 114. Kd2 R6h7 115. Ke1 Rh6 116. Ke2 R6h7 117. Kd2 Rh6 118. Be2 R6h7 119. Bd1 Rh6 120. Be2 { The game is a draw. } 1/2-1/2[/pgn]
Note how in the game you posted, Leela consistently refuses to make a draw by repetition, despite being a piece down, until the fifty-move rule finally terminates the game. This is because LeelaQueenForKnight uses the queen odds net, which assumes you are a queen-odds-level player, so it "thinks" that it is winning when the game starts with it "only" down queen for knight! So I can say that any draws earned against LeelaQueenForKnight are fully deserved, which is not yet the case for the other three bots.
Mr. Larry Kaufman, good afternoon. For me, this whole world of chess against computers is simply fascinating, and even more so when it comes to odds games. It is a two-way fight of ideas and imagination between programmers and players, all on the same work table: "Computers." Thanks again!!
I am thinking chess is in a coin. Human beings forever playing on one face. Now I am playing on the other face: "Antichess". Computers are like a fortress whose owner forgot to close a little door at the back. You must enter through this door. Forget the front.
Uri Blass
Posts: 10632
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Amazing results of LeelaKnightOdds-Dev

Post by Uri Blass »

lkaufman wrote: Thu Nov 21, 2024 10:39 pm
Note how in the game you posted, Leela consistently refuses to make a draw by repetition, despite being a piece down, until the fifty-move rule finally terminates the game. This is because LeelaQueenForKnight uses the queen odds net, which assumes you are a queen-odds-level player, so it "thinks" that it is winning when the game starts with it "only" down queen for knight! So I can say that any draws earned against LeelaQueenForKnight are fully deserved, which is not yet the case for the other three bots.

1) The game was a 1+0 game, and Leela could have won on time by pushing a pawn to avoid a draw by the fifty-move rule.
2) I believe a draw is a draw and is fully deserved in any case. The fact that the engine could be improved to win does not mean that the human does not deserve the draw.
I believe that in any case playing for a draw is easier than playing for a win, so significantly weaker players who play for a draw can achieve a higher percentage against the bot.
Father
Posts: 1661
Joined: Sun Mar 19, 2006 4:39 am
Location: Colombia
Full name: Pablo Ignacio Restrepo

Re: Amazing results of LeelaKnightOdds-Dev

Post by Father »

Uri Blass wrote: Thu Nov 21, 2024 11:44 pm
1) The game was a 1+0 game, and Leela could have won on time by pushing a pawn to avoid a draw by the fifty-move rule.
2) I believe a draw is a draw and is fully deserved in any case. The fact that the engine could be improved to win does not mean that the human does not deserve the draw.
I believe that in any case playing for a draw is easier than playing for a win, so significantly weaker players who play for a draw can achieve a higher percentage against the bot.
Uri, allow me to give an opinion. In man-versus-machine chess, strategic skill in human thinking, and the human ability to place the mind beyond the machine's blind spot (the horizon effect), is essential. The usual categories of strong and non-strong human player may become irrelevant. Here it is about gaining the skill and expertise to be a strong anti-computer player. Of course, a great human player can also be a strong anti-machine player. But a player does not have to be strong against humans to be as strong as, or stronger than, the strongest humans against machines, playing alone.
Anti-machine chess is the depths of the human mind, thinking from the heart, the force that can overwhelm cybernetic thinking. When I refer to street fighters, that is exactly what I am referring to.
lkaufman
Posts: 6108
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Amazing results of LeelaKnightOdds-Dev

Post by lkaufman »

Uri Blass wrote: Thu Nov 21, 2024 11:44 pm
lkaufman wrote: Thu Nov 21, 2024 10:39 pm
Father wrote: Thu Nov 21, 2024 9:53 pm
Father wrote: Thu Nov 21, 2024 8:40 pm
Uri Blass wrote: Thu Nov 21, 2024 7:50 pm
lkaufman wrote: Thu Nov 21, 2024 7:08 pm
Marcus91 wrote: Thu Nov 21, 2024 5:12 pm
Uri Blass wrote: Thu Nov 21, 2024 4:21 pm
Father wrote: Thu Nov 21, 2024 4:01 pm
Marcus91 wrote: Thu Nov 21, 2024 3:00 pm
Brunetti wrote: Thu Nov 21, 2024 11:05 am
Marcus91 wrote: Thu Nov 21, 2024 10:40 am maybe I should add an arbitrary rule, like halve 'K' for all draws?
Doing this is like saying that a win is worth 4 draws, which means you're changing the rules of chess and, consequently, the consistency of the Elo formula. I'm not saying this because I'm currently drawing games, but because that's just how it is :)

Alex
The Elo formula remains consistent if you consider draws as half the weight. Four draws are not equal to one win, but rather one win and one loss. I'm not making any difference between players. If you draw against a bot rated higher than you, you will continue to rise, just slowly. By giving half the weight to draws the model assumes a better fit, this means that those who drew a lot were overestimated and those who drew little were under estimated, given that lichess blitz rating is the reference here
Thank you Marcus91 for your time and your message. You know that I am not a mathematician, so I would like you to explain it well and clear up my doubts. Let's take an example context: I find myself in a man versus machine challenge in which myself and the hundred best ranked human and machine players in the world participate, all fighting against the machines in the Harvard Cup style, of a group of coffee and coffee players. not federated, some of us are self-taught and others with non-school education, obtained outside of schools, others excellent representatives of schools. We all know in the competition that it is a race for knowledge, to obtain the
higher score. If in such a hypothetical 20-game competition I obtain 20 draws and Carlsen obtains 9 victories and one draw, does it mean that I would be the champion of the hypothetical tournament and the one with the highest rating given my best performance? If your answer is yes, then my duty is to continue playing, otherwise I don't see any point in doing so. Thank you in advance Marcus91. I remind everyone: "Any protectorate that closes the path to runners for the sole reason of competing without shoes in the marathon is an absurdity."

I understand that basically 50% will give you equal rating to the machine if you play enough games.
20 games are not enough to get a stable rating but I think that 200 games are enough.

The main problem is when you get less than 50%
If you get a draw and a loss in every 2 games that is 25% your rating is going to be smaller than a player who get 20% by a win and 4 losses every 5 games assuming a lot of games by both sides.

I think that it is not fair even if I get a better place in the table relative to humans by the new rule.
It's an arbitrary rule, like in soccer where a win earns three points and a draw earns one. The rationale behind this system is as follows:

1. Discourages excessively drawish playstyles

2. Promotes balanced time controls – This system pushes players toward time controls where they perform closer to their true skill level, discouraging the trend of favoring ultra-fast formats simply because they make it easier to grind out draws and rating points.

3. Improves rating accuracy – By adjusting the rating system to reflect more dynamic play, it aligns better with the Lichess blitz rating, ensuring a more accurate representation of a player's actual strength.

It's a thoughtful adjustment designed to balance the competitive landscape and encourage more engaging play, even if it might feel arbitrary
Although this is not a rule that I advocated, it does have a certain rationale as a sort of compromise between two extreme positions. The problem here is that the engine doesn't consider the human's rating, whereas the human knows the engine's rating at any given time control. You can say that the engine could use the opponent's LiChess rating, but that is problematic because many players are unrated (nominally "1500"), at least at a specific time control. Many strong players have no Rapid rating there. So the human can aim for a draw based on relative ratings, whereas the bot assumes that the ratings are equal, having no other information. The normal way of rating works fine if both players know the ratings of both players. But it is also possible to have a rating system that simply tosses out all draws, which is called WILO (for win/loss, a pun on ELO), which is especially appropriate when one player doesn't know the opponent's rating. This is consistent with running tournaments in which draws are simply replayed rather than scored as 1/2, as was common around the year 1900 and as is done today in Norway chess. Cutting the K factor in half is a compromise between ELO and WILO, so quite easy to justify for rating engine games, although difficult to "sell" to non-mathematicians. It doesn't punish players who score 50% with lots of draws vs few draws, but it does reward players who score say 10% by wins rather than draws. Perhaps it wouldn't be needed if in the future the bot is updated in a way that makes it less inclined to take draws when still down a queen.
I understand the reason for cutting the K factor in half for draws, but I think that it is unfair.
I think the new rule also rewards players who score more than 50% without losing, relative to players who reach the same percentage with some losses.

I think that the engine can use the history against the specific player instead of the rating of the player to decide about contempt.
I think it might be useful to narrate something from my own experience. Around 2003, the player who was at that time the fifth-best human in the world played a 20-game blitz match against an elite computer and obtained 4 points; I played against that same computer in the same place, in the same period, and at the same time controls, and I got 12 points in my favor against 8 for the computer. To top it all off, if you beat the computers you were persecuted for "alleged foul play on your part." In this way, a lodge protecting the Elos of computers and the Elos of elite human players was founded by force of habit. I suffered that injustice firsthand. Instead of giving up, I continued the fight, and then accusations came against me in which I was told that the cause of the computers' defeats was bugs. Then I was told that winning on time is not winning, then that with increment I could not win, and so the carrot-and-stick methodology has been applied against me. Still, I didn't care. There was no other path for me than to be a street fighter, a hunter, and an anti-machine player. All the evidence did not seem to matter, and in the meantime some serious strategic errors were made by the elites of chess, science, and the media: doors that should never have been closed were closed, such as the dismantling of Deep Blue, the closing of AlphaZero to public use, and the like. But the world has changed just a little, and I am getting older, though my soul is not. The doors must be open.
[pgn][Event "Casual bullet game"]
[Site "https://lichess.org/LpK8uPfg"]
[Date "2024.11.21"]
[White "LeelaQueenForKnight"]
[Black "Catecan"]
[Result "1/2-1/2"]
[UTCDate "2024.11.21"]
[UTCTime "20:47:02"]
[WhiteElo "2000"]
[BlackElo "2030"]
[WhiteTitle "BOT"]
[Variant "From Position"]
[TimeControl "60+0"]
[ECO "?"]
[Opening "?"]
[Termination "Normal"]
[FEN "r1bqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNB1KBNR w KQkq - 0 1"]
[SetUp "1"]
[Annotator "lichess.org"]

1. Nf3 d5 2. Nc3 e6 3. e4 c6 4. e5 f5 5. d4 a6 6. Ne2 Bd7 7. Nf4 Be7 8. h4 g6 9. Ng5 Bxg5 10. hxg5 Qxg5 11. Nxe6 Qxc1+ 12. Rxc1 Bxe6 13. f4 Ne7 14. a4 h5 15. Be2 Kf7 16. Kf2 b5 17. Ra1 Rhb8 18. a5 Rh8 19. b4 Rag8 20. Rh4 Rh7 21. Rah1 Rgh8 22. R1h3 Kg7 23. Rc3 Kf7 24. Ke1 Rh6 25. Kd2 R6h7 26. Rch3 Rh6 27. Rc3 R6h7 28. Ke1 Rh6 29. Kf2 R6h7 30. Rch3 Rh6 31. Rg3 R6h7 32. Rc3 Rh6 33. Ke1 R6h7 34. Rg3 Rh6 35. Rc3 R6h7 36. Kd2 Rh6 37. Rh2 R6h7 38. Rch3 Rh6 39. Rh1 R6h7 40. Rg3 Rh6 41. Rc3 R6h7 42. Rd3 Rh6 43. Rc3 R6h7 44. Rg3 Rh6 45. Rd3 R6h7 46. Rc3 Rh6 47. Rh2 R6h7 48. Rh4 Rh6 49. g3 R6h7 50. Ra3 Rh6 51. Ra1 R6h7 52. Rah1 Rh6 53. R4h2 Kg7 54. Rh4 R6h7 55. Ke3 Rh6 56. Kf2 R6h7 57. Ke3 Rh6 58. Kd2 R6h7 59. Bf3 Rh6 60. Ke2 R6h7 61. Ke3 Rh6 62. Kd2 R6h7 63. Ke3 Rh6 64. Ke2 R6h7 65. Kd2 Rh6 66. Ke1 R6h7 67. Ke2 Rh6 68. Kd2 R6h7 69. Kd3 Rh6 70. c3 R6h7 71. Ke2 Rh6 72. Kd2 R6h7 73. Ke2 Rh6 74. Kd3 R6h7 75. Kd2 Rh6 76. Ke3 R6h7 77. Be2 Rh6 78. Bf3 R6h7 79. Bd1 Rh6 80. Be2 R6h7 81. Bf3 Rh6 82. R4h2 R6h7 83. Be2 Rh6 84. Bf3 R6h7 85. Kf2 Rh6 86. Be2 R6h7 87. Bd1 Rh6 88. Bf3 R6h7 89. Be2 Rh6 90. Bd1 R6h7 91. Bf3 Rh6 92. Be2 R6h7 93. Bd1 Rh6 94. Bf3 R6h7 95. Rh4 Rh6 96. Be2 R6h7 97. Bd1 Rh6 98. Bf3 R6h7 99. Be2 Rh6 100. Rf1 R6h7 101. Rfh1 Rh6 102. Re1 R6h7 103. Rf1 Rh6 104. Rfh1 R6h7 105. Bd1 Rh6 106. Re1 R6h7 107. Rf1 Rh6 108. Kf3 R6h7 109. Kf2 Rh6 110. Ke1 R6h7 111. Bf3 Rh6 112. Be2 R6h7 113. Bd1 Rh6 114. Kd2 R6h7 115. Ke1 Rh6 116. Ke2 R6h7 117. Kd2 Rh6 118. Be2 R6h7 119. Bd1 Rh6 120. Be2 { The game is a draw. } 1/2-1/2[/pgn]
Note how in the game you post Leela consistently refuses to make a draw by repetition, despite being a piece down, until the fifty move rule finally terminates the game. This is because Leela QueenForKnight uses the Queen odds net, which assumes you are a queen odds level player, so it "thinks" that it is winning when the game starts with it "only" down queen for knight! So I can say that any draws earned against LeelaQueenForKnight are fully deserved, which is not yet the case for the other three bots.
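The fifty-move count mentioned above can be checked from the movetext with a small sketch. It is not how Lichess or Leela track the clock; it is a simplified counter that assumes standard SAN, where a pawn move starts with a lowercase file letter and a capture contains "x". The helper names are made up for illustration.

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def parse_movetext(movetext):
    """Return the SAN moves from PGN movetext, dropping comments,
    move numbers such as '70.' and the final result token."""
    without_comments = re.sub(r"\{[^}]*\}", " ", movetext)
    tokens = without_comments.split()
    return [
        t for t in tokens
        if not re.fullmatch(r"\d+\.+", t) and t not in RESULTS
    ]


def halfmove_clock(san_moves):
    """Count halfmoves since the last pawn move or capture.

    A draw can be claimed under the fifty-move rule once this reaches 100.
    """
    clock = 0
    for move in san_moves:
        is_pawn_move = move[0] in "abcdefgh"  # pieces use uppercase letters
        is_capture = "x" in move
        if is_pawn_move or is_capture:
            clock = 0
        else:
            clock += 1
    return clock


if __name__ == "__main__":
    sample = "70. c3 R6h7 71. Ke2 Rh6 { illustrative fragment } 1/2-1/2"
    print(halfmove_clock(parse_movetext(sample)))
```

In the posted game the last pawn move is 70. c3 and there are no later captures, so the clock reaches 100 halfmoves exactly on 120. Be2, where the game ends.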

1) The game was a 1+0 game, and Leela could have won on time by pushing a pawn to avoid a draw by the fifty-move rule.
2) I believe a draw is a draw and is fully deserved in any case. The fact that the engine can be improved to win does not mean that the human does not deserve a draw.
I believe that in any case playing for a draw is easier than playing for a win, so significantly weaker players who play for a draw can achieve a higher percentage against the bot.
Imagine you are playing Carlsen or Nakamura in the last round of a tournament, and he only needs a draw to win the tournament, and tells you before the game that he'll take a repetition draw any time in the first 20 moves as long as you haven't blundered first. So you make a Berlin rep draw. Did you "deserve" a draw? Maybe technically, but not in any real sense. It's sort of like this with these bots now, other than the queen for knight bot which doesn't behave this way. That's the distinction I am making.
Komodo rules!
Uri Blass
Posts: 10632
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Amazing results of LeelaKnightOdds-Dev

Post by Uri Blass »

lkaufman wrote: Fri Nov 22, 2024 12:42 am
Imagine you are playing Carlsen or Nakamura in the last round of a tournament, and he only needs a draw to win the tournament, and tells you before the game that he'll take a repetition draw any time in the first 20 moves as long as you haven't blundered first. So you make a Berlin rep draw. Did you "deserve" a draw? Maybe technically, but not in any real sense. It's sort of like this with these bots now, other than the queen for knight bot which doesn't behave this way. That's the distinction I am making.
I understand your point, but the opponent's weaknesses, and knowing how to take advantage of them, are part of the game.
People try to get a higher rating by taking advantage of the opponent's weaknesses, and I think it is unfair to tell them that this is not right and to change the rating rules retroactively (had they known, they might have decided to play in a different way).
Brunetti
Posts: 373
Joined: Tue Dec 08, 2009 1:37 pm
Location: Milan, Italy
Full name: Alex Brunetti

Re: Amazing results of LeelaKnightOdds-Dev

Post by Brunetti »

lkaufman wrote: Fri Nov 22, 2024 12:42 am Imagine you are playing Carlsen or Nakamura in the last round of a tournament, and he only needs a draw to win the tournament, and tells you before the game that he'll take a repetition draw any time in the first 20 moves as long as you haven't blundered first. So you make a Berlin rep draw. Did you "deserve" a draw? Maybe technically, but not in any real sense.
Technically both players deserve a zero, since the result is agreed upon before the game :)

Alex
lkaufman
Posts: 6108
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Amazing results of LeelaKnightOdds-Dev

Post by lkaufman »

Brunetti wrote: Fri Nov 22, 2024 1:55 am
lkaufman wrote: Fri Nov 22, 2024 12:42 am Imagine you are playing Carlsen or Nakamura in the last round of a tournament, and he only needs a draw to win the tournament, and tells you before the game that he'll take a repetition draw any time in the first 20 moves as long as you haven't blundered first. So you make a Berlin rep draw. Did you "deserve" a draw? Maybe technically, but not in any real sense.
Technically both players deserve a zero, since the result is agreed upon before the game :)

Alex
More technically, only the superstar player could be faulted in this hypothetical scenario, because he stated his intention to play in a specific way (repeating if opponent repeated), while the weaker player (you) said nothing, just acted on the info. It's the same as if Nakamura said before the game that he planned to play the Berlin and allow the draw if you chose the Spanish, and you decided to trust him and did so. No one would fault you, and I'm not sure if Nakamura did anything against regulations either; is there a rule against announcing your planned opening move or moves before a game? Only if both sides do it is it a pre-arranged game. Perhaps some would say it is unethical as it means you are not trying your best, but that is debatable.
Komodo rules!