World Champion Tiebreaks: A Counter-Intuitive Proposal.

Why should the player who plays most like the engine used get the win?
Another engine might give the win to the other player.
Even the same engine at another time per move may give the win to another player.

In ICCF, a tournament is now under way in which, in case of a draw, material is counted as Q=9, R=5, B=N=3, P=1.
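
As a rough illustration of that material count, here is a minimal sketch that totals the quoted point values for each side from a FEN string. The function name `material_from_fen` is hypothetical, and how the tournament actually compares the two totals is not stated here, so the sketch only does the counting.

```python
# Point values quoted in the post: Q=9, R=5, B=N=3, P=1 (kings are not counted).
PIECE_VALUES = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1}


def material_from_fen(fen: str) -> tuple[int, int]:
    """Return (white_points, black_points) for the position in a FEN string."""
    placement = fen.split()[0]  # first FEN field: piece placement
    # Uppercase letters are White pieces, lowercase letters are Black pieces.
    white = sum(PIECE_VALUES.get(ch.lower(), 0) for ch in placement if ch.isupper())
    black = sum(PIECE_VALUES.get(ch, 0) for ch in placement if ch.islower())
    return white, black


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_from_fen(start))
```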

@tpr said ^

Why should the player who plays most like the engine used get the win?
Another engine might give the win to the other player.
Even the same engine at another time per move may give the win to another player.

That's a good point. Engine evals should be consistent, so the strongest and most consistent engine (Stockfish) should be used in world-class events, with the depth and time per move optimized. To address the concern that TPLV fluctuations are random and arbitrary because of inconsistent engine evals, the authors suggested a threshold: as an example, TPLVs that are within 1% of each other are treated as equal. This percentage could also be raised (e.g. to 5%) so that only clearer differences count. This would reduce the impact of variability in engine evals. The right threshold will need to be determined empirically.
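
A minimal sketch of how such a threshold could work is below. The function name `tiebreak_by_tplv` is hypothetical, and measuring the relative difference against the larger of the two TPLVs is my assumption; the authors may define the percentage differently.

```python
def tiebreak_by_tplv(tplv_a: float, tplv_b: float, threshold: float = 0.01) -> str:
    """Return "A", "B", or "tie". A lower TPLV is better.

    Assumption: the % difference is taken relative to the larger TPLV.
    """
    larger = max(tplv_a, tplv_b)
    if larger == 0:
        return "tie"  # both players lost nothing according to the engine
    relative_difference = abs(tplv_a - tplv_b) / larger
    if relative_difference <= threshold:
        return "tie"  # too close to distinguish, treated as equal
    return "A" if tplv_a < tplv_b else "B"


# Hypothetical TPLV totals, not taken from any real match.
print(tiebreak_by_tplv(100.0, 103.0))                  # 1% threshold
print(tiebreak_by_tplv(100.0, 103.0, threshold=0.05))  # 5% threshold
```

With a higher threshold, more matches count as undecided by TPLV, which is the trade-off that would have to be tested empirically.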

I edited the blog to address this.

This is a nice suggestion and all, but there is one flaw. Let’s say that we are in 2021. Back then, Stockfish only went up to version 12. Now it is at 18. It is possible that in a very tight match, Stockfish 18 says that Player 1 won. But maybe, a month later, a new version of Stockfish comes out and says that Player 2 had the win. That would create a discrepancy. Please address this issue as well.

@PRIYAMVAD said ^

This is a nice suggestion and all, but there is one flaw. Let’s say that we are in 2021. Back then, Stockfish only went up to version 12. Now it is at 18. It is possible that in a very tight match, Stockfish 18 says that Player 1 won. But maybe, a month later, a new version of Stockfish comes out and says that Player 2 had the win. That would create a discrepancy. Please address this issue as well.

That's a good point. We will have to compare evals across repeated runs of the latest Stockfish and across previous Stockfish versions, to see by how much the Total Pawn Loss Value changes within the same version and between versions. That gives a margin of error. If the margin of error is lower than the percentage differences in the World Championships, then we can proceed. Another possibility is to only count games that have a player % difference greater than the margin of error determined by the comparison analysis.
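
To make the comparison concrete, here is a rough sketch. It assumes the margin of error is taken as the relative spread (max minus min, divided by the mean) of a player's TPLV across engine runs; that definition, the function names, and the numbers are all hypothetical, and a proper study might use a different statistic.

```python
def relative_spread(values: list[float]) -> float:
    """(max - min) / mean of one player's TPLV across several engine runs."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


def difference_is_meaningful(runs_a: list[float], runs_b: list[float]) -> bool:
    """True if the players' TPLV gap exceeds the engine margin of error.

    runs_a[i] and runs_b[i] are the two players' TPLV as computed by engine
    run i (a different version, or a repeat of the same version).
    """
    margin = max(relative_spread(runs_a), relative_spread(runs_b))
    mean_a = sum(runs_a) / len(runs_a)
    mean_b = sum(runs_b) / len(runs_b)
    player_difference = abs(mean_a - mean_b) / max(mean_a, mean_b)
    return player_difference > margin


# Hypothetical TPLV totals from three engine runs, not real match data.
print(difference_is_meaningful([10.0, 10.2, 9.8], [12.0, 12.1, 11.9]))
```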

For the World Championship matches in this study, the lowest % difference in player TPLV is Carlsen-Caruana at 1.1%, followed by Karjakin-Carlsen at 3.2%. Even if the margin of error is lower than 1%, what if the match is very tight? In that case the TPLV difference may be more random, but this would still be fairer than a Rapid and Blitz playoff, where the weaker Rapid/Blitz player would have a greater chance of losing in spite of their tied Classical performance.

I think that the above scenario will be quite unlikely anyway, as during the course of a match/tournament, players and the public know their TPLV (Total Pawn Loss Values). A player trailing in TPLV will play for a win. This reduces draws, as players can't rely on going into a fast-play tiebreak. A player leading can't offer a draw in a better position, or they lose TPLV. The changing TPLV values throughout a match or tournament add drama and tension. Ultimately the system works in service of providing an incentive for wins in Classical games.

This tiebreak mechanism is a means to an end. We want to see a win by most Classical wins as opposed to a tiebreak win.

Overall, the benefits of using TPLV are that it incentivizes a win in the Classical match, and that a tiebreak will be based on Classical chess as opposed to Rapid and Blitz playoffs. We will have to determine the consistency of engines to establish a margin of error. If it is small, then TPLV scores will be suitable to determine the winner, and a TPLV difference falling within the margin of error becomes unlikely. Players will be more incentivized to go for Classical wins in this format. If one player feels they have an advantage, they cannot agree to a draw, as they will end up with a worse TPLV. If the TPLV is close, the players will know this and play for a win, as they cannot guarantee that the TPLV is in their favor.

I edited the blog to address this.

What happens if, for example:

  1. An engine sees mate in 7, but a player played a move that leads to mate in 12. How does that count towards TPLV?
  2. An engine sees mate in 7, but a player played a move that leads to +9.88 eval. How does that count towards TPLV?

This makes no sense because it benefits players who choose calm, unchaotic positions where they are more likely to make engine moves, and it discourages an aggressive, "double-edged" style of chess, since playing that way could hurt them in tiebreaks.

@Tactical-Attack said ^

This makes no sense because it benefits players who choose calm, unchaotic positions where they are more likely to make engine moves, and it discourages an aggressive, "double-edged" style of chess, since playing that way could hurt them in tiebreaks.

That’s true, but as @RuyLopez1000 said, it is the Classical games that actually decide the title. The better the players play their games, the lower their TPLV. It doesn’t matter whether we hold the blitz/rapid games at all. We are just trying to offer an alternative that saves time.

@Akavall said ^

What happens if, for example:

  1. An engine sees mate in 7, but a player played a move that leads to mate in 12. How does that count towards TPLV?
  2. An engine sees mate in 7, but a player played a move that leads to +9.88 eval. How does that count towards TPLV?

That's a good point. The authors didn't mention that possibility. An answer on Chess Stack Exchange said:

"I cannot vouch for every engine available, but the general approach is, the mate is evaluated to something close to 15-bit integer, 32765 centipawns, and you get minus 1000 centipawns for every move-to-mate (to make it reasonable for the engine to sacrifice the queen (~900 cp) to delay the mate)."

https://chess.stackexchange.com/questions/22087/how-is-average-centipawn-loss-calculated-when-a-mate-is-missed
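
Applied to the two scenarios in the question, the convention described in that quote could be sketched as follows. This only illustrates the quoted "general approach" (a base of 32765 centipawns minus 1000 per move to mate); the function names are hypothetical, and real engines or analysis tools may score mates differently.

```python
MATE_BASE_CP = 32765        # base mate score from the quoted answer
CP_PER_MOVE_TO_MATE = 1000  # deduction per move-to-mate, from the same quote


def mate_to_centipawns(moves_to_mate: int) -> int:
    """Convert 'mate in N' (for the side to move) into a centipawn score."""
    return MATE_BASE_CP - CP_PER_MOVE_TO_MATE * moves_to_mate


def centipawn_loss(best_cp: int, played_cp: int) -> int:
    """Loss of the played move relative to the engine's best move, never negative."""
    return max(0, best_cp - played_cp)


best = mate_to_centipawns(7)

# Scenario 1: mate in 7 was available, the player keeps a mate in 12.
print(centipawn_loss(best, mate_to_centipawns(12)))

# Scenario 2: mate in 7 was available, the player reaches a +9.88 eval (988 cp).
print(centipawn_loss(best, 988))
```

Under this convention a missed mate produces a very large centipawn loss even when the game stays completely winning.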

But this makes me realize that centipawn loss is not ideal, since a loss of 2 pawns (200 centipawns) is treated the same whether the eval goes from +10 to +8 or from +2 to 0. The loss is counted as equal in both cases, even though in the first case the player is still winning, while in the second case the position goes from winning to equal.

This means that win percentage loss should be calculated instead. A centipawn advantage can be converted into a win percentage by a formula. Lichess uses Win% = 50 + 50 * (2 / (1 + exp(-0.00368208 * centipawns)) - 1). This was based on 75k positions in 2300+ rapid games on Lichess. https://github.com/lichess-org/lila/pull/11148
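
For anyone who wants to try the numbers, here is a minimal sketch of that formula in Python. The function name `win_percent` is mine; only the formula and its constant come from the text above.

```python
import math


def win_percent(centipawns: float) -> float:
    """Win% = 50 + 50 * (2 / (1 + exp(-0.00368208 * centipawns)) - 1)."""
    return 50 + 50 * (2 / (1 + math.exp(-0.00368208 * centipawns)) - 1)


# Evals are in centipawns from the mover's point of view (+10.00 = 1000 cp).
for cp in (0, 200, 800, 1000):
    print(cp, round(win_percent(cp), 1))
```

The curve is steep around equality and flat at large advantages, so the same pawn loss costs far more win percentage in a balanced position than in a clearly winning one.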

A model can be created for OTB Classical game for GMs to create a more accurate Win Percentage graph.

A possibility is to exclude opening book moves to avoid discriminating against openings which are evaluated lower such as the Pirc or King's Indian.

As an example using the Lichess model, in the scenario above, going from +10 to +8 converts to going from a 97.5% chance of winning to 95.0%, which is a loss of 2.5 percentage points. (A Classical OTB model with GMs would probably make this 2.5-point loss for a +10 to +8 eval change even smaller.)

And going from +2 to 0 converts from 67.6% to 50%, a loss of 17.6 percentage points.

Under the pawn loss metric, these two instances would be equivalent.

So from here on, I will recommend the Total Win Percentage Loss (TWPL) as a more suitable metric than Total Pawn Loss Value.

The Total Win Percentage Loss (TWPL) sums up the win percentage loss across all the moves in the match, like the Total Pawn Loss Value did.
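
A rough sketch of that sum, using the Lichess formula quoted earlier, might look like this. The input format (one pair of evals per move: the eval after the engine's best move and the eval after the move actually played, both in centipawns from the mover's point of view) and the clamping of negative losses to zero are my assumptions, not something the authors specified.

```python
import math


def win_percent(centipawns: float) -> float:
    """Lichess conversion from a centipawn eval to a win percentage."""
    return 50 + 50 * (2 / (1 + math.exp(-0.00368208 * centipawns)) - 1)


def total_win_percent_loss(moves: list[tuple[float, float]]) -> float:
    """Sum the win% lost on each move.

    Each entry is (best_cp, played_cp), both from the mover's point of view.
    Moves that score at least as well as the engine's choice count as zero loss.
    """
    return sum(
        max(0.0, win_percent(best_cp) - win_percent(played_cp))
        for best_cp, played_cp in moves
    )


# The two example moves from this post: +10 to +8, and +2 to 0.
print(round(total_win_percent_loss([(1000, 800), (200, 0)]), 2))
```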

I edited the blog to address this.

But even in a classical time control game, you can't give more credit to a solid game by Anish Giri, who doesn't take risks and waits for his opponent to self-destruct, than to a game by Alireza Firouzja or Richard Rapport, who complicate every game in pursuit of a full point from the start. Even in a classical time control game, mistakes will happen, and that's part of chess; it doesn't diminish the merit of the fight. Furthermore, many moves that the computer calls mistakes are not ones a human would consider mistakes, since we're playing against real people, not Stockfish. Besides, you would discourage players from springing poisonous "human" preparation.
