World Champion Tiebreaks: A Counter-Intuitive Proposal.

@GennadyBukin said ^

> A tiebreak based on computer evaluation is wrong at its core. Objectively, every position is won, drawn, or lost with flawless play from both sides. A computer's evaluation is not like that: it gives rather arbitrary numbers, so none of these metrics are objective.

Interesting point. I can see where you are coming from.

Evals do correlate with winning chances though, right? So they're not arbitrary.

I mean, they're not random. People talk about evals to indicate their 'advantage', and they would not do that if evals were arbitrary. Philosophically a position is either a win, a draw, or a loss, but the eval isn't a random number.

> If you can't force players to play sudden death, you should find other alternatives. Maybe even something peculiar like clock usage: you can argue that in case of a draw the player who used less time played better.

Fascinating idea.

RuyLopez1000

@mrgwbland said ^

> > > Absolutely not this is ridiculous
> >
> > Why tho?
>
> I feel so strongly and am so vehemently against this idea that I have decided to write a full argument against it.
> As both a competitive chess player and a programmer who has written a chess engine from scratch, I can say with certainty that using centipawn loss for World Championship tiebreaks is a terrible idea. It fundamentally misunderstands both the soul of chess and the current limitations of computer science. I'm going to break this down into major points.

I read your points. I can see that you did not read what I wrote, as otherwise you would have addressed it.

So I have pasted what I wrote in the blog, in the hope that you will read it.

Please read my blog. People don't read, and then they say things that don't make sense.

> 1. It Changes the Core Goal of Chess
> Chess fundamentally has one goal: to checkmate the opponent. Introducing CP loss as a tiebreak criterion changes that entirely. It injects a secondary, artificial goal into a player’s head. Instead of playing the board and the opponent, players would be playing to please the engine.

Firstly, I'm not using centipawn loss.

*So from here on, I will recommend the Total Win Percentage Loss (TWPL) as a more suitable metric than Total Pawn Loss Value (TPLV).*
*The TWPL sums up the win percentage loss across all the moves in the match, just as the Total Pawn Loss Value did for pawn loss.*
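
As a rough illustration of that definition, here is a minimal Python sketch of how a TWPL total could be computed from a list of engine evals. The text quoted here does not give the conversion formula, so the logistic curve below (the one Lichess documents for its accuracy metric) is an assumption. The names `win_percent` and `twpl`, the convention that evals are centipawns from White's point of view, and the choice to count a move that raises the mover's win percentage as zero loss are also assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def win_percent(cp: float) -> float:
    """Map a centipawn eval (White's point of view) to White's win percentage.

    Assumption: logistic curve with the constant Lichess publishes for its
    accuracy metric. The blog may intend a different conversion.
    """
    return 50.0 + 50.0 * (2.0 / (1.0 + math.exp(-0.00368208 * cp)) - 1.0)


def twpl(evals_cp: Sequence[float]) -> Tuple[float, float]:
    """Return (white_total, black_total) win percentage loss.

    evals_cp[0] is the eval before the first move and evals_cp[i] is the
    eval after ply i, so odd plies are White's moves and even plies Black's.
    """
    white = 0.0
    black = 0.0
    for ply in range(1, len(evals_cp)):
        before = win_percent(evals_cp[ply - 1])
        after = win_percent(evals_cp[ply])
        if ply % 2 == 1:
            # White moved: a drop in White's win percentage is White's loss.
            white += max(0.0, before - after)
        else:
            # Black moved: a rise in White's win percentage is Black's loss.
            black += max(0.0, after - before)
    return white, black


# Hypothetical four-ply fragment: White's second move drops the eval by a pawn.
white_total, black_total = twpl([0, 0, 0, -100, -100])
print(round(white_total, 1), round(black_total, 1))
```

For a whole match, the per-game totals would simply be added up for each player.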

> 2. It Punishes Creative, Psychological, and Human Play
> Chess is a game played against a human opponent, where the objective is to win (or secure a draw as Black). In a World Championship, your opponent is of a similar, elite strength, but they are not infallible.
> A player can often be rewarded (in terms of game result) by playing ambitiously and violently, intentionally entering sharp, practical complications that might not be strictly "engine sound" but are incredibly difficult for a human to play accurately (think Tal). Under a CP loss tiebreak system, this brilliant, risky human play is actively punished.

I already covered this in the blog. Why didn't you read it? Now I have to paste the blog for you. Please address what I wrote.

*There is the question of style. Tactical and complex players have lower accuracy. But this is fine for a two-player event, as TWPL is a relative measure. We are only interested in whether one player is better than the other. As an example, Firouzja playing tactical chess doesn't hurt him. It only hurts him if his opponent manages to defend. The opponent also has to cope with the pressure. If the opponent makes errors, they will have a worse TWPL than Firouzja. The only way they can have a better TWPL than Firouzja is if they defend against his attack and come out with an advantage, in which case they deserve to have a better TWPL. For the same reason, TWPL may not be suitable for tournaments, as solid players would be favored over tactical/complex players.*

> 3. It Will Incentivise Incredibly Boring Chess
> If the winner of the World Championship can be decided by who kept their engine score the cleanest, the playstyle favored will be risk-averse and dull! Imagine a match where both players draw every single game because they are too terrified to unbalance the position and risk lowering their accuracy metric. The title of World Champion would literally be awarded to whoever played the most boring chess.

Once again, I addressed these points in the blog, but you didn't read what I wrote. Please read and address them.

*Losing a game is worse than having a worse TWPL score. The tiebreak doesn't change the fact that the players want to win games. Having a lead in the classical section is better than leading on TWPL. If a player has a practical chance of defending, they will play on.*

*Another interesting question is whether players would change their playstyles to be more solid. But what would be the benefit of playing safer openings? It would just make the TWPL a coin toss. In a regular match the GMs don't play the Petrov every game, because they know they have to win. Besides, the player with the worse TWPL would still have to play for a win anyway. In the current WCH with Rapid and Blitz tiebreaks, we already have instances of players changing their playstyle to be more solid. Carlsen agreed to a draw in a better position against Caruana in the 2018 match because he didn't want to risk it. Under the new tiebreak that wouldn't occur.*

*If a player wants to go for complicated play in a game, TWPL would not make a difference. Under the current WCH setup they would take the risk. If they feel confident enough to play better than their opponent in that position, why would their confidence level change with the TWPL system? It doesn't matter how badly they play, as long as they play better than their opponent. It's relative.*

> 4. Engines Are Not Infallible
> Even if we ignored the psychological ruin of the game, the technical premise is flawed because engines are not God. Yes, Stockfish and neural networks are incredibly strong, but they can still be mistaken. The first thing that comes to mind is Hikaru drawing engines using hippo-style openings, and we see it today with modern Stockfish still misevaluating completely locked-down positions where one side is up material but has zero breakthroughs.

Source, please.

I very much doubt that Nakamura would draw the latest Stockfish with the Hippo. When did this happen, against what engine, and with what settings?

> Furthermore, chess is not solved. If you run a position through Stockfish, Leela, or other top engines, you will get different evaluations and different "best" moves. These metrics are not objective truth; they are the "opinions" of machine learning heuristics, meaning that depending on what engine you choose a different World Champion could be decided!

I also addressed this. Please read my blog. It's irritating when someone doesn't bother to read.

*One important concern is the reliability of engine evals. Different engines with different depths or time spent can give different evals. Even the same engine can give different evals with the same settings. Engine evals need to be consistent for accuracy determinations to be valid. The conversion from TPLV to TWPL should increase reliability, because evals at the higher end are the most variable and the win percentage scale compresses them.*

*We will have to compare evals across runs of the latest Stockfish, and between it and previous Stockfish versions, to see the margin of error, that is, by how much the Total Win Percentage Loss changes within the same version and between versions. If the margin of error is low, then we can proceed. The strongest and most consistent engine (Stockfish) should be used in world-class events, with optimal settings. The engine used, depth, search parameters and hardware should be disclosed publicly when measuring the TWPL, and these settings should be kept consistent throughout the event. To address the concern that TWPL fluctuations are random and that engine evals are arbitrary or inconsistent, a threshold could be used, for example, so that TWPL totals within a certain percentage of each other are treated as equal. Determining the right threshold will need to be investigated empirically.*
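
The threshold idea in the last paragraph could look something like the sketch below. This is only one possible reading of "within a certain percentage": the margin is taken relative to the larger of the two totals. The function name `twpl_tiebreak`, the parameter `margin_pct`, and the value 5 used in the example are all hypothetical placeholders. As noted above, the real threshold would have to be found empirically.

```python
from typing import Optional


def twpl_tiebreak(twpl_a: float, twpl_b: float, margin_pct: float) -> Optional[str]:
    """Return "A" or "B" for the player with the lower total loss.

    Returns None when the two totals are within margin_pct percent of the
    larger total, meaning the gap is treated as engine noise and some other
    tiebreak is needed.
    """
    larger = max(twpl_a, twpl_b)
    gap = abs(twpl_a - twpl_b)
    if larger == 0.0 or gap <= margin_pct / 100.0 * larger:
        return None
    return "A" if twpl_a < twpl_b else "B"


# Hypothetical match totals with a placeholder 5 percent margin.
print(twpl_tiebreak(100.0, 103.0, 5.0))  # gap is inside the margin
print(twpl_tiebreak(100.0, 120.0, 5.0))  # gap is outside the margin
```

The disclosed engine settings would fix how the totals are measured. The margin only decides how large a gap has to be before it counts.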

> 5. CP loss Breaks Down at the Highest Level
> Finally, the maths behind centipawn loss calculation simply fails in high-level games. The horizon effect and limited search depth mean that a player can play the best move according to the engine, only for the engine to shift its evaluation after the move is played and dock the player's "accuracy."
>
> The World Championship should be decided by who can beat the human across the board. Engines are incredibly useful and amazing creations, but this would be a severe misuse!!


PRIYAMVAD

@alijeba said ^

> Although this is an interesting idea, I think blitz and rapid are just as much a part of top-level competition as any other aspect of chess.
>
> For example, I dislike playing against the Nimzo-Larsen, and my results against it are far from ideal. But I don’t make blog posts arguing that my losses against it should be disregarded simply because it’s a weak point in my game.
>
> I hope the analogy makes sense. What I’m trying to say is that if two players are equally strong in classical chess, but one performs significantly worse in faster time controls, then that player is simply a weaker overall chess player.
>
> In the same way, I would be a weaker chess player than someone identical to me in every respect except for the Nimzo-Larsen, where they outperform me.

OK, I understand, but think about it this way: you’re bad against the Nimzo-Larsen, and some guy comes up in a classical tournament and plays the Nimzo-Larsen against you. Your TWPL would only be bad if you can’t survive the game or defend properly. But if you manage to defend, you aren’t going to lose anything. Your opponent will. When their attack fails, they lose massive amounts of TWPL.


alijeba

edited

@PRIYAMVAD said ^

> > Although this is an interesting idea, I think blitz and rapid are just as much a part of top-level competition as any other aspect of chess.
> >
> > For example, I dislike playing against the Nimzo-Larsen, and my results against it are far from ideal. But I don’t make blog posts arguing that my losses against it should be disregarded simply because it’s a weak point in my game.
> >
> > I hope the analogy makes sense. What I’m trying to say is that if two players are equally strong in classical chess, but one performs significantly worse in faster time controls, then that player is simply a weaker overall chess player.
> >
> > In the same way, I would be a weaker chess player than someone identical to me in every respect except for the Nimzo-Larsen, where they outperform me.
>
> OK, I understand, but think about it this way: you’re bad against the Nimzo-Larsen, and some guy comes up in a classical tournament and plays the Nimzo-Larsen against you. Your TWPL would only be bad if you can’t survive the game or defend properly. But if you manage to defend, you aren’t going to lose anything. Your opponent will. When their attack fails, they lose massive amounts of TWPL.

I think you are missing the point of my comment.


RuyLopez1000

@alijeba said ^

> I hope the analogy makes sense. What I’m trying to say is that if two players are equally strong in classical chess, but one performs significantly worse in faster time controls, then that player is simply a weaker overall chess player.

I understand your analogy. I just don't think it applies here.

The reason is that this tiebreak is suggested for the Classical World Championship.

We are not looking for the best overall player across all formats, only the best in Classical.

The Chess Championships are separated by format. They don't measure the best overall chess player, but the best performer in each format.


alijeba

@RuyLopez1000 said ^

> > I hope the analogy makes sense. What I’m trying to say is that if two players are equally strong in classical chess, but one performs significantly worse in faster time controls, then that player is simply a weaker overall chess player.
>
> I understand your analogy. I just don't think it applies here.
>
> The reason is that this tiebreak is suggested for the Classical World Championship.
>
> We are not looking for the best overall player across all formats, only the best in Classical.
>
> The Chess Championships are separated by format. They don't measure the best overall chess player, but the best performer in each format.

I suppose you have a point.


DaBassie

@RuyLopez1000 said ^

> > Feels to me like a situation where the cure might be worse than the disease.
> >
> > If 'average centipawn loss' is used (or any similar metric), there will always be one player that benefits from dragging the game on for the entire 50-move rule, just to improve the average with 50 'perfect moves'. We might see some extremely boring 200+ move games haha.
>
> That was already addressed in the blog.
>
> In the very first paragraph, in fact.

Ah I see, you are right. I got carried away thinking about averages, not the total sum.

But now I'm confused... If we assume chess is a draw starting at 0.0 eval, and we reach a draw in the end counting as 0.0 eval, how can the sum of differences be anything other than 0? (Or perhaps I should say: equal for both players.) I don't know enough about the exact calculation to understand it... But introducing draws where 'one player played better than the other' feels shaky. The method with Win-Percentage adds another layer of complexity.

Perhaps the K+R vs K+R endgame was too simple, but take a closed position where White is in control (say at +1.0 eval) and Black can only sit and wait until White decides to break through. Will it be advantageous for White to just shuffle for 40+ moves before finally breaking through? Or the opposite: should White hurry? I don't know the answer, but these types of questions will suddenly become extremely relevant.


RuyLopez1000

@DaBassie said ^

> But now I'm confused... If we assume chess is a draw starting at 0.0 eval, and we reach a draw in the end counting as 0.0 eval, how can the sum of differences be anything other than 0? (Or perhaps I should say: equal for both players.)

That's a good point. I'm not sure. On Lichess the centipawn loss of the two players can differ in a drawn game for some reason.

Thank you for raising it.

I see in the paper that the totals in drawn games are close but not exactly equal, which indicates error in the engine evals.

This would also happen for TWPL, as TWPL is simply TPLV converted to win percentage.

Maybe this tiebreak needs to be scrapped altogether.
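
To make the draw question concrete, here is a small Python sketch using plain centipawn totals. It assumes evals are from White's point of view and that a move which appears to improve the mover's eval is counted as zero loss rather than a negative loss. That clipping is a common convention, but whether the paper or Lichess does exactly this is an assumption here. The function name `total_losses` and both eval sequences are made up for illustration.

```python
from typing import Sequence, Tuple


def total_losses(evals: Sequence[int]) -> Tuple[int, int]:
    """Return (white_total, black_total) centipawn loss with gains clipped to 0.

    evals[0] is the starting eval and evals[i] is the eval after ply i.
    """
    white = 0
    black = 0
    for ply in range(1, len(evals)):
        change = evals[ply] - evals[ply - 1]
        if ply % 2 == 1:
            white += max(0, -change)  # White moved and the eval fell
        else:
            black += max(0, change)   # Black moved and the eval rose
    return white, black


# Evals that behave like a perfect judge: no move ever helps the mover.
consistent = [0, -40, 0, -20, 0]
# Evals with noise: White's first move "gains" 30, which a perfect judge
# would never allow, and that gain is clipped away.
noisy = [0, 30, 30, 0, 0]

print(total_losses(consistent))
print(total_losses(noisy))
```

With the consistent sequence both players end on the same total, as DaBassie expected: the game starts and ends at 0.0, so every centipawn one side gives away has to be given back by the other. With the noisy sequence the totals differ, and the difference equals the clipped "gains", which come from the engine revising its own eval rather than from the players. The same argument carries over to win percentage, since the conversion only rescales the evals.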

> I don't know enough about the exact calculation to understand it... But introducing draws where 'one player played better than the other' feels shaky. The method with Win-Percentage adds another layer of complexity.

The win-percentage method simply converts centipawn loss into a win-percentage difference.

> Perhaps the K+R vs K+R endgame was too simple, but take a closed position where White is in control (say at +1.0 eval) and Black can only sit and wait until White decides to break through. Will it be advantageous for White to just shuffle for 40+ moves before finally breaking through? Or the opposite: should White hurry? I don't know the answer, but these types of questions will suddenly become extremely relevant.

I don't think it would make a difference. With average centipawn loss it would, as they could drag the game out.

The authors used total pawn loss rather than a per-move average because of that concern.

"We instead use a more intuitive modification of that metric, which we term the “total pawn loss,” because (i) even chess enthusiasts do not seem to find the average centipawn loss straightforward, based on our own anecdotal observations, and (ii) it can be manipulated. The second point is easy to confirm because a player can intentionally extend the game, for example, in a theoretically drawn position, thereby ‘artificially’ decreasing his or her average centipawn loss."

https://pmc.ncbi.nlm.nih.gov/articles/PMC11530033/
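
The manipulation the authors describe is easy to see with a few numbers. In this Python sketch the per-move losses are invented, and the 50 extra zero-loss moves stand in for shuffling in a dead-drawn position. The names `total_loss` and `average_loss` are just labels for the two metrics.

```python
def total_loss(losses):
    """Sum of per-move centipawn losses."""
    return sum(losses)


def average_loss(losses):
    """Mean centipawn loss per move."""
    return sum(losses) / len(losses)


# Hypothetical per-move losses for one player in a short game.
game = [30, 0, 45, 10, 15]
# The same game followed by 50 "perfect" shuffling moves.
padded = game + [0] * 50

print(average_loss(game), average_loss(padded))
print(total_loss(game), total_loss(padded))
```

The average shrinks as the game is padded, while the total does not move, which is why extending the game helps under an average and does nothing under a total.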

In the blog I suggest TWPL because the paper treats a drop from +2 to 0 and a drop from +10 to +8 as equivalent centipawn losses, which is problematic.
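
Here is a quick Python check of those two drops. The conversion from evals to win percentage is not spelled out in this thread, so the logistic formula Lichess publishes for its accuracy metric is used as an assumption, and `win_percent` and `loss` are illustrative names.

```python
import math


def win_percent(cp: float) -> float:
    """Assumed conversion: the logistic curve Lichess documents for accuracy."""
    return 50.0 + 50.0 * (2.0 / (1.0 + math.exp(-0.00368208 * cp)) - 1.0)


def loss(cp_before: float, cp_after: float) -> float:
    """Win percentage given up by the side whose eval fell."""
    return win_percent(cp_before) - win_percent(cp_after)


small_edge_thrown = loss(200, 0)       # +2 down to 0
big_edge_trimmed = loss(1000, 800)     # +10 down to +8

print(round(small_edge_thrown, 1), round(big_edge_trimmed, 1))
```

Both drops are 200 centipawns, but on this curve the first costs far more win percentage than the second, because a position at +8 is still close to winning.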

But now I don't see how to get around this draw situation.


aanellien

So does that mean one game with many inaccuracies and a large material difference leads to a better score than a very close game, won by the narrowest margin, with mostly correct moves on both sides?

SimplyBubbles014

No, I don't agree. Chess is already boring enough with all the engine-backed preparation. If FIDE implements the idea you suggested, there would be something like 50 moves of engine-backed preparation. Then it would just be a battle of engines. So why don't we just hold an Engine World Championship? Chess should be a game of creativity, not of memorization.
