@mrgwbland said [^](/forum/redirect/post/SYPWDMlY)
> > > Absolutely not this is ridiculous
> >
> > Why tho?
>
> I feel so strongly and am so vehemently against this idea that I have decided to write a full argument against it.
> As both a competitive chess player and a programmer who has written a chess engine from scratch, I can say with certainty that using centipawn loss for World Championship tiebreaks is a terrible idea. It fundamentally misunderstands both the soul of chess and the current limitations of computer science, I'm going to break this down into major points.
I read your points. I can see that you did not bother to read what I wrote, as otherwise you would have addressed it.
So I have pasted what I wrote in the blog, in the hope that you will read it.
Please read my blog. People don't read and then they say things which don't make sense.
> 1. It Changes the Core Goal of Chess
> Chess fundamentally has one goal, to checkmate the opponent. Introducing CP loss as a tiebreak criteria changes that entirely. It injects a secondary, artificial goal into a player’s head. Instead of playing the board and the opponent, players would be playing to please the engine.
Firstly I'm not using centipawn loss.
*So from here on, I will recommend the Total Win Percentage Loss (TWPL) as a more suitable metric than Total Pawn Loss Value.
The Total Win Percentage Loss (TWPL) sums up the win percentage loss across all the moves in the match, like the Total Pawn Loss Value did.*
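To make the definition concrete, here is a minimal sketch of how a TWPL total could be computed for one player. The blog text above does not say how evals are converted to win percentages, so the logistic curve and its constant `K` are assumptions (the curve is of the kind Lichess documents for its accuracy metric), and the names `win_percent` and `total_win_percent_loss` are hypothetical.

```python
import math

# Assumed logistic constant; the conversion actually intended may differ.
K = 0.00368208


def win_percent(cp: float) -> float:
    """Map a centipawn eval (from the mover's point of view) to a win percentage in [0, 100]."""
    return 50.0 + 50.0 * (2.0 / (1.0 + math.exp(-K * cp)) - 1.0)


def total_win_percent_loss(evals):
    """Sum the win-percentage loss over one player's moves.

    evals: list of (cp_before, cp_after) pairs, both from that player's
    point of view, taken before and after each of their moves.
    """
    total = 0.0
    for cp_before, cp_after in evals:
        loss = win_percent(cp_before) - win_percent(cp_after)
        # Assumption: a move the engine likes more after it is played
        # counts as zero loss rather than a negative loss.
        total += max(0.0, loss)
    return total
```

With this curve, a 100-centipawn drop from an equal position costs more win-percentage points than the same drop from +800, which is the sense in which a win-percentage scale gives less weight to large evals than a raw pawn-loss total does.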
> 2. It Punishes Creative, Psychological, and Human Play
> Chess is a game played against a human opponent, where the objective is to win (or secure a draw as Black). In a World Championship, your opponent is of a similar, elite strength, but they are not infallible.
> A player can often be rewarded (in terms of game result) by playing ambitiously and violently, intentionally entering sharp, practical complications that might not be strictly "engine sound" but are incredibly difficult for a human to play accurately in (think Tal). Under a CP loss tiebreak system, this brilliant, risky human play is actively punished.
I already mentioned this in the blog. Like why didn't you read it. Now I gotta paste the blog for you. Please address what I wrote.
*There is the question of style. Tactical and complex players tend to have lower accuracy. But this is fine for a two-player event, as TWPL is a relative measure: we are only interested in whether one player is better than the other. As an example, Firouzja playing tactical chess doesn't hurt him. It only hurts him if his opponent manages to defend. The opponent also has to cope with the pressure, and an opponent who makes errors will have a worse TWPL than Firouzja. The only way they can have a better TWPL than Firouzja is if they defend his attack and come out with an advantage, in which case they deserve the better TWPL. In tournaments, however, the comparison is no longer head-to-head, so TWPL may not be suitable there, as solid players would be favored over tactical/complex players.*
> 3. It Will Incentivise Incredibly Boring Chess
> If the winner of the World Championship can be decided by who kept their engine score the cleanest, the playstyle favored will be risk-averse and dull! Imagine a match where both players draw every single game because they are too terrified to unbalance the position and risk lowering their accuracy metric. The title of World Champion would literally be awarded to whoever played the most boring chess.
Once again. I addressed these points in the blog but you didn't read what I wrote. Please read and address.
*There is the question of style. Tactical and complex players tend to have lower accuracy. But this is fine for a two-player event, as TWPL is a relative measure: we are only interested in whether one player is better than the other. As an example, Firouzja playing tactical chess doesn't hurt him. It only hurts him if his opponent manages to defend. The opponent also has to cope with the pressure, and an opponent who makes errors will have a worse TWPL than Firouzja. The only way they can have a better TWPL than Firouzja is if they defend his attack and come out with an advantage, in which case they deserve the better TWPL. In tournaments, however, the comparison is no longer head-to-head, so TWPL may not be suitable there, as solid players would be favored over tactical/complex players.*
*Losing a game costs more than a worse TWPL score. The tiebreak doesn't change the fact that the players want to win games. Having a lead in the classical section is better than leading on TWPL. If a player has a practical chance of defending, then they will play on.*
*Another interesting question is whether players would change their playstyles to be more solid. But what would be the benefit of playing safer openings? It would just make the TWPL a coin toss. In a regular match the GMs don't play the Petrov every game because they know they have to win. Besides, the player trailing on TWPL would still have to play for a win anyway. In the current WCH with Rapid and Blitz tiebreaks we already have instances of players changing their playstyle to be more solid. In the 2018 match, Carlsen offered a draw in a better position against Caruana because he didn't want to risk it. Under the new tiebreak that wouldn't occur.*
*If a player wants to go for complicated play in a game, TWPL would not make a difference. As an example, under the current WCH setup they would take a risk. If they feel confident enough to play better than their opponent in that position, why would their confidence level change with the TWPL system? It doesn't matter how badly they play, just as long as they play better than their opponent. It's relative.*
> 4. Engines Are Not Infallible
> Even if we ignored the psychological ruin of the game, the technical premise is flawed because engines are not God. Yes, Stockfish and neural networks are incredibly strong, but they can still be mistaken. The first thing that comes to mind is Hikaru drawing engines using hippo-style openings, and we see it today with modern Stockfish still misevaluating completely locked-down positions where one side is up material but has zero breakthroughs.
Source please.
I very much doubt that Nakamura would draw the latest Stockfish with the Hippo. When did this happen, against which engine, and with what settings?
> Furthermore, chess is not solved. If you run a position through Stockfish, Leela, or other top engines, you will get different evaluations and different "best" moves. These metrics are not objective truth; they are the "opinions" of machine learning heuristics, meaning that depending on what engine you choose a different World Champion could be decided!
I also addressed this. Please read my blog. It's irritating to see someone not bother to read.
*One important concern is the reliability of engine evals. Different engines with different depths/time spent can give different evals. Even the same engine can give different evals with the same settings. Engine evals should be consistent for accuracy determinations to be valid. The conversion from TPLV to TWPL should increase reliability, as evals at the higher end can be more variable and carry less weight on a win-percentage scale.*
*We will have to compare evals across repeated runs of the latest Stockfish and across previous Stockfish versions to see the margin of error, that is, by how much the Total Win Percentage Loss changes within the same version and between versions. If the margin of error is low, then we can proceed. The strongest and most consistent engine (Stockfish) should be used in world-class events, with optimal settings. The engine used, depth, search parameters and hardware should be disclosed publicly when measuring the TWPL, and these settings should be kept consistent throughout the event. To address the concern that TWPL fluctuations are random and arbitrary because of inconsistent engine evals, a threshold could be used, for example one where TWPL scores within a certain percentage are treated as equal. Determining the right threshold will need to be investigated empirically.*
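As a rough sketch of the margin-of-error check and the threshold rule described above: the function names `twpl_margin` and `tiebreak` are hypothetical, and treating the threshold as an absolute gap in TWPL points is an assumption, since the right form and size of the threshold would have to be determined empirically.

```python
from statistics import mean


def twpl_margin(runs):
    """Summarise repeated TWPL measurements of the same games.

    runs: TWPL totals for one player, each from a separate analysis run
    (same engine and settings, or different engine versions).
    Returns (average, spread), where spread is max minus min.
    """
    return mean(runs), max(runs) - min(runs)


def tiebreak(twpl_a, twpl_b, threshold):
    """Decide the tiebreak between players A and B.

    Lower TWPL is better. Totals whose gap is within `threshold`
    (assumed here to be in TWPL points) are treated as equal.
    """
    if abs(twpl_a - twpl_b) <= threshold:
        return "equal"
    return "A" if twpl_a < twpl_b else "B"
```

One possible use is to set `threshold` no smaller than the spread observed across runs, so that a gap the engine itself cannot reproduce never decides the match.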
> 5. CP loss Breaks Down at the Highest Level
> Finally, the maths behind centipawn loss calculation simply fails in high-level games. The horizon effect and search depths mean that a player can play what is the best move according to the engine, only for the engine to shift its evaluation after the move is played and dock the player's "accuracy."
>
> The World Championship should be decided by who can beat the human across the board. Engines are incredibly useful and amazing creations but this would be a severe misuse!!
*One important concern is the reliability of engine evals. Different engines with different depths/time spent can give different evals. Even the same engine can give different evals with the same settings. Engine evals should be consistent for accuracy determinations to be valid. The conversion from TPLV to TWPL should increase reliability, as evals at the higher end can be more variable and carry less weight on a win-percentage scale.*
*We will have to compare evals across repeated runs of the latest Stockfish and across previous Stockfish versions to see the margin of error, that is, by how much the Total Win Percentage Loss changes within the same version and between versions. If the margin of error is low, then we can proceed. The strongest and most consistent engine (Stockfish) should be used in world-class events, with optimal settings. The engine used, depth, search parameters and hardware should be disclosed publicly when measuring the TWPL, and these settings should be kept consistent throughout the event. To address the concern that TWPL fluctuations are random and arbitrary because of inconsistent engine evals, a threshold could be used, for example one where TWPL scores within a certain percentage are treated as equal. Determining the right threshold will need to be investigated empirically.*