World Champion Tiebreaks: A Counter-Intuitive Proposal.

Feels to me like a situation where the cure might be worse than the disease.

Giving any kind of value to computer evaluations will probably create a massive movement of people researching engine behavior, trying to extract the tiniest bits of advantage out of it. It will impact how chess is played. Imagine a completely drawn K+R vs. K+R endgame. If 'average centipawn loss' (or any similar metric) is used, there will always be one player who benefits from dragging the game on for the entire 50-move rule, just to improve the average with 50 'perfect moves'. We might see some extremely boring 200+ move games haha.
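
To put rough numbers on that dilution effect, here is a minimal sketch. The per-move loss values are made up for illustration, and `average_centipawn_loss` is a hypothetical helper, not any official metric. It only shows how one player's average moves when zero-loss moves are appended.

```python
def average_centipawn_loss(losses):
    """Mean centipawn loss over a player's moves (0 = engine's top choice)."""
    return sum(losses) / len(losses)


# Hypothetical per-move losses for one player over 40 moves.
real_game = [12, 0, 35, 8, 0, 20, 5, 0, 40, 10] * 4

# Same game, plus 50 zero-loss shuffling moves in a dead-drawn endgame.
padded_game = real_game + [0] * 50

acpl_before = average_centipawn_loss(real_game)
acpl_after = average_centipawn_loss(padded_game)

print(acpl_before)  # 13.0
print(acpl_after)   # roughly 5.78
```

The total loss is unchanged; only the divisor grows. Whether that padding would actually change who wins a tiebreak depends on how the metric is aggregated across games, which this sketch does not model.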

Another thing is tablebases. In any extremely complicated 8-piece tablebase position, the engine evaluation is no longer a number. It's just 'winning', 'draw' or 'losing'. I feel that raises a mathematical issue. In complicated positions, there is no way for a human to distinguish these three categories, so it feels rather unfair to base the match outcome on it.

Rapid chess as tiebreak might not be perfect, but I don't mind it that much. It is nice and easy to understand for the general audience.

The current tb system is fine in my opinion. It awards the title to the person who can win with less time to think. Also, intuitively this system should result in equal values for both players, right? Since it's a 2-player system and win percentages are net 0, if player A blunders but player B fails to capitalize, then the net win percentage change would still be 0.
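
A minimal sketch of that intuition, under a strong assumption: an idealized evaluator whose numbers are perfectly consistent, so a move can never improve the mover's own evaluation. The function `total_losses` and the evaluation lists are hypothetical; the unit could be centipawns or win percentage.

```python
def total_losses(evals):
    """Sum each side's evaluation drops over a game.

    evals[0] is the starting evaluation and evals[i] is the evaluation after
    ply i, always from player A's point of view. A moves on odd plies, B on
    even plies. Assumes an idealized evaluator, so A's moves never raise the
    evaluation and B's moves never lower it.
    """
    loss_a = loss_b = 0
    for ply in range(1, len(evals)):
        change = evals[ply] - evals[ply - 1]
        if ply % 2 == 1:
            loss_a += -change  # A's move: any drop is A's loss
        else:
            loss_b += change   # B's move: any rise (for A) is B's loss
    return loss_a, loss_b


# A blunders 150, B fails to capitalize and hands it all back.
print(total_losses([0, -150, 0]))            # (150, 150)

# Same start and end evaluation, different path: totals still match.
print(total_losses([0, -30, -10, -80, 0]))   # (100, 100)

# B does capitalize: the totals differ by (start - end).
print(total_losses([0, -150, -150]))         # (150, 0)
```

Under that assumption the difference between the two totals is always the starting evaluation minus the final one, so a game that starts and ends level gives equal totals. Real engines revise their numbers as they search deeper, and per-move averages also depend on move counts, so the equality is not guaranteed in practice.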

Computer evaluation-based tiebreaks are wrong at their core: objectively speaking, at every given turn a position is won/drawn/lost with mutual flawless play. A computer's evaluation isn't like this; it gives rather arbitrary numbers, so all these metrics aren't objective. If you can't force players to play sudden death, you should find other alternatives. Maybe even something peculiar like clock usage: you can argue that in case of a draw the player who used less time played better.

The idea is flawed. Before massive tablebases, even a strong engine could give all sorts of evals. These evals were considered "objective". Then came the latest tablebase, showing that those evals were bogus: they [a subset of them] were proven to be drawn. Or... not many years ago, Stockfish was just clueless on opposite-coloured bishop endgames. Imagine eval-tiebreaks then!

The same argument can be made for early stages of games, as nicely explained by @GennadyBukin and @DaBassie. The evals we see, +1.1 or +2.1, are just highly educated guesses, not absolute truths. We just happen to lack the technology to prove the optimal game outcome, which can only ever be one of won/drawn/lost.

Sorry, but IMO eval-based tiebreaks are an academic pipe-dream.

Absolutely not this is ridiculous

Although this is an interesting idea, I think blitz and rapid are just as much a part of top-level competition as any other aspect of chess.

For example, I dislike playing against the Nimzo-Larsen, and my results against it are far from ideal. But I don’t make blog posts arguing that my losses against it should be disregarded simply because it’s a weak point in my game.

I hope the analogy makes sense. What I’m trying to say is that if two players are equally strong in classical chess, but one performs significantly worse in faster time controls, then that player is simply the weaker overall chess player.

In the same way, I would be a weaker chess player than someone identical to me in every respect except for the Nimzo-Larsen, where they outperform me.

@DaBassie said ^

> Feels to me like a situation where the cure might be worse than the disease.
>
> If 'average centipawn loss' is used (or any similar metric), there will always be one player that benefits from dragging the game on for the entire 50-move rule, just to improve the average with 50 'perfect moves'. We might see some extremely boring 200+ move games haha.

That was already addressed in the blog.

In the very first paragraph, in fact.

@mrgwbland said ^

> Absolutely not this is ridiculous

Why tho?

@alijeba said ^

> Although this is an interesting idea, I think blitz and rapid are just as much a part of top-level competition as any other aspect of chess.
>
> For example, I dislike playing against the Nimzo-Larsen, and my results against it are far from ideal. But I don’t make blog posts arguing that my losses against it should be disregarded simply because it’s a weak point in my game.
>
> I hope the analogy makes sense. What I’m trying to say is that if two players are equally strong in classical chess, but one performs significantly worse in faster time controls, then that player is simply the weaker overall chess player.
>
> In the same way, I would be a weaker chess player than someone identical to me in every respect except for the Nimzo-Larsen, where they outperform me.

Hmm, interesting point. There are separate world championships for each format, which indicates that they are seen as separate.

@RuyLopez1000 said ^

> > Absolutely not this is ridiculous
>
> Why tho?

I feel so strongly about this idea, and am so vehemently against it, that I have decided to write a full argument against it.
As both a competitive chess player and a programmer who has written a chess engine from scratch, I can say with certainty that using centipawn loss for World Championship tiebreaks is a terrible idea. It fundamentally misunderstands both the soul of chess and the current limitations of computer science. I'm going to break this down into major points.

  1. It Changes the Core Goal of Chess
    Chess fundamentally has one goal: to checkmate the opponent. Introducing CP loss as a tiebreak criterion changes that entirely. It injects a secondary, artificial goal into a player’s head. Instead of playing the board and the opponent, players would be playing to please the engine.

  2. It Punishes Creative, Psychological, and Human Play
    Chess is a game played against a human opponent, where the objective is to win (or secure a draw as Black). In a World Championship, your opponent is of a similar, elite strength, but they are not infallible.
    A player can often be rewarded (in terms of game result) by playing ambitiously and violently, intentionally entering sharp, practical complications that might not be strictly "engine sound" but are incredibly difficult for a human to navigate accurately (think Tal). Under a CP loss tiebreak system, this brilliant, risky human play is actively punished.

  3. It Will Incentivise Incredibly Boring Chess
    If the winner of the World Championship can be decided by who kept their engine score the cleanest, the playstyle favored will be risk-averse and dull! Imagine a match where both players draw every single game because they are too terrified to unbalance the position and risk lowering their accuracy metric. The title of World Champion would literally be awarded to whoever played the most boring chess.

  4. Engines Are Not Infallible
    Even if we ignored the psychological ruin of the game, the technical premise is flawed because engines are not God. Yes, Stockfish and neural networks are incredibly strong, but they can still be mistaken. The first thing that comes to mind is Hikaru drawing engines using hippo-style openings, and we see it today with modern Stockfish still misevaluating completely locked-down positions where one side is up material but has zero breakthroughs.
    Furthermore, chess is not solved. If you run a position through Stockfish, Leela, or other top engines, you will get different evaluations and different "best" moves. These metrics are not objective truth; they are the "opinions" of machine learning heuristics, meaning that depending on what engine you choose a different World Champion could be decided!

  5. CP Loss Breaks Down at the Highest Level
    Finally, the maths behind centipawn loss calculation simply fails in high-level games. The horizon effect and search depths mean that a player can play what is the best move according to the engine, only for the engine to shift its evaluation after the move is played and dock the player's "accuracy."
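
A rough illustration of point 5, assuming loss is computed as the evaluation before the move minus the evaluation after it. That formula is one common convention, not something a tiebreak proposal is confirmed to use, and `centipawn_loss` and the numbers below are hypothetical.

```python
def centipawn_loss(eval_before, eval_after):
    """Loss charged to the mover, clamped at zero.

    eval_before: engine's evaluation of the position before the move, i.e. its
                 score for what it considers the best move, in centipawns.
    eval_after:  engine's evaluation of the position after the move was played.
    Both values are from the mover's point of view.
    """
    return max(0, eval_before - eval_after)


# The engine rates its top move at +50 before it is played...
eval_before = 50
# ...the player plays exactly that move, but searching from the new position
# the engine now sees one ply further and revises its score to +20.
eval_after_top_move = 20

print(centipawn_loss(eval_before, eval_after_top_move))  # 30
```

The player followed the engine's first choice and is still charged 30 centipawns, purely because the evaluation shifted between the two searches.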

The World Championship should be decided by who can beat the human across the board. Engines are incredibly useful and amazing creations but this would be a severe misuse!!
