lichess.org

Can LLMs like ChatGPT Understand Chess?

I am actually not very surprised that LLMs struggle with chess positions that are nothing like what they saw in the training data, so I gave it something it must have seen before:

https://adjva4.dpdns.org/ijnzPPu8

I asked Gemini 3 to discuss some good and bad moves for black:

This is a classic opening trap from the Scholar's Mate family. White has developed their Queen and Bishop early to target the weakest point in Black's camp: the f7 pawn, which is only protected by the King.

It is currently Black's turn, and the stakes are high. One wrong move leads to an immediate loss.

The "Instant Loss" Moves
These moves are common mistakes for beginners who don't see the threat on f7.

Nf6 (Knight to f6): While this looks like a developing move that attacks the Queen, it ignores the threat. White plays Qxf7# and it is checkmate.

d6 or a6: Any "waiting" move or pawn push that doesn't address the f7 square results in the same Qxf7# checkmate.

The "Natural but Risky" Move
g6 (Pawn to g6): This attacks the Queen and blocks the path to f7. However, in this specific variation (where Black's e-pawn has moved), it can be dangerous. While it's a standard response in some lines, here White can sometimes retreat the Queen to f3 to keep the pressure on f7, forcing Black to find another solution anyway.

The Best Moves for Black
Black needs to defend f7 while ideally developing a piece.

  1. Qe7 (Queen to e7) — Solid Defense
    This is the most direct way to protect f7.

Pros: It completely neutralizes the checkmate threat. If White plays Qxf7+, Black simply captures with the Queen.

Cons: It blocks the dark-squared bishop on f8, which slows down Black's development and kingside castling.

  2. Qf6 (Queen to f6) — The Counter-Pressure
    Similar to Qe7, this defends f7.

Pros: It offers a Queen trade. If White trades, the "early queen attack" pressure is gone. If White retreats, Black has a well-placed Queen.

Cons: It can sometimes be a target for White's pieces later (like a Knight jumping to d5).

  3. Nh6 (Knight to h6) — The Awkward Guard
    This develops a piece and defends f7.

Pros: Keeps the Queen and Bishops free for other tasks.

Cons: "A Knight on the rim is dim." The Knight is poorly placed on the edge of the board and can be eliminated if White plays d3 and prepares to sacrifice a Bishop (Bxh6) to remove the defender of f7.

Summary Table
Move | Rating | Outcome
Qe7 | Best | Securely defends the King; slightly cramps development.
Qf6 | Good | Defends and challenges White to trade Queens.
Nh6 | Okay | Defends, but the Knight is poorly placed on the edge.
Nf6 | Blunder | Checkmate in 1 (Qxf7#)
Would you like me to show you how to punish White if they continue to bring their Queen out too early in the next few moves?

The moves it gives are correct, but the explanations are not. This surprises me: this is such a common beginner position that I would expect all of these lines to have been explained in books and forums that the LLM was trained on, so the LLM wouldn't even need to "think" about it; it should simply know it from the training data.

I wonder what kind of training data these LLMs actually had. I'm sure they scraped the Lichess forums, blogs, etc., as well as other free sites such as Reddit. But I don't think chess books are freely available on the internet. And even though these companies don't give a sh!t about copyright, they still have to find the content. A site like Chessable has content, but you cannot access it without payment.

Also, not everything found on the internet is 100% correct, yet it still ends up in the training set.

@RuyLopez1000 Thanks for sharing this well articulated article!

I'm currently building an AI-powered chess improvement platform and have run into similar LLM limitations during my engineering journey. My initial goal was to merge an LLM and Stockfish to create a "ChatGPT for Chess", but I discovered very quickly that the LLM, by itself, was often unable to correctly explain a positional advantage that Stockfish had calculated.

Now I'm focusing on establishing a heuristics/analytics layer first, which the LLM can then use as context guardrails to prevent hallucinations, illegal move recommendations, etc. when providing improvement feedback to users.
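
To make the guardrail idea concrete, here is a minimal sketch in Python, not the actual platform code. It assumes a trusted source (the engine or a chess library) has already produced the legal moves in standard algebraic notation; the function names `normalize_san`, `split_suggestions` and `build_context` are hypothetical and only for illustration.

```python
import re

# Hypothetical helper names; the legal-move list is assumed to come from a
# trusted source (engine or chess library), not from the LLM.
_ANNOTATIONS = re.compile(r"[+#!?]+$")


def normalize_san(move: str) -> str:
    """Strip trailing check/mate/annotation marks so 'Qxf7#' matches 'Qxf7'."""
    return _ANNOTATIONS.sub("", move.strip())


def split_suggestions(suggested, legal_moves):
    """Partition LLM-suggested moves into (accepted, rejected) lists."""
    legal = {normalize_san(m) for m in legal_moves}
    accepted, rejected = [], []
    for move in suggested:
        if normalize_san(move) in legal:
            accepted.append(move)
        else:
            rejected.append(move)
    return accepted, rejected


def build_context(legal_moves, engine_notes):
    """Render trusted facts as plain text to prepend to the LLM prompt."""
    lines = ["Legal moves: " + ", ".join(sorted(legal_moves))]
    lines += [f"Engine note: {note}" for note in engine_notes]
    lines.append("Only discuss moves from the legal list.")
    return "\n".join(lines)
```

The context string would go in front of the user's question, and `split_suggestions` would run on whatever moves the LLM mentions in its answer, so an illegal recommendation can be dropped or trigger a retry. This only catches illegal moves; a wrong explanation of a legal move still needs the heuristics layer.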

I would love to discuss this topic further with you if you're interested.

talking about ai in a blog with ai thumbnail lol