Introducing a universal rating converter for 2024

OnePlayAtATime

How does this model address the fact that the blitz/rapid/classical definitions apparently used by lichess and chessdotcom do not match FIDE's? The lichess quick pairing board has 10+0 as the shortest rapid, whereas for FIDE it is the longest blitz; and 30+0 and 30+20 are classical for lichess, when they are mid-rapid for FIDE. Was lichess classical used, or lichess games that meet the FIDE classical definition? The same question applies to chessdotcom, whose rapid ratings were used: they also label 10+0 as rapid (though they have 5+5 as blitz, which is the same thing according to FIDE).

FIDE definitions taken just now from https://rcc.fide.com/fide-laws-of-chess_fulltexthtml/ :

"A.1 A ‘Rapid chess’ game is one where either all the moves must be completed in a fixed time of more than 10 minutes but less than 60 minutes for each player; or the time allotted plus 60 times any increment is of more than 10 minutes but less than 60 minutes for each player."

"B.1 A ‘blitz’ game is one where all the moves must be completed in a fixed time of 10 minutes or less for each player; or the allotted time plus 60 times any increment is 10 minutes or less for each player."

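A minimal sketch of the two FIDE rules quoted above, assuming the base time is given in minutes and the increment in seconds (the usual "10+0", "5+5" notation). Under that assumption, "the allotted time plus 60 times any increment" reduces to base minutes plus increment seconds. The function names and the "standard" label for anything outside A.1 and B.1 are illustrative choices, not FIDE terminology.

```python
def fide_effective_minutes(base_minutes: float, increment_seconds: float = 0.0) -> float:
    """Allotted time plus 60 times the increment, in minutes (FIDE A.1/B.1)."""
    # 60 moves x increment_seconds seconds = increment_seconds minutes
    return base_minutes + 60 * increment_seconds / 60


def fide_category(base_minutes: float, increment_seconds: float = 0.0) -> str:
    """Classify a time control as 'blitz', 'rapid' or 'standard' under FIDE A.1/B.1."""
    t = fide_effective_minutes(base_minutes, increment_seconds)
    if t <= 10:
        return "blitz"     # B.1: 10 minutes or less
    if t < 60:
        return "rapid"     # A.1: more than 10 but less than 60 minutes
    return "standard"      # 60 minutes or more: outside both A.1 and B.1 (illustrative label)


if __name__ == "__main__":
    for base, inc in [(10, 0), (5, 5), (15, 10), (30, 0), (30, 20), (90, 30)]:
        print(f"{base}+{inc}: {fide_category(base, inc)}")
```

Run on the time controls mentioned in the question, this puts 10+0 and 5+5 in blitz and 30+0 and 30+20 in rapid, which is exactly where the lichess and chessdotcom labels diverge from FIDE.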

Akavall

@dboing

Here is a more concrete version of what I was thinking, with some symbolic math (I tried!). I was thinking of LightGBM; it has two nice features: 1) it makes quantile regression easy, and 2) it handles missing values out of the box.

x1 = lichess blitz rating
x2 = lichess rapid rating
x3 = lichess classical rating
x4 = chesscom blitz rating
x5 = chesscom rapid rating
x6 = USCF classical rating

y = FIDE rating

And we want to estimate:

y = f(x1, x2, x3, x4, x5, x6)

Since we probably want an interval around each estimate (strictly speaking, a prediction interval), we could estimate, say, the 10th, 50th and 90th percentiles, which would give us three regressions:

y = f[0.1](x1, x2, x3, x4, x5, x6) # lower bound predicted estimate
y = f[0.5](x1, x2, x3, x4, x5, x6) # predicted fide rating estimate
y = f[0.9](x1, x2, x3, x4, x5, x6) # upper bound predicted estimate

So we would have to estimate three regressions.

And since we will probably have a lot of missing data points, a model that handles missing data out of the box is a nice fit.
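To make the quantile idea concrete: a quantile regression at level alpha minimises the pinball (quantile) loss instead of squared error, and LightGBM exposes this through `objective="quantile"` together with its `alpha` parameter, so f[0.1], f[0.5] and f[0.9] would be three separate models trained with alpha = 0.1, 0.5 and 0.9. The stdlib-only sketch below does not use LightGBM; it only shows, on a made-up list of ratings, that minimising the pinball loss over a constant prediction recovers the matching percentile. The helper names are hypothetical.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball (quantile) loss at quantile level alpha, 0 < alpha < 1."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) costs alpha per point,
        # over-prediction (diff < 0) costs (1 - alpha) per point.
        total += max(alpha * diff, (alpha - 1) * diff)
    return total / len(y_true)


def best_constant_quantile(ys, alpha):
    """Brute-force the constant prediction (among observed values) with the lowest pinball loss."""
    return min(ys, key=lambda c: pinball_loss(ys, [c] * len(ys), alpha))


if __name__ == "__main__":
    ratings = list(range(1000, 2001, 100))  # made-up ratings: 1000, 1100, ..., 2000
    for alpha in (0.1, 0.5, 0.9):
        print(alpha, best_constant_quantile(ratings, alpha))
```

Searching only the observed values is enough here because the pinball loss is convex and piecewise linear with kinks at the data points, so a minimiser always sits on one of them.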


BuzzardChecker

As perpetrator of the ECF formula, I'm not going to argue with your more "accurate fit", although I don't necessarily agree that it is better. There are many plausible fits given the wide range of outcomes. The one thing I will point out is something I long ignored. Most regression techniques minimise the error in the y-variable only. In this work there is no reason why one rating rather than the other should be the dependent variable. What one finds is that if one switches the variables in the regression, the (transformed) equation is quite different, in this case at least.

Looking at the fit for USCF v FIDE, something like four points lie above the line for each one below it. The USCF have a conversion of FIDE and CFC ratings to their own that differs from yours (although the documentation does not seem to have been updated for the FIDE change).
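The point about swapping the dependent variable can be checked numerically. The sketch below, on made-up (x, y) rating pairs, fits ordinary least squares of y on x, fits x on y and inverts it, and for comparison computes the reduced major axis (geometric-mean) line, which is one symmetric alternative. That last method is swapped in for illustration; it is not what the ECF formula or the blog post used. With correlation r and standard deviations s_x, s_y, the three slopes are r·s_y/s_x, s_y/s_x and s_y/(r·s_x), so they coincide only when |r| = 1.

```python
from math import copysign, sqrt


def _sums(xs, ys):
    """Means and centred sums of squares / cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def ols_y_on_x(xs, ys):
    """(intercept, slope) minimising vertical errors in y."""
    mx, my, sxx, _, sxy = _sums(xs, ys)
    b = sxy / sxx
    return my - b * mx, b


def ols_x_on_y_inverted(xs, ys):
    """Fit x = c + d*y (minimising errors in x), then rewrite it as y = -c/d + x/d."""
    mx, my, _, syy, sxy = _sums(xs, ys)
    d = sxy / syy
    c = mx - d * my
    return -c / d, 1 / d


def reduced_major_axis(xs, ys):
    """Geometric-mean slope sign(sxy) * sqrt(syy/sxx); swapping x and y gives the inverse line."""
    mx, my, sxx, syy, sxy = _sums(xs, ys)
    b = copysign(sqrt(syy / sxx), sxy)
    return my - b * mx, b


if __name__ == "__main__":
    # Made-up rating pairs, for illustration only.
    xs = [1200, 1400, 1500, 1700, 1800, 2000]
    ys = [1350, 1420, 1600, 1650, 1850, 1900]
    for fit in (ols_y_on_x, ols_x_on_y_inverted, reduced_major_axis):
        a, b = fit(xs, ys)
        print(f"{fit.__name__}: y = {a:.1f} + {b:.3f} x")
```

All three lines pass through the point of means and differ only in slope; the weaker the correlation, the further apart the two least-squares lines end up.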


NoseKnowsAll

@OnePlayAtATime This analysis was only done on classical chess. Yes, all systems define "classical" differently (and chesscom doesn't even have a classical system - see conclusion 5). But in general, one's skill at games that take 60 minutes, 90 minutes, or 2 hours is similar. One's skill at playing a 3 minute game is fundamentally different, which is why I am explicitly not comparing blitz or rapid ratings for any of the rating systems.

NoseKnowsAll

@BuzzardChecker I agree that the USCF vs FIDE plot is lopsided in one direction when compared to the data I have. That is mostly because I weighted the accurate chesscom and lichess ratings of the players more heavily when defining the USCF cohorts. I discussed this in conclusion 3, but think of it based on the following simpler consideration. If every USCF 1510 player is 1650 chesscom and 1910 lichess, then would you not trust that the skill of a USCF 1510 player is equivalent to those online ratings? What if I told you that every USCF 1510 player was also 1500 FIDE (and not the purported 1650 FIDE from my model)? Yet all other 1500 FIDE players in Europe, Canada, etc. seem to be more like 1350 chesscom and 1730 lichess players. This mismatch cannot be resolved. I believe the "correct" choice to make here is to align the USCF ratings more closely with the online ratings at the lower ratings and ignore the comparison with FIDE - most Americans know their FIDE rating is useless anyway because there are no FIDE events to play in!

You are correct BuzzardChecker - these cohort values should be symmetric. Comparing FIDE to USCF or USCF to FIDE should yield the same results. I took great pains to ensure that was true when I modeled the ECF:FIDE and FIDE:ECF values specifically based on initial feedback from ECF ChessDojo members. That's actually why I reported the ECF to FIDE mapping in both directions - I had originally derived it two different ways!
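One way to get this symmetry by construction is to store a conversion as a single monotone table and read it in either direction. The sketch below uses piecewise-linear interpolation; only the 1805 ECF = FIDE crossover comes from this thread, and every other anchor value is a hypothetical placeholder, not a fitted result. Inputs outside the table are clamped to the end values rather than extrapolated, which is a design choice, not part of the model described here.

```python
from bisect import bisect_right

# Hypothetical anchor points (only the 1805 = 1805 crossover is from the discussion).
ECF_POINTS = [1200, 1500, 1805, 2100, 2400]
FIDE_POINTS = [1400, 1600, 1805, 2050, 2350]


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation on strictly increasing xs; clamps outside the range."""
    if x <= xs[0]:
        return float(ys[0])
    if x >= xs[-1]:
        return float(ys[-1])
    i = bisect_right(xs, x) - 1          # segment index with xs[i] <= x < xs[i + 1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def ecf_to_fide(ecf):
    return interpolate(ecf, ECF_POINTS, FIDE_POINTS)


def fide_to_ecf(fide):
    # Same table read the other way: valid because both columns are strictly increasing.
    return interpolate(fide, FIDE_POINTS, ECF_POINTS)
```

Because both directions come from the same table, converting ECF to FIDE and back returns the original value (up to floating-point rounding) anywhere inside the table's range.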

I saw in the original ECF blog post, "From past experience the relationship can move around and so the rating team will review the recommendation from time to time." So I figured I would take it upon myself to derive a cleaner fit based on updated data and do the leg-work for them. In particular, I initially found that the cross-over from ECF being lower than people's FIDEs to higher occurred at approximately 1805 ECF=FIDE, not the 1750 ECF=FIDE from the original proposition. This spurred me to dig deeper and generate a full "easy" fit.
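For a linear fit, the crossover point follows directly. Assuming, purely for illustration, a local linear relationship FIDE = a + b·ECF with b ≠ 1 (a and b are generic symbols here, not the published ECF coefficients), setting FIDE = ECF gives:

```latex
\text{FIDE} = a + b\,\text{ECF}
\quad\Longrightarrow\quad
\text{ECF}^{*} = \frac{a}{1-b},
\qquad
\text{FIDE} - \text{ECF} = (1-b)\left(\text{ECF}^{*} - \text{ECF}\right).
```

With b < 1, FIDE exceeds ECF below the crossover and falls short of it above, which is the pattern described here. When b is close to 1, small changes in a or b move the crossover a long way, because of the 1 - b in the denominator.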

Yes, as always with this type of work, there are many plausible fits. Some ECF players might be 1805 ECF and 1700 FIDE. Others might be 1900 FIDE. These systems do not have a 1:1 comparison, nor does 1700 FIDE necessarily mean the same thing in the US, UK, or India. The uncertainty inherent in such a comparison is what it is.


RogerH3

More interested in the stats per se than the chess. I'm guessing you are using R and ggplot2? Are you using stat_smooth() with default parameters for your best fit curves? thanks.

svensp

@Toscani said in #59:

> There is a significant difference in ratings when players apply themselves compared to when they do not, especially when considering two players in a single game. This variance can be extremely wide, often spanning hundreds of rating points. Additionally, this does not even take into account whether an opening is sound or not. It is challenging to make comparisons when one day a player's performance is excellent and the next day it is lacking. Many players experience fluctuations in their performance; some days I perform well, while on other days, my performance falters. When we factor in that our opponents might be experiencing similar ups and downs, it becomes clear why there is such a wide window of opportunity for a player's rating to rise or fall.

That's true, but if there is such a thing at all as 'general playing strength' (which may vary more or less consistently across players according to playing mode), it may be justifiable to compare different types of ratings (e.g. OTB classical, lichess classical, chess.com rapid), as they could all be treated as stand-ins for this 'general playing strength'. That holds even though people generally play worse in a rapid game on chess.com than in a classical OTB game (and probably also worse than in a classical lichess game). As long as this difference between modes is more or less consistent, that should be fine.

On the other hand, if among the people in this survey who have both a FIDE rating and a chess.com rapid rating, some apply themselves more than others during chess.com rapid games (as opposed to classical games; the key would be an uneven difference among players), I feel it would be a valid point against the approach, because the quantity we ultimately care about would then have vanished from the data or become more hidden in it. However, even then it could perhaps be treated as something that contributes to the general statistical uncertainty of the outcome (although it is not covered explicitly by any part of the model).

Also, what you mention about performance fluctuating from day to day or event to event is an issue for any rating system (and for anyone trying to see and measure improvement in their rating). I think many servers cover it by also reporting a second quantity, the 'rating deviation'. I don't think FIDE has that (it does account for some of this variation through the K-factor), but I may be wrong about this.
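For reference, here is the basic Elo update that the K-factor enters. The 400-point logistic expected score and the update R' = R + K(S - E) are the textbook Elo formulas; how FIDE chooses K for a given player, and any further rules in its rating regulations, are deliberately left out, so treat this as an illustration rather than FIDE's exact procedure. Unlike Glicko-style systems such as the Glicko-2 system lichess uses, nothing here tracks a per-player rating deviation: K is the only knob.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score of player A against player B (logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, score: float, k: float) -> float:
    """A's new rating after one game; score is 1 (win), 0.5 (draw) or 0 (loss)."""
    return r_a + k * (score - expected_score(r_a, r_b))


if __name__ == "__main__":
    # Same upset (1500 beats 1800) with two hypothetical K values: K scales the swing.
    for k in (10, 40):
        print(k, round(elo_update(1500, 1800, 1.0, k), 1))
```

A larger K lets a rating react faster to good and bad days, at the cost of more noise; a rating deviation instead lets that responsiveness vary per player with how much recent evidence there is.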

I think the basic assumption/approach was: 'There is such a thing as general playing strength and here are different rating systems measuring this thing. Let's compare them statistically according to the data.'


BuzzardChecker

Thanks for your full response #65 to my post #63. I now understand your approach better.

I will stop nitpicking after one more point. Both the USCF/FIDE and ECF/Chess.com fits seem to extend below the range where there is direct data. In both cases, judging by eye, the bottom end of the fit could be raised.

The dynamic nature of these relationships must be emphasised. Many correspondents have commented on why the relationships may vary. What we know is that FIDE decided their ratings were misleading, but they did not really get to the bottom of why that happened. One must assume the problems will slowly re-emerge, and it may be that this sheds light on other lists' weaknesses.

In the UK there has been an upsurge in numbers playing, and that tends to depress average ECF ratings. Since players tend to start their FIDE career much later (and this must be the case now that the minimum FIDE rating is closer to the average), the effect on FIDE ratings may be less dramatic. Indeed, if we wanted to make the "official conversion" more precise, there would be separate adult and junior (predominantly the new players) conversions.


MBurns2020

Cool stats! However, I hardly agree. My FIDE and online rating differences are minimal. Everybody is different; there is no such thing as a category for this.

tokisuno

the lichess to chess.com conversion seems insanely wrong at the early stages LOL
