Introducing a universal rating converter for 2024

F-35_Raptor

and I am a 2900 and not even 2300 in FIDE

CondensedWater

Finally!!! Closure for the ignorant!!

MartianBlitz

@F-35_Raptor said in #11:

and I am a 2900 and not even 2300 in FIDE

You are around 2900 in Bullet on Lichess.

The article says:
"The above ratings refer to your classical lichess rating or rapid chesscom rating."


svensp edited

@RookyBeach said in #6:

@NoseKnowsAll but even after filtering and pruning you can use the remaining sample sizes per conversion to provide confidence intervals right? even if they are slightly biased because of the pruning and filtering. It's very hard to extract any sample size numbers from just the plots.

Unrelated to that, which model did you use? It looks like a step wise linear fit, but in reality the relation would be perfectly linear (since rating differences basically express expected scores which would be the same no matter the system), right? so there is probably some overfitting

It should basically be just an offset (the rating difference would be the same, and the tables mapping rating difference to expected performance are the same), provided the rating systems are functional and Elo-like. However, this cannot be assumed for everyone who plays (for example, somebody could play few tournaments while their playing strength changes).

I don't think FIDE, for example, would be perfectly linear in relation to other systems. Whatever the original problem was, it is very unlikely that granting everyone below 2000 40% of the way to 2000, with no changes above 2000, immediately leads to ratings that accurately predict the winning probability according to the Elo tables.

I imagine there were people who were underrated and people who weren't, but both their ratings have been equally increased (which may be understandable, because imagine the outcry if they had done an increase selectively per federation for example). So if we take a player of 2100 strength who was properly rated and a player of originally 1000 strength who was already properly rated and now has 1400, the 2100 will perform like a 2500 against this person as they will still perform according to a rating difference of 1100 points as nothing has changed in their playing strengths.

If you are above 2000 and play against low rated players who had an appropriate rating before the change, you should be able to perform hundreds of points above your rating without any change in playing strength (it's also true below 2000, but there it is somewhat compensated by your own increase). This is kind of unfortunate. But I think it also leads to more complex relations between the rating systems.
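The arithmetic in the 2100-vs-1000 example above can be checked with a small sketch. It assumes the adjustment is exactly "40% of the distance to 2000 for anyone below 2000" as described in the post, and it uses the standard logistic Elo curve rather than FIDE's lookup table (which also caps the rating difference). The function names are made up for illustration.

```python
def compress_fide(rating: float) -> float:
    """Adjustment as described in the post: players below 2000 gain 40% of
    their distance to 2000; ratings of 2000 and above are unchanged."""
    if rating >= 2000:
        return rating
    return rating + 0.40 * (2000 - rating)


def elo_expected(diff: float) -> float:
    """Expected score for a player rated `diff` points above the opponent
    (logistic Elo curve; an assumption, FIDE itself uses a capped table)."""
    return 1.0 / (1.0 + 10 ** (-diff / 400))


strong_old, weak_old = 2100, 1000
strong_new = compress_fide(strong_old)   # unchanged, already above 2000
weak_new = compress_fide(weak_old)       # lifted 40% of the way to 2000

true_gap = strong_old - weak_old         # playing strengths did not change
nominal_gap = strong_new - weak_new      # gap the new ratings claim

print("weak player's new rating:", round(weak_new))
print("nominal gap:", round(nominal_gap), "true gap:", true_gap)
print("performance rating of the strong player:", round(weak_new + true_gap))
print("expected score at nominal gap:", elo_expected(nominal_gap))
print("expected score at true gap:   ", elo_expected(true_gap))
```

The last two lines show how the score the new ratings predict differs from the score the unchanged playing strengths would produce.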


F-35_Raptor

@MartianBlitz said in #13:

You are around 2900 in Bullet on Lichess.

The article says:
"The above ratings refer to your classical lichess rating or rapid chesscom rating."

ok, but that's still inaccurate. I could easily reach 2400 classical if I wanted to, but let's leave that


NoseKnowsAll

@F-35_Raptor As I mentioned in the blog several times, this only applies to lichess classical ratings. Moreover, in conclusion 2, I clearly stated "The absolute top of the online rating pools do not provide reasonable estimates for OTB ratings. I would personally draw the line at 2275 chesscom rapid and 2310 lichess classical ratings..." and gave the reasoning behind that cutoff.

RookyBeach

@dboing
@dboing said in #9:

And other question. maybe related to the post about confidence notions. Can one bathe this data exercise into a prediction one, from sub-sampling the data used to fit the global model with certain number of adjustable parameters, and then use the remaining data to test for predicting value.

Yes, exactly. You basically need some kind of holdout set to estimate how good the fit really is. You could of course also throw a neural network at the problem and get very close predictions on the training data, but that would just be overfitting and won't perform well on a test set. The same happens here, to a smaller degree, when we use piecewise fits for a non-piecewise relation.
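A minimal sketch of the holdout idea, using only the standard library. Everything here is an assumption for illustration: the data is synthetic (a made-up linear relation plus Gaussian noise, not real rating pairs), and the per-bin mean model is only a stand-in for "a more flexible fit", not the method used in the blog.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_bins(xs, ys, width):
    """Flexible stand-in model: mean of y inside each `width`-point bin of x.
    Bins never seen in training fall back to the overall mean."""
    sums = {}
    for x, y in zip(xs, ys):
        s = sums.setdefault(int(x // width), [0.0, 0])
        s[0] += y
        s[1] += 1
    means = {k: total / count for k, (total, count) in sums.items()}
    overall = sum(ys) / len(ys)
    return lambda x: means.get(int(x // width), overall)


def mae(model, xs, ys):
    """Mean absolute error of `model` on the given points."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic (online, OTB) pairs; the slope, offset and noise are invented.
rng = random.Random(0)
online = [rng.uniform(1000, 2400) for _ in range(400)]
otb = [0.9 * x - 50 + rng.gauss(0, 80) for x in online]

# Shuffle, then hold out 100 of the 400 pairs for testing.
pairs = list(zip(online, otb))
rng.shuffle(pairs)
train, test = pairs[:300], pairs[300:]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

models = {
    "linear": fit_line(train_x, train_y),
    "20-point bins": fit_bins(train_x, train_y, 20),
}
for name, model in models.items():
    print(name,
          "train MAE:", round(mae(model, train_x, train_y), 1),
          "test MAE:", round(mae(model, test_x, test_y), 1))
```

The number that matters is the test MAE: a model whose training error is much lower than its test error is fitting noise, which is the overfitting concern raised above.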


NoseKnowsAll

@RookyBeach and @svensp - Yes, for simplicity, you can assume it's piecewise linear between the points reported in the table. As you know, if you zoom in on any nonlinear plot enough, you'll eventually be left with a linear plot anyway (because slopes are linear and essentially define the change in the plot once you zoom in to the infinitesimals). Subdividing this overall nonlinear set of points into so many intervals is similar to this act of "zooming in."

I agree that there should NOT be a linear fit between the different rating systems. First of all, many of the systems are not Elo or Elo-like. Chesscom uses the Glicko system, Lichess famously uses Glicko-2, and there's no guarantee that an Elo-like system like the USCF system will preserve rating differences with an Elo system like FIDE. Moreover, the FIDE compression in March this year fundamentally showed that the ratings computed by naive Elo were out-of-touch with the reality for many lower-rated players in a nonlinear fashion. So yes, you should not expect a linear relationship across the rating systems. One of the many reasons this rating converter was so difficult to create in the first place!
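Reading the table as piecewise linear, as suggested in this post, amounts to interpolating between neighbouring rows. A rough sketch follows; the anchor values are placeholders invented for the example and are NOT the numbers from the blog's table, and `convert` is a hypothetical helper name.

```python
from bisect import bisect_right

# Hypothetical (online rating, OTB rating) anchor points, sorted by the
# first column. Placeholder values only, not taken from the blog.
ANCHORS = [(1500, 1300), (1800, 1650), (2100, 1950), (2300, 2200)]


def convert(rating, anchors=ANCHORS):
    """Linearly interpolate between the two anchors surrounding `rating`.
    Refuses to extrapolate beyond the first or last anchor."""
    xs = [x for x, _ in anchors]
    if not xs[0] <= rating <= xs[-1]:
        raise ValueError("rating outside the tabulated range")
    # Index of the right-hand anchor of the interval containing `rating`.
    i = min(bisect_right(xs, rating), len(xs) - 1)
    (x0, y0), (x1, y1) = anchors[i - 1], anchors[i]
    return y0 + (y1 - y0) * (rating - x0) / (x1 - x0)


print(convert(1650))  # halfway between the first two placeholder anchors
```

Refusing to extrapolate is a deliberate choice in this sketch, in line with the blog's point that the top of the online pools does not give reasonable OTB estimates.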

Schachgestalt

@F-35_Raptor said in #10:

this is just funny lol, you consider a 2470 on lichess 2400 Fide, I know 1600s with 2600 rating on lichess lol

Really? Who?

@F-35_Raptor said in #11:

and I am a 2900 and not even 2300 in FIDE

That's bullet. Differences of up to 700 can happen, due to mouse skills being a big factor, which is not relevant in OTB.

@F-35_Raptor said in #15:

ok, but that's still inaccurate. I could easily reach 2400 classical if I wanted to, but let's leave that

2440, says the article. 2410 is just Elo 2200.


RookyBeach

@NoseKnowsAll said in #18:

I agree that there should NOT be a linear fit between the different rating systems. First of all, many of the systems are not Elo or Elo-like. Chesscom uses the Glicko system, Lichess famously uses Glicko-2, and there's no guarantee that an Elo-like system like the USCF system will preserve rating differences with an Elo system like FIDE. Moreover, the FIDE compression in March this year fundamentally showed that the ratings computed by naive Elo were out-of-touch with the reality for many lower-rated players in a nonlinear fashion. So yes, you should not expect a linear relationship across the rating systems. One of the many reasons this rating converter was so difficult to create in the first place!

It's not perfectly linear, but it should be very close. Elo and Glicko/Glicko-2 expected win rates are very similar. The compression was there to counteract the existing problems as well as possible, so the rating differences should now be meaningful again. Of course there will be players who got pushed up too high and others who were not pushed high enough, but on average it should even out.

The most difficult part of fitting a conversion tool like this is the data aggregation, cleaning and preprocessing, which can have a lot of impact. Whether you then fit something linear, piecewise linear or nonlinear shouldn't make much difference in effort, right? You could get a lower error using xgboost or a neural network (but in return have a lot of overfitting), use a piecewise function like you did (with less overfitting, but still some), or just use a linear function, which might underfit. Without a proper evaluation set there is no way to know which one works best, which is why this is so important. Otherwise we have no clue whether this conversion is any better than any of the other converters online.
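The claim that Elo and Glicko expected scores are very similar can be illustrated with a short sketch. It uses the logistic Elo curve and Glickman's published formula for the expected outcome of a game in the original Glicko system; the rating deviations of 50 are an arbitrary assumption for well-established players, and Glicko-2's internal scale is not modelled here.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def elo_expected(diff):
    """Expected score for a player rated `diff` points above the opponent."""
    return 1 / (1 + 10 ** (-diff / 400))


def glicko_expected(diff, rd_a, rd_b):
    """Glicko expected score, attenuated by both players' rating deviations.
    With both deviations at zero it reduces to the Elo curve."""
    rd = math.hypot(rd_a, rd_b)  # combined deviation sqrt(rd_a^2 + rd_b^2)
    g = 1 / math.sqrt(1 + 3 * Q ** 2 * rd ** 2 / math.pi ** 2)
    return 1 / (1 + 10 ** (-g * diff / 400))


for diff in (100, 200, 400):
    print(diff,
          round(elo_expected(diff), 3),
          round(glicko_expected(diff, 50, 50), 3))
```

The gap between the two curves grows with the rating deviations, so the similarity holds best for players with many recent games.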

