11 September 2017

Ratings Correlated to Performance

Continuing with Early USCF Rating Issues, the chart on the left, from the 20 August 1951 issue of Chess Life (CL), shows the results of the 1951 U.S. Championship. The tournament started at the end of July 1951 and consisted of two stages: preliminary rounds followed by a final.

The event was the subject of an editorial by Montgomery Major titled 'Consider the Rating System' in the 5 November 1951 issue of CL. The rating system had been introduced a year earlier and this was the first major test of its correlation to actual performance.


No MATHEMATICAL system of grading skill and proficiency will ever be quite accurate, for no system can evaluate the deviations from the expected to which the human mechanism will inevitably turn. Nor can the logics of mathematics evaluate and make allowance for the incalculable human factors of weariness, stamina, digestion and moodiness. Why a master will be unbeatable in one tournament and in the next become the victim of numerous losses is physical or psychological, and it cannot be reduced to mathematical terms.

For that reason the National Rating System cannot perform the miracle of placing players in their exact relation to each other; and it is just as well that it cannot, for if it could predict in advance the relative ranking of players in a tournament there would not be much incentive for playing tournaments!

But the National Rating System can (and does) indicate the relative groupings of players in categories with more than casual accuracy. This is its justification; and the necessity for determining such categories is the reason for its existence. The Rating System does select players in groups and while it cannot with real accuracy determine the exact ranking of players in any one group, it can determine quite accurately the grouping in which any player belongs, when sufficient data is available on that player's performances.

Nowhere are these facts demonstrated more conclusively than in the recent U.S. Championship. Consider the first five players in the final standing. They were Evans (2554), Reshevsky (2747), Pavey (2441), Seidman (2451), and Horowitz (2565). The remaining contestants were in order Bernstein (2309), Santasiere (2304), Mengarini (2310), Shainswit (2444), Hanauer (2325), Pinkus (2421), and Simonson (2345).

Immediately it is obvious that with the exception of Shainswit and Pinkus all the players in the upper bracket of the Master Class (2400 or better) finished at the top, while those in the lower bracket (2300 to 2400) finished in the lower positions. This is what we would expect, if the Rating System lay any claims to accuracy as distinguishing between groups.

The fact that Shainswit and Pinkus were exceptions merely indicates the incalculable human factor in playing chess which no system can evaluate -- the physical and psychological factor.

Turning to the preliminary rounds, the same general rule was in full evidence. Only one player with a rating over the 2300-2400 series failed to qualify for the finals; and as this player was Kevitz (2610) it is quite obvious that the physical strain to the elderly master was a decisive factor, for tournament chess remains a young man's game.

Within each grouping there is not, of course, the same accuracy. It is mathematically impossible to determine the exact shade of difference in strength between players of relatively the same strength; and the Rating System was not intended to do this. In addition there is the added factor that between players of relatively the same strength there is no conclusive determination possible as to which may be the stronger. Upon one occasion one may win, in the next encounter the other may be victorious.

Therefore, it is well advised to remember that the National Rating System is primarily designed to designate classes of players, and not to determine with precise accuracy the relative ranking of players within a class. That is to say, a player with the rating of 2304 may possibly be stronger than a player rated 2325 -- the difference in points may be a reflection of the relative strength of the tournaments in which each has played recently. It may be even the reflection of temporary factors such as indigestion, melancholia, or simply weariness. But the difference between a player with a rating of 2450 and one with 2350 should be a difference in playing strength that is demonstratable over the chess board.

Montgomery Major

NB: This anecdotal analysis was produced some months after the event completed. The next post in the series will look at the use of ratings before a chess event takes place.
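
The editorial's central claim, that upper-bracket players (2400 or better) finished ahead of lower-bracket players with only Shainswit and Pinkus as exceptions, can be checked directly against the ratings it quotes. The following is a minimal sketch, not anything Major used: it takes the editorial's finishing order at face value (ignoring any ties), and the function name and the reading of "exception" as "an upper-bracket player who finished behind at least one lower-bracket player" are my own assumptions.

```python
# Final standings of the 1951 U.S. Championship, in the order given in the
# editorial, with the ratings it quotes. Ties, if any, are ignored.
final_standings = [
    ("Evans", 2554),
    ("Reshevsky", 2747),
    ("Pavey", 2441),
    ("Seidman", 2451),
    ("Horowitz", 2565),
    ("Bernstein", 2309),
    ("Santasiere", 2304),
    ("Mengarini", 2310),
    ("Shainswit", 2444),
    ("Hanauer", 2325),
    ("Pinkus", 2421),
    ("Simonson", 2345),
]

# The editorial's floor for the "upper bracket of the Master Class".
UPPER_BRACKET_FLOOR = 2400


def upper_bracket_exceptions(standings, floor=UPPER_BRACKET_FLOOR):
    """Return upper-bracket players who finished behind a lower-bracket player.

    Hypothetical helper: "exception" here is an assumed reading of the
    editorial, not a definition taken from it.
    """
    placed = list(enumerate(standings, start=1))
    lower_places = [place for place, (_, rating) in placed if rating < floor]
    if not lower_places:
        return []
    best_lower_place = min(lower_places)
    return [
        name
        for place, (name, rating) in placed
        if rating >= floor and place > best_lower_place
    ]


if __name__ == "__main__":
    print(upper_bracket_exceptions(final_standings))
```

Under these assumptions the sketch returns Shainswit and Pinkus, the same two exceptions the editorial names. It says nothing about ranking within a bracket, which Major explicitly did not claim for the system.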

