Chatbot Arena and the Elo rating system - Part 2
Published:
In our previous blog post on Elo rating system, we introduced the basics of the Elo rating system and its online linear update algorithm, hereafter referred to as “online Elo”. However, we identified a significant concern with online Elo: its instability and tendency to bias toward recent results. For example, a demonstration from Chatbot Arena showed substantial shifts in model ratings when Elo was recalculated using the reverse order of matches.