This bonus material implements two different ways to construct LM Arena (formerly Chatbot Arena) style leaderboards from pairwise comparisons.
Both scripts read a list of pairwise preferences (left: winner, right: loser) from a JSON file passed via the `--path` argument. Here's an excerpt of the provided votes.json file:
```json
[
  ["GPT-5", "Claude-3"],
  ["GPT-5", "Llama-4"],
  ["Claude-3", "Llama-3"],
  ["Llama-4", "Llama-3"],
  ...
]
```

Note: If you are not a uv user, replace `uv run ...py` with `python ...py` in the examples below.
- Implements the popular Elo rating method (originally developed for chess rankings) that LM Arena initially used
- See the main notebook for details
```
➜ 03_leaderboards git:(main) ✗ uv run 1_elo_leaderboard.py --path votes.json

Leaderboard (Elo)
-----------------------
1. GPT-5      1095.9
2. Claude-3   1058.7
3. Llama-4     958.2
4. Llama-3     887.2
```
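The core of any Elo leaderboard is the sequential rating update: after each vote, the winner gains and the loser loses `K * (1 - expected score)`. The sketch below is a minimal, hypothetical illustration of that update rule (function names and the `K=32` / 1000-point starting values are assumptions for illustration, not necessarily what `1_elo_leaderboard.py` uses):

```python
def elo_update(r_winner, r_loser, k=32):
    """Return updated (winner, loser) ratings after one pairwise vote."""
    # Expected score of the winner under the Elo logistic model
    expected = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1.0 - expected)
    return r_winner + delta, r_loser - delta

# Replay the votes in order; unseen models start at 1000 points
votes = [["GPT-5", "Claude-3"], ["GPT-5", "Llama-4"],
         ["Claude-3", "Llama-3"], ["Llama-4", "Llama-3"]]
ratings = {}
for winner, loser in votes:
    rw = ratings.get(winner, 1000.0)
    rl = ratings.get(loser, 1000.0)
    ratings[winner], ratings[loser] = elo_update(rw, rl)
```

Because the update is sequential, the final ratings depend on the order of the votes, which is one reason LM Arena later moved to the Bradley-Terry model below.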
- Implements a Bradley-Terry model, similar to the newer LM Arena leaderboard, as described in the official paper (*Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference*)
- As on the LM Arena leaderboard, the fitted scores are rescaled to resemble the original Elo scores
- The code here fits the model with PyTorch's Adam optimizer (chosen for code familiarity and readability)
```
➜ 03_leaderboards git:(main) ✗ uv run 2_bradley_terry_leaderboard.py --path votes.json

Leaderboard (Bradley-Terry)
-----------------------------
1. GPT-5      1140.6
2. Claude-3   1058.7
3. Llama-4     950.3
4. Llama-3     850.4
```
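Unlike the sequential Elo updates, the Bradley-Terry model assigns each model a latent strength `s_i` with `P(i beats j) = sigmoid(s_i - s_j)` and fits all strengths jointly by maximizing the likelihood of the observed votes. The sketch below is a hypothetical, simplified version of such a fit using PyTorch's Adam optimizer (all variable names, the learning rate, and the step count are illustrative assumptions, not the script's actual values):

```python
import torch

votes = [["GPT-5", "Claude-3"], ["GPT-5", "Llama-4"],
         ["Claude-3", "Llama-3"], ["Llama-4", "Llama-3"]]
models = sorted({m for pair in votes for m in pair})
idx = {m: i for i, m in enumerate(models)}

winners = torch.tensor([idx[w] for w, _ in votes])
losers = torch.tensor([idx[l] for _, l in votes])

# One latent strength per model; all start at 0
strengths = torch.zeros(len(models), requires_grad=True)
optimizer = torch.optim.Adam([strengths], lr=0.1)

for _ in range(200):
    optimizer.zero_grad()
    margin = strengths[winners] - strengths[losers]
    # Negative log-likelihood of the observed wins under P(win) = sigmoid(margin)
    loss = -torch.nn.functional.logsigmoid(margin).mean()
    loss.backward()
    optimizer.step()

# Rescale to Elo-like numbers centered at 1000 (scale factor 400 / ln 10)
with torch.no_grad():
    scale = 400 / torch.log(torch.tensor(10.0))
    scores = 1000 + scale * (strengths - strengths.mean())
```

Because the strengths are fit jointly over all votes, the result is independent of vote order, which is the main practical advantage over the sequential Elo updates.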