how it works · elo

The math that makes a top 10 settle.

Most ranking systems take a list and ask you to push it around. either.fm takes nothing and lets a list assemble itself out of duels. The bookkeeping that makes that work is Elo — the same rating system chess has used since 1960.

The basics.

Every song starts with an Elo of 1500. After a duel, the winner's number goes up and the loser's goes down. The size of the change isn't fixed — it depends on whether the result was expected. Beat a song with a higher Elo and you climb fast; beat one with a lower Elo and you barely move.

Nerd alert

The expected-score formula

For a duel between songs A and B with Elo ratings R_A and R_B:

E_A = 1 / (1 + 10^((R_B − R_A) / 400))

E_A is the rating system's estimate of the probability that A wins, a number between 0 and 1. If A and B have the same Elo, E_A = 0.5. A 200-point gap puts the favorite at about 0.76.
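The same formula as a small TypeScript function, shown as a sketch. The name expectedScore is ours for illustration and isn't necessarily what src/lib/elo.ts exports. It reproduces the numbers above and shows that the two songs' expectations always add up to 1.

```typescript
// Estimate (0 to 1) that a song rated ratingA beats a song rated ratingB.
// On this curve, every 400-point gap multiplies the favorite's odds by 10.
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

console.log(expectedScore(1500, 1500).toFixed(2)); // 0.50  equal Elo
console.log(expectedScore(1700, 1500).toFixed(2)); // 0.76  200-point favorite
console.log(expectedScore(1500, 1700).toFixed(2)); // 0.24  so E_A + E_B = 1
```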

The update step

After the duel, with S_A = 1 if A wins and 0 if B wins:

R_A' = R_A + K · (S_A − E_A)

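K is the volatility knob. Because S_A − E_A always falls between −1 and 1, K caps how far a single duel can move a rating. Classic chess Elo used K values from 16 to 32; we use K = 32 in src/lib/elo.ts. Songs aren't chess players: voters' moods change, the catalog is small, and the signal is noisy. So we keep K at the high end, where a few clean upsets are enough to reorder the leaderboard.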
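Here is the update step as a TypeScript sketch. The function names (expectedScore, updateRatings, winsToOvertake) are ours for illustration, not necessarily what src/lib/elo.ts exports, and the 1450-versus-1550 scenario is made up. It shows three things: an even match moves each song by K/2, an upset moves ratings about three times as much as the expected result, and halving K doubles the number of straight upsets needed to flip this pair.

```typescript
// K = 32, the value the text gives; the real module's exports may differ.
const K = 32;

// Estimate (0 to 1) that a song rated ratingA beats a song rated ratingB.
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// Applies R' = R + K * (S - E) to both songs after one duel. Since
// S_B = 1 - S_A and E_B = 1 - E_A, B loses exactly the points A gains.
function updateRatings(
  ratingA: number,
  ratingB: number,
  aWon: boolean,
  k: number = K,
): [number, number] {
  const delta = k * ((aWon ? 1 : 0) - expectedScore(ratingA, ratingB));
  return [ratingA + delta, ratingB - delta];
}

// Hypothetical helper: counts straight head-to-head wins the lower-rated
// song needs before its rating passes the higher-rated one.
function winsToOvertake(low: number, high: number, k: number): number {
  let wins = 0;
  while (low <= high) {
    [low, high] = updateRatings(low, high, true, k);
    wins += 1;
  }
  return wins;
}

console.log(updateRatings(1500, 1500, true)); // [ 1516, 1484 ]  even match: K/2
console.log(updateRatings(1500, 1700, true).map(Math.round)); // [ 1524, 1676 ]  upset
console.log(updateRatings(1700, 1500, true).map(Math.round)); // [ 1708, 1492 ]  expected win
console.log(winsToOvertake(1450, 1550, 32)); // 3
console.log(winsToOvertake(1450, 1550, 16)); // 6
```

Real duels are spread across many opponents, so the head-to-head replay only illustrates the relative effect of K, not how fast a real leaderboard reorders.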

Why we don't use percentages

Win rate alone tells you how often a song wins, but not against whom. A song that beats easy opponents 90% of the time may rank below a song that wins 55% of its duels against leaderboard-toppers. Elo folds record and opponent strength into one number, and that's what we want.
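To make the 90%-versus-55% comparison concrete, this sketch inverts the expected-score formula. It finds the rating at which a song's expected score equals its win rate, holding every opponent at one fixed rating. The function name equilibriumRating and the opponent ratings (1300 for "easy", 1700 for "leaderboard-toppers") are assumptions for illustration; real opponents vary and their ratings move too.

```typescript
// Solves E = winRate for the song's rating R, given
// E = 1 / (1 + 10^((opponentRating - R) / 400)).
// At this rating the song's expected Elo change per duel is zero,
// so under these assumptions its rating tends to hover around it.
function equilibriumRating(winRate: number, opponentRating: number): number {
  if (winRate <= 0 || winRate >= 1) {
    throw new RangeError("winRate must be strictly between 0 and 1");
  }
  const log10Odds = Math.log(winRate / (1 - winRate)) / Math.LN10;
  return opponentRating + 400 * log10Odds;
}

// Assumed opponent strengths: easy ≈ 1300, leaderboard-toppers ≈ 1700.
console.log(Math.round(equilibriumRating(0.9, 1300)));  // 1682
console.log(Math.round(equilibriumRating(0.55, 1700))); // 1735
```

Under these assumptions the 55% song settles about 50 points higher. If the "easy" opponents sat at 1400 instead, the 90% song would come out near 1782 and lead, which is why the comparison is a "may", not a "will".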

Where we deviate from chess Elo

Chess matches are pre-arranged; ours are random. That helps us in one way (less skill-matching bias) and hurts in another (popular songs get sampled disproportionately and stabilize first). The pair-picker weights toward songs with fewer votes to balance the sample.
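The text doesn't specify how the pair-picker weights songs, so this sketch assumes a weight of 1 / (1 + votes) and picks two distinct songs by weighted sampling without replacement. Song, pickPair, and the weighting itself are hypothetical, not the code either.fm ships; the point is only that under-voted songs get drawn more often.

```typescript
interface Song {
  id: string;
  votes: number; // duels this song has appeared in so far
}

// Assumed weighting: a song with 0 votes is weighted 11x as heavily
// as one with 10 votes.
function weight(song: Song): number {
  return 1 / (1 + song.votes);
}

// Draws one index with probability proportional to its weight.
function weightedIndex(weights: number[], rand: () => number): number {
  const total = weights.reduce((sum, w) => sum + w, 0);
  let r = rand() * total;
  for (let i = 0; i < weights.length; i++) {
    r -= weights[i];
    if (r < 0) return i;
  }
  return weights.length - 1; // guard against floating-point rounding
}

// Picks two distinct songs, favoring those with fewer votes.
function pickPair(songs: Song[], rand: () => number = Math.random): [Song, Song] {
  if (songs.length < 2) throw new Error("need at least two songs");
  const first = weightedIndex(songs.map(weight), rand);
  const rest = songs.filter((_, i) => i !== first);
  const second = weightedIndex(rest.map(weight), rand);
  return [songs[first], rest[second]];
}

const catalog: Song[] = [
  { id: "anti-hero", votes: 60 },
  { id: "deep-cut", votes: 2 },
  { id: "new-add", votes: 0 },
];
const [left, right] = pickPair(catalog);
console.log(`${left.id} vs ${right.id}`);
```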

What an Elo number means in practice.

Above 1700: consensus top tier. Wins most matchups; loses cleanly to the other top-tier songs.
1500–1700: solid catalog member. Results depend on the matchup.
1300–1500: underrated or under-voted. Small samples, room to climb.
Under 1300: voters keep skipping it or it keeps losing. Either a deep cut or a real outlier.
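Each band is 200 points wide, which ties it back to the expected-score formula: a song at the floor of one band is roughly a 76% favorite over a song at the floor of the band below, and about a 91% favorite two bands down. This is a sketch; tierFor is a hypothetical helper, and because the listed bands share their endpoints, it assumes each band includes its lower bound and that "Above 1700" excludes 1700 itself.

```typescript
// Estimate (0 to 1) that a song rated ratingA beats a song rated ratingB.
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// Assumed boundary handling: lower bounds inclusive, 1700 itself is not "above 1700".
function tierFor(elo: number): string {
  if (elo > 1700) return "consensus top tier";
  if (elo >= 1500) return "solid catalog member";
  if (elo >= 1300) return "underrated or under-voted";
  return "deep cut or real outlier";
}

const bandFloors: Array<[number, number]> = [[1700, 1500], [1700, 1300], [1500, 1300]];
for (const [a, b] of bandFloors) {
  console.log(`${a} vs ${b}: ${expectedScore(a, b).toFixed(2)}`);
}
// 1700 vs 1500: 0.76
// 1700 vs 1300: 0.91
// 1500 vs 1300: 0.76
```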

The honest limit.

Elo measures consensus, not quality. Anti-Hero climbing doesn't mean it's better; it means more voters picked it more often. A great deep cut can stay low if it's only voted on by people who don't know the rest of the catalog. We surface that bias on the per-song page by showing the vote count next to the rank: a song at #4 with 6 votes is much less stable than a song with 60.

Back to the plain-language version if you skipped here straight from the press release.