KOdex

How Does KOdex Work?

How the ratings, projections, and records are actually computed, and where the numbers are exact versus approximate.

1. Data

KOdex covers every fight it can source from its two canonical feeds: 9,900+ competitive fights across 2,300+ bots, in chronological order from 2018 to today. Bots are split into NHRL's three weight classes, and each class is kept as a separate pool. The two feeds are combined with provenance. BrettZone is the current authority: it provides live event reporting and richer metadata, and carries data from 2023 on. It is supplemented by the statsbook API, the same source the NHRL Wiki pulls from. The statsbook data is cleaner and covers the pre-2023 era, but lacks some of BrettZone's features.

The feeds are deduplicated conservatively: BrettZone is treated as complete for its era, and statsbook only fills in fights strictly before that era, with occasional use as a sanity check.
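The merge rule above can be sketched roughly as follows. This is a minimal illustration, not KOdex's actual pipeline: the `Fight` record, its field names, and the `merge_sources` helper are all hypothetical, and the cutoff is assumed to be the date of BrettZone's earliest fight.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fight:
    # Hypothetical record shape; the real feeds carry more fields.
    fight_date: date
    bot_a: str
    bot_b: str
    winner: str
    source: str  # provenance tag: "brettzone" or "statsbook"


def merge_sources(brettzone, statsbook):
    """Keep BrettZone whole; use statsbook only strictly before its era."""
    if not brettzone:
        return sorted(statsbook, key=lambda f: f.fight_date)
    # Assumption: BrettZone's era starts at its earliest recorded fight.
    cutoff = min(f.fight_date for f in brettzone)
    older = [f for f in statsbook if f.fight_date < cutoff]
    return sorted(older + list(brettzone), key=lambda f: f.fight_date)
```

Filtering by a strict date cutoff, rather than matching fights one by one, is what makes the deduplication conservative: a statsbook row inside BrettZone's era is never trusted over BrettZone.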

2. Glicko-2

Glicko-2 ratings are opponent-adjusted: beating a strong bot moves a rating more than farming weak bots or rookies. Ratings are calculated by replaying the whole history chronologically, one fight at a time, and every rating is strictly point-in-time, so a result never leaks into its own forecast. Each bot's RD (rating deviation, its uncertainty) starts near 350 and shrinks as it completes more fights:

Illustrative shape: RD falls fast over a bot's first fights and settles near 30 once its level is well known.
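The point-in-time replay can be illustrated with a short sketch. The important part is the ordering inside the loop: the forecast is taken from the ratings as they stood before the fight, and only then is the result folded in. The update step here is a plain Elo step used as a stand-in, not KOdex's Glicko-2 update, and the `replay` function, its tuple layout, and the `start` and `k` parameters are assumptions made for the example.

```python
def replay(fights, start=1500.0, k=32.0):
    """Replay (date, winner, loser) tuples in chronological order.

    Returns the final ratings and each pre-fight forecast for the
    eventual winner. NOTE: the update is a simple Elo step standing in
    for Glicko-2; only the forecast-then-update ordering is the point.
    """
    ratings = {}
    forecasts = []
    for _, winner, loser in sorted(fights, key=lambda f: f[0]):
        rw = ratings.get(winner, start)
        rl = ratings.get(loser, start)
        # 1) Forecast from ratings as they stood BEFORE this fight.
        p_win = 1.0 / (1.0 + 10 ** (-(rw - rl) / 400.0))
        forecasts.append(p_win)
        # 2) Only then fold the result into the ratings.
        ratings[winner] = rw + k * (1.0 - p_win)
        ratings[loser] = rl - k * (1.0 - p_win)
    return ratings, forecasts
```

Because the forecast is recorded before the update, a backtest built on `forecasts` never scores a prediction that has already seen its own result.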

You may notice that a bot with a yellow rating (still uncertain, high RD) and a higher Glicko-2 score can rank below a bot with a green rating (settled, low RD) and a lower score. Leaderboards rank by the conservative rating rather than the raw number, so a rookie with only a few fights cannot out-rank a veteran on uncertainty alone:

$$R_{\text{rank}} = r - 2\,\mathrm{RD}$$
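As a quick illustration of the conservative rating, here is a minimal sketch with two made-up bots. The names and numbers are invented for the example and do not come from KOdex data.

```python
def conservative_rating(r, rd):
    """Leaderboard sort key: raw rating minus two rating deviations."""
    return r - 2.0 * rd


# Invented example rows: (name, rating r, RD).
bots = [("Rookie", 1700.0, 200.0), ("Veteran", 1600.0, 40.0)]
leaderboard = sorted(
    bots, key=lambda b: conservative_rating(b[1], b[2]), reverse=True
)
# Rookie: 1700 - 400 = 1300; Veteran: 1600 - 80 = 1520, so Veteran ranks first.
```

The rookie's raw number is 100 points higher, but its wide RD drops it below the veteran once the penalty is applied.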

Glicko-2 is great for predicting outcomes for seasoned bots with history. On a no-leakage backtest, KOdex scores 0.225 by Brier score (the average squared error of the probability put on each outcome):

$$\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N} (p_i - o_i)^2$$

Brier score grades the probabilities themselves: 0 is perfect, and always guessing 50/50 scores 0.250 (each prediction sits 0.5 from the actual 0/1 result, and 0.5² = 0.25). So lower is better.

Brier score on held-out fights: KOdex 0.225 beats both a flat 50/50 guess (0.250) and a win-rate baseline (0.260, which is worse than guessing). Lower is better.
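The Brier score is simple enough to compute by hand. A minimal sketch, where `brier_score` is a name chosen for this example, `probs` holds the forecast probability for one side of each fight, and `outcomes` holds 1 if that side won and 0 otherwise:

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 results."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Always guessing 50/50 scores 0.25 no matter who wins.
coin_flip = brier_score([0.5, 0.5, 0.5], [1, 0, 1])
```

A confident wrong forecast is punished hardest: putting 0.9 on the loser costs 0.81 for that fight, which is how a naive baseline can end up worse than a coin flip.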

3. Volatility

Volatility (σ) measures how streaky a bot's recent results are. It says nothing about skill or certainty. Two things to know: it never affects a bot's odds (it is purely descriptive), and it only becomes meaningful past about 15 fights. Below that a bot sits at the 0.06 seed, the same way RD stays high until a bot proves itself, so it reads best as a steady-versus-streaky label rather than a raw decimal.

4. Scopes and seasons

KOdex keeps a separate rating book per scope:

  • Current: a rolling current-plus-previous-season window, the same basis as NHRL's official rankings. The default leaderboard.
  • All-time: the full history.
  • Per season: one book per calendar year.

Arbitrary custom season ranges are not available yet.
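To make the three scopes concrete, here is a small sketch of which fights would feed each rating book. It assumes a season is identified by its calendar year and that scopes are named by the hypothetical keys `"all-time"`, `"current"`, or a year string such as `"2021"`; none of these identifiers come from KOdex itself.

```python
def in_scope(fight_year, scope, current_year):
    """Return True if a fight from `fight_year` belongs to `scope`."""
    if scope == "all-time":
        return True
    if scope == "current":
        # Rolling window: the current season plus the previous one.
        return fight_year in (current_year - 1, current_year)
    # Otherwise treat the scope as a single season, e.g. "2021".
    return fight_year == int(scope)
```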

5. Head-to-head: exact vs. approximate

The win probability is exact, straight from the engine's own formula. The rating gap between two bots becomes a win probability through an S-shaped curve, but each bot's uncertainty first shrinks that gap, so when both bots are unproven the curve flattens toward 50% and the fight reads as more of a coin flip. The formula is:

$$
\begin{aligned}
P(A\text{ beats }B) &= \frac{1}{1+10^{-\,g\,(r_A-r_B)/(400\,T)}},\quad T=1.4 \\[4pt]
g &= \frac{1}{\sqrt{1+ 3q^{2}\!\left(\mathrm{RD}_A^{2}+\mathrm{RD}_B^{2}\right)/\pi^{2}}},\quad q=\tfrac{\ln 10}{400}
\end{aligned}
$$
Figure: the same rating gap at two confidence levels, both established (low RD) and both provisional (high RD). When both bots are unproven, the curve flattens toward a coin flip.

Raw Glicko was a little overconfident, so the served odds also apply a display-only temperature (T = 1.4) that compresses the effective rating gap. It is monotonic, so the favorite is still the favorite and rankings never move; it just stops overstating a mismatch, so a 400-point edge between two settled bots reads about 84% rather than 91%.
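The head-to-head formula translates directly into code. The sketch below follows the equations as written in this section; the function names and argument order are choices made for this example rather than KOdex's actual interface.

```python
import math

Q = math.log(10) / 400.0  # the constant q from the formula


def g(rd_a, rd_b):
    """Uncertainty factor: 1.0 when both RDs are 0, smaller as RD grows."""
    return 1.0 / math.sqrt(
        1.0 + 3.0 * Q**2 * (rd_a**2 + rd_b**2) / math.pi**2
    )


def win_probability(r_a, r_b, rd_a, rd_b, temperature=1.4):
    """P(A beats B) with the display temperature dividing the exponent."""
    gap = g(rd_a, rd_b) * (r_a - r_b)
    return 1.0 / (1.0 + 10 ** (-gap / (400.0 * temperature)))


settled = win_probability(1900, 1500, 0, 0)            # about 0.84
raw = win_probability(1900, 1500, 0, 0, temperature=1)  # about 0.91
provisional = win_probability(1900, 1500, 350, 350)     # closer to 0.5
```

With both RDs at zero the factor g is exactly 1, which reproduces the 84% versus 91% figures for a 400-point edge. Raising both RDs to 350 pulls the same 400-point gap noticeably closer to a coin flip.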

The projected rating and RD swings are a close approximation of the actual full rating change, accurate to within about a point. The projection is therefore accurate enough to display, but it is not exact in the way the win probability is. Projections are scope-aware, so a bot with no rating in the chosen window returns no projection rather than a wild guess.

6. Archetypes

An archetype is a descriptive label for how a bot tends to win, derived from the distribution of its win methods. It is not part of the rating and never changes a bot's number.

| Archetype | Wins by | Share of wins | What it means |
|---|---|---|---|
| Executioner | Knockout (KO) | over 33% | Ends fights with a clean knockout |
| Damage Engine | Tap-out (TO) | over 33% | Piles on damage until the other team taps out |
| Strategist | Judges' decision (JD) | over 50% | Goes the distance and wins on the judges' scorecards |
| Deus Ex Machina | Technicality (FF / TKO / DQ) | over 60% | Wins mostly by technicality, a true outlier |
| Well-Rounded | no single method clears its bar | n/a | No single way to win, dangerous from every angle |

A bot needs at least 9 wins before any archetype unlocks; below that it stays unlabeled, the same caution as a high RD. Among the archetypes that clear their own bar, the one with the highest share wins; a tie breaks toward the rarer archetype, and if nothing clears, the bot is Well-Rounded.
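The selection rule can be sketched as a small function. The thresholds and the 9-win minimum come from the text above; everything else is an assumption for illustration, including the function name, the method codes used as dictionary keys, and especially `rarity_order`, since how "rarer" is measured is not specified here and the caller has to supply it.

```python
# Archetype -> (win-method codes that count toward it, share to exceed).
THRESHOLDS = {
    "Executioner": ({"KO"}, 0.33),
    "Damage Engine": ({"TO"}, 0.33),
    "Strategist": ({"JD"}, 0.50),
    "Deus Ex Machina": ({"FF", "TKO", "DQ"}, 0.60),
}
MIN_WINS = 9


def archetype(win_methods, rarity_order):
    """Pick an archetype from a {method code: win count} mapping.

    `rarity_order` lists archetype names, rarest first. It is an assumed
    input used only to break ties between equal shares.
    """
    total = sum(win_methods.values())
    if total < MIN_WINS:
        return None  # not enough wins: the bot stays unlabeled
    cleared = []
    for name, (methods, bar) in THRESHOLDS.items():
        share = sum(win_methods.get(m, 0) for m in methods) / total
        if share > bar:
            cleared.append((share, name))
    if not cleared:
        return "Well-Rounded"
    best = max(share for share, _ in cleared)
    tied = [name for share, name in cleared if share == best]
    return min(tied, key=rarity_order.index)
```

For example, a bot with 5 KO wins and 4 judges' decisions clears only the knockout bar (about 56% versus 44%), so it comes out as an Executioner.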

Why the bars differ: a third of wins by knockout or tap-out already marks a clear finishing identity, so those sit at 33%. A Strategist needs over half its wins to come on the judges' scorecards, and a Deus Ex Machina needs a 60% supermajority of technicality wins (forfeits, technical KOs, and disqualifications), rare enough to demand a higher bar.

Archetypes are intended to be a fun, intuitive way to understand a bot's play style at a glance, but there is not yet enough telemetry or bot metadata to perform more than a superficial analysis of what drives them. They are a work-in-progress and the parameters are currently mostly subjective to ensure some amount of spread across bots, so take them with a grain of salt.

7. Badges

There are two types of badges. Records are rare "unique" badges awarded per weight class, and go to multiple bots only when they are tied. Milestones (Golden Dumpster, World Finals champion and attendee, podium, Teams competitor, seasoned veteran) are awarded to every bot that meets the criteria. Both types derive their rarity from holder counts, and each rarity tier gets an animation whose intensity scales with how rare the badge is. Holder counts are shown in detail on the badges page, and the thresholds are as follows:

| Tier | Held by | What it means |
|---|---|---|
| Platinum | 2% or fewer of active bots | Held by almost no one. The strongest animation. |
| Gold | 10% or fewer of active bots | Genuinely uncommon across the active field. |
| Silver | 25% or fewer of active bots | Notable, but a fair few bots hold it. |
| Common | more than 25% of active bots | No tier chip and no rarity animation. |

Two badges carry caveats. Longest average match duration mixes the pre-2023 4-minute era with the current 3-minute one, so it can favor older bots. Fastest average KO counts pure knockouts only. Duration tracking over the API is sketchy, to say the least, so this is a developing area.
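The tier thresholds in the table above amount to a simple lookup on the share of active bots holding a badge. A minimal sketch, where `rarity_tier` and its parameters are names invented for this example, using integer arithmetic so the 2%, 10%, and 25% boundaries are exact:

```python
def rarity_tier(holders, active_bots):
    """Map a badge's holder count to its rarity tier."""
    if active_bots <= 0:
        raise ValueError("active_bots must be positive")
    # Compare holders / active_bots against each percentage without floats.
    if holders * 100 <= 2 * active_bots:
        return "Platinum"
    if holders * 100 <= 10 * active_bots:
        return "Gold"
    if holders * 100 <= 25 * active_bots:
        return "Silver"
    return "Common"
```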

8. Durations

Length is available for about 86% of fights. Rows without a usable length show blank rather than zero. A 2023-onward fight that goes to JD reports as 180s. Because the live clock can overrun the buzzer, 2023-onward lengths are capped at 180, while the pre-2023 4-minute era keeps its true length of up to 240.
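These display rules can be sketched as one small function. Several details are assumptions made for the example: the `display_length` name and its parameters, treating a missing or non-positive value as "no usable length", and splitting the eras by calendar year, which simplifies the actual 2023 boundary.

```python
def display_length(seconds, year, method):
    """Return the fight length to show in seconds, or None for a blank."""
    if year >= 2023 and method == "JD":
        return 180  # a full-distance fight in the 3-minute era
    if seconds is None or seconds <= 0:
        return None  # assumed meaning of "no usable length": show blank
    if year >= 2023:
        return min(seconds, 180)  # live clock can overrun the buzzer
    return seconds  # pre-2023 era keeps its true length (up to 240)
```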

9. Included and excluded

Included: full 2018-to-present history, Glicko-2 in all scopes, per-fight rating trails, head-to-head, badges, durations, placements, and official seed and ranking. KOdex counts invitationals like the Teams event that the statsbook omits.

Excluded or approximate: non-NHRL tournaments; byes, which are recorded but never rated; and the handful of 2023-boundary fights noted above. See the About page for data-source and non-affiliation disclosures.
