How Does KOdex Work?
How the ratings, projections, and records are actually computed, and where the numbers are exact versus approximate.
1. Data
KOdex covers every fight it can source from the two canonical feeds: 9,900+ competitive fights from 2,300+ bots, in chronological order from 2018 to today. Bots are organized into NHRL's three weight classes, and each class is kept as a separate pool. The two feeds are combined with provenance. BrettZone is the current authority: it has live event reporting and more metadata, and carries data from 2023 on. It is supplemented by the statsbook API, the same source the NHRL Wiki pulls from. The statsbook data is cleaner and covers the pre-2023 era, but lacks some of BrettZone's features.
The two feeds are deduplicated conservatively: BrettZone is treated as complete for its era, and statsbook only fills in fights strictly before that era, with occasional use as a sanity check.
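A minimal Python sketch of that merge rule. The field names (`date`, `source`) and the function name are hypothetical, and the cutoff is assumed to be the date of the earliest BrettZone fight; the real pipeline may define BrettZone's era differently.

```python
from datetime import date


def merge_feeds(brettzone, statsbook):
    """Combine two fight lists, tagging each row with its source feed."""
    if not brettzone:
        return [dict(f, source="statsbook") for f in statsbook]
    # Assumption: BrettZone's era starts at its earliest fight and is complete from there.
    cutoff = min(f["date"] for f in brettzone)
    # Statsbook only fills fights strictly before the BrettZone era.
    merged = [dict(f, source="statsbook") for f in statsbook if f["date"] < cutoff]
    merged += [dict(f, source="brettzone") for f in brettzone]
    # Chronological order, so ratings can be replayed one fight at a time.
    merged.sort(key=lambda f: f["date"])
    return merged


brett = [{"date": date(2023, 1, 14), "winner": "A", "loser": "B"}]
stats = [
    {"date": date(2022, 11, 5), "winner": "C", "loser": "D"},
    {"date": date(2023, 1, 14), "winner": "A", "loser": "B"},  # overlaps BrettZone, dropped
]
merged = merge_feeds(brett, stats)
```

Dropping every statsbook row on or after the cutoff is the conservative choice: it avoids fuzzy matching of duplicate fights across feeds.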
2. Glicko-2
Glicko-2 ratings are opponent-adjusted: beating a strong bot moves a rating more than farming weak bots or rookies. Ratings are calculated by replaying the whole history chronologically, one fight at a time, and every rating is strictly point-in-time, so a result never leaks into its own forecast. Each bot's RD (rating deviation, its uncertainty) starts near 350 and shrinks as it completes more fights.
You may notice that a bot with a yellow rating (still uncertain, high RD) can rank below a bot with a green rating (settled, low RD) despite having a higher Glicko-2 score. Leaderboards rank by the conservative rating rather than the raw number, so a rookie that has completed only a few fights cannot out-rank a veteran on uncertainty alone.
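As an illustration only, a common way to build a conservative rating is to subtract a multiple of RD from the raw rating. The multiplier `k = 2.0` and the example bots below are assumptions, not KOdex's published values.

```python
def conservative_rating(rating, rd, k=2.0):
    """Raw rating discounted by k times its uncertainty (k is an assumed value)."""
    return rating - k * rd


bots = [
    {"name": "Rookie", "rating": 1700.0, "rd": 200.0},   # higher raw score, high RD
    {"name": "Veteran", "rating": 1650.0, "rd": 60.0},   # lower raw score, low RD
]

# Sort by the conservative number, highest first.
leaderboard = sorted(
    bots,
    key=lambda b: conservative_rating(b["rating"], b["rd"]),
    reverse=True,
)
```

Under these assumed numbers the veteran ranks first, even though the rookie's raw rating is 50 points higher.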
Glicko-2 predicts best for seasoned bots with a long history. On a no-leakage backtest, KOdex achieves a Brier score of 0.225 (the average squared error of the probability put on each outcome).
Brier score grades the probabilities themselves: 0 is perfect, and always guessing 50/50 scores 0.250 (each prediction sits 0.5 from the actual 0/1 result, and 0.5² = 0.25). So lower is better.
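The definition above is short enough to compute directly. This sketch assumes each forecast is the probability given to one side winning, and each outcome is 1 if that side won and 0 otherwise; the function name is illustrative.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 results."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])  # always 50/50
perfect = brier_score([1.0, 0.0], [1, 0])                     # always right and certain
```

Here `coin_flip` is 0.25 and `perfect` is 0.0, the two reference points that 0.225 sits between.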
3. Volatility
Volatility (σ) measures how streaky a bot's recent results are. It says nothing about skill or certainty. Two things to know: it never affects a bot's odds (it is purely descriptive), and it only becomes meaningful past about 15 fights. Below that a bot sits at the 0.06 seed, the same way RD stays high until a bot proves itself, so it reads best as a steady-versus-streaky label rather than a raw decimal.
4. Scopes and seasons
KOdex keeps a separate rating book per scope:
- Current: a rolling current-plus-previous-season window, the same basis as NHRL's official rankings. The default leaderboard.
- All-time: the full history.
- Per season: one book per calendar year.
Arbitrary custom season ranges are not available yet.
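The three scopes can be read as filters over fight years. This is a sketch under the assumption that a season is a calendar year, as the per-season book suggests; the scope strings and function name are hypothetical.

```python
def in_scope(fight_year, scope, current_year):
    """Return True if a fight from fight_year belongs to the given rating book."""
    if scope == "all-time":
        return True
    if scope == "current":
        # Rolling window: the current season plus the previous one.
        return current_year - 1 <= fight_year <= current_year
    # Otherwise the scope names a single season, e.g. 2022.
    return fight_year == int(scope)


years = [2018, 2022, 2023, 2024]
current_book = [y for y in years if in_scope(y, "current", current_year=2024)]
```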
5. Head-to-head: exact vs. approximate
The win probability is exact and straight from the engine's own formula. The rating gap between two bots becomes a win probability through the S-shaped curve. But each bot's uncertainty first shrinks that gap, so when both bots are unproven, the curve flattens toward 50% and the fight reads as more of a coin flip. The formula is:
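A minimal Python sketch of that calculation, assuming the standard Glicko expected-score form with both RDs combined into one attenuation factor. The function names are illustrative, and KOdex's exact expression may differ.

```python
import math

Q = math.log(10) / 400  # Glicko scale constant


def attenuation(rd):
    """Glicko g() factor: 1.0 at zero uncertainty, smaller as RD grows."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def win_probability(rating_a, rd_a, rating_b, rd_b):
    """Probability that bot A beats bot B, before any display adjustment."""
    combined_rd = math.sqrt(rd_a**2 + rd_b**2)
    # Uncertainty shrinks the rating gap before it enters the S-shaped curve.
    gap = attenuation(combined_rd) * (rating_a - rating_b)
    return 1.0 / (1.0 + 10 ** (-gap / 400.0))


settled = win_probability(1900, 50, 1500, 50)     # both bots well established
unproven = win_probability(1900, 350, 1500, 350)  # both bots near the starting RD
```

With the same 400-point gap, `unproven` lands closer to 0.5 than `settled`, which is the flattening described above.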
Raw Glicko was a little overconfident, so the served odds also apply a display-only temperature (T = 1.4) that shrinks the effective rating gap. It is monotonic, so the favorite is still the favorite and rankings never move; it just stops overstating a mismatch, so a 400-point edge reads about 84% rather than 91%.
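The 91% and 84% figures can be checked with a few lines of Python. This sketch assumes the temperature divides the rating gap before it enters the base-10 logistic curve, and it ignores RD for simplicity; the function name is illustrative.

```python
def served_probability(rating_gap, temperature=1.4):
    """Win probability for the higher-rated bot after temperature scaling."""
    effective_gap = rating_gap / temperature  # T > 1 shrinks the gap
    return 1.0 / (1.0 + 10 ** (-effective_gap / 400.0))


raw = served_probability(400, temperature=1.0)  # no scaling
served = served_probability(400)                # T = 1.4
```

Rounded to two places, `raw` is 0.91 and `served` is 0.84. Dividing by a positive constant keeps the sign of every gap, so the favorite never changes.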
The projected rating and RD swings approximate what the full rating update would produce, to within about a point. The projection is accurate enough to display, but it is not exact in the way the win probability is. Projections are scope-aware, so a bot with no rating in the chosen window returns no projection rather than a wild guess.
6. Archetypes
An archetype is a descriptive label for how a bot tends to win, derived from the distribution of its win methods. It is not part of the rating and never changes a bot's number.
| Archetype | Wins by | Share of wins | What it means |
|---|---|---|---|
| Executioner | Knockout (KO) | over 33% | Ends fights with a clean knockout |
| Damage Engine | Tap-out (TO) | over 33% | Piles on damage until the other team taps out |
| Strategist | Judges' decision (JD) | over 50% | Goes the distance and wins on the judges' scorecards |
| Deus Ex Machina | Technicality (FF / TKO / DQ) | over 60% | Wins mostly by technicality, a true outlier |
| Well-Rounded | no single method clears its bar | n/a | No single way to win, dangerous from every angle |
A bot needs at least 9 wins before any archetype unlocks; below that it stays unlabeled, the same caution as a high RD. Among the archetypes that clear their own bar, the one with the highest share wins; a tie breaks toward the rarer archetype, and if nothing clears, the bot is Well-Rounded.
Why the bars differ: a third of wins by knockout or tap-out already marks a clear finishing identity, so those sit at 33%. A Strategist needs over half its wins to come on the judges' scorecards, and a Deus Ex Machina needs a 60% supermajority of technicality wins (forfeits, technical KOs, and disqualifications), rare enough to demand a higher bar.
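The selection rules can be sketched in Python as follows. The method codes come from the table above; the `RARITY` order used for tie-breaks is an assumption, since the page does not say which archetype counts as rarer, and the names are illustrative.

```python
MIN_WINS = 9

# (archetype, win methods that count, share of wins that must be exceeded)
ARCHETYPES = [
    ("Executioner", {"KO"}, 0.33),
    ("Damage Engine", {"TO"}, 0.33),
    ("Strategist", {"JD"}, 0.50),
    ("Deus Ex Machina", {"FF", "TKO", "DQ"}, 0.60),
]

# Assumed rarity order, rarest first (hypothetical; used only to break ties).
RARITY = ["Deus Ex Machina", "Strategist", "Damage Engine", "Executioner"]


def classify(win_methods):
    """Return an archetype name, or None if the bot has too few wins."""
    total = len(win_methods)
    if total < MIN_WINS:
        return None
    cleared = []
    for name, methods, bar in ARCHETYPES:
        share = sum(1 for m in win_methods if m in methods) / total
        if share > bar:
            cleared.append((share, name))
    if not cleared:
        return "Well-Rounded"
    # Highest share wins; on equal shares, prefer the rarer archetype.
    return max(cleared, key=lambda c: (c[0], -RARITY.index(c[1])))[1]


label = classify(["KO"] * 5 + ["JD"] * 4)
```

In this example knockouts are 5 of 9 wins, which clears the 33% bar, while judges' decisions at 4 of 9 fall short of 50%, so `label` is "Executioner".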
7. Badges
There are two types of badges. Records are rare "unique" badges awarded per weight class, and go to multiple bots only when they are tied. Milestones (Golden Dumpster, World Finals champion and attendee, podium, Teams competitor, seasoned veteran) are awarded to every bot that meets the criteria. Both types derive their rarity from holder counts, and each rarity tier carries an animation whose intensity scales with how rare the badge is. Holder counts are shown in detail on the badges page, and the thresholds are as follows:
| Tier | Held by | What it means |
|---|---|---|
| Platinum | 2% or fewer of active bots | Held by almost no one. The strongest animation. |
| Gold | 10% or fewer of active bots | Genuinely uncommon across the active field. |
| Silver | 25% or fewer of active bots | Notable, but a fair few bots hold it. |
| Common | more than 25% of active bots | No tier chip and no rarity animation. |
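The table maps directly onto a threshold check. In this sketch the function name is illustrative, and it assumes holders are counted among active bots only.

```python
def rarity_tier(holders, active_bots):
    """Map a badge's holder share among active bots to its rarity tier."""
    if active_bots <= 0:
        raise ValueError("active_bots must be positive")
    share = holders / active_bots
    if share <= 0.02:
        return "Platinum"
    if share <= 0.10:
        return "Gold"
    if share <= 0.25:
        return "Silver"
    return "Common"  # no tier chip and no rarity animation


tier = rarity_tier(holders=7, active_bots=100)
```

A badge held by 7 of 100 active bots misses the 2% bar but fits under 10%, so `tier` is "Gold".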
8. Durations
Fight length is available for about 86% of fights. Rows without a usable length show blank rather than zero. A fight from 2023 or later that goes to a judges' decision (JD) reports as 180 s. Because the live clock can overrun the buzzer, lengths from 2023 onward are capped at 180 s, while fights from the pre-2023 4-minute era keep their true length of up to 240 s.
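These rules can be sketched as one normalization function. The name and arguments are hypothetical, and two details are assumptions: a missing or non-positive raw value counts as unusable, and a 2023-or-later JD reports 180 even when the raw clock value is missing.

```python
def display_length(raw_seconds, year, method):
    """Return the fight length to display in seconds, or None for a blank cell."""
    if year >= 2023 and method == "JD":
        return 180  # a judges' decision went the full three minutes
    if raw_seconds is None or raw_seconds <= 0:
        return None  # unusable length: show blank rather than zero
    if year >= 2023:
        return min(raw_seconds, 180)  # the live clock can overrun the buzzer
    return raw_seconds  # pre-2023 era keeps its true length (up to 240)


overrun = display_length(195, 2024, "KO")
```

Here the raw 195 seconds is a clock overrun, so `overrun` is capped to 180.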
9. Included and excluded
Included: full 2018-to-present history, Glicko-2 in all scopes, per-fight rating trails, head-to-head, badges, durations, placements, and official seed and ranking. KOdex counts invitationals like the Teams event that the statsbook omits.
Excluded or approximate: non-NHRL tournaments; byes, which are recorded but never rated; and the handful of 2023-boundary fights noted above. See the About page for data-source and non-affiliation disclosures.