GradeThread
Published, not promised

Grading accuracy & transparency report

A grade is only worth trusting if its accuracy is measured and shown. This page reports, platform-wide, how closely GradeThread's AI grades match expert human reviewers, how confident the model is, and how often buyers dispute a graded item — alongside the eval gate and model changelog that keep the standard improving over time.

How accurate the grades are

Whenever a human expert reviews a grade, we compare their score to the AI's. These figures cover every reviewed grade across the platform.

AI-vs-human agreement

Grades within half a point of the human reviewer

Mean error vs. reviewers

Average points of difference (lower is better)

Average model confidence

Across recent grades

Routed to human review

Low-confidence grades checked before finalizing

Intentional-design misread rate

Design (e.g. distressing) mistaken for damage — lower is better

Buyer dispute rate

Of opted-in graded sales

Items graded

Expert reviews

Graded sales tracked
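
As a rough illustration of how the two headline figures above are defined, the sketch below computes an agreement rate (share of reviewed grades within half a point) and a mean absolute error from pairs of AI and human scores. The `ReviewedGrade` record, the function names, and the sample scores are assumptions made for this example, not GradeThread's production code or data. Whether a difference of exactly half a point counts as agreement is also an assumption; here it does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedGrade:
    """One grade that a human expert has checked (hypothetical record shape)."""
    ai_score: float
    human_score: float


def agreement_rate(reviews: list[ReviewedGrade], tolerance: float = 0.5) -> float:
    """Share of reviews where the AI score is within `tolerance` points of the human score."""
    if not reviews:
        raise ValueError("no reviewed grades to score")
    hits = sum(1 for r in reviews if abs(r.ai_score - r.human_score) <= tolerance)
    return hits / len(reviews)


def mean_absolute_error(reviews: list[ReviewedGrade]) -> float:
    """Average absolute difference in points between AI and human scores (lower is better)."""
    if not reviews:
        raise ValueError("no reviewed grades to score")
    return sum(abs(r.ai_score - r.human_score) for r in reviews) / len(reviews)


# Made-up scores for illustration only; these are not platform figures.
sample = [
    ReviewedGrade(ai_score=8.0, human_score=8.5),
    ReviewedGrade(ai_score=7.0, human_score=7.0),
    ReviewedGrade(ai_score=6.0, human_score=7.5),
    ReviewedGrade(ai_score=9.0, human_score=8.75),
]

print(agreement_rate(sample))       # 3 of the 4 sample grades are within half a point
print(mean_absolute_error(sample))  # average of 0.5, 0.0, 1.5 and 0.25
```

Reporting both numbers matters because they fail differently: a handful of large misses barely moves the agreement rate but shows up clearly in the mean error.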

How the standard improves over time

Accuracy isn't a one-time claim — it's maintained by a closed loop.

  1. Every grade is attributed to a model version

    So accuracy can be measured per version, per factor, and per garment category — not as a single vague average.

  2. Human reviewers correct grades, and the loop learns

    Reviewer corrections (including cases where intentional design was mistaken for damage) and post-sale buyer disputes are fed back as signals showing where grading drifts.

  3. New models must clear a published eval gate

    A candidate version cannot grade live items until it stays under a fixed maximum error and meets a fixed minimum agreement rate on a golden set of expert-graded garments.

  4. An automated monitor watches for regressions

    On a schedule, the live grader is re-checked against the golden set and against production reviews and disputes. If quality drifts below threshold, the team is alerted before it slips further.
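
The gate in step 3 and the monitor in step 4 can be pictured as the same threshold check applied at two different times: once before promotion and then again on a schedule. The following is a minimal sketch under stated assumptions. The class names, function names, and threshold values are hypothetical placeholders, since this page does not list the actual limits, and treating a result exactly on the threshold as passing is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    """Fixed limits a model version must satisfy (values below are placeholders)."""
    max_mean_error: float
    min_agreement: float


@dataclass(frozen=True)
class GoldenSetResult:
    """Scores for one model version evaluated on the golden set (hypothetical shape)."""
    model_version: str
    mean_error: float
    agreement: float


def passes_gate(result: GoldenSetResult, thresholds: GateThresholds) -> bool:
    """A candidate may be promoted only if it meets both limits."""
    return (
        result.mean_error <= thresholds.max_mean_error
        and result.agreement >= thresholds.min_agreement
    )


def check_live_grader(result: GoldenSetResult, thresholds: GateThresholds) -> list[str]:
    """Scheduled re-check: return one alert per breached limit, or an empty list."""
    alerts = []
    if result.mean_error > thresholds.max_mean_error:
        alerts.append(
            f"{result.model_version}: mean error {result.mean_error} "
            f"exceeds {thresholds.max_mean_error}"
        )
    if result.agreement < thresholds.min_agreement:
        alerts.append(
            f"{result.model_version}: agreement {result.agreement} "
            f"is below {thresholds.min_agreement}"
        )
    return alerts


# Placeholder thresholds for illustration; not GradeThread's published gate.
THRESHOLDS = GateThresholds(max_mean_error=0.5, min_agreement=0.9)

candidate = GoldenSetResult("candidate", mean_error=0.4, agreement=0.95)
print(passes_gate(candidate, THRESHOLDS))

drifted = GoldenSetResult("live", mean_error=0.6, agreement=0.85)
for alert in check_live_grader(drifted, THRESHOLDS):
    print(alert)
```

Keeping the thresholds in one fixed object means a model cannot pass the gate under one standard and then be monitored under a looser one.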

Model changelog

Grading model versions that have cleared the eval gate, newest first. Each row is a version proven against the golden set before it went live.

Eval-gated model releases will appear here as new grading versions are promoted.

For the full rubric and weighting behind every grade, see the grading standard.

Transparency FAQ

How accurate is GradeThread's AI grading?
We publish it. Every grade a human reviewer checks is compared to the AI's grade, and we report the agreement rate (share within half a point) and mean absolute error against expert reviewers on this page — updated continuously as more grades are reviewed.
How does GradeThread improve over time?
Reviewer corrections and post-sale buyer disputes feed an accuracy loop, and every new grading model version must clear a fixed eval gate — a maximum error and minimum agreement against a golden set of expert-graded garments — before it can grade live items. The model changelog on this page lists versions that passed.
What stops a grading model from getting worse?
An automated monitor re-checks the live grader on a schedule against the same golden set and against production reviews and disputes. If accuracy drifts below threshold, the team is alerted before quality slips further.
Do buyers have to trust a black box?
No. The rubric and weights are published, every grade carries a confidence score, low-confidence grades are routed to human review, and these platform-wide accuracy figures are public — so the standard is verifiable, not opaque.
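
For readers curious how confidence-based routing might work in principle, here is a minimal sketch. It assumes confidence is expressed on a 0 to 1 scale and that a single cutoff decides routing. Both the scale and the 0.8 cutoff are assumptions for illustration, as is the `route_grade` function name; the page does not specify GradeThread's actual rule.

```python
def route_grade(confidence: float, review_threshold: float = 0.8) -> str:
    """Decide what happens to a grade based on the model's confidence.

    Assumes confidence is in [0, 1]; the default threshold is a placeholder.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Low-confidence grades are checked by a human before being finalized.
    return "human_review" if confidence < review_threshold else "finalize"


for c in (0.95, 0.80, 0.55):
    print(c, route_grade(c))
```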
