Grading accuracy & transparency report
A grade is only worth trusting if its accuracy is measured and shown. This page reports, platform-wide, how closely GradeThread's AI grades match expert human reviewers, how confident the model is, and how often buyers dispute a graded item — alongside the eval gate and model changelog that keep the standard improving over time.
How accurate the grades are
Whenever a human expert reviews a grade, we compare their score to the AI's. These figures cover every reviewed grade across the platform.
AI-vs-human agreement
…
Grades within half a point of the human reviewer
Mean error vs. reviewers
…
Mean absolute difference in points between AI and reviewer grades (lower is better)
Average model confidence
…
Across recent grades
Routed to human review
…
Low-confidence grades checked before finalizing
Intentional-design misread rate
…
Design (e.g. distressing) mistaken for damage — lower is better
Buyer dispute rate
…
Of opted-in graded sales
Items graded
…
Expert reviews
…
Graded sales tracked
…
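As a rough illustration of how the two headline figures above could be computed, here is a minimal sketch. It assumes each expert review is stored as a pair of numeric grades; the `Review` type, field names, and function names are hypothetical and not taken from GradeThread's codebase. Only the half-point tolerance comes from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One expert-reviewed grade (hypothetical record shape)."""
    ai_grade: float     # grade assigned by the model
    human_grade: float  # grade assigned by the expert reviewer


def agreement_rate(reviews: list[Review], tolerance: float = 0.5) -> float:
    """Share of reviews where the AI grade is within `tolerance` points of the human grade."""
    if not reviews:
        raise ValueError("at least one review is required")
    hits = sum(1 for r in reviews if abs(r.ai_grade - r.human_grade) <= tolerance)
    return hits / len(reviews)


def mean_absolute_error(reviews: list[Review]) -> float:
    """Average absolute difference in points between AI and human grades."""
    if not reviews:
        raise ValueError("at least one review is required")
    return sum(abs(r.ai_grade - r.human_grade) for r in reviews) / len(reviews)


# Illustrative data only, not real platform figures.
sample = [
    Review(8.0, 7.5),
    Review(6.0, 7.0),
    Review(9.0, 9.0),
    Review(5.5, 4.0),
]
print(agreement_rate(sample))       # share within half a point
print(mean_absolute_error(sample))  # mean absolute difference in points
```

Grouping the reviews by model version, factor, or garment category before calling these functions would give the per-segment breakdowns described later on this page.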
How the standard improves over time
Accuracy isn't a one-time claim — it's maintained by a closed loop.
1. Every grade is attributed to a model version
   Accuracy can then be measured per version, per factor, and per garment category, not as a single vague average.
2. Human reviewers correct grades and the loop learns
   Reviewer corrections (including cases where design was mistaken for damage) and post-sale buyer disputes are fed back as a signal of where grading drifts.
3. New models must clear a published eval gate
   A candidate version cannot grade live items until it stays within a fixed maximum error and meets a fixed minimum agreement rate on a golden set of expert-graded garments.
4. An automated monitor watches for regressions
   On a schedule, the live grader is re-checked against the golden set and against production reviews and disputes. If quality drifts below threshold, the team is alerted before it slips further.
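The eval gate and the scheduled monitor both reduce to the same check: compare a grader's output to expert grades on the golden set and test two thresholds. The sketch below shows one way that check could look. The threshold values, the `GateThresholds` type, and the function names are placeholders chosen for illustration; this page does not state the actual gate values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    """Hypothetical gate settings; the real values are not given on this page."""
    max_mean_abs_error: float   # candidate must be at or below this
    min_agreement_rate: float   # candidate must be at or above this


def evaluate_on_golden_set(
    pairs: list[tuple[float, float]], tolerance: float = 0.5
) -> tuple[float, float]:
    """Return (mean absolute error, agreement rate) for (model_grade, expert_grade) pairs."""
    if not pairs:
        raise ValueError("golden set must not be empty")
    diffs = [abs(model - expert) for model, expert in pairs]
    mae = sum(diffs) / len(diffs)
    agreement = sum(1 for d in diffs if d <= tolerance) / len(diffs)
    return mae, agreement


def passes_eval_gate(pairs: list[tuple[float, float]], thresholds: GateThresholds) -> bool:
    """True only if both the error ceiling and the agreement floor are met."""
    mae, agreement = evaluate_on_golden_set(pairs)
    return (
        mae <= thresholds.max_mean_abs_error
        and agreement >= thresholds.min_agreement_rate
    )


# Illustrative golden-set results and thresholds only.
golden_pairs = [(8.0, 8.0), (7.0, 7.5), (6.0, 6.0), (9.0, 7.5)]
gate = GateThresholds(max_mean_abs_error=0.6, min_agreement_rate=0.7)
print(passes_eval_gate(golden_pairs, gate))
```

Running the same function on a schedule against the live grader, and alerting when it returns `False`, would be one simple way to implement the regression monitor in step 4.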
Model changelog
Grading model versions that have cleared the eval gate, newest first. Each row is a version proven against the golden set before it went live.
Eval-gated model releases will appear here as new grading versions are promoted.
For the full rubric and weighting behind every grade, see the grading standard.
Transparency FAQ
- How accurate is GradeThread's AI grading?
- We publish it. Every grade a human reviewer checks is compared to the AI's grade, and we report the agreement rate (share within half a point) and mean absolute error against expert reviewers on this page — updated continuously as more grades are reviewed.
- How does GradeThread improve over time?
- Reviewer corrections and post-sale buyer disputes feed an accuracy loop, and every new grading model version must clear a fixed eval gate — a maximum error and minimum agreement against a golden set of expert-graded garments — before it can grade live items. The model changelog on this page lists versions that passed.
- What stops a grading model from getting worse?
- An automated monitor re-checks the live grader on a schedule against the same golden set and against production reviews and disputes. If accuracy drifts below threshold, the team is alerted before quality slips further.
- Do buyers have to trust a black box?
- No. The rubric and weights are published, every grade carries a confidence score, low-confidence grades are routed to human review, and these platform-wide accuracy figures are public — so the standard is verifiable, not opaque.
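The routing of low-confidence grades to human review, mentioned in the last answer, can be pictured as a simple threshold rule. This is a minimal sketch under stated assumptions: confidence is taken to be a number between 0 and 1, and the cutoff of 0.8 is a made-up placeholder. Neither the scale nor the cutoff is specified on this page.

```python
def route_grade(confidence: float, review_threshold: float = 0.8) -> str:
    """Decide whether a grade is finalized or sent to a human reviewer.

    Assumes confidence is on a 0-1 scale; the default threshold is a placeholder.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return "human_review" if confidence < review_threshold else "finalize"


print(route_grade(0.65))  # below the placeholder cutoff
print(route_grade(0.93))  # at or above the placeholder cutoff
```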