LivePerson · AI training
04

Taxonomy annotation platform

AI models learn only as fast as humans can label their training data. I designed an annotation platform that made classifying conversations 86% faster than the spreadsheet workflow it replaced and cut annotation errors by 15%.

86% Faster classification
–15% Error rate
Team: Product designer · team of 4
When: 2019
01

The challenge

Every sentiment and intent model we trained started the same way: engineers hand-classifying thousands of consumer messages in a spreadsheet. It was slow and exhausting, and after enough hours of it, people started making mistakes they wouldn't have made fresh — misclicks, inconsistent calls, the kind of noise that shows up in a model's accuracy months later.

“After annotating for a while... I get tired and I end up misclicking a lot.”

— Data Annotation Specialist
02

Process

Interviews with the people doing the labeling, plus a look at how Prodigy and Label Studio handled the same problem, pointed to three rules: keyboard-first so hands never leave the row, a visual hierarchy calm enough to survive hour four of a shift, and confirmation steps that catch a bad click before it becomes bad training data.

Annotation tool interface dashboard
An early pass at the dashboard — the day's tasks, results, and benchmarks in one place.
Annotation tool interface with controls beneath the conversation
A layout that puts the classification controls under the conversation, not beside it.
Annotation tool interface for tagging text within a message
A variant for annotating a phrase inside a message, not just the message as a whole.
Hotkeys carry the whole flow — select the text, tag it, move on.
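
To make the keyboard-first and confirmation rules concrete, here is a minimal sketch of how such a flow could be modeled. It is not the platform's actual code: the key bindings, label names, and the `AnnotationSession` class are all hypothetical, chosen only to show a label being staged by one key and written only after a separate confirm key.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical key bindings and labels; the real tool's hotkeys are not documented here.
LABEL_KEYS = {"1": "positive", "2": "neutral", "3": "negative"}
CONFIRM_KEY = "enter"
CANCEL_KEY = "esc"


@dataclass
class AnnotationSession:
    messages: List[str]
    index: int = 0                    # message currently on screen
    pending: Optional[str] = None     # label chosen but not yet confirmed
    labels: Dict[int, str] = field(default_factory=dict)

    def press(self, key: str) -> None:
        """Advance the session by one key press."""
        if self.index >= len(self.messages):
            return  # nothing left to label
        if key in LABEL_KEYS:
            # Choosing a label only stages it; a second label key replaces the first.
            self.pending = LABEL_KEYS[key]
        elif key == CANCEL_KEY:
            self.pending = None
        elif key == CONFIRM_KEY and self.pending is not None:
            # Only a confirmed label is written, then the next message loads.
            self.labels[self.index] = self.pending
            self.pending = None
            self.index += 1
        # Any other key is ignored, so a stray press cannot write a label.


session = AnnotationSession(["Where is my order?", "Thanks, that fixed it!"])
for key in ["3", "esc", "2", "enter", "x", "1", "enter"]:
    session.press(key)
print(session.labels)  # {0: 'neutral', 1: 'positive'}
```

In this sketch the mistaken "3" never reaches the label store because it is cancelled before the confirm key, which is the behavior the confirmation rule is meant to guarantee.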
03

Results

  • 86% faster than spreadsheets
  • 15% fewer annotation errors
  • Full team adoption within a week
  • Model training cycles: quarterly → monthly

“This is much better to use than working with spreadsheets. Reading the text is much easier, I make fewer mistakes, and I'm much faster at annotation.”

— Lar, Insights Manager
04

Reflections

What worked

Efficiency beat feature count. Keyboard shortcuts and fewer ways to make a mistake solved the actual problem without adding anything that would slow adoption down.

Opportunities

Inter-annotator agreement — how often two people tag the same message the same way — should have been a day-one metric, not something bolted on after launch.
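
As a rough illustration of what tracking this metric could look like, the sketch below computes raw agreement exactly as defined above (the share of messages two people tagged the same way). It also computes Cohen's kappa, a standard chance-corrected variant that the project itself does not mention and is included here only as an assumption about what a fuller metric might use. The label names and sample data are invented for the example.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items that both annotators labeled identically."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the matches expected by chance alone."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label independently.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Invented sample: two annotators tagging the same six messages.
annotator_a = ["billing", "billing", "shipping", "refund", "shipping", "billing"]
annotator_b = ["billing", "refund", "shipping", "refund", "shipping", "shipping"]

print(round(percent_agreement(annotator_a, annotator_b), 2))  # 0.67
print(round(cohens_kappa(annotator_a, annotator_b), 2))       # 0.52
```

Raw agreement is the easier number to explain to a team, while kappa is lower here because some of the matches would be expected even if both people were guessing from their own label frequencies.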