Product Design / AI / Machine Learning

Aurora Copilot

Embedding AI into a complex multi-stakeholder platform

Overview

The copilot is part of the AI layer embedded across Aurora's platform. It lives inside the workflow of senior underwriters, surfacing intelligence where decisions happen. We deliberately chose "copilot" over "autopilot": the AI does not make decisions, it prepares the underwriter to make better ones.

The constraint from day one: under FCA regulation, the AI can surface information but cannot give advice. It can tell an underwriter a rebuild cost is 40% below comparable properties. It cannot say to increase it. Every piece of AI-generated copy in the platform had to respect that boundary.

year: 2024 – Current

company: Aurora

role: Lead Product Designer

focus: Product Design, UX Strategy, Rapid prototyping, User testing

What's the AI impact so far?

Field correction rate: < 5% (for one product)

STP rate: 52% → 70% (as of Feb 2026)

UW throughput: 480/mo across 4 risk UWs

the first business problem

Cases stall on missing or unverifiable data.

From discovery, we knew: if underwriters can see what the AI extracted but can't verify where it came from, they'll either send it back to the broker or check manually. Both burn time.

With the industry expecting a 24-hour turnaround per case, that compounds fast.

The issue wouldn't be accuracy. It would be transparency. Even if the AI was right, underwriters wouldn't act without seeing the reasoning or the source.

So: how might we close information gaps fast enough that cases keep moving, and make AI outputs transparent enough that underwriters progress instead of second-guessing?

Progressive disclosure testing

We initially shipped a chat input on dashboard insight cards. 5 contextual interviews later: nobody typed into it. When we watched, we saw two different behaviours. Dashboard: scanning fast, triaging 12+ cases, clicking into urgent ones. Case detail: slowing down, reading, checking, deciding. The outcome was two models: expandable insight cards on the dashboard, and an AI sidebar reserved for deep dives in the case view.


trust calibration for underwriters

As an underwriter, I want to know whether the AI found a value or inferred one, because those two states require completely different actions from me.

Under data review, each field in the case view carries one of three tags. Each tag tells the UW a different thing and asks for a different action.

• Extracted: the AI found the value in the document. A source tag links to the exact page and paragraph.
• Check source: the AI inferred the value. It wasn't explicitly stated; the AI mapped it from context.
• Risk flag: the AI is flagging a concern about the value. The value might be correct, but something needs attention.
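
As a sketch of how these three states could be modelled, they map naturally to a discriminated union, so the UI cannot render a field without handling each state explicitly. The names and shapes below (FieldTag, SourceRef, actionFor) are illustrative, not Aurora's actual schema.

```typescript
// Illustrative only; not Aurora's production schema.
// Each AI-populated field carries exactly one of the three tag states.

interface SourceRef {
  documentId: string;
  page: number;
  paragraph: number;
}

type FieldTag =
  | { kind: "extracted"; source: SourceRef }            // value found verbatim; links to exact location
  | { kind: "check-source"; inferredFrom: SourceRef[] } // value inferred from context; UW must verify
  | { kind: "risk-flag"; concern: string };             // value may be correct, but needs attention

interface CaseField {
  name: string;
  value: string;
  tag: FieldTag;
}

// The closed union forces a different prompt to the UW per state.
function actionFor(tag: FieldTag): string {
  switch (tag.kind) {
    case "extracted":
      return "Confirm via the linked source";
    case "check-source":
      return "Verify the inference against the documents";
    case "risk-flag":
      return `Review concern: ${tag.concern}`;
  }
}
```

Modelling the tag as a closed union rather than a free-text label is what makes "each tag asks for a different action" enforceable in interface code.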

the second challenge

Underwriters work across multiple surfaces to make a single risk decision.

Dashboard to triage. Case detail to investigate. Documents to verify. Each surface serves a different part of the decision, but if AI intelligence only showed up in one place and the underwriter wasn't on that surface at that moment, they would miss it.

The copilot's job: meet the underwriter where they already are with the right intelligence, at the right depth, for what they're doing right now.

The opportunity: How do we match AI intelligence to the underwriter's current mode so it helps, not hinders?

scanning mode

Ambient AI cards

As an underwriter, I want the AI to have already identified what needs my attention so that I'm making decisions, not doing triage again.

The dashboard tells the UW where to go: 4 needs review, 2 in referral, 3 awaiting broker. Market intelligence comes with a cited source and a link to the full analysis. No inputs: the UW scans, prioritises, clicks through.

We originally shipped a chat input on these cards. 5 contextual interviews revealed nobody used it. UWs are triaging at this stage, not investigating.
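
A minimal sketch of what read-only ambient cards might look like as data, assuming two card types (status counts and market intelligence); the interface names are hypothetical.

```typescript
// Hypothetical card models; both are deliberately read-only.

interface StatusCard {
  label: string; // e.g. "Needs review"
  count: number; // e.g. 4
  href: string;  // scanning mode gets a click-through, not an input
}

interface MarketIntelCard {
  summary: string;
  sourceCitation: string; // cited source, rendered on the card
  analysisHref: string;   // link to the full analysis
  // Note there is no input field: interviews showed UWs triage here, they don't type.
}
```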

investigating mode

Contextual intelligence

As an underwriter, I want risk signals surfaced in context, not buried in a separate page I have to navigate to.

The AI sidebar appears across all surfaces, but its content changes based on where the UW is and what they're looking at. It arrives pre-populated, never a blank canvas.
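
A minimal sketch of how surface-aware pre-population could be wired, assuming a simple mapping from the UW's current surface to the briefing the sidebar opens with; all names here are hypothetical.

```typescript
// Hypothetical sketch of surface-aware pre-population, not Aurora's implementation.

type Surface = "dashboard" | "case-detail" | "documents";

interface SidebarBriefing {
  heading: string;
  items: string[]; // pre-populated insights; the sidebar never opens empty
}

function briefingFor(surface: Surface, caseId?: string): SidebarBriefing {
  switch (surface) {
    case "dashboard":
      return { heading: "Portfolio triage", items: ["4 cases need review", "2 in referral"] };
    case "case-detail":
      return {
        heading: `Risk signals for ${caseId ?? "this case"}`,
        items: ["Rebuild cost 40% below median for this property type"],
      };
    case "documents":
      return { heading: "Source verification", items: ["2 inferred values still to check"] };
  }
}
```

The point of the mapping is that the sidebar's opening state is a function of where the UW already is, which is what keeps it a briefing rather than a blank chat.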

the invisible artefacts

Supporting all of this intelligence are FCA regulations, design constraints, and guidelines for AI.

Constraints
| Trade-off | Decision | Rationale |
| --- | --- | --- |
| Explanation depth vs. scan speed | Source tags by default, details on demand | Senior UWs want speed. Juniors need context. |
| Extraction coverage vs. confidence | "Not found" distinct from "low confidence" | Different user action: chase vs. verify. |
| AI guidance vs. liability | Rules-based, not generative | Can't have an LLM giving novel underwriting advice. |

FCA regulation: the AI can surface information but cannot give advice.

| AI can say | AI cannot say |
| --- | --- |
| "Rebuild cost is 40% below median for this property type" | "You should increase the rebuild cost" |
| "This vessel's trading area includes sanctioned waters" | "You should exclude this vessel" |
| "3 years declining net assets detected" | "This risk should be declined" |

Outcome

STP: 52% → 70%. The consolidated follow-up and pre-populated referral resolution reduced cases stalling on missing data.

3 risk UWs handle 480 quotes/month. Throughput held as volume scaled.

Quote rate: 75% → 81%. More cases reach quote-ready because broker-side completeness visibility reduced round-trip chasing.

The underwriter copilot (ambient and chat) is currently in pilot phase, being tested across the Aurora platform and used by 3 underwriters in their daily workflow.

The field correction rate is currently testing at < 5% (for one product).

What we're currently measuring (P1) and why

| Metric | Why it matters |
| --- | --- |
| Field correction rate (per field type) | Trust calibration working? |
| First-time completeness | Broker-side AI improving quality? |
| Time-to-decision (with vs. without insights) | Copilot saving UW time or creating doubt? |
| Preventable referral rate | Pipeline reducing unnecessary escalation? |
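
For concreteness, one plausible definition of the headline metric, assuming corrections are logged per AI-populated field; the event shape is hypothetical, not Aurora's telemetry.

```typescript
// Hypothetical metric computation: field correction rate per field type.

interface FieldEvent {
  fieldType: string;      // e.g. "rebuild_cost"
  aiPopulated: boolean;   // did the AI fill this field?
  correctedByUW: boolean; // did the underwriter change the AI's value?
}

// Correction rate = corrected AI-populated fields / all AI-populated fields.
function correctionRateByFieldType(events: FieldEvent[]): Map<string, number> {
  const populated = new Map<string, number>();
  const corrected = new Map<string, number>();
  for (const e of events) {
    if (!e.aiPopulated) continue;
    populated.set(e.fieldType, (populated.get(e.fieldType) ?? 0) + 1);
    if (e.correctedByUW) {
      corrected.set(e.fieldType, (corrected.get(e.fieldType) ?? 0) + 1);
    }
  }
  const rates = new Map<string, number>();
  for (const [type, n] of populated) {
    rates.set(type, (corrected.get(type) ?? 0) / n);
  }
  return rates;
}
```

Tracking the rate per field type, rather than one global number, lets a single badly-extracted field show up instead of being averaged away.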

What I learnt

  • AI confidence is not the same as user confidence. Source provenance builds trust faster than any model accuracy claim.

  • The interface patterns for showing "why" (progressive disclosure, source links, confidence indicators) were as important as the AI model accuracy itself.

  • Match AI to cognitive mode. Scanning needs ambient information and links, not inputs. Case investigation needs a pre-populated briefing.

  • Invisible correct answers are the worst failure mode. Multiple UWs missed amber insights in badges. In a regulated domain, that's worse than being wrong.
