2025
Contracted through Pathfindr to design and deliver a multi-agent AI advisory system for a major Australian insurance comparison platform (NDA) serving 4.5M+ users. I ran the discovery research, designed the experience framework and AI personality specification, defined eval criteria and guardrails, and led the squad from POC to production-ready. The engagement started as a proof of concept. The speed and quality of the work convinced the CEO to extend into full production.

The client was a major Australian insurance comparison platform serving 4.5M+ users. Its entire revenue model depended on phone-based advisers converting leads, but the industry was shifting to digital self-service and the funnel was breaking.
Five-person squad, all Pathfindr contractors: myself as product and AI design lead, a solution architect, two full-stack engineers, and a product designer. The engagement reported directly to the client's CEO -- this was a C-level AI transformation initiative, not a feature request.
I was originally hired for the proof of concept. We built the POC with a production mindset from day one -- the goal was to demonstrate capability that could be productised immediately, not a throwaway demo. The speed and quality of what we delivered convinced the CEO to extend into full production.

The client's data told a clear story. The phone-first model was breaking and the digital channel wasn't converting. All of these numbers came directly from the client's analytics.
I ran the discovery research to understand why: ~500 survey respondents, 6 in-depth user interviews, and analysis of 40+ recorded calls from the platform's top-performing human advisers. The research revealed that customers weren't leaving because of the content. They were leaving because the experience pushed them into a sales process before they were ready.
The platform's competitive edge was their human advisers -- high phone conversion because humans read customers, adapt tone, and build trust. The design question wasn't 'can we automate this?' It was 'what makes a human advisory conversation feel trustworthy, and which of those qualities can an AI system replicate authentically?'
I studied 40+ recorded calls, mapping questioning patterns, trust-building techniques, objection handling, and the adaptive behaviours that separated the best advisers from average ones. The top performers didn't follow a script. They assessed the customer's mindset within the first few exchanges and adapted everything -- pacing, information density, question sequencing -- to match.
I deliberately scoped the system to go deep rather than wide: health insurance only, for two target segments (young couples and singles). Expert-level depth earns trust. Shallow breadth across every vertical feels like another chatbot.

The architecture defines what the system can do. The experience design framework defines how it should behave, adapt, and know when to hand over to a human. I designed and presented this as a strategic framework to the client's CEO and CPO, bridging product design intent and technical implementation. This was the document that turned the POC into a production commitment.

Before writing any code, I defined the performance measurement framework with the client's CEO and CTO. Every design decision mapped to a measurable outcome: lead conversion rate (baseline 7.6%, target 12-17%), recommendation accuracy benchmarked against human adviser outcomes, conversation completion rate, handoff quality scoring, cost per conversation compared with phone advisers, and response latency (target under 2 seconds).
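To make the size of the conversion target concrete, here is a small illustrative sketch. It is not the project's code: the function names and status labels are hypothetical, and only the 7.6% baseline, the 12-17% target range, and the 2-second latency target come from the framework above.

```python
# Figures taken from the measurement framework described above.
BASELINE_PCT = 7.6
TARGET_LOW_PCT, TARGET_HIGH_PCT = 12.0, 17.0
LATENCY_TARGET_S = 2.0


def relative_uplift(baseline_pct: float, target_pct: float) -> float:
    """Relative change from baseline to target (0.5 means +50%)."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return (target_pct - baseline_pct) / baseline_pct


def conversion_status(observed_pct: float) -> str:
    """Classify an observed conversion rate against the target range.

    The labels are hypothetical, chosen only for this illustration.
    """
    if observed_pct < TARGET_LOW_PCT:
        return "below target"
    if observed_pct <= TARGET_HIGH_PCT:
        return "within target"
    return "above target"


def latency_ok(seconds: float) -> bool:
    """True when a response time is strictly under the latency target."""
    return seconds < LATENCY_TARGET_S


low = relative_uplift(BASELINE_PCT, TARGET_LOW_PCT)
high = relative_uplift(BASELINE_PCT, TARGET_HIGH_PCT)
print(f"Target implies a relative uplift of {low:.0%} to {high:.0%}")
```

Worked through, the 12-17% target is a relative uplift of roughly 58% to 124% over the 7.6% baseline, which is why the target was framed as a step change rather than an incremental gain.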
Insurance is a regulated product. An AI system making coverage recommendations carries real compliance risk. Guardrails were designed into the agent architecture from day one -- not added in Phase 4. The system was built to fail safe rather than fail silent.
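A minimal sketch of what "fail safe rather than fail silent" can look like in code, under stated assumptions. The check names, the handoff message, and the `fail_safe` function are all hypothetical and do not describe the production guardrails; the point is only that a failed or crashing check routes to a human instead of letting the draft through.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A check returns True when the drafted reply is acceptable (hypothetical contract).
Check = Callable[[str], bool]

# Hypothetical wording for the human handoff.
HANDOFF_TEXT = "Let me connect you with one of our advisers for this."


@dataclass(frozen=True)
class GuardedReply:
    text: str
    handed_off: bool
    reason: Optional[str] = None  # name of the check that blocked the draft


def fail_safe(draft: str, checks: Dict[str, Check]) -> GuardedReply:
    """Release the draft only if every check explicitly passes."""
    for name, check in checks.items():
        try:
            ok = check(draft)
        except Exception:
            # A check that crashes is treated as a failure, never as a pass.
            ok = False
        if not ok:
            return GuardedReply(HANDOFF_TEXT, handed_off=True, reason=name)
    return GuardedReply(draft, handed_off=False)


def broken_check(draft: str) -> bool:
    """Simulates a guardrail dependency that is unavailable."""
    raise RuntimeError("policy service unavailable")


# Illustrative rule only: block absolute promises about coverage.
example_checks: Dict[str, Check] = {
    "no_guarantees": lambda d: "guaranteed" not in d.lower(),
}

safe = fail_safe("This policy includes general dental.", example_checks)
blocked = fail_safe("You are guaranteed to be covered.", example_checks)
crashed = fail_safe("Any text.", {"policy_lookup": broken_check})
```

The design choice is that the default outcome is the handoff: the draft has to earn its way out, so an outage in a guardrail dependency degrades to a human conversation rather than an unchecked recommendation.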
Evals were defined before the system was built. I defined eval criteria and success thresholds during discovery and captured baselines during prototyping; the team then built an automated eval pipeline that ran on every deployment from Phase 2 onwards. The suite covered 200+ test conversations across customer segments, edge cases, and adversarial inputs. Model-graded evals (LLM-as-judge) scored conversation quality at scale and were calibrated against human grader agreement. The Customer Success team reviewed random samples weekly against the same quality rubric used for human advisers.
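One way to calibrate a model grader against human graders is to measure chance-corrected agreement on a shared sample. The sketch below uses Cohen's kappa as an assumed choice of statistic; the case study does not state which agreement measure the team used, and the labels are made-up illustration data, not project results.

```python
from typing import Sequence


def cohens_kappa(human: Sequence[int], judge: Sequence[int]) -> float:
    """Cohen's kappa for two raters giving binary pass (1) / fail (0) labels."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(human)
    # Observed agreement: share of conversations where both raters agree.
    observed = sum(h == j for h, j in zip(human, judge)) / n
    # Expected agreement if both raters labelled independently at their own rates.
    p_human = sum(human) / n
    p_judge = sum(judge) / n
    expected = p_human * p_judge + (1 - p_human) * (1 - p_judge)
    if expected == 1.0:
        return 1.0  # both raters gave one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels for eight conversations, for illustration only.
human_labels = [1, 1, 0, 1, 0, 0, 1, 1]
judge_labels = [1, 1, 0, 0, 0, 1, 1, 1]

kappa = cohens_kappa(human_labels, judge_labels)
print(f"kappa = {kappa:.2f}")
```

Raw agreement in this toy sample is 6 of 8, but kappa is lower because some of that agreement would occur by chance. Tracking a chance-corrected figure over time is one way to notice when a judge prompt drifts away from the human rubric.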
Evals and guardrails are not phases. They are baseline infrastructure that shapes every other decision. Define them first, build everything else around them.
The solution architect designed the multi-agent architecture on Microsoft Azure -- a network of specialised AI agents, each with a defined responsibility and clear interface contracts. My role was defining the experience layer that sat on top: what each agent should do, how they should behave, and when they should hand off. The architecture was designed for observability -- every agent interaction logged, every decision traceable.
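As a rough illustration of "clear interface contracts" plus traceability, the sketch below defines a shared agent interface and a runner that records every interaction. It is a simplified stand-in, not the Azure architecture: the agent names, the `AgentResult` shape, and the in-memory trace are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass(frozen=True)
class AgentResult:
    reply: str
    handoff_to: Optional[str] = None  # name of the next agent, or None to stop


class Agent(Protocol):
    """The contract every agent must satisfy (hypothetical)."""

    def handle(self, message: str) -> AgentResult: ...


class IntakeAgent:
    """Hypothetical agent that always routes to the recommender."""

    def handle(self, message: str) -> AgentResult:
        return AgentResult("Thanks, let me look at options.", handoff_to="recommender")


class RecommenderAgent:
    """Hypothetical agent that produces the final reply."""

    def handle(self, message: str) -> AgentResult:
        return AgentResult("Here are two policies that fit what you described.")


@dataclass
class TracedRunner:
    agents: Dict[str, Agent]
    trace: List[dict] = field(default_factory=list)

    def run(self, start: str, message: str, max_hops: int = 5) -> str:
        current = start
        for _ in range(max_hops):
            result = self.agents[current].handle(message)
            # Log every interaction so each decision can be traced afterwards.
            self.trace.append(
                {"agent": current, "input": message,
                 "reply": result.reply, "handoff_to": result.handoff_to}
            )
            if result.handoff_to is None:
                return result.reply
            current = result.handoff_to
        raise RuntimeError("handoff limit exceeded")


runner = TracedRunner({"intake": IntakeAgent(), "recommender": RecommenderAgent()})
final_reply = runner.run("intake", "We're a couple looking for hospital cover.")
```

After the run, `runner.trace` holds one entry per agent in the order they were invoked, which is the property the experience layer relies on when reviewing why a handoff happened.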

Design-led delivery with engineering one week behind design. I owned the experience specification -- every agent interaction, every conversation flow, every handoff threshold was designed and validated before engineering built it. Four phases, each with defined outputs and go/no-go criteria.


The technology was ready. The system was fully built, integrated, and passing evals. The harder problem was organisational readiness.
The original brief focused on the unauthenticated product funnel -- the 75% of visitors who left without engaging. But as the project progressed, the client's risk appetite shifted. The product moved to an authenticated experience, meaning the problems it was designed to solve changed. The system we built for anonymous visitors converting through a conversational flow was now being evaluated against a different use case.
The Customer Success team needed to champion the shift from phone-first to digital advisory, and that change management work needed to run in parallel with technical delivery from day one -- not after it. We built the technology before the organisation was ready to deploy it.

The system was fully built, integrated, and passing evals across 200+ test conversations. The POC-to-production transition was completed. However, the project has not yet launched to the platform's 4.5M users.
The shift from unauthenticated to authenticated experience, combined with the organisational readiness gap described above, means the production launch timeline is now with the client. The technology is ready. The defined targets -- lead conversion from 7.6% to 12-17%, recommendation accuracy benchmarked against human advisers, sub-2-second latency -- remain unmeasured against real traffic.
The system passed evals. It hasn't passed the market yet. Those are different things.
The most important design artifact in an AI product isn't the interface -- it's the experience specification that defines how the system should behave. The UI is the surface. The personality spec, handoff thresholds, and mindset definitions are what make the experience feel human. That's the design craft that's invisible to the user but makes everything work.
Depth beats breadth in AI advisory. A narrow, expert-level system earns trust that a shallow generalist never will. And defining metrics, evals, and guardrails before building anything else isn't just good practice -- it's the foundation that shapes every subsequent decision about architecture, agent design, and experience flow.
The POC was deliberately styled in pink -- a visual signal that this was not production-ready. It forced the production design phase to happen rather than allowing the client to ship the POC as-is. When we moved to production, the product designer aligned the interface with the client's brand system. The same functionality, the same experience framework, but now it looked like their product.

