
Designing an AI agent for Bupa's frontline customer service consultants

Scope of work

GenAI

Discovery and research

User experience (UX)

User interface (UI)

User Testing

Facilitation and workshops

Company

Bupa

Year

2025

My role

Co-lead designer (1 of 2)

Too much searching. Not enough serving.

When customers ask their health insurer, “Am I covered?”, the answer is rarely simple.

Across Bupa’s frontline ecosystem, answering a single product enquiry required navigating multiple internal systems. Consultants were searching through:

  • 9,000+ insurance policy documents outlining what customers are covered for and what they’re not.

  • 1.8 million individual coverage rules tied to specific treatments, item numbers, claim limits, and conditions.

 

On average, this took nearly 8 minutes per enquiry, with 8–20% of interaction time spent in “dead air.” Small inefficiencies were compounding into a significant experience and operational problem.

 

For this initiative, we focused on messaging consultants within the digital service team, a high-volume, lower-risk environment to test and learn potential solutions before scaling to voice and other frontline channels.

The challenge

How might we reduce consultant friction in navigating complex policy information so we can ensure our customers receive clear, accurate, and compliant answers?

We needed a solution that could:

  • Reduce system switching

  • Improve answer accuracy

  • Maintain compliance and traceability

  • Meaningfully reduce handling time

 

This became the foundation of our design approach and strategy.

00 AI Agent _ Shot 1 customer.png

Building Bupa's first generative AI initiative

This project emerged from a Microsoft x Bupa hackathon during the 2023 AI surge.

 

As co-lead designer, I worked end-to-end across the initiative. I was responsible for:

  • Framing and scoping the opportunity, leading consultant research and workflow mapping

  • Identifying and prioritising the highest-impact use case grounded in business and user insight

  • Designing the proof of concept, MVP, and pilot experience (from wireframes to delivery-ready designs)

  • Driving structured test-and-learn cycles in close partnership with engineering to refine AI behaviour and guardrails

  • Preparing the solution for operational rollout and scale across service channels

01  AI Agent _ Phase 1.png

Phase 1: Framing the right problem

This project began when customer operations approached our innovation team with a clear pain point: policy enquiries had become cognitively heavy and fragmented across systems.

At the same time, our innovation team was actively exploring emerging generative AI capabilities in early 2023. We were experimenting to understand its enterprise readiness, risks, and practical boundaries.

Rather than assuming AI was the answer, we asked whether emerging AI capabilities could support consultant workflows without introducing compliance risk.

We explored the opportunity through three lenses:

  • Frontline consultants: How do consultants actually navigate systems during live enquiries? Where is friction the highest?

  • Business impact: How and where would efficiency gains meaningfully reduce handle time and improve NPS at scale?

  • Technology feasibility and fit: Could emerging generative AI retrieve policy information accurately and safely within regulatory guardrails?

We defined four foundational assumptions to test:

  • Relevance: Can the AI interpret consultant intent accurately?

  • Accuracy: Can it retrieve correct policy sources?

  • Speed: Can it meaningfully reduce handling time?

  • Safety: Can it operate within compliance guardrails?

Human-in-the-loop by strategic design

Through discovery, it became clear that deploying generative AI directly to customers would introduce risk in a highly regulated environment.

Rather than launching externally, we explored the capability within consultant workflows first. This created a safeguarded space to test, refine, and build trust.

Consultants remained in the loop to review and validate responses before communicating with customers, allowing us to improve the system responsibly while protecting compliance and accuracy.

By strengthening consultant experience first, we created a controlled path to improve customer experience.

01  AI Agent _ Phase 2.png

Phase 2: Building the first proof

Rather than debating AI in theory, we brought it to life and experimented.

Using publicly available insurance policy documents, we created a working prototype through a Microsoft hackathon environment with OpenAI integration.

The proof of concept let us test our four foundational assumptions in practice, and validated that generative AI could retrieve relevant, source-linked answers when content was carefully structured and scoped.

More importantly, it also revealed where the AI agent broke down, particularly around context interpretation and ambiguity. This gave us a clear direction for refinement in the next phase.

Phase 3: Iterative test, learn, refine

We ran structured test-and-learn cycles and co-design sessions with messaging consultants and engineers to strengthen the AI’s reliability in real-world conditions.

Through keyword-mapping and semantic workshops, we captured real conversation patterns and mapped related policy sections to help the AI agent interpret intent, not just keywords. Because the underlying model has no inherent Bupa context, this phase required deliberate knowledge and systems design.
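The output of those keyword-mapping workshops can be pictured as a simple normalisation layer. The sketch below is illustrative only — the phrases and canonical terms are hypothetical examples, not Bupa's actual mapping — but it shows the idea of translating the synonymous language consultants hear into the canonical terms policy documents use, before any retrieval happens.

```python
import re

# Hypothetical synonym map: phrases heard in real conversations on the
# left, canonical policy terms on the right. All entries are illustrative.
SYNONYM_MAP = {
    "knee op": "knee surgery",
    "knee replacement": "knee surgery",
    "physio": "physiotherapy",
    "ambo": "ambulance cover",
}

def normalise_query(query: str) -> str:
    """Replace known synonymous phrases with their canonical policy term."""
    result = query.lower()
    for phrase, canonical in SYNONYM_MAP.items():
        # Word boundaries stop "physio" from rewriting "physiotherapy".
        result = re.sub(rf"\b{re.escape(phrase)}\b", canonical, result)
    return result
```

Normalising before retrieval means the downstream search only ever has to match one vocabulary, which is one reason the workshops focused on capturing synonyms rather than adding more documents.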

In parallel, we also partnered closely with engineering to refine prompt strategy, establish guardrails for scope control, and structure content semantically to improve contextual accuracy.

 

This was the critical step that transformed a promising proof of concept into a reliable MVP consultants could trust and use.
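To make the guardrail idea concrete: a minimal sketch, assuming a simple pre-routing check and an instruction-level prompt. The topic list, prompt wording, and function names are my own illustrative assumptions, not the production configuration.

```python
# Illustrative scope guardrail: screen a query before it reaches the model,
# and constrain the model's behaviour through its system prompt.

IN_SCOPE_TOPICS = ("cover", "policy", "claim", "benefit", "item number", "waiting period")

SYSTEM_PROMPT = (
    "You answer health insurance policy questions ONLY from the retrieved "
    "documents. Always cite the source document. If the answer is not in "
    "the documents, say so and suggest the consultant escalate."
)

def route_query(query: str) -> str:
    """Return 'agent' for in-scope policy queries, 'fallback' for anything else."""
    q = query.lower()
    return "agent" if any(topic in q for topic in IN_SCOPE_TOPICS) else "fallback"
```

Splitting guardrails across two layers (a cheap deterministic check plus prompt-level instructions) is a common pattern because neither layer is reliable enough on its own.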


00  AI Agent _ Phase 3.png
01  AI Agent _ Phase 4.png

Phase 4: Piloting and designing for scale

We launched a live pilot across 11 messaging teams with 112 consultants.

Rather than treating the pilot as a static rollout, we designed it as an active learning system. Each team nominated an AI Champion to report feedback and suggest improvements.

Over two months, we shipped 7 iterative releases, handled 6,000+ live enquiries, and collected 427 feedback entries addressing:

  • Ambiguous responses

  • Out-of-scope edge cases

  • Suggestions to expand the knowledge base beyond insurance policy information to include individual coverage rules for specific treatments, claim limits, and item numbers.

By the end of the pilot, the AI agent was no longer an experiment; it was an operational capability ready for scale.

Meet the AI agent: A knowledge assistant designed for frontline consultants

01  AI Agent _ Mock up1.png

Intent-based knowledge assistant

Consultants can enter specific natural language queries (e.g., “Is knee surgery covered under Silver Hospital?”). 

The trained AI translates this prompt into structured searches across its database of policy documents, claim eligibility, and benefit rules.
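A minimal sketch of that query-to-answer flow, under stated assumptions: the index entries, scoring, and field names below are invented for illustration. The real system would use semantic retrieval over the full policy and coverage-rule stores; here naive word overlap stands in for it, and the key point is that every answer travels with its source.

```python
# Hypothetical mini-index of policy snippets; sources are illustrative.
POLICY_INDEX = [
    {"source": "silver-hospital.pdf",
     "text": "knee surgery covered after 12 month waiting period"},
    {"source": "bronze-extras.pdf",
     "text": "general dental covered up to annual limit"},
]

def answer(query: str) -> dict:
    """Rank indexed snippets by word overlap; return the best with its source."""
    words = set(query.lower().split())
    best = max(POLICY_INDEX,
               key=lambda doc: len(words & set(doc["text"].split())))
    return {"answer": best["text"], "source": best["source"]}
```

Returning the source alongside the answer is what lets the consultant verify the response before it ever reaches a customer, which is the human-in-the-loop safeguard described earlier.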

01  AI Agent _ Mock up2.png

Structured, source-linked responses

When the AI agent responds, it includes clearly formatted coverage details and direct links to the official policy sources, which consultants can open on the same screen.

01  AI Agent _ Mock up3.png

Embedded feedback and case tracking

We integrated feedback loops directly into the tool and ensured traceability through optional Case ID tracking via Sprinklr (the messaging platform).

This enabled the team to detect edge cases, refine prompts and deliver continuous improvement from live usage.
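The shape of such a feedback record might look like the sketch below. The field names are assumptions for illustration, not the actual Sprinklr integration schema; the point is that pairing a thumbs-down with an optional Case ID is what makes an edge case traceable back to a real conversation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FeedbackEntry:
    query: str                     # the consultant's original question
    helpful: bool                  # thumbs up / thumbs down
    comment: str = ""              # free-text detail, if any
    case_id: Optional[str] = None  # optional Sprinklr Case ID for traceability

def needs_review(entry: FeedbackEntry) -> bool:
    """Flag unhelpful responses that carry a traceable case for triage."""
    return (not entry.helpful) and entry.case_id is not None
```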

AI Agent _ Wide shot mockups.png

Following pilot success, the initiative has now transitioned to Bupa’s Customer Business team and AI Factory for enterprise scale.

99.7%

pilot accuracy

Based on a benchmark provided by compliance, the AI agent achieved high accuracy across tested product enquiries.

80

consultant NPS

We surveyed approximately 140 messaging consultants to understand their current experience with the AI agent.

130

seconds saved per enquiry

Contributed to a 12% reduction in handling time.

$1.02M

projected annual operational benefit

Supporting an estimated 127,000 policy enquiries and 277,000 treatment-level coverage rules per year.

Impact from Pilot to Scale

What I learned from this project

Progress over perfection. Test, learn, refine.

The reliability of generative AI must be thoughtfully designed, not assumed.

Generative AI is not a plug-and-play solution. It requires thoughtful data structuring, semantic mapping, and contextual guardrails. 

Experimentation accelerates clarity.

Early prototypes and live testing exposed constraints faster than planning ever could. Progress came from building, measuring, and refining.

Feedback fuels adoption

Embedding feedback loops directly into the product strengthened trust, improved accuracy, and enabled continuous iteration alongside real consultant workflows.

This case study provides a brief overview of the project.

For a detailed end-to-end walkthrough, feel free to get in touch.

Other work

00 RX Home Page.png

Transforming digital referral journeys at Bupa Dental

Optimising journeys to deliver $118k in new patient revenue 

Read More
00 MQ Fees.png

Simplifying course fee discovery at Macquarie University

Designing a 3-step calculator to replace complex PDFs

Read More
1.png

Design leadership in practice

Creating spaces, tools, and rituals that help design thinking spread beyond the design team.

Read More
  • LinkedIn
  • Medium

Let's make something meaningful together

Made in sunny Radelaide!

© 2026 by Denise Tan
