Introduction
Multi-agent chatbots in large retail organizations face a critical architecture challenge: how to evolve from a modular monolith to a microservices architecture that enables agent reuse across teams and use cases.
This post shares lessons from Microsoft’s Industry Solutions Engineering (ISE) partnership with a retail customer addressing this challenge. Their production chatbot used a router pattern as a modular monolith—multiple specialized agents organized as modules within a standalone application. Each query was routed to exactly one agent based on intent detection—no multi-agent orchestration or response synthesis. The engagement focused on redesigning the architecture: transitioning to a microservices model where domain agents become independent services that coordinators can orchestrate together, enabling multiple agents to collaborate on complex requests.
While the customer is anonymized, the architectural challenges, technical decisions, and performance trade-offs documented here reflect real production constraints.
1. The Scale Challenge
The customer’s original production chatbot followed a deterministic router pattern as a modular monolith—a standalone application with well-defined agent modules. A central orchestrator detected user intent and routed each request to a single specialized agent.

Router pattern as modular monolith: fast, predictable, but not designed for reuse.
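To make the router pattern concrete, here is a minimal sketch of single-agent routing inside one application. The agent functions, the `AGENTS` table, and the keyword-based `detect_intent` are all hypothetical stand-ins: the post only says that intent detection selected exactly one agent per query, not how it was implemented.

```python
from typing import Callable, Dict


# Hypothetical in-process agent modules; names are illustrative only.
def product_agent(query: str) -> str:
    return f"[product] results for: {query}"


def order_agent(query: str) -> str:
    return f"[order] status for: {query}"


def fallback_agent(query: str) -> str:
    return "Sorry, I can't help with that."


AGENTS: Dict[str, Callable[[str], str]] = {
    "product": product_agent,
    "order": order_agent,
}


def detect_intent(query: str) -> str:
    """Stand-in for the intent-detection step (assumed keyword match here)."""
    text = query.lower()
    if "order" in text:
        return "order"
    if "product" in text or "price" in text:
        return "product"
    return "unknown"


def route(query: str) -> str:
    """Send each query to exactly one agent; nothing is combined across agents."""
    intent = detect_intent(query)
    agent = AGENTS.get(intent, fallback_agent)
    return agent(query)
```

Because every agent lives in the same `AGENTS` table inside one process, another system that wants the order capability has to import or copy this application, which is the reuse limitation described next.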
This architecture worked well for a single chatbot, but organizational ambitions grew: adding new AI behaviors across teams, supporting multiple client experiences, and enabling agent reuse across the company. The modular monolith created bottlenecks:
Architecture Limitations:
- No cross-system reuse: Agents tightly coupled to the chatbot application; other systems duplicated capabilities
- Unclear ownership boundaries: All agents maintained by a single team within one codebase, preventing domain expert ownership
- Integration rigidity: New client experiences required forking or modifying the modular monolith rather than composing existing agents
Organizational Friction:
- Development bottlenecks: Centralized team couldn’t scale with cross-functional demand for new AI behaviors
- Inconsistent practices: Different teams building similar capabilities in varied ways without shared patterns
- Slow time-to-production: Each new capability required custom development instead of reusing proven agents
Design Goals and Solutions
The ISE partnership focused on three core architectural goals:
- Enable agent reuse across systems → Coordinator Pattern with microservices-based domain agents
- Simplify integration for multiple consumers → Gateway layer for cross-cutting concerns (auth, observability)
- Scale evaluation with architecture complexity → Adapting existing evaluation frameworks to handle multi-agent orchestration
Key requirements: framework independence, use of existing SDKs, offloading of cross-cutting concerns (auth, observability, tracing), clear security boundaries, a microservices deployment model, and minimal latency overhead.
Expected outcomes: Standardized multi-agent architecture, platform-agnostic evaluation, accelerated delivery of new AI capabilities, and clear ownership models for domain teams.
2. Coordinator-Based Multi-agent Orchestration
The architectural transition to independent microservices required an orchestration layer. The coordinator pattern introduces structure and control without sacrificing flexibility. Teams could build reusable domain agents focused on specific business capabilities (product catalog search, inventory availability, order management, customer profiles, knowledge retrieval) and compose them through coordinators to enable multi-agent orchestration. The coordinator also replaces the separate intent-detection LLM from the original router pattern, consolidating orchestration and intent analysis into a single component.
Architectural Patterns Considered
Three architectural patterns were evaluated for orchestrating agents in a microservices architecture:
Router Pattern: Extends the original router pattern with remote agents, keeping deterministic intent-based routing to a single specialized agent per request. It enables migration from modular monolith to microservices but is limited to one agent per query, with no multi-agent orchestration or response synthesis.
Group Chat Pattern: Multiple agents collaborate through a shared conversation thread managed by a Group Chat Manager. The manager controls agent selection (who speaks next), termination (when to end), and human involvement. Enables transparency and iterative refinement, but introduces conversation management complexity, performance concerns, and tight coupling at the context level as the number of participating agents grows.
Coordinator Pattern (chosen): Specialized coordinators orchestrate multiple independent domain agents per request, enabling sequential or parallel invocation with response synthesis.

Coordinator pattern: smart orchestration of multiple independent domain agents with synthesized response.
The Coordinator is responsible for:
- Understanding client context: Interpreting user intent within the broader conversation
- Planning orchestration: Determining which domain agents to invoke and in what order
- Managing remote calls: Handling invocations to microservices-based agents
- Response synthesis: Combining outputs from multiple agents into coherent, contextual answers
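The four responsibilities above can be sketched as a plan, invoke, synthesize loop. This is an illustrative outline only, not the customer's implementation: the two agents, the keyword-based `plan` function, and the string-joining `synthesize` are assumptions standing in for LLM-driven planning, remote calls, and LLM-based synthesis.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

AgentFn = Callable[[str], Awaitable[str]]


# Hypothetical remote domain agents; real ones would be network calls.
async def catalog_agent(query: str) -> str:
    await asyncio.sleep(0.01)  # simulate remote latency
    return "3 matching products"


async def inventory_agent(query: str) -> str:
    await asyncio.sleep(0.01)  # simulate remote latency
    return "2 in stock nearby"


DOMAIN_AGENTS: Dict[str, AgentFn] = {
    "catalog": catalog_agent,
    "inventory": inventory_agent,
}


def plan(query: str) -> List[str]:
    """Stand-in for planning: decide which domain agents to invoke."""
    text = query.lower()
    selected: List[str] = []
    if "product" in text or "find" in text:
        selected.append("catalog")
    if "stock" in text or "available" in text:
        selected.append("inventory")
    return selected


def synthesize(outputs: Dict[str, str]) -> str:
    """Stand-in for response synthesis: merge agent outputs into one answer."""
    if not outputs:
        return "No domain agent could handle this request."
    return "; ".join(f"{name}: {text}" for name, text in outputs.items())


async def coordinate(query: str) -> str:
    selected = plan(query)
    # Invoke the selected agents in parallel; gather preserves input order.
    results = await asyncio.gather(*(DOMAIN_AGENTS[n](query) for n in selected))
    return synthesize(dict(zip(selected, results)))
```

Calling `asyncio.run(coordinate("Find this product and check if it is in stock"))` invokes both agents concurrently and returns one combined answer. A sequential variant would await each agent in turn, which is needed when one agent's output feeds the next.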
Trade-offs acknowledged:
- More complex than router pattern
- Increased latency from remote agent invocation and orchestration overhead
- Less iterative collaboration than group chat pattern
Framework Selection
The ISE partnership explored three frameworks for implementing the coordinator pattern:
- Microsoft Agent Framework (C#): Native multi-agent orchestration framework
- Semantic Kernel (C#): Extensible SDK with agent capabilities and enterprise integration
- LangGraph (Python): Graph-based agentic workflow framework
Two teams within the customer organization independently selected coordinator frameworks based on language preferences and familiarity:
Team 1 – LangGraph Coordinator: Built their coordinator using LangGraph. This coordinator orchestrates remote domain agents, including an enterprise knowledge agent that uses RAG to search the corporate knowledge base.
Team 2 – Semantic Kernel Coordinator: Built their coordinator using Semantic Kernel. Their existing AI chatbot was already built on Semantic Kernel, making it the natural choice. Familiarity with SK’s patterns and APIs accelerated development and reduced onboarding friction.
The Microsoft Agent Framework was evaluated but not selected by either team during this engagement—it was still in preview, and both teams preferred frameworks they already knew. The ISE partnership supported exploration across all three frameworks while focusing deeper collaboration on the Semantic Kernel coordinator based on team readiness, existing investment, and strategic alignment. The engagement ended before comparative benchmarking across frameworks could take place, so claims about relative framework performance remain untested.
Framework Independence in Practice

Coordinator framework choice is independent of domain agent implementation technology.
Domain agents:
- Are deployed as independent microservices and owned by domain teams
- Expose well-defined interfaces through standard protocols
- Focus purely on domain logic (no UX coupling)
- Can be reused across multiple coordinators and experiences
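One way to picture this independence is a shared contract that any coordinator can call, regardless of the framework behind it. The sketch below is an assumption for illustration: the post does not name the protocol or message shape used, so `AgentRequest`, `AgentResponse`, and `DomainAgent` are hypothetical, and the in-process classes stand in for remote services.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class AgentRequest:
    query: str
    conversation_id: str


@dataclass(frozen=True)
class AgentResponse:
    agent: str
    content: str


class DomainAgent(Protocol):
    """Hypothetical contract: the only thing a coordinator needs to know."""

    name: str

    def invoke(self, request: AgentRequest) -> AgentResponse: ...


class InventoryAgent:
    name = "inventory"

    def invoke(self, request: AgentRequest) -> AgentResponse:
        return AgentResponse(self.name, f"stock info for '{request.query}'")


class KnowledgeAgent:
    name = "knowledge"

    def invoke(self, request: AgentRequest) -> AgentResponse:
        return AgentResponse(self.name, f"policy answer for '{request.query}'")


# Two coordinators with different internals reuse the same agents unchanged.
def coordinator_a(agents: List[DomainAgent], request: AgentRequest) -> List[str]:
    return [agent.invoke(request).content for agent in agents]


def coordinator_b(agents: List[DomainAgent], request: AgentRequest) -> Dict[str, str]:
    return {agent.name: agent.invoke(request).content for agent in agents}
```

Neither coordinator imports anything specific to an agent's implementation, and the agents carry no presentation logic, which is what lets a domain team change an agent's internals without touching its consumers.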
A Note on Cross-Cutting Concerns
When multiple coordinators reuse the same domain agents, cross-cutting concerns like authentication, authorization, and observability need a centralized management point. The team introduced a gateway layer (using Azure API Management, APIM) between coordinators and domain agents to handle credential validation, identity provider abstraction, and request tracing. This layering kept domain agents decoupled from frontend authentication mechanisms, reinforcing their reusability across systems.
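The gateway's role can be illustrated with a small wrapper that runs before any domain agent. This is a conceptual sketch in plain Python, not gateway policy configuration: the token table, the header names (`x-caller-id`, `x-trace-id`), and the `with_gateway` helper are all hypothetical.

```python
import uuid
from typing import Callable, Dict

Headers = Dict[str, str]
Agent = Callable[[str, Headers], str]

# Hypothetical credential table; a real gateway would call an identity provider.
VALID_TOKENS: Dict[str, str] = {"token-coordinator-a": "coordinator-a"}


class Unauthorized(Exception):
    """Raised when the caller's credential is missing or unknown."""


def with_gateway(agent: Agent) -> Agent:
    """Wrap a domain agent so that auth and tracing happen before it runs."""

    def handle(query: str, headers: Headers) -> str:
        token = headers.get("Authorization", "").removeprefix("Bearer ")
        caller = VALID_TOKENS.get(token)
        if caller is None:
            raise Unauthorized("missing or unknown credential")
        # Forward only a resolved caller identity and a trace ID, never the raw token.
        forwarded = {
            "x-caller-id": caller,
            "x-trace-id": headers.get("x-trace-id") or uuid.uuid4().hex,
        }
        return agent(query, forwarded)

    return handle


def order_agent(query: str, headers: Headers) -> str:
    """Hypothetical domain agent: sees caller ID and trace ID, no credentials."""
    return f"order lookup for {headers['x-caller-id']} (trace {headers['x-trace-id']})"


guarded_order_agent = with_gateway(order_agent)
```

The agent function never inspects the `Authorization` header, so swapping the identity provider or the frontend's login mechanism changes only the gateway.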
3. Performance Reality: No Free Lunch
The coordinator pattern’s architectural benefits (reusability, independent deployment, cross-system composition) come at a performance cost. The team anticipated some latency overhead from the coordinator architecture. Although the coordinator consolidates the original router’s separate intent-detection LLM into its orchestration logic, initial smoke testing still revealed notable overhead, even for simple cases involving just two agents. Identifying unsupported cases such as adversarial queries, abuse, off-topic requests, and non-language input added further overhead.
Testing context: These initial tests established a baseline using the original agent prompts transferred directly into the coordinator and domain agents extracted from the modular monolith without optimization. The team measured the architectural overhead with minimal refactoring to isolate the microservices transition’s performance impact from prompt engineering improvements.
What drives the overhead:
- Coordinator planning: intent analysis and agent selection logic
- Multi-agent invocation: sequential or parallel calls to remote domain agents
- Response synthesis: collecting and combining outputs across agents
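To see which of these three phases dominates, each one can be timed separately. The helper below is a minimal sketch under that assumption; `PhaseTimer` is a hypothetical name, and the `time.sleep` calls only stand in for real planning, invocation, and synthesis work, so the resulting numbers are not measurements from the engagement.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class PhaseTimer:
    """Accumulates wall-clock time per named phase of one request."""

    def __init__(self) -> None:
        self.durations: Dict[str, float] = {}

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def share(self) -> Dict[str, float]:
        """Fraction of total recorded time spent in each phase."""
        total = sum(self.durations.values())
        if total == 0:
            return {}
        return {name: d / total for name, d in self.durations.items()}


timer = PhaseTimer()
with timer.phase("planning"):
    time.sleep(0.01)  # placeholder for intent analysis and agent selection
with timer.phase("invocation"):
    time.sleep(0.02)  # placeholder for remote domain agent calls
with timer.phase("synthesis"):
    time.sleep(0.01)  # placeholder for combining agent outputs
```

Inspecting `timer.share()` per request shows whether optimization effort belongs in the coordinator's own LLM calls or in the remote agents.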
More formal benchmarking with optimized prompts and refined agent implementations is ongoing to establish realistic performance baselines.
Mitigation strategies under exploration:
The team is exploring multiple approaches to address latency, ordered by potential impact relative to implementation effort:
- Prompt optimization: Refining coordinator and agent prompts to reduce token counts and improve efficiency
- Model selection per task: Using smaller, faster models for simpler tasks like intent detection and reserving more capable models for complex operations like response synthesis
- Caching strategies: Reusing recent domain agent responses for frequently asked questions
- Selective orchestration: Letting simpler queries bypass full multi-agent orchestration when appropriate
- Streaming responses: Improving perceived responsiveness by showing partial results immediately, though total processing time remains unchanged
Work is ongoing, and no single solution eliminates the architectural overhead.
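As an illustration of the caching strategy listed above, the sketch below reuses a domain agent's recent answer for a repeated question within a time window. It is one possible shape, not the team's design: `TTLCache`, the whitespace and case normalization of the key, and the 60-second window in the usage note are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Caches domain agent responses per (agent, normalized query) for a fixed time."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def get_or_call(self, agent_name: str, query: str, call: Callable[[str], str]) -> str:
        # Normalize case and whitespace so trivially different queries share an entry.
        key = (agent_name, " ".join(query.lower().split()))
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: skip the remote call
        value = call(query)
        self._entries[key] = (now, value)
        return value
```

A coordinator could wrap a remote call as `cache.get_or_call("knowledge", query, knowledge_agent)` with, say, `TTLCache(ttl_seconds=60)`. Exact-match caching like this only helps with literally repeated questions and must not be applied to user-specific answers such as order status unless the key includes the user.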
Key Lessons Learned
- Deterministic routing didn’t scale across the organization—flexibility required orchestration
- Coordinator patterns enable reuse but introduce real latency costs
- Planning and orchestration—not just remote invocation—contributed significantly to overhead
- Both teams prioritized architectural independence between coordinators and domain agents over framework selection, choosing frameworks based on familiarity rather than benchmarked performance
- Measure early; performance assumptions are often wrong
- The architectural overhead is real—benefits must justify the cost
Closing Thoughts
Coordinator-based multi-agent systems are powerful—but not free.
In large retail environments, the trade-off is often worth it: sacrificing raw speed to gain reuse, ownership clarity, and organizational scalability. The critical skill is not choosing the right pattern, but understanding and managing the trade-offs intentionally.
