Amazon Bedrock, 2026
Enabling enterprises to migrate models without losing trust, quality, or control

PROBLEM OVERVIEW
Amazon Bedrock is a managed GenAI platform used by enterprises running production AI systems in regulated, customer-facing environments. The central challenge is that foundation models evolve rapidly, but enterprise expectations for stability do not.
PROBLEM OVERVIEW
Amazon Bedrock is a managed GenAI platform used by enterprises running production AI systems in regulated, customer-facing environments. The central challenge is that foundation models evolve rapidly, but enterprise expectations for stability do not.
Problem Overview
Amazon Bedrock is a managed GenAI platform used by enterprises running production AI systems in regulated, customer-facing environments. The central challenge is that foundation models evolve rapidly, but enterprise expectations for stability do not.
Model upgrades change behavior in subtle ways. Prompts migrated across models or providers often underperform and require extensive re-optimization. For large organizations, "upgrade" becomes a major operational project, not a quick configuration change.
One customer spent 5400+ engineering hours on a single migration.
I was the lead UX designer responsible for the end-to-end model migration experience in Amazon Bedrock. My scope spanned prompt optimization, evaluation workflows, and the system that enables customers to safely change foundation models in production. That puts this work in a specific category: the operator layer of AI systems, not the agent itself — the interfaces that make autonomous behavior visible, controllable, and safe to act on.
The hardest decisions were not about UI polish. They were about judgment and sequencing. To ship a viable first release, my team agreed to defer centralized data management, schema flexibility, and expanded observability so we could validate the core migration lifecycle with real customers.
The result was not a one-click migration. It was a controlled, repeatable process that gave customers confidence to evaluate change, manage risk, and establish clear ownership over AI decisions.
This work treated model migration as a change-management problem, not an optimization exercise. The primary failure modes I designed against were silent behavior change, inefficient token utilization, and unexplainable shifts in output quality.
Model migration sits between system reliability and response quality. I designed for two primary personas.
One persona's primary focus was building reliable AI applications in production; the other's was getting the best possible responses from the model.
Shared Constraints
Both personas operate within model limits, balance cost and performance, and need validation before committing to a model change.
Model migration was designed as a decision lifecycle, not a workflow. The lifecycle consisted of three stages that progressively increased confidence while making cost and risk legible.

Users first provide their application's invocation log data (a record of each AI request and its result) for evaluation. The system then performs an Initial Evaluation, comparing the current model and the selected target model on quality and token utilization. This stage acts as a guardrail: a quick decision point on whether migration is worth pursuing at all.
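The go/no-go logic of the Initial Evaluation can be sketched as a simple gate. This is a hypothetical illustration, not Bedrock's actual API: the `EvalResult` type, the threshold names, and the `migration_gate` function are all invented for this example, under the assumption that the gate compares aggregate quality scores and per-request token usage.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    quality: float            # mean quality score over the evaluation set, 0..1
    tokens_per_request: float # average tokens consumed per invocation

def migration_gate(current: EvalResult, target: EvalResult,
                   max_quality_drop: float = 0.02,
                   max_token_increase: float = 0.10) -> bool:
    """Quick decision point: proceed with migration only if the target model
    holds quality within tolerance and does not inflate token usage past budget."""
    quality_ok = target.quality >= current.quality - max_quality_drop
    tokens_ok = (target.tokens_per_request
                 <= current.tokens_per_request * (1 + max_token_increase))
    return quality_ok and tokens_ok

# Target model matches quality and uses ~5% more tokens: worth moving forward.
print(migration_gate(EvalResult(0.84, 1200), EvalResult(0.83, 1260)))  # True
```

The point of the gate is legibility: both tolerances are explicit numbers a team can argue about, rather than a judgment buried in someone's spreadsheet.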
Users can provide prompt templates to the system in several ways: they can use our prompt optimization, or skip it and move directly to shadow testing.
The goal of this stage is to improve relative quality and cost efficiency before deeper investment. Optimization is a lever, not a requirement, and should not block teams that want to evaluate with their own methods.
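"A lever, not a requirement" can be expressed as an optional pipeline stage. This is a sketch only; `prepare_templates` and the `optimize` callback are hypothetical names, not part of any Bedrock API:

```python
from typing import Callable, List, Optional

def prepare_templates(templates: List[str],
                      optimize: Optional[Callable[[str], str]] = None) -> List[str]:
    """Prompt optimization is optional: apply the supplied strategy if one is
    given, otherwise pass the customer's own templates straight through to
    shadow testing, unblocked."""
    if optimize is None:
        return list(templates)
    return [optimize(t) for t in templates]

# A team that trusts its own prompts skips optimization entirely:
print(prepare_templates(["Summarize: {doc}"]))
# Another team opts in with an optimizer of its choosing (str.strip as a stand-in):
print(prepare_templates(["  Summarize: {doc}  "], optimize=str.strip))
```

Making the stage skippable is the design choice the text describes: teams with their own evaluation methods are never forced through the platform's optimizer.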
After providing optimized prompt templates and test data, users configure a traffic sample and duration to begin shadow testing. Shadow testing produces production-like evidence without production impact.
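The traffic-sampling half of that configuration can be sketched as deterministic, hash-based selection. This is an assumed mechanism for illustration, not how Bedrock actually routes traffic; `in_shadow_sample` is a hypothetical function:

```python
import hashlib

def in_shadow_sample(request_id: str, sample_rate: float) -> bool:
    """Decide whether a live request is mirrored to the shadow (target) model.
    Hashing the request id into [0, 1) and comparing against the configured
    rate gives a fixed fraction of traffic, and the same request always gets
    the same decision, so shadow results are reproducible."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 10_000
    return bucket / 10_000 < sample_rate

# Mirror roughly 10% of traffic to the candidate model during the test window.
sampled = [rid for rid in ("req-1", "req-2", "req-3") if in_shadow_sample(rid, 0.10)]
```

Because the shadow copy never serves the user-facing response, the sample rate trades evaluation coverage against duplicated inference cost, not against production risk.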
Shipping a trustworthy lifecycle required deliberate scope cuts. Centralized data management, schema flexibility, and expanded observability were intentionally deprioritized to validate the migration lifecycle itself with real customers.
This work transformed model migration from a fragile, manual effort into a controlled, repeatable process suitable for production GenAI systems.
Data management
Observability
Governance and policy controls
Standardized baselines and comparisons
Enterprise AI succeeds when change is visible, risk is bounded, and accountability is unavoidable.