gen ai - UX Case Study 

Model Migration as a Lifecycle Problem

Amazon Bedrock, 2026
Enabling enterprises to migrate models without losing trust, quality, or control

PROBLEM OVERVIEW

Amazon Bedrock is a managed GenAI platform used by enterprises running production AI systems in regulated, customer-facing environments. The central challenge is that foundation models evolve rapidly, but enterprise expectations for stability do not.

Model upgrades change behavior in subtle ways. Prompts migrated across models or providers often underperform and require extensive re-optimization. For large organizations, an “upgrade” becomes a major operational project, not a quick configuration change.

One customer spent 5,400+ engineering hours on a single migration.

Executive summary

Enterprises upgrading AI models in production had no structured process. One customer spent 5,400 engineering hours on a single migration. I designed a three-gate control plane that consolidated four fragmented engineering practices into a repeatable one-to-two-week lifecycle. The system surfaces analysis at every stage. Engineers choose the model. Nothing reaches production without an explicit human decision.

Design thesis and principles

DESIGN THESIS

AI systems and agents are only trustworthy when change is explicit.

This work treated model migration as a change-management problem, not an optimization exercise. The primary failure modes designed against were silent behavior change, inefficient token utilization, and unexplainable shifts in output quality.

PRINCIPLES THAT SHAPED EVERY DECISION 

  • Show what changed
  • Safety is the default
  • System provides the evidence
  • Humans decide the next step

Who I designed for

Model migration sits between system reliability and response quality. I designed for two primary personas.

BEDROCK DEVELOPER

Primary focus
Building reliable AI applications in production.

Responsibilities

  • Application performance under real user inputs
  • Token usage and cost management
  • Handling large documents and context limits
  • Stability during upgrades

Needs

  • Clear performance and cost comparisons
  • Confidence that behavior will not regress
  • Safe rollout paths to production

PROMPT ENGINEER

Primary focus
Getting the best possible responses from the model.

Responsibilities

  • Prompt design and iteration
  • Quality evaluation across models
  • Working within token and instruction limits
  • Balancing detail with response flexibility

Needs

  • Side-by-side output comparisons
  • Fast testing workflows
  • Evidence that quality improves, not degrades

Shared Constraints
Both personas operate within model limits, balance cost and performance, and need validation before committing to a model change.

The system, not the screens

Model migration was designed as a decision lifecycle, not a workflow. The lifecycle consisted of three stages that progressively increased confidence while making cost and risk legible.

GUARDRAILS DESIGNED INTO EVERY STAGE

  • Users can exit without irreversible impact
  • Dynamic and integrated cost calculation
  • If an error or failure occurs, stages can be retried
  • Mandatory evaluation acts as a guardrail against premature production exposure
  • The process keeps responsibility explicit: the system provides evidence, the user decides
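
One way to read these guardrails is as a small state machine. The sketch below is an illustration under assumptions, not the Bedrock implementation: `MigrationLifecycle`, `Stage`, and `Decision` are hypothetical names, and the evidence payloads are made up. It shows the three gates in order, an exit available at every gate, prompt optimization as the only skippable stage, and approval that can only follow a recorded human decision after shadow testing.

```python
from enum import Enum


class Stage(Enum):
    INITIAL_EVALUATION = 1
    PROMPT_OPTIMIZATION = 2
    SHADOW_TESTING = 3


class Decision(Enum):
    PROCEED = "proceed"
    STOP = "stop"


class MigrationLifecycle:
    """Hypothetical three-gate lifecycle: the system supplies evidence, a person decides."""

    def __init__(self):
        self.stage = Stage.INITIAL_EVALUATION
        self.status = "in_progress"
        self.history = []  # audit trail of (stage, evidence, decision)

    def decide(self, evidence, decision):
        """Record an explicit human decision at the current gate."""
        if self.status != "in_progress":
            raise RuntimeError("migration already " + self.status)
        self.history.append((self.stage, evidence, decision))
        if decision is Decision.STOP:
            # Exiting changes nothing in production.
            self.status = "stopped"
        elif self.stage is Stage.SHADOW_TESTING:
            # Only a PROCEED after shadow testing approves the migration.
            self.status = "approved"
        else:
            self.stage = Stage(self.stage.value + 1)

    def skip_optimization(self):
        """Prompt optimization is optional; the evaluation stages are not."""
        if self.stage is not Stage.PROMPT_OPTIMIZATION:
            raise ValueError("only prompt optimization can be skipped")
        self.history.append((self.stage, {}, "skipped"))
        self.stage = Stage.SHADOW_TESTING
```

A stage that fails before producing evidence never reaches `decide`, so the lifecycle stays at the same gate and the stage can simply be rerun.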

Stage 1: Initial evaluation

To start model migration, my users provide two things: invocation log data from their app (system logs of what an AI did and what the result was) and the models they want to target.

When they start the job, the system performs an Initial Evaluation that compares the performance of their source model against each selected target model. This stage acts as a quick decision point on whether model migration is worth pursuing.

Start model migration

WHAT USERS NEED HERE

  • Fast signal on relative quality and token use
  • Clear baselines and comparison metrics
  • A decision point: proceed or stop

The results screen surfaces performance measurements side by side. In this example, Claude Sonnet 4.5 shows higher accuracy and significantly lower latency and cost than Claude 3.5 Sonnet. The invocation log table below the summary lets engineers drill into individual prompt pairs, seeing the source and optimized outputs side by side with accuracy deltas at the prompt level. Change is visible. My user decides whether the signal is strong enough to proceed.
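
The comparison behind a results screen like this can be sketched in a few lines. This is a minimal illustration, not the product's scoring logic: the `Invocation` record, its fields, and the assumption that an evaluator has already assigned each output an accuracy score between 0 and 1 are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Invocation:
    """One logged call (hypothetical shape, not the Bedrock log schema)."""
    prompt_id: str
    accuracy: float    # evaluator score in [0, 1], assumed precomputed
    latency_ms: float
    cost_usd: float


def summarize(invocations):
    """Aggregate metrics for one model across the whole log."""
    return {
        "accuracy": mean(i.accuracy for i in invocations),
        "latency_ms": mean(i.latency_ms for i in invocations),
        "cost_usd": sum(i.cost_usd for i in invocations),
    }


def compare(source, target):
    """Target minus source for each summary metric."""
    s, t = summarize(source), summarize(target)
    return {metric: t[metric] - s[metric] for metric in s}


def prompt_deltas(source, target):
    """Per-prompt accuracy change, for drilling into individual pairs."""
    by_id = {i.prompt_id: i for i in source}
    return {
        i.prompt_id: i.accuracy - by_id[i.prompt_id].accuracy
        for i in target
        if i.prompt_id in by_id
    }
```

A positive accuracy delta with negative latency and cost deltas is the kind of signal that supports proceeding. The per-prompt deltas show where an aggregate gain hides individual regressions.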

Stage 2: Prompt optimization

My users have a variety of methods to provide prompt templates to the system. They can use our system to perform prompt optimization, or skip directly to shadow testing.

Start prompt optimization

The goal of this stage is to improve relative quality and cost efficiency before deeper investment. Optimization is a lever, not a requirement, and should not block teams that want to evaluate with their own methods.

WHAT USERS NEED HERE

  • A clear promise: what optimization can and cannot do
  • Visibility into which prompts improved and by how much
  • The ability to import results if they optimize externally

Stage 3: Shadow testing

After providing optimized prompt templates and test data, users configure a traffic sample and duration to begin shadow testing. Shadow testing produces production-like evidence without production impact.

Start shadow testing

WHAT USERS NEED HERE

  • A controlled time window and clear cost expectations
  • Streaming results so they can stop early
  • Side-by-side diffs for outputs and key metrics

Shadow testing runs the target models against live production traffic in parallel with the current model. Results stream in real time so engineers can monitor performance as the test runs. At the end, engineers make the final migration decision based on what they see.
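
The core idea of shadow testing can be sketched as follows. This is a simplified, synchronous illustration under assumptions, not how Bedrock routes traffic: `shadow_invoke`, its parameters, and the result dictionary are hypothetical, and a real system would most likely run the shadow call asynchronously so it adds no latency for the caller.

```python
import random


def shadow_invoke(prompt, current_model, target_model, sample_rate, results,
                  rng=random.random):
    """Serve the current model's response; shadow a sample on the target.

    current_model and target_model are any callables mapping a prompt to text.
    sample_rate is the fraction of traffic to shadow, between 0 and 1.
    results is a list that collects comparison records as they arrive.
    """
    response = current_model(prompt)  # the caller always receives this
    if rng() < sample_rate:
        try:
            shadow = target_model(prompt)
            results.append({
                "prompt": prompt,
                "current": response,
                "target": shadow,
                "match": response == shadow,
            })
        except Exception as exc:
            # A failing target model must never affect production traffic.
            results.append({"prompt": prompt, "current": response,
                            "error": repr(exc)})
    return response
```

Because records are appended as each request completes, a monitor reading `results` can watch the comparison build up and stop the test early, which matches the streaming behavior described above.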

Results

Enterprise customers who reviewed the system reported it would significantly reduce the chaos they had experienced with model migrations. A process that was too risky to run repeatedly became a controlled, repeatable one-to-two-week lifecycle.

RUPERTO FABITO, JR, © 2026
jr.fabito@gmail.com
