Reducing GenAI Monitoring by 70% with AWS

How Mu Sigma scaled a U.S. airline’s GenAI initiatives and cut DevOps effort by 70% with a reusable AWS-native monitoring framework.

Situation

A leading U.S.-based airline was running multiple GenAI applications across different business use cases, each producing varied outputs and serving distinct objectives. While GenAI adoption was accelerating, there was no standardized way to monitor model quality, safety, or reliability.
Industry-standard evaluation metrics were insufficient, and building custom monitoring pipelines for each application would have required significant engineering effort, time, and cost, slowing innovation and increasing operational risk.

Problem

  • No centralized monitoring framework for GenAI quality, safety, and usage
  • Absence of industry-standard metrics aligned to business context and semantics
  • High variability across GenAI applications, requiring flexible evaluation logic
  • AWS Bedrock service quotas limiting throughput at scale
  • Transient LLM errors causing pipeline failures and manual intervention
  • High DevOps effort to onboard and maintain monitoring for new GenAI use cases

Solution

Mu Sigma built a cloud-native GenAI monitoring framework on AWS to standardize quality, safety, and Responsible AI across applications—without rebuilding pipelines for each use case.

A config-driven architecture on Amazon S3 and Amazon Aurora enables new GenAI applications to be onboarded through configuration changes alone. Business-aligned evaluation goes beyond generic ML metrics, using custom logic such as NLP metrics, semantic similarity, and LLM-as-a-Judge powered by AWS Bedrock.
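To make the config-driven approach concrete, here is a minimal sketch of how per-application evaluation logic could be selected from configuration alone. The metric registry, config schema, and the use of `difflib` as a stand-in similarity score are all illustrative assumptions, not the framework's actual implementation; a real pipeline would read configs from Amazon S3 and call a Bedrock embedding or judge model instead.

```python
import difflib

# Hypothetical metric registry: each GenAI application enables metrics
# through configuration alone, with no pipeline code changes.
METRICS = {}

def metric(name):
    def register(fn):
        METRICS[name] = fn
        return fn
    return register

@metric("semantic_similarity")
def semantic_similarity(output, reference):
    # Stand-in for an embedding-based score; a production pipeline would
    # call a Bedrock embedding model here rather than difflib.
    return difflib.SequenceMatcher(None, output, reference).ratio()

@metric("length_ratio")
def length_ratio(output, reference):
    # Crude proxy for verbosity drift between output and reference.
    return min(len(output), len(reference)) / max(len(output), len(reference))

def evaluate(app_config, output, reference):
    """Run only the metrics this application's config enables."""
    return {name: METRICS[name](output, reference)
            for name in app_config["metrics"]}

# Onboarding a new application is a config change, not new code.
config = {"app": "flight-status-bot", "metrics": ["semantic_similarity"]}
scores = evaluate(config, "Flight AA100 departs at 9am",
                  "Flight AA100 departs at 9:00 AM")
```

The key design point is the indirection: the pipeline never hard-codes which checks run for which application, so adding a use case means editing a config file rather than deploying new evaluation code.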

AWS Step Functions, Lambda, and ECS Fargate orchestrate quota-aware processing and automated retries, ensuring reliable scale and maximum Bedrock throughput without throttling.
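The automated-retry pattern behind that reliability can be sketched as exponential backoff with jitter around throttled model calls. The exception class, function names, and delay values below are illustrative assumptions; in the actual framework, Step Functions retry policies and Lambda/ECS workers would play this role against Bedrock's real `ThrottlingException`.

```python
import random
import time

class ThrottlingError(Exception):
    """Stands in for a transient Bedrock throttling exception."""

def call_with_backoff(fn, max_attempts=5, base_delay=0.01):
    # Exponential backoff with jitter: transient throttling is retried
    # automatically instead of failing the pipeline and paging an operator.
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # exhausted retries: surface the error
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Simulated model call that is throttled twice before succeeding.
attempts = {"n": 0}
def flaky_invoke():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ThrottlingError()
    return "ok"

result = call_with_backoff(flaky_invoke)
```

Backoff with jitter spreads retries out over time, which is what lets the pipeline run near the quota ceiling without a burst of simultaneous retries re-triggering throttling.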

By combining decision science, cross-industry context, and engineering rigor, we hardwired Responsible AI into the airline’s operating model.

Impact

  • Established a standardized, reusable GenAI monitoring framework with business-aligned custom metrics
  • Built a scalable, AWS-native architecture aligned with cloud best practices, achieving 100% AWS Bedrock quota utilization without throttling
  • ~90% automated recovery of transient failures, cutting failed alerts by 60–80%
  • ~70% reduction in DevOps effort for new monitoring use cases

Business Impact

  • ~70% reduction in DevOps effort
  • 100% AWS Bedrock quota utilization

~70% lower operational effort, faster onboarding, and consistent Responsible AI, enabled by a reusable, AWS-native GenAI monitoring framework with customizable metrics that ensure quality and reliability across applications.

– Global Supply Chain Director

Let’s move from data to decisions together. Talk to us.


