
Cloud FinOps Framework: AWS Cost Intelligence Dashboard & Anomaly Detection

Architecting a FinOps framework that reduced cloud costs by 30% — featuring Cost Intelligence Dashboard, automated anomaly detection, chargeback mechanisms, and executive-level cost visibility.

Sabin Joshi
DevOps Engineer

The Cloud Cost Problem

Cloud bills are opaque. You get a 300-line invoice from AWS and no one knows which team is responsible for the $40K spike in the EC2 section. Finance is unhappy, engineers are confused, and nobody has visibility until the bill arrives.

This FinOps framework gave every team real-time cost visibility, automated alerts on anomalies, and a chargeback model that tied cloud spend to business units. The result was a 30% cost reduction in 6 months.

FinOps Architecture

Figure: AWS FinOps Data Pipeline. Cost & Usage Report (CUR) → S3 (raw Parquet format) → AWS Glue (ETL + tagging) → Athena (SQL queries) → QuickSight (Cost Intelligence Dashboard). Alerting path: Cost Anomaly Detection → SNS → Slack team alerts. Tag Policy: env / team / product.
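To make the Athena step of the pipeline concrete, here is a minimal sketch of the kind of query that could feed a per-team cost view. The database and table name (`cur_db.cost_and_usage`) are hypothetical. The column names assume the standard CUR-in-Athena schema, where a user-defined `team` tag is exposed as `resource_tags_user_team`. The function only builds the SQL string; running it against Athena is left out.

```python
from datetime import date

# Hypothetical database/table name; adjust to wherever Glue catalogs the CUR.
CUR_TABLE = "cur_db.cost_and_usage"


def team_cost_query(start: date, end: date, table: str = CUR_TABLE) -> str:
    """Build an Athena SQL query: unblended cost per team and service for [start, end)."""
    if start >= end:
        raise ValueError("start must be before end")
    return f"""
SELECT
  COALESCE(NULLIF(resource_tags_user_team, ''), 'untagged') AS team,
  line_item_product_code AS service,
  ROUND(SUM(line_item_unblended_cost), 2) AS cost_usd
FROM {table}
WHERE line_item_usage_start_date >= TIMESTAMP '{start.isoformat()} 00:00:00'
  AND line_item_usage_start_date <  TIMESTAMP '{end.isoformat()} 00:00:00'
GROUP BY 1, 2
ORDER BY cost_usd DESC
""".strip()


if __name__ == "__main__":
    # Example: all of January 2024 (end date is exclusive).
    print(team_cost_query(date(2024, 1, 1), date(2024, 2, 1)))
```

Grouping untagged spend under an explicit `untagged` bucket keeps it visible on the dashboard instead of silently dropping it from team totals.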

The Tagging Strategy

None of this works without consistent resource tagging. We enforce tags via AWS Organizations Service Control Policies (SCPs). Resources without required tags cannot be created.

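# Required tags enforced via SCP
# One Deny statement per tag: keys inside a single condition operator are ANDed,
# so a combined "Null" block would only deny requests missing all three tags.
# aws:RequestTag checks the tags passed in the create request itself.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyCreateWithoutTeamTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "rds:CreateDBInstance"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:rds:*:*:db:*"],
      "Condition": {"Null": {"aws:RequestTag/team": "true"}}
    },
    {
      "Sid": "DenyCreateWithoutEnvironmentTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "rds:CreateDBInstance"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:rds:*:*:db:*"],
      "Condition": {"Null": {"aws:RequestTag/environment": "true"}}
    },
    {
      "Sid": "DenyCreateWithoutProductTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "rds:CreateDBInstance"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:rds:*:*:db:*"],
      "Condition": {"Null": {"aws:RequestTag/product": "true"}}
    }
  ]
}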
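An SCP only blocks new resources, so anything created before the policy existed still needs an audit. The sketch below shows the same required-tag rule as a plain Python check. It assumes you already have each resource's tags as a dictionary (for example from an inventory export). The function names and resource IDs are hypothetical, and treating a blank tag value as missing is stricter than the SCP's presence-only check.

```python
REQUIRED_TAGS = ("team", "environment", "product")


def missing_required_tags(tags: dict[str, str]) -> list[str]:
    """Return required tag keys that are absent or blank, in REQUIRED_TAGS order."""
    return [key for key in REQUIRED_TAGS if not tags.get(key, "").strip()]


def audit(resources: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map resource ID -> missing tag keys, keeping only non-compliant resources."""
    report = {}
    for resource_id, tags in resources.items():
        missing = missing_required_tags(tags)
        if missing:
            report[resource_id] = missing
    return report


if __name__ == "__main__":
    # Hypothetical inventory: resource ID -> tags.
    inventory = {
        "i-aaa": {"team": "payments", "environment": "prod", "product": "checkout"},
        "i-bbb": {"team": "search", "environment": ""},
    }
    print(audit(inventory))
```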

Anomaly Detection Setup

AWS Cost Anomaly Detection uses ML to identify unexpected spend increases. We've tuned it per service: a 20% increase in EC2 costs is fine; a 20% increase in NAT Gateway costs means something is very wrong (likely a misconfigured log aggregation pipeline).

Our Alert Thresholds

  • EC2/EKS: Alert if daily spend exceeds the 14-day average by more than 30%
  • Data Transfer: Alert on any increase above 15%; data transfer spikes are almost always bugs
  • RDS: Alert on a 25% increase, usually caused by forgotten development snapshots
  • Lambda: Alert on a 50% increase, which points to invocation count anomalies
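The per-service thresholds can be read as a simple rule: compare today's spend with a trailing average and alert when the percentage increase crosses the service's limit. The sketch below shows that rule in plain Python. It is not the ML model inside AWS Cost Anomaly Detection, and applying the 14-day baseline to every service is an assumption, since only the EC2/EKS rule names a window. The function names are hypothetical.

```python
from statistics import mean

# Percent increase over the trailing average that triggers an alert, per service.
THRESHOLDS_PCT = {"EC2/EKS": 30, "Data Transfer": 15, "RDS": 25, "Lambda": 50}
BASELINE_DAYS = 14  # assumed for all services; the source states it only for EC2/EKS


def pct_over_baseline(history: list[float], today: float) -> float:
    """Percent by which today's spend exceeds the mean of the last BASELINE_DAYS days."""
    if not history:
        raise ValueError("history must contain at least one day of spend")
    baseline = mean(history[-BASELINE_DAYS:])
    if baseline <= 0:
        # No meaningful baseline: any spend at all counts as an unbounded increase.
        return float("inf") if today > 0 else 0.0
    return (today - baseline) / baseline * 100


def should_alert(service: str, history: list[float], today: float) -> bool:
    """True when the increase is strictly above the service's threshold."""
    return pct_over_baseline(history, today) > THRESHOLDS_PCT[service]


if __name__ == "__main__":
    flat = [100.0] * 14  # two weeks at $100/day
    print(should_alert("EC2/EKS", flat, 125.0))        # 25% over baseline
    print(should_alert("Data Transfer", flat, 125.0))  # same increase, tighter limit
```

With a flat $100/day history, a $125 day stays under the EC2/EKS limit but trips the Data Transfer one, which is the asymmetry the thresholds are meant to capture.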

Where the 30% Savings Came From

Once teams could see their costs in real time, behavior changed immediately. The biggest wins were right-sizing EC2 instances (17% reduction), decommissioning orphaned resources (8%), and switching to Savings Plans for predictable workloads (12%). The total over 6 months exceeded $280K in savings on a ~$900K/month AWS bill.

💡 Make cost data visible to engineers, not just to finance. When teams see their service costs on a dashboard next to performance metrics, they optimize naturally. We added a cost widget to every team's Grafana dashboard.