DevOps · Featured · Dec 15, 2025 · 12 min read

Stop Paying for Idle EC2: Automate Underutilization Reports with Bash & CloudWatch

A Bash script that queries CloudWatch for 7-day CPU and memory averages across every running EC2 instance, flags the underutilized ones, emails an HTML report with a CSV attachment, and runs itself every morning via cron — so you never miss a wasteful instance again.

Sabin Joshi
DevOps Engineer
#aws #bash #cloudwatch #finops #ec2 #cost-optimization #devops #scripting

The Idle Instance Tax

Every AWS account has them. EC2 instances that were spun up for a project, a test, a demo — and never sized down or terminated when the load disappeared. They sit at 3–5% CPU, consuming a Reserved Instance or racking up On-Demand charges every hour, every day, every month.

AWS Cost Explorer will tell you your bill is high. It won't tell you which specific instances are wasting money or what you should do about them. That gap is exactly what this script fills: a daily automated report that names names, shows the data, and recommends an action.

- 30%: avg EC2 waste in most AWS accounts
- 7d: CloudWatch lookback window
- 2: metrics checked (CPU + memory)
- 08:00: daily report delivery via cron

How the Script Works

The pipeline is simple: query every running EC2 instance → pull 7-day average CloudWatch metrics for CPU and memory → compare against configurable thresholds → generate an HTML email with a colour-coded table and a CSV attachment for tracking in spreadsheets.

[Diagram: EC2 Underutilization Report — Data Pipeline. A cron job (0 8 * * *, daily 8am) triggers EC2 discovery via describe-instances (instance ID, type, Name tag). CloudWatch get-metric-statistics pulls 7-day averages for CPUUtilization (AWS/EC2 namespace) and mem_used_percent (CWAgent namespace). A threshold check (CPU < 10%? MEM < 20%?) produces a recommendation, which fans out to an HTML email with a colour-coded table, a CSV attachment for spreadsheet tracking, and ec2_report.log. Runs under an IAM role with CloudWatch read + EC2DescribeInstances — no static credentials.]

Prerequisites

A handful of prerequisites need to be in place before the script runs correctly. The memory metric is optional: the script gracefully shows "N/A" for instances without the CloudWatch Agent installed.

- AWS CLI configured
- IAM role with CloudWatch + EC2 read permissions
- sendmail installed on the host
- CloudWatch Agent (for memory metrics — optional)
- jq installed
💡 Run this script on an EC2 instance with an instance profile — no AWS keys in config files. The IAM role only needs ec2:DescribeInstances and cloudwatch:GetMetricStatistics. Read-only, minimal blast radius.
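A quick preflight sketch can confirm the host has everything in place before the first run. The `require` helper is hypothetical; adjust the binary list to your setup:

```shell
#!/bin/bash
# Preflight check: confirm each required binary is on PATH.
# "require" is an illustrative helper, not part of the report script.
require() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "OK: $1"
  else
    echo "MISSING: $1"
    return 1
  fi
}

for bin in aws jq sendmail bc; do
  require "${bin}" || true   # report everything, don't stop at first miss
done

# If the CLI is present, confirm the instance profile resolves credentials
if command -v aws >/dev/null 2>&1; then
  aws sts get-caller-identity --query Arn --output text \
    || echo "MISSING: usable AWS credentials"
fi
```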

Underutilization Thresholds

The script uses three severity tiers. Tune these constants at the top of the script to match your organisation's cost policies:

Tier      CPU (7d avg)   Memory (7d avg)   Recommendation
Critical  < 5%           < 10%             Terminate or stop immediately
Warning   < 10%          < 20%             Downsize to next smaller type
Healthy   ≥ 10%          ≥ 20%             No action needed
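Bash can't compare floats natively, which is why the script shells out to bc. A minimal, CPU-only sketch of the tier logic in the table above (`classify_cpu` is an illustrative name, not part of the full script, which folds memory in as well):

```shell
#!/bin/bash
# CPU-only sketch of the severity tiers; classify_cpu is hypothetical.
CPU_CRIT_THRESHOLD=5
CPU_WARN_THRESHOLD=10

classify_cpu() {
  local cpu="$1"
  # bc prints 1 when the comparison is true, 0 otherwise
  if [[ "$(echo "${cpu} < ${CPU_CRIT_THRESHOLD}" | bc -l)" -eq 1 ]]; then
    echo "critical"
  elif [[ "$(echo "${cpu} < ${CPU_WARN_THRESHOLD}" | bc -l)" -eq 1 ]]; then
    echo "warning"
  else
    echo "healthy"
  fi
}

classify_cpu 2.31    # → critical
classify_cpu 7.84    # → warning
classify_cpu 38.56   # → healthy
```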

The Full Script

Drop this in /opt/ec2-report/ec2_underutilized_report.sh, set your email address and thresholds at the top, then run chmod +x on it. The script is fully self-contained — no external dependencies beyond the AWS CLI, jq, and sendmail.

1. Configuration block: variables at the top (email, thresholds, lookback window, log path). Change these without touching the logic.
2. EC2 discovery: queries all running instances in the configured region, extracting instance ID, type, and Name tag in one describe-instances call.
3. CloudWatch metric fetch: for each instance, pulls the 7-day average CPU from the AWS/EC2 namespace and memory from the CWAgent namespace. Handles a missing CWAgent gracefully.
4. Threshold evaluation + recommendation: compares metrics against the thresholds with bc for float arithmetic, assigns a severity level and a human-readable recommendation string.
5. HTML + CSV generation + email: builds a colour-coded HTML table, writes a CSV file, then sends both via sendmail as a MIME multipart email with the CSV attached.
#!/bin/bash
# ─────────────────────────────────────────────────────────────────
# ec2_underutilized_report.sh
# Identifies underutilized EC2 instances via CloudWatch,
# generates an HTML email report + CSV attachment, logs everything.
# ─────────────────────────────────────────────────────────────────
set -euo pipefail

# ── Configuration ─────────────────────────────────────────────────
REPORT_EMAIL="devops-team@yourcompany.com"
FROM_EMAIL="aws-reports@yourcompany.com"
REGION="us-east-1"
LOOKBACK_DAYS=7
PERIOD_SECONDS=$(( LOOKBACK_DAYS * 86400 ))

# Underutilization thresholds
CPU_WARN_THRESHOLD=10    # % — downsize recommendation
CPU_CRIT_THRESHOLD=5     # % — terminate/stop recommendation
MEM_WARN_THRESHOLD=20    # % — downsize recommendation
MEM_CRIT_THRESHOLD=10    # % — terminate/stop recommendation

# Paths
REPORT_DIR="/opt/ec2-report"
LOG_FILE="${REPORT_DIR}/ec2_report.log"
CSV_FILE="${REPORT_DIR}/ec2_underutilized_$(date +%Y%m%d).csv"
HTML_BODY="${REPORT_DIR}/report_body.html"

mkdir -p "${REPORT_DIR}"

# ── Logging ────────────────────────────────────────────────────────
log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "${LOG_FILE}"
}

log "=== EC2 Underutilization Report started ==="

# ── Time window for CloudWatch queries ────────────────────────────
END_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
START_TIME=$(date -u -d "${LOOKBACK_DAYS} days ago" +"%Y-%m-%dT%H:%M:%SZ")

# ── Fetch all running EC2 instances ───────────────────────────────
log "Discovering running EC2 instances in ${REGION}..."

INSTANCES=$(aws ec2 describe-instances \
  --region "${REGION}" \
  --filters "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].[InstanceId,InstanceType,Tags[?Key=='Name'].Value|[0]]" \
  --output json)

INSTANCE_COUNT=$(echo "${INSTANCES}" | jq 'length')
log "Found ${INSTANCE_COUNT} running instances"

if [[ "${INSTANCE_COUNT}" -eq 0 ]]; then
  log "No running instances found — exiting."
  exit 0
fi

# ── CloudWatch metric helper ───────────────────────────────────────
get_metric() {
  local instance_id="$1"
  local namespace="$2"
  local metric_name="$3"
  local dim_name="$4"

  local result
  # Guard the CLI call: under set -e, a failed query would otherwise
  # kill the whole run. Treat any failure as "no data".
  result=$(aws cloudwatch get-metric-statistics \
    --region "${REGION}" \
    --namespace "${namespace}" \
    --metric-name "${metric_name}" \
    --dimensions "Name=${dim_name},Value=${instance_id}" \
    --start-time "${START_TIME}" \
    --end-time "${END_TIME}" \
    --period "${PERIOD_SECONDS}" \
    --statistics Average \
    --query "Datapoints[0].Average" \
    --output text 2>/dev/null) || result="None"

  # Return "N/A" if CloudWatch has no data (CWAgent not installed etc.)
  if [[ "${result}" == "None" ]] || [[ -z "${result}" ]]; then
    echo "N/A"
  else
    # Round to 2 decimal places
    printf "%.2f" "${result}"
  fi
}

# ── Severity classification ────────────────────────────────────────
get_severity() {
  local cpu="$1" mem="$2"

  if [[ "${cpu}" == "N/A" ]]; then
    echo "unknown"
    return
  fi

  local cpu_crit cpu_warn mem_crit=1 mem_warn=1
  cpu_crit=$(echo "${cpu} < ${CPU_CRIT_THRESHOLD}" | bc -l)
  cpu_warn=$(echo "${cpu} < ${CPU_WARN_THRESHOLD}" | bc -l)

  # Memory only tightens the verdict when the CloudWatch Agent reports it;
  # with "N/A" the defaults above fall back to evaluating CPU alone.
  if [[ "${mem}" != "N/A" ]]; then
    mem_crit=$(echo "${mem} < ${MEM_CRIT_THRESHOLD}" | bc -l)
    mem_warn=$(echo "${mem} < ${MEM_WARN_THRESHOLD}" | bc -l)
  fi

  if [[ "${cpu_crit}" -eq 1 && "${mem_crit}" -eq 1 ]]; then
    echo "critical"
  elif [[ "${cpu_warn}" -eq 1 && "${mem_warn}" -eq 1 ]]; then
    echo "warning"
  else
    echo "healthy"
  fi
}

get_recommendation() {
  case "$1" in
    critical) echo "Consider stopping or terminating" ;;
    warning)  echo "Downsize to smaller instance type" ;;
    healthy)  echo "No action needed" ;;
    *)        echo "No CloudWatch data; verify the instance is reporting metrics" ;;
  esac
}

# ── Build HTML report ──────────────────────────────────────────────
log "Building HTML report..."

cat > "${HTML_BODY}" <<'HTML'
<!DOCTYPE html>
<html><head><meta charset="UTF-8"/>
<style>
  body{font-family:Arial,sans-serif;background:#f4f4f4;padding:20px}
  h2{color:#232f3e}
  table{border-collapse:collapse;width:100%;background:#fff;box-shadow:0 1px 3px rgba(0,0,0,.1)}
  th{background:#232f3e;color:#fff;padding:10px 14px;text-align:left;font-size:13px}
  td{padding:9px 14px;font-size:13px;border-bottom:1px solid #eee}
  .critical{background:#fff0f0;color:#c0392b;font-weight:bold}
  .warning{background:#fffbe6;color:#d68910}
  .healthy{background:#f0fff4;color:#1e8449}
  .unknown{background:#f8f8f8;color:#888}
  .badge{padding:2px 8px;border-radius:10px;font-size:11px}
  .badge-crit{background:#fadbd8;color:#c0392b}
  .badge-warn{background:#fef9e7;color:#d68910}
  .badge-ok{background:#d5f5e3;color:#1e8449}
</style></head><body>
HTML

echo "<h2>EC2 Underutilization Report — $(date '+%B %d, %Y')</h2>" >> "${HTML_BODY}"
echo "<p>Region: <strong>${REGION}</strong> &nbsp;|&nbsp; Lookback: <strong>${LOOKBACK_DAYS} days</strong> &nbsp;|&nbsp; Instances scanned: <strong>${INSTANCE_COUNT}</strong></p>" >> "${HTML_BODY}"
echo "<table><tr><th>Instance ID</th><th>Name</th><th>Type</th><th>CPU Avg (7d)</th><th>Memory Avg (7d)</th><th>Status</th><th>Recommendation</th></tr>" >> "${HTML_BODY}"

# CSV header
echo "InstanceId,Name,InstanceType,CPU_7d_Avg,Memory_7d_Avg,Status,Recommendation" > "${CSV_FILE}"

# ── Per-instance loop ──────────────────────────────────────────────
while IFS= read -r instance; do
  INSTANCE_ID=$(echo "${instance}" | jq -r '.[0]')
  INSTANCE_TYPE=$(echo "${instance}" | jq -r '.[1]')
  INSTANCE_NAME=$(echo "${instance}" | jq -r '.[2] // "Unnamed"')

  log "Processing ${INSTANCE_ID} (${INSTANCE_NAME} / ${INSTANCE_TYPE})"

  # Pull metrics
  CPU_AVG=$(get_metric "${INSTANCE_ID}" "AWS/EC2" "CPUUtilization" "InstanceId")
  MEM_AVG=$(get_metric "${INSTANCE_ID}" "CWAgent"   "mem_used_percent" "InstanceId")

  SEVERITY=$(get_severity "${CPU_AVG}" "${MEM_AVG}")
  RECOMMENDATION=$(get_recommendation "${SEVERITY}")

  # HTML badge styling
  case "${SEVERITY}" in
    critical) BADGE='<span class="badge badge-crit">Critical</span>'  ;;
    warning)  BADGE='<span class="badge badge-warn">Warning</span>'   ;;
    healthy)  BADGE='<span class="badge badge-ok">Healthy</span>'    ;;
    *)        BADGE='<span class="badge">Unknown</span>'              ;;
  esac

  # Append HTML table row
  cat >> "${HTML_BODY}" <<EOF
<tr class="${SEVERITY}">
  <td><code>${INSTANCE_ID}</code></td>
  <td>${INSTANCE_NAME}</td>
  <td><code>${INSTANCE_TYPE}</code></td>
  <td>${CPU_AVG}%</td>
  <td>${MEM_AVG}%</td>
  <td>${BADGE}</td>
  <td>${RECOMMENDATION}</td>
</tr>
EOF

  # Append CSV row
  echo "${INSTANCE_ID},${INSTANCE_NAME},${INSTANCE_TYPE},${CPU_AVG},${MEM_AVG},${SEVERITY},${RECOMMENDATION}" >> "${CSV_FILE}"

done < <(echo "${INSTANCES}" | jq -c '.[]')

# Close HTML
echo "</table><p style='color:#999;font-size:11px'>Generated by ec2_underutilized_report.sh at $(date)</p></body></html>" >> "${HTML_BODY}"

# ── Send email via sendmail (MIME multipart) ───────────────────────
log "Sending report to ${REPORT_EMAIL}..."

BOUNDARY="boundary_$(date +%s)"
CSV_BASENAME=$(basename "${CSV_FILE}")
CSV_B64=$(base64 -w 0 "${CSV_FILE}")

sendmail -t <<MAIL
To: ${REPORT_EMAIL}
From: ${FROM_EMAIL}
Subject: [AWS] EC2 Underutilization Report — $(date '+%Y-%m-%d') (${REGION})
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="${BOUNDARY}"

--${BOUNDARY}
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

$(cat "${HTML_BODY}")

--${BOUNDARY}
Content-Type: text/csv; name="${CSV_BASENAME}"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="${CSV_BASENAME}"

${CSV_B64}
--${BOUNDARY}--
MAIL

log "Report sent successfully."
log "CSV saved to: ${CSV_FILE}"
log "=== Report complete ==="

IAM Policy — Least Privilege

Attach this policy to the EC2 instance profile running the script. It covers exactly what's needed — describe instances, read CloudWatch metrics, nothing else.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EC2Describe",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CloudWatchReadMetrics",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}
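Before attaching it, a quick grep can sanity-check that the policy stays read-only: every action should be a Describe/Get/List call. A sketch, assuming you save the JSON above to a temp file first:

```shell
#!/bin/bash
# Sanity check: count policy actions that are NOT read-only (expect 0).
cat > /tmp/ec2-report-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EC2Describe",
      "Effect": "Allow",
      "Action": ["ec2:DescribeInstances", "ec2:DescribeTags"],
      "Resource": "*"
    },
    {
      "Sid": "CloudWatchReadMetrics",
      "Effect": "Allow",
      "Action": ["cloudwatch:GetMetricStatistics", "cloudwatch:ListMetrics"],
      "Resource": "*"
    }
  ]
}
EOF

# Pull out every "service:Action" string, then count the ones that are
# not Describe*/Get*/List*. grep -c exits 1 on zero matches, hence || true.
NON_READONLY=$(grep -oE '"[a-z0-9]+:[A-Za-z]+"' /tmp/ec2-report-policy.json \
  | { grep -vcE ':(Describe|Get|List)' || true; })
echo "Non-read-only actions: ${NON_READONLY}"   # → Non-read-only actions: 0
```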

Enabling Memory Metrics with CloudWatch Agent

By default, CloudWatch has no memory data for EC2 — memory usage is not exposed by the hypervisor. You need the CloudWatch Agent running inside the instance to push mem_used_percent. Without it, the script shows "N/A" for memory — still useful, just without memory signal.

[Diagram: CloudWatch Agent — Memory Metric Flow. The agent on the EC2 instance reads /proc/meminfo and publishes mem_used_percent to CloudWatch under the CWAgent namespace with an InstanceId dimension; the Bash script then queries it via get-metric-statistics for the 7-day average. Without CWAgent, the script reports N/A for memory.]
# Quick install + configure CloudWatch Agent for memory metrics

# 1. Install
sudo yum install -y amazon-cloudwatch-agent   # Amazon Linux
# sudo apt install -y amazon-cloudwatch-agent  # Ubuntu/Debian

# 2. Minimal config — writes mem_used_percent every 60s
sudo tee /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json > /dev/null <<'EOF'
{
  "metrics": {
    "namespace": "CWAgent",
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      }
    },
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    }
  }
}
EOF

# 3. Start and enable
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config \
  -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json \
  -s

# Verify it's running
sudo systemctl status amazon-cloudwatch-agent

Deploy & Schedule

Get the script onto your reporting EC2 instance, make it executable, then wire it into cron for daily delivery:

# 1. Create directory and drop script in place
sudo mkdir -p /opt/ec2-report
sudo curl -o /opt/ec2-report/ec2_underutilized_report.sh [your-script-url]
sudo chmod +x /opt/ec2-report/ec2_underutilized_report.sh

# 2. Test run first — check output before scheduling
/opt/ec2-report/ec2_underutilized_report.sh

# Watch logs in real-time during test
tail -f /opt/ec2-report/ec2_report.log

# 3. Schedule via cron: every day at 8am
# (crontab -e, then add this line. The script already tees its own output
#  into ec2_report.log, so capture cron's stdout in a separate file to
#  avoid every log line appearing twice.)
0 8 * * * /opt/ec2-report/ec2_underutilized_report.sh >> /opt/ec2-report/cron.log 2>&1

# 4. Verify cron registered the job
crontab -l
⚠️ Trust the averages only once an instance has a full 7 days of CloudWatch history. A newly launched instance has just a few days of datapoints, so its "7-day average" actually covers a much shorter window and can mislead in either direction. The thresholds are most reliable after two weeks of data.
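One operational detail worth handling up front: the script appends to ec2_report.log on every run and writes a dated CSV each day, so /opt/ec2-report grows unbounded. A logrotate sketch keeps the log in check (the path matches the script's defaults; the weekly/8-rotation retention is an assumption to tune):

```
# /etc/logrotate.d/ec2-report: rotate the report log weekly, keep 8 weeks
/opt/ec2-report/ec2_report.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
}
```

Old CSVs can be pruned with a similar cadence, e.g. a find-based cleanup in the same cron schedule.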

What the Report Looks Like

Every morning, the team receives an email with a table like this. Rows are colour-coded by severity — red for critical, yellow for warnings, green for healthy. The CSV attachment lets you paste straight into a spreadsheet to track trends over weeks.

Sample HTML Report Output — Colour-Coded by Severity
From: aws-reports@yourcompany.com
Subject: [AWS] EC2 Underutilization Report — 2025-03-06

Instance ID    Name                Type        CPU Avg   Mem Avg   Status     Recommendation
i-0a1b2c3d4e   old-staging-api     m5.xlarge   2.31%     4.18%     Critical   Stop/Terminate
i-0f9e8d7c6b   prod-worker-2       c5.2xlarge  7.84%     15.22%    Warning    Downsize type
i-0e5d4c3b2a   prod-api-primary    c5.xlarge   38.56%    54.11%    Healthy    No action needed
i-0d4c3b2a1f   test-batch-runner   t3.medium   3.20%     N/A       Critical   Stop/Terminate

📎 Attachment: ec2_underutilized_20250306.csv

What You Get

🔍
CPU + Memory Dual Signal
CPU alone lies. An instance processing batch jobs can spike to 80% for 2 minutes then idle for an hour — yet average 8%. Memory gives you a second signal that confirms whether the instance is genuinely underloaded.
📧
HTML Email + CSV Attachment
The HTML table gives instant visual triage — red rows stand out immediately. The CSV attachment lets you paste into Sheets or Excel to track which instances have been actioned, week-over-week cost savings, and trend patterns.
⚙️
Configurable Thresholds
Four constants at the top of the script. Development environments might warrant a 20% CPU warning threshold; production billing services might need stricter checks. No code changes — just edit the variables.
📝
Full Audit Log
Every run appends timestamped entries to ec2_report.log. When your finance team asks why a specific instance was terminated last Tuesday, the log has the metric values that triggered the recommendation.
🛡️
Graceful Degradation
Instances without the CloudWatch Agent show "N/A" for memory and are still evaluated on CPU alone. The script never crashes on missing metrics — it reports what it can and keeps moving.
⏰
Cron-Scheduled
One crontab line runs the full pipeline every morning. Reports land in your inbox before standup. Engineers arrive knowing exactly which instances to action that day — no manual checks, no forgotten instances.
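On the dual-signal point above: the bursty-workload effect is easy to demonstrate with plain arithmetic. One 80% spike among idle samples still averages near 11%. (If you want the peak as an extra column in the report, get-metric-statistics accepts multiple statistics in one call, e.g. --statistics Average Maximum.)

```shell
#!/bin/bash
# Illustration of the bursty-workload problem: one short spike plus long
# idle stretches still averages low. Sample values are made up.
SAMPLES="80 2 1 3 2 1 2 1"

avg=$(echo "${SAMPLES}" | tr ' ' '\n' | awk '{s+=$1} END {printf "%.2f", s/NR}')
max=$(echo "${SAMPLES}" | tr ' ' '\n' | sort -n | tail -1)

echo "avg=${avg}% max=${max}%"   # → avg=11.50% max=80%
```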
💡 Want to go further? Pipe the CSV into a Slack webhook so the report drops into your #finops channel. Or feed it into a DynamoDB table and build a week-over-week savings tracker. The script is a data source — the HTML email is just the simplest output format to start with.
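Here's one way that Slack hand-off could look: summarize the day's CSV and POST it to an incoming webhook. Everything below is a sketch — `SLACK_WEBHOOK_URL` and `build_summary` are assumptions, not part of the script above:

```shell
#!/bin/bash
# Sketch: post a one-line summary of the day's CSV to Slack.
# SLACK_WEBHOOK_URL is an assumption: create an incoming webhook in Slack
# and export the URL before the curl actually fires.
CSV_FILE="${1:-/opt/ec2-report/ec2_underutilized_$(date +%Y%m%d).csv}"

build_summary() {
  local csv="$1"
  local crit warn
  # Severity is column 6 of the CSV; NR>1 skips the header row
  crit=$(awk -F',' 'NR>1 && $6=="critical" {n++} END {print n+0}' "${csv}")
  warn=$(awk -F',' 'NR>1 && $6=="warning"  {n++} END {print n+0}' "${csv}")
  echo "EC2 report: ${crit} critical, ${warn} warning instance(s)"
}

if [[ -f "${CSV_FILE}" ]]; then
  SUMMARY=$(build_summary "${CSV_FILE}")
  echo "${SUMMARY}"
  if [[ -n "${SLACK_WEBHOOK_URL:-}" ]]; then
    curl -fsS -X POST -H 'Content-Type: application/json' \
      -d "{\"text\": \"${SUMMARY}\"}" "${SLACK_WEBHOOK_URL}"
  fi
fi
```

Hook it in by appending one line to the report script after the CSV is written, or as a second cron entry a few minutes after the first.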