DevOps · Featured · Dec 15, 2025 · 12 min read

Stop Paying for Idle EC2: Automate Underutilization Reports with Bash & CloudWatch

A Bash script that queries CloudWatch for 7-day CPU and memory averages across every running EC2 instance, flags the underutilized ones, emails an HTML report with a CSV attachment, and runs itself every morning via cron — so you never miss a wasteful instance again.

Sabin Joshi
DevOps Engineer
#aws #bash #cloudwatch #finops #ec2 #cost-optimization #devops #scripting

The Idle Instance Tax

Every AWS account has them. EC2 instances that were spun up for a project, a test, a demo — and never sized down or terminated when the load disappeared. They sit at 3–5% CPU, consuming a Reserved Instance or racking up On-Demand charges every hour, every day, every month.

AWS Cost Explorer will tell you your bill is high. It won't tell you which specific instances are wasting money or what you should do about them. That gap is exactly what this script fills: a daily automated report that names names, shows the data, and recommends an action.

- 30%: avg EC2 waste in most AWS accounts
- 7d: CloudWatch lookback window
- 2: metrics checked (CPU + memory)
- 08:00: daily report delivery via cron

How the Script Works

The pipeline is simple: query every running EC2 instance → pull 7-day average CloudWatch metrics for CPU and memory → compare against configurable thresholds → generate an HTML email with a colour-coded table and a CSV attachment for tracking in spreadsheets.

[Diagram: EC2 Underutilization Report — Data Pipeline. A cron job (0 8 * * *, daily 8am) triggers EC2 discovery via describe-instances (instance ID, type, Name tag). CloudWatch get-metric-statistics pulls 7-day averages for CPUUtilization (AWS/EC2 namespace) and mem_used_percent (CWAgent namespace). A threshold check (CPU < 10%? MEM < 20%?) produces a recommendation, which fans out to an HTML email with a colour-coded table, a CSV attachment for spreadsheet tracking, and ec2_report.log. Runs under an IAM role with CloudWatch read + EC2DescribeInstances — no static credentials.]

Prerequisites

A handful of prerequisites need to be in place before the script runs correctly. The memory metric is optional: the script gracefully shows "N/A" for instances without the CloudWatch Agent installed.

- AWS CLI configured
- IAM role with CloudWatch + EC2 read permissions
- sendmail installed on the host
- CloudWatch Agent (for memory metrics — optional)
- jq installed
💡 Run this script on an EC2 instance with an instance profile — no AWS keys in config files. The IAM role only needs ec2:DescribeInstances and cloudwatch:GetMetricStatistics. Read-only, minimal blast radius.
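A quick preflight sketch can confirm the host has everything in place before the first run. The `require` helper is hypothetical; adjust the binary list to your setup:

```shell
#!/bin/bash
# Preflight check: confirm each required binary is on PATH.
# "require" is an illustrative helper, not part of the report script.
require() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "OK: $1"
  else
    echo "MISSING: $1"
    return 1
  fi
}

for bin in aws jq sendmail bc; do
  require "${bin}" || true   # report everything, don't stop at first miss
done

# If the CLI is present, confirm the instance profile resolves credentials
if command -v aws >/dev/null 2>&1; then
  aws sts get-caller-identity --query Arn --output text \
    || echo "MISSING: usable AWS credentials"
fi
```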

Underutilization Thresholds

The script uses three severity tiers. Tune these constants at the top of the script to match your organisation's cost policies:

Tier      CPU (7d avg)   Memory (7d avg)   Recommendation
Critical  < 5%           < 10%             Terminate or stop immediately
Warning   < 10%          < 20%             Downsize to next smaller type
Healthy   ≥ 10%          ≥ 20%             No action needed
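Bash can't compare floats natively, which is why the script shells out to bc. A minimal, CPU-only sketch of the tier logic in the table above (`classify_cpu` is an illustrative name, not part of the full script, which folds memory in as well):

```shell
#!/bin/bash
# CPU-only sketch of the severity tiers; classify_cpu is hypothetical.
CPU_CRIT_THRESHOLD=5
CPU_WARN_THRESHOLD=10

classify_cpu() {
  local cpu="$1"
  # bc prints 1 when the comparison is true, 0 otherwise
  if [[ "$(echo "${cpu} < ${CPU_CRIT_THRESHOLD}" | bc -l)" -eq 1 ]]; then
    echo "critical"
  elif [[ "$(echo "${cpu} < ${CPU_WARN_THRESHOLD}" | bc -l)" -eq 1 ]]; then
    echo "warning"
  else
    echo "healthy"
  fi
}

classify_cpu 2.31    # → critical
classify_cpu 7.84    # → warning
classify_cpu 38.56   # → healthy
```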

The Full Script

Drop this in /opt/ec2-report/ec2_underutilized_report.sh, set your email address and thresholds at the top, then run chmod +x on it. The script is fully self-contained — no external dependencies beyond the AWS CLI, jq, and sendmail.

1. Configuration block: variables at the top (email, thresholds, lookback window, log path). Change these without touching the logic.
2. EC2 discovery: queries all running instances in the configured region, extracting instance ID, type, and Name tag in one describe-instances call.
3. CloudWatch metric fetch: for each instance, pulls the 7-day average CPU from the AWS/EC2 namespace and memory from the CWAgent namespace. Handles a missing CWAgent gracefully.
4. Threshold evaluation + recommendation: compares metrics against the thresholds with bc for float arithmetic, assigns a severity level and a human-readable recommendation string.
5. HTML + CSV generation + email: builds a colour-coded HTML table, writes a CSV file, then sends both via sendmail as a MIME multipart email with the CSV attached.
#!/bin/bash
# ─────────────────────────────────────────────────────────────────
# ec2_underutilized_report.sh
# Identifies underutilized EC2 instances via CloudWatch,
# generates an HTML email report + CSV attachment, logs everything.
# ─────────────────────────────────────────────────────────────────
set -euo pipefail

# ── Configuration ─────────────────────────────────────────────────
REPORT_EMAIL="devops-team@yourcompany.com"
FROM_EMAIL="aws-reports@yourcompany.com"
REGION="us-east-1"
LOOKBACK_DAYS=7
PERIOD_SECONDS=$(( LOOKBACK_DAYS * 86400 ))

# Underutilization thresholds
CPU_WARN_THRESHOLD=10    # % — downsize recommendation
CPU_CRIT_THRESHOLD=5     # % — terminate/stop recommendation
MEM_WARN_THRESHOLD=20    # % — downsize recommendation
MEM_CRIT_THRESHOLD=10    # % — terminate/stop recommendation

# Paths
REPORT_DIR="/opt/ec2-report"
LOG_FILE="${REPORT_DIR}/ec2_report.log"
CSV_FILE="${REPORT_DIR}/ec2_underutilized_$(date +%Y%m%d).csv"
HTML_BODY="${REPORT_DIR}/report_body.html"

mkdir -p "${REPORT_DIR}"

# ── Logging ────────────────────────────────────────────────────────
log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "${LOG_FILE}"
}

log "=== EC2 Underutilization Report started ==="

# ── Time window for CloudWatch queries ────────────────────────────
END_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
START_TIME=$(date -u -d "${LOOKBACK_DAYS} days ago" +"%Y-%m-%dT%H:%M:%SZ")

# ── Fetch all running EC2 instances ───────────────────────────────
log "Discovering running EC2 instances in ${REGION}..."

INSTANCES=$(aws ec2 describe-instances \
  --region "${REGION}" \
  --filters "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].[InstanceId,InstanceType,Tags[?Key=='Name'].Value|[0]]" \
  --output json)

INSTANCE_COUNT=$(echo "${INSTANCES}" | jq 'length')
log "Found ${INSTANCE_COUNT} running instances"

if [[ "${INSTANCE_COUNT}" -eq 0 ]]; then
  log "No running instances found — exiting."
  exit 0
fi

# ── CloudWatch metric helper ───────────────────────────────────────
get_metric() {
  local instance_id="$1"
  local namespace="$2"
  local metric_name="$3"
  local dim_name="$4"

  local result
  # Guard the CLI call: under set -e, a failed query would otherwise
  # kill the whole run. Treat any failure as "no data".
  result=$(aws cloudwatch get-metric-statistics \
    --region "${REGION}" \
    --namespace "${namespace}" \
    --metric-name "${metric_name}" \
    --dimensions "Name=${dim_name},Value=${instance_id}" \
    --start-time "${START_TIME}" \
    --end-time "${END_TIME}" \
    --period "${PERIOD_SECONDS}" \
    --statistics Average \
    --query "Datapoints[0].Average" \
    --output text 2>/dev/null) || result="None"

  # Return "N/A" if CloudWatch has no data (CWAgent not installed etc.)
  if [[ "${result}" == "None" ]] || [[ -z "${result}" ]]; then
    echo "N/A"
  else
    # Round to 2 decimal places
    printf "%.2f" "${result}"
  fi
}

# ── Severity classification ────────────────────────────────────────
get_severity() {
  local cpu="$1" mem="$2"

  if [[ "${cpu}" == "N/A" ]]; then
    echo "unknown"
    return
  fi

  local cpu_crit cpu_warn mem_crit=1 mem_warn=1
  cpu_crit=$(echo "${cpu} < ${CPU_CRIT_THRESHOLD}" | bc -l)
  cpu_warn=$(echo "${cpu} < ${CPU_WARN_THRESHOLD}" | bc -l)

  # Memory only tightens the verdict when the CloudWatch Agent reports it;
  # with "N/A" the defaults above fall back to evaluating CPU alone.
  if [[ "${mem}" != "N/A" ]]; then
    mem_crit=$(echo "${mem} < ${MEM_CRIT_THRESHOLD}" | bc -l)
    mem_warn=$(echo "${mem} < ${MEM_WARN_THRESHOLD}" | bc -l)
  fi

  if [[ "${cpu_crit}" -eq 1 && "${mem_crit}" -eq 1 ]]; then
    echo "critical"
  elif [[ "${cpu_warn}" -eq 1 && "${mem_warn}" -eq 1 ]]; then
    echo "warning"
  else
    echo "healthy"
  fi
}

get_recommendation() {
  case "$1" in
    critical) echo "Consider stopping or terminating" ;;
    warning)  echo "Downsize to smaller instance type" ;;
    healthy)  echo "No action needed" ;;
    *)        echo "No CloudWatch data; verify the instance is reporting metrics" ;;
  esac
}

# ── Build HTML report ──────────────────────────────────────────────
log "Building HTML report..."

cat > "${HTML_BODY}" <<'HTML'
<!DOCTYPE html>
<html><head><meta charset="UTF-8"/>
<style>
  body{font-family:Arial,sans-serif;background:#f4f4f4;padding:20px}
  h2{color:#232f3e}
  table{border-collapse:collapse;width:100%;background:#fff;box-shadow:0 1px 3px rgba(0,0,0,.1)}
  th{background:#232f3e;color:#fff;padding:10px 14px;text-align:left;font-size:13px}
  td{padding:9px 14px;font-size:13px;border-bottom:1px solid #eee}
  .critical{background:#fff0f0;color:#c0392b;font-weight:bold}
  .warning{background:#fffbe6;color:#d68910}
  .healthy{background:#f0fff4;color:#1e8449}
  .unknown{background:#f8f8f8;color:#888}
  .badge{padding:2px 8px;border-radius:10px;font-size:11px}
  .badge-crit{background:#fadbd8;color:#c0392b}
  .badge-warn{background:#fef9e7;color:#d68910}
  .badge-ok{background:#d5f5e3;color:#1e8449}
</style></head><body>
HTML

echo "<h2>EC2 Underutilization Report — $(date '+%B %d, %Y')</h2>" >> "${HTML_BODY}"
echo "<p>Region: <strong>${REGION}</strong> &nbsp;|&nbsp; Lookback: <strong>${LOOKBACK_DAYS} days</strong> &nbsp;|&nbsp; Instances scanned: <strong>${INSTANCE_COUNT}</strong></p>" >> "${HTML_BODY}"
echo "<table><tr><th>Instance ID</th><th>Name</th><th>Type</th><th>CPU Avg (7d)</th><th>Memory Avg (7d)</th><th>Status</th><th>Recommendation</th></tr>" >> "${HTML_BODY}"

# CSV header
echo "InstanceId,Name,InstanceType,CPU_7d_Avg,Memory_7d_Avg,Status,Recommendation" > "${CSV_FILE}"

# ── Per-instance loop ──────────────────────────────────────────────
while IFS= read -r instance; do
  INSTANCE_ID=$(echo "${instance}" | jq -r '.[0]')
  INSTANCE_TYPE=$(echo "${instance}" | jq -r '.[1]')
  INSTANCE_NAME=$(echo "${instance}" | jq -r '.[2] // "Unnamed"')

  log "Processing ${INSTANCE_ID} (${INSTANCE_NAME} / ${INSTANCE_TYPE})"

  # Pull metrics
  CPU_AVG=$(get_metric "${INSTANCE_ID}" "AWS/EC2" "CPUUtilization" "InstanceId")
  MEM_AVG=$(get_metric "${INSTANCE_ID}" "CWAgent"   "mem_used_percent" "InstanceId")

  SEVERITY=$(get_severity "${CPU_AVG}" "${MEM_AVG}")
  RECOMMENDATION=$(get_recommendation "${SEVERITY}")

  # HTML badge styling
  case "${SEVERITY}" in
    critical) BADGE='<span class="badge badge-crit">Critical</span>'  ;;
    warning)  BADGE='<span class="badge badge-warn">Warning</span>'   ;;
    healthy)  BADGE='<span class="badge badge-ok">Healthy</span>'    ;;
    *)        BADGE='<span class="badge">Unknown</span>'              ;;
  esac

  # Append HTML table row
  cat >> "${HTML_BODY}" <<EOF
<tr class="${SEVERITY}">
  <td><code>${INSTANCE_ID}</code></td>
  <td>${INSTANCE_NAME}</td>
  <td><code>${INSTANCE_TYPE}</code></td>
  <td>${CPU_AVG}%</td>
  <td>${MEM_AVG}%</td>
  <td>${BADGE}</td>
  <td>${RECOMMENDATION}</td>
</tr>
EOF

  # Append CSV row
  echo "${INSTANCE_ID},${INSTANCE_NAME},${INSTANCE_TYPE},${CPU_AVG},${MEM_AVG},${SEVERITY},${RECOMMENDATION}" >> "${CSV_FILE}"

done < <(echo "${INSTANCES}" | jq -c '.[]')

# Close HTML
echo "</table><p style='color:#999;font-size:11px'>Generated by ec2_underutilized_report.sh at $(date)</p></body></html>" >> "${HTML_BODY}"

# ── Send email via sendmail (MIME multipart) ───────────────────────
log "Sending report to ${REPORT_EMAIL}..."

BOUNDARY="boundary_$(date +%s)"
CSV_BASENAME=$(basename "${CSV_FILE}")
CSV_B64=$(base64 -w 0 "${CSV_FILE}")

sendmail -t <<MAIL
To: ${REPORT_EMAIL}
From: ${FROM_EMAIL}
Subject: [AWS] EC2 Underutilization Report — $(date '+%Y-%m-%d') (${REGION})
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="${BOUNDARY}"

--${BOUNDARY}
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

$(cat "${HTML_BODY}")

--${BOUNDARY}
Content-Type: text/csv; name="${CSV_BASENAME}"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="${CSV_BASENAME}"

${CSV_B64}
--${BOUNDARY}--
MAIL

log "Report sent successfully."
log "CSV saved to: ${CSV_FILE}"
log "=== Report complete ==="

IAM Policy — Least Privilege

Attach this policy to the EC2 instance profile running the script. It covers exactly what's needed — describe instances, read CloudWatch metrics, nothing else.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EC2Describe",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CloudWatchReadMetrics",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}
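Before attaching it, a quick grep can sanity-check that the policy stays read-only: every action should be a Describe/Get/List call. A sketch, assuming you save the JSON above to a temp file first:

```shell
#!/bin/bash
# Sanity check: count policy actions that are NOT read-only (expect 0).
cat > /tmp/ec2-report-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EC2Describe",
      "Effect": "Allow",
      "Action": ["ec2:DescribeInstances", "ec2:DescribeTags"],
      "Resource": "*"
    },
    {
      "Sid": "CloudWatchReadMetrics",
      "Effect": "Allow",
      "Action": ["cloudwatch:GetMetricStatistics", "cloudwatch:ListMetrics"],
      "Resource": "*"
    }
  ]
}
EOF

# Pull out every "service:Action" string, then count the ones that are
# not Describe*/Get*/List*. grep -c exits 1 on zero matches, hence || true.
NON_READONLY=$(grep -oE '"[a-z0-9]+:[A-Za-z]+"' /tmp/ec2-report-policy.json \
  | { grep -vcE ':(Describe|Get|List)' || true; })
echo "Non-read-only actions: ${NON_READONLY}"   # → Non-read-only actions: 0
```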

Enabling Memory Metrics with CloudWatch Agent

By default, CloudWatch has no memory data for EC2 — memory usage is not exposed by the hypervisor. You need the CloudWatch Agent running inside the instance to push mem_used_percent. Without it, the script shows "N/A" for memory — still useful, just without memory signal.

[Diagram: CloudWatch Agent — Memory Metric Flow. The agent on the EC2 instance reads /proc/meminfo and publishes mem_used_percent to CloudWatch under the CWAgent namespace with an InstanceId dimension; the Bash script then queries it via get-metric-statistics for the 7-day average. Without CWAgent, the script reports N/A for memory.]
# Quick install + configure CloudWatch Agent for memory metrics

# 1. Install
sudo yum install -y amazon-cloudwatch-agent   # Amazon Linux
# sudo apt install -y amazon-cloudwatch-agent  # Ubuntu/Debian

# 2. Minimal config — writes mem_used_percent every 60s
sudo tee /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json > /dev/null <<'EOF'
{
  "metrics": {
    "namespace": "CWAgent",
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      }
    },
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    }
  }
}
EOF

# 3. Start and enable
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config \
  -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json \
  -s

# Verify it's running
sudo systemctl status amazon-cloudwatch-agent

Deploy & Schedule

Get the script onto your reporting EC2 instance, make it executable, then wire it into cron for daily delivery:

# 1. Create directory and drop script in place
sudo mkdir -p /opt/ec2-report
sudo curl -o /opt/ec2-report/ec2_underutilized_report.sh [your-script-url]
sudo chmod +x /opt/ec2-report/ec2_underutilized_report.sh

# 2. Test run first — check output before scheduling
/opt/ec2-report/ec2_underutilized_report.sh

# Watch logs in real-time during test
tail -f /opt/ec2-report/ec2_report.log

# 3. Schedule via cron: every day at 8am
# (crontab -e, then add this line. The script already tees its own output
#  into ec2_report.log, so capture cron's stdout in a separate file to
#  avoid every log line appearing twice.)
0 8 * * * /opt/ec2-report/ec2_underutilized_report.sh >> /opt/ec2-report/cron.log 2>&1

# 4. Verify cron registered the job
crontab -l
⚠️ Trust the averages only once an instance has a full 7 days of CloudWatch history. A newly launched instance has just a few days of datapoints, so its "7-day average" actually covers a much shorter window and can mislead in either direction. The thresholds are most reliable after two weeks of data.
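One operational detail worth handling up front: the script appends to ec2_report.log on every run and writes a dated CSV each day, so /opt/ec2-report grows unbounded. A logrotate sketch keeps the log in check (the path matches the script's defaults; the weekly/8-rotation retention is an assumption to tune):

```
# /etc/logrotate.d/ec2-report: rotate the report log weekly, keep 8 weeks
/opt/ec2-report/ec2_report.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
}
```

Old CSVs can be pruned with a similar cadence, e.g. a find-based cleanup in the same cron schedule.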

What the Report Looks Like

Every morning, the team receives an email with a table like this. Rows are colour-coded by severity — red for critical, yellow for warnings, green for healthy. The CSV attachment lets you paste straight into a spreadsheet to track trends over weeks.

Sample HTML Report Output — Colour-Coded by Severity
From: aws-reports@yourcompany.com
Subject: [AWS] EC2 Underutilization Report — 2025-03-06

Instance ID    Name                Type        CPU Avg   Mem Avg   Status     Recommendation
i-0a1b2c3d4e   old-staging-api     m5.xlarge   2.31%     4.18%     Critical   Stop/Terminate
i-0f9e8d7c6b   prod-worker-2       c5.2xlarge  7.84%     15.22%    Warning    Downsize type
i-0e5d4c3b2a   prod-api-primary    c5.xlarge   38.56%    54.11%    Healthy    No action needed
i-0d4c3b2a1f   test-batch-runner   t3.medium   3.20%     N/A       Critical   Stop/Terminate

📎 Attachment: ec2_underutilized_20250306.csv

What You Get

🔍
CPU + Memory Dual Signal
CPU alone lies. An instance processing batch jobs can spike to 80% for 2 minutes then idle for an hour — yet average 8%. Memory gives you a second signal that confirms whether the instance is genuinely underloaded.
📧
HTML Email + CSV Attachment
The HTML table gives instant visual triage — red rows stand out immediately. The CSV attachment lets you paste into Sheets or Excel to track which instances have been actioned, week-over-week cost savings, and trend patterns.
⚙️
Configurable Thresholds
Four constants at the top of the script. Development environments might warrant a 20% CPU warning threshold; production billing services might need stricter checks. No code changes — just edit the variables.
📝
Full Audit Log
Every run appends timestamped entries to ec2_report.log. When your finance team asks why a specific instance was terminated last Tuesday, the log has the metric values that triggered the recommendation.
🛡️
Graceful Degradation
Instances without the CloudWatch Agent show "N/A" for memory and are still evaluated on CPU alone. The script never crashes on missing metrics — it reports what it can and keeps moving.
⏰
Cron-Scheduled
One crontab line runs the full pipeline every morning. Reports land in your inbox before standup. Engineers arrive knowing exactly which instances to action that day — no manual checks, no forgotten instances.
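On the dual-signal point above: the bursty-workload effect is easy to demonstrate with plain arithmetic. One 80% spike among idle samples still averages near 11%. (If you want the peak as an extra column in the report, get-metric-statistics accepts multiple statistics in one call, e.g. --statistics Average Maximum.)

```shell
#!/bin/bash
# Illustration of the bursty-workload problem: one short spike plus long
# idle stretches still averages low. Sample values are made up.
SAMPLES="80 2 1 3 2 1 2 1"

avg=$(echo "${SAMPLES}" | tr ' ' '\n' | awk '{s+=$1} END {printf "%.2f", s/NR}')
max=$(echo "${SAMPLES}" | tr ' ' '\n' | sort -n | tail -1)

echo "avg=${avg}% max=${max}%"   # → avg=11.50% max=80%
```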
💡 Want to go further? Pipe the CSV into a Slack webhook so the report drops into your #finops channel. Or feed it into a DynamoDB table and build a week-over-week savings tracker. The script is a data source — the HTML email is just the simplest output format to start with.
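Here's one way that Slack hand-off could look: summarize the day's CSV and POST it to an incoming webhook. Everything below is a sketch — `SLACK_WEBHOOK_URL` and `build_summary` are assumptions, not part of the script above:

```shell
#!/bin/bash
# Sketch: post a one-line summary of the day's CSV to Slack.
# SLACK_WEBHOOK_URL is an assumption: create an incoming webhook in Slack
# and export the URL before the curl actually fires.
CSV_FILE="${1:-/opt/ec2-report/ec2_underutilized_$(date +%Y%m%d).csv}"

build_summary() {
  local csv="$1"
  local crit warn
  # Severity is column 6 of the CSV; NR>1 skips the header row
  crit=$(awk -F',' 'NR>1 && $6=="critical" {n++} END {print n+0}' "${csv}")
  warn=$(awk -F',' 'NR>1 && $6=="warning"  {n++} END {print n+0}' "${csv}")
  echo "EC2 report: ${crit} critical, ${warn} warning instance(s)"
}

if [[ -f "${CSV_FILE}" ]]; then
  SUMMARY=$(build_summary "${CSV_FILE}")
  echo "${SUMMARY}"
  if [[ -n "${SLACK_WEBHOOK_URL:-}" ]]; then
    curl -fsS -X POST -H 'Content-Type: application/json' \
      -d "{\"text\": \"${SUMMARY}\"}" "${SLACK_WEBHOOK_URL}"
  fi
fi
```

Hook it in by appending one line to the report script after the CSV is written, or as a second cron entry a few minutes after the first.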