Azure Advisor vs. a Real Cost Review

Logan Hemphill
Mar 17

The short version: out of the box, Azure Advisor recommends VM downsizes based on 7 days of average utilization. A real cost review uses 90 days of data, looks at P95 instead of averages, checks application requirements before changing anything, and accounts for seasonal workload patterns. Advisor tells you which VMs to investigate. A cost review tells you which ones are actually safe to resize, which ones need more analysis, and which ones you should leave alone.

Every Azure environment with VMs gets right sizing recommendations under the Cost category in Advisor. By default, Advisor looks at CPU and memory utilization over the last 7 days, compares it against the SKU you're running, and recommends a smaller VM if utilization is low. It even shows an estimated monthly savings number next to each recommendation.

Most teams I talk to handle this one of two ways. They either ignore it entirely because nobody has time, or they click "Apply" on every recommendation because the savings numbers look good. Both approaches leave money on the table. The first obviously. The second because blindly following Advisor creates outages that cost more to fix than the savings were worth.

This article breaks down what Advisor actually gives you, what it misses, and what a proper cost review process looks like when you compare the two side by side.

What Azure Advisor Actually Gives You

Advisor evaluates your VM utilization over a 7 day lookback window by default (the window can be extended to 14, 21, 30, 60, or 90 days in Advisor's configuration, but few teams change it). It checks average CPU utilization and, depending on the recommendation type, memory and network metrics. If utilization falls below a threshold, it recommends shutting down the VM or resizing it to a smaller SKU.

Each recommendation includes an estimated monthly savings figure. In environments with dozens or hundreds of VMs, these numbers can add up to tens of thousands per month.

That savings estimate is what gets people's attention. It's also where the problems start.

Azure Advisor's right sizing recommendations. The estimated savings numbers are what get teams to click "Apply" without looking deeper.

Advisor is good at two things. It finds VMs that are clearly idle, sitting at 0 to 2% CPU for weeks because someone forgot about them. And it gives you a prioritized list when you have 500 VMs and don't know where to start looking. It is free. It is already there. For teams with zero cost optimization practice, it is a reasonable place to begin.

The problem is when "a reasonable place to begin" becomes the entire strategy.

Where Advisor Stops and a Real Cost Review Starts

7 Days of Data vs. 90 Days of Data

What Advisor does: Looks at the last 7 days of average utilization.

What a cost review does: Pulls 90 days of utilization data to capture weekly cycles, monthly cycles, and seasonal patterns before making any recommendation.

Seven days of data tells you what happened last week. It does not tell you what your workload actually looks like.

A VM that runs batch processing every Saturday night might sit at 5% CPU Monday through Friday and spike to 95% on the weekend. A 7 day window does include that Saturday, but a few busy hours barely move a weekly average that is dominated by six quiet days. The VM looks idle and gets flagged for a downsize. Follow that recommendation and your Saturday batch jobs fail.
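To see how a weekly average hides that batch window, here is a minimal sketch. The numbers are assumptions for illustration (5% idle, 95% during a six hour Saturday night batch), not measurements from a real VM.

```python
from statistics import mean

def weekly_cpu_samples(idle_pct=5.0, batch_pct=95.0, batch_hours=6):
    """Hourly CPU samples for one week: idle except a Saturday-night batch window."""
    samples = [idle_pct] * (7 * 24)
    saturday_start = 5 * 24 + 20  # Saturday 20:00 when the week starts on Monday
    for hour in range(saturday_start, saturday_start + batch_hours):
        samples[hour] = batch_pct
    return samples

samples = weekly_cpu_samples()
weekly_average = mean(samples)  # about 8.2: looks like an idle VM
weekly_peak = max(samples)      # 95.0: what the batch job actually needs
```

An average of roughly 8% and a peak of 95% describe the same week. Only one of those numbers tells you whether the batch job will survive a downsize.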

The same problem applies to monthly cycles. Finance workloads spike at month end. Reporting servers spike during quarterly closes. Marketing workloads spike during campaign launches. A 7 day window misses all of it.

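If your VMs send performance counters to a Log Analytics workspace, and that workspace retains at least 90 days of data, you can pull 90 day utilization data from Azure Monitor with a KQL query: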

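// Daily average, max, and P95 CPU per VM over the last 90 days
Perf
| where TimeGenerated > ago(90d)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
// Keep the all-core total so per-core instances do not skew the statistics
| where InstanceName == "_Total"
| summarize AvgCPU = avg(CounterValue), MaxCPU = max(CounterValue), P95CPU = percentile(CounterValue, 95) by Computer, bin(TimeGenerated, 1d)
| order by Computer asc, TimeGenerated asc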

This gives you daily average, max, and 95th percentile CPU for every VM over 90 days. That is the minimum amount of data you need before making a resize decision.

View your 90 day CPU data in Log Analytics. Notice the spikes that a 7 day average would smooth over completely.

Averages vs. Percentiles

What Advisor does: Uses average utilization to decide if a VM is oversized.

What a cost review does: Uses P95 (95th percentile) to understand what the workload actually needs during its busiest moments.

Averages are the worst possible metric for right sizing decisions.

A server averaging 40% memory utilization sounds like it has room to shrink. But if that server hits 92% during peak hours twice a day, a downsize will cause out of memory crashes in production. The average looks fine. The peaks tell a different story.

CPU averages have the same problem. A VM averaging 15% CPU might be running a web application that spikes to 80% during request bursts. Average utilization says "oversized." Actual workload behavior says "this VM needs the capacity it has."

P95 is the level that utilization stays at or below 95% of the time. Only the busiest 5% of samples sit above it. If P95 is low, you have real headroom to resize. If P95 is high and the average is low, the VM is bursty and needs its current size.
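If you want to check the same thing outside Log Analytics, say on an exported spreadsheet, a rough sketch looks like this. It uses the nearest-rank definition of a percentile, and the `avg_ceiling` and `p95_floor` thresholds are illustrative assumptions, not Azure defaults.

```python
import math
from statistics import mean

def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def is_bursty(values, avg_ceiling=20.0, p95_floor=70.0):
    """Low average plus high P95 suggests a bursty workload (thresholds are assumptions)."""
    return mean(values) < avg_ceiling and percentile(values, 95) >= p95_floor

steady = [12.0] * 100                # flat 12% CPU
bursty = [8.0] * 90 + [80.0] * 10    # averages 15.2%, but P95 is 80%
```

Both sample sets have a low average. Only the second one would break if you halved the VM.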

For memory, the query looks like this:

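// Daily average, max, and P95 memory per VM over the last 90 days
// "% Committed Bytes In Use" is the Windows counter; Linux agents report "% Used Memory"
Perf
| where TimeGenerated > ago(90d)
| where ObjectName == "Memory" and CounterName == "% Committed Bytes In Use"
| summarize AvgMem = avg(CounterValue), MaxMem = max(CounterValue), P95Mem = percentile(CounterValue, 95) by Computer, bin(TimeGenerated, 1d)
| order by Computer asc, TimeGenerated asc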

If P95 memory is above 80%, do not downsize that VM regardless of what the average says.
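The same rule can be applied to the size you are moving to, not just the size you are on. The sketch below scales an observed P95 to a smaller SKU's memory and reuses the 80% ceiling. Treating utilization as scaling linearly with memory is a simplifying assumption, and the function names are mine.

```python
def projected_utilization(p95_pct, current_gib, target_gib):
    """Scale an observed P95 percentage to the capacity of a smaller SKU."""
    if target_gib <= 0:
        raise ValueError("target_gib must be positive")
    return p95_pct * current_gib / target_gib

def memory_safe_to_downsize(p95_pct, current_gib, target_gib, ceiling_pct=80.0):
    """True when projected P95 memory on the target SKU stays at or below the ceiling."""
    return projected_utilization(p95_pct, current_gib, target_gib) <= ceiling_pct

# 45% P95 on 16 GiB becomes 90% on 8 GiB, so this downsize is not safe
risky = memory_safe_to_downsize(45.0, current_gib=16, target_gib=8)
# 30% P95 on 16 GiB becomes 60% on 8 GiB, which leaves headroom
fine = memory_safe_to_downsize(30.0, current_gib=16, target_gib=8)
```

A VM at 45% P95 memory looks half empty until you do the arithmetic for the smaller size.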

Utilization Numbers vs. Application Context

What Advisor does: Sees CPU and memory numbers. Recommends a smaller SKU if they're low.

What a cost review does: Checks what the VM is actually running, what SKU capabilities the application depends on, and whether the constraint is CPU, memory, disk, or network.

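A D4s_v3 running a database might show low CPU but rely on its fast local SSD (temp disk) for tempdb performance. Resize it to a size that has no temp disk (many newer sizes, such as D2s_v5, drop it) or to a burstable B series VM that throttles once its CPU credits run out, and your database performance tanks.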

An application server might need accelerated networking, which is only available on certain VM families and sizes. Advisor doesn't check for this. It sees low CPU and recommends a smaller SKU that might not support the networking your application depends on.

Some workloads are memory bound. A VM running an in memory cache or a Java application with a large heap might show 8% CPU and 85% memory. Advisor's CPU based recommendation will tell you to shrink a VM that is already the right size for its actual workload.

The VM overview blade shows what capabilities your current SKU supports. Check this before resizing to confirm the new SKU still covers what your application needs.
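One way to make that check repeatable is to write down what each application needs and compare it against the target size before anyone resizes anything. The capability names and SKU entries below are hypothetical placeholders. Fill them in from the real specifications for your sizes.

```python
def missing_capabilities(required, target_sku_capabilities):
    """Return the required capabilities the target SKU does not provide."""
    return sorted(set(required) - set(target_sku_capabilities))

# Hypothetical capability data for illustration only; look up real values for your SKUs.
SKU_CAPABILITIES = {
    "size-current": {"temp_disk", "premium_storage", "accelerated_networking"},
    "size-smaller": {"premium_storage"},
}

required_by_app = {"temp_disk", "premium_storage"}
gaps = missing_capabilities(required_by_app, SKU_CAPABILITIES["size-smaller"])
```

An empty list means the smaller size covers what the application depends on. Anything else moves the VM into the "needs investigation" pile.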

Point in Time Snapshot vs. Business Calendar Awareness

What Advisor does: Evaluates only its recent lookback window, with no awareness of when your busy season is.

What a cost review does: Accounts for seasonal patterns so you don't downsize in January and break things in March.

If you run Advisor in January and your business peaks in March, you're making resize decisions based on your quietest month. Downsize everything in January and you'll spend March scrambling to upsize it all while users complain about performance.

This is common in retail (holiday spikes), healthcare (open enrollment), education (registration periods), and financial services (quarter end processing). The workload is genuinely low most of the year. That doesn't mean the capacity isn't needed.

For workloads with predictable seasonal patterns, autoscaling is a better answer than static right sizing. Scale up when you need it, scale down when you don't, and stop paying for peak capacity year round.
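Even 90 days of data can miss a seasonal peak if the window ends in the wrong month. A small sketch of that check, assuming you already know your peak months from the business calendar (the function names are mine):

```python
def months_in_window(end_month, window_months=3):
    """Calendar months (1-12) covered by a lookback window ending in end_month."""
    return {(end_month - offset - 1) % 12 + 1 for offset in range(window_months)}

def window_misses_peak(review_month, peak_months, window_months=3):
    """True when none of the known peak months fall inside the lookback window."""
    return not (months_in_window(review_month, window_months) & set(peak_months))

# A January review covers November through January and never sees a March peak
january_review_blind = window_misses_peak(review_month=1, peak_months=[3])
# An April review covers February through April, so March is in the data
april_review_blind = window_misses_peak(review_month=4, peak_months=[3])
```

If the check comes back true, the utilization data is not enough on its own. Pull last year's peak period or ask the application owner before resizing.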

Individual VMs vs. System Dependencies

What Advisor does: Evaluates each VM independently based on its own utilization metrics.

What a cost review does: Maps dependencies between resources before recommending any changes.

Your environment is a system. Advisor doesn't see it that way.

Downsizing one VM in a load balanced cluster leaves it receiving the same share of traffic with less capacity, so it becomes the bottleneck. If it starts failing health probes, its load shifts onto the remaining nodes. Shrinking a database server can slow queries that affect every application server connected to it. Resizing a domain controller during a busy period can cause authentication delays across the network.

If a VM is part of a cluster, the cluster needs to be evaluated as a whole. If a VM runs a shared service like DNS, authentication, or a database, any performance change will ripple outward to everything that depends on it.
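For a load balanced cluster, evaluating it as a whole can be as simple as the arithmetic below. It assumes the load balancer splits traffic evenly and measures load and capacity in the same unit (vCPUs here). Both are simplifications.

```python
def cluster_peak_utilization(total_peak_load, node_capacities):
    """Per-node utilization (%) at peak, assuming traffic is split evenly across nodes."""
    share = total_peak_load / len(node_capacities)
    return [round(100 * share / capacity, 1) for capacity in node_capacities]

# Three 4 vCPU nodes sharing a peak load worth 6 vCPUs
before = cluster_peak_utilization(total_peak_load=6.0, node_capacities=[4, 4, 4])
# Same load after one node is downsized to 2 vCPUs
after = cluster_peak_utilization(total_peak_load=6.0, node_capacities=[4, 4, 2])
```

Before the change every node peaks at 50%. After it, the downsized node peaks at 100% while the other two still look fine, which is exactly the kind of problem a per-VM view hides.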

The Process Behind a Real Cost Review

Right sizing saves real money. In most mid market Azure environments, 20 to 40% of spend is wasted on oversized resources. That can be tens of thousands of dollars per month.

But the savings need to stick. Here's the process that gets results without creating new problems.

Step 1: Export Your VM Inventory

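Get a full list of every VM, its SKU, its monthly cost, and the subscription and resource group it lives in. Azure Resource Graph makes the inventory part easy. It does not return cost, so pull monthly cost per resource from Cost Management and match it on the resource ID:

resources
| where type =~ "microsoft.compute/virtualmachines"
| extend vmSize = tostring(properties.hardwareProfile.vmSize)
// id is the full resource ID, which is the key for matching cost data
| project id, name, resourceGroup, subscriptionId, vmSize, location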

Resource Graph Explorer gives you a full VM inventory in seconds. Export this and pair it with your utilization data for a complete right sizing worksheet.

Step 2: Pull 90 Day Utilization Data

Use the KQL queries from earlier in this article to pull CPU and memory utilization for every VM over 90 days. Look at averages, P95, and max values. Export this to a spreadsheet alongside your inventory so you can see cost and utilization side by side.
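If you would rather script the side by side view than build it by hand, a minimal sketch is below. It assumes two CSV exports: the inventory from Step 1, and a utilization file already summarized to one row per VM with `Computer`, `P95CPU`, and `P95Mem` columns. Matching on VM name is also an assumption. `Computer` can be a fully qualified name in some environments.

```python
import csv
import io

def build_worksheet(inventory_csv, utilization_csv):
    """Join inventory rows to utilization rows on VM name (case-insensitive)."""
    utilization = {
        row["Computer"].lower(): row
        for row in csv.DictReader(io.StringIO(utilization_csv))
    }
    worksheet = []
    for vm in csv.DictReader(io.StringIO(inventory_csv)):
        usage = utilization.get(vm["name"].lower())
        worksheet.append({
            "name": vm["name"],
            "vmSize": vm["vmSize"],
            # None marks a VM with no utilization data: do not resize it blind
            "p95_cpu": float(usage["P95CPU"]) if usage else None,
            "p95_mem": float(usage["P95Mem"]) if usage else None,
        })
    return worksheet

inventory = "name,resourceGroup,vmSize\nweb-01,rg-prod,Standard_D4s_v3\nbatch-01,rg-prod,Standard_D8s_v3\n"
usage_export = "Computer,P95CPU,P95Mem\nWEB-01,22.5,41.0\n"
rows = build_worksheet(inventory, usage_export)
```

VMs that come back with no utilization data are worth a look on their own. They are usually not reporting to the workspace at all.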

Step 3: Categorize Before You Resize

Sort your VMs into three categories:

Clear downsizes. Low average CPU, low P95 CPU, low memory, no special SKU requirements, no seasonal patterns. These are safe to resize. Do them first.

Needs investigation. Low averages but high peaks, special SKU requirements, part of a cluster, or seasonal workloads. These need application context before you make a decision.

Already right sized. Utilization is healthy across all metrics. Leave these alone.

Sort every VM into one of these three categories before making any changes. The "needs investigation" bucket is where most of your time should go.
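The three buckets translate into a few lines of logic. This is a sketch, not a standard: the thresholds (`low_avg`, `low_p95`, `low_mem`) are illustrative assumptions you should tune, and the context flags have to come from people who know the applications.

```python
from dataclasses import dataclass

@dataclass
class VmProfile:
    name: str
    avg_cpu: float
    p95_cpu: float
    p95_mem: float
    special_sku_needs: bool = False
    in_cluster: bool = False
    seasonal: bool = False

def categorize(vm, low_avg=20.0, low_p95=40.0, low_mem=50.0):
    """Sort a VM into one of the three buckets; thresholds are illustrative assumptions."""
    has_context_risk = vm.special_sku_needs or vm.in_cluster or vm.seasonal
    low_average = vm.avg_cpu < low_avg
    low_peaks = vm.p95_cpu < low_p95 and vm.p95_mem < low_mem
    if low_average and low_peaks and not has_context_risk:
        return "clear downsize"
    if low_average:
        # Looks oversized on average, but peaks or context say check first
        return "needs investigation"
    return "already right sized"

idle = VmProfile("idle-01", avg_cpu=3, p95_cpu=8, p95_mem=20)
bursty = VmProfile("web-01", avg_cpu=15, p95_cpu=80, p95_mem=30)
clustered = VmProfile("app-02", avg_cpu=5, p95_cpu=10, p95_mem=20, in_cluster=True)
busy = VmProfile("sql-01", avg_cpu=55, p95_cpu=85, p95_mem=70)
```

Note that a clustered VM with low numbers everywhere still lands in "needs investigation". The metrics alone never promote a VM to "clear downsize" when a context flag is set.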

Step 4: Resize with a Rollback Plan

When you resize a VM, know how to put it back. Document the original SKU. Resize during a maintenance window, because changing the size of a running VM restarts it. Monitor performance for 48 to 72 hours after the change. If something degrades, roll back immediately.
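A rollback plan can be as small as a record of what you changed plus a rule for when to undo it. The sketch below uses P95 CPU during the watch period as the trigger, with an 80% ceiling as an assumption. In practice you would also watch memory and application level metrics.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResizeRecord:
    vm_name: str
    original_sku: str  # documented before the change so rollback is unambiguous
    new_sku: str

def rollback_target(record, observed_p95_cpu, ceiling_pct=80.0):
    """Return the SKU to restore if P95 CPU after the resize exceeds the ceiling, else None."""
    if observed_p95_cpu > ceiling_pct:
        return record.original_sku
    return None

record = ResizeRecord("web-01", original_sku="Standard_D4s_v3", new_sku="Standard_D2s_v3")
restore_to = rollback_target(record, observed_p95_cpu=91.0)  # over the ceiling
keep_change = rollback_target(record, observed_p95_cpu=45.0)  # within the ceiling
```

Writing the rule down before the change matters more than the exact threshold. It turns "something feels slow" into a decision someone can make at 2 a.m.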

Right sizing is not permanent. Workloads change. A VM that's correctly sized today might need adjustment in six months. Build a quarterly review into your operations calendar so right sizing becomes a recurring discipline, not a one time cleanup.

When Advisor Is Enough and When It Isn't

Advisor is enough when you have a small environment with stable workloads, no seasonal patterns, and standard VM configurations. If you're running 10 VMs that all do the same thing year round, Advisor's recommendations are probably fine.

Advisor is not enough when you have dozens or hundreds of VMs across multiple environments with mixed workloads, clustered resources, seasonal traffic patterns, and VMs that depend on specific SKU capabilities. That describes most mid market Azure environments.

The gap between those two scenarios is where most companies lose money. They have the complex environment but they're using the simple tool. Advisor flags 40 VMs for downsizing. Maybe 15 of those are safe to resize. Maybe 10 need investigation first. Maybe 15 should be left alone. Without the process to sort them, you either skip all 40 or gamble on all 40.

I help mid market companies close the gap between what Advisor recommends and what's actually safe to do. If your team has Advisor recommendations sitting untouched because nobody's sure which ones to act on, I do free 30 minute Azure cost reviews.
