You Spent Six Months on Your Azure Landing Zone and Skipped the Parts That Matter

Logan Hemphill
Apr 2

Your landing zone has a perfect management group hierarchy. Hub-and-spoke networking with a centralized firewall. Subscription vending. Custom Bicep modules for everything. And six months after your first production workload landed, nobody can tell you what it costs, who owns it, or why there's a VM running in East US 2 that no one recognizes.

I've reviewed landing zones for organizations that spent $200K+ on consultants to build the scaffold. The networking is pristine. The management groups look like the Microsoft reference architecture diagram. But there's no budget alert on any subscription. No tag enforcement beyond a "best effort" wiki page. No cost ownership model. And the monitoring setup is whatever the first workload team configured on their own.

Teams obsess over the 80% of the landing zone that consultants love to diagram and skip the 20% that actually determines whether the platform works at scale.

The design areas nobody wants to talk about

Microsoft's Cloud Adoption Framework defines eight design areas for landing zones: billing and tenant setup, identity and access management, resource organization, network topology, security, management, governance, and platform automation. Most teams nail the first four. They're concrete, visual, and easy to scope. You can draw them on a whiteboard.

The last four are where things fall apart. Management (monitoring and alerting), governance (policy, tagging, cost controls), security (beyond NSGs), and platform automation (keeping the landing zone itself up to date) are less exciting to plan and harder to get right. They require ongoing decisions, not one-time designs. So teams defer them.

The result is predictable. Six months in, the platform has grown in ways nobody anticipated. Three teams have deployed workloads with different naming conventions. Nobody set up budget alerts, so the first time leadership hears about cloud costs is when the invoice arrives 30% higher than expected. Monitoring is a patchwork of individually configured Application Insights instances with no centralized alerting. And the "we'll add tagging later" decision has turned into 500 resources with no cost attribution.

Cost governance isn't a phase two item

I hear this constantly: "We'll add cost controls after we finish the landing zone build." That's backwards. Cost governance is the landing zone build. Without it, you're handing teams a credit card with no spending limits and no way to track who bought what.

Here's what a functional cost governance setup looks like in an Azure landing zone, and none of this is hard to implement:

Budget alerts on every subscription. You can deploy these with the Azure CLI in under five minutes per subscription:

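# Subscription-scope budget with notifications, sent to the Consumption budgets REST API.
# The `az consumption budget create` command does not appear to accept notification
# settings, so this goes through `az rest` instead.
SUBSCRIPTION_ID=$(az account show --query id -o tsv)

az rest --method put \
  --url "https://management.azure.com/subscriptions/${SUBSCRIPTION_ID}/providers/Microsoft.Consumption/budgets/monthly-budget?api-version=2023-05-01" \
  --body '{
    "properties": {
      "category": "Cost",
      "amount": 5000,
      "timeGrain": "Monthly",
      "timePeriod": {
        "startDate": "2026-04-01T00:00:00Z",
        "endDate": "2027-04-01T00:00:00Z"
      },
      "notifications": {
        "Actual_GreaterThan_80_Percent": {
          "enabled": true,
          "operator": "GreaterThan",
          "threshold": 80,
          "contactEmails": ["cloudteam@yourcompany.com"],
          "thresholdType": "Actual"
        },
        "Forecasted_GreaterThan_100_Percent": {
          "enabled": true,
          "operator": "GreaterThan",
          "threshold": 100,
          "contactEmails": ["cloudteam@yourcompany.com", "finance@yourcompany.com"],
          "thresholdType": "Forecasted"
        }
      }
    }
  }'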

This creates a $5,000 monthly budget that emails your cloud team when actual spend hits 80% ($4,000) and emails both cloud and finance when the forecast says you'll exceed 100%. Adjust the amount per subscription. The point is that it exists from day one, not after the first bill shock.

Tag enforcement through Azure Policy. Not a wiki page. Not a Confluence doc that says "please use these tags." An actual Deny policy assigned at the management group level that prevents anyone from deploying a resource without the required tags.

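The built-in policy "Require a tag on resources" handles this. (Its sibling, "Require a tag and its value on resources", also pins the tag to one specific value, which is rarely what you want for CostCenter or Owner.) Each assignment covers one tag name, so assign it once per critical tag (CostCenter, Environment, Owner at minimum) at the top-level management group. Teams will complain for about a week. Then it becomes muscle memory.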
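Here is a rough sketch of those assignments with the Azure CLI. It uses the built-in "Require a tag on resources" definition, which checks only that the tag exists and takes a single tagName parameter, so there is one assignment per tag. The management group ID contoso-root and the req-tag- naming prefix are placeholders I made up for illustration. The definition is looked up by display name so no GUID is hard-coded.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder management group ID (an assumption); replace with your own.
MG_ID="${MG_ID:-contoso-root}"
SCOPE="/providers/Microsoft.Management/managementGroups/${MG_ID}"
REQUIRED_TAGS="CostCenter Environment Owner"

# Build the parameter JSON for one tag. The built-in definition
# "Require a tag on resources" takes a single parameter, tagName.
tag_params() {
  printf '{"tagName":{"value":"%s"}}' "$1"
}

# Assignment names at management group scope are length-limited (24 characters
# to my knowledge, worth verifying), so the hypothetical prefix is kept short.
assignment_name() {
  printf 'req-tag-%s' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

assign_required_tags() {
  local definition tag
  # Look the definition up by display name instead of hard-coding its GUID.
  definition=$(az policy definition list \
    --query "[?displayName=='Require a tag on resources'].name | [0]" -o tsv)
  for tag in $REQUIRED_TAGS; do
    az policy assignment create \
      --name "$(assignment_name "$tag")" \
      --display-name "Require ${tag} tag on resources" \
      --policy "$definition" \
      --scope "$SCOPE" \
      --params "$(tag_params "$tag")"
  done
}

# Nothing touches Azure unless the script is run with --apply.
if [ "${1:-}" = "--apply" ]; then
  assign_required_tags
fi
```

Run it with MG_ID set and the --apply flag once you're happy with the tag list. Trying it against a sandbox management group first is a cheap way to see which pipelines break before the whole organization finds out.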

Cost ownership by subscription. Each application or team gets its own subscription. That subscription has a budget, an owner, and a tag that maps spend to a business unit. When the CFO asks why the bill went up $15K, you don't dig through 2,000 resources trying to figure out who deployed what. You look at the subscription-level cost trend and call the owner.
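To see how far an existing environment is from "every subscription has a budget", something like the following sketch can help. It assumes the signed-in account can read budgets in each subscription and that `az consumption budget list` (a preview command) behaves the same in your CLI version. The function names and the OK / NO_BUDGET labels are my own, not anything Azure defines.

```shell
#!/usr/bin/env bash
set -eu

# Turn "subscriptionId<TAB>budgetCount" lines on stdin into a report,
# flagging any subscription that has no budget at all.
report_budget_coverage() {
  awk -F'\t' '{ printf "%s\t%s\n", ($2 + 0 > 0 ? "OK" : "NO_BUDGET"), $1 }'
}

# Emit one "subscriptionId<TAB>budgetCount" line per enabled subscription.
count_budgets() {
  local sub count
  for sub in $(az account list --query "[?state=='Enabled'].id" -o tsv); do
    count=$(az consumption budget list --subscription "$sub" --query "length(@)" -o tsv)
    printf '%s\t%s\n' "$sub" "$count"
  done
}

# Nothing touches Azure unless the script is run with --run.
if [ "${1:-}" = "--run" ]; then
  count_budgets | report_budget_coverage
fi
```

Every NO_BUDGET line is a subscription where the first cost signal anyone gets is the invoice.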

Monitoring is the thing that saves you at 2am

I walked into an environment last year where the landing zone had been running for eight months. They'd deployed the Azure Monitor Agent to their VMs. That was it. No centralized Log Analytics workspace. No baseline alerts. No alert routing to an on-call team. A production database ran out of storage on a Saturday and nobody knew until customers called Monday morning.

The CAF reference architecture includes a management subscription with a centralized Log Analytics workspace. That's the minimum. But the workspace alone doesn't help. You need alerts, and you need them before the first workload goes live.

Microsoft published the Azure Monitor Baseline Alerts (AMBA) project specifically for this. It's a set of policy initiatives you deploy at the management group level that automatically create metric alerts for common Azure resources. VM CPU, disk space, memory. SQL database DTU consumption. Storage account availability. Key Vault request failures. The basics that catch 80% of production incidents.

You can deploy AMBA through the Azure Portal Accelerator, GitHub Actions, or Azure Pipelines. It works with both new and existing landing zones. There's no good reason to skip it, yet most environments I review haven't deployed it.

Here's an Azure Resource Graph query (KQL) you can run right now in Resource Graph Explorer to see how many metric alert rules each subscription has:

// Start from the subscription list so subscriptions with no alerts still show up.
resourcecontainers
| where type == "microsoft.resources/subscriptions"
| project subscriptionId, subscriptionName = name
| join kind=leftouter (
    resources
    | where type =~ "microsoft.insights/metricalerts"
    | summarize AlertCount = count() by subscriptionId
) on subscriptionId
// Subscriptions without a match come back null; report them as zero.
| project subscriptionName, AlertCount = coalesce(AlertCount, 0)
| order by AlertCount asc

If any production subscription shows a zero or single-digit alert count, you've got a gap. That's a subscription running workloads with no safety net.
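If you'd rather run that check on a schedule than eyeball it in the portal, a small wrapper along these lines is one option. It is a sketch under a few assumptions: the resource-graph CLI extension is installed (az extension add --name resource-graph), the query text is saved in a file I've called alert-coverage.kql, and the extension returns rows under a top-level data key. The threshold of 10 simply encodes "zero or single digit".

```shell
#!/usr/bin/env bash
set -eu

# Read "subscriptionName<TAB>AlertCount" lines on stdin and print the
# subscriptions whose count is below the threshold (default 10).
flag_alert_gaps() {
  awk -F'\t' -v min="${1:-10}" '$2 + 0 < min + 0 { printf "GAP\t%s\t%d\n", $1, $2 }'
}

# Run the query through the resource-graph CLI extension. QUERY_FILE is a
# hypothetical file holding the query text; the JMESPath expression assumes
# the extension returns rows under a top-level "data" key.
run_alert_query() {
  az graph query -q "$(cat "${QUERY_FILE:-alert-coverage.kql}")" --first 1000 \
    --query "data[].[subscriptionName, AlertCount]" -o tsv
}

# Nothing touches Azure unless the script is run with --run.
if [ "${1:-}" = "--run" ]; then
  run_alert_query | flag_alert_gaps 10
fi
```

Keeping the threshold logic in its own function means you can feed it exported results from anywhere, not just from the CLI.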

Identity is the design area that bites you hardest

In every security breach post-mortem I've read involving Azure, identity was the entry point. Not networking. Not a missing firewall rule. Someone had too much access, credentials were exposed, or a service principal had Owner permissions on a subscription because that's what "worked" during development and nobody scoped it down.

The landing zone identity design should answer three questions before the first workload deploys:

Who gets what access at what scope? Define your RBAC model at the management group and subscription level. Application teams should get Contributor on their subscription, not Owner. Platform teams get Reader across everything and specific elevated roles where needed. Nobody gets Owner at the management group level except break-glass accounts.

How do service principals and managed identities work? Every workload that calls Azure APIs needs an identity. Managed identities should be the default. Service principals with client secrets should be the exception, and those secrets should live in Key Vault with expiration policies. I still find environments where teams have service principals with Owner role and secrets that don't expire. That's a breach waiting to happen.

What does your Conditional Access baseline look like? At minimum: require MFA for all users, block legacy authentication, require compliant devices for admin access, and restrict access from risky sign-in locations. Microsoft Entra ID (formerly Azure AD) Conditional Access policies should be in place before anyone logs into the Azure portal.

The Terraform module you're using is about to be archived

One more thing most teams don't know yet. If you built your landing zone with the terraform-azurerm-caf-enterprise-scale module, it's now in extended support and gets archived on August 1, 2026. No new features. Only bug fixes and policy library updates until then.

Microsoft's replacement is Azure Verified Modules (AVM) for Platform Landing Zones. The new module is already generally available on the Terraform Registry as Azure/avm-ptn-alz/azurerm. Migration guidance is published at aka.ms/alz/tf/migrate, including tooling to generate Terraform import blocks so you can move state without recreating resources.

If you're planning landing zone work this year, start with AVM. If you're already on the old module, start planning the migration now. August 2026 sounds far away until you're trying to get change control approval in July.

For Bicep users: the classic ALZ Bicep modules have already been removed as of February 2026. Bicep AVM is the only supported path.

What to do this week

If your landing zone is already built and running, you aren't starting over. But you can close the gaps that most teams leave open:

Check whether every production subscription has a budget alert. If not, create one. The CLI command above works. Five minutes per subscription.

Check whether you've deployed AMBA or have equivalent baseline alerts. If your production subscriptions have no metric alerts, that's your next priority.

Run an Azure Resource Graph query to find resources without required tags:

resources
| where isempty(tostring(tags['CostCenter'])) or isempty(tostring(tags['Environment'])) or isempty(tostring(tags['Owner']))
| summarize UntaggedCount = count() by type
| order by UntaggedCount desc
| take 20

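If that list is long, assign a tag enforcement policy now. A Deny policy only blocks new deployments, so for existing resources pair it with a Modify policy (the built-in "Inherit a tag from the resource group if missing" is a common choice) and run a remediation task. Every day you wait, more untagged resources accumulate.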

Review your RBAC assignments. Look for service principals with Owner or Contributor at broad scopes. Look for user accounts with permanent privileged roles that should be behind Privileged Identity Management.
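For the service principal half of that review, a sketch like this can produce a first list to work from. It assumes "broad" means management group, subscription, or root scope, which is my working definition rather than an Azure concept, and it only sees assignments visible from the current subscription. The function names are illustrative.

```shell
#!/usr/bin/env bash
set -eu

# Classify a role assignment scope. Management group, subscription, and
# root scopes count as broad; resource group and below count as narrow.
scope_breadth() {
  local scope
  scope=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$scope" in
    /subscriptions/*/resourcegroups/*) echo narrow ;;
    /subscriptions/*|/providers/microsoft.management/managementgroups/*|/) echo broad ;;
    *) echo narrow ;;
  esac
}

# List Owner and Contributor assignments held by service principals under the
# current subscription and keep only the ones at a broad scope.
list_broad_sp_assignments() {
  az role assignment list --all \
    --query "[?principalType=='ServicePrincipal' && (roleDefinitionName=='Owner' || roleDefinitionName=='Contributor')].[scope, roleDefinitionName, principalId]" \
    -o tsv |
  while IFS="$(printf '\t')" read -r scope role principal; do
    if [ "$(scope_breadth "$scope")" = "broad" ]; then
      printf '%s\t%s\t%s\n' "$principal" "$role" "$scope"
    fi
  done
}

# Nothing touches Azure unless the script is run with --run.
if [ "${1:-}" = "--run" ]; then
  list_broad_sp_assignments
fi
```

Each line it prints is a principal ID worth tracing back to an owner and asking whether a resource group scope, or a managed identity, would do the job instead.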

None of these are six-month projects. Each one is a day or two of work that prevents months of cleanup later. The landing zone framework is a starting point. What you build on top of it is what separates environments that scale from environments that turn into archaeology projects.