MDP Databricks-Primary Architecture — First Steps

Context

This document covers the implementation of the Databricks-primary architecture (v8.0) on Azure for Greenfield's Modern Data Platform. Disaster recovery and dev/test environments are out of scope for now. All Terraform scripts are placed in /Implementation/Terraform.


Day 1 Tasks — Previous Conversation Summary

D1-T1: Subscription Provisioning

  • Terraform artifacts produced: providers.tf (AzureRM with subscription_id), backend.tf (state backend), variables.tf + terraform.tfvars (prod values).
  • Note: Subscription creation itself is a manual/ticket process — Terraform assumes it exists.

D1-T2: Resource Group Structure (6 RGs)

  • Terraform: networking/main.tf, lines 14-48
  • Resource groups created:
    • rg-mdp-prod-network
    • rg-mdp-prod-databricks
    • rg-mdp-prod-storage
    • rg-mdp-prod-governance
    • rg-mdp-prod-keyvault
    • rg-mdp-prod-monitoring
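
After terraform apply, the six resource groups listed above can be checked from the CLI. The following is a minimal sketch; the function name missing_resource_groups is hypothetical and not part of the Terraform artifacts.

```shell
# Hypothetical helper: prints every expected resource group that does not exist yet.
missing_resource_groups() {
  for suffix in network databricks storage governance keyvault monitoring; do
    name="rg-mdp-prod-${suffix}"
    # `az group exists` prints "true" or "false"
    if [ "$(az group exists --name "$name")" != "true" ]; then
      echo "$name"
    fi
  done
}
```

Calling missing_resource_groups with the production subscription selected should print nothing once all six groups are in place.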

D1-T3: VNet Design & Deployment (starts D1, finishes D2)

  • Terraform: networking/main.tf
  • Main VNet: 10.100.0.0/16 with 3 subnets:
    • dbx-host
    • dbx-container
    • private-endpoints
  • Transit VNet: 10.101.0.0/24
  • Peerings: main↔transit, optional main↔hub
  • NAT Gateway with public IP on both Databricks subnets
  • NSG with full inbound/outbound rule set
  • 4 private DNS zones with VNet links
  • Outputs wired in outputs.tf
  • Root module prod/main.tf calls the networking module
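
VNet peering requires that the peered address spaces do not overlap, so it can be worth confirming this for the main and transit ranges before applying. The sketch below is a plain-shell check under the assumption of IPv4 CIDR notation; the helper names are hypothetical.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  ip=$1
  a=${ip%%.*}; ip=${ip#*.}
  b=${ip%%.*}; ip=${ip#*.}
  c=${ip%%.*}; d=${ip#*.}
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Print the first and last address of a CIDR block as integers.
cidr_range() {
  base=$(ip_to_int "${1%/*}")
  prefix=${1#*/}
  size=$(( 1 << (32 - prefix) ))
  first=$(( base & ~(size - 1) ))
  echo "$first $(( first + size - 1 ))"
}

# Succeeds (exit 0) when the two CIDR blocks share at least one address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  # Two ranges overlap when each one starts before the other ends.
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For the ranges above, cidrs_overlap 10.100.0.0/16 10.101.0.0/24 returns a non-zero status, meaning the main and transit VNets can be peered. The same check can be applied to the hub VNet range if the optional hub peering is used.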

D1-T1: Manual Subscription Provisioning Guide

Before Terraform can run, the Azure subscription must exist. This is a manual process at Greenfield.

Step 1 — Request or Create the Subscription

Submit a request to Cloud Infra (ServiceNow/ITSM ticket) with these specifications:

Parameter          Value
Subscription name  greenfield-mdp-prod
Offer type         Enterprise Agreement (EA)
Billing account    Greenfield's EA enrollment
Region affinity    Canada Central
Management group   Place under Greenfield's MG hierarchy (e.g., mg-greenfield-data or mg-greenfield-prod)

Step 2 — Place Under the Correct Management Group

  • Go to Azure Portal → Management groups
  • Locate the target MG (e.g., mg-greenfield-prod or mg-greenfield-data-platforms)
  • Move the new subscription under it (or ensure Cloud Infra places it there at creation time)
  • Azure Policy assignments on that MG are inherited automatically

Step 3 — Register Required Resource Providers

az account set --subscription "greenfield-mdp-prod"

az provider register --namespace Microsoft.Databricks
az provider register --namespace Microsoft.Storage
az provider register --namespace Microsoft.KeyVault
az provider register --namespace Microsoft.Network
az provider register --namespace Microsoft.ManagedIdentity
az provider register --namespace Microsoft.Purview
az provider register --namespace Microsoft.OperationalInsights
az provider register --namespace Microsoft.Monitor
az provider register --namespace Microsoft.EventHub

Verify with:

az provider list --query "[?registrationState=='Registered'].namespace" -o table
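
Provider registration is asynchronous, so a namespace can stay in the Registering state for a while after the register call returns. A possible way to check only the nine namespaces from this step is sketched below; the function name pending_providers is hypothetical.

```shell
# Hypothetical helper: prints each required namespace that is not yet "Registered".
pending_providers() {
  for ns in Microsoft.Databricks Microsoft.Storage Microsoft.KeyVault \
            Microsoft.Network Microsoft.ManagedIdentity Microsoft.Purview \
            Microsoft.OperationalInsights Microsoft.Monitor Microsoft.EventHub; do
    state=$(az provider show --namespace "$ns" --query registrationState -o tsv)
    if [ "$state" != "Registered" ]; then
      echo "$ns: $state"
    fi
  done
}
```

An empty result means every namespace is ready. A simple wait loop could be: while [ -n "$(pending_providers)" ]; do sleep 15; done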

Step 4 — Set Up RBAC on the Subscription

Principal                              Role                       Scope
sp-mdp-terraform-prod (Terraform SP)   Contributor                Subscription
sp-mdp-terraform-prod                  User Access Administrator  Subscription
grp-mdp-account-admins                 Owner                      Subscription
grp-mdp-platform-engineers             Contributor                Subscription

SUB_ID=$(az account show --query id -o tsv)

az role assignment create --assignee <terraform-sp-app-id> \
  --role "Contributor" --scope "/subscriptions/$SUB_ID"

az role assignment create --assignee <terraform-sp-app-id> \
  --role "User Access Administrator" --scope "/subscriptions/$SUB_ID"
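
The commands above cover only the Terraform service principal. The two group assignments from the table could be created along the same lines. This is a sketch that assumes both Entra ID groups already exist and that the signed-in identity may read them; assign_group_role is a hypothetical helper name.

```shell
# Hypothetical helper: assign a built-in role to an Entra ID group at subscription scope.
assign_group_role() {
  group_name=$1
  role=$2
  sub_id=$(az account show --query id -o tsv)
  # Resolve the group's object ID (recent CLI versions expose it as "id";
  # older Graph-based versions used "objectId").
  group_id=$(az ad group show --group "$group_name" --query id -o tsv)
  az role assignment create \
    --assignee-object-id "$group_id" \
    --assignee-principal-type Group \
    --role "$role" \
    --scope "/subscriptions/$sub_id"
}
```

Usage would be assign_group_role grp-mdp-account-admins Owner followed by assign_group_role grp-mdp-platform-engineers Contributor.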

Step 5 — Create the Terraform State Storage Account

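Create the state storage account in a separate management/shared subscription, so that the Terraform state does not live in the subscription it manages (this avoids a circular dependency):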

az account set --subscription "<shared-infra-subscription>"

az group create --name rg-mdp-tfstate --location canadacentral

az storage account create \
  --name staccmdptfstate \
  --resource-group rg-mdp-tfstate \
  --location canadacentral \
  --sku Standard_GRS \
  --kind StorageV2 \
  --min-tls-version TLS1_2 \
  --allow-blob-public-access false

az storage container create \
  --name tfstate \
  --account-name staccmdptfstate

az storage account blob-service-properties update \
  --account-name staccmdptfstate \
  --enable-versioning true
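
Before pointing the backend at this account, the three hardening settings applied above can be read back. The sketch below assumes the property names minimumTlsVersion, allowBlobPublicAccess and isVersioningEnabled as returned by current CLI versions, and that tsv output prints booleans in lowercase; check_tfstate_account is a hypothetical helper.

```shell
# Hypothetical helper: succeeds only if the state account has TLS 1.2 as minimum,
# public blob access disabled, and blob versioning enabled.
check_tfstate_account() {
  acct=staccmdptfstate
  rg=rg-mdp-tfstate
  tls=$(az storage account show --name "$acct" --resource-group "$rg" \
        --query minimumTlsVersion -o tsv)
  public=$(az storage account show --name "$acct" --resource-group "$rg" \
        --query allowBlobPublicAccess -o tsv)
  versioning=$(az storage account blob-service-properties show \
        --account-name "$acct" --resource-group "$rg" \
        --query isVersioningEnabled -o tsv)
  [ "$tls" = "TLS1_2" ] && [ "$public" = "false" ] && [ "$versioning" = "true" ]
}
```

For example: check_tfstate_account && echo "tfstate account OK"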

Step 6 — Update terraform.tfvars

az account show --subscription "greenfield-mdp-prod" --query id -o tsv

Replace REPLACE-WITH-PROD-SUBSCRIPTION-ID in terraform.tfvars with the actual UUID.
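
The replacement can also be scripted. The following is a minimal sketch that assumes the placeholder appears literally in the file; set_subscription_id is a hypothetical helper name.

```shell
# Hypothetical helper: substitute the placeholder in a tfvars file with a subscription ID.
set_subscription_id() {
  tfvars=$1
  sub_id=$2
  # Refuse anything that is not formatted as a UUID.
  echo "$sub_id" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$' || {
    echo "not a UUID: $sub_id" >&2
    return 1
  }
  # Write to a temp file first so a failed substitution never truncates the original.
  sed "s/REPLACE-WITH-PROD-SUBSCRIPTION-ID/$sub_id/" "$tfvars" > "$tfvars.tmp" &&
    mv "$tfvars.tmp" "$tfvars"
}
```

Usage: set_subscription_id terraform.tfvars "$(az account show --subscription greenfield-mdp-prod --query id -o tsv)"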

Step 7 — Validate Terraform Authentication

cd environments/prod
az login --service-principal -u <app-id> -p <secret> --tenant <tenant-id>
terraform init
terraform plan
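
As an alternative to az login, the azurerm provider reads service principal credentials from the ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID and ARM_SUBSCRIPTION_ID environment variables. Depending on the provider version, a service principal session created through az login may not be picked up, so exporting these variables is likely the more reliable option and also keeps the secret off the command line. A sketch with a hypothetical helper name:

```shell
# Hypothetical helper: export the variables the azurerm provider reads for
# client-secret authentication.
export_tf_sp_env() {
  export ARM_CLIENT_ID="$1"        # service principal application (client) ID
  export ARM_CLIENT_SECRET="$2"    # client secret; ideally read from a vault, not typed
  export ARM_TENANT_ID="$3"
  export ARM_SUBSCRIPTION_ID="$4"
}
```

After calling export_tf_sp_env with the four values, terraform init and terraform plan can be run from environments/prod in the same shell session.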

Handoff Summary

Action                                       Who
Create EA subscription greenfield-mdp-prod   Cloud Infra (ticket)
Place under correct Management Group         Cloud Infra
Register resource providers                  Cloud Infra or you (az CLI)
Create Terraform SP + RBAC                   Cloud Infra / IAM team
Create tfstate storage account               You or DevOps
Update terraform.tfvars                      You
Validate terraform init/plan                 You

Azure Policy Setup

Checking Existing Policies

az policy assignment list \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
  --query "[].{Name:displayName, Scope:scope, Policy:policyDefinitionId}" \
  -o table

1. Allowed Locations

Built-in policy ID: e56962a6-4747-49cd-b67b-bf8b01975c4c

az policy assignment create \
  --name "mdp-allowed-locations" \
  --display-name "MDP - Restrict to Canada regions" \
  --policy "e56962a6-4747-49cd-b67b-bf8b01975c4c" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
  --params '{
    "listOfAllowedLocations": {
      "value": ["canadacentral", "canadaeast", "global"]
    }
  }'

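The built-in definition already excludes resource groups and resources whose location is global, so listing global is not strictly required. It is kept here as an explicit, harmless safeguard for non-regional resources.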

2. Mandatory Tagging (Initiative Approach)

Built-in policy "Require a tag on resources": 871b6d14-10aa-478d-b590-94f262ecfa99

Quick approach — loop per tag:

SCOPE="/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
POLICY_ID="871b6d14-10aa-478d-b590-94f262ecfa99"

for TAG in project environment cost-center owner data-classification; do
  az policy assignment create \
    --name "mdp-require-tag-${TAG}" \
    --display-name "MDP - Require tag: ${TAG}" \
    --policy "$POLICY_ID" \
    --scope "$SCOPE" \
    --params "{\"tagName\": {\"value\": \"${TAG}\"}}" \
    --enforcement-mode Default
done

Custom initiative approach (recommended):

Create mandatory-tags-initiative.json. The --definitions argument takes only the array of policy definition references, not a full initiative resource body, so the display name and category are passed on the command line. Giving each entry an explicit policyDefinitionReferenceId keeps the five references to the same built-in definition distinguishable:

[
  {
    "policyDefinitionReferenceId": "require-tag-project",
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "project" } }
  },
  {
    "policyDefinitionReferenceId": "require-tag-environment",
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "environment" } }
  },
  {
    "policyDefinitionReferenceId": "require-tag-cost-center",
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "cost-center" } }
  },
  {
    "policyDefinitionReferenceId": "require-tag-owner",
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "owner" } }
  },
  {
    "policyDefinitionReferenceId": "require-tag-data-classification",
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "data-classification" } }
  }
]

az policy set-definition create \
  --name "mdp-mandatory-tags" \
  --display-name "MDP - Mandatory Tags" \
  --metadata category=Tags \
  --definitions @mandatory-tags-initiative.json \
  --subscription "52274961-4dda-48cc-a79b-3a0b74f8eaac"

az policy assignment create \
  --name "mdp-mandatory-tags" \
  --display-name "MDP - Mandatory Tags" \
  --policy-set-definition "mdp-mandatory-tags" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
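
Because every entry in the initiative differs only by tag name, the JSON array that az policy set-definition create --definitions expects can be generated from the tag list instead of being edited by hand. The sketch below assumes tag names contain no characters that need JSON escaping; build_tag_initiative and the require-tag- reference ID prefix are hypothetical choices.

```shell
# Hypothetical helper: print a JSON array of policy definition references,
# one per tag name given as an argument.
build_tag_initiative() {
  policy_id="/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99"
  count=0
  echo "["
  for tag in "$@"; do
    # Separate entries with a comma, but emit none before the first entry.
    if [ "$count" -gt 0 ]; then
      printf ',\n'
    fi
    printf '  {\n'
    printf '    "policyDefinitionReferenceId": "require-tag-%s",\n' "$tag"
    printf '    "policyDefinitionId": "%s",\n' "$policy_id"
    printf '    "parameters": { "tagName": { "value": "%s" } }\n' "$tag"
    printf '  }'
    count=$((count + 1))
  done
  printf '\n]\n'
}
```

Usage: build_tag_initiative project environment cost-center owner data-classification > mandatory-tags-initiative.json. Adding a sixth mandatory tag then means adding one argument rather than another JSON stanza.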

3. Allowed VM SKUs

Built-in policy ID: cccc23c7-8427-4f53-ad12-b6a63eb452b3

az policy assignment create \
  --name "mdp-allowed-vm-skus" \
  --display-name "MDP - Allowed VM SKUs" \
  --policy "cccc23c7-8427-4f53-ad12-b6a63eb452b3" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
  --params '{
    "listOfAllowedSKUs": {
      "value": [
        "Standard_DS3_v2",
        "Standard_DS4_v2",
        "Standard_DS5_v2",
        "Standard_E8s_v5",
        "Standard_E16s_v5",
        "Standard_E32s_v5",
        "Standard_L8s_v3",
        "Standard_L16s_v3",
        "Standard_NC8as_T4_v3",
        "Standard_NC16as_T4_v3"
      ]
    }
  }'

SKU rationale:

  • DS series — general-purpose Databricks workers
  • E series — memory-optimized (data engineering)
  • L series — storage-optimized (Delta caching)
  • NC series — GPU clusters for ML
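
Not every SKU is offered in every region, so it may be worth checking the allow-list against Canada Central before enforcing the policy. The sketch below uses az vm list-skus; unavailable_skus is a hypothetical helper, and a SKU that is listed can still carry subscription-level restrictions that this check does not inspect.

```shell
# Hypothetical helper: print each SKU that az vm list-skus does not report for the region.
unavailable_skus() {
  region=$1
  shift
  for sku in "$@"; do
    # --size accepts partial names, so the JMESPath filter pins the exact SKU name.
    found=$(az vm list-skus --location "$region" --size "$sku" \
            --resource-type virtualMachines \
            --query "[?name=='$sku'].name" -o tsv)
    if [ -z "$found" ]; then
      echo "$sku"
    fi
  done
}
```

Usage: unavailable_skus canadacentral Standard_DS3_v2 Standard_E8s_v5 Standard_L8s_v3 Standard_NC8as_T4_v3. Any name printed would need to be dropped from the allow-list or replaced.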

Policy Assignment Scope Guidance

Policy             Recommended scope
Allowed locations  Management group (likely already exists at Greenfield)
Mandatory tags     Management group (likely already exists at Greenfield)
Allowed VM SKUs    Subscription (workload-specific)

Enforcement Mode

For initial rollout, use --enforcement-mode DoNotEnforce to run in audit-only mode first. This flags non-compliant resources without blocking deployments. Switch to Default (enforce) after remediation.
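
Switching an existing assignment between the two modes could look like the sketch below. It assumes the installed CLI version provides az policy assignment update with --enforcement-mode (az policy assignment update --help will confirm); set_enforcement_mode is a hypothetical helper.

```shell
# Hypothetical helper: change the enforcement mode of an existing assignment
# at the production subscription scope.
set_enforcement_mode() {
  assignment=$1
  mode=$2
  case "$mode" in
    Default|DoNotEnforce) ;;
    *) echo "mode must be Default or DoNotEnforce" >&2; return 1 ;;
  esac
  az policy assignment update \
    --name "$assignment" \
    --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
    --enforcement-mode "$mode"
}
```

For example, set_enforcement_mode mdp-allowed-vm-skus DoNotEnforce during rollout, then set_enforcement_mode mdp-allowed-vm-skus Default once the compliance view shows no remaining violations.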