MDP Databricks-Primary Architecture — First Steps
Context
Implementation of the Databricks-primary architecture (v8.0) on Azure for Greenfield's Modern Data Platform. Disaster recovery and dev/test environments are out of scope for now. All Terraform scripts are placed in /Implementation/Terraform.
Day 1 Tasks — Previous Conversation Summary
D1-T1: Subscription Provisioning
- Terraform artifacts produced: providers.tf (AzureRM with subscription_id), backend.tf (state backend), variables.tf + terraform.tfvars (prod values).
- Note: Subscription creation itself is a manual/ticket process; Terraform assumes it exists.
D1-T2: Resource Group Structure (6 RGs)
- Terraform: networking/main.tf lines 14–48
- Resource groups created:
  - rg-mdp-prod-network
  - rg-mdp-prod-databricks
  - rg-mdp-prod-storage
  - rg-mdp-prod-governance
  - rg-mdp-prod-keyvault
  - rg-mdp-prod-monitoring
D1-T3: VNet Design & Deployment (starts D1, finishes D2)
- Terraform: networking/main.tf
- Main VNet: 10.100.0.0/16 with 3 subnets:
  - dbx-host
  - dbx-container
  - private-endpoints
- Transit VNet: 10.101.0.0/24
- Peerings: main↔transit, optional main↔hub
- NAT Gateway with public IP on both Databricks subnets
- NSG with full inbound/outbound rule set
- 4 private DNS zones with VNet links
- Outputs wired in outputs.tf
- Root module prod/main.tf calls the networking module
D1-T1: Manual Subscription Provisioning Guide
Before Terraform can run, the Azure subscription must exist. This is a manual process at Greenfield.
Step 1 — Request or Create the Subscription
Submit a request to Cloud Infra (ServiceNow/ITSM ticket) with these specifications:
| Parameter | Value |
|---|---|
| Subscription name | greenfield-mdp-prod |
| Offer type | Enterprise Agreement (EA) |
| Billing account | Greenfield's EA enrollment |
| Region affinity | Canada Central |
| Management group | Place under Greenfield's MG hierarchy (e.g., mg-greenfield-data or mg-greenfield-prod) |
Step 2 — Place Under the Correct Management Group
- Go to Azure Portal → Management groups
- Locate the target MG (e.g., mg-greenfield-prod or mg-greenfield-data-platforms)
- Move the new subscription under it (or ensure Cloud Infra places it there at creation time)
- Azure Policy assignments on that MG are inherited automatically
Step 3 — Register Required Resource Providers
az account set --subscription "greenfield-mdp-prod"
az provider register --namespace Microsoft.Databricks
az provider register --namespace Microsoft.Storage
az provider register --namespace Microsoft.KeyVault
az provider register --namespace Microsoft.Network
az provider register --namespace Microsoft.ManagedIdentity
az provider register --namespace Microsoft.Purview
az provider register --namespace Microsoft.OperationalInsights
az provider register --namespace Microsoft.Monitor
az provider register --namespace Microsoft.EventHub
Verify with:
az provider list --query "[?registrationState=='Registered'].namespace" -o table
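Provider registration is asynchronous, so a namespace can still report Registering when the verify command runs. The following is a minimal sketch of a register-and-wait loop, assuming az is logged in and the target subscription is already selected. The function name register_providers, the POLL_SECONDS variable, and the 30-poll limit are illustrative choices, not part of the original procedure.

```shell
# Hypothetical helper: register each namespace, then poll until it reports "Registered".
# Runs in a subshell so its variables and "exit" do not affect the calling shell.
register_providers() (
  for ns in "$@"; do
    az provider register --namespace "$ns" >/dev/null || exit 1
    tries=0
    until [ "$(az provider show --namespace "$ns" --query registrationState -o tsv)" = "Registered" ]; do
      tries=$((tries + 1))
      # Give up after 30 polls; the limit and interval are arbitrary assumptions.
      if [ "$tries" -ge 30 ]; then
        echo "timeout waiting for $ns" >&2
        exit 1
      fi
      sleep "${POLL_SECONDS:-10}"
    done
    echo "$ns Registered"
  done
)
```

A call such as `register_providers Microsoft.Databricks Microsoft.Storage Microsoft.KeyVault` would cover the first three namespaces from the list above; the remaining ones can be appended as further arguments.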
Step 4 — Set Up RBAC on the Subscription
| Principal | Role | Scope |
|---|---|---|
| sp-mdp-terraform-prod (Terraform SP) | Contributor | Subscription |
| sp-mdp-terraform-prod | User Access Administrator | Subscription |
| grp-mdp-account-admins | Owner | Subscription |
| grp-mdp-platform-engineers | Contributor | Subscription |
SUB_ID=$(az account show --query id -o tsv)
az role assignment create --assignee <terraform-sp-app-id> \
--role "Contributor" --scope "/subscriptions/$SUB_ID"
az role assignment create --assignee <terraform-sp-app-id> \
--role "User Access Administrator" --scope "/subscriptions/$SUB_ID"
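After the assignments are created, it can help to confirm that they actually landed before handing the service principal to Terraform. The sketch below lists the role names held by an assignee at subscription scope and checks each expected role. The function name has_roles is hypothetical; the az role assignment list call and its flags are standard CLI.

```shell
# Hypothetical check: confirm an assignee holds every expected role at subscription scope.
# Usage: has_roles <assignee-id> <subscription-id> "Role A" "Role B" ...
has_roles() (
  assignee=$1
  sub_id=$2
  shift 2
  assigned=$(az role assignment list --assignee "$assignee" \
    --scope "/subscriptions/$sub_id" \
    --query "[].roleDefinitionName" -o tsv) || exit 1
  for role in "$@"; do
    # -x matches the whole line, -F treats the role name as a literal string.
    if ! printf '%s\n' "$assigned" | grep -qxF "$role"; then
      echo "missing role: $role" >&2
      exit 1
    fi
  done
)
```

For the Terraform SP this might be invoked as `has_roles <terraform-sp-app-id> "$SUB_ID" "Contributor" "User Access Administrator"`. Note that the query only returns assignments made at that scope unless inherited assignments are requested separately.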
Step 5 — Create the Terraform State Storage Account
In a separate management/shared subscription to avoid circular dependency:
az account set --subscription "<shared-infra-subscription>"
az group create --name rg-mdp-tfstate --location canadacentral
az storage account create \
--name staccmdptfstate \
--resource-group rg-mdp-tfstate \
--location canadacentral \
--sku Standard_GRS \
--kind StorageV2 \
--min-tls-version TLS1_2 \
--allow-blob-public-access false
az storage container create \
--name tfstate \
--account-name staccmdptfstate
az storage account blob-service-properties update \
--account-name staccmdptfstate \
--enable-versioning true
Step 6 — Update terraform.tfvars
az account show --subscription "greenfield-mdp-prod" --query id -o tsv
Replace REPLACE-WITH-PROD-SUBSCRIPTION-ID in terraform.tfvars with the actual UUID.
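The replacement can be scripted so that a mistyped or empty ID never reaches the file. This is a sketch only: the function name set_subscription_id is hypothetical, and it assumes the placeholder string appears verbatim in the file and that the ID is a lowercase UUID as az prints it.

```shell
# Hypothetical helper: swap the placeholder in a tfvars file for a real subscription ID.
# Refuses to write anything that does not look like a lowercase UUID.
set_subscription_id() (
  file=$1
  sub_id=$2
  if ! printf '%s\n' "$sub_id" | grep -Eqx '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}'; then
    echo "not a UUID: $sub_id" >&2
    exit 1
  fi
  if ! grep -q 'REPLACE-WITH-PROD-SUBSCRIPTION-ID' "$file"; then
    echo "placeholder not found in $file" >&2
    exit 1
  fi
  # Edit via a temp file (portable across GNU and BSD sed), then copy back
  # so the original file keeps its permissions.
  tmp=$(mktemp) || exit 1
  sed "s/REPLACE-WITH-PROD-SUBSCRIPTION-ID/$sub_id/" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  exit "$status"
)
```

Usage would look like `set_subscription_id terraform.tfvars "$(az account show --subscription "greenfield-mdp-prod" --query id -o tsv)"`.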
Step 7 — Validate Terraform Authentication
cd environments/prod
az login --service-principal -u <app-id> -p <secret> --tenant <tenant-id>
terraform init
terraform plan
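As an alternative to az login, the AzureRM provider can also read service principal credentials from the ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID, and ARM_SUBSCRIPTION_ID environment variables, which keeps the secret off the command line. This is a different technique from the az login flow above, not a description of how the existing setup authenticates. The pre-flight check below is a sketch with a hypothetical function name; it only verifies that the variables are set, not that the credentials are valid.

```shell
# Hypothetical pre-flight check: fail fast if any azurerm credential variable is unset or empty.
require_arm_env() (
  missing=""
  for name in ARM_CLIENT_ID ARM_CLIENT_SECRET ARM_TENANT_ID ARM_SUBSCRIPTION_ID; do
    # Indirect variable lookup via eval keeps this POSIX sh compatible.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "unset:$missing" >&2
    exit 1
  fi
)
```

A typical sequence would be `require_arm_env && terraform init && terraform plan`, run from environments/prod.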
Handoff Summary
| Action | Who |
|---|---|
| Create EA subscription greenfield-mdp-prod | Cloud Infra (ticket) |
| Place under correct Management Group | Cloud Infra |
| Register resource providers | Cloud Infra or you (az CLI) |
| Create Terraform SP + RBAC | Cloud Infra / IAM team |
| Create tfstate storage account | You or DevOps |
| Update terraform.tfvars | You |
| Validate terraform init/plan | You |
Azure Policy Setup
Checking Existing Policies
az policy assignment list \
--scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
--query "[].{Name:displayName, Scope:scope, Policy:policyDefinitionId}" \
-o table
1. Allowed Locations
Built-in policy ID: e56962a6-4747-49cd-b67b-bf8b01975c4c
az policy assignment create \
--name "mdp-allowed-locations" \
--display-name "MDP - Restrict to Canada regions" \
--policy "e56962a6-4747-49cd-b67b-bf8b01975c4c" \
--scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
--params '{
"listOfAllowedLocations": {
"value": ["canadacentral", "canadaeast", "global"]
}
}'
global is needed for tenant-level resources (Entra ID, Management Groups, Azure Policy itself).
2. Mandatory Tagging (Initiative Approach)
Built-in policy "Require a tag on resources": 871b6d14-10aa-478d-b590-94f262ecfa99
Quick approach — loop per tag:
SCOPE="/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
POLICY_ID="871b6d14-10aa-478d-b590-94f262ecfa99"
for TAG in project environment cost-center owner data-classification; do
az policy assignment create \
--name "mdp-require-tag-${TAG}" \
--display-name "MDP - Require tag: ${TAG}" \
--policy "$POLICY_ID" \
--scope "$SCOPE" \
--params "{\"tagName\": {\"value\": \"${TAG}\"}}" \
--enforcement-mode Default
done
Custom initiative approach (recommended):
Create mandatory-tags-initiative.json. The --definitions flag expects a JSON array of policy references rather than a full properties wrapper, so the display name and metadata are passed as CLI flags instead:
[
  {
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "project" } }
  },
  {
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "environment" } }
  },
  {
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "cost-center" } }
  },
  {
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "owner" } }
  },
  {
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "data-classification" } }
  }
]
az policy set-definition create \
--name "mdp-mandatory-tags" \
--display-name "MDP - Mandatory Tags" \
--metadata category=Tags \
--definitions @mandatory-tags-initiative.json \
--subscription "52274961-4dda-48cc-a79b-3a0b74f8eaac"
az policy assignment create \
--name "mdp-mandatory-tags" \
--display-name "MDP - Mandatory Tags" \
--policy-set-definition "mdp-mandatory-tags" \
--scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
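The five entries in mandatory-tags-initiative.json differ only in the tag name, so the file can be generated rather than maintained by hand. The sketch below emits a JSON array of policy references, which is the shape az policy set-definition create accepts for --definitions. The function name tag_policy_refs is hypothetical, and it assumes tag names contain no characters that need JSON escaping.

```shell
# Hypothetical generator: emit one "Require a tag on resources" reference per tag as a JSON array.
# Assumes tag names need no JSON escaping (letters, digits, hyphens).
tag_policy_refs() (
  policy="/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99"
  first=1
  printf '[\n'
  for tag in "$@"; do
    # Separate entries with a comma and newline, but not before the first one.
    [ "$first" = 1 ] || printf ',\n'
    first=0
    printf '  {"policyDefinitionId": "%s", "parameters": {"tagName": {"value": "%s"}}}' \
      "$policy" "$tag"
  done
  printf '\n]\n'
)
```

Running `tag_policy_refs project environment cost-center owner data-classification > mandatory-tags-initiative.json` would produce a file covering the same five tags, and adding a sixth tag becomes a one-word change.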
3. Allowed VM SKUs
Built-in policy ID: cccc23c7-8427-4f53-ad12-b6a63eb452b3
az policy assignment create \
--name "mdp-allowed-vm-skus" \
--display-name "MDP - Allowed VM SKUs" \
--policy "cccc23c7-8427-4f53-ad12-b6a63eb452b3" \
--scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
--params '{
"listOfAllowedSKUs": {
"value": [
"Standard_DS3_v2",
"Standard_DS4_v2",
"Standard_DS5_v2",
"Standard_E8s_v5",
"Standard_E16s_v5",
"Standard_E32s_v5",
"Standard_L8s_v3",
"Standard_L16s_v3",
"Standard_NC8as_T4_v3",
"Standard_NC16as_T4_v3"
]
}
}'
SKU rationale:
- DS series: general-purpose Databricks workers
- E series: memory-optimized (data engineering)
- L series: storage-optimized (Delta caching)
- NC series: GPU clusters for ML
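Before locking the allow-list in, it may be worth checking that each SKU is actually offered in the target region, since a policy that permits an unavailable SKU gives no protection against a failed cluster launch. The sketch below compares a list of SKU names against az vm list-skus output. The function name missing_skus is hypothetical, and a SKU being listed does not guarantee quota or the absence of subscription-level restrictions.

```shell
# Hypothetical check: print every requested SKU that the region does not list.
# Usage: missing_skus <location> <sku> [<sku> ...]
missing_skus() (
  location=$1
  shift
  available=$(az vm list-skus --location "$location" \
    --resource-type virtualMachines --query "[].name" -o tsv) || exit 1
  for sku in "$@"; do
    # Exact, literal, whole-line match against the region's SKU names.
    printf '%s\n' "$available" | grep -qxF "$sku" || echo "$sku"
  done
)
```

For example, `missing_skus canadacentral Standard_DS3_v2 Standard_L8s_v3 Standard_NC8as_T4_v3` prints nothing when all three are listed for Canada Central.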
Policy Assignment Scope Guidance
| Policy | Recommended scope |
|---|---|
| Allowed locations | Management group (likely already exists at Greenfield) |
| Mandatory tags | Management group (likely already exists at Greenfield) |
| Allowed VM SKUs | Subscription (workload-specific) |
Enforcement Mode
For initial rollout, use --enforcement-mode DoNotEnforce to run in audit-only mode first. This flags non-compliant resources without blocking deployments. Switch to Default (enforce) after remediation.
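The audit-first rollout described above can be sketched as two small wrappers: one that creates an assignment in DoNotEnforce mode, and one that later flips the same assignment to Default. The function names are hypothetical; extra arguments such as --params or --display-name are passed through unchanged.

```shell
# Hypothetical two-phase rollout helpers.
# Phase 1: create the assignment in audit-only mode.
# Usage: assign_audit_only <assignment-name> <policy-id> <scope> [extra az flags...]
assign_audit_only() (
  name=$1
  policy=$2
  scope=$3
  shift 3
  az policy assignment create --name "$name" --policy "$policy" --scope "$scope" \
    --enforcement-mode DoNotEnforce "$@"
)

# Phase 2: switch an existing assignment to enforcing once remediation is done.
# Usage: enforce_assignment <assignment-name> <scope>
enforce_assignment() (
  az policy assignment update --name "$1" --scope "$2" --enforcement-mode Default
)
```

With these, the VM SKU policy could first be assigned via `assign_audit_only mdp-allowed-vm-skus cccc23c7-8427-4f53-ad12-b6a63eb452b3 "$SCOPE" --params @skus.json` (skus.json being an assumed file holding the parameter JSON) and enforced later with `enforce_assignment mdp-allowed-vm-skus "$SCOPE"`.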