MDP Databricks-Primary Architecture — Second Steps
Context
Continuation of the Azure setup for Greenfield's Modern Data Platform (MDP), picking up from first_steps.md. This session covers Step 4 through Step 7 of the initial provisioning guide: RBAC setup, Key Vault creation, Entra ID group creation, subscription hierarchy, and Terraform initialization.
Step 4 — Set Up RBAC on the Subscription
The two az role assignment create commands use a placeholder <terraform-sp-app-id>. Retrieve the actual App ID first:
SP_APP_ID=$(az ad sp list --display-name "sp-mdp-terraform-prod" --query "[0].appId" -o tsv)
echo $SP_APP_ID # verify before assigning
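If the lookup returns nothing (wrong display name, wrong tenant), the later role assignments fail with a confusing error, so it is worth failing fast. A small sketch; `require_nonempty` is a hypothetical helper, not an az CLI feature:

```shell
# Sketch: fail fast when an az lookup returns an empty string.
# require_nonempty is a hypothetical helper, not part of the az CLI.
require_nonempty() {
  local label="$1" value="$2"
  if [ -z "$value" ]; then
    echo "ERROR: $label is empty; check the lookup command" >&2
    return 1
  fi
}

# Example: require_nonempty "SP_APP_ID" "$SP_APP_ID" || exit 1
```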
Then run the assignments:
SUB_ID=$(az account show --query id -o tsv)
# Contributor — allows creating/managing all resources
az role assignment create \
--assignee $SP_APP_ID \
--role "Contributor" \
--scope "/subscriptions/$SUB_ID"
# User Access Administrator — needed for Terraform to assign roles (e.g., on storage, Key Vault, Databricks)
az role assignment create \
--assignee $SP_APP_ID \
--role "User Access Administrator" \
--scope "/subscriptions/$SUB_ID"
For group assignments, use Object IDs:
ADMIN_GRP_ID=$(az ad group show --group "grp-mdp-account-admins" --query id -o tsv)
ENG_GRP_ID=$(az ad group show --group "grp-mdp-platform-engineers" --query id -o tsv)
az role assignment create --assignee $ADMIN_GRP_ID --role "Owner" --scope "/subscriptions/$SUB_ID"
az role assignment create --assignee $ENG_GRP_ID --role "Contributor" --scope "/subscriptions/$SUB_ID"
Verify all assignments:
az role assignment list --scope "/subscriptions/$SUB_ID" \
--query "[].{Principal:principalName, Role:roleDefinitionName}" -o table
Note: `User Access Administrator` on the Terraform SP is necessary because downstream Terraform modules (Unity Catalog storage, Key Vault, private endpoints) will create their own role assignments. Without it, those `apply` runs will fail with `AuthorizationFailed`.
Service Principal — sp-mdp-terraform-prod
This is the Azure AD service principal (application identity) that Terraform uses to authenticate against Azure and deploy resources. It is a non-human "robot account" with scoped, auditable permissions — required for CI/CD pipelines and to avoid using personal user accounts.
Create the SP
az ad sp create-for-rbac \
--name "sp-mdp-terraform-prod" \
--role "Contributor" \
--scopes "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
--years 1
Output:
{
"appId": "...", # client_id → save this
"displayName": "sp-mdp-terraform-prod",
"password": "...", # client_secret → shown only once
"tenant": "..." # tenant_id → save this
}
Important: The `password` is shown only once. Store it in Key Vault immediately.
az keyvault secret set \
--vault-name "kv-mdp-prod-001" \
--name "sp-mdp-terraform-prod-secret" \
--value "<password-from-above>"
Then assign User Access Administrator separately:
az role assignment create \
--assignee <appId-from-above> \
--role "User Access Administrator" \
--scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
`create-for-rbac` already assigns `Contributor` at creation time; only the second role needs a separate command.
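For local `terraform` runs with this SP, the azurerm provider reads its credentials from the standard `ARM_*` environment variables. A sketch with hypothetical placeholder values; substitute the real output from `create-for-rbac` (or pull the secret back from Key Vault):

```shell
# Map the create-for-rbac output onto the azurerm provider's standard env vars.
# The first three values below are hypothetical placeholders.
export ARM_CLIENT_ID="00000000-0000-0000-0000-000000000000"        # appId
export ARM_CLIENT_SECRET="<password-from-create-for-rbac>"         # password
export ARM_TENANT_ID="00000000-0000-0000-0000-000000000000"        # tenant
export ARM_SUBSCRIPTION_ID="52274961-4dda-48cc-a79b-3a0b74f8eaac"  # subscription
```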
Key Vault Creation
Based on the existing resource group rg-mdp-prod-keyvault from D1-T2:
az keyvault create \
--name "kv-mdp-prod-001" \
--resource-group "rg-mdp-prod-keyvault" \
--location "canadacentral" \
--sku "premium" \
--enable-purge-protection true \
--retention-days 90 \
--enable-rbac-authorization true
SKU and option rationale:
- `premium` — HSM-backed keys (OSFI expectation for secrets at rest)
- `enable-purge-protection` — prevents hard-delete; mandatory in regulated environments
- `retention-days 90` — soft-delete window
- `enable-rbac-authorization` — use Azure RBAC instead of legacy access policies
Grant your own account access to manage secrets:
MY_ID=$(az ad signed-in-user show --query id -o tsv)
KV_ID=$(az keyvault show --name "kv-mdp-prod-001" --query id -o tsv)
az role assignment create \
--assignee $MY_ID \
--role "Key Vault Administrator" \
--scope $KV_ID
Key Vault names must be globally unique across Azure — adjust `kv-mdp-prod-001` if it conflicts.
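Since a collision only surfaces at create time, it can save a round trip to check the documented naming rules locally first (3–24 characters, starts with a letter, ends with a letter or digit, alphanumerics and hyphens only, no consecutive hyphens). A sketch; the `valid_kv_name` helper is illustrative, and global uniqueness can only be verified by Azure itself:

```shell
# Sketch: check a candidate Key Vault name against Azure's documented rules.
# (Global uniqueness still has to be verified by Azure at create time.)
valid_kv_name() {
  local name="$1"
  [ ${#name} -ge 3 ] && [ ${#name} -le 24 ] || return 1            # length 3-24
  [[ "$name" =~ ^[a-zA-Z][a-zA-Z0-9-]*[a-zA-Z0-9]$ ]] || return 1  # charset, start/end
  [[ "$name" != *--* ]]                                            # no consecutive hyphens
}

valid_kv_name "kv-mdp-prod-001" && echo "name looks valid"
```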
Entra ID Group Creation
grp-mdp-account-admins
az ad group create \
--display-name "grp-mdp-account-admins" \
--mail-nickname "grp-mdp-account-admins" \
--description "MDP production account administrators - Owner access on greenfield-mdp-prod subscription"
# Get group Object ID
GRP_ID=$(az ad group show --group "grp-mdp-account-admins" --query id -o tsv)
# Add a member
az ad group member add \
--group "grp-mdp-account-admins" \
--member-id $(az ad user show --id "user@greenfield.ca" --query id -o tsv)
# Assign Owner on subscription
az role assignment create \
--assignee $GRP_ID \
--role "Owner" \
--scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
grp-mdp-platform-engineers
az ad group create \
--display-name "grp-mdp-platform-engineers" \
--mail-nickname "grp-mdp-platform-engineers" \
--description "MDP production platform engineers - Contributor access on greenfield-mdp-prod subscription"
GRP_ID=$(az ad group show --group "grp-mdp-platform-engineers" --query id -o tsv)
az role assignment create \
--assignee $GRP_ID \
--role "Contributor" \
--scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
# Add a member
az ad group member add \
--group "grp-mdp-platform-engineers" \
--member-id $(az ad user show --id "engineer@greenfield.ca" --query id -o tsv)
Note: Group creation in Entra ID requires the User Administrator or Groups Administrator Entra role. If you get `Authorization_RequestDenied`, this needs to go through your IAM team.
Subscription Hierarchy — Where to Place shared-infra
Recommended structure aligned with Azure Landing Zone / CAF conventions:
Tenant Root Group
└── mg-greenfield (root MG)
├── mg-greenfield-platform ← shared-infra goes here
│ ├── mg-greenfield-connectivity (hub network, ExpressRoute, DNS)
│ └── mg-greenfield-management (logging, monitoring, tfstate, shared tools)
├── mg-greenfield-prod
│ └── greenfield-mdp-prod (workload subscription)
├── mg-greenfield-nonprod
└── mg-greenfield-sandbox
The shared-infra subscription belongs under mg-greenfield-management. It hosts cross-cutting operational resources not tied to any workload:
- Terraform state storage (`staccmdptfstate`)
- Shared Key Vault for pipeline secrets
- Log Analytics workspace
- Azure Monitor / alerting
The tfstate account must exist before any workload subscription is deployed — placing it under the management MG keeps it lifecycle-independent and avoids circular dependency.
At Greenfield, a management/shared-services subscription may already exist in the EA enrollment. Confirm with Cloud Infra before creating a new one — you may only need a new resource group.
Step 6 — Update terraform.tfvars
Values updated
| Variable | Value |
|---|---|
| `subscription_id` | `52274961-4dda-48cc-a79b-3a0b74f8eaac` |
| `databricks_account_id` | Retrieve from accounts.azuredatabricks.net → top-right menu |
| `cost-center` tag | `Greenfield-CDO` |
Get the Databricks account ID from the account console (no az CLI command available for this):
https://accounts.azuredatabricks.net → top-right menu → your account name
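Taken together, the relevant terraform.tfvars fragment would look roughly like this. The variable names and the tag-map shape are assumptions about the module's interface, and the account ID stays a placeholder until retrieved:

```hcl
subscription_id       = "52274961-4dda-48cc-a79b-3a0b74f8eaac"
databricks_account_id = "<retrieve-from-account-console>" # pending

# Assumed tag map shape; adjust to the module's actual variable.
tags = {
  cost-center = "Greenfield-CDO"
}
```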
backend.tf alignment
The backend.tf values must match what was created in Step 5:
| Parameter | Value |
|---|---|
| `resource_group_name` | `rg-mdp-tfstate` |
| `storage_account_name` | `staccmdptfstate` |
| `container_name` | `tfstate` |
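A backend.tf matching those values might look like the sketch below. The `key` name is a hypothetical choice, and `use_azuread_auth = true` is included because Step 7 depends on it:

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-mdp-tfstate"
    storage_account_name = "staccmdptfstate"
    container_name       = "tfstate"
    key                  = "mdp-prod.terraform.tfstate" # hypothetical state file name
    use_azuread_auth     = true # Entra ID auth instead of storage access keys
  }
}
```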
Step 7 — Terraform Init Troubleshooting
Error: 403 AuthorizationPermissionMismatch on terraform init
Error: Failed to get existing workspaces: listing blobs: executing request: unexpected status 403
(403 This request is not authorized to perform this operation using this permission.)
AuthorizationPermissionMismatch
Root cause: backend.tf uses `use_azuread_auth = true`, so Terraform authenticates to the storage account via Entra ID rather than a storage access key. The Contributor role alone only covers the management plane (ARM); it does not grant blob read/write access on the data plane. A `Storage Blob Data *` role must be granted explicitly.
Fix:
MY_ID=$(az ad signed-in-user show --query id -o tsv)
SA_ID=$(az storage account show \
--name "staccmdptfstate" \
--resource-group "rg-mdp-tfstate" \
--query id -o tsv)
az role assignment create \
--assignee $MY_ID \
--role "Storage Blob Data Contributor" \
--scope $SA_ID
Wait ~2 minutes for role assignment propagation, then re-run terraform init.
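RBAC propagation is eventually consistent, so a fixed sleep is a guess. A small retry wrapper makes the wait self-terminating; this is a sketch, and `retry` is a hypothetical helper, not a Terraform or az feature:

```shell
# Sketch: re-run a command until it succeeds, to ride out RBAC propagation delay.
retry() {
  local attempts="$1" delay="$2"; shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    echo "attempt $i/$attempts failed; retrying in ${delay}s" >&2
    sleep "$delay"
  done
  return 1
}

# Usage: retry 10 30 terraform init
```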
Also grant the same role to the Terraform SP for CI/CD:
SP_APP_ID=$(az ad sp list --display-name "sp-mdp-terraform-prod" --query "[0].appId" -o tsv)
az role assignment create \
--assignee $SP_APP_ID \
--role "Storage Blob Data Contributor" \
--scope $SA_ID
Summary — Actions Completed This Session
| Action | Status |
|---|---|
| RBAC assignments for Terraform SP | Done |
| Created `sp-mdp-terraform-prod` service principal | Done |
| Created `kv-mdp-prod-001` Key Vault | Done |
| Stored SP secret in Key Vault | Done |
| Created `grp-mdp-account-admins` | Done |
| Created `grp-mdp-platform-engineers` | Done |
| Updated `terraform.tfvars` with subscription ID and cost-center | Done |
| Aligned `backend.tf` with actual storage account names | Done |
| Resolved `terraform init` 403 error (Storage Blob Data Contributor) | Done |
| `databricks_account_id` in `terraform.tfvars` | Pending — retrieve from account console |