MDP Databricks-Primary Architecture — Second Steps

Context

Continuation of the Azure setup for Greenfield's Modern Data Platform (MDP), picking up from first_steps.md. This session covers Step 4 through Step 7 of the initial provisioning guide: RBAC setup, Key Vault creation, Entra ID group creation, subscription hierarchy, and Terraform initialization.


Step 4 — Set Up RBAC on the Subscription

The two az role assignment create commands in the provisioning guide use the placeholder <terraform-sp-app-id>. Retrieve the actual App ID first:

# --display-name matches on a prefix; filter on the exact display name instead
SP_APP_ID=$(az ad sp list --filter "displayName eq 'sp-mdp-terraform-prod'" --query "[0].appId" -o tsv)
echo $SP_APP_ID  # verify before assigning
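Because az ad sp list --display-name matches on a prefix, taking [0] can silently pick the wrong principal if a similarly named SP exists (for example a hypothetical sp-mdp-terraform-prod-v1). A minimal sketch of a stricter lookup, assuming you are logged in with az login and can read service principals; resolve_sp_app_id is a hypothetical helper name:

```bash
# Hypothetical helper: resolve a service principal's appId by EXACT display name.
# Uses an OData `eq` filter (not a prefix match) and refuses to continue
# unless exactly one service principal matches.
resolve_sp_app_id() {
  local name ids count
  name=$1
  ids=$(az ad sp list --filter "displayName eq '$name'" --query "[].appId" -o tsv) || return 1
  # Count non-empty lines; `|| true` keeps grep's "no match" status from leaking.
  count=$(printf '%s\n' "$ids" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly 1 service principal named '$name', found $count" >&2
    return 1
  fi
  printf '%s\n' "$ids"
}

# Usage: SP_APP_ID=$(resolve_sp_app_id "sp-mdp-terraform-prod")
```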

Then run the assignments:

SUB_ID=$(az account show --query id -o tsv)

# Contributor — allows creating/managing all resources
az role assignment create \
  --assignee $SP_APP_ID \
  --role "Contributor" \
  --scope "/subscriptions/$SUB_ID"

# User Access Administrator — needed for Terraform to assign roles (e.g., on storage, Key Vault, Databricks)
az role assignment create \
  --assignee $SP_APP_ID \
  --role "User Access Administrator" \
  --scope "/subscriptions/$SUB_ID"

For group assignments, use Object IDs:

ADMIN_GRP_ID=$(az ad group show --group "grp-mdp-account-admins" --query id -o tsv)
ENG_GRP_ID=$(az ad group show --group "grp-mdp-platform-engineers" --query id -o tsv)

az role assignment create --assignee $ADMIN_GRP_ID --role "Owner" --scope "/subscriptions/$SUB_ID"
az role assignment create --assignee $ENG_GRP_ID --role "Contributor" --scope "/subscriptions/$SUB_ID"
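When a group has only just been created, the directory lookup that plain --assignee performs can fail until the group replicates. A sketch, using a hypothetical assign_group_role helper, that resolves the group's Object ID once and passes it with --assignee-object-id and --assignee-principal-type Group:

```bash
# Hypothetical helper: assign ROLE at SCOPE to the Entra ID group named GROUP.
# --assignee-object-id bypasses the Graph lookup done by --assignee;
# --assignee-principal-type tells ARM what kind of principal the ID is.
assign_group_role() {
  local group_name role scope group_id
  group_name=$1; role=$2; scope=$3
  group_id=$(az ad group show --group "$group_name" --query id -o tsv) || return 1
  if [ -z "$group_id" ]; then
    echo "group '$group_name' not found" >&2
    return 1
  fi
  az role assignment create \
    --assignee-object-id "$group_id" \
    --assignee-principal-type Group \
    --role "$role" \
    --scope "$scope"
}

# Usage, equivalent to the two group assignments above:
# SUB_ID=$(az account show --query id -o tsv)
# assign_group_role "grp-mdp-account-admins" "Owner" "/subscriptions/$SUB_ID"
# assign_group_role "grp-mdp-platform-engineers" "Contributor" "/subscriptions/$SUB_ID"
```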

Verify all assignments:

az role assignment list --scope "/subscriptions/$SUB_ID" \
  --query "[].{Principal:principalName, Role:roleDefinitionName}" -o table
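The table is easy to misread when several principals are listed. As a sketch, a hypothetical require_role helper turns the check into a pass/fail gate, assuming the assignments were made directly at subscription scope as in the commands above:

```bash
# Hypothetical gate: succeed only if PRINCIPAL holds ROLE at SCOPE.
require_role() {
  local principal role scope n
  principal=$1; role=$2; scope=$3
  # Count matching assignments with a JMESPath filter on the role name.
  n=$(az role assignment list --assignee "$principal" --scope "$scope" \
        --query "length([?roleDefinitionName=='$role'])" -o tsv) || return 1
  if [ "${n:-0}" -lt 1 ]; then
    echo "MISSING: '$role' for $principal on $scope" >&2
    return 1
  fi
  echo "OK: '$role' for $principal"
}

# Usage:
# SUB_ID=$(az account show --query id -o tsv)
# require_role "$SP_APP_ID" "Contributor" "/subscriptions/$SUB_ID"
# require_role "$SP_APP_ID" "User Access Administrator" "/subscriptions/$SUB_ID"
```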

Note: User Access Administrator on the Terraform SP is necessary because downstream Terraform modules (Unity Catalog storage, Key Vault, private endpoints) will create their own role assignments. Without it, those apply runs will fail with AuthorizationFailed.


Service Principal — sp-mdp-terraform-prod

This is the Microsoft Entra ID (formerly Azure AD) service principal (application identity) that Terraform uses to authenticate against Azure and deploy resources. It is a non-human "robot account" with scoped, auditable permissions, required for CI/CD pipelines and to avoid using personal user accounts.

Create the SP

az ad sp create-for-rbac \
  --name "sp-mdp-terraform-prod" \
  --role "Contributor" \
  --scopes "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
  --years 1

Output:

{
  "appId": "...",      # client_id: save this
  "displayName": "sp-mdp-terraform-prod",
  "password": "...",   # client_secret: shown only once
  "tenant": "..."      # tenant_id: save this
}

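Important: The password is shown only once. Store it in Key Vault immediately. Because kv-mdp-prod-001 uses RBAC authorization, your account needs a data-plane role on the vault (Key Vault Administrator is granted in the Key Vault Creation section) before az keyvault secret set will succeed.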

az keyvault secret set \
  --vault-name "kv-mdp-prod-001" \
  --name "sp-mdp-terraform-prod-secret" \
  --value "<password-from-above>"
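Copying the password by hand risks leaving it in terminal scrollback or shell history. A sketch that chains creation and storage, assuming kv-mdp-prod-001 already exists and your account already holds a data-plane role on it; create_sp_and_store_secret is a hypothetical name, and the secret name follows the <sp-name>-secret convention used above:

```bash
# Hypothetical helper: create the SP and write its one-time password straight
# into Key Vault, printing only the non-secret identifiers.
# Assumes the vault exists and the caller holds a data-plane role on it.
create_sp_and_store_secret() {
  local sp_name sub_id vault out app_id secret tenant
  sp_name=$1; sub_id=$2; vault=$3
  # A JMESPath multiselect list yields one value per line with -o tsv.
  out=$(az ad sp create-for-rbac \
          --name "$sp_name" \
          --role "Contributor" \
          --scopes "/subscriptions/$sub_id" \
          --years 1 \
          --query "[appId, password, tenant]" -o tsv) || return 1
  app_id=$(printf '%s\n' "$out" | sed -n 1p)
  secret=$(printf '%s\n' "$out" | sed -n 2p)
  tenant=$(printf '%s\n' "$out" | sed -n 3p)
  if [ -z "$app_id" ] || [ -z "$secret" ]; then
    echo "unexpected create-for-rbac output" >&2
    return 1
  fi
  az keyvault secret set \
    --vault-name "$vault" \
    --name "${sp_name}-secret" \
    --value "$secret" \
    --output none || return 1
  printf 'client_id=%s\ntenant_id=%s\n' "$app_id" "$tenant"
}

# Usage:
# create_sp_and_store_secret "sp-mdp-terraform-prod" \
#   "52274961-4dda-48cc-a79b-3a0b74f8eaac" "kv-mdp-prod-001"
```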

Then assign User Access Administrator separately:

az role assignment create \
  --assignee <appId-from-above> \
  --role "User Access Administrator" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"

create-for-rbac already assigns Contributor at creation time — only the second role needs a separate command.


Key Vault Creation

Based on the existing resource group rg-mdp-prod-keyvault from D1-T2:

az keyvault create \
  --name "kv-mdp-prod-001" \
  --resource-group "rg-mdp-prod-keyvault" \
  --location "canadacentral" \
  --sku "premium" \
  --enable-purge-protection true \
  --retention-days 90 \
  --enable-rbac-authorization true

SKU and option rationale:

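  • premium: supports HSM-protected keys (RSA-HSM and EC-HSM key types), in line with the OSFI expectation for secrets at rest
  • enable-purge-protection: prevents hard-delete (purge) of the vault and its objects during the retention window; mandatory in regulated environments, and it cannot be disabled once enabled
  • retention-days 90: soft-delete retention window in days (allowed range 7 to 90, so 90 is the maximum)
  • enable-rbac-authorization: use Azure RBAC for data-plane access instead of legacy access policies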
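To confirm the vault actually carries these settings (for example, if it was created earlier by someone else), a hypothetical verify_keyvault check can read them back with az keyvault show; the property paths are those of the Key Vault ARM resource, and the expected values mirror the options above:

```bash
# Hypothetical check: read back the vault settings chosen above and fail on drift.
verify_keyvault() {
  local vault out sku purge retention rbac
  vault=$1
  # One value per line: sku, purge protection, retention days, RBAC mode.
  out=$(az keyvault show --name "$vault" \
        --query "[properties.sku.name, properties.enablePurgeProtection, properties.softDeleteRetentionInDays, properties.enableRbacAuthorization]" \
        -o tsv) || return 1
  # Lowercase text values so True/true compare equal.
  sku=$(printf '%s\n' "$out" | sed -n 1p | tr '[:upper:]' '[:lower:]')
  purge=$(printf '%s\n' "$out" | sed -n 2p | tr '[:upper:]' '[:lower:]')
  retention=$(printf '%s\n' "$out" | sed -n 3p)
  rbac=$(printf '%s\n' "$out" | sed -n 4p | tr '[:upper:]' '[:lower:]')
  if [ "$sku" != "premium" ] || [ "$purge" != "true" ] || \
     [ "$retention" != "90" ] || [ "$rbac" != "true" ]; then
    echo "unexpected settings: sku=$sku purge=$purge retention=$retention rbac=$rbac" >&2
    return 1
  fi
  echo "OK: $vault"
}

# Usage: verify_keyvault "kv-mdp-prod-001"
```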

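Because the vault uses RBAC authorization, creating it does not give you access to its secrets. Grant your own account the Key Vault Administrator role on the vault: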

MY_ID=$(az ad signed-in-user show --query id -o tsv)
KV_ID=$(az keyvault show --name "kv-mdp-prod-001" --query id -o tsv)

az role assignment create \
  --assignee $MY_ID \
  --role "Key Vault Administrator" \
  --scope $KV_ID

Key Vault names must be globally unique across Azure — adjust kv-mdp-prod-001 if it conflicts.
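Before trying another name, a quick local check against the Key Vault naming rules (3 to 24 characters; letters, digits and hyphens only; starts with a letter; ends with a letter or digit; no consecutive hyphens) can save a failed create. valid_kv_name is a hypothetical helper, and global uniqueness can still only be confirmed by Azure:

```bash
# Hypothetical pre-flight check of Key Vault naming rules.
# It cannot check global uniqueness; only Azure can.
valid_kv_name() {
  local name
  name=$1
  # 3 to 24 characters.
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 24 ] || return 1
  # Letters, digits, hyphens; starts with a letter; ends with a letter or digit.
  printf '%s\n' "$name" | grep -Eq '^[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9]$' || return 1
  # No consecutive hyphens.
  case "$name" in *--*) return 1 ;; esac
  return 0
}

# Usage: valid_kv_name "kv-mdp-prod-001" && echo "name format OK"
```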


Entra ID Group Creation

grp-mdp-account-admins

az ad group create \
  --display-name "grp-mdp-account-admins" \
  --mail-nickname "grp-mdp-account-admins" \
  --description "MDP production account administrators - Owner access on greenfield-mdp-prod subscription"

# Get group Object ID
GRP_ID=$(az ad group show --group "grp-mdp-account-admins" --query id -o tsv)

# Add a member
az ad group member add \
  --group "grp-mdp-account-admins" \
  --member-id $(az ad user show --id "user@greenfield.ca" --query id -o tsv)

# Assign Owner on subscription
az role assignment create \
  --assignee $GRP_ID \
  --role "Owner" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"

grp-mdp-platform-engineers

az ad group create \
  --display-name "grp-mdp-platform-engineers" \
  --mail-nickname "grp-mdp-platform-engineers" \
  --description "MDP production platform engineers - Contributor access on greenfield-mdp-prod subscription"

GRP_ID=$(az ad group show --group "grp-mdp-platform-engineers" --query id -o tsv)

az role assignment create \
  --assignee $GRP_ID \
  --role "Contributor" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"

# Add a member
az ad group member add \
  --group "grp-mdp-platform-engineers" \
  --member-id $(az ad user show --id "engineer@greenfield.ca" --query id -o tsv)

Note: Depending on tenant settings, group creation in Entra ID may require the Groups Administrator or User Administrator Entra role. If you get Authorization_RequestDenied, route the request through your IAM team.
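Entra ID does not require group display names to be unique, so re-running the create commands above can leave duplicate groups, after which lookups by name become ambiguous. A sketch of an idempotent variant, using a hypothetical ensure_group helper that filters on the exact display name:

```bash
# Hypothetical helper: create the group only if no group with this exact
# display name exists, then print its Object ID either way.
ensure_group() {
  local name description id
  name=$1; description=$2
  # Exact-match OData filter; no match yields an empty string.
  id=$(az ad group list --filter "displayName eq '$name'" --query "[0].id" -o tsv) || return 1
  if [ -z "$id" ]; then
    id=$(az ad group create \
          --display-name "$name" \
          --mail-nickname "$name" \
          --description "$description" \
          --query id -o tsv) || return 1
  fi
  printf '%s\n' "$id"
}

# Usage:
# GRP_ID=$(ensure_group "grp-mdp-platform-engineers" \
#   "MDP production platform engineers - Contributor access on greenfield-mdp-prod subscription")
```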


Subscription Hierarchy — Where to Place shared-infra

Recommended structure aligned with Azure Landing Zone / CAF conventions:

Tenant Root Group
└── mg-greenfield (root MG)
    ├── mg-greenfield-platform
    │   ├── mg-greenfield-connectivity  (hub network, ExpressRoute, DNS)
    │   └── mg-greenfield-management    (logging, monitoring, tfstate, shared tools)  ← shared-infra goes here
    ├── mg-greenfield-prod
    │   └── greenfield-mdp-prod         (workload subscription)
    ├── mg-greenfield-nonprod
    └── mg-greenfield-sandbox

The shared-infra subscription belongs under mg-greenfield-management. It hosts cross-cutting operational resources not tied to any workload:

  • Terraform state storage (staccmdptfstate)
  • Shared Key Vault for pipeline secrets
  • Log Analytics workspace
  • Azure Monitor / alerting

The tfstate account must exist before any workload subscription is deployed. Placing it under the management MG keeps its lifecycle independent of the workloads and avoids a circular dependency in which Terraform would store its state inside infrastructure it also manages.

At Greenfield, a management/shared-services subscription may already exist in the EA enrollment. Confirm with Cloud Infra before creating a new one — you may only need a new resource group.
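If Cloud Infra confirms the structure should be created rather than reused, the placement can be sketched with az account management-group commands. This assumes mg-greenfield-platform already exists, that your account may create management groups and move subscriptions, and that the shared-infra subscription ID is passed in; place_shared_infra is a hypothetical name:

```bash
# Hypothetical sketch: ensure mg-greenfield-management exists under
# mg-greenfield-platform, then move the shared-infra subscription into it.
place_shared_infra() {
  local shared_sub_id
  shared_sub_id=$1
  # Create the management MG only if it is not already there.
  if ! az account management-group show --name mg-greenfield-management --output none 2>/dev/null; then
    az account management-group create \
      --name mg-greenfield-management \
      --display-name "mg-greenfield-management" \
      --parent mg-greenfield-platform || return 1
  fi
  az account management-group subscription add \
    --name mg-greenfield-management \
    --subscription "$shared_sub_id"
}

# Usage (placeholder ID): place_shared_infra "<shared-infra-subscription-id>"
```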


Step 6 — Update terraform.tfvars

Values updated

Variable                 Value
subscription_id          52274961-4dda-48cc-a79b-3a0b74f8eaac
databricks_account_id    Retrieve from accounts.azuredatabricks.net → top-right menu
cost-center tag          Greenfield-CDO
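A quick guard against applying to the wrong subscription, sketched as hypothetical tfvars_value and check_tfvars helpers; it assumes terraform.tfvars uses simple one-line key = "value" assignments and only warns about databricks_account_id, since that value is still pending:

```bash
# Hypothetical helpers: terraform.tfvars must target the subscription you are
# logged into; databricks_account_id only triggers a warning while pending.
tfvars_value() {
  # Print the quoted value of KEY ($1) from simple `KEY = "value"` lines in FILE ($2).
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$2"
}

check_tfvars() {
  local file expected actual
  file=$1; expected=$2
  actual=$(tfvars_value subscription_id "$file")
  if [ "$actual" != "$expected" ]; then
    echo "subscription_id is '$actual', expected '$expected'" >&2
    return 1
  fi
  if [ -z "$(tfvars_value databricks_account_id "$file")" ]; then
    echo "WARN: databricks_account_id is empty or missing" >&2
  fi
  echo "OK: subscription_id matches"
}

# Usage: check_tfvars terraform.tfvars "$(az account show --query id -o tsv)"
```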

Get the Databricks account ID from the account console (no az CLI command available for this):

https://accounts.azuredatabricks.net → top-right menu → your account name

backend.tf alignment

The backend.tf values must match what was created in Step 5:

Parameter               Value
resource_group_name     rg-mdp-tfstate
storage_account_name    staccmdptfstate
container_name          tfstate
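The same kind of check can be applied to backend.tf. check_backend_tf is a hypothetical helper that assumes each of these settings sits on its own key = "value" line inside the backend "azurerm" block:

```bash
# Hypothetical check: compare backend.tf against the Step 5 resource names.
check_backend_tf() {
  local file rc pair key want got
  file=${1:-backend.tf}
  rc=0
  for pair in resource_group_name=rg-mdp-tfstate \
              storage_account_name=staccmdptfstate \
              container_name=tfstate; do
    key=${pair%%=*}; want=${pair#*=}
    # Extract the quoted value from a `key = "value"` line.
    got=$(sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$file")
    if [ "$got" = "$want" ]; then
      echo "OK    $key = $want"
    else
      echo "DIFF  $key: found '$got', expected '$want'"
      rc=1
    fi
  done
  return "$rc"
}

# Usage: check_backend_tf backend.tf
```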

Step 7 — Terraform Init Troubleshooting

Error: 403 AuthorizationPermissionMismatch on terraform init

Error: Failed to get existing workspaces: listing blobs: executing request: unexpected status 403
(403 This request is not authorized to perform this operation using this permission.)
AuthorizationPermissionMismatch

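Root cause: backend.tf sets use_azuread_auth = true, so Terraform authenticates to the storage account with an Entra ID token rather than a storage access key. The Contributor role covers only the management plane (ARM); it carries no blob data actions, so it does not grant blob read/write access on the data plane. A Storage Blob Data role must be assigned explicitly, and Storage Blob Data Contributor (not Reader) is needed because Terraform writes the state file and takes a blob lease for state locking.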

Fix:

MY_ID=$(az ad signed-in-user show --query id -o tsv)

SA_ID=$(az storage account show \
  --name "staccmdptfstate" \
  --resource-group "rg-mdp-tfstate" \
  --query id -o tsv)

az role assignment create \
  --assignee $MY_ID \
  --role "Storage Blob Data Contributor" \
  --scope $SA_ID

Wait ~2 minutes for role assignment propagation, then re-run terraform init.
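Rather than waiting a fixed interval, a sketch can poll the data plane until the assignment takes effect. az storage blob list --auth-mode login uses an Entra ID token against the blob endpoint, the same kind of access terraform init needs here; wait_for_blob_access and its default retry count and interval are assumptions:

```bash
# Hypothetical helper: retry a data-plane blob listing until the new role
# assignment has propagated, or give up after TRIES attempts.
wait_for_blob_access() {
  local account container tries interval i
  account=$1; container=$2; tries=${3:-20}; interval=${4:-15}
  i=1
  while [ "$i" -le "$tries" ]; do
    # --auth-mode login: authenticate with Entra ID, not an account key.
    if az storage blob list --account-name "$account" --container-name "$container" \
         --auth-mode login --num-results 1 --output none 2>/dev/null; then
      echo "blob data access OK after $i attempt(s)"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "still no blob data access after $tries attempts" >&2
  return 1
}

# Usage: wait_for_blob_access staccmdptfstate tfstate && terraform init
```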

Also grant the same role to the Terraform SP for CI/CD:

SP_APP_ID=$(az ad sp list --filter "displayName eq 'sp-mdp-terraform-prod'" --query "[0].appId" -o tsv)

az role assignment create \
  --assignee $SP_APP_ID \
  --role "Storage Blob Data Contributor" \
  --scope $SA_ID
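To confirm both principals now hold a data-plane role, including any inherited from the resource group or subscription, a hypothetical blob_data_roles helper could list them; an empty result corresponds to the 403 situation described above:

```bash
# Hypothetical diagnostic: list Storage Blob Data* roles a principal holds on
# SCOPE, including assignments inherited from parent scopes.
blob_data_roles() {
  local principal scope roles
  principal=$1; scope=$2
  roles=$(az role assignment list --assignee "$principal" --scope "$scope" --include-inherited \
           --query "[?starts_with(roleDefinitionName, 'Storage Blob Data')].roleDefinitionName" \
           -o tsv) || return 1
  if [ -z "$roles" ]; then
    echo "NONE: no Storage Blob Data role for $principal on $scope" >&2
    return 1
  fi
  printf '%s\n' "$roles"
}

# Usage:
# blob_data_roles "$(az ad signed-in-user show --query id -o tsv)" "$SA_ID"
# blob_data_roles "$SP_APP_ID" "$SA_ID"
```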

Summary — Actions Completed This Session

Action                                                               Status
RBAC assignments for Terraform SP                                    Done
Created sp-mdp-terraform-prod service principal                      Done
Created kv-mdp-prod-001 Key Vault                                    Done
Stored SP secret in Key Vault                                        Done
Created grp-mdp-account-admins                                       Done
Created grp-mdp-platform-engineers                                   Done
Updated terraform.tfvars with subscription ID and cost-center        Done
Aligned backend.tf with actual storage account names                 Done
Resolved terraform init 403 error (Storage Blob Data Contributor)    Done
databricks_account_id in terraform.tfvars                            Pending: retrieve from account console