# MDP Databricks-Primary Architecture — Second Steps

## Context

Continuation of the Azure setup for Greenfield's Modern Data Platform (MDP), picking up from `first_steps.md`. This session covers Step 4 through Step 7 of the initial provisioning guide: RBAC setup, Key Vault creation, Entra ID group creation, subscription hierarchy, and Terraform initialization.

---

## Step 4 — Set Up RBAC on the Subscription

The two `az role assignment create` commands use a placeholder `<APP_ID>`. Retrieve the actual App ID first:

```bash
SP_APP_ID=$(az ad sp list --display-name "sp-mdp-terraform-prod" --query "[0].appId" -o tsv)
echo $SP_APP_ID  # verify before assigning
```

Then run the assignments:

```bash
SUB_ID=$(az account show --query id -o tsv)

# Contributor — allows creating/managing all resources
az role assignment create \
  --assignee $SP_APP_ID \
  --role "Contributor" \
  --scope "/subscriptions/$SUB_ID"

# User Access Administrator — needed for Terraform to assign roles (e.g., on storage, Key Vault, Databricks)
az role assignment create \
  --assignee $SP_APP_ID \
  --role "User Access Administrator" \
  --scope "/subscriptions/$SUB_ID"
```

For group assignments, use Object IDs:

```bash
ADMIN_GRP_ID=$(az ad group show --group "grp-mdp-account-admins" --query id -o tsv)
ENG_GRP_ID=$(az ad group show --group "grp-mdp-platform-engineers" --query id -o tsv)

az role assignment create --assignee $ADMIN_GRP_ID --role "Owner" --scope "/subscriptions/$SUB_ID"
az role assignment create --assignee $ENG_GRP_ID --role "Contributor" --scope "/subscriptions/$SUB_ID"
```

Verify all assignments:

```bash
az role assignment list --scope "/subscriptions/$SUB_ID" \
  --query "[].{Principal:principalName, Role:roleDefinitionName}" -o table
```

> **Note:** `User Access Administrator` on the Terraform SP is necessary because downstream Terraform modules (Unity Catalog storage, Key Vault, private endpoints) will create their own role assignments. Without it, those `apply` runs will fail with `AuthorizationFailed`.
---

## Service Principal — `sp-mdp-terraform-prod`

This is the Azure AD service principal (application identity) that Terraform uses to authenticate against Azure and deploy resources. It is a non-human "robot account" with scoped, auditable permissions — required for CI/CD pipelines and to avoid using personal user accounts.

### Create the SP

```bash
az ad sp create-for-rbac \
  --name "sp-mdp-terraform-prod" \
  --role "Contributor" \
  --scopes "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
  --years 1
```

Output:

```json
{
  "appId": "...",        # client_id → save this
  "displayName": "sp-mdp-terraform-prod",
  "password": "...",     # client_secret → shown only once
  "tenant": "..."        # tenant_id → save this
}
```

> **Important:** The `password` is shown only once. Store it in Key Vault immediately.

```bash
az keyvault secret set \
  --vault-name "kv-mdp-prod-001" \
  --name "sp-mdp-terraform-prod-secret" \
  --value "<client_secret>"
```

Then assign `User Access Administrator` separately:

```bash
az role assignment create \
  --assignee <APP_ID> \
  --role "User Access Administrator" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
```

> `create-for-rbac` already assigns `Contributor` at creation time — only the second role needs a separate command.
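Before wiring the SP into CI/CD, the stored credentials can be sanity-checked with a direct login. A sketch only, assuming an already-authenticated user session; `<TENANT_ID>` is a placeholder for the `tenant` value from the `create-for-rbac` output:

```bash
# Pull the secret back out of Key Vault (requires a secrets-read role
# such as Key Vault Administrator on kv-mdp-prod-001)
SP_SECRET=$(az keyvault secret show \
  --vault-name "kv-mdp-prod-001" \
  --name "sp-mdp-terraform-prod-secret" \
  --query value -o tsv)

# Log in as the SP
az login --service-principal \
  --username $SP_APP_ID \
  --password "$SP_SECRET" \
  --tenant <TENANT_ID>

# For an SP session, user.name holds the appId
az account show --query user.name -o tsv
```

Remember to `az login` again as yourself afterwards, since the active session is now the SP.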
---

## Key Vault Creation

Based on the existing resource group `rg-mdp-prod-keyvault` from D1-T2:

```bash
az keyvault create \
  --name "kv-mdp-prod-001" \
  --resource-group "rg-mdp-prod-keyvault" \
  --location "canadacentral" \
  --sku "premium" \
  --enable-purge-protection true \
  --retention-days 90 \
  --enable-rbac-authorization true
```

**SKU and option rationale:**

- `premium` — HSM-backed keys (OSFI expectation for secrets at rest)
- `enable-purge-protection` — prevents hard-delete; mandatory in regulated environments
- `retention-days 90` — soft-delete window
- `enable-rbac-authorization` — use Azure RBAC instead of legacy access policies

Grant your own account access to manage secrets:

```bash
MY_ID=$(az ad signed-in-user show --query id -o tsv)
KV_ID=$(az keyvault show --name "kv-mdp-prod-001" --query id -o tsv)

az role assignment create \
  --assignee $MY_ID \
  --role "Key Vault Administrator" \
  --scope $KV_ID
```

> Key Vault names must be globally unique across Azure — adjust `kv-mdp-prod-001` if it conflicts.
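When picking an alternative name, a small local check of the documented naming format can save a failed `az keyvault create` round-trip. This is a sketch based on the published rules (3–24 characters, letters/digits/hyphens only, starting with a letter, ending alphanumeric, no consecutive hyphens); `valid_kv_name` is a hypothetical helper, and it does not check global availability:

```shell
# Local format check for Azure Key Vault names (format only, not availability).
valid_kv_name() {
  local n="$1"
  # 3-24 characters
  [[ ${#n} -ge 3 && ${#n} -le 24 ]] || return 1
  # starts with a letter, ends with a letter or digit, hyphens allowed in between
  [[ "$n" =~ ^[a-zA-Z][a-zA-Z0-9-]*[a-zA-Z0-9]$ ]] || return 1
  # no consecutive hyphens
  [[ "$n" != *--* ]] || return 1
}

valid_kv_name "kv-mdp-prod-001" && echo "kv-mdp-prod-001: format OK"
```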
---

## Entra ID Group Creation

### grp-mdp-account-admins

```bash
az ad group create \
  --display-name "grp-mdp-account-admins" \
  --mail-nickname "grp-mdp-account-admins" \
  --description "MDP production account administrators - Owner access on greenfield-mdp-prod subscription"

# Get group Object ID
GRP_ID=$(az ad group show --group "grp-mdp-account-admins" --query id -o tsv)

# Add a member
az ad group member add \
  --group "grp-mdp-account-admins" \
  --member-id $(az ad user show --id "user@greenfield.ca" --query id -o tsv)

# Assign Owner on subscription
az role assignment create \
  --assignee $GRP_ID \
  --role "Owner" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
```

### grp-mdp-platform-engineers

```bash
az ad group create \
  --display-name "grp-mdp-platform-engineers" \
  --mail-nickname "grp-mdp-platform-engineers" \
  --description "MDP production platform engineers - Contributor access on greenfield-mdp-prod subscription"

GRP_ID=$(az ad group show --group "grp-mdp-platform-engineers" --query id -o tsv)

az role assignment create \
  --assignee $GRP_ID \
  --role "Contributor" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"

# Add a member
az ad group member add \
  --group "grp-mdp-platform-engineers" \
  --member-id $(az ad user show --id "engineer@greenfield.ca" --query id -o tsv)
```

> **Note:** Group creation in Entra ID requires the **User Administrator** or **Groups Administrator** Entra role. If you get `Authorization_RequestDenied`, this needs to go through your IAM team.
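Once the groups exist, membership can be spot-checked before relying on the role assignments. A sketch, assuming an authenticated session with directory read permissions:

```bash
# Returns {"value": true} when the user is a member
az ad group member check \
  --group "grp-mdp-account-admins" \
  --member-id $(az ad user show --id "user@greenfield.ca" --query id -o tsv)

# List current members by display name
az ad group member list \
  --group "grp-mdp-account-admins" \
  --query "[].displayName" -o table
```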
---

## Subscription Hierarchy — Where to Place `shared-infra`

Recommended structure aligned with Azure Landing Zone / CAF conventions:

```
Tenant Root Group
└── mg-greenfield (root MG)
    ├── mg-greenfield-platform            ← shared-infra goes here
    │   ├── mg-greenfield-connectivity    (hub network, ExpressRoute, DNS)
    │   └── mg-greenfield-management      (logging, monitoring, tfstate, shared tools)
    ├── mg-greenfield-prod
    │   └── greenfield-mdp-prod           (workload subscription)
    ├── mg-greenfield-nonprod
    └── mg-greenfield-sandbox
```

The `shared-infra` subscription belongs under **`mg-greenfield-management`**. It hosts cross-cutting operational resources not tied to any workload:

- Terraform state storage (`staccmdptfstate`)
- Shared Key Vault for pipeline secrets
- Log Analytics workspace
- Azure Monitor / alerting

The tfstate account must exist before any workload subscription is deployed — placing it under the management MG keeps it lifecycle-independent and avoids a circular dependency.

> At Greenfield, a management/shared-services subscription may already exist in the EA enrollment. Confirm with Cloud Infra before creating a new one — you may only need a new resource group.
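If none of this hierarchy exists yet, the platform/management branch could be stood up roughly as follows. A sketch only: the MG names follow the convention in the diagram, and per the note above this should be confirmed with Cloud Infra before anything is created:

```bash
# Create the management-group branch that will hold shared-infra
az account management-group create --name "mg-greenfield" --display-name "Greenfield"
az account management-group create --name "mg-greenfield-platform" --parent "mg-greenfield"
az account management-group create --name "mg-greenfield-management" --parent "mg-greenfield-platform"

# Place the shared-infra subscription under the management MG
az account management-group subscription add \
  --name "mg-greenfield-management" \
  --subscription "shared-infra"
```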
---

## Step 6 — Update `terraform.tfvars`

### Values updated

| Variable | Value |
|---|---|
| `subscription_id` | `52274961-4dda-48cc-a79b-3a0b74f8eaac` |
| `databricks_account_id` | Retrieve from `accounts.azuredatabricks.net` → top-right menu |
| `cost-center` tag | `Greenfield-CDO` |

Get the Databricks account ID from the account console (no az CLI command available for this):

```
https://accounts.azuredatabricks.net → top-right menu → your account name
```

### `backend.tf` alignment

The `backend.tf` values must match what was created in Step 5:

| Parameter | Value |
|---|---|
| `resource_group_name` | `rg-mdp-tfstate` |
| `storage_account_name` | `staccmdptfstate` |
| `container_name` | `tfstate` |

---

## Step 7 — Terraform Init Troubleshooting

### Error: 403 AuthorizationPermissionMismatch on `terraform init`

```
Error: Failed to get existing workspaces: listing blobs:
executing request: unexpected status 403
(403 This request is not authorized to perform this operation using this permission.)
AuthorizationPermissionMismatch
```

**Root cause:** `backend.tf` uses `use_azuread_auth = true`. Terraform authenticates to the storage account via Entra ID rather than a storage access key. The `Contributor` role alone only covers the management plane (ARM) — it does not grant blob read/write access on the data plane. A `Storage Blob Data*` role is required explicitly.

**Fix:**

```bash
MY_ID=$(az ad signed-in-user show --query id -o tsv)
SA_ID=$(az storage account show \
  --name "staccmdptfstate" \
  --resource-group "rg-mdp-tfstate" \
  --query id -o tsv)

az role assignment create \
  --assignee $MY_ID \
  --role "Storage Blob Data Contributor" \
  --scope $SA_ID
```

Wait ~2 minutes for role assignment propagation, then re-run `terraform init`.
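Before re-running `init`, the data-plane grant can be verified directly; listing the container with Entra ID auth exercises the same permission Terraform needs. A sketch, assuming the signed-in user and account names from the fix above:

```bash
# --auth-mode login forces Entra ID (data-plane) auth instead of account keys,
# so a 403 here reproduces the terraform init failure
az storage blob list \
  --account-name "staccmdptfstate" \
  --container-name "tfstate" \
  --auth-mode login \
  -o table
```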
Also grant the same role to the Terraform SP for CI/CD:

```bash
SP_APP_ID=$(az ad sp list --display-name "sp-mdp-terraform-prod" --query "[0].appId" -o tsv)

az role assignment create \
  --assignee $SP_APP_ID \
  --role "Storage Blob Data Contributor" \
  --scope $SA_ID
```

---

## Summary — Actions Completed This Session

| Action | Status |
|---|---|
| RBAC assignments for Terraform SP | Done |
| Created `sp-mdp-terraform-prod` service principal | Done |
| Created `kv-mdp-prod-001` Key Vault | Done |
| Stored SP secret in Key Vault | Done |
| Created `grp-mdp-account-admins` | Done |
| Created `grp-mdp-platform-engineers` | Done |
| Updated `terraform.tfvars` with subscription ID and cost-center | Done |
| Aligned `backend.tf` with actual storage account names | Done |
| Resolved `terraform init` 403 error (Storage Blob Data Contributor) | Done |
| `databricks_account_id` in `terraform.tfvars` | Pending — retrieve from account console |