# MDP Databricks-Primary Architecture — Second Steps

## Context

Continuation of the Azure setup for Greenfield's Modern Data Platform (MDP), picking up from `first_steps.md`. This session covers Step 4 through Step 7 of the initial provisioning guide: RBAC setup, service principal and Key Vault creation, Entra ID group creation, subscription hierarchy, and Terraform initialization.

---

## Step 4 — Set Up RBAC on the Subscription

The two `az role assignment create` commands use a placeholder `<terraform-sp-app-id>`. Retrieve the actual App ID first:

```bash
SP_APP_ID=$(az ad sp list --display-name "sp-mdp-terraform-prod" --query "[0].appId" -o tsv)
echo "$SP_APP_ID"  # verify before assigning
```
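
If no service principal matches the display name, the lookup can come back empty, and the assignments that follow would then fail or target the wrong principal. A minimal sketch of a guard, assuming an App ID is always a standard GUID; `is_guid` is a hypothetical helper, not part of the az CLI:

```bash
# Sketch: succeed only if the argument looks like a GUID.
# is_guid is a hypothetical helper name.
is_guid() {
  printf '%s\n' "$1" | grep -Eq \
    '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Usage (SP_APP_ID comes from the lookup above):
# is_guid "$SP_APP_ID" || { echo "SP lookup returned no valid App ID" >&2; exit 1; }
```

Stopping here is cheaper than debugging a failed role assignment later.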

Then run the assignments:

```bash
SUB_ID=$(az account show --query id -o tsv)

# Contributor: allows creating and managing all resources
az role assignment create \
  --assignee "$SP_APP_ID" \
  --role "Contributor" \
  --scope "/subscriptions/$SUB_ID"

# User Access Administrator: needed for Terraform to assign roles (e.g., on storage, Key Vault, Databricks)
az role assignment create \
  --assignee "$SP_APP_ID" \
  --role "User Access Administrator" \
  --scope "/subscriptions/$SUB_ID"
```

For group assignments, use Object IDs:

```bash
ADMIN_GRP_ID=$(az ad group show --group "grp-mdp-account-admins" --query id -o tsv)
ENG_GRP_ID=$(az ad group show --group "grp-mdp-platform-engineers" --query id -o tsv)

az role assignment create --assignee "$ADMIN_GRP_ID" --role "Owner" --scope "/subscriptions/$SUB_ID"
az role assignment create --assignee "$ENG_GRP_ID" --role "Contributor" --scope "/subscriptions/$SUB_ID"
```

Verify all assignments:

```bash
az role assignment list --scope "/subscriptions/$SUB_ID" \
  --query "[].{Principal:principalName, Role:roleDefinitionName}" -o table
```
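
Reading the table by eye works for a one-off check. For a pipeline, a scripted check may be easier to trust. The sketch below wraps `az role assignment list` in a hypothetical helper, `has_role`, and assumes the `tsv` output lists one role name per line:

```bash
# Sketch: succeed only if a principal holds a given role at a given scope.
# has_role is a hypothetical helper; it only wraps `az role assignment list`.
has_role() {
  # $1 = assignee (App ID or Object ID), $2 = role name, $3 = scope
  az role assignment list --assignee "$1" --scope "$3" \
    --query "[].roleDefinitionName" -o tsv | grep -Fxq "$2"
}

# Usage:
# for role in "Contributor" "User Access Administrator"; do
#   has_role "$SP_APP_ID" "$role" "/subscriptions/$SUB_ID" || echo "missing: $role" >&2
# done
```

`grep -Fx` matches the whole line literally, so `Contributor` does not match `Storage Blob Data Contributor`.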

> **Note:** `User Access Administrator` on the Terraform SP is necessary because downstream Terraform modules (Unity Catalog storage, Key Vault, private endpoints) create their own role assignments. Without it, those `apply` runs fail with `AuthorizationFailed`.

---

## Service Principal — `sp-mdp-terraform-prod`

This is the Microsoft Entra ID (formerly Azure AD) service principal (application identity) that Terraform uses to authenticate against Azure and deploy resources. It is a non-human "robot account" with scoped, auditable permissions. CI/CD pipelines need one, and it avoids deploying from personal user accounts. Step 4 assumes this service principal already exists, so create it first if it does not.

### Create the SP

```bash
az ad sp create-for-rbac \
  --name "sp-mdp-terraform-prod" \
  --role "Contributor" \
  --scopes "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
  --years 1
```

Output:

```json
{
  "appId": "...",
  "displayName": "sp-mdp-terraform-prod",
  "password": "...",
  "tenant": "..."
}
```

Save `appId` (the `client_id`) and `tenant` (the `tenant_id`). The `password` is the `client_secret`.

> **Important:** The `password` is shown only once. Store it in Key Vault immediately (the vault is created in the Key Vault Creation section below).

```bash
az keyvault secret set \
  --vault-name "kv-mdp-prod-001" \
  --name "sp-mdp-terraform-prod-secret" \
  --value "<password-from-above>"
```
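
Passing the secret with `--value` can leave it in shell history. One possible alternative, sketched below, reads the secret from standard input into a private temporary file and hands it to `az keyvault secret set` through its `--file` option. The helper name `store_secret_from_stdin` is hypothetical, and the sketch assumes the default UTF-8 encoding is acceptable:

```bash
# Sketch: store a secret without putting it on the command line.
# store_secret_from_stdin is a hypothetical helper name.
store_secret_from_stdin() {
  # $1 = vault name, $2 = secret name; the secret value is read from stdin
  tmp=$(mktemp) || return 1
  secret=$(cat)                   # command substitution drops trailing newlines
  printf '%s' "$secret" > "$tmp"  # write the value without a trailing newline
  az keyvault secret set --vault-name "$1" --name "$2" --file "$tmp" --output none
  rc=$?
  rm -f "$tmp"                    # remove the temp file on success or failure
  return "$rc"
}

# Usage: paste the password, press Enter, then Ctrl-D
# store_secret_from_stdin "kv-mdp-prod-001" "sp-mdp-terraform-prod-secret"
```

The temporary file is removed whether or not the upload succeeds.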

Then assign `User Access Administrator` separately:

```bash
az role assignment create \
  --assignee "<appId-from-above>" \
  --role "User Access Administrator" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
```

> `create-for-rbac` already assigns `Contributor` at creation time, so only the second role needs a separate command.

---

## Key Vault Creation

Create the vault in the existing resource group `rg-mdp-prod-keyvault` (from D1-T2):

```bash
az keyvault create \
  --name "kv-mdp-prod-001" \
  --resource-group "rg-mdp-prod-keyvault" \
  --location "canadacentral" \
  --sku "premium" \
  --enable-purge-protection true \
  --retention-days 90 \
  --enable-rbac-authorization true
```

**SKU and option rationale:**

- `premium`: HSM-backed keys (OSFI expectation for secrets at rest)
- `enable-purge-protection`: prevents permanent deletion of the vault and its objects during the retention period; mandatory in regulated environments. It cannot be disabled once enabled.
- `retention-days 90`: soft-delete retention window (90 days is the maximum)
- `enable-rbac-authorization`: use Azure RBAC instead of legacy access policies

Grant your own account access to manage secrets:

```bash
MY_ID=$(az ad signed-in-user show --query id -o tsv)
KV_ID=$(az keyvault show --name "kv-mdp-prod-001" --query id -o tsv)

az role assignment create \
  --assignee "$MY_ID" \
  --role "Key Vault Administrator" \
  --scope "$KV_ID"
```

> Key Vault names must be globally unique across Azure. Adjust `kv-mdp-prod-001` if it conflicts.
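
Before picking a replacement name, it may help to check the format locally. The sketch below encodes the Key Vault naming rules as I understand them: 3 to 24 characters, letters, digits and hyphens only, starting with a letter, ending with a letter or digit, no consecutive hyphens. `is_valid_kv_name` is a hypothetical helper. Global availability still has to be checked against Azure, for example with `az keyvault check-name`.

```bash
# Sketch: local format check for a candidate Key Vault name.
# is_valid_kv_name is a hypothetical helper; it does not contact Azure.
is_valid_kv_name() {
  # Reject consecutive hyphens first
  case $1 in *--*) return 1 ;; esac
  # 1 leading letter + 1..22 middle characters + 1 trailing letter or digit = 3..24
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z][A-Za-z0-9-]{1,22}[A-Za-z0-9]$'
}

# Usage:
# is_valid_kv_name "kv-mdp-prod-002" && az keyvault check-name --name "kv-mdp-prod-002"
```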

---

## Entra ID Group Creation

### grp-mdp-account-admins

```bash
az ad group create \
  --display-name "grp-mdp-account-admins" \
  --mail-nickname "grp-mdp-account-admins" \
  --description "MDP production account administrators - Owner access on greenfield-mdp-prod subscription"

# Get group Object ID
GRP_ID=$(az ad group show --group "grp-mdp-account-admins" --query id -o tsv)

# Add a member
az ad group member add \
  --group "grp-mdp-account-admins" \
  --member-id "$(az ad user show --id "user@greenfield.ca" --query id -o tsv)"

# Assign Owner on subscription
az role assignment create \
  --assignee "$GRP_ID" \
  --role "Owner" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
```

### grp-mdp-platform-engineers

```bash
az ad group create \
  --display-name "grp-mdp-platform-engineers" \
  --mail-nickname "grp-mdp-platform-engineers" \
  --description "MDP production platform engineers - Contributor access on greenfield-mdp-prod subscription"

GRP_ID=$(az ad group show --group "grp-mdp-platform-engineers" --query id -o tsv)

az role assignment create \
  --assignee "$GRP_ID" \
  --role "Contributor" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"

# Add a member
az ad group member add \
  --group "grp-mdp-platform-engineers" \
  --member-id "$(az ad user show --id "engineer@greenfield.ca" --query id -o tsv)"
```

> **Note:** Group creation in Entra ID may require the **User Administrator** or **Groups Administrator** Entra role, depending on tenant settings. If you get `Authorization_RequestDenied`, route the request through your IAM team.
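
Re-running `az ad group create` with the same display name can produce a second group, because display names are not unique in Entra ID. A minimal sketch of a create-if-missing wrapper follows. `ensure_group` is a hypothetical helper, and the JMESPath filter is there on the assumption that `--display-name` may match more than the exact name:

```bash
# Sketch: create an Entra ID group only if none with that exact name exists.
# ensure_group is a hypothetical helper; it prints the group's Object ID.
ensure_group() {
  # $1 = display name (also used as mail nickname), $2 = description
  id=$(az ad group list --display-name "$1" \
    --query "[?displayName=='$1'] | [0].id" -o tsv)
  if [ -z "$id" ]; then
    id=$(az ad group create --display-name "$1" --mail-nickname "$1" \
      --description "$2" --query id -o tsv)
  fi
  printf '%s\n' "$id"
}

# Usage:
# GRP_ID=$(ensure_group "grp-mdp-platform-engineers" "MDP production platform engineers")
```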

---

## Subscription Hierarchy — Where to Place `shared-infra`

Recommended structure aligned with Azure Landing Zone / CAF conventions:

```
Tenant Root Group
└── mg-greenfield (root MG)
    ├── mg-greenfield-platform
    │   ├── mg-greenfield-connectivity  (hub network, ExpressRoute, DNS)
    │   └── mg-greenfield-management    (logging, monitoring, tfstate, shared tools)  ← shared-infra goes here
    ├── mg-greenfield-prod
    │   └── greenfield-mdp-prod         (workload subscription)
    ├── mg-greenfield-nonprod
    └── mg-greenfield-sandbox
```

The `shared-infra` subscription belongs under **`mg-greenfield-management`**. It hosts cross-cutting operational resources not tied to any workload:

- Terraform state storage (`staccmdptfstate`)
- Shared Key Vault for pipeline secrets
- Log Analytics workspace
- Azure Monitor / alerting

The tfstate account must exist before any workload subscription is deployed. Placing it under the management MG keeps its lifecycle independent of the workloads and avoids a circular dependency.
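
If the subscription does need to be moved, the placement can be sketched as a thin wrapper around `az account management-group subscription add`. `place_subscription` is a hypothetical helper, and the subscription ID in the usage line is a placeholder that does not appear in these notes:

```bash
# Sketch: place a subscription under a management group.
# place_subscription is a hypothetical helper name.
place_subscription() {
  # $1 = management group name, $2 = subscription ID or name
  az account management-group subscription add --name "$1" --subscription "$2"
}

# Usage (fill in the real subscription ID first):
# place_subscription "mg-greenfield-management" "<shared-infra-subscription-id>"
```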

> At Greenfield, a management/shared-services subscription may already exist in the EA enrollment. Confirm with Cloud Infra before creating a new one; you may only need a new resource group.

---

## Step 6 — Update `terraform.tfvars`

### Values updated

| Variable | Value |
|---|---|
| `subscription_id` | `52274961-4dda-48cc-a79b-3a0b74f8eaac` |
| `databricks_account_id` | Retrieve from `accounts.azuredatabricks.net` → top-right menu |
| `cost-center` tag | `Greenfield-CDO` |

Get the Databricks account ID from the account console (no az CLI command is available for this):

```
https://accounts.azuredatabricks.net → top-right menu → your account name
```

### `backend.tf` alignment

The `backend.tf` values must match what was created in Step 5:

| Parameter | Value |
|---|---|
| `resource_group_name` | `rg-mdp-tfstate` |
| `storage_account_name` | `staccmdptfstate` |
| `container_name` | `tfstate` |
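
For reference, the values in the table above could land in a `backend.tf` like the one this sketch generates. The state file name in `key` is an assumption, since these notes do not give it. `use_azuread_auth = true` is taken from the Step 7 troubleshooting notes. `write_backend_tf` is a hypothetical helper.

```bash
# Sketch: generate a backend.tf that matches the table above.
# write_backend_tf is a hypothetical helper; the `key` value is an assumption.
write_backend_tf() {
  # $1 = output path
  cat > "$1" <<'EOF'
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-mdp-tfstate"
    storage_account_name = "staccmdptfstate"
    container_name       = "tfstate"
    key                  = "mdp-prod.tfstate" # hypothetical state file name
    use_azuread_auth     = true
  }
}
EOF
}

# Usage:
# write_backend_tf backend.tf
```

Generating the file from one place makes a mismatch with the Step 5 resources less likely. The same values could equally be maintained by hand.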

---

## Step 7 — Terraform Init Troubleshooting

### Error: 403 AuthorizationPermissionMismatch on `terraform init`

```
Error: Failed to get existing workspaces: listing blobs: executing request: unexpected status 403
(403 This request is not authorized to perform this operation using this permission.)
AuthorizationPermissionMismatch
```

**Root cause:** `backend.tf` sets `use_azuread_auth = true`, so Terraform authenticates to the storage account via Entra ID rather than a storage access key. The `Contributor` role covers only the management plane (ARM); it does not grant blob read/write access on the data plane. A `Storage Blob Data *` role must be assigned explicitly.

**Fix:**

```bash
MY_ID=$(az ad signed-in-user show --query id -o tsv)

SA_ID=$(az storage account show \
  --name "staccmdptfstate" \
  --resource-group "rg-mdp-tfstate" \
  --query id -o tsv)

az role assignment create \
  --assignee "$MY_ID" \
  --role "Storage Blob Data Contributor" \
  --scope "$SA_ID"
```

Role assignments take time to propagate. Wait about 2 minutes, then re-run `terraform init`.
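
Propagation time varies, so a fixed wait may be too short or needlessly long. A minimal sketch of a retry loop is shown below. `retry` is a hypothetical helper, not a Terraform or az feature, and the attempt count and delay in the usage line are arbitrary:

```bash
# Sketch: retry a command with a fixed delay between attempts.
# retry is a hypothetical helper name.
retry() {
  # $1 = max attempts, $2 = delay in seconds, remaining args = command to run
  max=$1; delay=$2; shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage: up to 6 attempts, 30 seconds apart
# retry 6 30 terraform init
```

The loop returns as soon as the command succeeds, so a fast propagation costs no extra waiting.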

Also grant the same role to the Terraform SP for CI/CD:

```bash
SP_APP_ID=$(az ad sp list --display-name "sp-mdp-terraform-prod" --query "[0].appId" -o tsv)

# SA_ID is the storage account ID resolved in the previous block
az role assignment create \
  --assignee "$SP_APP_ID" \
  --role "Storage Blob Data Contributor" \
  --scope "$SA_ID"
```

---

## Summary — Actions Completed This Session

| Action | Status |
|---|---|
| RBAC assignments for Terraform SP | Done |
| Created `sp-mdp-terraform-prod` service principal | Done |
| Created `kv-mdp-prod-001` Key Vault | Done |
| Stored SP secret in Key Vault | Done |
| Created `grp-mdp-account-admins` | Done |
| Created `grp-mdp-platform-engineers` | Done |
| Updated `terraform.tfvars` with subscription ID and cost-center | Done |
| Aligned `backend.tf` with actual storage account names | Done |
| Resolved `terraform init` 403 error (Storage Blob Data Contributor) | Done |
| `databricks_account_id` in `terraform.tfvars` | Pending: retrieve from account console |