# MDP Databricks-Primary Architecture: First Steps

## Context

Implementation of the Databricks-primary architecture (v8.0) on Azure for Greenfield's Modern Data Platform. Excluding disaster recovery and devtest environments for now. All Terraform scripts are placed in `/Implementation/Terraform`.

---

## Day 1 Tasks: Previous Conversation Summary

### D1-T1: Subscription Provisioning

- **Terraform artifacts produced:** `providers.tf` (AzureRM with `subscription_id`), `backend.tf` (state backend), `variables.tf` + `terraform.tfvars` (prod values).
- **Note:** Subscription creation itself is a manual/ticket process; Terraform assumes it exists.

### D1-T2: Resource Group Structure (6 RGs)

- **Terraform:** `networking/main.tf` lines 14–48
- **Resource groups created:**
  - `rg-mdp-prod-network`
  - `rg-mdp-prod-databricks`
  - `rg-mdp-prod-storage`
  - `rg-mdp-prod-governance`
  - `rg-mdp-prod-keyvault`
  - `rg-mdp-prod-monitoring`

### D1-T3: VNet Design & Deployment (starts D1, finishes D2)

- **Terraform:** `networking/main.tf`
- **Main VNet:** `10.100.0.0/16` with 3 subnets:
  - `dbx-host`
  - `dbx-container`
  - `private-endpoints`
- **Transit VNet:** `10.101.0.0/24`
- **Peerings:** main↔transit, optional main↔hub
- **NAT Gateway** with public IP on both Databricks subnets
- **NSG** with full inbound/outbound rule set
- **4 private DNS zones** with VNet links
- **Outputs** wired in `outputs.tf`
- **Root module** `prod/main.tf` calls the networking module

---

## D1-T1: Manual Subscription Provisioning Guide

Before Terraform can run, the Azure subscription must exist. This is a manual process at Greenfield.

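Because Terraform assumes the subscription already exists, a quick pre-flight check can save a failed `terraform plan`. The following is a minimal sketch, assuming the Azure CLI is installed and already logged in; `check_subscription` is a hypothetical helper name and is not part of the repository.

```bash
#!/usr/bin/env bash
# Pre-flight check: confirm the target subscription is visible and Enabled.
# check_subscription is a hypothetical helper, not part of the Terraform repo.
check_subscription() {
  local sub_name="$1"
  local state

  # `az account show` exits non-zero when the subscription is not visible
  # to the current login (not created yet, or no access granted).
  if ! state=$(az account show --subscription "$sub_name" --query state -o tsv 2>/dev/null); then
    echo "MISSING: subscription '$sub_name' not found or not accessible"
    return 1
  fi

  # A subscription can exist but be Disabled, Warned, etc.
  if [ "$state" != "Enabled" ]; then
    echo "NOT READY: subscription '$sub_name' is in state '$state'"
    return 2
  fi

  echo "OK: subscription '$sub_name' is Enabled"
}
```

Running `check_subscription "greenfield-mdp-prod"` before the steps below shows whether the Cloud Infra ticket has been completed; a non-zero exit code means Terraform should not be run yet.
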
### Step 1: Request or Create the Subscription

Submit a request to Cloud Infra (ServiceNow/ITSM ticket) with these specifications:

| Parameter | Value |
|---|---|
| Subscription name | `greenfield-mdp-prod` |
| Offer type | Enterprise Agreement (EA) |
| Billing account | Greenfield's EA enrollment |
| Region affinity | Canada Central |
| Management group | Place under Greenfield's MG hierarchy (e.g., `mg-greenfield-data` or `mg-greenfield-prod`) |

### Step 2: Place Under the Correct Management Group

- Go to **Azure Portal → Management groups**
- Locate the target MG (e.g., `mg-greenfield-prod` or `mg-greenfield-data-platforms`)
- Move the new subscription under it (or ensure Cloud Infra places it there at creation time)
- Azure Policy assignments on that MG are inherited automatically

### Step 3: Register Required Resource Providers

```bash
# Target the new subscription, then register every provider the platform uses.
az account set --subscription "greenfield-mdp-prod"

az provider register --namespace Microsoft.Databricks
az provider register --namespace Microsoft.Storage
az provider register --namespace Microsoft.KeyVault
az provider register --namespace Microsoft.Network
az provider register --namespace Microsoft.ManagedIdentity
az provider register --namespace Microsoft.Purview
az provider register --namespace Microsoft.OperationalInsights
az provider register --namespace Microsoft.Monitor
az provider register --namespace Microsoft.EventHub
```

Registration is asynchronous and can take a few minutes. Verify with:

```bash
az provider list --query "[?registrationState=='Registered'].namespace" -o table
```

### Step 4: Set Up RBAC on the Subscription

| Principal | Role | Scope |
|---|---|---|
| `sp-mdp-terraform-prod` (Terraform SP) | Contributor | Subscription |
| `sp-mdp-terraform-prod` | User Access Administrator | Subscription |
| `grp-mdp-account-admins` | Owner | Subscription |
| `grp-mdp-platform-engineers` | Contributor | Subscription |

```bash
SUB_ID=$(az account show --query id -o tsv)

# Replace the placeholder with the object ID (or app ID) of sp-mdp-terraform-prod.
az role assignment create --assignee "<terraform-sp-object-id>" \
  --role "Contributor" --scope "/subscriptions/$SUB_ID"

az role assignment create --assignee "<terraform-sp-object-id>" \
  --role "User Access Administrator" --scope "/subscriptions/$SUB_ID"
```

### Step 5: Create the Terraform State Storage Account

Create the state storage in a separate management/shared subscription to avoid a circular dependency:

```bash
# Replace the placeholder with the management/shared subscription name or ID.
az account set --subscription "<management-subscription>"

az group create --name rg-mdp-tfstate --location canadacentral

az storage account create \
  --name staccmdptfstate \
  --resource-group rg-mdp-tfstate \
  --location canadacentral \
  --sku Standard_GRS \
  --kind StorageV2 \
  --min-tls-version TLS1_2 \
  --allow-blob-public-access false

az storage container create \
  --name tfstate \
  --account-name staccmdptfstate

# Blob versioning allows recovery of an overwritten or corrupted state file.
az storage account blob-service-properties update \
  --account-name staccmdptfstate \
  --enable-versioning true
```

### Step 6: Update `terraform.tfvars`

```bash
az account show --subscription "greenfield-mdp-prod" --query id -o tsv
```

Replace `REPLACE-WITH-PROD-SUBSCRIPTION-ID` in `terraform.tfvars` with the actual UUID.

### Step 7: Validate Terraform Authentication

```bash
cd environments/prod

# Replace the placeholders with the Terraform SP's credentials.
az login --service-principal -u "<app-id>" -p "<client-secret>" --tenant "<tenant-id>"

terraform init
terraform plan
```

### Handoff Summary

| Action | Who |
|---|---|
| Create EA subscription `greenfield-mdp-prod` | Cloud Infra (ticket) |
| Place under correct Management Group | Cloud Infra |
| Register resource providers | Cloud Infra or you (az CLI) |
| Create Terraform SP + RBAC | Cloud Infra / IAM team |
| Create tfstate storage account | You or DevOps |
| Update `terraform.tfvars` | You |
| Validate `terraform init/plan` | You |

---

## Azure Policy Setup

### Checking Existing Policies

```bash
az policy assignment list \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
  --query "[].{Name:displayName, Scope:scope, Policy:policyDefinitionId}" \
  -o table
```

### 1. Allowed Locations

Built-in policy ID: `e56962a6-4747-49cd-b67b-bf8b01975c4c`

```bash
az policy assignment create \
  --name "mdp-allowed-locations" \
  --display-name "MDP - Restrict to Canada regions" \
  --policy "e56962a6-4747-49cd-b67b-bf8b01975c4c" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
  --params '{
    "listOfAllowedLocations": {
      "value": ["canadacentral", "canadaeast", "global"]
    }
  }'
```

> `global` is needed for tenant-level resources (Entra ID, Management Groups, Azure Policy itself).

### 2. Mandatory Tagging (Initiative Approach)

Built-in policy "Require a tag on resources": `871b6d14-10aa-478d-b590-94f262ecfa99`

**Quick approach: loop per tag**

```bash
SCOPE="/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
POLICY_ID="871b6d14-10aa-478d-b590-94f262ecfa99"

# One assignment per mandatory tag, all using the same built-in policy.
for TAG in project environment cost-center owner data-classification; do
  az policy assignment create \
    --name "mdp-require-tag-${TAG}" \
    --display-name "MDP - Require tag: ${TAG}" \
    --policy "$POLICY_ID" \
    --scope "$SCOPE" \
    --params "{\"tagName\": {\"value\": \"${TAG}\"}}" \
    --enforcement-mode Default
done
```

**Custom initiative approach (recommended)**

Create `mandatory-tags-initiative.json`. The `--definitions` argument expects the array of policy definition references itself, so the file contains only that array; the display name and category are passed as CLI arguments:

```json
[
  {
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "project" } }
  },
  {
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "environment" } }
  },
  {
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "cost-center" } }
  },
  {
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "owner" } }
  },
  {
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99",
    "parameters": { "tagName": { "value": "data-classification" } }
  }
]
```

```bash
# Create the custom initiative (policy set definition) in the subscription.
az policy set-definition create \
  --name "mdp-mandatory-tags" \
  --display-name "MDP - Mandatory Tags" \
  --metadata category=Tags \
  --definitions @mandatory-tags-initiative.json \
  --subscription "52274961-4dda-48cc-a79b-3a0b74f8eaac"

# Assign the initiative at subscription scope.
az policy assignment create \
  --name "mdp-mandatory-tags" \
  --display-name "MDP - Mandatory Tags" \
  --policy-set-definition "mdp-mandatory-tags" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac"
```

### 3. Allowed VM SKUs

Built-in policy ID: `cccc23c7-8427-4f53-ad12-b6a63eb452b3`

```bash
az policy assignment create \
  --name "mdp-allowed-vm-skus" \
  --display-name "MDP - Allowed VM SKUs" \
  --policy "cccc23c7-8427-4f53-ad12-b6a63eb452b3" \
  --scope "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" \
  --params '{
    "listOfAllowedSKUs": {
      "value": [
        "Standard_DS3_v2",
        "Standard_DS4_v2",
        "Standard_DS5_v2",
        "Standard_E8s_v5",
        "Standard_E16s_v5",
        "Standard_E32s_v5",
        "Standard_L8s_v3",
        "Standard_L16s_v3",
        "Standard_NC8as_T4_v3",
        "Standard_NC16as_T4_v3"
      ]
    }
  }'
```

**SKU rationale:**

- `DS` series: general-purpose Databricks workers
- `E` series: memory-optimized (data engineering)
- `L` series: storage-optimized (Delta caching)
- `NC` series: GPU clusters for ML

### Policy Assignment Scope Guidance

| Policy | Recommended scope |
|---|---|
| Allowed locations | Management group (likely already exists at Greenfield) |
| Mandatory tags | Management group (likely already exists at Greenfield) |
| Allowed VM SKUs | Subscription (workload-specific) |

### Enforcement Mode

For initial rollout, use `--enforcement-mode DoNotEnforce` to run in **audit-only mode** first.
This flags non-compliant resources without blocking deployments. Switch to `Default` (enforce) after remediation.
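
The audit-first rollout can be scripted so that the same assignments are flipped between modes in one call. This is a minimal sketch, assuming an Azure CLI version that provides `az policy assignment update` with `--enforcement-mode`; `set_enforcement_mode` is a hypothetical helper name, and the assignment names passed to it are the ones created in the sections above.

```bash
#!/usr/bin/env bash
# Switch one or more existing policy assignments between audit-only
# (DoNotEnforce) and enforced (Default) mode at a given scope.
# set_enforcement_mode is a hypothetical helper, not an az CLI command.
set_enforcement_mode() {
  local mode="$1" scope="$2"
  shift 2

  # Only the two values accepted by Azure Policy are allowed.
  case "$mode" in
    Default|DoNotEnforce) ;;
    *) echo "invalid enforcement mode: $mode" >&2; return 64 ;;
  esac

  local name
  for name in "$@"; do
    # Stop at the first failed update so partial rollouts are visible.
    az policy assignment update \
      --name "$name" \
      --scope "$scope" \
      --enforcement-mode "$mode" \
      --output none || return 1
    echo "updated: $name -> $mode"
  done
}
```

After remediation, a call such as `set_enforcement_mode Default "/subscriptions/52274961-4dda-48cc-a79b-3a0b74f8eaac" mdp-allowed-locations mdp-mandatory-tags mdp-allowed-vm-skus` would move all three assignments to enforced mode; passing `DoNotEnforce` instead reverts them to audit-only.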