From d8e56c0dd9c9c3af4b53286ad6ab2dd502693506 Mon Sep 17 00:00:00 2001
From: oabrivard
Date: Fri, 13 Mar 2026 08:29:40 +0100
Subject: [PATCH] Added first steps for Azure setup

---
 Implementation/first_steps.md | 313 ++++++++++++++++++++++++++++++++++
 1 file changed, 313 insertions(+)
 create mode 100644 Implementation/first_steps.md

diff --git a/Implementation/first_steps.md b/Implementation/first_steps.md
new file mode 100644
index 0000000..17b2217
--- /dev/null
+++ b/Implementation/first_steps.md
@@ -0,0 +1,313 @@
+# MDP Databricks-Primary Architecture — First Steps
+
+## Context
+
+Implementation of the Databricks-primary architecture (v8.0) on Azure for Greenfield's Modern Data Platform. Excluding disaster recovery and devtest environments for now. All Terraform scripts are placed in `/Implementation/Terraform`.
+
+---
+
+## Day 1 Tasks — Previous Conversation Summary
+
+### D1-T1: Subscription Provisioning
+
+- **Terraform artifacts produced:** `providers.tf` (AzureRM with `subscription_id`), `backend.tf` (state backend), `variables.tf` + `terraform.tfvars` (prod values).
+- **Note:** Subscription creation itself is a manual/ticket process — Terraform assumes it exists.
+
+### D1-T2: Resource Group Structure (6 RGs)
+
+- **Terraform:** `networking/main.tf` lines 14–48
+- **Resource groups created:**
+  - `rg-mdp-prod-network`
+  - `rg-mdp-prod-databricks`
+  - `rg-mdp-prod-storage`
+  - `rg-mdp-prod-governance`
+  - `rg-mdp-prod-keyvault`
+  - `rg-mdp-prod-monitoring`
+
+### D1-T3: VNet Design & Deployment (starts D1, finishes D2)
+
+- **Terraform:** `networking/main.tf`
+- **Main VNet:** `10.100.0.0/16` with 3 subnets:
+  - `dbx-host`
+  - `dbx-container`
+  - `private-endpoints`
+- **Transit VNet:** `10.101.0.0/24`
+- **Peerings:** main↔transit, optional main↔hub
+- **NAT Gateway** with public IP on both Databricks subnets
+- **NSG** with full inbound/outbound rule set
+- **4 private DNS zones** with VNet links
+- **Outputs** wired in `outputs.tf`
+- **Root module** `prod/main.tf` calls the networking module
+
+---
+
+## D1-T1: Manual Subscription Provisioning Guide
+
+Before Terraform can run, the Azure subscription must exist. This is a manual process at Greenfield.
+
+### Step 1 — Request or Create the Subscription
+
+Submit a request to Cloud Infra (ServiceNow/ITSM ticket) with these specifications:
+
+| Parameter | Value |
+|---|---|
+| Subscription name | `greenfield-mdp-prod` |
+| Offer type | Enterprise Agreement (EA) |
+| Billing account | Greenfield's EA enrollment |
+| Region affinity | Canada Central |
+| Management group | Place under Greenfield's MG hierarchy (e.g., `mg-greenfield-data` or `mg-greenfield-prod`) |
+
+### Step 2 — Place Under the Correct Management Group
+
+- Go to **Azure Portal → Management groups**
+- Locate the target MG (e.g., `mg-greenfield-prod` or `mg-greenfield-data-platforms`)
+- Move the new subscription under it (or ensure Cloud Infra places it there at creation time)
+- Azure Policy assignments on that MG are inherited automatically
+
+### Step 3 — Register Required Resource Providers
+
+```bash
+az account set --subscription "greenfield-mdp-prod"
+
+az provider register --namespace Microsoft.Databricks
+az provider register --namespace Microsoft.Storage
+az provider register --namespace Microsoft.KeyVault
+az provider register --namespace Microsoft.Network
+az provider register --namespace Microsoft.ManagedIdentity
+az provider register --namespace Microsoft.Purview
+az provider register --namespace Microsoft.OperationalInsights
+az provider register --namespace Microsoft.Monitor
+az provider register --namespace Microsoft.EventHub
+```
+
+Verify with:
+
+```bash
+az provider list --query "[?registrationState=='Registered'].namespace" -o table
+```
+
+### Step 4 — Set Up RBAC on the Subscription
+
+| Principal | Role | Scope |
+|---|---|---|
+| `sp-mdp-terraform-prod` (Terraform SP) | Contributor | Subscription |
+| `sp-mdp-terraform-prod` | User Access Administrator | Subscription |
+| `grp-mdp-account-admins` | Owner | Subscription |
+| `grp-mdp-platform-engineers` | Contributor | Subscription |
+
+```bash
+SUB_ID=$(az account show --query id -o tsv)
+
+az role assignment create --assignee \
+  --role "Contributor" --scope "/subscriptions/$SUB_ID"
+
+az role assignment create --assignee \
+  --role "User Access Administrator" --scope "/subscriptions/$SUB_ID"
+```
+
+### Step 5 — Create the Terraform State Storage Account
+
+In a separate management/shared subscription to avoid circular dependency:
+
+```bash
+az account set --subscription ""
+
+az group create --name rg-mdp-tfstate --location canadacentral
+
+az storage account create \
+  --name stmdptfstate \
+  --resource-group rg-mdp-tfstate \
+  --location canadacentral \
+  --sku Standard_GRS \
+  --kind StorageV2 \
+  --min-tls-version TLS1_2 \
+  --allow-blob-public-access false
+
+az storage container create \
+  --name tfstate \
+  --account-name stmdptfstate
+
+az storage account blob-service-properties update \
+  --account-name stmdptfstate \
+  --enable-versioning true
+```
+
+### Step 6 — Update `terraform.tfvars`
+
+```bash
+az account show --subscription "greenfield-mdp-prod" --query id -o tsv
+```
+
+Replace `REPLACE-WITH-PROD-SUBSCRIPTION-ID` in `terraform.tfvars` with the actual UUID.
+
+### Step 7 — Validate Terraform Authentication
+
+```bash
+cd environments/prod
+az login --service-principal -u -p --tenant
+terraform init
+terraform plan
+```
+
+### Handoff Summary
+
+| Action | Who |
+|---|---|
+| Create EA subscription `greenfield-mdp-prod` | Cloud Infra (ticket) |
+| Place under correct Management Group | Cloud Infra |
+| Register resource providers | Cloud Infra or you (az CLI) |
+| Create Terraform SP + RBAC | Cloud Infra / IAM team |
+| Create tfstate storage account | You or DevOps |
+| Update `terraform.tfvars` | You |
+| Validate `terraform init/plan` | You |
+
+---
+
+## Azure Policy Setup
+
+### Checking Existing Policies
+
+```bash
+az policy assignment list \
+  --scope "/subscriptions/" \
+  --query "[].{Name:displayName, Scope:scope, Policy:policyDefinitionId}" \
+  -o table
+```
+
+### 1. Allowed Locations
+
+Built-in policy ID: `e56962a6-4747-49cd-b67b-bf8b01975c4c`
+
+```bash
+az policy assignment create \
+  --name "mdp-allowed-locations" \
+  --display-name "MDP - Restrict to Canada regions" \
+  --policy "e56962a6-4747-49cd-b67b-bf8b01975c4c" \
+  --scope "/subscriptions/" \
+  --params '{
+    "listOfAllowedLocations": {
+      "value": ["canadacentral", "canadaeast", "global"]
+    }
+  }'
+```
+
+> `global` is needed for tenant-level resources (Entra ID, Management Groups, Azure Policy itself).
+
+### 2. Mandatory Tagging (Initiative Approach)
+
+Built-in policy "Require a tag on resources": `871b6d14-10aa-478d-b466-ce391a2e1549`
+
+**Quick approach — loop per tag:**
+
+```bash
+SCOPE="/subscriptions/"
+POLICY_ID="871b6d14-10aa-478d-b466-ce391a2e1549"
+
+for TAG in project environment cost-center owner data-classification; do
+  az policy assignment create \
+    --name "mdp-require-tag-${TAG}" \
+    --display-name "MDP - Require tag: ${TAG}" \
+    --policy "$POLICY_ID" \
+    --scope "$SCOPE" \
+    --params "{\"tagName\": {\"value\": \"${TAG}\"}}" \
+    --enforcement-mode Default
+done
+```
+
+**Custom initiative approach (recommended):**
+
+Create `mandatory-tags-initiative.json`:
+
+```json
+{
+  "properties": {
+    "displayName": "MDP - Mandatory Tags",
+    "policyType": "Custom",
+    "metadata": { "category": "Tags" },
+    "parameters": {},
+    "policyDefinitions": [
+      {
+        "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b466-ce391a2e1549",
+        "parameters": { "tagName": { "value": "project" } }
+      },
+      {
+        "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b466-ce391a2e1549",
+        "parameters": { "tagName": { "value": "environment" } }
+      },
+      {
+        "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b466-ce391a2e1549",
+        "parameters": { "tagName": { "value": "cost-center" } }
+      },
+      {
+        "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b466-ce391a2e1549",
+        "parameters": { "tagName": { "value": "owner" } }
+      },
+      {
+        "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b466-ce391a2e1549",
+        "parameters": { "tagName": { "value": "data-classification" } }
+      }
+    ]
+  }
+}
+```
+
+```bash
+az policy set-definition create \
+  --name "mdp-mandatory-tags" \
+  --display-name "MDP - Mandatory Tags" \
+  --definitions @mandatory-tags-initiative.json \
+  --subscription ""
+
+az policy assignment create \
+  --name "mdp-mandatory-tags" \
+  --display-name "MDP - Mandatory Tags" \
+  --policy-set-definition "mdp-mandatory-tags" \
+  --scope "/subscriptions/"
+```
+
+### 3. Allowed VM SKUs
+
+Built-in policy ID: `cccc23c7-8427-4f53-ad12-b6a63eb452b3`
+
+```bash
+az policy assignment create \
+  --name "mdp-allowed-vm-skus" \
+  --display-name "MDP - Allowed VM SKUs" \
+  --policy "cccc23c7-8427-4f53-ad12-b6a63eb452b3" \
+  --scope "/subscriptions/" \
+  --params '{
+    "listOfAllowedSKUs": {
+      "value": [
+        "Standard_DS3_v2",
+        "Standard_DS4_v2",
+        "Standard_DS5_v2",
+        "Standard_E8s_v5",
+        "Standard_E16s_v5",
+        "Standard_E32s_v5",
+        "Standard_L8s_v3",
+        "Standard_L16s_v3",
+        "Standard_NC8as_T4_v3",
+        "Standard_NC16as_T4_v3"
+      ]
+    }
+  }'
+```
+
+**SKU rationale:**
+
+- `DS` series — general-purpose Databricks workers
+- `E` series — memory-optimized (data engineering)
+- `L` series — storage-optimized (Delta caching)
+- `NC` series — GPU clusters for ML
+
+### Policy Assignment Scope Guidance
+
+| Policy | Recommended scope |
+|---|---|
+| Allowed locations | Management group (likely already exists at Greenfield) |
+| Mandatory tags | Management group (likely already exists at Greenfield) |
+| Allowed VM SKUs | Subscription (workload-specific) |
+
+### Enforcement Mode
+
+For initial rollout, use `--enforcement-mode DoNotEnforce` to run in **audit-only mode** first. This flags non-compliant resources without blocking deployments. Switch to `Default` (enforce) after remediation.
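
The audit-first rollout in the Enforcement Mode section can be sketched as a small helper that moves the three assignments between modes. This is a minimal sketch under stated assumptions, not part of the committed file: the assignment names are taken from the commands in the patch (using the initiative assignment `mdp-mandatory-tags` rather than the per-tag loop), while `set_enforcement_mode`, `DRY_RUN`, `ASSIGNMENTS`, and the all-zero subscription ID in `SCOPE` are hypothetical. It also assumes the installed Azure CLI exposes `--enforcement-mode` on `az policy assignment update`. With `DRY_RUN=1` (the default here) the helper only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: switch the MDP policy assignments between audit-only and enforced mode.
# SCOPE, DRY_RUN, ASSIGNMENTS and set_enforcement_mode are illustrative names.
set -eu

# Placeholder subscription ID; replace with the real scope before running for real.
SCOPE="${SCOPE:-/subscriptions/00000000-0000-0000-0000-000000000000}"
# 1 = print the az commands only, 0 = execute them.
DRY_RUN="${DRY_RUN:-1}"
# Assignment names created earlier in this guide.
ASSIGNMENTS="mdp-allowed-locations mdp-mandatory-tags mdp-allowed-vm-skus"

set_enforcement_mode() {
  mode="$1"
  # Azure Policy accepts exactly these two enforcement modes.
  case "$mode" in
    Default|DoNotEnforce) ;;
    *) echo "unsupported mode: $mode" >&2; return 1 ;;
  esac
  for name in $ASSIGNMENTS; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "az policy assignment update --name $name --scope $SCOPE --enforcement-mode $mode"
    else
      az policy assignment update --name "$name" --scope "$SCOPE" --enforcement-mode "$mode"
    fi
  done
}

# Phase 1: audit only. After remediation, call: set_enforcement_mode Default
set_enforcement_mode DoNotEnforce
```

Keeping the mode switch in one function means the same list of assignments is used for both phases, so an assignment cannot be left in audit-only mode by omission when enforcement is turned on.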