# Phase 0 — Remaining Steps D2 to D5
**Status as of 2026-03-18:** D1 is complete. Networking module is deployed (VNets, subnets, NSGs, NAT Gateway, DNS zones, peerings). All 6 resource groups are provisioned. Terraform state backend is configured.

This document covers **only the missing implementation steps** required to complete Week 1.

---
## What Is Already Done (for reference)
| Plan Task | Status |
|-----------|--------|
| D1 — Subscription provisioning | Done |
| D1 — Resource group structure (6 RGs) | Done |
| D1-D2 — VNet design & deployment (2 VNets, 4 subnets) | Done |
| D3 — NAT Gateway + static public IP | Done |
| D3-D4 — NSG rules (full inbound/outbound rule set) | Done |
| D2-D3 — Private DNS zones (4 zones, VNet links) | Done |
| D5 — Terraform state backend (`staccmdptfstate`) | Done |
| D1-D2 — VNet peerings (main ↔ transit, conditional hub) | Done |
---
## Step 1 — ADLS Gen2 Storage Account (Plan: D4)
**Module:** `modules/storage/`
**Owner:** Cloud Infra
### 1.1 Implement `modules/storage/variables.tf`
| Variable | Type | Description |
|----------|------|-------------|
| `location` | `string` | Azure region (`canadacentral`) |
| `environment` | `string` | Environment name (`prod`) |
| `project` | `string` | Project prefix (`mdp`) |
| `resource_group_name` | `string` | From `module.networking.rg_storage_name` |
| `subnet_pe_id` | `string` | From `module.networking.subnet_private_endpoints_id` |
| `dns_zone_dfs_id` | `string` | From `module.networking.dns_zone_dfs_id` |
| `replication_type` | `string` | `GRS` for prod |
| `tags` | `map(string)` | Standard tags |
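
The variable table above could translate into something like the following. This is a minimal sketch, not the project's actual file: the `GRS` default, the empty `tags` default, and the `validation` block are assumptions added for illustration.

```hcl
# modules/storage/variables.tf (sketch; defaults and validation are assumptions)
variable "location" {
  description = "Azure region (canadacentral)"
  type        = string
}

variable "environment" {
  description = "Environment name (prod)"
  type        = string
}

variable "project" {
  description = "Project prefix (mdp)"
  type        = string
}

variable "resource_group_name" {
  description = "Storage resource group, from module.networking.rg_storage_name"
  type        = string
}

variable "subnet_pe_id" {
  description = "Private endpoint subnet ID, from module.networking.subnet_private_endpoints_id"
  type        = string
}

variable "dns_zone_dfs_id" {
  description = "Private DNS zone ID for DFS, from module.networking.dns_zone_dfs_id"
  type        = string
}

variable "replication_type" {
  description = "Storage replication type (GRS for prod)"
  type        = string
  default     = "GRS" # assumed default

  validation {
    condition     = contains(["LRS", "ZRS", "GRS", "RAGRS", "GZRS", "RAGZRS"], var.replication_type)
    error_message = "replication_type must be one of LRS, ZRS, GRS, RAGRS, GZRS, RAGZRS."
  }
}

variable "tags" {
  description = "Standard tags"
  type        = map(string)
  default     = {} # assumed default
}
```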
### 1.2 Implement `modules/storage/main.tf`
Resources to create:
1. **`azurerm_storage_account`** — `stglobal<env>mdp` (e.g., `stglobalprodmdp`)
   - `account_tier` = `Standard`
   - `account_replication_type` = `var.replication_type` (GRS)
   - `is_hns_enabled` = `true` (Data Lake Gen2)
   - `public_network_access_enabled` = `false`
   - `min_tls_version` = `TLS1_2`
   - `allow_nested_items_to_be_public` = `false`
   - `shared_access_key_enabled` = `false` (force Entra ID auth)
   - `blob_properties` block:
     - `delete_retention_policy` { `days = 7` }
     - `container_delete_retention_policy` { `days = 7` }
2. **`azurerm_storage_container`** — one for each container (6 total):
   - `landing` — raw file drops from source systems
   - `bronze` — ingested data (raw catalog external location)
   - `silver` — cleansed, conformed data (curated catalog)
   - `gold` — business-ready data (analytics catalog)
   - `archive` — long-term retention (immutability policy)
   - `checkpoints` — Structured Streaming / DLT checkpoints
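3. **`azurerm_storage_management_policy`** — immutability policy on `archive` container
   - Rule: blobs in `archive` container → immutable for 365 days (or as defined by compliance)
   - Note: `azurerm_storage_management_policy` defines lifecycle rules (tiering and deletion) and does not by itself make blobs immutable. If WORM-style immutability is required, a container-level immutability policy must be configured in addition. Recent azurerm versions expose `azurerm_storage_container_immutability_policy` for this; confirm availability against the pinned provider version.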
4. **`azurerm_private_endpoint`** — Private Endpoint for ADLS (DFS sub-resource)
   - `name` = `pe-<prefix>-adls-dfs`
   - `subnet_id` = `var.subnet_pe_id`
   - `private_service_connection` → sub-resource type `dfs`
   - `private_dns_zone_group` → link to `var.dns_zone_dfs_id`
5. **`azurerm_private_endpoint`** — Private Endpoint for ADLS (Blob sub-resource)
   - `name` = `pe-<prefix>-adls-blob`
   - `subnet_id` = `var.subnet_pe_id`
   - `private_service_connection` → sub-resource type `blob`
   - Note: Add a `privatelink.blob.core.windows.net` DNS zone in the networking module if blob access is needed. Otherwise, DFS endpoint alone is sufficient for Databricks ABFSS access.
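
A sketch of how the storage account, containers, and DFS private endpoint listed above might look in HCL. The resource labels (`adls`, `this`, `dfs`), `local.prefix`, the connection name, and the DNS zone group name are hypothetical. The management policy and the optional blob endpoint are left out for brevity.

```hcl
# modules/storage/main.tf (sketch; labels and local.prefix are assumptions)
locals {
  prefix     = "${var.project}-${var.environment}" # assumed meaning of <prefix>
  containers = ["landing", "bronze", "silver", "gold", "archive", "checkpoints"]
}

resource "azurerm_storage_account" "adls" {
  name                            = "stglobal${var.environment}${var.project}" # e.g. stglobalprodmdp
  resource_group_name             = var.resource_group_name
  location                        = var.location
  account_tier                    = "Standard"
  account_replication_type        = var.replication_type
  is_hns_enabled                  = true # Data Lake Gen2
  public_network_access_enabled   = false
  min_tls_version                 = "TLS1_2"
  allow_nested_items_to_be_public = false
  shared_access_key_enabled       = false # force Entra ID auth
  tags                            = var.tags

  blob_properties {
    delete_retention_policy {
      days = 7
    }
    container_delete_retention_policy {
      days = 7
    }
  }
}

# With shared keys disabled, the azurerm provider block typically needs
# storage_use_azuread = true so that container operations use Entra ID auth.
resource "azurerm_storage_container" "this" {
  for_each              = toset(local.containers)
  name                  = each.key
  storage_account_name  = azurerm_storage_account.adls.name
  container_access_type = "private"
}

resource "azurerm_private_endpoint" "dfs" {
  name                = "pe-${local.prefix}-adls-dfs"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_pe_id
  tags                = var.tags

  private_service_connection {
    name                           = "psc-${local.prefix}-adls-dfs" # hypothetical name
    private_connection_resource_id = azurerm_storage_account.adls.id
    subresource_names              = ["dfs"]
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "default" # hypothetical name
    private_dns_zone_ids = [var.dns_zone_dfs_id]
  }
}
```

Because public network access is disabled, the machine running `terraform apply` generally needs network line of sight to the private endpoint (or an equivalent control-plane path) before container creation succeeds.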
### 1.3 Implement `modules/storage/outputs.tf`
| Output | Value | Consumed By |
|--------|-------|-------------|
| `adls_id` | Storage account resource ID | `identity` module (role assignment) |
| `adls_name` | Storage account name | `unity-catalog` module (external locations) |
| `adls_primary_dfs_endpoint` | DFS endpoint URL | Validation / documentation |
| `container_names` | List of container names | Reference |
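
The outputs in the table could be declared roughly as follows. The sketch assumes the storage account is labelled `azurerm_storage_account.adls` and the containers are created with `for_each` under `azurerm_storage_container.this`. Both labels are illustrative.

```hcl
# modules/storage/outputs.tf (sketch; resource labels are assumptions)
output "adls_id" {
  description = "Storage account resource ID"
  value       = azurerm_storage_account.adls.id
}

output "adls_name" {
  description = "Storage account name"
  value       = azurerm_storage_account.adls.name
}

output "adls_primary_dfs_endpoint" {
  description = "DFS endpoint URL"
  value       = azurerm_storage_account.adls.primary_dfs_endpoint
}

output "container_names" {
  description = "List of container names"
  value       = sort([for c in azurerm_storage_container.this : c.name])
}
```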
### 1.4 Wire into root `main.tf`
Uncomment the `module "storage"` block in `environments/prod/main.tf` (lines 36-47). All variable wiring is already in place.
### 1.5 Validate
```bash
cd Implementation/Terraform/environments/prod
terraform plan -target=module.storage
```
Verify: 1 storage account, 6 containers, 1 management policy, 1-2 private endpoints.

---
## Step 2 — Key Vault (Plan: D4)
**Module:** `modules/keyvault/`
**Owner:** Cloud Infra
### 2.1 Implement `modules/keyvault/variables.tf`
| Variable | Type | Description |
|----------|------|-------------|
| `location` | `string` | Azure region |
| `environment` | `string` | Environment name |
| `project` | `string` | Project prefix |
| `resource_group_name` | `string` | From `module.networking.rg_keyvault_name` |
| `subnet_pe_id` | `string` | From `module.networking.subnet_private_endpoints_id` |
| `dns_zone_vault_id` | `string` | From `module.networking.dns_zone_vault_id` |
| `tenant_id` | `string` | Azure AD tenant ID (add to `variables.tf` and `terraform.tfvars` at root level) |
| `tags` | `map(string)` | Standard tags |
### 2.2 Implement `modules/keyvault/main.tf`
Resources to create:
1. **`azurerm_key_vault`** — `kv-<prefix>-mdp` (e.g., `kv-mdp-prod`)
   - `sku_name` = `standard`
   - `tenant_id` = `var.tenant_id`
   - `soft_delete_retention_days` = `90`
   - `purge_protection_enabled` = `true`
   - `enable_rbac_authorization` = `true` (use Azure RBAC, not access policies)
   - `public_network_access_enabled` = `false`
   - `network_acls` block:
     - `bypass` = `AzureServices`
     - `default_action` = `Deny`
2. **`azurerm_private_endpoint`** — Private Endpoint for Key Vault
   - `name` = `pe-<prefix>-kv`
   - `subnet_id` = `var.subnet_pe_id`
   - `private_service_connection` → sub-resource type `vault`
   - `private_dns_zone_group` → link to `var.dns_zone_vault_id`
3. **`azurerm_key_vault_secret`** (x2) — placeholder secrets
   - `databricks-pat` = `"PLACEHOLDER"` (to be replaced after workspace deployment)
   - `jdbc-source-placeholder` = `"PLACEHOLDER"` (to be replaced with actual credentials)
   - Note: These use RBAC, so the deploying principal needs `Key Vault Secrets Officer` role.
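
As a rough illustration of the three resource types above, the module body might look like this. The resource labels, `local.prefix`, the vault name expression (chosen to reproduce the `kv-mdp-prod` example), and the `lifecycle` block are assumptions, not part of the plan.

```hcl
# modules/keyvault/main.tf (sketch; labels, naming, and lifecycle are assumptions)
locals {
  prefix = "${var.project}-${var.environment}" # assumed meaning of <prefix>
}

resource "azurerm_key_vault" "this" {
  name                          = "kv-${local.prefix}" # yields kv-mdp-prod for mdp/prod
  location                      = var.location
  resource_group_name           = var.resource_group_name
  tenant_id                     = var.tenant_id
  sku_name                      = "standard"
  soft_delete_retention_days    = 90
  purge_protection_enabled      = true
  enable_rbac_authorization     = true # Azure RBAC instead of access policies
  public_network_access_enabled = false
  tags                          = var.tags

  network_acls {
    bypass         = "AzureServices"
    default_action = "Deny"
  }
}

resource "azurerm_private_endpoint" "kv" {
  name                = "pe-${local.prefix}-kv"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_pe_id
  tags                = var.tags

  private_service_connection {
    name                           = "psc-${local.prefix}-kv" # hypothetical name
    private_connection_resource_id = azurerm_key_vault.this.id
    subresource_names              = ["vault"]
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "default" # hypothetical name
    private_dns_zone_ids = [var.dns_zone_vault_id]
  }
}

resource "azurerm_key_vault_secret" "placeholder" {
  for_each     = toset(["databricks-pat", "jdbc-source-placeholder"])
  name         = each.key
  value        = "PLACEHOLDER"
  key_vault_id = azurerm_key_vault.this.id

  # Assumption: ignore later out-of-band updates so real values are not reset.
  lifecycle {
    ignore_changes = [value]
  }

  # Secret writes go through the data plane, so wait for the private endpoint.
  depends_on = [azurerm_private_endpoint.kv]
}
```

Creating the placeholder secrets requires the deploying principal to hold `Key Vault Secrets Officer` and to reach the vault over the private endpoint, since public access is disabled.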
### 2.3 Implement `modules/keyvault/outputs.tf`
| Output | Value | Consumed By |
|--------|-------|-------------|
| `keyvault_id` | Key Vault resource ID | `databricks-workspace` module (secret scope), `identity` module |
| `keyvault_name` | Key Vault name | Reference |
| `keyvault_uri` | Key Vault URI | Databricks secret scope configuration |
### 2.4 Root-level changes
1. Add `tenant_id` variable to `environments/prod/variables.tf`:
   ```hcl
   variable "tenant_id" {
     description = "Azure AD tenant ID"
     type        = string
   }
   ```
2. Add `tenant_id` value to `environments/prod/terraform.tfvars`.
3. Pass `tenant_id` into the `module "keyvault"` block and uncomment it (lines 54-64).
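
After these changes, the uncommented module block might resemble the sketch below. The relative `source` path and the root-level `var.location`, `var.environment`, `var.project`, and `var.tags` names are assumptions about the existing root configuration.

```hcl
# environments/prod/main.tf (sketch; source path and root variable names are assumptions)
module "keyvault" {
  source = "../../modules/keyvault"

  location            = var.location
  environment         = var.environment
  project             = var.project
  resource_group_name = module.networking.rg_keyvault_name
  subnet_pe_id        = module.networking.subnet_private_endpoints_id
  dns_zone_vault_id   = module.networking.dns_zone_vault_id
  tenant_id           = var.tenant_id # new input added in this step
  tags                = var.tags
}
```

An alternative that avoids a new root variable would be reading the tenant from the `azurerm_client_config` data source. That is a different design from the one this plan specifies and is mentioned only for comparison.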
### 2.5 Validate
```bash
terraform plan -target=module.keyvault
```
Verify: 1 Key Vault, 1 private endpoint, 2 placeholder secrets.

---
## Step 3 — Monitoring Foundation (Plan: D4-D5)
**Module:** `modules/monitoring/`
**Owner:** Cloud Infra
### 3.1 Implement `modules/monitoring/variables.tf`
| Variable | Type | Description |
|----------|------|-------------|
| `location` | `string` | Azure region |
| `environment` | `string` | Environment name |
| `project` | `string` | Project prefix |
| `resource_group_name` | `string` | From `module.networking.rg_monitoring_name` |
| `tags` | `map(string)` | Standard tags |
### 3.2 Implement `modules/monitoring/main.tf`
Resources to create:
1. **`azurerm_log_analytics_workspace`** — `law-<prefix>-mdp`
   - `sku` = `PerGB2018`
   - `retention_in_days` = `90` (align with Greenfield retention policy)
   - `daily_quota_gb` = `-1` (unlimited initially; tune after baseline)
2. **`azurerm_monitor_diagnostic_setting`** — for Key Vault
   - `target_resource_id` = Key Vault ID (passed as variable; wire after Key Vault module is deployed)
   - `log_analytics_workspace_id` = Log Analytics workspace ID
   - Enabled log categories: `AuditEvent`, `AzurePolicyEvaluationDetails`
   - Enabled metric category: `AllMetrics`
> Note: Diagnostic settings for ADLS and Databricks will be added in later steps as those modules are deployed. Start with Key Vault only.
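
The two monitoring resources could be sketched as follows. The resource labels, the workspace name expression, the diagnostic setting name, and the `var.keyvault_id` input are assumptions. The `enabled_log` block requires a reasonably recent azurerm 3.x or later, and newer provider releases may prefer a different block name for metrics than the `metric` block shown here.

```hcl
# modules/monitoring/main.tf (sketch; labels and names are assumptions)
locals {
  prefix = "${var.project}-${var.environment}" # assumed meaning of <prefix>
}

resource "azurerm_log_analytics_workspace" "this" {
  name                = "law-${local.prefix}" # align with the project's naming convention
  location            = var.location
  resource_group_name = var.resource_group_name
  sku                 = "PerGB2018"
  retention_in_days   = 90
  daily_quota_gb      = -1 # unlimited initially; tune after baseline
  tags                = var.tags
}

resource "azurerm_monitor_diagnostic_setting" "keyvault" {
  name                       = "diag-keyvault-to-law" # hypothetical name
  target_resource_id         = var.keyvault_id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id

  enabled_log {
    category = "AuditEvent"
  }

  enabled_log {
    category = "AzurePolicyEvaluationDetails"
  }

  metric {
    category = "AllMetrics"
  }
}
```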
### 3.3 Implement `modules/monitoring/outputs.tf`
| Output | Value | Consumed By |
|--------|-------|-------------|
| `log_analytics_workspace_id` | LAW resource ID | Databricks diagnostic settings (Week 2), ADLS diagnostic settings |
| `log_analytics_workspace_name` | LAW name | Reference |
### 3.4 Wire into root `main.tf`
Uncomment `module "monitoring"` block (lines 140-148). Add a `keyvault_id` input variable to the module to enable the Key Vault diagnostic setting.
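
One way the `keyvault_id` wiring might look, shown as two fragments. The `source` path and the root variable names are assumptions. The `module.keyvault.keyvault_id` reference is what makes Monitoring depend on the Key Vault step.

```hcl
# modules/monitoring/variables.tf (fragment, sketch)
variable "keyvault_id" {
  description = "Key Vault resource ID used as the diagnostic setting target"
  type        = string
}

# environments/prod/main.tf (fragment, sketch; source path is an assumption)
module "monitoring" {
  source = "../../modules/monitoring"

  location            = var.location
  environment         = var.environment
  project             = var.project
  resource_group_name = module.networking.rg_monitoring_name
  keyvault_id         = module.keyvault.keyvault_id
  tags                = var.tags
}
```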
### 3.5 Validate
```bash
terraform plan -target=module.monitoring
```
Verify: 1 Log Analytics workspace, 1 diagnostic setting (Key Vault).

---
## Step 4 — Identity Module Foundation (Plan: D4-D5)
**Module:** `modules/identity/`
**Owner:** Cloud Infra
This step creates the managed identity and role assignments needed for Databricks to access ADLS and Key Vault. The full identity module will be extended in Week 2 for the Databricks access connector, but the ADLS role assignments should be in place now.
### 4.1 Implement `modules/identity/variables.tf`
| Variable | Type | Description |
|----------|------|-------------|
| `location` | `string` | Azure region |
| `environment` | `string` | Environment name |
| `project` | `string` | Project prefix |
| `rg_databricks_name` | `string` | Resource group for identity resources |
| `rg_storage_name` | `string` | Resource group for storage (not used directly, but for reference) |
| `rg_governance_name` | `string` | Resource group for governance |
| `storage_account_id` | `string` | ADLS Gen2 resource ID (for role assignment) |
| `tags` | `map(string)` | Standard tags |
### 4.2 Implement `modules/identity/main.tf`
Resources to create:
1. **`azurerm_user_assigned_identity`** — `id-<prefix>-dbx-access`
   - This is the managed identity that the Databricks access connector will use.
   - Location: `var.location`
   - Resource group: `var.rg_databricks_name`
2. **`azurerm_databricks_access_connector`** — `dbxac-<prefix>`
   - `identity` block → type `UserAssigned`, identity IDs = [managed identity ID]
   - Resource group: `var.rg_databricks_name`
3. **`azurerm_role_assignment`** — Storage Blob Data Contributor
   - `scope` = `var.storage_account_id`
   - `role_definition_name` = `Storage Blob Data Contributor`
   - `principal_id` = managed identity principal ID
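
The identity resources above might be expressed like this. Resource labels and `local.prefix` are hypothetical; the role name and scope come from the list above.

```hcl
# modules/identity/main.tf (sketch; labels and local.prefix are assumptions)
locals {
  prefix = "${var.project}-${var.environment}" # assumed meaning of <prefix>
}

resource "azurerm_user_assigned_identity" "dbx_access" {
  name                = "id-${local.prefix}-dbx-access"
  location            = var.location
  resource_group_name = var.rg_databricks_name
  tags                = var.tags
}

resource "azurerm_databricks_access_connector" "this" {
  name                = "dbxac-${local.prefix}"
  location            = var.location
  resource_group_name = var.rg_databricks_name
  tags                = var.tags

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.dbx_access.id]
  }
}

# Data-plane access to ADLS for the identity used by the access connector.
resource "azurerm_role_assignment" "adls_blob_contributor" {
  scope                = var.storage_account_id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_user_assigned_identity.dbx_access.principal_id
}
```

Creating the role assignment requires the deploying principal to have permission to write role assignments at the storage account scope, which is why the CI/CD principal is granted `User Access Administrator`.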
### 4.3 Implement `modules/identity/outputs.tf`
| Output | Value | Consumed By |
|--------|-------|-------------|
| `managed_identity_id` | User-assigned identity resource ID | `databricks-workspace` module, `unity-catalog` module |
| `managed_identity_principal_id` | Identity principal ID | Reference |
| `access_connector_id` | Access connector resource ID | `unity-catalog` module (storage credential) |
| `access_connector_name` | Access connector name | Reference |
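
A possible shape for these outputs, assuming the identity and access connector are labelled `azurerm_user_assigned_identity.dbx_access` and `azurerm_databricks_access_connector.this` (illustrative labels only):

```hcl
# modules/identity/outputs.tf (sketch; resource labels are assumptions)
output "managed_identity_id" {
  description = "User-assigned identity resource ID"
  value       = azurerm_user_assigned_identity.dbx_access.id
}

output "managed_identity_principal_id" {
  description = "Identity principal ID"
  value       = azurerm_user_assigned_identity.dbx_access.principal_id
}

output "access_connector_id" {
  description = "Access connector resource ID"
  value       = azurerm_databricks_access_connector.this.id
}

output "access_connector_name" {
  description = "Access connector name"
  value       = azurerm_databricks_access_connector.this.name
}
```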
### 4.4 Wire into root `main.tf`
Uncomment `module "identity"` block (lines 71-82). Dependency on `module.storage.adls_id` is already wired.
### 4.5 Validate
```bash
terraform plan -target=module.identity
```
Verify: 1 managed identity, 1 access connector, 1 role assignment.

---
## Step 5 — CI/CD Pipeline (Plan: D5)
**Module:** `ci/` directory
**Owner:** DevOps
### 5.1 Create `ci/azure-pipelines.yml` (or `.github/workflows/terraform.yml`)
Pipeline stages:
#### Stage 1 — Validate (runs on every PR)
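```yaml
# Azure Pipelines syntax. Note: YAML `pr` triggers apply to GitHub/Bitbucket
# repositories; for Azure Repos, use a build validation branch policy instead.
trigger: none
pr:
  branches:
    include: [main]
steps:
  - script: terraform init -backend=false
  - script: terraform validate
  - script: terraform fmt -check -recursive
  - script: tflint --init && tflint
  - script: checkov -d . --framework terraform --soft-fail
```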
#### Stage 2 — Plan (runs on every PR)
```yaml
steps:
  - script: terraform init            # with backend config
  - script: terraform plan -out=tfplan
  # Then post the plan output as a PR comment (using az devops / gh CLI)
```
#### Stage 3 — Apply (runs on merge to `main` only)
```yaml
trigger:
  branches:
    include: [main]
steps:
  - script: terraform init
  - script: terraform plan -out=tfplan
  - script: terraform apply tfplan
```
### 5.2 Create `ci/scripts/validate.sh`
```bash
#!/usr/bin/env bash
set -euo pipefail
echo "=== terraform fmt ==="
terraform fmt -check -recursive
echo "=== terraform validate ==="
terraform init -backend=false
terraform validate
echo "=== tflint ==="
tflint --init
tflint
echo "=== checkov ==="
checkov -d . --framework terraform --soft-fail
```
### 5.3 Branch protection rules
Configure on the Git repository (Azure DevOps or GitHub):
- Require PR for merges to `main`
- Require at least 1 approval
- Require `terraform plan` status check to pass
- Require `validate` status check to pass
- No direct pushes to `main`
### 5.4 Service connection / secrets
- Create a service principal or managed identity for Terraform CI/CD
- Grant it `Contributor` + `User Access Administrator` on the MDP subscription
- Store credentials as pipeline secrets (not in code)
- Grant `Storage Blob Data Contributor` on the tfstate storage account
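
If these grants are themselves managed in Terraform (for example in a separate bootstrap configuration), they might be sketched as below. The variable names are hypothetical, and the sketch assumes the provider is authenticated against the MDP subscription with rights to create role assignments.

```hcl
# Bootstrap sketch for the CI/CD principal (variable names are hypothetical)
variable "ci_principal_object_id" {
  description = "Object ID of the service principal or managed identity used by the pipeline"
  type        = string
}

variable "tfstate_storage_account_id" {
  description = "Resource ID of the Terraform state storage account"
  type        = string
}

data "azurerm_subscription" "current" {}

# Contributor + User Access Administrator on the subscription.
resource "azurerm_role_assignment" "ci_subscription" {
  for_each             = toset(["Contributor", "User Access Administrator"])
  scope                = data.azurerm_subscription.current.id
  role_definition_name = each.key
  principal_id         = var.ci_principal_object_id
}

# Data-plane access to the state blobs.
resource "azurerm_role_assignment" "ci_tfstate" {
  scope                = var.tfstate_storage_account_id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = var.ci_principal_object_id
}
```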
---
## Step 6 — Security Review Checkpoint (Plan: D5)
**Owner:** Security Architect
### 6.1 Review checklist
The Security Architect must review and sign off on the following before proceeding to Week 2 (Databricks deployment):
| # | Review Item | Evidence |
|---|-------------|----------|
| 1 | VNet topology matches approved design | `terraform state show` for VNets, subnets |
| 2 | NSG rules — no permissive inbound, Internet outbound denied | NSG rule export, compare against §3 of Phase 0 plan |
| 3 | Private Endpoints — ADLS and Key Vault accessible only via PE | `nslookup` from within VNet resolves to private IP |
| 4 | ADLS — public access disabled, HNS enabled, soft delete on | Storage account properties |
| 5 | Key Vault — purge protection on, public access disabled, RBAC auth | Key Vault properties |
| 6 | NAT Gateway — stable egress IP documented | Public IP address recorded for source system allowlisting |
| 7 | Terraform state — stored in separate subscription, blob versioning on | State storage account config |
| 8 | No secrets in code | `git log` search for keys/passwords, `checkov` scan results |
| 9 | Tags applied consistently | `az resource list --tag managed-by=terraform` |
| 10 | RBAC assignments — least privilege | Role assignments export |
### 6.2 Sign-off artifact
Produce a signed document: `Phase0_Week1_Security_Signoff.md` with findings, risk acceptance (if any), and approval to proceed to Databricks workspace deployment.

---
## Execution Order and Dependencies
```
Step 1 (Storage) ──────┐
                       ├──→ Step 4 (Identity) ──→ Step 6 (Security Review)
Step 2 (Key Vault) ─┬──┘                                ↑
                    │                                   │
                    └──→ Step 3 (Monitoring) ───────────┘
Step 5 (CI/CD) ─────────────────────────────────────────┘
```
- **Steps 1 & 2** can be implemented in parallel (no dependency on each other).
- **Step 3** (Monitoring) depends on Step 2 (Key Vault ID needed for diagnostic setting).
- **Step 4** (Identity) depends on Step 1 (ADLS storage account ID needed for role assignment).
- **Step 5** (CI/CD) can be done in parallel with Steps 1-4.
- **Step 6** (Security Review) is the gate — requires all preceding steps to be complete.
---
## New Root Variables Required
| Variable | Value | Added In |
|----------|-------|----------|
| `tenant_id` | Greenfield Azure AD tenant ID | Step 2 (Key Vault) |
## terraform.tfvars Additions
```hcl
tenant_id = "REPLACE-WITH-TENANT-ID"
```