Phase 0 — Foundation: Detailed Implementation Plan

Parent document: Databricks-Primary Implementation Plan (MDP v8.0)
Duration: Weeks 1–3 (15 working days)
Objective: Establish the Azure landing zone, network security, Databricks workspace, Unity Catalog metastore, and IaC foundation. At the end of Phase 0, the platform is secure, governed, and ready for data engineering workloads; no data flows yet.


1. Week-by-Week Breakdown

Week 1 — Azure Landing Zone & Networking

Day Task Owner Details
D1 Subscription provisioning Cloud Infra Create subscription: greenfield-mdp-prod (Canada Central). Apply Greenfield management group policies (tagging, allowed regions, allowed SKUs). DR and DevTest subscriptions are out of scope for now.
D1 Resource group structure Cloud Infra Create resource groups per component: rg-mdp-network, rg-mdp-databricks, rg-mdp-storage, rg-mdp-governance, rg-mdp-keyvault, rg-mdp-monitoring.
D1–D2 VNet design & deployment Cloud Infra See §2 Network Architecture below. Deploy via Terraform.
D2–D3 Private Endpoints Cloud Infra Deploy Private Endpoints for: ADLS Gen2, Key Vault, Purview, Databricks (back-end), Databricks (front-end via Transit VNet). Register private DNS zones in central DNS hub (if Greenfield uses hub-spoke).
D3 NAT Gateway Cloud Infra Attach NAT Gateway to both Databricks subnets (host + container). Provides stable egress IPs for allowlisting at external source systems.
D3–D4 NSG rules Cloud Infra / Security Apply NSGs to Databricks subnets. Allow Databricks control plane service tags. Deny all other inbound. Restrict outbound to ADLS, Key Vault, Purview, and Databricks control plane only. See §3 NSG Rule Set.
D4 ADLS Gen2 storage account Cloud Infra Deploy storage account: stglobal<env>mdp with HNS enabled, private endpoint, CMK (Key Vault-backed), soft delete (7 days), immutability policy on archive container. Container layout: landing, bronze, silver, gold, archive, checkpoints.
D4 Key Vault Cloud Infra Deploy Key Vault with private endpoint, soft delete, purge protection. Create initial secrets: Databricks PAT (placeholder), source system JDBC credentials (placeholders). Enable diagnostic logging to Log Analytics.
D5 Terraform state & CI/CD DevOps Azure Storage account for Terraform state (separate subscription, blob versioning, container-level lease lock). Azure DevOps / GitHub Actions pipeline: terraform plan on PR, terraform apply on merge to main. Branch protection rules enforced.
D5 Security review checkpoint Security Architect Review: VNet topology, NSG rules, Private Endpoint configuration, RBAC assignments. Sign-off before proceeding to Databricks deployment.
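The D5 remote-state setup can be sketched in Terraform as follows. This is a minimal illustration, not the prescribed configuration; the resource group, storage account, and container names are placeholders:

```hcl
# backend.tf — remote Terraform state in a dedicated Azure Storage account
# (names below are illustrative; the state account lives in a separate
# subscription with blob versioning enabled, per the plan)
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"                 # state-only resource group
    storage_account_name = "sttfstatemdp"               # placeholder account name
    container_name       = "tfstate"                    # blob leases provide state locking
    key                  = "mdp/prod.terraform.tfstate" # one state file per environment
  }
}
```

The azurerm backend acquires a blob lease during plan/apply, which gives the container-level lock the plan calls for without extra configuration.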

Week 2 — Databricks Workspace & Identity

Day Task Owner Details
D6 Databricks account setup Platform Admin Configure Azure Databricks account via account console. Enable Unity Catalog at account level. Set account-level admins (Entra ID group: grp-mdp-account-admins).
D6 Workspace deployment Cloud Infra Deploy Databricks workspace via Terraform (azurerm_databricks_workspace). Tier: Premium (required for UC, Private Link, IP access lists, audit logs). VNet injection: bind to pre-created host + container subnets. Managed resource group: rg-mdp-databricks-managed (auto-created by Databricks, do not modify). Enable No Public IP (NPIP) for secure cluster connectivity.
D6–D7 Private Link configuration Cloud Infra Back-end Private Link: Private endpoint from Databricks VNet to control plane (workspace, DBFS, artifact). Front-end Private Link: Private endpoint in Transit VNet for user browser/API access. DNS: register adb-<workspace-id>.azuredatabricks.net in private DNS zone.
D7 IP access lists Platform Admin Configure workspace IP access list to allow only Greenfield corporate IP ranges and the Transit VNet CIDR. Block all other access.
D7 Conditional Access (Entra ID) Security Architect Entra ID Conditional Access policy for the Databricks enterprise application: require MFA, require compliant device, block access from non-compliant locations.
D7–D8 Identity provisioning (SCIM) Platform Admin Enable automatic identity management (default for accounts created after Aug 2025) or configure SCIM provisioning connector in Entra ID. Sync the following Entra ID groups to Databricks account:
grp-mdp-account-admins → Account Admin role
grp-mdp-platform-engineers → Workspace Admin role
grp-mdp-data-engineers → Workspace User, Can Manage clusters
grp-mdp-data-analysts → Workspace User, SQL access only
grp-mdp-data-stewards → Workspace User, UC metastore admin
grp-mdp-data-scientists → Workspace User, ML runtime access
D8 Workspace configuration Platform Admin Enable: audit logging (to Azure Monitor diagnostic settings), web terminal disabled (security), DBFS disabled for external access, repos enabled (Git integration), serverless compute enabled.
D8–D9 Cluster policies Platform Admin Create cluster policies (see §5 Cluster Policies). Assign policies to groups: data engineers get de-job-cluster and de-interactive, analysts get analyst-sql-only, data scientists get ds-ml-cluster.
D9 Git integration DevOps Configure workspace Git integration (Azure DevOps Repos or GitHub). Set default branch: main. Enable branch protection. Create repo structure for DABs (see §6).
D10 Workspace validation Platform Admin Validate: user login via Private Link, cluster launch, notebook execution, ADLS access via managed identity, Key Vault secret scope. Document any issues.
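The D6 workspace deployment can be sketched with the azurerm provider as below. Variable references (vnet_id, NSG association IDs) are illustrative stand-ins for outputs of the networking module:

```hcl
# Sketch of the D6 Premium workspace with VNet injection and NPIP.
resource "azurerm_databricks_workspace" "mdp" {
  name                        = "dbw-mdp-prod"           # illustrative name
  resource_group_name         = "rg-mdp-databricks"
  location                    = "canadacentral"
  sku                         = "premium"                # required for UC, Private Link, IP access lists
  managed_resource_group_name = "rg-mdp-databricks-managed"

  public_network_access_enabled         = false          # front-end access via Private Link only
  network_security_group_rules_required = "NoAzureDatabricksRules"

  custom_parameters {
    no_public_ip        = true                           # NPIP / secure cluster connectivity
    virtual_network_id  = var.vnet_id
    public_subnet_name  = "snet-dbx-host"
    private_subnet_name = "snet-dbx-container"
    public_subnet_network_security_group_association_id  = var.host_nsg_assoc_id
    private_subnet_network_security_group_association_id = var.container_nsg_assoc_id
  }
}
```

With public_network_access_enabled set to false, the front-end private endpoint in the Transit VNet (D6–D7) becomes the only browser/API path into the workspace.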

Week 3 — Unity Catalog & Governance Foundation

Day Task Owner Details
D11 Metastore creation Platform Admin Create Unity Catalog metastore for Canada Central region. Root storage: ADLS Gen2 container unitycatalog in the MDP storage account. Storage credential: User-Assigned Managed Identity (not service principal — supports storage behind network rules). Access connector: databricks-access-connector-mdp-prod.
D11 Metastore assignment Platform Admin Assign the metastore to the production workspace.
D11–D12 Catalog structure Platform Admin / Data Steward Create the initial catalog hierarchy:
metastore: greenfield_canadacentral
├── catalog: raw               ← Bronze layer (ingested data)
│   ├── schema: customer
│   ├── schema: transactions
│   ├── schema: products
│   ├── schema: claims
│   └── schema: policies
├── catalog: curated            ← Silver layer (cleansed, conformed)
│   ├── schema: customer
│   ├── schema: transactions
│   ├── schema: products
│   ├── schema: claims
│   └── schema: policies
├── catalog: analytics          ← Gold layer (business-ready, Metric Views)
│   ├── schema: dimensional     ← Star schemas
│   ├── schema: metrics         ← Metric Views
│   ├── schema: reference       ← MDM, reference data
│   └── schema: customer360     ← Unified customer view
├── catalog: sandbox            ← Exploratory (data scientists, analysts)
│   └── schema: <per-user or per-team>
└── catalog: system             ← UC system tables (audit, lineage, billing)
D12 External locations Platform Admin Register ADLS containers as UC external locations: abfss://bronze@stglobal..., abfss://silver@..., abfss://gold@.... Bind each to the managed identity storage credential.
D12 Default grants Data Steward Apply baseline grants:
grp-mdp-data-engineers: ALL PRIVILEGES on raw, curated; SELECT on analytics
grp-mdp-data-analysts: SELECT on curated, analytics; USAGE on sandbox
grp-mdp-data-scientists: SELECT on curated, analytics; ALL PRIVILEGES on sandbox
grp-mdp-data-stewards: ALL PRIVILEGES on all catalogs (metastore admin)
D13 Audit logging Platform Admin Enable UC system tables: system.access.audit, system.access.table_lineage, system.billing.usage. Create a Databricks SQL dashboard for: login events, permission changes, table access patterns. Route audit logs to Azure Monitor via diagnostic settings.
D13 Purview registration Governance Specialist Register the Databricks workspace as a data source in Microsoft Purview. Configure the UC scan using managed identity authentication. Run initial metadata scan (catalogs, schemas — tables will come in Phase 1). Validate that catalog structure appears in Purview Unified Catalog.
D14–D15 Phase 0 exit review All Go/No-Go gate. Review against exit criteria (§8). Security sign-off. Architecture sign-off. Handover to Phase 1 team.
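The D11 metastore setup (access connector with a user-assigned identity, metastore, root storage credential, workspace assignment) can be sketched as follows. Names match this plan; variable references are illustrative:

```hcl
# Sketch of D11: access connector backed by a UAMI, metastore, default
# storage credential, and assignment to the production workspace.
resource "azurerm_databricks_access_connector" "uc" {
  name                = "databricks-access-connector-mdp-prod"
  resource_group_name = "rg-mdp-governance"
  location            = "canadacentral"
  identity {
    type         = "UserAssigned"
    identity_ids = [var.uc_identity_id]  # UAMI with Storage Blob Data Contributor on ADLS
  }
}

resource "databricks_metastore" "this" {
  name         = "greenfield_canadacentral"
  region       = "canadacentral"
  storage_root = "abfss://unitycatalog@${var.storage_account}.dfs.core.windows.net/"
}

# Root storage credential using the managed identity (works behind network rules)
resource "databricks_metastore_data_access" "mi" {
  metastore_id = databricks_metastore.this.id
  name         = "mi-uc-root"             # illustrative credential name
  is_default   = true
  azure_managed_identity {
    access_connector_id = azurerm_databricks_access_connector.uc.id
    managed_identity_id = var.uc_identity_id
  }
}

resource "databricks_metastore_assignment" "prod" {
  workspace_id = var.workspace_id
  metastore_id = databricks_metastore.this.id
}
```

The databricks_* resources here require the account-level databricks provider; catalogs, schemas, external locations, and grants then layer on top of this assignment, as listed in §4.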

2. Network Architecture

VNet Layout (Canada Central — Production)

VNet / Subnet CIDR Purpose
vnet-mdp-prod 10.100.0.0/16 Main MDP VNet
snet-dbx-host 10.100.1.0/24 Databricks host subnet (254 IPs)
snet-dbx-container 10.100.2.0/24 Databricks container subnet (254 IPs)
snet-private-endpoints 10.100.3.0/24 Private endpoints (ADLS, KV, Purview, etc.)
snet-sas-viya 10.100.4.0/24 SAS Viya integration (Phase 2)
vnet-mdp-transit 10.101.0.0/24 Front-end Private Link for user access
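The VNet and the delegated Databricks subnets from the table above can be sketched in Terraform as follows (only the host subnet is shown; the container subnet is analogous with 10.100.2.0/24):

```hcl
# Sketch of the production VNet and one Databricks-delegated subnet.
resource "azurerm_virtual_network" "mdp" {
  name                = "vnet-mdp-prod"
  resource_group_name = "rg-mdp-network"
  location            = "canadacentral"
  address_space       = ["10.100.0.0/16"]
}

resource "azurerm_subnet" "dbx_host" {
  name                 = "snet-dbx-host"
  resource_group_name  = "rg-mdp-network"
  virtual_network_name = azurerm_virtual_network.mdp.name
  address_prefixes     = ["10.100.1.0/24"]

  # VNet injection requires delegating both Databricks subnets to the service
  delegation {
    name = "databricks"
    service_delegation {
      name = "Microsoft.Databricks/workspaces"
      actions = [
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
        "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action",
      ]
    }
  }
}
```

Without the delegation block, the azurerm_databricks_workspace deployment in Week 2 will fail subnet validation, so this belongs in the networking module from the start.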

Peering

  • vnet-mdp-prod ↔ vnet-mdp-transit (for front-end Private Link)
  • vnet-mdp-prod ↔ Greenfield hub VNet (for on-prem connectivity via ExpressRoute/VPN)
  • vnet-mdp-prod ↔ vnet-sas-viya (when SAS is deployed in Phase 2)

DNS

  • Private DNS zones hosted in Greenfield's central DNS hub (hub-spoke model)
  • Zones: privatelink.azuredatabricks.net, privatelink.dfs.core.windows.net, privatelink.vaultcore.azure.net, privatelink.purview.azure.com
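One zone plus its VNet link can be sketched as below. In the hub-spoke model these zones live in Greenfield's central DNS hub, so in practice this runs in that team's Terraform; the variable names are illustrative:

```hcl
# Sketch: private DNS zone for Databricks Private Link, linked to the
# Transit VNet so user browsers resolve the workspace URL to a private IP.
resource "azurerm_private_dns_zone" "dbx" {
  name                = "privatelink.azuredatabricks.net"
  resource_group_name = var.dns_hub_rg      # central DNS hub resource group (illustrative)
}

resource "azurerm_private_dns_zone_virtual_network_link" "transit" {
  name                  = "link-mdp-transit"
  resource_group_name   = var.dns_hub_rg
  private_dns_zone_name = azurerm_private_dns_zone.dbx.name
  virtual_network_id    = var.transit_vnet_id
  registration_enabled  = false             # records come from private endpoints, not VM registration
}
```

The same pattern repeats for the dfs.core.windows.net, vaultcore.azure.net, and purview.azure.com zones listed above.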

3. NSG Rule Set (Databricks Subnets)

Inbound Rules

Priority Name Source Destination Port Action
100 AllowDatabricksControlPlane AzureDatabricks service tag snet-dbx-* 443 Allow
200 AllowInternalSubnet snet-dbx-host snet-dbx-container * Allow
201 AllowInternalSubnet2 snet-dbx-container snet-dbx-host * Allow
4096 DenyAllInbound * * * Deny

Outbound Rules

Priority Name Source Destination Port Action
100 AllowDatabricksControlPlane snet-dbx-* AzureDatabricks service tag 443 Allow
110 AllowSQL snet-dbx-* Sql service tag 3306 Allow
120 AllowStorage snet-dbx-* Storage service tag 443 Allow
130 AllowEventHub snet-dbx-* EventHub service tag 9093 Allow
200 AllowKeyVault snet-dbx-* snet-private-endpoints 443 Allow
210 AllowADLS snet-dbx-* snet-private-endpoints 443 Allow
4096 DenyAllOutbound snet-dbx-* Internet * Deny

Note: The DenyAllOutbound rule to Internet is the primary data exfiltration control. All egress goes through Private Endpoints or approved service tags.
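Two of the rules above, expressed as Terraform resources (the remaining rules follow the same shape; the NSG name variable is illustrative):

```hcl
# Inbound priority 100: allow the Databricks control plane service tag on 443.
resource "azurerm_network_security_rule" "allow_control_plane_in" {
  name                        = "AllowDatabricksControlPlane"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "443"
  source_address_prefix       = "AzureDatabricks"   # service tag
  destination_address_prefix  = "10.100.1.0/24"     # snet-dbx-host; repeat for container subnet
  resource_group_name         = "rg-mdp-network"
  network_security_group_name = var.dbx_nsg_name
}

# Outbound priority 4096: the exfiltration-protection catch-all deny.
resource "azurerm_network_security_rule" "deny_internet_out" {
  name                        = "DenyAllOutbound"
  priority                    = 4096
  direction                   = "Outbound"
  access                      = "Deny"
  protocol                    = "*"
  source_port_range           = "*"
  destination_port_range      = "*"
  source_address_prefix       = "*"
  destination_address_prefix  = "Internet"
  resource_group_name         = "rg-mdp-network"
  network_security_group_name = var.dbx_nsg_name
}
```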


4. Terraform Module Structure

terraform/
├── environments/
│   └── prod/
│       ├── main.tf
│       ├── variables.tf
│       ├── terraform.tfvars
│       └── backend.tf          ← State in Azure Storage
├── modules/
│   ├── networking/
│   │   ├── main.tf             ← VNet, subnets, NSGs, NAT GW, peerings
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── databricks-workspace/
│   │   ├── main.tf             ← Workspace, VNet injection, Private Link
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── unity-catalog/
│   │   ├── main.tf             ← Metastore, access connector, storage credential,
│   │   │                          catalogs, schemas, external locations, grants
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── storage/
│   │   ├── main.tf             ← ADLS Gen2, containers, CMK, private endpoints
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── keyvault/
│   │   ├── main.tf             ← Key Vault, access policies, private endpoint
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── monitoring/
│   │   ├── main.tf             ← Log Analytics, diagnostic settings, alerts
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── identity/
│       ├── main.tf             ← Managed identities, RBAC assignments
│       ├── variables.tf
│       └── outputs.tf
└── ci/
    ├── azure-pipelines.yml     ← Plan on PR, Apply on merge
    └── scripts/
        └── validate.sh         ← terraform validate + tflint + checkov

Key Terraform Resources

Module Key Resources
networking azurerm_virtual_network, azurerm_subnet, azurerm_network_security_group, azurerm_nat_gateway, azurerm_private_endpoint, azurerm_private_dns_zone
databricks-workspace azurerm_databricks_workspace (with custom_parameters for VNet injection), azurerm_private_endpoint (front-end + back-end)
unity-catalog databricks_metastore, databricks_metastore_assignment, databricks_metastore_data_access, databricks_catalog, databricks_schema, databricks_external_location, databricks_grants
storage azurerm_storage_account (HNS, CMK), azurerm_storage_container, azurerm_private_endpoint
identity azurerm_user_assigned_identity, azurerm_databricks_access_connector, azurerm_role_assignment (Storage Blob Data Contributor on ADLS)
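How environments/prod/main.tf might wire these modules together is sketched below. The module output and variable names are invented for illustration; the actual interfaces are defined by each module's variables.tf and outputs.tf:

```hcl
# Sketch of environments/prod/main.tf composing the modules from §4.
module "networking" {
  source   = "../../modules/networking"
  location = "canadacentral"
}

module "storage" {
  source                     = "../../modules/storage"
  location                   = "canadacentral"
  private_endpoint_subnet_id = module.networking.private_endpoint_subnet_id
}

module "databricks_workspace" {
  source                 = "../../modules/databricks-workspace"
  vnet_id                = module.networking.vnet_id
  host_nsg_assoc_id      = module.networking.host_nsg_association_id
  container_nsg_assoc_id = module.networking.container_nsg_association_id
}

module "unity_catalog" {
  source             = "../../modules/unity-catalog"
  workspace_id       = module.databricks_workspace.workspace_id
  storage_account_id = module.storage.account_id
}
```

Passing module outputs as inputs like this lets Terraform infer the deployment order (networking → storage/workspace → Unity Catalog) without explicit depends_on declarations.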

5. Cluster Policies

Policy Name Target Group Configuration
de-job-cluster Data Engineers Auto-termination: 20 min. Spot instances: up to 80% of workers. Node type: Standard_DS3_v2 (fixed). Min/max workers: 2–8. Spark version: latest LTS. Unity Catalog enabled.
de-interactive Data Engineers Auto-termination: 60 min. Single-user access mode. Node type: Standard_DS4_v2. Max workers: 4.
analyst-sql-only Analysts Serverless SQL Warehouse only. No general-purpose clusters. Warehouse size: Small (2X-Small for dev). Auto-stop: 10 min.
ds-ml-cluster Data Scientists GPU-enabled option: Standard_NC6s_v3. Auto-termination: 30 min. ML runtime (latest). Max workers: 4. Single-user access mode.
admin-unrestricted Platform Admins Unrestricted — for platform debugging only. Requires justification tag.
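The de-job-cluster policy from the table can be sketched as a Terraform-managed policy definition; the JSON keys follow the Databricks cluster-policy schema, and the exact attribute set here is illustrative rather than the final policy:

```hcl
# Sketch of the de-job-cluster policy (D8–D9) and its group assignment.
resource "databricks_cluster_policy" "de_job" {
  name = "de-job-cluster"
  definition = jsonencode({
    "autotermination_minutes" = { "type" = "fixed", "value" = 20 },
    "node_type_id"            = { "type" = "fixed", "value" = "Standard_DS3_v2" },
    "autoscale.min_workers"   = { "type" = "range", "minValue" = 2, "maxValue" = 8 },
    "autoscale.max_workers"   = { "type" = "range", "minValue" = 2, "maxValue" = 8 },
    "data_security_mode"      = { "type" = "fixed", "value" = "USER_ISOLATION" }  # UC-enabled shared access
  })
}

# Grant CAN_USE so data engineers can only create clusters through this policy.
resource "databricks_permissions" "de_job_policy" {
  cluster_policy_id = databricks_cluster_policy.de_job.id
  access_control {
    group_name       = "grp-mdp-data-engineers"
    permission_level = "CAN_USE"
  }
}
```

The spot-instance cap (up to 80% of workers) would be expressed through azure_attributes keys in the same definition; it is omitted here to keep the sketch short.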

6. DABs Repository Structure

mdp-platform/                     ← Git repository
├── databricks.yml                 ← DAB project config
├── bundles/
│   ├── infrastructure/
│   │   ├── cluster-policies/      ← Policy JSON definitions
│   │   ├── instance-pools/        ← Pool definitions
│   │   └── secret-scopes/         ← Secret scope config
│   ├── ingestion/                 ← Phase 1 (empty in Phase 0)
│   │   ├── bronze/
│   │   └── silver/
│   ├── transformation/            ← Phase 2 (empty in Phase 0)
│   │   └── gold/
│   ├── metrics/                   ← Phase 3 (empty in Phase 0)
│   │   └── metric-views/
│   └── governance/
│       ├── grants/                ← UC grant definitions
│       └── quality/               ← DQ rule definitions
├── tests/
│   ├── unit/
│   └── integration/
└── .github/ or .azure-pipelines/
    └── ci.yml

7. RACI Matrix

Task Platform Admin Cloud Infra Security Architect DevOps Data Steward
Subscription provisioning C R A I I
VNet & networking C R A I I
Private Endpoints C R A I I
NSG rules I R A I I
ADLS Gen2 deployment C R A I I
Key Vault deployment C R A I I
Databricks workspace R C A I I
Private Link config C R A I I
SCIM / identity provisioning R I A I I
IP access lists R I A I I
Conditional Access I I R I I
Cluster policies R I C I I
UC metastore creation R I C I C
Catalog/schema structure C I I I R
External locations R C C I I
Default grants C I A I R
Audit logging R C A I I
Purview registration I I I I R
Terraform modules C R C R I
CI/CD pipeline I C I R I
Phase 0 exit review R R R C C

R = Responsible, A = Accountable, C = Consulted, I = Informed


8. Exit Criteria

# Criterion Verification Method
1 Workspace accessible via Private Link from Greenfield corporate network User logs in via browser → workspace UI loads. curl to workspace URL resolves to private IP.
2 No public internet access to workspace Attempt access from non-corporate IP → blocked. IP access list denies connection.
3 Unity Catalog metastore operational SHOW CATALOGS returns raw, curated, analytics, sandbox.
4 ADLS Gen2 accessible from workspace Notebook: dbutils.fs.ls("abfss://bronze@stglobal...") succeeds via managed identity.
5 Key Vault secret scope functional Notebook: dbutils.secrets.get(scope="mdp-keyvault", key="test-secret") returns value.
6 SCIM sync active All 6 Entra ID groups visible in workspace admin console with correct members.
7 Cluster policies enforced User in grp-mdp-data-analysts cannot create general-purpose cluster — only SQL Warehouse.
8 Audit logs flowing Azure Monitor shows Databricks diagnostic logs. UC audit table system.access.audit contains login events.
9 Purview metadata scan completed Purview Unified Catalog shows raw, curated, analytics, sandbox catalogs.
10 Terraform state clean terraform plan returns "No changes. Your infrastructure matches the configuration."
11 Security review passed Security architect sign-off document on file. No critical findings.
12 CI/CD pipeline operational PR triggers terraform plan. Merge to main triggers terraform apply. Both succeed.

9. Risks Specific to Phase 0

Risk Impact Mitigation
Subscription provisioning delay (Greenfield IT) Blocks everything Engage cloud team 2 weeks before Phase 0 starts. Pre-approve subscription request.
Private DNS zone conflicts with existing hub Workspace unreachable Coordinate with central networking team. Use conditional forwarders if needed.
Databricks Premium contract not signed Cannot deploy workspace Procurement must close before D6. Escalate to VP if delayed.
SCIM sync issues (nested groups, group size limits) Incomplete identity provisioning Test with a small pilot group first. Flatten nested groups if needed.
NSG rules too restrictive Cluster launch failures Start permissive (log-only), harden iteratively based on traffic analysis.
Terraform provider version mismatch Drift between plan and apply Pin provider versions in required_providers. Validate in CI pipeline before applying.
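The provider-pinning mitigation in the last row looks like this in practice (version constraints shown are illustrative placeholders, not recommendations):

```hcl
# Pin provider versions so plan and apply resolve identical provider builds;
# the CI pipeline validates this before any apply.
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.100"   # illustrative pin
    }
    databricks = {
      source  = "databricks/databricks"
      version = "~> 1.50"    # illustrative pin
    }
  }
}
```

Committing the generated .terraform.lock.hcl file alongside this block locks the exact provider checksums across all pipeline agents.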