Microsoft Purview & Fabric — Required Improvements for Greenfield
Context: Greenfield Modern Data Platform — Fabric-Primary Architecture (v1.0)
Date: March 2026
Classification: Internal — Confidential
Purpose: Structured improvement request list for Microsoft partnership engagement
Background
Greenfield's Fabric-primary architecture (v1.0) positions Microsoft Fabric as the primary analytics platform with OneLake as the unified storage foundation. Databricks is retained for specialized workloads, and SAS Viya serves regulated actuarial analytics. Microsoft Purview provides enterprise governance. This document identifies the product gaps in Purview and Fabric that currently force architectural complexity, dual-governance overhead, or dependencies on third-party tools. Improvements are listed in decreasing order of impact on architecture simplification.
1. Unified Cross-Engine Security Enforcement in Fabric (Critical)
Gap: Fabric Spark notebooks bypass all fine-grained security controls (RLS, CLS, DDM) defined on SQL Analytics Endpoints and Power BI semantic models. A user with the Member role on a workspace who opens a Spark notebook sees the complete, unmasked, unfiltered dataset — including NAS/SIN numbers, financial account details, and other Restricted-classified data. This is not a misconfiguration; it is a fundamental architectural limitation of the current Fabric Spark engine.
Current workaround: Greenfield must operate a dual-enforcement model where all Confidential and Restricted datasets are routed through Databricks SQL Warehouses with Unity Catalog ABAC enforcement. This forces the architecture to maintain two enforcement planes, two audit systems, two sets of security policies, and JDBC routing for SAS Viya and Fabric workloads accessing sensitive data. It is the single largest source of architectural complexity in the Fabric-primary design.
Requested improvement: Deliver OneLake-level security policies (row filters, column masks, dynamic data masking) that are evaluated at the storage layer and enforced uniformly across all Fabric compute engines — Spark, SQL Analytics Endpoint, KQL, Power BI, and any future engine. These policies should be defined once (ideally as SQL functions or declarative rules tied to Entra ID attributes) and enforced transparently regardless of how the data is accessed. This is the equivalent of Databricks Unity Catalog's Attribute-Based Access Control (ABAC) model.
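For reference, a minimal sketch of the Unity Catalog ABAC pattern this request asks OneLake to match: define the policy once as a SQL function tied to group membership, bind it to the table, and every engine inherits it. Table, column, and group names below are hypothetical; `spark` is the session a Databricks notebook predefines.

```python
# Illustrative only: the Unity Catalog pattern this request asks OneLake to
# match. Table, column, and group names are hypothetical.

# Define a column mask once, as a SQL function tied to Entra ID group membership.
spark.sql("""
    CREATE OR REPLACE FUNCTION gov.policies.mask_sin(sin STRING)
    RETURNS STRING
    RETURN CASE
        WHEN is_account_group_member('restricted-data-readers') THEN sin
        ELSE 'XXX-XXX-XXX'
    END
""")

# Bind it to the column; every query path through this engine now returns
# masked values for users outside the privileged group.
spark.sql("ALTER TABLE silver.customers ALTER COLUMN sin SET MASK gov.policies.mask_sin")

# Row-level filtering follows the same define-once pattern.
spark.sql("""
    CREATE OR REPLACE FUNCTION gov.policies.qc_sector_filter(sector STRING)
    RETURNS BOOLEAN
    RETURN is_account_group_member('all-sectors-readers') OR sector = 'QC'
""")
spark.sql("ALTER TABLE silver.policies SET ROW FILTER gov.policies.qc_sector_filter ON (sector)")
```

The ask is for the same declarative, define-once semantics at the OneLake layer, so the policy holds whether the table is read from Spark, SQL Analytics Endpoint, KQL, or Power BI.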
Architecture impact: Eliminates the need for the dual-enforcement model, removes the dependency on Databricks for security enforcement on ~80% of sensitive datasets, simplifies the audit trail to a single system, removes JDBC routing overhead for SAS Viya, and enables true Fabric-primary governance without compromise. This single improvement would transform the architecture from a pragmatic compromise to a clean, unified platform.
2. Purview Native SAS Viya Lineage Connector (Critical)
Gap: Microsoft Purview has no connector for SAS Viya. Purview cannot parse SAS code (DATA steps, PROC SQL, macros) or ingest lineage from SAS Viya's metadata. This leaves a complete blind spot in the enterprise lineage graph for one of Greenfield's three technology pillars.
Current workaround: Greenfield deploys IBM Manta as a standalone cross-platform lineage engine specifically because Manta is the only product that natively parses SAS code and stitches SAS → Databricks → Fabric lineage end-to-end. This adds licensing cost, integration complexity, and a third-party dependency that would be unnecessary if Purview covered all three pillars.
Requested improvement: Develop a native Purview connector for SAS Viya that parses SAS programs (DATA steps, PROC SQL, SAS macros, LIBNAME references), extracts column-level lineage, and stitches it into the existing Purview lineage graph alongside Fabric and Databricks lineage. The connector should support SAS Viya Compute Server (not CAS) as this is Greenfield's deployed configuration.
Architecture impact: Eliminates the dependency on Manta for lineage, reduces the governance stack from four systems (Purview + OneLake Security + Unity Catalog + Manta) to three, simplifies the Tier 3 lineage layer, reduces licensing cost, and delivers a truly unified enterprise lineage graph within Purview.
3. Purview Policy Push-Down to Databricks Unity Catalog (High)
Gap: Purview can scan Unity Catalog metadata and read lineage, but it cannot write policies back into Unity Catalog. Classification labels, access policies, and governance rules defined in Purview do not propagate as enforceable ACLs (row filters, column masks, grants) in Unity Catalog. The only product that provides this capability today is Collibra Protect.
Current workaround: Security policies must be defined and maintained independently in both Purview (for Fabric enforcement) and Unity Catalog (for Databricks enforcement). Classification labels are synchronized from Purview to Unity Catalog tags, but this is metadata replication only — not policy enforcement. Any policy change requires dual implementation, creating drift risk and operational overhead.
Requested improvement: Enable Purview to push governance policies (access rules, masking rules, row-level filters) directly into Databricks Unity Catalog as enforceable ACLs. When a steward classifies a column as Restricted in Purview and defines a masking rule, that rule should automatically propagate to Unity Catalog as a column mask function, enforced at query time in Databricks. This is bidirectional policy synchronization, not just metadata scanning.
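A hedged sketch of what the requested push-down could look like. The rule dictionary below stands in for a Purview-side rule export that does not exist today; the generated DDL is the Unity Catalog column-mask syntax that already exists in Databricks.

```python
# Hypothetical sketch of the requested push-down, under stated assumptions:
# Purview exposes steward-defined masking rules (modeled here as a dict), and
# Unity Catalog is the enforcement target via its existing column-mask DDL.

def to_unity_catalog_ddl(rule: dict) -> list[str]:
    """Translate one Purview masking rule into enforceable Unity Catalog DDL."""
    fn = f"gov.policies.mask_{rule['column']}"
    return [
        f"CREATE OR REPLACE FUNCTION {fn}(v STRING) RETURNS STRING "
        f"RETURN CASE WHEN is_account_group_member('{rule['exempt_group']}') "
        f"THEN v ELSE '{rule['mask_value']}' END",
        f"ALTER TABLE {rule['table']} ALTER COLUMN {rule['column']} SET MASK {fn}",
    ]

# A rule as a steward might define it in Purview for a Restricted column:
rule = {
    "table": "silver.customers",
    "column": "sin",
    "exempt_group": "restricted-data-readers",
    "mask_value": "XXX-XXX-XXX",
}
for stmt in to_unity_catalog_ddl(rule):
    print(stmt)  # in practice, executed against Databricks via the SQL connector
```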
Architecture impact: Creates a single governance authority (Purview) that enforces policies across both Fabric and Databricks, eliminates dual policy management, reduces drift risk, and removes the need to evaluate Collibra Protect as a complementary tool for Unity Catalog enforcement.
4. Purview Data Quality — Complex Business Rule Engine (High)
Gap: Purview Data Quality is limited to sampling-based statistical checks (completeness, conformity, accuracy, freshness, uniqueness, consistency) with no-code/low-code rule authoring. It lacks the ability to define complex multi-table business rules, cross-dataset referential integrity checks, or domain-specific validation logic required by regulated financial domains (actuarial, IFRS 17, AMF). The Greenfield architecture documents explicitly acknowledge this limitation (AD-08 trade-off).
Current workaround: Purview DQ is scoped only to Tier 1 (pre-Bronze structural validation). Tier 2 and Tier 3 quality enforcement relies on Great Expectations embedded in Fabric Spark notebooks and Databricks DLT Expectations — both requiring custom code. IBM Knowledge Catalog's (IKC) DQ profiling was evaluated as significantly more capable for complex business rules, creating pressure to retain IBM tooling for quality governance.
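As an illustration of what "custom code" means in practice, a minimal PySpark version of a cross-dataset referential-integrity rule that Purview DQ cannot express today (table names are hypothetical):

```python
# Illustrative PySpark version of a multi-table rule hand-coded in notebooks
# today: every claim must reference a known policy.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

claims = spark.read.table("silver.claims")
policies = spark.read.table("silver.policies")

# Orphaned foreign keys: claims whose policy_id has no match in the policy table.
orphans = claims.join(policies, on="policy_id", how="left_anti")
orphan_count = orphans.count()

# In the real pipelines this result feeds Great Expectations or DLT
# Expectations rather than a bare assert.
assert orphan_count == 0, f"{orphan_count} claims reference unknown policies"
```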
Requested improvement: Extend Purview DQ with a business rule engine supporting multi-table joins, temporal comparisons (e.g., month-over-month variance thresholds), domain-specific validation expressions, and ML-driven anomaly detection. Provide a rule SDK or SQL-based rule authoring interface that governance stewards can use without writing Spark code. Enable DQ scores to aggregate at data product level with SLA tracking and breach alerting.
Architecture impact: Enables Purview to serve as the unified DQ authority across all three tiers, eliminates the argument for retaining IBM Knowledge Catalog for quality governance, simplifies the DQ stack, and provides a single pane of glass for data quality scores tied to data product SLAs.
5. Purview Data Product Certification & Marketplace (High)
Gap: Purview's data product capabilities are nascent. It lacks mature workflows for data product certification, lifecycle management (proposal → design → build → certify → publish → operate → retire), formal data contract management, and a data product marketplace where consumers can discover, request access to, and subscribe to certified data products. IKC's data product certification and marketplace capabilities are acknowledged as more mature.
Current workaround: Data product lifecycle management is handled through manual processes, Purview catalog entries, and custom Power BI dashboards for SLA tracking. There is no integrated certification workflow that connects quality scores, lineage completeness, access policies, and freshness SLAs into a single product health assessment.
Requested improvement: Build a native data product management module in Purview that supports formal data contracts (schema, freshness SLA, quality thresholds, access policies, lineage, owner), automated certification workflows with quality gates, a consumer-facing marketplace for data product discovery and access request, subscription management with change notification, and data product health dashboards integrating DQ scores, lineage coverage, and SLA compliance.
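To make the contract and certification-gate concepts concrete, an illustrative sketch; the field names and gate logic below are hypothetical, not a Purview schema.

```python
# Illustrative shape of the formal data contract requested here, expressed as
# a plain structure (field names are hypothetical).
customer_360_contract = {
    "product": "Customer 360",
    "owner": "retail-domain-stewards",
    "schema_ref": "gold.customer_360 v3",
    "freshness_sla": {"max_staleness_hours": 24},
    "quality_thresholds": {"completeness": 0.98, "uniqueness": 1.0},
    "access_policy": "Restricted; SIN masked outside restricted-data-readers",
    "lineage_required": True,
    "change_notification": ["schema", "sla_breach", "deprecation"],
}

def certify(contract: dict, dq_scores: dict) -> bool:
    """Certification gate: every quality threshold must be met."""
    return all(dq_scores.get(dim, 0.0) >= threshold
               for dim, threshold in contract["quality_thresholds"].items())

print(certify(customer_360_contract, {"completeness": 0.991, "uniqueness": 1.0}))  # True
```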
Architecture impact: Delivers the governance backbone needed for Greenfield's Gold-layer data products (Customer 360, Reference Data Hub, Financial Aggregates, Risk Feature Store). Eliminates the argument for retaining IKC for data product governance and enables a "data as a product" operating model fully integrated in the Microsoft governance stack.
6. Fabric Native GPU Compute for ML Training (High)
Gap: Fabric Data Science lacks GPU instance pools for deep learning, LLM fine-tuning, and GPU-intensive model training. All GPU workloads must be routed to Databricks, which maintains GPU-optimized clusters (V100, A100).
Current workaround: The AI/ML architecture operates a dual-engine model where Fabric Data Science handles standard ML (classification, regression, clustering) and Databricks handles advanced ML requiring GPU compute. This splits the ML lifecycle across two platforms with different experiment tracking, model registry, and deployment paradigms.
Requested improvement: Provide GPU-accelerated Spark compute pools in Fabric Data Science, supporting at minimum NVIDIA A100/H100 instances for deep learning training, distributed training with PyTorch/TensorFlow, and integration with Fabric's MLflow tracking. Include cost controls (per-workspace GPU budget caps, auto-terminate policies).
Architecture impact: Consolidates the ML lifecycle into Fabric for the majority of workloads including deep learning, reduces Databricks scope further, simplifies experiment tracking and model governance, and accelerates the path to a truly unified Fabric platform.
7. Fabric Production-Grade Model Serving (High)
Gap: Fabric lacks a production-grade model serving endpoint with SLA guarantees, auto-scaling, A/B testing, canary deployments, and GPU inference support. All real-time model serving must use Databricks Model Serving or Azure AI Foundry.
Current workaround: Models trained in Fabric Data Science must be exported to Databricks Model Serving or Azure AI Foundry for production deployment. This breaks the end-to-end Fabric experience and requires cross-platform MLflow model registry synchronization.
Requested improvement: Deliver a Fabric Model Serving capability with REST API endpoints, auto-scaling (including scale-to-zero), SLA guarantees (p99 latency, availability), A/B testing and canary deployment support, GPU inference for deep learning models, and native integration with Fabric MLflow model registry.
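A purely hypothetical consumer-side sketch of what such an endpoint could look like; the URL shape, header, and payload are assumptions modelled on common REST scoring conventions, not a shipped Fabric API.

```python
# Hypothetical scoring call against the requested Fabric Model Serving
# capability. Host, route, and payload format are invented for illustration.
import requests

ENDPOINT = "https://<hypothetical-fabric-host>/serving/churn-model/invocations"

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": "Bearer <entra-id-token>"},
    json={"inputs": [{"tenure_months": 14, "monthly_premium": 182.50}]},
    timeout=5,  # a p99 latency SLA is what makes a tight client timeout safe
)
resp.raise_for_status()
print(resp.json())  # e.g. {"predictions": [0.83]}
```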
Architecture impact: Completes the ML lifecycle within Fabric, eliminates the dependency on Databricks Model Serving for the majority of production models, and enables a single platform for train → register → deploy → monitor.
8. Fabric Online Feature Store (High)
Gap: Fabric has no equivalent to Databricks Feature Store for low-latency online feature serving. Feature engineering can be done in Fabric Spark, but there is no mechanism to serve features in real time (sub-10 ms) for online inference.
Current workaround: All online feature serving uses Databricks Feature Store. Batch features computed in Fabric must be replicated to Databricks Feature Store for real-time consumption, creating data movement and synchronization overhead.
Requested improvement: Build a Fabric-native online Feature Store with low-latency serving (sub-10ms), point lookups by entity key, automatic feature freshness from Fabric Lakehouses (batch) and Eventstream (streaming), integration with Fabric Model Serving endpoints, and feature monitoring (drift detection, serving statistics).
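To make the requirement concrete, a purely hypothetical client sketch; `FabricFeatureStore` and `get_online_features` are invented names, not an existing SDK.

```python
# Purely hypothetical client for the requested Fabric-native online store.

class FabricFeatureStore:
    """Stand-in for the requested client. A real implementation would hit the
    low-latency serving tier kept fresh from Lakehouse (batch) and
    Eventstream (streaming)."""

    def get_online_features(self, feature_set: str, entity_key: str) -> dict:
        raise NotImplementedError("requested capability; nothing shipped today")

store = FabricFeatureStore()
# Intended call pattern during online inference (sub-10 ms point lookup):
# features = store.get_online_features("customer_risk_v2", entity_key="CUST-0001")
```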
Architecture impact: Removes the last major ML infrastructure dependency on Databricks for organizations that do not require GPU training, enabling a fully self-contained Fabric ML stack for standard ML/AI workloads.
9. Purview Stewardship Workflow Engine (Medium-High)
Gap: Purview's stewardship workflows are limited compared to IBM Knowledge Catalog. IKC provides mature governance workflow capabilities including multi-step approval chains, domain-scoped stewardship assignment, issue tracking, remediation workflows, and governance scorecards. Purview's workflow support is basic and lacks the depth needed for regulated domain governance (AMF, OSFI, Law 25).
Current workaround: Governance workflows are supplemented with external tools (ServiceNow, custom Power Automate flows) to handle access request approvals, data quality issue remediation, classification review, and regulatory evidence pack generation.
Requested improvement: Enhance Purview with a configurable stewardship workflow engine supporting multi-step approval chains with SLA tracking, domain-scoped stewardship assignment and delegation, governance issue tracking with remediation workflows, automated regulatory evidence pack generation (audit trail, classification history, access reviews), and integration with data product certification gates.
Architecture impact: Enables Purview to serve as the single governance operating environment for all stewardship activities, eliminates the need for external workflow tools, and removes one of the key arguments for retaining IKC.
10. Fabric Complex Stateful Streaming (Medium-High)
Gap: Fabric Real-Time Intelligence (Eventhouse, Eventstream, KQL) handles event-driven analytics and near-real-time dashboarding well, but it cannot match Databricks Structured Streaming for complex stateful stream processing: exactly-once guarantees, complex event processing (CEP), and windowed aggregations over large state.
Current workaround: Complex streaming pipelines (fraud detection, payment transaction processing, complex CEP) remain on Databricks Structured Streaming. This is one of the explicit carve-outs in the Fabric-primary architecture that maintains Databricks compute scope.
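A hedged PySpark sketch of the kind of stateful pattern that keeps these pipelines on Databricks: watermarked streaming deduplication with exactly-once sink semantics via checkpointing (paths and table names are hypothetical; `dropDuplicatesWithinWatermark` requires Spark 3.5+).

```python
# Watermarked streaming dedup into a Delta table, the kind of large-state,
# exactly-once workload the current Fabric RTI stack cannot host.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

deduped = (
    spark.readStream.format("delta").load("/bronze/payment_events")
    .withWatermark("event_time", "30 minutes")        # bounds the dedup state store
    .dropDuplicatesWithinWatermark(["transaction_id"])
)

(deduped.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/chk/payments_dedup")  # exactly-once into Delta
    .toTable("silver.payments_deduped"))
```

Sessionization and CEP-style pattern matching in these pipelines additionally rely on arbitrary stateful operators such as PySpark's applyInPandasWithState, for which Fabric Real-Time Intelligence has no equivalent today.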
Requested improvement: Extend Fabric Real-Time Intelligence (or introduce a new Fabric Streaming workload) with support for exactly-once processing guarantees, complex stateful processing (sessionization, deduplication with large state stores), complex event pattern matching, and watermark-based late data handling comparable to Spark Structured Streaming.
Architecture impact: Reduces the remaining Databricks compute scope, potentially eliminating the streaming carve-out entirely for organizations whose streaming requirements are within the enhanced capability envelope.
11. Fabric Capacity Metrics — User-Level and Item-Level CU Attribution (Medium)
Gap: Fabric Capacity Metrics API provides CU consumption at workspace and item-type level, but granular attribution at user level and individual item level is limited. This creates challenges for sector-level chargeback when multiple teams share workspaces or when pinpointing cost drivers within a workspace.
Current workaround: The FinOps architecture uses workspace-to-sector mapping as the primary attribution mechanism. Shared workspaces require proportional allocation using measured CU consumption per user/workload, which is imprecise with current API granularity.
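A minimal sketch of that proportional allocation, with hypothetical figures; today the per-sector CU inputs are estimates, and the requested API granularity would turn them into measured values.

```python
# Proportional chargeback for a shared workspace (hypothetical figures).

def allocate_capacity_cost(monthly_cost: float,
                           cu_by_sector: dict[str, float]) -> dict[str, float]:
    """Split a shared capacity's monthly cost by each sector's CU share."""
    total_cu = sum(cu_by_sector.values())
    return {sector: round(monthly_cost * cu / total_cu, 2)
            for sector, cu in cu_by_sector.items()}

print(allocate_capacity_cost(
    monthly_cost=12_000.00,
    cu_by_sector={"retail": 41_000, "actuarial": 28_500, "finance": 10_500},
))
# {'retail': 6150.0, 'actuarial': 4275.0, 'finance': 1575.0}
```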
Requested improvement: Extend the Capacity Metrics API with per-user CU consumption per workspace per day, per-item (individual Lakehouse, Pipeline, Notebook, Report) CU consumption, service principal attribution for automated workloads, and historical retention of at least 90 days for trend analysis.
Architecture impact: Enables precise sector-level chargeback without proportional estimation, improves FinOps governance, supports capacity right-sizing decisions, and delivers the cost transparency expected by Greenfield's CFO and sector stewards.
12. Purview DQ Observability and Pipeline Health Monitoring (Medium)
Gap: Purview has no standalone DQ observability module. Quality monitoring relies on Fabric Monitoring Hub for pipeline execution and Purview for scan results, with no integrated view of pipeline health, data freshness, volume anomalies, and schema drift across the entire data estate.
Current workaround: Custom Power BI dashboards aggregate DQ metrics from multiple sources (Purview DQ scan results, Great Expectations outputs, DLT Expectations logs, Fabric pipeline execution logs). There is no unified alerting system for data quality incidents.
Requested improvement: Build a Purview DQ Observability module with continuous monitoring of data freshness (actual vs. SLA), volume anomaly detection (row count drift, file size spikes), schema drift detection and alerting, pipeline health correlation (DQ failure → pipeline failure → downstream impact), and unified alerting with integration to Azure Monitor action groups.
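A minimal sketch of the volume-anomaly logic such a module would run continuously, flagging a daily row count more than three standard deviations from the trailing week (counts are hypothetical):

```python
# Simple row-count drift detector of the kind the requested module would run.
from statistics import mean, stdev

def row_count_anomaly(history: list[int], today: int, k: float = 3.0) -> bool:
    """True if today's row count drifts beyond k sigma of the trailing window."""
    mu, sigma = mean(history), stdev(history)
    return abs(today - mu) > k * sigma

trailing_week = [1_020_000, 1_015_500, 1_031_200, 1_008_900,
                 1_024_700, 1_019_300, 1_027_800]
print(row_count_anomaly(trailing_week, today=612_000))  # True -> raise an alert
```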
Architecture impact: Provides a single pane of glass for data operations health, reduces mean-time-to-detect for data quality incidents, and enables proactive SLA management for Gold-layer data products.
13. Fabric Managed VNet — GA Maturity and Full Workload Coverage (Medium)
Gap: Fabric Managed VNet capability is available but not yet mature for all workload types. Network isolation for Fabric Spark workloads, private endpoint connectivity to external Azure services, and egress restriction enforcement are not consistently available across all Fabric item types.
Current workaround: Production Fabric capacities are deployed within Managed VNets where available. For workloads where Managed VNet is not supported, Azure Firewall and private endpoints are configured separately, creating inconsistent network security posture.
Requested improvement: Deliver Managed VNet GA with full coverage across all Fabric workload types (Spark, Data Factory, Dataflows Gen2, Real-Time Intelligence, Data Science). Ensure consistent private endpoint support, egress restriction enforcement, and integration with Azure Firewall for workloads that require custom network policies.
Architecture impact: Delivers consistent network security posture across all Fabric workloads, meets OSFI and AMF network isolation expectations, and simplifies the infrastructure security model.
14. Purview Sensitivity Label Enforcement in Fabric Spark (Medium)
Gap: Purview sensitivity labels propagate to Fabric items and Power BI, but they do not enforce access restrictions when data is accessed through Fabric Spark notebooks. A table labeled "Restricted" in Purview can still be read by any notebook user with workspace Member role.
Current workaround: Sensitivity labels serve as classification metadata only in the Spark context. Actual enforcement relies on workspace RBAC (coarse-grained) and the dual-enforcement model routing Restricted data through Unity Catalog.
Requested improvement: Enable Purview sensitivity labels to trigger enforceable access policies in Fabric Spark. When a table or column is labeled "Restricted," Spark should enforce access checks against the user's Entra ID group memberships before returning data. This is closely related to Improvement #1 but specifically addresses the Purview-to-Spark policy chain.
Architecture impact: Closes the loop between Purview classification and Fabric runtime enforcement, making sensitivity labels actionable across all engines rather than advisory metadata for Spark.
15. Fabric Deployment Pipelines — Full Data Engineering CI/CD (Medium)
Gap: Fabric Deployment Pipelines support promotion of Fabric items (reports, semantic models, Lakehouses) across environments (dev → test → prod), but the CI/CD experience for data engineering artifacts (Spark notebooks, pipeline definitions, Dataflow Gen2 definitions) is less mature than Databricks Asset Bundles with Azure DevOps integration.
Current workaround: Fabric Deployment Pipelines are used for item promotion, supplemented by Azure DevOps Git integration for version control of notebooks and pipeline definitions. The integration is functional but lacks the maturity of Databricks Asset Bundles for parameterized deployments, automated testing, and rollback.
Requested improvement: Enhance Fabric Deployment Pipelines with parameterized environment configurations (connection strings, capacity assignments, security group mappings per environment), automated data quality test execution as a promotion gate, rollback capability with one-click revert to previous deployment, Git-based pipeline-as-code with full API support for CI/CD orchestration, and integration with Azure DevOps/GitHub Actions for approval gates.
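A hedged sketch of driving promotion from CI via the Fabric REST API; the route and payload below are assumptions to be verified against current Fabric REST documentation, not a confirmed contract.

```python
# CI-driven stage promotion. Route and payload are assumptions, flagged inline.
import requests

BASE = "https://api.fabric.microsoft.com/v1"

def promote(pipeline_id: str, source_stage_id: str,
            target_stage_id: str, token: str) -> dict:
    """Trigger a stage-to-stage promotion from a CI job (assumed endpoint)."""
    resp = requests.post(
        f"{BASE}/deploymentPipelines/{pipeline_id}/deploy",  # assumed route
        headers={"Authorization": f"Bearer {token}"},
        json={"sourceStageId": source_stage_id,
              "targetStageId": target_stage_id,
              "note": "CI promotion after automated DQ gate passed"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```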
Architecture impact: Brings Fabric CI/CD to parity with Databricks Asset Bundles, enables mature DevOps practices for Fabric data engineering teams, and reduces the risk of production deployment issues.
16. Materialized Lake Views — Accelerate to GA and Close DLT Feature Gap (Medium)
Gap: Fabric's Materialized Lake Views (MLVs), announced at Build 2025, are the direct counterpart to Databricks Delta Live Tables (DLT). They share the same declarative philosophy: define SQL transformations, let the engine handle execution order, dependency resolution, incremental refresh, and built-in data quality checks. However, MLVs remain in preview (as of March 2026) and have significant feature gaps compared to DLT: SQL-only (no Python/Scala transformations), no cross-lakehouse lineage or execution, incremental refresh limited to append-only sources (deletes and updates trigger full rebuilds), no UDF support in defining queries, no streaming table equivalent, no CDC-based change processing, and no quarantine flow for records failing quality expectations.
Current workaround: Fabric data engineering teams use imperative Spark notebooks and Data Pipelines with manually coded quality checks (Great Expectations) for Silver/Gold transformations. Some teams have adopted MLVs for simpler Gold-layer aggregations, but complex pipelines with CDC, UDFs, or mixed Python/SQL logic remain on Databricks DLT.
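For reference, a minimal MLV of the kind adopted for simpler Gold-layer aggregations, written against the preview syntax available at the time of writing and run from a lakehouse-attached Fabric notebook where `spark` is predefined (table names are hypothetical; syntax may change before GA):

```python
# Minimal MLV sketch. The CHECK constraint with ON MISMATCH DROP shows the
# built-in quality handling, which today silently drops failing rows rather
# than quarantining them (hence the quarantine-flow request below).
spark.sql("""
    CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS gold.monthly_claims
    (
        CONSTRAINT non_negative_total CHECK (total_amount >= 0) ON MISMATCH DROP
    )
    AS
    SELECT policy_id,
           date_trunc('month', claim_date) AS claim_month,
           SUM(amount)                     AS total_amount
    FROM silver.claims
    GROUP BY policy_id, date_trunc('month', claim_date)
""")
```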
Requested improvement: Accelerate MLVs to GA with the following enhancements: Python/Scala transformation support (not just SQL), full CDC/SCD Type 2 handling with incremental refresh on updates and deletes, UDF support in defining queries, cross-lakehouse lineage and execution, streaming MLVs (equivalent to DLT streaming tables) for near-real-time materialization from Eventstream sources, quarantine/dead-letter handling for records failing quality expectations, pipeline monitoring dashboard with refresh history and data quality trends, and integration with Purview DQ for quality score publication.
Architecture impact: Closes the declarative pipeline gap that currently keeps Databricks DLT in scope for complex data engineering. With mature MLVs, the majority of Silver/Gold transformations — including CDC-heavy pipelines — can run natively in Fabric, further reducing Databricks compute scope and simplifying the engineering experience.
17. Fabric IQ Ontology — Accelerated GA with Purview Integration (Lower)
Gap: Fabric IQ (ontology, graph engine, Data Agent, Operations Agent) is in preview. The ontology's integration with Purview glossary raises overlap concerns: both systems could claim authority over business terms and definitions, recreating a dual-glossary problem.
Current workaround: Fabric IQ adoption is deferred to Horizon 2 with validation gates. The Customer 360 ontology POC is planned but contingent on clear Purview integration semantics and Architecture Board approval.
Requested improvement: Accelerate Fabric IQ to GA with clear delineation of responsibilities between Fabric IQ ontology and Purview glossary (e.g., Purview owns governance/classification terms; Fabric IQ owns semantic relationships and query intelligence). Ensure the ontology bootstraps from Purview glossary terms rather than requiring duplicate definition. Deliver Operations Agent GA with grounding in enterprise ontology for real-time stream intelligence.
Architecture impact: Enables the Horizon 2-3 semantic intelligence vision (ontology + RAG for structured + unstructured retrieval), unlocks AI-powered data discovery and natural language querying, and positions Fabric as a differentiated analytics platform.
18. Purview Cross-Platform Lineage Stitching — Fabric ↔ Databricks ↔ SAS (Lower)
Gap: Purview natively stitches Fabric ↔ Databricks lineage but cannot stitch SAS Viya into the graph (see Improvement #2). Even for Fabric ↔ Databricks, the stitching relies on scanning Unity Catalog system tables — column-level lineage from Databricks notebook runs is GA, but SQL query and job lineage is still in preview.
Current workaround: Manta provides the complete cross-platform lineage graph, published to Purview for visualization. Purview serves as lineage consumer, not producer, for cross-platform flows.
Requested improvement: Bring Databricks SQL query and job lineage scanning to GA in Purview. Deliver the SAS Viya connector (Improvement #2). Once both are available, Purview can serve as both lineage producer and consumer for the full three-pillar platform, potentially eliminating the Manta dependency.
Architecture impact: Reduces the governance stack from four tools to three (eliminating Manta), simplifies lineage architecture, and delivers a truly unified lineage experience within Purview.
Summary — Priority Impact Matrix
| # | Improvement | Product | Impact | Complexity |
|---|---|---|---|---|
| 1 | Unified cross-engine security (ABAC in OneLake) | Fabric | Critical | Very High |
| 2 | Native SAS Viya lineage connector | Purview | Critical | High |
| 3 | Policy push-down to Databricks Unity Catalog | Purview | High | High |
| 4 | Complex business rule DQ engine | Purview | High | Medium |
| 5 | Data product certification & marketplace | Purview | High | Medium |
| 6 | Native GPU compute for ML training | Fabric | High | High |
| 7 | Production-grade model serving | Fabric | High | High |
| 8 | Online Feature Store | Fabric | High | High |
| 9 | Stewardship workflow engine | Purview | Medium-High | Medium |
| 10 | Complex stateful streaming | Fabric | Medium-High | High |
| 11 | User/item-level CU cost attribution | Fabric | Medium | Low |
| 12 | DQ observability & pipeline health monitoring | Purview | Medium | Medium |
| 13 | Managed VNet GA — full workload coverage | Fabric | Medium | Medium |
| 14 | Sensitivity label enforcement in Spark | Purview/Fabric | Medium | High |
| 15 | Full data engineering CI/CD in Deployment Pipelines | Fabric | Medium | Medium |
| 16 | Materialized Lake Views — accelerate GA, close DLT gap | Fabric | Medium | Medium |
| 17 | Fabric IQ ontology — accelerated GA | Fabric | Lower | High |
| 18 | Cross-platform lineage stitching GA | Purview | Lower | Medium |
Document prepared by the Office of the Chief Data Officer, Data & AI Solutions, Greenfield.