# IBM CP4D vs. Microsoft Purview vs. Collibra: Deep-Dive Comparison

**Context:** Greenfield Modern Data Platform (Databricks + Microsoft Fabric + SAS Viya on Azure)
**Date:** March 2026

---

## 1. Data Quality

| Dimension | IBM Cloud Pak for Data (5.x) | Microsoft Purview (Unified Catalog) | Collibra DQ & Observability |
|---|---|---|---|
| **Approach** | Embedded in the data-fabric layer via Knowledge Catalog. Rule authoring in Data Refinery / DataStage. ML-assisted profiling and anomaly detection. | Standalone DQ service inside Unified Catalog. Spark 3.4/Delta Lake 2.4 compute engine managed by Purview. | SQL-based pushdown engine. Rules execute directly against the source database — no data movement. |
| **Rule authoring** | Low-code rule builder in Knowledge Catalog + custom rules in DataStage flows. Natural-language rule description mapped to enforcement. | No-code OOB rules (completeness, conformity, accuracy, freshness, uniqueness, consistency) + custom rules via ADF expression, SQL expression (GA March 2026), and regex. AI-generated rule suggestions from profiling. | Natural-language → SQL translation (Gen AI). 30+ OOB scanners. Full SQL-based custom rules with pushdown to source. |
| **Profiling** | Automated statistical profiling with ML-driven column classification. Integrated in Knowledge Catalog. | AI-powered profiling with auto-column recommendation. Standard stats: distribution, min/max, stddev, uniqueness, completeness, duplicates. | Automated profiling with anomaly, duplicate, outlier, and threshold detection. ML learns patterns over time and refines rules. |
| **Scoring & aggregation** | Quality scores at asset level, tied to governance artifacts in the catalog. | Scores aggregated at column → asset → data product → governance domain levels. Scores visible across the Unified Catalog. | Quality scoring at asset level, integrated into the Collibra governance model. Scores feed into trust/certification workflows. |
| **Observability** | Databand (acquired) provides pipeline observability and anomaly alerting. Integrated in CP4D DataOps. | Incremental scanning (preview) and error-record publishing (GA Feb 2026) enable pipeline-level quality monitoring. No standalone observability module; relies on Fabric monitoring. | Native DQ & Observability module. Continuous monitoring of pipeline health, freshness, volume, schema drift. Integrated alerting. |
| **Key differentiator** | Tight coupling with DataStage ETL; quality rules can execute inside the transformation pipeline itself. Strong for hybrid/multi-cloud. | Native Azure/Fabric integration. DQ scans run on managed Spark, so there is no infrastructure to manage. Best when the estate is predominantly Azure. | Source-pushdown execution means no data copying. ML-adaptive rules reduce manual tuning. Strongest standalone DQ engine of the three. |

**Assessment for Greenfield:**
Purview has the home-field advantage on Fabric and is catching up fast (SQL custom rules and error-record publishing are now GA). Collibra has the most mature and engine-agnostic DQ capability, which is critical if you need uniform quality rules across Databricks, Fabric, *and* SAS Viya. CP4D's DQ is solid but adds a fourth platform to manage.

---
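
To make the pushdown and score roll-up ideas from the Data Quality table concrete, here is a minimal sketch. It is not any vendor's engine: an in-memory SQLite database stands in for the source system, the `customers` table and both function names are hypothetical, and the roll-up from column scores to an asset score is assumed to be a plain mean (this comparison does not state how any of the three products aggregates scores).

```python
import sqlite3


def column_completeness(conn, table, column):
    """Share of non-NULL values, computed inside the database (pushdown).

    Only the aggregate leaves the source; no rows are copied out.
    Identifiers are interpolated directly, so pass trusted names only.
    """
    row = conn.execute(
        f"SELECT COUNT({column}) * 1.0 / COUNT(*) FROM {table}"
    ).fetchone()
    return row[0]


def asset_score(column_scores):
    """Roll column scores up to one asset score (plain mean, an assumption)."""
    return sum(column_scores.values()) / len(column_scores)


# Hypothetical source table with a few missing values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "DE"),
        (2, None, "FR"),
        (3, "c@example.com", None),
        (4, "d@example.com", "IT"),
    ],
)

scores = {
    col: column_completeness(conn, "customers", col)
    for col in ("id", "email", "country")
}
print(scores)
print(round(asset_score(scores), 3))
```

The same roll-up function could be applied again from assets to data products and from data products to governance domains, which is the hierarchy the Purview column describes.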
## 2. Lineage: Coverage of Fabric, Databricks, and SAS Viya

| Source | IBM CP4D (Manta) | Microsoft Purview | Collibra Data Lineage |
|---|---|---|---|
| **Databricks Unity Catalog** | Supported. Lineage import from Azure Databricks notebooks and Unity Catalog metadata. Configuration documented since CP4D 5.2. | Native connector. Scans Unity Catalog system tables (`system.access`). Column-level lineage from notebook runs (GA). SQL queries, jobs, pipelines lineage in preview (expected late 2025/early 2026). | Native integration via Edge. Column-level lineage from notebooks, jobs, and dashboards (enhanced 2025.05). Metadata ingestion with proxy support. |
| **Microsoft Fabric / Power BI** | New Fabric Warehouse connector in CP4D 5.3. Power BI (Azure) lineage import supported. | **Native and deepest.** Full lineage across Fabric items (Lakehouse, Warehouse, Pipelines, Dataflows, Power BI datasets/reports). Automatic — no configuration needed for Fabric-to-Fabric flows. | Power BI lineage supported via Edge (report → dataset → source stitching). ADF lineage via dedicated integration. Fabric Lakehouse/Warehouse lineage is indirect — stitched through ADF/Power BI connectors, not a native Fabric connector. |
| **SAS Viya** | **Strongest.** Manta natively parses SAS programs (DATA steps, PROC SQL, macros). SAS integration requirements are documented. Every SAS code file counts as a lineage script. | **Not supported.** No SAS connector exists. Custom lineage via Atlas hooks / REST API is the only workaround. Solidatus (partner) could bridge the gap but adds cost and complexity. | **Not natively supported** in the OOB scanner list. Would require Custom Technical Lineage (CTL) via Collibra's API or a third-party connector. No documented SAS scanner. |
| **Cross-platform stitching** | Manta stitches lineage across all connectors it supports (SAS → Databricks → Power BI flows). Requires configuration of each connector but delivers an end-to-end graph. | Stitches Fabric ↔ Databricks lineage natively in the Unified Catalog. SAS remains a gap — the graph is incomplete without a manual bridge. | Stitches across all Edge-connected sources. Databricks ↔ Power BI stitching works. SAS would be a gap unless CTL is implemented. CLI harvester deprecated (EOL July 2026) — must use Edge. |
| **Granularity** | Column-level for databases and ETL. Code-level for SAS and SQL scripts. | Column-level for Fabric and Databricks notebooks. | Column-level for Databricks, Power BI, and JDBC sources. |

**Assessment for Greenfield:**
SAS Viya lineage is the discriminating factor. Only IBM Manta natively parses SAS code and can stitch SAS → Databricks → Fabric lineage end-to-end. Purview gives you the deepest Fabric lineage but leaves a SAS-shaped hole. Collibra is strong on Databricks and Power BI but also lacks native SAS coverage. For a three-pillar platform (Databricks + Fabric + SAS), Manta is currently the only product that covers all three natively.

---
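
Cross-platform stitching, as compared in the lineage table, amounts to merging per-connector edge lists into one graph and checking that a path exists end to end. The sketch below illustrates why a missing SAS scanner leaves the graph incomplete. All asset names and edge lists are hypothetical, and the graph model is an assumption for illustration; it is not the data model of Manta, Purview, or Collibra.

```python
from collections import defaultdict, deque


def stitch(*edge_lists):
    """Merge (source, target) edges from several connectors into one graph."""
    graph = defaultdict(set)
    for edges in edge_lists:
        for src, dst in edges:
            graph[src].add(dst)
    return graph


def reachable(graph, start, target):
    """Breadth-first search: is there a lineage path from start to target?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical edges, one list per connector.
sas_edges = [("sas:score_model.sas", "dbx:gold.scores")]
databricks_edges = [("dbx:gold.scores", "fabric:lakehouse.scores")]
fabric_edges = [("fabric:lakehouse.scores", "pbi:risk_report")]

with_sas = stitch(sas_edges, databricks_edges, fabric_edges)
without_sas = stitch(databricks_edges, fabric_edges)

print(reachable(with_sas, "sas:score_model.sas", "pbi:risk_report"))
print(reachable(without_sas, "sas:score_model.sas", "pbi:risk_report"))
```

Without the SAS edge list the report is still traceable back to Databricks, but impact analysis starting from the SAS program finds nothing downstream, which is the gap described for Purview and Collibra.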
## 3. Deep Integration with Protection / Enforce Planes

The enforce planes in scope are **Microsoft OneLake** (Fabric security), **Databricks Unity Catalog** (UC), and **SAS Viya** security.

### 3.1 Microsoft OneLake / Fabric

| Capability | IBM CP4D | Microsoft Purview | Collibra |
|---|---|---|---|
| **Policy definition** | Policies defined in CP4D Knowledge Catalog. No native push to OneLake — enforcement relies on Data Virtualization layer or manual sync. | **Native integration.** Purview sensitivity labels propagate to Fabric items. Information Protection policies (GA 2025) enforce access at the Fabric workspace and item level. Data Product Access Policies in Unified Catalog control who can access governed data products. | No native OneLake policy push. Collibra governs metadata and access policies in its own model — enforcement in Fabric must be done via ADF/Fabric RBAC separately. |
| **Row/column security** | Via Data Virtualization masking and access rules — but this is a CP4D-side enforcement, not pushed into OneLake. | OneLake supports row-level, column-level, and table-level security. Purview labels and policies cascade through Power BI, Spark, and T-SQL engines. | Not directly enforceable on OneLake from Collibra. |
| **Integration depth** | Shallow. CP4D can *read* from Fabric Warehouse but doesn't *enforce* in OneLake. | **Deep and native.** Purview is the control plane for Fabric security. | Shallow. Collibra integrates with Power BI metadata but does not push enforcement into OneLake. |

### 3.2 Databricks Unity Catalog

| Capability | IBM CP4D | Microsoft Purview | Collibra |
|---|---|---|---|
| **Policy definition** | CP4D governance policies do not push down into Unity Catalog ACLs. Enforcement stays within CP4D's own perimeter (Data Virtualization, Watson Query). | Purview can scan UC and read lineage but **does not write policies back into UC.** UC remains its own enforce plane. Purview provides visibility, not enforcement. | **Collibra Protect for Databricks (GA late 2024).** Column-level and row-level access enforcement pushed directly into Unity Catalog. Bring-Your-Own-Masking support for custom masking functions. This is true policy push-down. |
| **Masking** | CP4D-side masking via Data Virtualization. Not pushed to UC. | Not pushed to UC. | Masking rules defined in Collibra, enforced in UC via Protect. Supports hashing, redaction, and custom functions. |
| **Integration depth** | Shallow. Read lineage, no write-back. | Medium. Reads metadata/lineage, runs DQ scans against UC, but no policy push-down. | **Deep.** Collibra Protect is the only product of the three that enforces access policies directly inside Unity Catalog. |
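
In practice, "policy push-down" means that a policy held in the governance tool ends up as native Unity Catalog objects that Databricks itself enforces. The sketch below renders a masking policy into standard Databricks SQL (a SQL UDF plus `ALTER TABLE ... ALTER COLUMN ... SET MASK`). It does not reproduce how Collibra Protect generates or applies its policies; the `MaskPolicy` record, the `render_uc_statements` helper, and all catalog, table, and group names are hypothetical, and hashing with `sha2` is just one possible masking choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskPolicy:
    """Hypothetical governance-side record of one column masking rule."""
    table: str          # fully qualified Unity Catalog table
    column: str         # column to protect (assumed to be a STRING column)
    function: str       # fully qualified name for the mask function
    allowed_group: str  # account group that may see clear values


def render_uc_statements(policy):
    """Render the policy as Databricks SQL statements (strings only).

    Nothing is executed here; the statements would have to be run
    against a Unity Catalog enabled workspace by some other component.
    """
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {policy.function}(val STRING)\n"
        f"RETURN CASE WHEN is_account_group_member('{policy.allowed_group}') "
        f"THEN val ELSE sha2(val, 256) END"
    )
    set_mask = (
        f"ALTER TABLE {policy.table} "
        f"ALTER COLUMN {policy.column} SET MASK {policy.function}"
    )
    return [create_fn, set_mask]


policy = MaskPolicy(
    table="main.sales.customers",
    column="email",
    function="main.governance.mask_email",
    allowed_group="pii_readers",
)
statements = render_uc_statements(policy)
for stmt in statements:
    print(stmt + ";")
```

The point of the comparison is where such statements originate: with push-down they are derived from the governance catalog, whereas without it someone has to author and maintain them in Databricks separately.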
### 3.3 SAS Viya Security

| Capability | IBM CP4D | Microsoft Purview | Collibra |
|---|---|---|---|
| **Policy push-down** | No documented integration that pushes CP4D policies into SAS Viya's authorization model. SAS Viya has its own identity/authorization subsystem. However, CP4D can understand SAS code lineage (via Manta), which supports impact analysis for policy decisions. | No integration with SAS Viya security layer. | No integration with SAS Viya security layer. |
| **Integration depth** | Lineage-only (Manta). No enforce-plane integration. | None. | None. |

**Note:** SAS Viya security is a closed ecosystem. None of the three platforms can push policies into SAS Viya's authorization model. SAS manages its own access control through its Identity and Authorization subsystem. The best any external governance tool can do is provide lineage and impact analysis (which only Manta does today).

---
## Summary Matrix

| Axis | IBM CP4D (Manta) | Microsoft Purview | Collibra |
|---|---|---|---|
| **Data Quality** | Solid, ETL-integrated. Adds platform complexity. | Native Azure/Fabric. Fast-maturing. GA SQL rules, error publishing. | Most mature standalone engine. Source-pushdown, ML-adaptive. |
| **Lineage: Fabric** | Connector in 5.3 (new). | **Native and deepest.** | Indirect via Power BI / ADF stitching. |
| **Lineage: Databricks** | Supported via Manta. | Native UC scanner. Column-level from notebooks. | Native via Edge. Notebooks, jobs, dashboards. |
| **Lineage: SAS Viya** | **Native SAS code parsing. Unique advantage.** | Not supported. | Not supported. |
| **Enforce: OneLake** | Shallow. | **Native. Labels + policies cascade.** | Shallow. |
| **Enforce: Unity Catalog** | Shallow. | Medium (read, no write-back). | **Deep. Collibra Protect pushes ACLs into UC.** |
| **Enforce: SAS Viya** | Lineage only. | None. | None. |

---
## Implications for Greenfield

No single product covers all three enforce planes natively. The realistic options are:

1. **Purview + Collibra Protect:** Purview governs Fabric/OneLake natively and provides DQ. Collibra Protect handles Unity Catalog enforcement. SAS lineage remains a gap (mitigated by Purview REST API / custom lineage, or by adding Manta standalone for SAS parsing only).

2. **Purview + Manta (standalone):** Purview for Fabric governance and DQ. Manta specifically for SAS lineage (and cross-platform stitching). Unity Catalog enforcement stays native within Databricks (acceptable if UC-native RBAC is sufficient).

3. **Collibra as primary governance + Purview for Fabric enforce:** Collibra provides DQ, Databricks enforcement (Protect), and catalog. Purview remains the enforce plane for Fabric (unavoidable, because it is built into the platform). SAS lineage still requires Manta or custom development.

The critical trade-off: **SAS Viya lineage is only natively covered by IBM Manta**, and **Unity Catalog policy push-down is only natively covered by Collibra Protect**. Purview owns the Fabric enforce plane by design. A best-of-breed approach likely combines at least two of these tools.
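
The trade-off can be checked mechanically by listing which tool natively covers each discriminating capability and computing what a given combination leaves open. This is a sketch of the reasoning only: the capability labels and the `gaps` helper are hypothetical names, and the mapping encodes nothing beyond the three "only natively covered by" statements in this comparison.

```python
from itertools import combinations

# Discriminating capabilities and the one tool that covers each natively,
# as stated in this comparison (labels are illustrative).
NATIVE = {
    "Purview": {"fabric_enforce"},
    "Collibra Protect": {"unity_catalog_pushdown"},
    "Manta": {"sas_lineage"},
}
REQUIRED = {"fabric_enforce", "unity_catalog_pushdown", "sas_lineage"}


def gaps(tools):
    """Return the required capabilities that no tool in the set covers."""
    covered = set().union(*(NATIVE[t] for t in tools))
    return REQUIRED - covered


# Every two-tool combination leaves exactly one capability uncovered.
for pair in combinations(sorted(NATIVE), 2):
    print(pair, "leaves open:", sorted(gaps(pair)))
```

Each pair leaves one capability to be handled by a workaround (custom lineage, UC-native RBAC, or separate Fabric configuration), which is why the options above each name an accepted gap.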