A compiler for your data platform

The neutral intermediate representation for enterprise ETL.

Turn your legacy Informatica, DataStage, and SSIS estate into a platform-agnostic blueprint you own forever. Analyze it. Extract end-to-end column-level lineage. Emit to any modern target — on your timeline, with the partner of your choice.

canonical-ir · customer_ltv.cir.yaml v0.1
# Canonical IR — derived from m_CUSTOMER_LTV.xml
pipeline: customer_ltv
version: "0.1"
source:
  kind: relational
  ref: conn://oracle_crm/CUSTOMERS

transforms:
  - id: filter_active
    op: Filter
    predicate: (status == "ACTIVE")

  - id: compute_ltv
    op: Project
    expressions:
      ltv: Sum(order.total) * Coalesce(segment.mult, 1.0)
    types:
      ltv: Decimal(18,4)

lineage:
  out.ltv:
    from: [order.total, segment.mult]  # key name assumed
    via: [Sum, Coalesce, Multiply]
raw-ir → c-ir · lossless · emitter-ready
Parses from
Informatica PowerCenter · IICS · SSIS · DataStage · and more
Emits to
dbt · Databricks · Snowflake · Airflow · Azure Data Factory · AWS Glue
§01 / Thesis

Migration is a project. C-IR is an asset.

Every major ETL migration vendor sells you hours, code, and a one-way door. We sell you a specification-grade representation of your estate that you own forever — and a choice, renewable every year, about what to do with it.

01

Nobody knows what they actually have.

Documentation is stale. Original developers left years ago. Platform-native catalogs capture only surface metadata. No migration decision is defensible without a complete, trustworthy inventory of logic and lineage — and that inventory does not exist today.

02

Migration is a one-shot bet.

Choosing among dbt, Databricks, Airflow, and ADF is a bet that is difficult to reverse. Customers either commit prematurely and regret it, or stall indefinitely and bleed license fees. Both failure modes are expensive.

03

Lineage is trapped inside platforms.

Column-level lineage lives inside Informatica. Inside Unity Catalog. Inside Purview. Inside Collibra. It does not exist across them. The estate's true lineage spans the whole stack — and nothing owns that span natively.

§02 / Architecture

A two-stage compiler for your data pipelines.

Source artifacts parse into a lossless Raw IR. A normalizer lowers it into Canonical IR: typed, specified, platform-agnostic. Intelligence runs over C-IR. Emitters produce target artifacts. Everything in between is versioned, diffable, and yours.

[Diagram: Sources (PowerCenter, DataStage, SSIS, IICS) → parse → Raw IR (lossless, source-shaped) → normalize → Canonical IR (typed expressions, transform graph, control flow, lineage graph; specified · versioned · yours) with Intelligence & Lineage → emit → Emitters (emit-dbt, emit-databricks, emit-airflow, emit-adf, partner SDK) → Targets (dbt / Snowflake, Databricks, Airflow, ADF / Glue)]
01 · Parse: Source-specific parsers produce lossless Raw IR that preserves every platform quirk.
02 · Normalize: The normalizer lowers Raw IR into the specified Canonical IR with formal semantics.
03 · Analyze: Intelligence runs over C-IR: lineage, complexity, duplication, dead code, risk.
04 · Emit: Plug-in emitters (first-party, partner, or yours) produce target artifacts.
05 · Verify: Round-trip and cross-emitter equivalence tests prove neutrality, not just claim it.
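As a rough illustration of the Parse and Normalize stages above, the sketch below lowers source-shaped nodes into canonical ones. Everything in it is an assumption made for illustration: the `RawNode` and `CanonicalNode` classes, the `CANONICAL_OPS` table, and the `normalize` function are hypothetical names, not part of the C-IR specification. Only the platform transformation type names (PowerCenter Filter and Expression, SSIS Derived Column) come from the platforms themselves.

```python
from dataclasses import dataclass, field

# Hypothetical lookup from (platform, native transform type) to a canonical op.
# The mapping is illustrative only, not the normalizer's actual rule set.
CANONICAL_OPS = {
    ("powercenter", "Filter"): "Filter",
    ("powercenter", "Expression"): "Project",
    ("ssis", "Derived Column"): "Project",
}

@dataclass
class RawNode:
    """Source-shaped node: keeps attributes exactly as they were parsed."""
    platform: str
    native_type: str
    name: str
    attrs: dict = field(default_factory=dict)

@dataclass
class CanonicalNode:
    """Platform-agnostic node; `origin` points back at the Raw IR node."""
    id: str
    op: str
    origin: str

def normalize(raw_nodes):
    """Lower Raw IR nodes into canonical nodes, failing loudly on unknown types."""
    out = []
    for node in raw_nodes:
        key = (node.platform, node.native_type)
        if key not in CANONICAL_OPS:
            raise ValueError(f"no canonical op for {key}")
        out.append(CanonicalNode(
            id=node.name.lower(),
            op=CANONICAL_OPS[key],
            origin=f"{node.platform}:{node.name}",
        ))
    return out

raw = [
    RawNode("powercenter", "Filter", "FIL_ACTIVE", {"condition": "STATUS = 'ACTIVE'"}),
    RawNode("ssis", "Derived Column", "DER_LTV"),
]
canonical = normalize(raw)
print([(n.id, n.op) for n in canonical])
# [('fil_active', 'Filter'), ('der_ltv', 'Project')]
```

Raising on an unmapped type, rather than passing it through, is one way to keep the canonical layer fully specified. The Raw IR still holds the original node, so nothing is lost when normalization refuses it.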
§03 / Product

Three stacked layers. Each one earns its price alone.

You can buy ETLIR for the blueprint alone. Or for the intelligence. Or for the emitters. Most customers start with one and grow into all three. The layers compose — they do not lock in.

LAYER 01 · ASSET

The blueprint of your estate.

A lossless, versioned, neutral Canonical IR repository of your entire ETL estate, living in your own git. Diffable. Reviewable. Portable. The product is valuable even if you never migrate.

  • Git-native C-IR artifacts
  • Formal specification & validator
  • Ingestion snapshots & diffs
  • Customer-owned forever
LAYER 02 · INTELLIGENCE

Clarity over a legacy estate.

Analytics and lineage computed from C-IR expression trees — not scraped from platform metadata. Column-level, cross-platform, pre-migration. Regulator-defensible by construction.

  • End-to-end column-level lineage
  • Complexity & risk scoring
  • Dead-code & duplication mining
  • Data-contract extraction
LAYER 03 · EMISSION

Target platforms, on your timeline.

Reference emitters for modern targets plus a stable SDK so your team or your partners can build custom emitters. A certified marketplace for the long tail. You choose the target — and the moment.

  • Reference: dbt, Databricks, Airflow
  • Open SDK + conformance suite
  • Round-trip equivalence testing
  • Partner-certified marketplace
The central insight
The durable value is not the migration. It is a neutral, platform-agnostic representation of your ETL estate — queryable, version-controlled, and enriched with the end-to-end lineage that no platform-bound tool can give you.
ETLIR · Product & Business Documentation v1.0
§04 / Lineage

Column-level lineage, before you migrate.

Because C-IR holds full expression ASTs — not opaque SQL — lineage is a derivation, not a heuristic.

Every output column traces back through every transform to its source columns, with the operators applied along the way made explicit. Works across platforms you haven't migrated to yet. Works across parsers. Exports as OpenLineage events or a native graph you can query.

Granularity: Column-level
Scope: End-to-end
Derivation: AST-based
Export: OpenLineage + Graph API
[Diagram: Source columns (orders.amount, orders.status, segments.mult, customers.id, customers.region, exchange.rate) → Transforms (Filter(status), Sum(amount), Multiply(·, mult), Join(customer), Convert(rate)) → Output columns (output.ltv, output.region, output.ltv_usd). Derived from C-IR expression trees.]
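To make "lineage is a derivation" concrete, here is a minimal sketch of walking an expression tree to collect source columns and the operators applied to them. The `Col`, `Lit`, and `Call` node classes and the `lineage` function are hypothetical stand-ins, not the C-IR schema. The expression is the `ltv` example from the sample artifact at the top of this page.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Col:
    name: str      # a source column reference, e.g. "order.total"

@dataclass(frozen=True)
class Lit:
    value: object  # a literal constant

@dataclass(frozen=True)
class Call:
    op: str        # an operator name, e.g. "Sum"
    args: tuple    # child expressions

def lineage(expr):
    """Post-order walk returning (source columns, operators) in first-seen order."""
    cols, ops = [], []

    def walk(node):
        if isinstance(node, Col):
            if node.name not in cols:
                cols.append(node.name)
        elif isinstance(node, Call):
            for arg in node.args:
                walk(arg)
            if node.op not in ops:
                ops.append(node.op)
        # Literals contribute no lineage.

    walk(expr)
    return cols, ops

# ltv = Sum(order.total) * Coalesce(segment.mult, 1.0)
ltv = Call("Multiply", (
    Call("Sum", (Col("order.total"),)),
    Call("Coalesce", (Col("segment.mult"), Lit(1.0))),
))

cols, ops = lineage(ltv)
print(cols)  # ['order.total', 'segment.mult']
print(ops)   # ['Sum', 'Coalesce', 'Multiply']
```

Because the walk reads the tree directly, the result does not depend on parsing SQL text or on any platform's metadata API. A real implementation would also follow column references across transforms, which this sketch leaves out.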
§05 / Emitters

Every target. None of the lock-in.

First-party reference emitters ship under an open license. The SDK and conformance harness are open. Commercial and partner emitters extend the catalogue — certified against the same equivalence tests as our own.

Emitter catalogue v0.1 · reference & partner
emit-dbt → dbt · Snowflake · BigQuery (Reference · GA)
emit-databricks → PySpark · Workflows (Reference · GA)
emit-airflow → Python DAGs (Beta)
emit-adf → Azure Data Factory (Beta)
emit-glue → AWS Glue PySpark (Roadmap)
emit-dagster → Dagster assets (Roadmap)
emit-matillion → Matillion jobs (Partner · Proposed)
your-emitter → Any target · SDK (Bring your own)
§06 / Positioning

Why ETLIR, and not the others.

Capability | Services-led migration | Target-sponsored tools | Catalog & lineage platforms | ETLIR
Platform neutrality | Delivery-dependent | Biased toward one target | Platform-aware surface only | Neutral by construction
Customer-owned asset | Target-specific code | Target-specific code | Metadata inside their platform | Git-native C-IR, yours forever
Pre-migration lineage | Manual, partial | Not available | Limited to scraped metadata | Column-level, expression-derived
Optionality across targets | Lost on day one | Lost on day one | N/A | Preserved indefinitely
Engineering verifiability | Opaque | Closed | Black-box lineage | Equivalence-tested, conformance-suite
§07 / Roadmap

Where we are. Where we go.

Phase 01
Months 0 – 6

Foundation

  • C-IR specification v0.1
  • PowerCenter reference parser
  • Lineage & complexity v1
  • Three design partners live
Phase 02
Months 6 – 12

Early commercial

  • Hosted intelligence platform
  • emit-dbt & emit-databricks GA
  • Emitter SDK + conformance
  • OpenLineage export
Phase 03
Months 12 – 24

Platform

  • IICS & SSIS parsers
  • emit-airflow & emit-adf GA
  • Duplication mining, contracts
  • Partner marketplace
Phase 04
Months 24 – 36

Ecosystem

  • DataStage & premium parsers
  • Governance integrations
  • AI-assisted emitter authoring
  • International via SIs
§08 / Questions

The questions you're going to ask anyway.

Frequently asked, answered directly.
Is ETLIR a migration tool?

No. ETLIR produces the neutral intermediate representation that migrations are built on top of. You can migrate with it, with a partner, or never at all. The C-IR and the intelligence over it are valuable on their own — most customers buy us long before they commit to a target platform.

How do you prove C-IR is actually neutral?

Every release ships round-trip and cross-emitter equivalence tests. The conformance suite is public. If a second emitter produces a semantically inequivalent pipeline from the same C-IR, that's a bug we can see. Neutrality is a test, not a claim.
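A cross-emitter equivalence check can be pictured roughly as follows. This is a toy sketch, not the conformance suite: the `CIR` dictionary, `emit_sql`, `emit_python`, and `run_sql` are hypothetical names, and SQLite plus plain Python stand in for real targets. Two emitters render the same pipeline, both outputs run on the same rows, and the results must match.

```python
import sqlite3

# Toy stand-in for a C-IR pipeline: one filter and one projected expression.
CIR = {
    "source": "customers",
    "filter": ("status", "ACTIVE"),              # status == "ACTIVE"
    "project": {"ltv": ("total", "mult", 1.0)},  # total * Coalesce(mult, 1.0)
}

ROWS = [
    {"id": 1, "status": "ACTIVE", "total": 100.0, "mult": 1.5},
    {"id": 2, "status": "INACTIVE", "total": 50.0, "mult": 2.0},
    {"id": 3, "status": "ACTIVE", "total": 80.0, "mult": None},
]

def emit_sql(cir):
    """Hypothetical emitter A: render the pipeline as a SQL string."""
    col, value = cir["filter"]
    out, (a, b, default) = next(iter(cir["project"].items()))
    return (f"SELECT id, {a} * COALESCE({b}, {default}) AS {out} "
            f"FROM {cir['source']} WHERE {col} = '{value}' ORDER BY id")

def emit_python(cir):
    """Hypothetical emitter B: render the pipeline as a Python callable."""
    col, value = cir["filter"]
    _, (a, b, default) = next(iter(cir["project"].items()))

    def run(rows):
        kept = sorted((r for r in rows if r[col] == value), key=lambda r: r["id"])
        return [(r["id"], r[a] * (default if r[b] is None else r[b])) for r in kept]

    return run

def run_sql(sql, rows):
    """Execute emitter A's output against an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id INTEGER, status TEXT, total REAL, mult REAL)")
    con.executemany("INSERT INTO customers VALUES (:id, :status, :total, :mult)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result

sql_result = run_sql(emit_sql(CIR), ROWS)
py_result = emit_python(CIR)(ROWS)
assert sql_result == py_result, "emitters disagree on the same C-IR"
print(sql_result)  # [(1, 150.0), (3, 80.0)]
```

The third row is there on purpose: a null multiplier is exactly the kind of case where two targets can quietly diverge. A fuller harness would presumably generate many such inputs and define how to compare floating-point and ordering differences.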

Where does the C-IR live? What if I churn?

C-IR lives in your git repository as versioned YAML artifacts. You own them. If you stop paying us tomorrow, you still have the blueprint, the specification is open, and reference parsers and emitters are open source. The asset survives the vendor.

How is this different from OpenLineage or a catalog?

Catalogs and OpenLineage operate above the platform layer on metadata surfaces. ETLIR operates below — at the level of transforms and expression trees. We export to OpenLineage, so we complement existing investments. We do not replace them.

What sources do you parse on day one?

Informatica PowerCenter first — mappings, parameter files, connections, sessions, workflows, reusable transformations, and mapplets. IICS, SSIS, and DataStage follow. Community and partner parsers for the long tail are supported by the SDK.

Who's this for?

Enterprises with one thousand or more production ETL mappings on a legacy platform, a mandate or intent to modernize, and a regulated or governance-sensitive profile. Financial services, insurance, healthcare, life sciences, utilities, and public sector are the early ideal customers.

Own your ETL estate.
On your terms.

We are selecting three to five design partners for the first wave. Regulated industry, serious Informatica footprint, data leadership committed to a modernization answer within twelve months. If that's you, we should talk.