Skip to content

Metadata Model Structure

Ingestia is a metadata-driven framework.

All execution behavior is derived from structured metadata definitions — not from hardcoded logic inside pipelines.

The Metadata Model defines how structure, constraints, and execution behavior are declared.


Core Principle

Data behavior must be declared, not inferred.

The Ingest Engine interprets metadata definitions and translates them into deterministic operations.

No structural decision is implicit.


Metadata Architecture

Ingestia uses a simplified and explicit metadata structure:

  • One metadata dictionary per table
  • One optional integrity dictionary
  • Partition strategy declared at column level
%%{init: { 
  "theme": "base",
  "flowchart": { "nodeSpacing": 26, "rankSpacing": 34, "curve": "basis" },
  "themeVariables": {
    "mainBkg": "transparent",
    "fontSize": "14px"
  }
}}%%

flowchart TB
  classDef default fill:transparent,stroke-width:1px;

  DD["data_dictionary"] --> TN["table_name
────────"]

  TN --> TBL["table
────────
catalog
schema
domain
container
description
purpose
write_mode
merge_schema
table_properties"]

  TN --> COLS["columns
────────
<source_column_name>
target_column
data_type
nullable
transform_expr
key
partition
z_order
delta_column"]

  TN --> QR["quality_rules
────────
<rule_name>
type
active
severity
expr"]

  TN --> CS["constraints
────────
<constraint_name>
type
active
severity
columns
reference"]

  CS --> REF["reference
────────
catalog
schema
table
columns"]

Table Metadata Dictionary

Each table has its own metadata dictionary.

It defines:

  • target layer
  • domain
  • object name
  • column structure
  • write behavior
  • control column activation
  • integrity reference (optional)

This dictionary is the single source of truth for table behavior.


Column-Level Partition Strategy

Partitioning is declared directly within column definitions.

There is no separate partition dictionary.

If a column participates in partitioning:

  • it is declared explicitly
  • helper columns such as _partition_<column_name> may be generated
  • derivation logic is deterministic

Partition behavior is structural, not inferred.


Integrity Dictionary

The integrity dictionary is a separate structure referenced by the table metadata.

It may define:

  • primary key rules
  • unique combinations
  • referential integrity
  • structural validations

This separation ensures:

  • structural metadata remains clean
  • constraint logic remains reusable
  • integrity rules can evolve independently

The integrity dictionary is optional but recommended for governed environments.


Execution Relationship

Responsibility Table Metadata Integrity Dictionary Engine
Structure Yes No Executes
Keys Yes May validate Executes
Constraints No Yes Executes
Partitioning Yes (column-level) No Executes
Write Mode Yes No Executes

The engine interprets both dictionaries during execution.


Deterministic Design

Because metadata is explicit:

  • execution is reproducible
  • structural behavior is auditable
  • engine logic remains stable
  • changes are configuration-driven

The engine evolves independently from metadata definitions.


Scope Boundaries

The Metadata Model does not:

  • enforce a specific modeling methodology
  • impose naming conventions beyond reserved patterns
  • dictate enterprise governance models

It provides a declarative structural control plane for the Lakehouse.