Ingest Engine
The Ingest Engine is the execution core of the Ingestia framework.
It translates declarative metadata into deterministic, controlled, and reproducible data operations.
Ingestia is not a collection of notebooks; it is a metadata-driven execution engine.
Design Principles
The Ingest Engine is built on the following principles:
- Metadata over hardcode
- Deterministic execution
- Idempotent by design
- Explicit operational metadata
- Clear separation between business and technical columns
- Layer-aware processing
Conceptual Execution Flow
The engine follows a predictable and deterministic execution pipeline.
%%{init: {
"theme": "base",
"flowchart": { "nodeSpacing": 20, "rankSpacing": 25, "curve": "basis" },
"themeVariables": {
"mainBkg": "transparent",
"lineColor": "#9a9a9a",
"fontSize": "14px"
}
}}%%
flowchart TD
classDef default fill:transparent,stroke-width:1px,color:#d7d7d7;
classDef decision fill:transparent,stroke-width:1px,color:#d7d7d7;
A[Receive Source DataFrame + Metadata] --> B[Parse Metadata Dictionary]
B --> C[Validate Structural Requirements]
C --> D[Apply Column Transformations]
D --> E[Add Control Columns]
E --> F{Constraints Enabled?}
F -- Yes --> G[Apply Constraint Validation]
F -- No --> H[Skip Constraint Layer]
G --> I[Apply Partition Logic]
H --> I
I --> J[Execute Write Mode]
J --> K[Return Structured Execution Result]
class F decision
Each step is explicitly derived from metadata definitions.
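The first stages of the flow can be sketched in plain Python. This is a minimal illustration, not the engine's implementation: `run_pipeline`, the metadata keys (`columns`, `cast`, `keys`, `constraints_enabled`, `batch_id`), and the status strings are all assumptions made for the example, rows are plain dictionaries rather than a DataFrame, and the partition and write stages are left out.

```python
from typing import Any

Row = dict[str, Any]


def run_pipeline(rows: list[Row], metadata: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical walk through the validate/transform/control/constraint stages."""
    # Validate structural requirements: every declared column must be present.
    declared = [c["name"] for c in metadata["columns"]]
    for row in rows:
        missing = [name for name in declared if name not in row]
        if missing:
            return {"status": "FAILED", "messages": [f"missing columns: {missing}"], "rows": []}

    # Apply column transformations: here, only the declared type cast.
    casts = {c["name"]: c["cast"] for c in metadata["columns"]}
    out = [{name: casts[name](row[name]) for name in declared} for row in rows]

    # Add control columns (only the batch identifier in this sketch).
    for row in out:
        row["_batch_id"] = metadata["batch_id"]

    # Constraint validation runs only when metadata enables it.
    messages: list[str] = []
    if metadata.get("constraints_enabled", False):
        keys = [tuple(r[k] for k in metadata["keys"]) for r in out]
        if len(keys) != len(set(keys)):
            messages.append("duplicate key values")

    status = "FAILED" if messages else "SUCCEEDED"
    return {"status": status, "messages": messages, "rows": [] if messages else out}
```

Every branch reads a value from `metadata`; nothing depends on the shape of the incoming rows beyond what was declared.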
The ingest() Contract
The engine is executed through a single entry point:
ingest()
Conceptually, it receives:
- a source dataset
- a metadata definition
- execution configuration
- optional runtime parameters
The metadata determines:
- column structure
- key definitions
- partition strategy
- write mode
- constraint behavior
- operational column handling
The engine does not infer business logic.
All structural decisions must be declared.
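A metadata definition covering the items above might look like the following sketch. The actual signature of `ingest()` and the metadata schema are not documented in this section, so the function name `ingest_sketch`, its parameters, and every key in `customer_metadata` are hypothetical. The point illustrated is only that a missing declaration is rejected rather than defaulted.

```python
from typing import Any, Optional

# Hypothetical section names, one per item the metadata is said to determine.
REQUIRED_SECTIONS = ("columns", "keys", "partition_by", "write_mode", "constraints", "control_columns")

customer_metadata: dict[str, Any] = {
    "columns": [{"name": "customer_id", "type": "string"}, {"name": "country", "type": "string"}],
    "keys": ["customer_id"],
    "partition_by": ["country"],
    "write_mode": "append",
    "constraints": {"enabled": True},
    "control_columns": {"enabled": True},
}


def ingest_sketch(
    source: list[dict[str, Any]],
    metadata: dict[str, Any],
    config: dict[str, Any],
    params: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Accept a request only when every structural decision is declared."""
    undeclared = [s for s in REQUIRED_SECTIONS if s not in metadata]
    if undeclared:
        # Nothing is inferred: a missing declaration is an error, not a default.
        raise ValueError(f"metadata must declare: {undeclared}")
    return {
        "status": "ACCEPTED",
        "target": config["target"],
        "rows_received": len(source),
        "params": params or {},
    }
```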
Write Modes
Write behavior is explicitly declared in metadata.
append
Adds new records without removing existing data.
overwrite
Replaces target content according to the strategy declared in metadata.
merge (future-ready)
Supports key-based upsert logic when primary keys are defined.
The engine never infers write behavior.
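The three modes can be illustrated with an explicit dispatch that has no fallback branch. This is a sketch under assumptions: `apply_write` and the `write_mode` and `keys` metadata keys are hypothetical names, the target is an in-memory list, and `overwrite` is shown as a full replacement because the section does not detail the available overwrite strategies.

```python
from typing import Any

Row = dict[str, Any]


def apply_write(target: list[Row], incoming: list[Row], metadata: dict[str, Any]) -> list[Row]:
    """Return the new target content for the declared write mode."""
    mode = metadata.get("write_mode")
    if mode == "append":
        # Existing rows are kept; incoming rows are added after them.
        return target + incoming
    if mode == "overwrite":
        # Assumed strategy: replace the whole target.
        return list(incoming)
    if mode == "merge":
        keys = metadata.get("keys")
        if not keys:
            raise ValueError("merge requires declared primary keys")
        # Key-based upsert: incoming rows win over existing rows with the same key.
        merged = {tuple(r[k] for k in keys): r for r in target}
        merged.update({tuple(r[k] for k in keys): r for r in incoming})
        return list(merged.values())
    # No default mode: an undeclared or unknown value is an error.
    raise ValueError(f"write_mode must be declared explicitly, got {mode!r}")
```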
Control and Operational Metadata
The engine manages operational traceability through reserved columns such as:
- _batch_id
- _ingestion_id
- _ingestion_dt
- _partition_<column_name>
These enable:
- idempotent execution
- incremental strategies
- traceability
- deterministic reprocessing
Operational columns are never considered business attributes.
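The sketch below shows one way such columns could be attached and later separated from business attributes. It assumes the reserved names read as `_batch_id`, `_ingestion_id`, `_ingestion_dt`, and `_partition_<column_name>`, and that a leading underscore is what marks a column as operational; both function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any

Row = dict[str, Any]


def add_control_columns(
    rows: list[Row],
    batch_id: str,
    ingestion_id: str,
    ingestion_dt: datetime,
    partition_columns: list[str],
) -> list[Row]:
    """Return copies of the rows with operational columns attached."""
    out = []
    for row in rows:
        enriched = dict(row)  # never mutate the caller's business data
        enriched["_batch_id"] = batch_id
        enriched["_ingestion_id"] = ingestion_id
        enriched["_ingestion_dt"] = ingestion_dt.isoformat()
        for name in partition_columns:
            # One partition column per declared partition source column.
            enriched[f"_partition_{name}"] = row[name]
        out.append(enriched)
    return out


def business_columns(row: Row) -> Row:
    """Strip operational columns (assumed convention: leading underscore)."""
    return {k: v for k, v in row.items() if not k.startswith("_")}
```

Passing `ingestion_dt` in as an argument, instead of reading the clock inside the function, keeps the output reproducible for a given call.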
Idempotency
Ingestia is designed to avoid inconsistent states.
Idempotency is achieved through:
- deterministic batch identification
- explicit write strategies
- metadata-controlled partition logic
- structured execution boundaries
Reprocessing the same batch under the same metadata must produce the same result.
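A small sketch of how deterministic batch identification and a replace-style write combine to make a rerun a no-op. The hashing scheme, the function names `batch_id_for` and `write_batch`, and the dictionary used as a target are illustrative assumptions; the section does not state how Ingestia derives its batch identifiers.

```python
import hashlib
import json
from typing import Any

Row = dict[str, Any]


def batch_id_for(source_name: str, window: str, metadata: dict[str, Any]) -> str:
    """Derive a batch identifier from the inputs alone (no clock, no randomness)."""
    payload = json.dumps(
        {"source": source_name, "window": window, "metadata": metadata},
        sort_keys=True,  # key order must not change the identifier
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def write_batch(target: dict[str, list[Row]], batch_id: str, rows: list[Row]) -> None:
    """Replace the batch's slot instead of appending, so a rerun cannot duplicate rows."""
    target[batch_id] = [dict(r, _batch_id=batch_id) for r in rows]


# Running the same batch twice leaves the target in the same state.
metadata = {"write_mode": "overwrite", "partition_by": ["country"]}
rows = [{"customer_id": "c1", "country": "PT"}]
target: dict[str, list[Row]] = {}
for _ in range(2):
    write_batch(target, batch_id_for("crm.customers", "2024-01-01", metadata), rows)
```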
Error Handling Philosophy
The engine does not silently ignore structural violations.
Execution results are structured and explicit:
- status
- validation messages
- execution metadata
- processing metrics
Failure is visible and traceable.
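One possible shape for such a result is sketched below. The class name `ExecutionResult`, its field names, the status strings, and the `check_required_columns` helper are assumptions chosen to mirror the four items listed above, not the engine's actual return type.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ExecutionResult:
    """Hypothetical structured result: status, messages, metadata, metrics."""
    status: str
    validation_messages: list[str] = field(default_factory=list)
    execution_metadata: dict[str, Any] = field(default_factory=dict)
    metrics: dict[str, int] = field(default_factory=dict)

    @property
    def ok(self) -> bool:
        return self.status == "SUCCEEDED"


def check_required_columns(
    rows: list[dict[str, Any]], required: list[str], batch_id: str
) -> ExecutionResult:
    """Report every structural violation instead of dropping bad rows silently."""
    messages = []
    rejected = 0
    for i, row in enumerate(rows):
        missing = [name for name in required if name not in row]
        if missing:
            rejected += 1
            messages.append(f"row {i}: missing columns {missing}")
    return ExecutionResult(
        status="FAILED" if messages else "SUCCEEDED",
        validation_messages=messages,
        execution_metadata={"batch_id": batch_id},
        metrics={"rows_read": len(rows), "rows_rejected": rejected},
    )
```

A caller can branch on `result.ok` and still has the full list of messages and counts available for logging.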
Future extensions may introduce severity levels such as:
- ERROR
- WARN
- QUARANTINE
- SKIP
Layer Awareness
The engine respects logical layer boundaries:
- Raw layer → minimal structural enforcement
- Transformation layer → standardization and structural rules
- Serving layer → consumption-oriented datasets
The engine enforces structure but does not dictate modeling methodology.
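Layer awareness could be expressed as a lookup from the declared layer to the structural checks that run. The sketch is purely illustrative: the `Layer` enum, the check names, and which checks belong to which layer are assumptions, since the section only states that enforcement is minimal in the raw layer and stricter downstream.

```python
from enum import Enum


class Layer(Enum):
    RAW = "raw"
    TRANSFORMATION = "transformation"
    SERVING = "serving"


# Hypothetical check names; each layer lists everything it enforces.
CHECKS_BY_LAYER: dict[Layer, tuple[str, ...]] = {
    Layer.RAW: ("required_columns",),
    Layer.TRANSFORMATION: ("required_columns", "declared_types", "standard_names"),
    Layer.SERVING: ("required_columns", "declared_types", "standard_names", "declared_keys"),
}


def checks_for(layer_name: str) -> tuple[str, ...]:
    """Resolve the declared layer; an unknown layer raises ValueError."""
    return CHECKS_BY_LAYER[Layer(layer_name)]
```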
Scope Boundaries
The Ingest Engine does not:
- enforce surrogate key usage
- impose modeling frameworks (Kimball, Inmon, etc.)
- manage semantic layer logic
- dictate enterprise governance models
It focuses strictly on deterministic ingestion and structural enforcement.