Skip to content

Latest commit

 

History

History
91 lines (55 loc) · 5.99 KB

File metadata and controls

91 lines (55 loc) · 5.99 KB
graph LR
    Dataflow_Definition_API["Dataflow Definition API"]
    Input_Connectors["Input Connectors"]
    Output_Connectors["Output Connectors"]
    Stream_Processing_Core["Stream Processing Core"]
    State_Management_Recovery["State Management & Recovery"]
    Runtime_Execution_Engine["Runtime Execution Engine"]
    Dataflow_Definition_API -- "Defines/Configures" --> Input_Connectors
    Dataflow_Definition_API -- "Defines/Composes" --> Stream_Processing_Core
    Dataflow_Definition_API -- "Defines/Configures" --> Output_Connectors
    Dataflow_Definition_API -- "Provides Definition To" --> Runtime_Execution_Engine
    Input_Connectors -- "Feeds Data To" --> Runtime_Execution_Engine
    Runtime_Execution_Engine -- "Orchestrates Execution Of" --> Stream_Processing_Core
    Stream_Processing_Core -- "Reads/Writes State" --> State_Management_Recovery
    Stream_Processing_Core -- "Sends Processed Data To" --> Output_Connectors
    State_Management_Recovery -- "Provides State Persistence For" --> Stream_Processing_Core
    click Dataflow_Definition_API href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/bytewax/Dataflow_Definition_API.md" "Details"
    click Input_Connectors href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/bytewax/Input_Connectors.md" "Details"
    click Output_Connectors href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/bytewax/Output_Connectors.md" "Details"
    click Stream_Processing_Core href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/bytewax/Stream_Processing_Core.md" "Details"
    click State_Management_Recovery href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/bytewax/State_Management_Recovery.md" "Details"
    click Runtime_Execution_Engine href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/bytewax/Runtime_Execution_Engine.md" "Details"
Loading

CodeBoardingDemoContact

Details

The Bytewax architecture is centered around a robust, distributed stream processing model. Users interact with the Dataflow Definition API to construct data pipelines, specifying how data flows from Input Connectors through a series of transformations within the Stream Processing Core, and finally to Output Connectors. The Runtime Execution Engine is the backbone, responsible for orchestrating the entire dataflow execution across a cluster, ensuring efficient data distribution and fault tolerance. Critical to maintaining data consistency and enabling recovery is the State Management & Recovery component, which provides persistent storage for the internal state of stream operators. This design emphasizes a clear, linear data flow from ingestion to egress, supported by a resilient execution and state management infrastructure, making it ideal for real-time data processing and analytics.

Dataflow Definition API [Expand]

The user-facing Python API for programmatically constructing and defining real-time data processing pipelines.

Related Classes/Methods:

Input Connectors [Expand]

Components responsible for ingesting raw data from diverse external sources (e.g., Kafka, files, standard input) into the Bytewax dataflow.

Related Classes/Methods:

Output Connectors [Expand]

Components managing the egress of processed data from the Bytewax dataflow to various external sinks (e.g., Kafka, files, standard output).

Related Classes/Methods:

Stream Processing Core [Expand]

The central component encompassing all stateless and stateful transformations, aggregations, and windowing operations applied to data streams.

Related Classes/Methods:

State Management & Recovery [Expand]

Provides the underlying mechanisms for operators to maintain and recover their internal state across processing steps, machine restarts, and failures, ensuring fault tolerance.

Related Classes/Methods:

Runtime Execution Engine [Expand]

The core distributed execution engine responsible for loading, distributing, and running the defined dataflows across a cluster or locally.

Related Classes/Methods: