```mermaid
graph LR
    Data_Ingestion_Subsystem["Data Ingestion Subsystem"]
    Connection_Manager["Connection Manager"]
    Data_Loader["Data Loader"]
    S3_Connector["S3 Connector"]
    Local_Connector["Local Connector"]
    CSV_Loader["CSV Loader"]
    JSON_Loader["JSON Loader"]
    DataFrame_Converter["DataFrame Converter"]
    Data_Ingestion_Subsystem -- "orchestrates" --> Connection_Manager
    Data_Ingestion_Subsystem -- "orchestrates" --> Data_Loader
    Connection_Manager -- "delegates to" --> S3_Connector
    Connection_Manager -- "delegates to" --> Local_Connector
    Connection_Manager -- "provides connection details to" --> Data_Loader
    Data_Loader -- "utilizes connection from" --> Connection_Manager
    Data_Loader -- "delegates parsing to" --> CSV_Loader
    CSV_Loader -- "returns parsed data to" --> Data_Loader
    Data_Loader -- "delegates parsing to" --> JSON_Loader
    JSON_Loader -- "returns parsed data to" --> Data_Loader
    Data_Loader -- "sends raw data to" --> DataFrame_Converter
    DataFrame_Converter -- "returns Optimus DataFrame to" --> Data_Loader
```


Details

The Data Ingestion Subsystem is the primary entry point for all data loading and connection operations in Optimus, and it orchestrates two components: the Connection Manager and the Data Loader. The Connection Manager establishes connections to data sources by delegating to specific connectors, such as the S3 Connector and the Local Connector, and then passes the resulting connection details to the Data Loader. The Data Loader uses those details to delegate parsing of the various file formats to format-specific components such as the CSV Loader and the JSON Loader, which return the parsed data to it. The Data Loader then sends this data to the DataFrame Converter, which transforms it into a standardized Optimus DataFrame. Because every source and format passes through the same sequence, ingestion behaves consistently regardless of where the data comes from or how it is encoded.
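The flow above can be condensed into a few lines of Python. This is a hypothetical sketch, not Optimus code: the `ingest` function, the `PARSERS` table, and the in-memory `store` that stands in for S3 or the local file system are all illustrative, and a plain dict of column lists stands in for an Optimus DataFrame.

```python
import csv
import io
import json
from typing import Any, Callable, Dict, List

# Hypothetical end-to-end sketch; none of these names are Optimus APIs.
Records = List[Dict[str, Any]]

# Format-specific parsers keyed by file extension (stand-ins for the CSV/JSON Loaders).
PARSERS: Dict[str, Callable[[bytes], Records]] = {
    "csv": lambda raw: list(csv.DictReader(io.StringIO(raw.decode("utf-8")))),
    "json": lambda raw: json.loads(raw),
}


def ingest(uri: str, store: Dict[str, bytes]) -> Dict[str, List[Any]]:
    raw = store[uri]                              # 1. connection layer fetches raw bytes
    fmt = uri.rsplit(".", 1)[-1].lower()          # 2. loader picks a parser by extension
    records = PARSERS[fmt](raw)                   # 3. format-specific loader parses rows
    names = list(records[0]) if records else []   # 4. converter pivots rows into columns,
    return {n: [r.get(n) for r in records] for n in names}  # using the first row's keys


# An in-memory dict stands in for S3 or the local file system.
store = {
    "mem://people.csv": b"name,age\nAna,31\nBo,27\n",
    "mem://people.json": b'[{"name": "Ana", "age": 31}]',
}
from_csv = ingest("mem://people.csv", store)
from_json = ingest("mem://people.json", store)
```

Each numbered step corresponds to one component in the diagram, so replacing the dict lookup with a real connector, or adding a parser for another format, leaves the other steps untouched.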

Data Ingestion Subsystem

The overarching module responsible for providing a unified interface for all data connection and loading operations within Optimus. It acts as the entry point for external modules requiring data.

Related Classes/Methods:

Connection Manager

This component serves as a factory and orchestrator for establishing connections to various external data sources, such as S3, local file systems, HDFS, GCS, and MAS. It abstracts the complexities of different storage systems, providing a consistent interface.
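A minimal sketch of the factory role, assuming connectors are chosen by URI scheme and share a `read(uri) -> bytes` method. `ConnectionManager`, `Connector`, `register`, and `for_uri` are hypothetical names, and the two fake connectors exist only so the example runs.

```python
from typing import Callable, Dict, Protocol
from urllib.parse import urlparse


class Connector(Protocol):
    """Hypothetical common interface every connector implements."""

    def read(self, uri: str) -> bytes: ...


class ConnectionManager:
    """Hypothetical factory that maps a URI scheme to a connector."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Connector]] = {}

    def register(self, scheme: str, factory: Callable[[], Connector]) -> None:
        self._factories[scheme] = factory

    def for_uri(self, uri: str) -> Connector:
        scheme = urlparse(uri).scheme or "file"  # bare paths count as local files
        if scheme not in self._factories:
            raise ValueError(f"no connector registered for scheme {scheme!r}")
        return self._factories[scheme]()  # build a fresh connector per request


# Stand-in connectors for the usage example.
class FakeS3:
    def read(self, uri: str) -> bytes:
        return b"s3:" + uri.encode()


class FakeLocal:
    def read(self, uri: str) -> bytes:
        return b"local:" + uri.encode()


manager = ConnectionManager()
manager.register("s3", FakeS3)
manager.register("file", FakeLocal)
s3_bytes = manager.for_uri("s3://bucket/people.csv").read("s3://bucket/people.csv")
local_bytes = manager.for_uri("data/people.csv").read("data/people.csv")
```

Adding a backend such as HDFS or GCS in this design means registering one more factory; callers keep asking `for_uri` and never see the storage-specific details.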

Related Classes/Methods:

Data Loader

This component provides a unified, high-level API for loading data from diverse file formats (e.g., CSV, JSON, Parquet) into Optimus DataFrames. It orchestrates the parsing and conversion process, delegating to format-specific implementations.
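One way to picture the delegation, assuming a registry keyed by format name with the file extension as a fallback. `DataLoader`, `register`, `load`, and the injected `fetch` callable are hypothetical, and this sketch does not forward reader options to the parsers.

```python
import json
import os
from typing import Any, Callable, Dict, List, Optional

Records = List[Dict[str, Any]]
Parser = Callable[[bytes], Records]


class DataLoader:
    """Hypothetical high-level loader that delegates parsing by format."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # supplied by the connection layer
        self._parsers: Dict[str, Parser] = {}

    def register(self, fmt: str, parser: Parser) -> None:
        self._parsers[fmt.lower()] = parser

    def load(self, path: str, fmt: Optional[str] = None) -> Records:
        # An explicit format wins; otherwise fall back to the file extension.
        fmt = (fmt or os.path.splitext(path)[1].lstrip(".")).lower()
        if fmt not in self._parsers:
            raise ValueError(f"unsupported format: {fmt!r}")
        return self._parsers[fmt](self._fetch(path))


loader = DataLoader(fetch=lambda path: b'[{"a": 1}]')  # fake fetch for the example
loader.register("json", json.loads)
by_extension = loader.load("mem/data.json")
by_argument = loader.load("mem/data.txt", fmt="JSON")
```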

Related Classes/Methods:

S3 Connector

An adapter that handles connecting to and interacting with Amazon S3 storage, including authentication and data retrieval.
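A sketch of the S3 adapter, assuming it wraps a client object that follows boto3's `get_object(Bucket=..., Key=...)` call, whose response carries a readable `"Body"` stream. `S3Connector`, `split_s3_uri`, and `FakeS3Client` are hypothetical; with boto3 installed, `S3Connector(boto3.client("s3"))` would read real objects.

```python
import io
from typing import Any, Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    return bucket, key


class S3Connector:
    """Hypothetical S3 adapter around any client with boto3's get_object(Bucket=, Key=)."""

    def __init__(self, client: Any) -> None:
        # Authentication is left to the client (boto3 resolves credentials itself).
        self._client = client

    def read(self, uri: str) -> bytes:
        bucket, key = split_s3_uri(uri)
        response = self._client.get_object(Bucket=bucket, Key=key)
        return response["Body"].read()


class FakeS3Client:
    """In-memory stand-in so the example runs without AWS access."""

    def get_object(self, Bucket: str, Key: str) -> dict:
        return {"Body": io.BytesIO(f"{Bucket}|{Key}".encode())}


payload = S3Connector(FakeS3Client()).read("s3://my-bucket/raw/people.csv")
```

Splitting on the first `/` instead of using a URL parser keeps characters such as `?` or `#`, which are legal in S3 keys, inside the key.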

Related Classes/Methods:

Local Connector

Manages connections and operations on the local file system, providing a standardized way to access files stored locally.
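A sketch of a local-disk connector that exposes a hypothetical `read(uri) -> bytes` method, the same shape used for remote storage, accepting either a path relative to a base directory or a `file://` URI. The class and its parameters are illustrative, not Optimus's API.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


class LocalConnector:
    """Hypothetical local-disk connector exposing a read(uri) -> bytes method."""

    def __init__(self, base_dir: str = ".") -> None:
        self._base = Path(base_dir)

    def _to_path(self, uri: str) -> Path:
        if uri.startswith("file://"):
            # file:// URIs are percent-decoded into a native path.
            return Path(url2pathname(urlparse(uri).path))
        # Relative paths resolve against base_dir; an absolute path replaces it.
        return self._base / uri

    def read(self, uri: str) -> bytes:
        return self._to_path(uri).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp).resolve() / "people.csv"
    sample.write_bytes(b"name\nAna\n")
    connector = LocalConnector(base_dir=tmp)
    data = connector.read("people.csv")         # relative path
    via_uri = connector.read(sample.as_uri())   # file:// URI
```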

Related Classes/Methods:

CSV Loader

Implements reading and parsing of CSV files, including support for various CSV-specific options.
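A minimal CSV parser built on Python's `csv` module, illustrating the kind of options such a loader handles: delimiter, header row, encoding, and a null marker. The function name and the option names (`sep`, `header`, `null_value`) are hypothetical and may not match Optimus's own reader.

```python
import csv
import io
from typing import Dict, List, Optional


def load_csv(raw: bytes, sep: str = ",", header: bool = True,
             encoding: str = "utf-8",
             null_value: Optional[str] = "") -> List[Dict[str, Optional[str]]]:
    """Hypothetical CSV parser; the option names are illustrative only.

    Pass null_value=None to keep empty cells as empty strings.
    """
    rows = list(csv.reader(io.StringIO(raw.decode(encoding)), delimiter=sep))
    if not rows:
        return []
    if header:
        names, body = rows[0], rows[1:]
    else:
        # Without a header row, generate positional names _c0, _c1, ...
        names, body = [f"_c{i}" for i in range(len(rows[0]))], rows
    # Cells equal to null_value become None; zip() drops cells beyond the header width.
    return [dict(zip(names, (None if cell == null_value else cell for cell in row)))
            for row in body]


semicolon_rows = load_csv(b"name;age\nAna;31\nBo;\n", sep=";")
headerless_rows = load_csv(b"1,2\n", header=False)
```

Values stay strings at this stage; turning `"31"` into a number is left to the conversion step.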

Related Classes/Methods:

JSON Loader

Implements reading and parsing of JSON files, including support for different JSON structures.
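A sketch of handling different JSON layouts, assuming three common shapes: a top-level array of objects, a single object, and JSON Lines (one object per line). Nested objects are flattened into dotted column names; `load_json`, `flatten`, and the dotted-key convention are illustrative choices, not confirmed Optimus behavior.

```python
import json
from typing import Any, Dict, List


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value  # scalars and lists are kept as-is
    return flat


def load_json(raw: bytes, lines: bool = False) -> List[Dict[str, Any]]:
    """Hypothetical JSON parser for an array of objects, one object, or JSON Lines."""
    text = raw.decode("utf-8")
    if lines:
        docs = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        parsed = json.loads(text)
        docs = parsed if isinstance(parsed, list) else [parsed]
    if not all(isinstance(doc, dict) for doc in docs):
        raise ValueError("expected JSON objects at the top level")
    return [flatten(doc) for doc in docs]


single = load_json(b'{"id": 1, "user": {"name": "Ana"}}')
json_lines = load_json(b'{"id": 1}\n{"id": 2}\n', lines=True)
deep = load_json(b'[{"a": {"b": {"c": 3}}}]')
```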

Related Classes/Methods:

DataFrame Converter

A utility method responsible for converting raw loaded data (from various formats) into the standardized Optimus DataFrame structure, ensuring consistency across different data sources and formats for subsequent processing.
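A sketch of the conversion step, assuming records arrive as dicts from either parser. It aligns records with different keys into equal-length columns (missing values become `None`) and casts text columns to `int` or `float` when every value allows it, so CSV strings and JSON numbers end up with the same types. `to_columns` and the casting rule are hypothetical, and the dict of lists only stands in for an Optimus DataFrame.

```python
from typing import Any, Dict, List, Optional


def _coerce(values: List[Optional[Any]]) -> List[Optional[Any]]:
    """Cast an all-text column to int, else float, if every non-null value parses."""
    present = [v for v in values if v is not None]
    if not present or not all(isinstance(v, str) for v in present):
        return values  # only text columns (e.g. from CSV) are candidates for casting
    for cast in (int, float):
        try:
            converted = iter([cast(v) for v in present])
        except ValueError:
            continue
        return [None if v is None else next(converted) for v in values]
    return values


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into equal-length columns; missing keys become None."""
    names = list(dict.fromkeys(name for record in records for name in record))
    return {name: _coerce([r.get(name) for r in records]) for name in names}


aligned = to_columns([{"name": "Ana", "age": "31"}, {"name": "Bo"}])
floats = to_columns([{"score": "1.5"}, {"score": "2"}])
```

The dict-of-lists shape is also what `pandas.DataFrame` accepts as its `data` argument, which makes it a convenient intermediate form.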

Related Classes/Methods: