graph LR
Data_Ingestion_Subsystem["Data Ingestion Subsystem"]
Connection_Manager["Connection Manager"]
Data_Loader["Data Loader"]
S3_Connector["S3 Connector"]
Local_Connector["Local Connector"]
CSV_Loader["CSV Loader"]
JSON_Loader["JSON Loader"]
DataFrame_Converter["DataFrame Converter"]
Data_Ingestion_Subsystem -- "orchestrates" --> Connection_Manager
Data_Ingestion_Subsystem -- "orchestrates" --> Data_Loader
Connection_Manager -- "delegates to" --> S3_Connector
Connection_Manager -- "delegates to" --> Local_Connector
Connection_Manager -- "provides connection details to" --> Data_Loader
Data_Loader -- "utilizes connection from" --> Connection_Manager
Data_Loader -- "delegates parsing to" --> CSV_Loader
CSV_Loader -- "returns parsed data to" --> Data_Loader
Data_Loader -- "delegates parsing to" --> JSON_Loader
JSON_Loader -- "returns parsed data to" --> Data_Loader
Data_Loader -- "sends raw data to" --> DataFrame_Converter
DataFrame_Converter -- "returns Optimus DataFrame to" --> Data_Loader
The Data Ingestion Subsystem is the primary entry point for all data loading and connection operations in Optimus. It orchestrates two components, the Connection Manager and the Data Loader. The Connection Manager establishes connections to data sources by delegating to specific connectors, such as the S3 Connector and the Local Connector, and then provides the connection details to the Data Loader. The Data Loader uses those connections and delegates the parsing of each file format to a specialized component, such as the CSV Loader or the JSON Loader. Each format-specific loader returns its parsed data to the Data Loader, which passes it to the DataFrame Converter for transformation into a standardized Optimus DataFrame. Because every source and format goes through the same connect, parse, and convert sequence, callers receive the same DataFrame structure no matter where the data is stored or how it is encoded.
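The connect, parse, and convert sequence can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not Optimus's implementation: every name here (`DataIngestion`, `ConnectionManager.connector_for`, `DataLoader.load`, `to_columns`) is hypothetical, only local paths are supported, and a column-oriented `dict` stands in for the Optimus DataFrame.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


class LocalConnector:
    """Hypothetical connector: reads raw bytes from the local file system."""

    def read(self, path: str) -> bytes:
        return Path(path).read_bytes()


class ConnectionManager:
    """Hypothetical manager: chooses a connector for a given path."""

    def connector_for(self, path: str) -> LocalConnector:
        # Remote schemes (s3://, hdfs://, ...) would map to other connectors here.
        if "://" in path:
            raise NotImplementedError(f"this sketch only reads local paths, got {path!r}")
        return LocalConnector()


def parse_csv(text: str) -> list[dict]:
    """CSV Loader stand-in: one dict per row, keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_json(text: str) -> list[dict]:
    """JSON Loader stand-in: expects a top-level array of objects."""
    return json.loads(text)


def to_columns(records: list[dict]) -> dict[str, list]:
    """DataFrame Converter stand-in: rows -> columns, missing values become None."""
    names = list(dict.fromkeys(key for record in records for key in record))
    return {name: [record.get(name) for record in records] for name in names}


class DataLoader:
    """Hypothetical loader: fetches bytes, delegates parsing, then converts."""

    PARSERS = {"csv": parse_csv, "json": parse_json}

    def __init__(self, connections: ConnectionManager):
        self.connections = connections

    def load(self, path: str, fmt: str) -> dict[str, list]:
        raw = self.connections.connector_for(path).read(path)  # 1. connect and fetch
        records = self.PARSERS[fmt](raw.decode("utf-8"))       # 2. format-specific parsing
        return to_columns(records)                              # 3. convert


class DataIngestion:
    """Hypothetical entry point that orchestrates both components."""

    def __init__(self):
        self.connections = ConnectionManager()
        self.loader = DataLoader(self.connections)

    def load(self, path: str, fmt: str) -> dict[str, list]:
        return self.loader.load(path, fmt)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp, "people.csv")
        path.write_text("name,age\nAda,36\n")
        print(DataIngestion().load(str(path), "csv"))  # {'name': ['Ada'], 'age': ['36']}
```

Each stage talks to the next only through a narrow interface (`read`, a parser function, `to_columns`), which is what lets sources and formats vary independently.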
The Data Ingestion Subsystem is the overarching module that provides a unified interface for all data connection and loading operations in Optimus. It is the entry point for any external module that needs data.
The Connection Manager is a factory and orchestrator for connections to external data sources such as S3, local file systems, HDFS, GCS, and MAS. It hides the differences between these storage systems behind a consistent interface.
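A factory like this is often built as a registry keyed by URL scheme. The sketch below assumes that design; `ConnectionManager.connect`, `register`, and both connector classes are hypothetical placeholders rather than Optimus's actual API, and the connectors only store the options they receive.

```python
from urllib.parse import urlparse


class LocalConnector:
    """Hypothetical local file system connector (stores its options only)."""

    def __init__(self, **options):
        self.options = options


class S3Connector:
    """Hypothetical S3 connector (stores its options, e.g. region, only)."""

    def __init__(self, **options):
        self.options = options


class ConnectionManager:
    """Hypothetical factory mapping a URL scheme to a connector class."""

    def __init__(self):
        # Plain paths have an empty scheme and are treated as local files.
        self._registry = {"": LocalConnector, "file": LocalConnector, "s3": S3Connector}

    def register(self, scheme: str, connector_cls) -> None:
        # New storage backends plug in here without changing any caller.
        self._registry[scheme] = connector_cls

    def connect(self, url: str, **options):
        scheme = urlparse(url).scheme
        try:
            connector_cls = self._registry[scheme]
        except KeyError:
            raise ValueError(f"no connector registered for scheme {scheme!r}") from None
        return connector_cls(**options)
```

Registering a scheme (for example `"hdfs"` or `"gs"`) is how another backend would be added without touching the code that calls `connect`.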
The Data Loader provides a unified, high-level API for loading data from diverse file formats (e.g., CSV, JSON, Parquet) into Optimus DataFrames. It orchestrates parsing and conversion, delegating each format to its own implementation.
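Format delegation can be sketched as a registry of parsers plus a loader that picks one, either from an explicit format or from the file extension. All names are hypothetical; `read_text` is an assumed callable standing in for the connection supplied by the Connection Manager, `convert` stands in for the DataFrame Converter, and Parquet is omitted because it needs a third-party library.

```python
import csv
import io
import json
from pathlib import PurePath
from typing import Callable, Optional

# Hypothetical registry: format name -> parser that turns text into row records.
PARSERS: dict[str, Callable[[str], list[dict]]] = {}


def parser(fmt: str):
    """Decorator that registers a format-specific parser under `fmt`."""
    def register(func):
        PARSERS[fmt] = func
        return func
    return register


@parser("csv")
def parse_csv(text: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(text)))


@parser("json")
def parse_json(text: str) -> list[dict]:
    return json.loads(text)


class DataLoader:
    """Hypothetical high-level loader: picks a parser, then hands rows to a converter."""

    def __init__(self, read_text: Callable[[str], str], convert: Callable[[list[dict]], object]):
        self._read_text = read_text  # stands in for the Connection Manager's connection
        self._convert = convert      # stands in for the DataFrame Converter

    def load(self, path: str, fmt: Optional[str] = None):
        # An explicit format wins; otherwise infer it from the file extension.
        fmt = (fmt or PurePath(path).suffix.lstrip(".")).lower()
        if fmt not in PARSERS:
            raise ValueError(f"unsupported format {fmt!r}; known: {sorted(PARSERS)}")
        return self._convert(PARSERS[fmt](self._read_text(path)))
```

With a registry, supporting another format means registering one more parser; `load` itself does not change.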
The S3 Connector is an adapter that handles connecting to and interacting with Amazon S3 storage, including authentication and data retrieval.
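One way to structure such an adapter is to wrap an injected boto3-style S3 client and leave authentication to the client's own configuration. This is a sketch under that assumption: `S3Connector` and `split_s3_url` are hypothetical, while the `get_object(Bucket=..., Key=...)` call and the streaming `Body` in its response follow boto3's S3 client interface.

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


class S3Connector:
    """Hypothetical adapter around an injected boto3-style S3 client.

    Authentication is left to the client (for boto3: environment variables,
    shared credentials files, or IAM roles), so this class never handles secrets.
    """

    def __init__(self, client):
        self._client = client

    def read(self, url: str) -> bytes:
        bucket, key = split_s3_url(url)
        response = self._client.get_object(Bucket=bucket, Key=key)
        return response["Body"].read()
```

With boto3 installed, `S3Connector(boto3.client("s3")).read("s3://bucket/key.csv")` would fetch an object (bucket and key are placeholders); tests can pass any object with a compatible `get_object` method instead.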
The Local Connector manages connections and operations on the local file system, giving the rest of the subsystem a standardized way to access locally stored files.
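A local connector mostly needs to resolve paths consistently before touching the disk. The sketch below assumes a connector rooted at a base directory; the class and its `exists`, `read`, and `list_files` methods are hypothetical names chosen for illustration.

```python
import tempfile
from pathlib import Path


class LocalConnector:
    """Hypothetical local file system connector rooted at a base directory."""

    def __init__(self, base_dir: str = "."):
        self.base_dir = Path(base_dir).expanduser().resolve()

    def _resolve(self, path: str) -> Path:
        # Relative paths are read against base_dir; absolute paths are used as given.
        candidate = Path(path).expanduser()
        return candidate if candidate.is_absolute() else self.base_dir / candidate

    def exists(self, path: str) -> bool:
        return self._resolve(path).is_file()

    def read(self, path: str) -> bytes:
        return self._resolve(path).read_bytes()

    def list_files(self, pattern: str = "*") -> list[str]:
        # Matching files, relative to base_dir and sorted for a stable order.
        return sorted(
            str(p.relative_to(self.base_dir))
            for p in self.base_dir.glob(pattern)
            if p.is_file()
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "sales.csv").write_text("id,amount\n1,9.5\n")
        conn = LocalConnector(tmp)
        print(conn.list_files("*.csv"))    # ['sales.csv']
        print(conn.read("sales.csv")[:9])  # b'id,amount'
```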
The CSV Loader implements reading and parsing CSV files and handles the various CSV-specific parsing options.
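As an illustration of option handling, the sketch below parses CSV text with Python's standard `csv` module. `load_csv` and its `sep`, `header`, and `quotechar` parameters are hypothetical and do not reproduce Optimus's CSV loader; the naming scheme for header-less files (`c0`, `c1`, ...) is likewise an assumption.

```python
import csv
import io


def load_csv(text: str, sep: str = ",", header: bool = True, quotechar: str = '"') -> dict[str, list]:
    """Parse CSV text into columns (hypothetical helper; parameter names are illustrative)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep, quotechar=quotechar))
    if not rows:
        return {}
    if header:
        names, rows = rows[0], rows[1:]
    else:
        # Without a header row, generate positional column names.
        names = [f"c{i}" for i in range(len(rows[0]))]
    columns = {name: [] for name in names}
    for row in rows:
        # Short rows are padded with None; cells beyond the known columns are dropped.
        for i, name in enumerate(names):
            columns[name].append(row[i] if i < len(row) else None)
    return columns
```

Quoted fields containing the separator (for example `"x,y"`) are kept intact by the `csv` module, which is the main reason to use it instead of splitting lines by hand.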
The JSON Loader implements reading and parsing JSON files and handles the different structures a JSON document can take.
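Different JSON structures can be normalized into one list of flat records. This sketch assumes three input shapes (a top-level array of objects, a single object, and JSON Lines) and flattens nested objects into dotted column names; `load_json` and `_flatten` are hypothetical helpers, not Optimus's JSON loader.

```python
import json


def _flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested objects: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(_flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def load_json(text: str) -> list[dict]:
    """Parse a JSON array, a single JSON object, or JSON Lines into flat records."""
    stripped = text.strip()
    if not stripped:
        return []
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        # Not a single JSON document: treat it as JSON Lines (one object per line).
        data = [json.loads(line) for line in stripped.splitlines() if line.strip()]
    if isinstance(data, dict):
        data = [data]
    return [_flatten(record) for record in data]
```

Dotted-key flattening is only one option; another is to keep nested values as dictionaries inside a single column.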
The DataFrame Converter is a utility method that converts raw loaded data, whatever its original format, into the standardized Optimus DataFrame structure, so that subsequent processing sees the same structure for every source and format.
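The conversion step can be sketched as normalizing either row records or a column mapping into one column-oriented structure. `Frame` and `to_frame` are hypothetical stand-ins for the Optimus DataFrame and its converter; in this sketch missing fields are filled with `None` and columns of unequal length are rejected.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical stand-in for an Optimus DataFrame (column-oriented)."""
    columns: dict[str, list]

    @property
    def shape(self) -> tuple[int, int]:
        n_rows = len(next(iter(self.columns.values()), []))
        return n_rows, len(self.columns)


def to_frame(data) -> Frame:
    """Convert row records (iterable of dicts) or a column mapping into a Frame."""
    if isinstance(data, dict):
        if len({len(values) for values in data.values()}) > 1:
            raise ValueError("all columns must have the same length")
        return Frame({name: list(values) for name, values in data.items()})
    records = list(data)  # materialize once so generators can be read twice
    # Union of keys in first-seen order; a record missing a key contributes None.
    names = list(dict.fromkeys(key for record in records for key in record))
    return Frame({name: [record.get(name) for record in records] for name in names})
```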