graph LR
File_I_O_Utilities["File I/O Utilities"]
Partition_Specification_and_Management["Partition Specification and Management"]
Column_Expression_Building["Column Expression Building"]
SQL_Column_Expression_Generation["SQL Column Expression Generation"]
Partition_Specification_and_Management -- "relies on" --> Column_Expression_Building
Column_Expression_Building -- "is foundational for" --> SQL_Column_Expression_Generation
The Data I/O & Schema Management subsystem in Fugue handles interactions with external data and defines how data is structured and transformed within the framework. It is a foundational layer: schema definition keeps data consistent, and partitioning lets processing be optimized.
The File I/O Utilities component provides utilities for reading and writing data in several file formats (e.g., Parquet, CSV, JSON) and for managing file paths, including glob patterns and access to local and remote file systems. It covers the "Data I/O" aspect of the subsystem.
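The path-handling ideas above (picking a format from the extension, expanding glob patterns, then reading or writing) can be sketched with the standard library alone. This is a minimal illustration, not Fugue's implementation: the names `infer_format`, `expand`, `save_rows`, and `load_rows` are hypothetical, only local paths are handled, and Parquet is left out because it needs a third-party library.

```python
import csv
import json
import os
from glob import glob

# Hypothetical extension-to-format table; not taken from Fugue's source.
_FORMATS = {".parquet": "parquet", ".csv": "csv", ".json": "json"}


def infer_format(path: str) -> str:
    """Guess the file format from the path's extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in _FORMATS:
        raise ValueError(f"cannot infer format from {path!r}")
    return _FORMATS[ext]


def expand(path: str) -> list:
    """Expand a glob pattern into sorted matches; plain paths pass through."""
    if any(ch in path for ch in "*?["):
        return sorted(glob(path))
    return [path]


def save_rows(rows: list, path: str) -> None:
    """Write a non-empty list of dicts as CSV or JSON, chosen by extension."""
    fmt = infer_format(path)
    with open(path, "w", newline="") as f:
        if fmt == "csv":
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        elif fmt == "json":
            json.dump(rows, f)
        else:
            raise NotImplementedError(fmt)  # e.g. Parquet: not covered here


def load_rows(pattern: str) -> list:
    """Read every file matching the pattern and concatenate the rows."""
    rows = []
    for path in expand(pattern):
        fmt = infer_format(path)
        with open(path, newline="") as f:
            if fmt == "csv":
                rows.extend(csv.DictReader(f))  # CSV values come back as strings
            elif fmt == "json":
                rows.extend(json.load(f))
            else:
                raise NotImplementedError(fmt)
    return rows
```

Calling `load_rows` with a pattern such as `data/part-*.json` (a made-up path) would read each matching file in sorted order and return one combined list of rows.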
Related Classes/Methods:
The Partition Specification and Management component defines and manages how data is partitioned and sorted for processing. A specification encapsulates the partition keys, the presort specification, and the number of partitions. Partitioning is critical for optimizing data processing, and this component falls under "Schema Management" because it dictates how data is organized.
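To make the three ingredients concrete (partition keys, presort, number of partitions), here is a small standalone sketch. `SimplePartitionSpec` and `partition_rows` are hypothetical names invented for illustration; they do not reproduce Fugue's own partition class, and the `num` field is only carried as a hint that this sketch never acts on.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class SimplePartitionSpec:
    """Hypothetical stand-in for a partition specification."""

    by: Tuple[str, ...] = ()                    # partition keys
    presort: Tuple[Tuple[str, bool], ...] = ()  # (column, ascending) pairs
    num: int = 0                                # requested partition count, 0 = unspecified


def partition_rows(
    rows: List[Dict[str, Any]], spec: SimplePartitionSpec
) -> Dict[tuple, List[Dict[str, Any]]]:
    """Group rows by the partition keys, then presort each group."""
    groups: Dict[tuple, List[Dict[str, Any]]] = {}
    for row in rows:
        key = tuple(row[k] for k in spec.by)
        groups.setdefault(key, []).append(row)
    for group in groups.values():
        # Apply sort keys from last to first. Python's sort is stable,
        # so the earlier presort columns end up taking precedence.
        for column, ascending in reversed(spec.presort):
            group.sort(key=lambda r: r[column], reverse=not ascending)
    return groups


rows = [
    {"k": "a", "v": 2},
    {"k": "b", "v": 5},
    {"k": "a", "v": 3},
    {"k": "a", "v": 1},
]
spec = SimplePartitionSpec(by=("k",), presort=(("v", False),), num=2)
parts = partition_rows(rows, spec)
```

Here `parts` maps each key tuple to its rows, with every group sorted by `v` in descending order. A real engine would additionally use the requested partition count to decide how groups are distributed.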
Related Classes/Methods:
The Column Expression Building component offers an abstract, programmatic interface for building column-level expressions: literals, named columns, function calls, and arithmetic and boolean operations. These expressions are the basis for defining schema transformations and computations, which makes the component a core part of the "Schema Management" system.
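One common way to offer such an interface is operator overloading, where `+`, `>`, or `&` build tree nodes instead of computing values. The sketch below illustrates that general technique under stated assumptions: the classes `Col`, `Lit`, `BinaryOp`, and `Func`, and the `evaluate` helper, are hypothetical and are not the classes defined in Fugue.

```python
import operator
from dataclasses import dataclass
from typing import Any, Tuple


class Expr:
    """Base class: operators return new expression nodes, not values."""

    def __add__(self, other): return BinaryOp("+", self, _wrap(other))
    def __sub__(self, other): return BinaryOp("-", self, _wrap(other))
    def __mul__(self, other): return BinaryOp("*", self, _wrap(other))
    def __gt__(self, other): return BinaryOp(">", self, _wrap(other))
    def __lt__(self, other): return BinaryOp("<", self, _wrap(other))
    def __and__(self, other): return BinaryOp("AND", self, _wrap(other))
    def __or__(self, other): return BinaryOp("OR", self, _wrap(other))


@dataclass(frozen=True)
class Col(Expr):
    name: str  # a named column


@dataclass(frozen=True)
class Lit(Expr):
    value: Any  # a literal constant


@dataclass(frozen=True)
class BinaryOp(Expr):
    op: str
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Func(Expr):
    name: str
    args: Tuple[Expr, ...]


def _wrap(value: Any) -> Expr:
    """Promote plain Python values to literal nodes."""
    return value if isinstance(value, Expr) else Lit(value)


_OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    ">": operator.gt, "<": operator.lt,
    "AND": lambda a, b: bool(a) and bool(b),
    "OR": lambda a, b: bool(a) or bool(b),
}
_FUNCS = {"abs": abs}  # hypothetical function table for the demo


def evaluate(expr: Expr, row: dict) -> Any:
    """Interpret an expression tree against one row (for illustration only)."""
    if isinstance(expr, Col):
        return row[expr.name]
    if isinstance(expr, Lit):
        return expr.value
    if isinstance(expr, BinaryOp):
        return _OPS[expr.op](evaluate(expr.left, row), evaluate(expr.right, row))
    if isinstance(expr, Func):
        return _FUNCS[expr.name](*(evaluate(a, row) for a in expr.args))
    raise TypeError(f"unsupported expression: {expr!r}")


# Building an expression only records structure; nothing is computed yet.
expr = (Col("a") + 1) * Col("b") > 10
cond = (Col("a") > 1) & (Func("abs", (Col("b"),)) < 5)
```

Because the tree is plain data, the same expression can later be interpreted row by row, as `evaluate` does here, or handed to a different backend such as a SQL generator.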
Related Classes/Methods:
The SQL Column Expression Generation component translates the abstract column expressions defined in fugue.column.expressions into SQL-compatible strings, handling SQL-specific constructs such as aggregation functions and clauses. It is the part of the "Schema Management" system that specializes in SQL integration.
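The translation step can be pictured as a recursive walk over an expression tree that emits a string for each node. The following is a minimal sketch under stated assumptions, not the generator Fugue ships: the node classes (`Named`, `Literal`, `Binary`, `Call`), the `AGGREGATES` set, and the rule that non-aggregate columns become `GROUP BY` keys are all hypothetical choices made for illustration, and identifiers are not quoted or validated.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Named:
    name: str


@dataclass(frozen=True)
class Literal:
    value: Any


@dataclass(frozen=True)
class Binary:
    op: str
    left: Any
    right: Any


@dataclass(frozen=True)
class Call:
    func: str
    args: Tuple[Any, ...]


# Assumed set of aggregation functions for this sketch.
AGGREGATES = {"SUM", "COUNT", "MIN", "MAX", "AVG"}


def to_sql(expr: Any) -> str:
    """Render one expression node (and its children) as a SQL fragment."""
    if isinstance(expr, Named):
        return expr.name  # no identifier quoting in this sketch
    if isinstance(expr, Literal):
        v = expr.value
        if v is None:
            return "NULL"
        if isinstance(v, bool):  # check bool before other types
            return "TRUE" if v else "FALSE"
        if isinstance(v, str):
            return "'" + v.replace("'", "''") + "'"  # escape single quotes
        return str(v)
    if isinstance(expr, Binary):
        return f"({to_sql(expr.left)} {expr.op} {to_sql(expr.right)})"
    if isinstance(expr, Call):
        return f"{expr.func.upper()}({', '.join(to_sql(a) for a in expr.args)})"
    raise TypeError(f"unsupported expression: {expr!r}")


def is_aggregate(expr: Any) -> bool:
    """True if the expression contains an aggregation function call."""
    if isinstance(expr, Call):
        return expr.func.upper() in AGGREGATES or any(is_aggregate(a) for a in expr.args)
    if isinstance(expr, Binary):
        return is_aggregate(expr.left) or is_aggregate(expr.right)
    return False


def select_sql(columns: Dict[str, Any], table: str) -> str:
    """Build a SELECT; non-aggregate columns become GROUP BY keys if needed."""
    parts = [f"{to_sql(e)} AS {alias}" for alias, e in columns.items()]
    sql = f"SELECT {', '.join(parts)} FROM {table}"
    keys = [to_sql(e) for e in columns.values() if not is_aggregate(e)]
    if keys and len(keys) < len(columns):
        sql += " GROUP BY " + ", ".join(keys)
    return sql


query = select_sql(
    {"k": Named("k"), "total": Call("sum", (Binary("+", Named("a"), Literal(1)),))},
    "t",
)
# query == "SELECT k AS k, SUM((a + 1)) AS total FROM t GROUP BY k"
```

Keeping string rendering in a separate function from the tree definition means the same expressions could target other dialects by swapping the renderer, which is one plausible reason to isolate this step as its own component.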
Related Classes/Methods: