graph LR
Column_Operations["Column Operations"]
Row_Operations["Row Operations"]
Data_Profiling["Data Profiling"]
Type_Inference["Type Inference"]
Core_Functions["Core Functions"]
Masking_Logic["Masking Logic"]
String_Clustering["String Clustering"]
Outlier_Detection["Outlier Detection"]
Column_Operations -- "delegates to" --> Core_Functions
Column_Operations -- "calls" --> Masking_Logic
Column_Operations -- "uses" --> Type_Inference
Row_Operations -- "calls" --> Masking_Logic
Row_Operations -- "calls" --> Core_Functions
Data_Profiling -- "consumes from" --> Column_Operations
Data_Profiling -- "consumes from" --> Row_Operations
Core_Functions -- "utilizes" --> Type_Inference
String_Clustering -- "interacts with" --> Column_Operations
The Data Processing & Analysis subsystem in Optimus handles data cleaning, transformation, feature engineering, profiling, and quality checks. It forms the core of the library's data manipulation capabilities and follows the library's Data Flow and Engine Agnosticism design principles.
Column Operations is the primary interface for column-wise transformation, cleaning, and feature engineering. It covers type conversions, string operations, date/time parsing, and phonetic algorithms.
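The delegation shown in the diagram (Column Operations handing per-value work to Core Functions) can be illustrated with a minimal sketch. This is plain Python over a dict of lists, not the Optimus API: `Cols`, `to_upper`, and `to_int` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional

Table = Dict[str, List[object]]


# Hypothetical atomic helpers standing in for the Core Functions layer.
def to_upper(value: object) -> object:
    """Upper-case strings; pass every other value through unchanged."""
    return value.upper() if isinstance(value, str) else value


def to_int(value: object) -> Optional[int]:
    """Parse an integer, returning None when the value cannot be parsed."""
    try:
        return int(str(value).strip())
    except ValueError:
        return None


class Cols:
    """Hypothetical column accessor: selects a column, delegates the work."""

    def __init__(self, table: Table) -> None:
        self.table = table

    def apply(self, name: str, func: Callable[[object], object]) -> Table:
        # Build a new table so the input is never mutated.
        out = dict(self.table)
        out[name] = [func(v) for v in self.table[name]]
        return out

    def upper(self, name: str) -> Table:
        return self.apply(name, to_upper)

    def to_integer(self, name: str) -> Table:
        return self.apply(name, to_int)


table = {"name": ["ada", "grace", None], "age": ["36", " 45 ", "n/a"]}
cleaned = Cols(Cols(table).upper("name")).to_integer("age")
```

Keeping the per-value logic in small standalone functions means the accessor only decides which column to touch, which is one way to read the "delegates to" edge in the graph.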
Related Classes/Methods:
Row Operations manages row-wise data manipulation, including filtering, sorting, and counting. It is the component used to subset and reorder data.
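A rough sketch of the three row-level operations named above (filtering, sorting, counting), written in plain Python over a list of dicts. The function names `select`, `sort_rows`, and `count` are assumptions for illustration and are not taken from the library.

```python
from typing import Dict, List

Row = Dict[str, object]


def select(rows: List[Row], mask: List[bool]) -> List[Row]:
    """Keep the rows whose mask entry is True."""
    if len(rows) != len(mask):
        raise ValueError("mask length must match the number of rows")
    return [row for row, keep in zip(rows, mask) if keep]


def sort_rows(rows: List[Row], column: str, descending: bool = False) -> List[Row]:
    """Return a new list ordered by one column."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def count(rows: List[Row]) -> int:
    """Number of rows."""
    return len(rows)


rows = [
    {"city": "Lima", "temp": 19},
    {"city": "Oslo", "temp": 4},
    {"city": "Cairo", "temp": 31},
]
# The boolean mask is built separately, mirroring the "calls Masking Logic" edge.
warm_mask = [row["temp"] > 10 for row in rows]
warm = sort_rows(select(rows, warm_mask), "temp", descending=True)
```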
Related Classes/Methods:
Data Profiling computes and merges data profiles, producing statistical summaries and metadata that describe data quality and characteristics.
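To make "calculates and merges data profiles" concrete, here is a small sketch that profiles a column in chunks and then combines the partial results. The profile layout (`count`, `missing`, `frequency`) and the function names are assumptions made for this example, not the library's actual profile schema.

```python
from collections import Counter
from typing import Dict, List


def profile_column(values: List[object]) -> Dict[str, object]:
    """Partial profile of one column: row count, missing count, value frequencies."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "frequency": Counter(present),
    }


def merge_profiles(a: Dict[str, object], b: Dict[str, object]) -> Dict[str, object]:
    """Combine two partial profiles of the same column."""
    return {
        "count": a["count"] + b["count"],
        "missing": a["missing"] + b["missing"],
        "frequency": a["frequency"] + b["frequency"],
    }


def summarize(profile: Dict[str, object], top: int = 2) -> Dict[str, object]:
    """Derive the reader-facing summary from a (possibly merged) profile."""
    frequency = profile["frequency"]
    return {
        "count": profile["count"],
        "missing": profile["missing"],
        "unique": len(frequency),
        "top": frequency.most_common(top),
    }


chunk_a = ["a", "b", "a", None]
chunk_b = ["a", "c", None, "b"]
merged = merge_profiles(profile_column(chunk_a), profile_column(chunk_b))
summary = summarize(merged)
```

Storing frequencies rather than a bare unique count is what makes the merge exact: counts of distinct values cannot be added across chunks, but frequency tables can.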
Related Classes/Methods:
Type Inference determines data types and detects patterns in the data automatically. Many data quality, validation, and transformation steps build on its results.
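One common way to implement this kind of inference is to classify each value with regular expressions and then take a majority vote per column. The sketch below assumes that approach; the pattern set, the type labels, and the function names are illustrative and are not taken from the library.

```python
import re
from collections import Counter
from typing import List

# Ordered from most to least specific; the first full match wins.
PATTERNS = [
    ("int", re.compile(r"[+-]?\d+")),
    ("float", re.compile(r"[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("bool", re.compile(r"true|false", re.IGNORECASE)),
]


def infer_value_type(value: object) -> str:
    """Classify a single value by the first pattern that matches it entirely."""
    if value is None or str(value).strip() == "":
        return "null"
    text = str(value).strip()
    for name, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return name
    return "string"


def infer_column_type(values: List[object]) -> str:
    """Majority vote over the non-null values of a column."""
    counts = Counter(infer_value_type(v) for v in values)
    counts.pop("null", None)
    if not counts:
        return "null"
    return counts.most_common(1)[0][0]


column_type = infer_column_type(["1", "2", "x", "3", None])
```

A majority vote tolerates a few dirty values, and the values that disagree with the inferred type are natural candidates for a later quality check.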
Related Classes/Methods:
Core Functions is the foundational utility layer. It provides atomic data processing functions (e.g., type conversions, statistical calculations, string processing, date/time extraction) that the other components consume.
Related Classes/Methods:
Masking Logic provides the core logic for building boolean masks from conditions such as missing values, nulls, duplicates, and regex matches. These masks underpin filtering and conditional operations across columns and rows.
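The mask conditions listed above can be sketched as functions that each return one boolean per value, plus combinators to compose them. This is a plain-Python illustration under assumed names (`mask_null`, `mask_duplicated`, `mask_match`, `mask_and`, `mask_not`), not the library's interface.

```python
import re
from typing import List


def mask_null(values: List[object]) -> List[bool]:
    """True where the value is missing."""
    return [v is None for v in values]


def mask_duplicated(values: List[object]) -> List[bool]:
    """True for every occurrence after the first one."""
    seen = set()
    mask = []
    for v in values:
        mask.append(v in seen)
        seen.add(v)
    return mask


def mask_match(values: List[object], pattern: str) -> List[bool]:
    """True where a string value matches the whole pattern."""
    compiled = re.compile(pattern)
    return [isinstance(v, str) and compiled.fullmatch(v) is not None for v in values]


def mask_and(a: List[bool], b: List[bool]) -> List[bool]:
    return [x and y for x, y in zip(a, b)]


def mask_not(a: List[bool]) -> List[bool]:
    return [not x for x in a]


emails = ["a@x.io", None, "a@x.io", "bad"]
looks_valid = mask_match(emails, r"[^@\s]+@[^@\s]+\.\w+")
valid_unique = mask_and(looks_valid, mask_not(mask_duplicated(emails)))
```

Because every condition produces a list of the same shape, the same masks can drive both row filtering and conditional column updates.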
Related Classes/Methods:
String Clustering groups similar strings so that inconsistent values can be standardized and cleaned.
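The text above does not say which clustering algorithms are used. As one well-known possibility, the sketch below uses fingerprint keying (a key-collision method popularized by OpenRefine): strings that normalize to the same key land in the same cluster. The function names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Normalize a string to a key: no accents, lowercase, no punctuation, sorted unique tokens."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^\w\s]", "", ascii_text.lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster(values: List[str]) -> Dict[str, List[str]]:
    """Group values by fingerprint and keep groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: members for key, members in groups.items() if len(set(members)) > 1}


cities = ["New York", "new york", "York, New", "Boston"]
clusters = cluster(cities)
```

Each resulting cluster is a set of spellings that a cleaning step could replace with a single canonical value, for example the most frequent member.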
Related Classes/Methods:
Outlier Detection provides abstract and concrete implementations of outlier detection methods (e.g., MAD, Tukey's fences) for identifying anomalous data points.
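The abstract/concrete split mentioned above might look like the following sketch: a base class that turns bounds into outliers, and two subclasses that compute the bounds with Tukey's fences and with the median absolute deviation (MAD). Class names, default thresholds, and the choice of the inclusive quantile method are assumptions for this example.

```python
from abc import ABC, abstractmethod
from statistics import median, quantiles
from typing import List, Tuple


class OutlierDetector(ABC):
    """Hypothetical base class: subclasses only define the acceptable range."""

    @abstractmethod
    def bounds(self, values: List[float]) -> Tuple[float, float]:
        ...

    def outliers(self, values: List[float]) -> List[float]:
        lower, upper = self.bounds(values)
        return [v for v in values if v < lower or v > upper]


class Tukey(OutlierDetector):
    """Fences at Q1 - k*IQR and Q3 + k*IQR (k = 1.5 by convention)."""

    def __init__(self, k: float = 1.5) -> None:
        self.k = k

    def bounds(self, values: List[float]) -> Tuple[float, float]:
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        return q1 - self.k * iqr, q3 + self.k * iqr


class MAD(OutlierDetector):
    """Median +/- threshold * scaled MAD; 1.4826 makes MAD comparable to a standard deviation for normal data."""

    def __init__(self, threshold: float = 3.0) -> None:
        self.threshold = threshold

    def bounds(self, values: List[float]) -> Tuple[float, float]:
        center = median(values)
        mad = median([abs(v - center) for v in values])
        spread = 1.4826 * mad
        return center - self.threshold * spread, center + self.threshold * spread


readings = [10, 12, 11, 13, 12, 11, 95]
tukey_outliers = Tukey().outliers(readings)
mad_outliers = MAD().outliers(readings)
```

Both detectors rely on medians and quartiles rather than the mean, so a single extreme value such as 95 does not distort the range used to flag it.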
Related Classes/Methods: