graph LR
    Column_Operations["Column Operations"]
    Row_Operations["Row Operations"]
    Data_Profiling["Data Profiling"]
    Type_Inference["Type Inference"]
    Core_Functions["Core Functions"]
    Masking_Logic["Masking Logic"]
    String_Clustering["String Clustering"]
    Outlier_Detection["Outlier Detection"]
    Column_Operations -- "delegates to" --> Core_Functions
    Column_Operations -- "calls" --> Masking_Logic
    Column_Operations -- "uses" --> Type_Inference
    Row_Operations -- "calls" --> Masking_Logic
    Row_Operations -- "calls" --> Core_Functions
    Data_Profiling -- "consumes from" --> Column_Operations
    Data_Profiling -- "consumes from" --> Row_Operations
    Core_Functions -- "utilizes" --> Type_Inference
    String_Clustering -- "interacts with" --> Column_Operations


Details

The Data Processing & Analysis subsystem of optimus handles data cleaning, transformation, feature engineering, profiling, and quality checks. It is the core of the library's data manipulation capabilities and follows two of the project's architectural principles: Data Flow and Engine Agnosticism.

Column Operations

The primary interface for column-wise transformations, cleaning, and feature engineering. It covers type conversions, string operations, date/time parsing, and phonetic algorithms.
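The delegation from a column interface to a layer of atomic functions can be sketched as follows. This is a minimal illustration in plain Python, not the optimus API: the class names `CoreFunctions` and `Cols`, their methods, and the dict-of-lists table are all hypothetical.

```python
class CoreFunctions:
    """Hypothetical layer of atomic, value-level functions."""

    def to_float(self, value):
        # Unparseable or missing values become None instead of raising.
        try:
            return float(value)
        except (TypeError, ValueError):
            return None

    def upper(self, value):
        return value.upper() if isinstance(value, str) else value


class Cols:
    """Hypothetical column interface over a dict of column name -> list."""

    def __init__(self, data, functions):
        self.data = data
        self.F = functions  # swapping this object swaps the "engine"

    def apply(self, column, func):
        # Return a new Cols so the input table is left untouched.
        new_data = dict(self.data)
        new_data[column] = [func(v) for v in self.data[column]]
        return Cols(new_data, self.F)

    def to_float(self, column):
        return self.apply(column, self.F.to_float)

    def upper(self, column):
        return self.apply(column, self.F.upper)


table = {"price": ["1.5", "x", None], "name": ["ann", "bob", None]}
cols = Cols(table, CoreFunctions())
cleaned = cols.to_float("price").upper("name")
```

Because the column interface only calls methods on the injected functions object, a different backend could be supported by supplying another object with the same methods.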

Related Classes/Methods:

Row Operations

Handles row-wise manipulation: filtering, sorting, and counting. It is used to subset and reorder data.
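As a rough sketch of these three operations, assuming rows are plain dictionaries and that filtering is driven by a boolean mask, the helper names below (`select_rows`, `sort_rows`, `count_rows`) are hypothetical and do not come from optimus.

```python
def select_rows(rows, mask):
    """Keep the rows whose corresponding mask entry is True."""
    return [row for row, keep in zip(rows, mask) if keep]


def sort_rows(rows, column, descending=False):
    """Return a new list of rows ordered by one column."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def count_rows(rows):
    """Number of rows in the table."""
    return len(rows)


rows = [
    {"name": "ann", "age": 31},
    {"name": "bob", "age": 25},
    {"name": "cat", "age": 40},
]
adults_over_30 = select_rows(rows, [r["age"] > 30 for r in rows])
by_age = sort_rows(rows, "age", descending=True)
```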

Related Classes/Methods:

Data Profiling

Generates data profiles that describe data quality and characteristics. It calculates and merges profiles, producing statistical summaries and metadata.
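A minimal sketch of calculating and merging profiles, assuming a table is a dict of column name to list of values and that merging means overlaying freshly computed column profiles on an earlier one. The function names and the profile fields are illustrative assumptions, not the optimus profile format.

```python
def profile_column(values):
    """Summarize one column: size, missing count, cardinality, range."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def profile_table(columns):
    """Profile every column of a dict-of-lists table."""
    return {name: profile_column(vals) for name, vals in columns.items()}


def merge_profiles(old, new):
    """Overlay newly computed column profiles on a previous profile."""
    merged = dict(old)
    merged.update(new)
    return merged


table = {"age": [3, None, 1, 3], "city": ["x", "y", None, None]}
full = profile_table(table)

# Only "age" changed, so only that column is re-profiled and merged in.
updated = merge_profiles(full, profile_table({"age": [3, 2, 1, 3]}))
```

Merging per column lets a caller recompute only the columns that a transformation touched instead of re-profiling the whole table.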

Related Classes/Methods:

Type Inference

Infers data types automatically and detects patterns within data. Many data quality, validation, and transformation steps depend on its results.
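One simple way to infer a column type is to label each value and take the most frequent label. The sketch below assumes that approach; the regular expressions, the single `%Y-%m-%d` date format, and the function names are hypothetical choices for illustration and are not the rules optimus applies.

```python
import re
from collections import Counter
from datetime import datetime

# Illustrative patterns only, checked from most to least specific.
_INT = re.compile(r"[+-]?\d+")
_FLOAT = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")
_BOOL = {"true", "false"}


def infer_value_type(value):
    """Return a coarse type label for a single raw value."""
    if value is None or str(value).strip() == "":
        return "null"
    text = str(value).strip()
    if text.lower() in _BOOL:
        return "bool"
    if _INT.fullmatch(text):
        return "int"
    if _FLOAT.fullmatch(text):
        return "float"
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "str"


def infer_column_type(values):
    """Pick the most frequent non-null label in a column."""
    counts = Counter(infer_value_type(v) for v in values)
    counts.pop("null", None)
    if not counts:
        return "null"
    return counts.most_common(1)[0][0]
```

For example, `infer_column_type(["1", "2", "x", None])` labels two values as `int` and one as `str`, so the column is reported as `int`.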

Related Classes/Methods:

Core Functions

A foundational utility layer of atomic data processing functions consumed by the other components, such as type conversions, statistical calculations, string processing, and date/time extraction.

Related Classes/Methods:

Masking Logic

Provides the logic for creating boolean masks, which underpin filtering and conditional operations across columns and rows. Masks can be built from conditions such as missing values, nulls, duplicates, and regex matches.
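The idea of a boolean mask can be shown with plain lists: each helper returns one True/False entry per value, and a separate step uses the mask to filter. The function names below are hypothetical and only mirror the conditions listed above.

```python
import re


def mask_null(values):
    """True where the value is None or an empty/whitespace-only string."""
    return [v is None or (isinstance(v, str) and v.strip() == "") for v in values]


def mask_duplicated(values):
    """True for every occurrence of a value after its first appearance."""
    seen = set()
    mask = []
    for v in values:
        mask.append(v in seen)
        seen.add(v)
    return mask


def mask_match(values, pattern):
    """True where a string value fully matches the regex pattern."""
    rx = re.compile(pattern)
    return [isinstance(v, str) and rx.fullmatch(v) is not None for v in values]


def apply_mask(items, mask):
    """Keep only the items whose mask entry is True."""
    return [item for item, keep in zip(items, mask) if keep]


codes = ["ab1", "zz", None, "ab1", " "]
valid = apply_mask(codes, mask_match(codes, r"[a-z]+\d"))
```

Keeping mask creation separate from mask application is what lets the same mask serve both row filtering and conditional column updates.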

Related Classes/Methods:

String Clustering

Standardizes and cleans data by grouping similar strings.
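One widely used way to group similar strings is fingerprint key collision: normalize each string to a key and group the strings that share it. The sketch below assumes that technique for illustration; the source does not say which algorithms optimus implements, and the function names are hypothetical.

```python
import string
from collections import defaultdict

# Translation table that deletes ASCII punctuation.
_PUNCT = str.maketrans("", "", string.punctuation)


def fingerprint(text):
    """Normalize case, punctuation, token order, and repeated tokens."""
    tokens = text.strip().lower().translate(_PUNCT).split()
    return " ".join(sorted(set(tokens)))


def cluster_strings(values):
    """Group values by fingerprint, keeping groups with distinct spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: members for key, members in groups.items() if len(set(members)) > 1}


names = ["Acme, Inc.", "inc acme", "Globex", "ACME Inc"]
clusters = cluster_strings(names)
```

Each resulting cluster is a candidate for replacing all of its members with one canonical spelling.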

Related Classes/Methods:

Outlier Detection

Provides abstract and concrete implementations of outlier detection methods (e.g., MAD, Tukey's fences) for identifying anomalous data points.
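The split between an abstract base and concrete methods can be sketched as below, using only the standard library. The class names, the `threshold` parameter, and its default of 3.0 are assumptions made for illustration. Tukey's fences use the conventional 1.5 × IQR rule, and the MAD variant flags values more than `threshold` median absolute deviations from the median.

```python
from abc import ABC, abstractmethod
from statistics import median, quantiles


class OutlierDetector(ABC):
    """Hypothetical base class: subclasses only define the bounds."""

    def __init__(self, values):
        self.values = list(values)

    @abstractmethod
    def bounds(self):
        """Return (lower, upper) limits for non-outlying values."""

    def outliers(self):
        lower, upper = self.bounds()
        return [v for v in self.values if v < lower or v > upper]


class Tukey(OutlierDetector):
    """Tukey's fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""

    def bounds(self):
        q1, _, q3 = quantiles(self.values, n=4, method="inclusive")
        iqr = q3 - q1
        return q1 - 1.5 * iqr, q3 + 1.5 * iqr


class MAD(OutlierDetector):
    """Median absolute deviation around the median."""

    def __init__(self, values, threshold=3.0):
        super().__init__(values)
        self.threshold = threshold  # assumed default, not from the source

    def bounds(self):
        med = median(self.values)
        mad = median(abs(v - med) for v in self.values)
        return med - self.threshold * mad, med + self.threshold * mad


data = [10, 12, 11, 13, 12, 11, 100]
tukey_outliers = Tukey(data).outliers()
mad_outliers = MAD(data).outliers()
```

Putting `outliers()` on the base class means a new method only has to supply `bounds()`. Note that when more than half the values are identical the MAD is zero, so this sketch would flag every value that differs from the median.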

Related Classes/Methods: