graph LR
Data_Loader["Data Loader"]
Data_Validator["Data Validator"]
Data_Formatter_Cache_Manager["Data Formatter & Cache Manager"]
Conditional_Independence_Test_Orchestrator_CIT_["Conditional Independence Test Orchestrator (CIT)"]
Data_Loader -- "provides raw data to" --> Data_Validator
Data_Validator -- "passes validated data to" --> Data_Formatter_Cache_Manager
Data_Formatter_Cache_Manager -- "provides formatted/cached data to" --> Conditional_Independence_Test_Orchestrator_CIT_
Conditional_Independence_Test_Orchestrator_CIT_ -- "calls" --> Data_Validator
Conditional_Independence_Test_Orchestrator_CIT_ -- "calls" --> Data_Formatter_Cache_Manager
Data_Validator -- "provides validation services to" --> Conditional_Independence_Test_Orchestrator_CIT_
The Data Management subsystem is primarily defined by the functionalities within causallearn.utils.Dataset and causallearn.utils.cit. It serves as the initial data ingress and preparation layer for all causal discovery algorithms within the causal-learn project.
Responsible for fetching raw datasets from external sources (e.g., URLs) and parsing them into fundamental data structures like NumPy arrays and lists of labels. It is the initial entry point for external data into the system.
Related Classes/Methods:
Performs comprehensive structural and content validation on the raw input data. This includes crucial checks for dimensions, data types, and handling of missing values, ensuring the data adheres to the strict requirements of downstream causal discovery algorithms.
Related Classes/Methods:
Transforms the validated input data into a standardized format (e.g., X, Y, Z for conditional independence tests) suitable for direct consumption by algorithms. It also generates a unique cache key for the formatted data to optimize performance by avoiding redundant computations.
Related Classes/Methods:
Orchestrates the execution of conditional independence tests, which are fundamental building blocks for many causal discovery algorithms. While not strictly a data management component, it acts as the primary consumer and orchestrator of the prepared data, ensuring it is validated and formatted correctly before initiating the tests. It bridges the data preparation layer with the core algorithmic layer.
Related Classes/Methods: