graph LR
DataReader_Component["DataReader Component"]
Data_Preprocessor_Component["Data Preprocessor Component"]
Main_Data_Loading_Orchestrator["Main Data Loading Orchestrator"]
Main_Data_Loading_Orchestrator -- "delegates data loading to" --> DataReader_Component
DataReader_Component -- "provides raw data and feature dictionary to" --> Main_Data_Loading_Orchestrator
Main_Data_Loading_Orchestrator -- "passes raw data to" --> Data_Preprocessor_Component
Data_Preprocessor_Component -- "returns processed data to" --> Main_Data_Loading_Orchestrator
The Data Ingestion & Preprocessing subsystem primarily encompasses functionality defined within example/DataReader.py and relevant data loading and preprocessing logic within example/main.py. These files contain the core components responsible for preparing raw data for the DeepFM model.
This component is responsible for the initial loading of raw data from its source. Its key function is to initialize the data reading process and, crucially, to generate a feature dictionary (gen_feat_dict). This dictionary defines the structure, types, and mappings of features, which is essential for the DeepFM model to correctly interpret the input data.
Related Classes/Methods:
This component takes the raw data, potentially along with the feature dictionary, and performs necessary transformations to prepare it for model consumption. This includes tasks such as normalization, encoding categorical features, or other forms of feature engineering, ensuring the data is in a clean and structured format suitable for training and inference.
Related Classes/Methods:
This component acts as the high-level controller for the entire data ingestion and preprocessing workflow. It coordinates the sequence of operations, ensuring that data is first loaded by the DataReader and then passed through the Data Preprocessor. It manages the flow of data through the pipeline, ensuring data readiness for both training and inference.
Related Classes/Methods: