graph LR
Data_Reader["Data Reader"]
Data_Preprocessor["Data Preprocessor"]
NLP_Preprocessor["NLP Preprocessor"]
Image_Preprocessor["Image Preprocessor"]
Image_Caption_Helpers["Image Caption Helpers"]
Data_Reader -- "Data Flow" --> Data_Preprocessor
Data_Preprocessor -- "Specialized Delegation" --> NLP_Preprocessor
Image_Preprocessor -- "Data Preparation and Utility Support" --> Image_Caption_Helpers
The Data Management & Preprocessing subsystem in libra is a critical part of the Machine Learning Library, responsible for preparing diverse raw data for model consumption. It is primarily defined by the libra.preprocessing package, encompassing modules and classes dedicated to data ingestion, cleaning, transformation, and feature engineering.
This component is the initial entry point for raw data. It handles data ingestion from various sources and adapts the loading mechanism based on environmental factors (e.g., GPU availability) and the specific data type. It ensures raw data is efficiently loaded into the preprocessing pipeline.
Related Classes/Methods:
Serving as the orchestrator for structured data, this component manages general cleaning, transformation, and feature engineering tasks. It acts as a central hub for preparing tabular or general structured datasets and delegates specialized text processing to the NLP Preprocessing component.
Related Classes/Methods:
This specialized component focuses exclusively on text-specific preprocessing. It performs tasks such as slang correction, tokenization, normalization, and general text clean-up, ensuring text data is properly formatted and ready for Natural Language Processing models.
Related Classes/Methods:
This component is responsible for comprehensive preparation of image data. It handles tasks like resizing, organization, color adjustments, and format conversions, supporting various input structures (e.g., sets, CSV, single class). It ensures image data meets the quality and format requirements for image-based ML models.
Related Classes/Methods:
This utility component provides helper functions specifically designed for tasks related to image caption generation. It supports the Image Preprocessor by assisting with image loading, attention mechanism integration, and other specialized operations required for captioning workflows.
Related Classes/Methods: