```mermaid
graph LR
    Data_Reader["Data Reader"]
    Data_Preprocessor["Data Preprocessor"]
    NLP_Preprocessor["NLP Preprocessor"]
    Image_Preprocessor["Image Preprocessor"]
    Image_Caption_Helpers["Image Caption Helpers"]
    Data_Reader -- "Data Flow" --> Data_Preprocessor
    Data_Preprocessor -- "Specialized Delegation" --> NLP_Preprocessor
    Image_Preprocessor -- "Data Preparation and Utility Support" --> Image_Caption_Helpers
```


Details

The Data Management & Preprocessing subsystem is a critical part of libra, the Machine Learning Library: it prepares diverse raw data for model consumption. It is defined primarily by the libra.preprocessing package, whose modules and classes handle data ingestion, cleaning, transformation, and feature engineering.

Data Reader

This component is the entry point for raw data. It ingests data from various sources and adapts its loading mechanism to the environment (e.g., GPU availability) and to the data type, so that raw data enters the preprocessing pipeline efficiently.
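
The source does not show the reader's actual API, so the following is only a minimal, stdlib-only sketch of the dispatch described above: a backend is chosen from GPU availability and a parser from the file extension. The names `read_data`, `parse_text`, `PARSERS`, and `gpu_available` are hypothetical, and treating an installed `nvidia-smi` binary as a sign of a usable GPU is an assumption.

```python
import csv
import io
import json
import shutil
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]


def gpu_available() -> bool:
    # Assumption: an installed `nvidia-smi` binary is treated as a usable GPU.
    return shutil.which("nvidia-smi") is not None


def _parse_csv(text: str) -> List[Record]:
    # Each CSV row becomes a dict keyed by the header row.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def _parse_json(text: str) -> List[Record]:
    data = json.loads(text)
    # Accept either a list of records or a single JSON object.
    return data if isinstance(data, list) else [data]


# Hypothetical registry: lower-case file extension -> parser.
PARSERS: Dict[str, Callable[[str], List[Record]]] = {
    ".csv": _parse_csv,
    ".json": _parse_json,
}


def parse_text(suffix: str, text: str,
               use_gpu: Optional[bool] = None) -> Tuple[str, List[Record]]:
    """Choose a backend and a parser, returning (backend_name, records)."""
    if use_gpu is None:
        use_gpu = gpu_available()
    parser = PARSERS.get(suffix.lower())
    if parser is None:
        raise ValueError(f"unsupported data type: {suffix!r}")
    # A real GPU branch would hand the data to a GPU dataframe library;
    # this sketch only reports which backend would have been selected.
    backend = "gpu" if use_gpu else "cpu"
    return backend, parser(text)


def read_data(path: str,
              use_gpu: Optional[bool] = None) -> Tuple[str, List[Record]]:
    """Read a file from disk and dispatch on its extension."""
    p = Path(path)
    return parse_text(p.suffix, p.read_text(encoding="utf-8"), use_gpu)
```

For example, `parse_text(".csv", "a,b\n1,2\n", use_gpu=False)` returns `("cpu", [{"a": "1", "b": "2"}])`. Keeping the extension-to-parser mapping in a registry means a new data type only needs one new entry.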

Related Classes/Methods:

Data Preprocessor

Serving as the orchestrator for structured data, this component manages general cleaning, transformation, and feature engineering. It is the central hub for preparing tabular and other structured datasets, and it delegates specialized text processing to the NLP Preprocessor.
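
To illustrate the orchestration and delegation described above, here is a minimal sketch over a list of row dictionaries. Numeric columns get mean imputation and standardization, short string columns are one-hot encoded, and longer free-text columns are handed to a pluggable cleaner that stands in for the NLP Preprocessor. All function names and the three-word threshold for detecting free text are hypothetical; libra's own preprocessor may use different rules.

```python
import re
from statistics import mean, pstdev
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[object]]


def default_text_cleaner(text: str) -> str:
    # Stand-in for the NLP Preprocessor: lowercase and collapse whitespace.
    return re.sub(r"\s+", " ", text.lower()).strip()


def infer_column_kind(values: List[Optional[object]],
                      text_word_threshold: int = 3) -> str:
    """Classify a column as 'numeric', 'text', or 'categorical' (heuristic)."""
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, (int, float)) and not isinstance(v, bool)
                       for v in present):
        return "numeric"
    avg_words = mean(len(str(v).split()) for v in present) if present else 0
    return "text" if avg_words >= text_word_threshold else "categorical"


def preprocess_table(rows: List[Row],
                     text_cleaner: Callable[[str], str] = default_text_cleaner
                     ) -> List[Dict[str, object]]:
    """Clean and transform each column, delegating free text to text_cleaner."""
    columns = list(rows[0].keys()) if rows else []
    out: List[Dict[str, object]] = [{} for _ in rows]
    for col in columns:
        values = [r.get(col) for r in rows]
        kind = infer_column_kind(values)
        if kind == "numeric":
            present = [float(v) for v in values if v is not None]
            mu = mean(present)
            sigma = pstdev(present) or 1.0  # avoid division by zero
            for o, v in zip(out, values):
                filled = mu if v is None else float(v)  # mean imputation
                o[col] = (filled - mu) / sigma          # standardization
        elif kind == "text":
            # Delegation point: free text goes to the text cleaner.
            for o, v in zip(out, values):
                o[col] = text_cleaner("" if v is None else str(v))
        else:
            # One-hot encode categories; missing values get all zeros.
            categories = sorted({str(v) for v in values if v is not None})
            for o, v in zip(out, values):
                for c in categories:
                    o[f"{col}={c}"] = int(v is not None and str(v) == c)
    return out
```

Passing the text cleaner as a parameter mirrors the delegation in the diagram: the table-level code decides *which* columns are text, while a separate component decides *how* text is cleaned.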

Related Classes/Methods:

NLP Preprocessor

This specialized component focuses exclusively on text-specific preprocessing. It performs tasks such as slang correction, tokenization, normalization, and general text clean-up, ensuring text data is properly formatted and ready for Natural Language Processing models.
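
A minimal sketch of such a text pipeline follows: Unicode normalization and URL removal, regex tokenization, then dictionary-based slang correction. The slang table, the ASCII-only token pattern, and every function name are hypothetical simplifications, not libra's actual implementation.

```python
import re
import unicodedata
from typing import Dict, List, Optional

# Hypothetical slang lexicon; a real system would load a much larger table.
SLANG: Dict[str, str] = {
    "u": "you",
    "r": "are",
    "gr8": "great",
    "thx": "thanks",
    "idk": "i do not know",
}

_URL = re.compile(r"https?://\S+")
_TOKEN = re.compile(r"[a-z0-9']+")


def normalize(text: str) -> str:
    # NFKC folds compatibility forms (e.g. full-width letters) to standard ones.
    text = unicodedata.normalize("NFKC", text).lower()
    # Clean-up step: drop URLs before tokenizing.
    return _URL.sub(" ", text)


def tokenize(text: str) -> List[str]:
    # Keep runs of ASCII lowercase letters, digits, and apostrophes.
    return _TOKEN.findall(text)


def correct_slang(tokens: List[str],
                  slang: Optional[Dict[str, str]] = None) -> List[str]:
    table = SLANG if slang is None else slang
    corrected: List[str] = []
    for tok in tokens:
        # An expansion may contain several words, so split it back into tokens.
        corrected.extend(table.get(tok, tok).split())
    return corrected


def preprocess_text(text: str,
                    slang: Optional[Dict[str, str]] = None) -> List[str]:
    """normalize -> tokenize -> slang correction."""
    return correct_slang(tokenize(normalize(text)), slang)
```

For example, `preprocess_text("Thx, u r GR8! see https://example.com")` returns `["thanks", "you", "are", "great", "see"]`. Slang correction runs after tokenization so that lookups match whole tokens rather than substrings.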

Related Classes/Methods:

Image Preprocessor

This component prepares image data end to end. It handles resizing, file organization, color adjustments, and format conversions, and it supports several input structures (e.g., data organized into sets, described by a CSV file, or grouped as a single class). It ensures image data meets the quality and format requirements of image-based ML models.
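
The source does not say which imaging library libra uses, so this sketch stays in pure Python and represents an image as rows of 8-bit RGB tuples (an assumption). It shows three of the steps named above: nearest-neighbour resizing, grayscale conversion with ITU-R BT.601 luma weights, and scaling intensities to [0, 1]. The function names and the 224×224 default size are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]          # assumed 8-bit RGB, values 0..255
RGBImage = Sequence[Sequence[Pixel]]  # rows of pixels
GrayImage = List[List[float]]


def resize_nearest(img: RGBImage, new_h: int, new_w: int) -> List[List[Pixel]]:
    """Nearest-neighbour resize: each output pixel copies one source pixel."""
    if not img or not img[0]:
        raise ValueError("image must be non-empty")
    h, w = len(img), len(img[0])
    # Map output coordinates back to source coordinates by integer scaling.
    return [[img[(y * h) // new_h][(x * w) // new_w] for x in range(new_w)]
            for y in range(new_h)]


def to_grayscale(img: RGBImage) -> GrayImage:
    # ITU-R BT.601 luma weights.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in img]


def preprocess_image(img: RGBImage, size: Tuple[int, int] = (224, 224),
                     grayscale: bool = False) -> list:
    """Resize, optionally convert to grayscale, and scale values to [0, 1]."""
    resized = resize_nearest(img, *size)
    if grayscale:
        return [[v / 255.0 for v in row] for row in to_grayscale(resized)]
    return [[tuple(c / 255.0 for c in px) for px in row] for row in resized]
```

Nearest-neighbour resizing is the simplest choice and never invents new colors; production pipelines usually prefer bilinear or area interpolation for smoother downsampling.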

Related Classes/Methods:

Image Caption Helpers

This utility component provides helper functions for image caption generation. It supports the Image Preprocessor by assisting with image loading, attention mechanism integration, and other specialized operations that captioning workflows require.
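
The source mentions attention integration without naming a variant. Image-captioning decoders often use additive (Bahdanau-style) attention; the sketch below swaps in the simpler dot-product form to show the core idea. Each image-region feature vector is scored against the decoder's hidden state, the scores are turned into weights with a softmax, and the weighted sum of features becomes the context vector for the next word. `softmax` and `dot_product_attention` are hypothetical helpers, not libra functions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(features: Sequence[Sequence[float]],
                          hidden: Sequence[float]) -> Tuple[Vector, Vector]:
    """Return (attention_weights, context_vector) for one decoding step."""
    if not features:
        raise ValueError("need at least one image region")
    if any(len(f) != len(hidden) for f in features):
        raise ValueError("feature and hidden dimensions must match")
    # Score each region by its dot product with the decoder state.
    scores = [sum(a * b for a, b in zip(f, hidden)) for f in features]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of region features.
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(len(hidden))]
    return weights, context
```

With a zero hidden state every region scores equally, so the weights are uniform; as the hidden state aligns with one region's features, that region dominates the context vector.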

Related Classes/Methods: