Skip to content

Latest commit

 

History

History
121 lines (67 loc) · 8.8 KB

File metadata and controls

121 lines (67 loc) · 8.8 KB
graph LR
    mole_cli_mole_train_train["mole.cli.mole_train:train"]
    pytorch_lightning_Trainer["pytorch_lightning.Trainer"]
    mole_training_data_data_modules_MolDataModule["mole.training.data.data_modules.MolDataModule"]
    mole_training_models_base_Model["mole.training.models.base.Model"]
    mole_training_models_mole_MolE["mole.training.models.mole.MolE"]
    mole_training_models_encoder_Encoder["mole.training.models.encoder.Encoder"]
    mole_training_nn_bert["mole.training.nn.bert"]
    mole_training_utils_metrics["mole.training.utils.metrics"]
    mole_training_utils_schedulers["mole.training.utils.schedulers"]
    mole_training_models_utils_masks["mole.training.models.utils.masks"]
    mole_cli_mole_train_train -- "orchestrates" --> pytorch_lightning_Trainer
    mole_cli_mole_train_train -- "instantiates and configures" --> mole_training_data_data_modules_MolDataModule
    mole_cli_mole_train_train -- "instantiates and configures" --> mole_training_models_base_Model
    pytorch_lightning_Trainer -- "interacts with" --> mole_training_models_base_Model
    pytorch_lightning_Trainer -- "consumes data from" --> mole_training_data_data_modules_MolDataModule
    mole_training_models_base_Model -- "utilizes" --> mole_training_utils_metrics
    mole_training_models_base_Model -- "utilizes" --> mole_training_models_utils_masks
    mole_training_models_mole_MolE -- "inherits from" --> mole_training_models_base_Model
    mole_training_models_mole_MolE -- "incorporates components from" --> mole_training_models_encoder_Encoder
    mole_training_models_mole_MolE -- "incorporates components from" --> mole_training_nn_bert
    mole_cli_mole_train_train -- "uses" --> mole_training_utils_schedulers
Loading

CodeBoardingDemoContact

Details

The Training & Evaluation Manager subsystem is the core of the mole project's deep learning capabilities, embodying a Pipeline Architecture and Configuration-Driven Design. It orchestrates the entire lifecycle of model training and evaluation, from data preparation to model deployment.

mole.cli.mole_train:train

This function serves as the primary entry point and orchestrator for the entire training and evaluation pipeline. It is responsible for reading the configuration (likely via Hydra), instantiating the PyTorch Lightning Trainer, the MolDataModule, and the specific Model to be trained. It also handles crucial setup steps like seeding for reproducibility, preparing data, calculating learning rate schedules, logging hyperparameters, and finally initiating the training process via trainer.fit(). It's fundamental as the high-level control flow manager.

Related Classes/Methods:

pytorch_lightning.Trainer

This is the central execution engine provided by PyTorch Lightning. It abstracts away the complexities of the training loop, handling forward and backward passes, optimization, logging, checkpointing, and distributed training. It interacts directly with the Model and DataModule to execute the training and validation steps. It's fundamental because it provides the robust and scalable infrastructure for deep learning training.

Related Classes/Methods: None

mole.training.data.data_modules.MolDataModule

This component is responsible for managing the entire data pipeline, including loading, preprocessing, and batching of molecular data. It provides DataLoader instances for training, validation, and testing phases to the Trainer. It also encapsulates data preparation logic, such as downloading datasets if necessary. It's fundamental for implementing a Data-Centric Architecture, ensuring efficient and consistent data flow.

Related Classes/Methods:

mole.training.models.base.Model

This serves as the abstract base class for all deep learning models within the mole.training framework. It defines a consistent interface for common model operations, including training_step, validation_step, test_step, and metric logging. Concrete model implementations inherit from this class, ensuring architectural consistency and extensibility. It's fundamental for defining the "Model" aspect of the conceptual MVC pattern.

Related Classes/Methods:

mole.training.models.mole.MolE

This is a concrete implementation of a molecular deep learning model, inheriting from mole.training.models.base.Model. It encapsulates the specific neural network architecture and forward pass logic for molecular embeddings and predictions. It represents the actual deep learning model being trained and evaluated. It's fundamental as the specific "subject" of the training process.

Related Classes/Methods:

mole.training.models.encoder.Encoder

This component likely serves as a base or abstract class for various encoder architectures used within the MolE model or other molecular models. Encoders are crucial for transforming raw molecular data into meaningful numerical representations that the neural network can process. It's fundamental as a reusable building block for complex model architectures.

Related Classes/Methods:

mole.training.nn.bert

This module contains various BERT-like neural network components (e.g., BertAttention, BertEmbeddings, BertEncoder, TaskPredictionHead). These components are fundamental building blocks for constructing advanced molecular models, especially those leveraging transformer-like architectures for sequence or graph data. They provide the core neural network layers for learning complex representations.

Related Classes/Methods:

mole.training.utils.metrics

This utility module provides functions and classes for calculating, aggregating, and reporting various evaluation metrics (e.g., regression metrics, classification metrics). It's crucial for monitoring model performance during training and evaluation, enabling informed decisions about model optimization. It's fundamental for the "Evaluation" aspect of the pipeline.

Related Classes/Methods:

mole.training.utils.schedulers

This utility module contains helper functions for managing learning rate schedules, such as calculating the number of warmup steps based on the total training steps. Learning rate scheduling is a critical optimization technique for stable and efficient model training. It's fundamental for fine-tuning the training process.

Related Classes/Methods:

mole.training.models.utils.masks

This utility module provides functions for creating or manipulating masks. Masks are often used in attention mechanisms within molecular models to handle variable-length inputs, pad sequences, or focus on specific parts of a molecular graph. It's fundamental for correctly processing and interpreting complex molecular data within the neural network.

Related Classes/Methods: