```mermaid
graph LR
    train_openfold_py["train_openfold.py"]
    run_pretrained_openfold_py["run_pretrained_openfold.py"]
    openfold_train_openfold_OpenFoldWrapper["openfold.train_openfold.OpenFoldWrapper"]
    openfold_data_data_modules_OpenFoldDataModule["openfold.data.data_modules.OpenFoldDataModule"]
    openfold_utils_loss_AlphaFoldLoss["openfold.utils.loss.AlphaFoldLoss"]
    openfold_config["openfold.config"]
    openfold_data_data_pipeline_AlignmentRunner["openfold.data.data_pipeline.AlignmentRunner"]
    openfold_data_feature_pipeline_FeaturePipeline["openfold.data.feature_pipeline.FeaturePipeline"]
    openfold_model_model_AlphaFold["openfold.model.model.AlphaFold"]
    openfold_utils_callbacks["openfold.utils.callbacks"]
    train_openfold_py -- "Orchestrates" --> openfold_train_openfold_OpenFoldWrapper
    train_openfold_py -- "Configures" --> openfold_data_data_modules_OpenFoldDataModule
    train_openfold_py -- "Utilizes" --> openfold_utils_loss_AlphaFoldLoss
    train_openfold_py -- "Integrates" --> openfold_utils_callbacks
    run_pretrained_openfold_py -- "Orchestrates" --> openfold_model_model_AlphaFold
    run_pretrained_openfold_py -- "Utilizes" --> openfold_data_data_pipeline_AlignmentRunner
    run_pretrained_openfold_py -- "Utilizes" --> openfold_data_feature_pipeline_FeaturePipeline
    openfold_train_openfold_OpenFoldWrapper -- "Encapsulates" --> openfold_model_model_AlphaFold
    openfold_train_openfold_OpenFoldWrapper -- "Uses" --> openfold_utils_loss_AlphaFoldLoss
    openfold_data_data_modules_OpenFoldDataModule -- "Uses" --> openfold_data_feature_pipeline_FeaturePipeline
    openfold_config -- "Configures" --> openfold_train_openfold_OpenFoldWrapper
    openfold_config -- "Configures" --> openfold_data_data_modules_OpenFoldDataModule
    openfold_config -- "Configures" --> openfold_model_model_AlphaFold
    openfold_config -- "Configures" --> openfold_data_data_pipeline_AlignmentRunner
    openfold_data_data_pipeline_AlignmentRunner -- "Produces data consumed by" --> openfold_data_feature_pipeline_FeaturePipeline
    openfold_data_feature_pipeline_FeaturePipeline -- "Produces input for" --> openfold_model_model_AlphaFold
```

Details

The Training & Inference Orchestration subsystem in OpenFold provides the entry points and control flow for both model training and structure prediction. It integrates with PyTorch Lightning, which supplies the training loop and manages devices and other execution resources.

train_openfold.py

This script is the entry point for training. It builds the PyTorch Lightning Trainer and configures the OpenFoldWrapper, the OpenFoldDataModule, the loss function, the learning rate scheduler, and the callbacks that monitor training and save checkpoints.
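
As an illustration of the kind of learning rate schedule such a training script might configure, here is a dependency-free sketch of linear warmup followed by stepwise decay. The function name and every default value are assumptions chosen for the example, not values read from OpenFold's scheduler.

```python
def warmup_then_decay_lr(
    step: int,
    base_lr: float = 1e-3,       # assumed peak learning rate
    warmup_steps: int = 1000,    # assumed warmup length
    decay_start: int = 50_000,   # assumed step at which decay begins
    decay_every: int = 50_000,   # assumed interval between decays
    decay_factor: float = 0.95,  # assumed multiplicative decay
) -> float:
    """Return the learning rate for a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup period.
        return base_lr * step / warmup_steps
    if step < decay_start:
        # Plateau at the peak learning rate.
        return base_lr
    # One decay is applied at decay_start, then one more per full interval.
    n_decays = (step - decay_start) // decay_every + 1
    return base_lr * decay_factor ** n_decays


if __name__ == "__main__":
    for s in (0, 500, 1000, 50_000, 100_000):
        print(s, warmup_then_decay_lr(s))
```

In a Lightning-based setup, a function like this would typically be wrapped in a scheduler object returned alongside the optimizer; that wiring is omitted here.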

Related Classes/Methods:

run_pretrained_openfold.py

This script is the entry point for inference. It parses command-line arguments, loads the model configuration and weights, precomputes alignments, generates input features, runs the AlphaFold model, and applies post-processing steps such as Amber relaxation.
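
The order of these stages can be sketched as a simple sequential pipeline. Every function below is a hypothetical stub that only records what a real stage would produce; none of them is an OpenFold API, and the real script does considerably more work at each step.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Stage = Tuple[str, Callable[[State], State]]


# Hypothetical stand-ins for the real stages.
def load_model(state: State) -> State:
    return {**state, "model": "stub-model"}


def compute_alignments(state: State) -> State:
    return {**state, "alignments": f"msa-for-{state['sequence']}"}


def build_features(state: State) -> State:
    return {**state, "features": {"length": len(state["sequence"])}}


def predict(state: State) -> State:
    n = state["features"]["length"]
    return {**state, "structure": f"unrelaxed-{n}-residues"}


def relax(state: State) -> State:
    relaxed = state["structure"].replace("unrelaxed", "relaxed")
    return {**state, "structure": relaxed}


INFERENCE_STAGES: List[Stage] = [
    ("load_model", load_model),
    ("compute_alignments", compute_alignments),
    ("build_features", build_features),
    ("predict", predict),
    ("relax", relax),
]


def run_pipeline(sequence: str, stages: List[Stage] = INFERENCE_STAGES):
    """Run each stage in order, threading a state dict through them."""
    state: State = {"sequence": sequence}
    executed: List[str] = []
    for name, stage in stages:
        state = stage(state)
        executed.append(name)
    return state, executed


if __name__ == "__main__":
    final_state, order = run_pipeline("MKV")
    print(order)
    print(final_state["structure"])
```

Each stage reads only what earlier stages wrote, which is why the alignment step must precede feature generation and relaxation must come last.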

Related Classes/Methods:

openfold.train_openfold.OpenFoldWrapper

This PyTorch Lightning module wraps the AlphaFold model. It defines the forward pass, computes the loss, and implements the training and validation steps, including Exponential Moving Average (EMA) updates of the model weights and metric logging.
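
To make the EMA step concrete, the following minimal sketch applies the standard update rule, shadow = decay * shadow + (1 - decay) * parameter, to plain floats. The function name and the decay value are assumptions for illustration; a real implementation operates on tensors and is not shown here.

```python
from typing import Dict


def ema_update(
    shadow: Dict[str, float],
    params: Dict[str, float],
    decay: float = 0.999,  # assumed value; a configuration would set this
) -> Dict[str, float]:
    """Return new shadow weights after one EMA step."""
    return {
        name: decay * shadow[name] + (1.0 - decay) * value
        for name, value in params.items()
    }


if __name__ == "__main__":
    shadow = {"w": 1.0}
    # Pretend the optimizer moved the live weight to 2.0.
    shadow = ema_update(shadow, {"w": 2.0}, decay=0.9)
    print(shadow)
```

The averaged weights change more slowly than the live weights, which is why they are commonly used for validation and for exported checkpoints.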

Related Classes/Methods:

openfold.data.data_modules.OpenFoldDataModule

This PyTorch Lightning DataModule loads, preprocesses, and batches data for training and validation. It uses the DataPipeline and FeaturePipeline to prepare the model's input features.

Related Classes/Methods:

openfold.utils.loss.AlphaFoldLoss

This class defines the composite loss used to train the AlphaFold model. It combines individual loss terms (e.g., FAPE, distogram, and masked MSA losses) into a single training objective.
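
One common way to build such a composite objective is a weighted sum of the individual terms. The sketch below assumes that form; the function name and the weights are made up for the example and are not taken from OpenFold's configuration.

```python
from typing import Dict


def combine_losses(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over the loss terms named in `weights`."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"no value computed for loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


if __name__ == "__main__":
    # Hypothetical per-term values and weights.
    terms = {"fape": 2.0, "distogram": 1.0, "masked_msa": 0.5}
    weights = {"fape": 1.0, "distogram": 0.3, "masked_msa": 2.0}
    print(combine_losses(terms, weights))
```

Keeping the weights in a separate mapping means a term can be switched off by setting its weight to zero without touching the code that computes it.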

Related Classes/Methods:

openfold.config

This centralized module defines the hyperparameters, model architecture settings, data pipeline settings, and training and inference parameters. Keeping them in one place gives experiments a single source of truth, which supports reproducibility and makes variations easy to express.
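
The single-source-of-truth idea can be sketched as a base configuration plus named presets that override only what differs. All keys, values, and preset names below are hypothetical and do not mirror the actual contents of openfold.config.

```python
import copy
from typing import Any, Dict, Optional

# Hypothetical base settings shared by every run.
BASE_CONFIG: Dict[str, Any] = {
    "model": {"num_recycle": 3, "dropout": 0.15},
    "data": {"crop_size": 256},
    "train": {"lr": 1e-3},
}

# Hypothetical presets: each lists only the values it changes.
PRESETS: Dict[str, Dict[str, Any]] = {
    "inference": {"model": {"dropout": 0.0}},
    "finetune": {"data": {"crop_size": 384}},
}


def deep_update(base: Dict[str, Any], overrides: Dict[str, Any]) -> None:
    """Recursively merge `overrides` into `base` in place."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value


def make_config(preset: Optional[str] = None) -> Dict[str, Any]:
    """Return a fresh config; the shared base is never mutated."""
    cfg = copy.deepcopy(BASE_CONFIG)
    if preset is not None:
        deep_update(cfg, PRESETS[preset])
    return cfg


if __name__ == "__main__":
    print(make_config("inference"))
```

Copying the base before applying overrides keeps presets independent of each other, so building one configuration cannot leak settings into the next.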

Related Classes/Methods:

openfold.data.data_pipeline.AlignmentRunner

Used by run_pretrained_openfold.py, this class generates Multiple Sequence Alignments (MSAs) and searches for structural templates by running external bioinformatics tools.

Related Classes/Methods:

openfold.data.feature_pipeline.FeaturePipeline

This component transforms raw biological data and the generated alignments into the dictionaries of numerical features (tensors) that the AlphaFold model consumes directly. Both training and inference rely on it for data preparation.
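
As a much-simplified illustration of turning raw sequence data into a numerical feature dictionary, the sketch below encodes an amino acid string as integer indices and one-hot rows using plain lists. The function and its output keys are assumptions for the example; the real pipeline produces many more features as tensors, including ones derived from MSAs and templates.

```python
from typing import Any, Dict

# The 20 standard amino acids; index 20 is reserved for anything else.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
UNKNOWN_INDEX = len(AMINO_ACIDS)


def featurize_sequence(sequence: str) -> Dict[str, Any]:
    """Map a sequence to integer residue types and one-hot rows."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    aatype = [index.get(res, UNKNOWN_INDEX) for res in sequence.upper()]
    num_classes = len(AMINO_ACIDS) + 1
    one_hot = [
        [1.0 if j == idx else 0.0 for j in range(num_classes)]
        for idx in aatype
    ]
    return {"aatype": aatype, "one_hot": one_hot, "seq_length": len(sequence)}


if __name__ == "__main__":
    feats = featurize_sequence("AXV")
    print(feats["aatype"])
```

Reserving an explicit index for unknown residues keeps the feature shape fixed even when the input contains non-standard characters.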

Related Classes/Methods:

openfold.model.model.AlphaFold

The core deep learning model that predicts protein structures. Both the training and inference paths run it as their central computation.

Related Classes/Methods:

openfold.utils.callbacks

This module provides PyTorch Lightning callbacks such as EarlyStoppingVerbose. Together with callbacks like ModelCheckpoint (which ships with PyTorch Lightning itself rather than this module), they monitor training progress, save model checkpoints, and stop training before the model overfits.
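
The core logic behind an early-stopping callback is a patience counter over a monitored metric. The class below is a hypothetical, framework-free sketch of that logic; it is not the EarlyStoppingVerbose implementation and omits the Lightning hooks a real callback would use.

```python
class PatienceTracker:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    tracker = PatienceTracker(patience=2)
    for epoch, loss in enumerate([1.0, 0.9, 0.95, 0.97]):
        if tracker.should_stop(loss):
            print(f"stopping after epoch {epoch}")
            break
```

The `min_delta` threshold prevents tiny fluctuations in the validation metric from resetting the counter.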

Related Classes/Methods: