graph LR
Training_Workflow_Manager["Training Workflow Manager"]
Evaluation_Workflow_Manager["Evaluation Workflow Manager"]
Optimizer_Integrator["Optimizer Integrator"]
Loss_Function_Applicator["Loss Function Applicator"]
Metric_Reporter["Metric Reporter"]
GNN_Model_Implementations["GNN Model Implementations"]
Dataset_DataLoader["Dataset/DataLoader"]
Distributed_Training_Coordinator["Distributed Training Coordinator"]
Training_Workflow_Manager -- "requests data from" --> Dataset_DataLoader
Training_Workflow_Manager -- "invokes" --> GNN_Model_Implementations
Training_Workflow_Manager -- "utilizes" --> Loss_Function_Applicator
Training_Workflow_Manager -- "directs" --> Optimizer_Integrator
Training_Workflow_Manager -- "delegates to" --> Distributed_Training_Coordinator
Training_Workflow_Manager -- "provides metrics to" --> Metric_Reporter
Evaluation_Workflow_Manager -- "requests data from" --> Dataset_DataLoader
Evaluation_Workflow_Manager -- "invokes" --> GNN_Model_Implementations
Evaluation_Workflow_Manager -- "provides metrics to" --> Metric_Reporter
GNN_Model_Implementations -- "provides predictions to" --> Loss_Function_Applicator
Dataset_DataLoader -- "provides targets to" --> Loss_Function_Applicator
Dataset_DataLoader -- "provides data to" --> Training_Workflow_Manager
Dataset_DataLoader -- "provides data to" --> Evaluation_Workflow_Manager
click GNN_Model_Implementations href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/PGL/GNN_Model_Implementations.md" "Details"
The PGL (Paddle Graph Learning) subsystem is designed to facilitate the development and deployment of Graph Neural Networks. Its core functionality revolves around a modular architecture that separates concerns such as data handling, model implementation, training orchestration, and evaluation. The system provides specialized components for managing the entire lifecycle of GNN-based applications, from data loading and batching to distributed training and performance reporting. This design promotes flexibility, allowing researchers and developers to easily integrate custom GNN models, loss functions, and optimization strategies while leveraging PGL's robust data and distributed training capabilities.
Training Workflow Manager
Orchestrates the complete training lifecycle: fetching data, invoking the GNN model for forward computation, calculating the loss, performing backpropagation to compute gradients, and applying the optimizer to update model parameters. It manages the training loop across epochs and mini-batches.
Related Classes/Methods:
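The loop described above can be sketched in plain Python. This is an illustrative stand-in, not PGL code: the "model" is a single scalar weight and the gradient is computed analytically rather than by PaddlePaddle's autograd.

```python
# Minimal sketch of the training loop the manager orchestrates.
# All names are illustrative; a real PGL workflow would use a GNN model,
# Paddle autograd, and a Paddle optimizer instead.
def train(model_params, data_batches, lr=0.1, epochs=5):
    w = model_params["w"]                  # toy model: pred = w * x
    for epoch in range(epochs):
        for x, y in data_batches:          # fetch a mini-batch (here: one sample)
            pred = w * x                   # forward computation
            loss = (pred - y) ** 2         # loss calculation (squared error)
            grad = 2 * (pred - y) * x      # backpropagation (analytic gradient)
            w -= lr * grad                 # optimizer applies the update
    return {"w": w}
```

With data drawn from y = 2x, the loop converges toward w = 2.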
Evaluation Workflow Manager
Manages the model evaluation process. This includes loading evaluation datasets, performing inference with the trained model, and computing evaluation metrics.
Related Classes/Methods:
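A minimal sketch of that evaluation flow, with hypothetical names rather than PGL APIs: run inference over each batch and accumulate a metric.

```python
def evaluate(predict_fn, data_batches):
    """Run inference over evaluation batches and compute accuracy.
    Illustrative only; a real run would use the trained GNN model
    with gradient tracking disabled."""
    correct = total = 0
    for features, labels in data_batches:
        preds = [predict_fn(x) for x in features]   # inference pass
        correct += sum(p == y for p, y in zip(preds, labels))
        total += len(labels)
    return {"accuracy": correct / total}
```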
Optimizer Integrator
Provides a bridge between the training workflow and PaddlePaddle's optimization routines, ensuring correct application of parameter updates.
Related Classes/Methods:
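Conceptually, the bridge holds the parameters and applies framework-computed gradients to them. A toy stand-in for what a vanilla-SGD optimizer does per step (illustrative names, not PaddlePaddle's classes):

```python
class SGDOptimizer:
    """Toy vanilla-SGD update; a stand-in, not PaddlePaddle's optimizer."""
    def __init__(self, params, lr=0.01):
        self.params = params  # dict of name -> value, updated in place
        self.lr = lr

    def step(self, grads):
        # apply one gradient-descent update per parameter
        for name, grad in grads.items():
            self.params[name] -= self.lr * grad
```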
Loss Function Applicator
Computes the loss value based on model predictions and ground truth labels, guiding the optimization process.
Related Classes/Methods:
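For classification tasks, a common choice is softmax cross-entropy. A self-contained sketch of the computation (not PGL's implementation):

```python
import math

def softmax(logits):
    m = max(logits)                          # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_idx):
    # negative log-likelihood of the ground-truth class
    return -math.log(softmax(logits)[target_idx])
```

Uniform logits give a loss of ln(num_classes), e.g. ln 2 ≈ 0.693 for two classes.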
Metric Reporter
Gathers and presents key performance indicators (e.g., accuracy, F1-score, AUC) from both training and evaluation phases.
Related Classes/Methods:
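Two of those indicators can be computed directly from predictions and labels; the sketch below is a generic binary-label version, not PGL's reporter.

```python
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def f1_score(preds, labels):
    # binary F1 from true/false positives and false negatives
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum(not p and y for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```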
GNN Model Implementations
Contains the core logic for various GNN models, performing forward computation and inference based on graph data.
Related Classes/Methods:
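The essence of that forward computation is message passing: aggregate neighbor features, then transform them. Below is a scalar-feature sketch of one such step; a real layer (such as a GCN convolution) operates on feature matrices with learned weight matrices.

```python
def gcn_layer(edges, features, weight):
    """One mean-aggregation message-passing step over scalar node features.
    edges: list of (src, dst) pairs; features: one float per node.
    Illustrative sketch, not PGL's layer implementation."""
    n = len(features)
    agg = [0.0] * n
    deg = [0] * n
    for src, dst in edges:                  # each edge sends src's feature to dst
        agg[dst] += features[src]
        deg[dst] += 1
    # mean over received messages (fall back to the node's own feature if it
    # has no incoming edges), then a linear transform by `weight`
    return [weight * (agg[i] / deg[i] if deg[i] else features[i]) for i in range(n)]
```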
Dataset/DataLoader
Provides mini-batches of graph data for training and evaluation, abstracting data loading and batching complexities.
Related Classes/Methods:
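At its simplest, batching slices the sample list into fixed-size chunks. The sketch below shows only that core idea; a real graph data loader would additionally shuffle samples and collate subgraphs.

```python
def batch_iter(samples, batch_size):
    """Yield successive mini-batches; the last batch may be smaller.
    Illustrative sketch, not PGL's loader."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]
```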
Distributed Training Coordinator
Orchestrates the distribution of graph data, model parameters, and computations across multiple nodes or devices, leveraging PaddleFleet for scalable training.
Related Classes/Methods:
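In data-parallel training, a key coordination step is averaging per-parameter gradients across workers (an all-reduce). A toy single-process sketch of that reduction; in practice this is performed by the framework's distributed collectives, not in user code.

```python
def allreduce_mean(worker_grads):
    """Average per-parameter gradients across workers.
    Toy stand-in for the all-reduce a distributed backend performs."""
    n = len(worker_grads)
    return {name: sum(g[name] for g in worker_grads) / n
            for name in worker_grads[0]}
```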