graph LR
    SparkModel["SparkModel"]
    Distributed_Training_Orchestrator["Distributed Training Orchestrator"]
    Distributed_Prediction_Orchestrator["Distributed Prediction Orchestrator"]
    Distributed_Evaluation_Orchestrator["Distributed Evaluation Orchestrator"]
    Parameter_Server_Startup["Parameter Server Startup"]
    Parameter_Server_Shutdown["Parameter Server Shutdown"]
    Master_Network_Communication["Master Network Communication"]
    SparkModel -- "delegates to" --> Distributed_Training_Orchestrator
    SparkModel -- "delegates to" --> Distributed_Prediction_Orchestrator
    SparkModel -- "delegates to" --> Distributed_Evaluation_Orchestrator
    Distributed_Training_Orchestrator -- "initiates" --> Parameter_Server_Startup
    Distributed_Training_Orchestrator -- "calls" --> Parameter_Server_Shutdown
    Distributed_Prediction_Orchestrator -- "utilizes" --> Master_Network_Communication
    Distributed_Evaluation_Orchestrator -- "utilizes" --> Master_Network_Communication

Details

The Spark Driver Orchestrator subsystem in Elephas manages the overall distributed deep learning workflow from the Spark driver. This includes distributing data, broadcasting models to workers, and managing the lifecycle of the Parameter Server, which keeps model parameters synchronized across workers during distributed training.

SparkModel

The primary user-facing API and entry point for integrating Keras models with Spark. It encapsulates the Keras model and provides high-level methods for distributed training, prediction, and evaluation.
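
The facade pattern described above can be sketched in plain Python so that it runs without Spark or Keras. Everything here is a hypothetical illustration, not Elephas's actual code: `ToySparkModel`, its one-weight linear model y = w·x, and the `_fit`/`_predict`/`_evaluate` helpers. Python lists stand in for RDD partitions, and averaging per-partition least-squares estimates is only a stand-in for real distributed training.

```python
from typing import List, Sequence, Tuple

Partition = List[Tuple[float, float]]  # (x, y) pairs; stands in for one RDD partition


class ToySparkModel:
    """Hypothetical facade: public methods prepare inputs, then delegate to orchestrators."""

    def __init__(self, weight: float = 0.0, num_partitions: int = 2):
        self.weight = weight  # the "master" model: y = weight * x
        self.num_partitions = num_partitions

    def _split(self, items: Sequence) -> List[list]:
        # Contiguous chunks keep record order, like a freshly parallelized RDD.
        size = max(1, -(-len(items) // self.num_partitions))  # ceiling division
        return [list(items[i:i + size]) for i in range(0, len(items), size)]

    # --- public, user-facing API ---
    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> None:
        self._fit(self._split(list(zip(xs, ys))))

    def predict(self, xs: Sequence[float]) -> List[float]:
        return self._predict(self._split(list(xs)))

    def evaluate(self, xs: Sequence[float], ys: Sequence[float]) -> float:
        return self._evaluate(self._split(list(zip(xs, ys))))

    # --- internal orchestrators (the delegation targets) ---
    def _fit(self, partitions: List[Partition]) -> None:
        # Each partition computes a local least-squares weight; the driver averages them.
        local = [sum(x * y for x, y in p) / sum(x * x for x, _ in p) for p in partitions]
        self.weight = sum(local) / len(local)

    def _predict(self, partitions: List[List[float]]) -> List[float]:
        return [self.weight * x for p in partitions for x in p]

    def _evaluate(self, partitions: List[Partition]) -> float:
        # Mean squared error over all records.
        errors = [(self.weight * x - y) ** 2 for p in partitions for x, y in p]
        return sum(errors) / len(errors)
```

For example, fitting on xs = [1, 2, 3, 4] and ys = [2, 4, 6, 8] sets the weight to 2.0, after which `predict([5, 6])` returns `[10.0, 12.0]`. The user only ever calls the three public methods; the partition-aware work lives behind them.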

Related Classes/Methods:

Distributed Training Orchestrator

The core orchestrator for the distributed training workflow. It manages the entire training lifecycle from the Spark driver: partitioning the data, distributing the model to workers, and aggregating the weight updates they send back. It also starts and stops the Parameter Server.
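
One round of the driver-side loop can be sketched as follows, assuming synchronous, weight-averaging training. This is one of the training modes Elephas offers; its asynchronous modes route updates through the Parameter Server instead. The function names, learning rate, and two-weight linear model y = w0 + w1·x are hypothetical, and a list comprehension stands in for the Spark job that runs `worker_train` on each partition.

```python
from typing import List, Tuple

Partition = List[Tuple[float, float]]  # (x, y) pairs held by one worker


def worker_train(weights: List[float], partition: Partition, lr: float, epochs: int) -> List[float]:
    """Runs local SGD on a copy of the broadcast weights for y = w0 + w1 * x."""
    w = list(weights)  # workers never mutate the driver's copy
    for _ in range(epochs):
        for x, y in partition:
            err = (w[0] + w[1] * x) - y
            w[0] -= lr * err      # gradient of 0.5 * err**2 w.r.t. w0
            w[1] -= lr * err * x  # gradient of 0.5 * err**2 w.r.t. w1
    return w


def train_synchronous(data: Partition, num_partitions: int, rounds: int,
                      lr: float = 0.05, local_epochs: int = 1) -> List[float]:
    # 1. Partition the data (Spark would do this when the RDD is created).
    partitions = [data[i::num_partitions] for i in range(num_partitions)]
    master = [0.0, 0.0]
    for _ in range(rounds):
        # 2. "Broadcast" the master weights and train on every partition.
        local = [worker_train(master, p, lr, local_epochs) for p in partitions]
        # 3. Aggregate on the driver: element-wise average of the workers' weights.
        master = [sum(ws) / len(local) for ws in zip(*local)]
    return master
```

On noiseless data such as `[(i / 10, 1.0 + 2.0 * i / 10) for i in range(10)]`, a few thousand rounds bring the master weights close to `[1.0, 2.0]`. Averaging whole weight vectors, rather than shipping every gradient, keeps driver traffic to one exchange per worker per round.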

Related Classes/Methods:

Distributed Prediction Orchestrator

The internal method responsible for orchestrating distributed predictions across the Spark cluster, applying the trained model to new data in parallel.
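
The prediction pattern can be sketched without Spark. Each partition rebuilds the model once from the broadcast weights and scores all its records. The driver then concatenates the results in partition order, as `RDD.mapPartitions` followed by `collect()` would. `build_model`, `predict_partition`, `map_partitions`, and the linear model are hypothetical stand-ins.

```python
from typing import Callable, Iterable, Iterator, List


def build_model(weights: List[float]) -> Callable[[float], float]:
    """Hypothetical model factory: y = w0 + w1 * x."""
    w0, w1 = weights
    return lambda x: w0 + w1 * x


def predict_partition(weights: List[float], rows: Iterable[float]) -> Iterator[float]:
    model = build_model(weights)  # built once per partition, not once per record
    for x in rows:
        yield model(x)


def map_partitions(partitions: List[List[float]],
                   fn: Callable[[List[float]], Iterable[float]]) -> List[float]:
    # Stand-in for rdd.mapPartitions(fn).collect(): results keep partition order.
    out: List[float] = []
    for part in partitions:
        out.extend(fn(part))
    return out


def distributed_predict(weights: List[float], data: List[float], num_partitions: int) -> List[float]:
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    partitions = [data[i:i + size] for i in range(0, len(data), size)]
    return map_partitions(partitions, lambda rows: predict_partition(weights, rows))
```

Building the model per partition rather than per record matters in practice, because deserializing a Keras model is far more expensive than a single forward pass.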

Related Classes/Methods:

Distributed Evaluation Orchestrator

The internal method responsible for orchestrating distributed model evaluation on the Spark cluster, assessing model performance on distributed datasets.
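
The aggregation step can be sketched as follows, assuming mean squared error as the metric. Each partition returns a (sum of losses, record count) pair, and the driver combines them. This gives the true global mean even when partitions differ in size; averaging per-partition means would not. The function names and metric are hypothetical.

```python
from typing import List, Tuple

Partition = List[Tuple[float, float]]  # (x, y) pairs


def evaluate_partition(weights: List[float], rows: Partition) -> Tuple[float, int]:
    """Worker side: partial sum of squared errors for y = w0 + w1 * x, plus the row count."""
    w0, w1 = weights
    total = sum((w0 + w1 * x - y) ** 2 for x, y in rows)
    return total, len(rows)


def distributed_evaluate(weights: List[float], partitions: List[Partition]) -> float:
    # Driver side: reduce the partial results (like rdd.map(...).reduce(...)), divide once.
    partials = [evaluate_partition(weights, p) for p in partitions]
    total_loss = sum(t for t, _ in partials)
    total_count = sum(n for _, n in partials)
    return total_loss / total_count
```

With weights `[0.0, 1.0]` and partitions `[[(1, 1)], [(1, 2), (2, 4), (3, 3)]]`, the partial results are (0, 1) and (5, 3). The global MSE is therefore 5 / 4 = 1.25, whereas naively averaging the two partition means would give about 0.83.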

Related Classes/Methods:

Parameter Server Startup

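Starts the Parameter Server on the Spark driver. Workers fetch the current model parameters from this server and send their updates back to it, which keeps parameters synchronized during distributed training; in Elephas this applies to the asynchronous and hogwild training modes.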
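
Driver-side startup can be sketched with only the standard library. An HTTP server runs on a background daemon thread, serves the current weights as JSON, and applies posted deltas under a lock. Elephas's own servers differ in detail, such as protocol and serialization. The `ToyParameterServer` class, its `/parameters` and `/update` routes, and the `pull_weights`/`push_delta` worker helpers are illustrative assumptions.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import List

# Loopback traffic only: ignore any HTTP proxy configured in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class ToyParameterServer:
    """Hypothetical parameter server: holds weights, serves them, applies worker deltas."""

    def __init__(self, weights: List[float], host: str = "127.0.0.1", port: int = 0):
        self.weights = list(weights)
        self.lock = threading.Lock()
        ps = self

        class Handler(BaseHTTPRequestHandler):
            def _reply(self, payload: bytes, status: int = 200) -> None:
                self.send_response(status)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(payload)))
                self.end_headers()
                self.wfile.write(payload)

            def do_GET(self):
                if self.path != "/parameters":
                    return self._reply(b"{}", 404)
                with ps.lock:
                    body = json.dumps(ps.weights).encode()
                self._reply(body)

            def do_POST(self):
                if self.path != "/update":
                    return self._reply(b"{}", 404)
                length = int(self.headers.get("Content-Length", 0))
                delta = json.loads(self.rfile.read(length))
                with ps.lock:  # serialize concurrent worker updates
                    ps.weights = [w + d for w, d in zip(ps.weights, delta)]
                self._reply(b"{}")

            def log_message(self, *args):  # keep the driver log quiet
                pass

        # Port 0 lets the OS pick a free port; the socket is bound here.
        self.httpd = ThreadingHTTPServer((host, port), Handler)
        self.thread = threading.Thread(target=self.httpd.serve_forever, daemon=True)

    @property
    def url(self) -> str:
        host, port = self.httpd.server_address[:2]
        return f"http://{host}:{port}"

    def start(self) -> None:
        self.thread.start()  # non-blocking: the driver can go on scheduling Spark jobs

    def stop(self) -> None:
        self.httpd.shutdown()
        self.httpd.server_close()
        self.thread.join()


def pull_weights(base_url: str) -> List[float]:
    """Worker side: fetch the current weights."""
    with _opener.open(base_url + "/parameters") as resp:
        return json.load(resp)


def push_delta(base_url: str, delta: List[float]) -> None:
    """Worker side: send a weight delta to be added on the server."""
    req = urllib.request.Request(base_url + "/update", data=json.dumps(delta).encode(), method="POST")
    with _opener.open(req) as resp:
        resp.read()
```

Running the server on a daemon thread keeps `start()` non-blocking, and the lock makes each delta application atomic. Hogwild-style training would deliberately drop that lock.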

Related Classes/Methods:

Parameter Server Shutdown

Manages the graceful shutdown of the Parameter Server process on the Spark driver, ensuring proper resource release.
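
A driver can guarantee that shutdown runs even when a training job fails by wrapping startup and shutdown in a context manager. The sketch below uses the standard library's `ThreadingHTTPServer` as a stand-in server. The `parameter_server` helper and its do-nothing handler are hypothetical, and Elephas's actual shutdown mechanism may differ.

```python
import contextlib
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Iterator, Tuple


class _NoopHandler(BaseHTTPRequestHandler):
    def log_message(self, *args):  # silence per-request logging
        pass


@contextlib.contextmanager
def parameter_server(host: str = "127.0.0.1") -> Iterator[Tuple[ThreadingHTTPServer, threading.Thread]]:
    httpd = ThreadingHTTPServer((host, 0), _NoopHandler)  # port 0: OS picks a free port
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    try:
        yield httpd, thread
    finally:
        # Runs on normal exit and also when the training code raises.
        httpd.shutdown()        # stop the serve_forever loop (blocks until it exits)
        httpd.server_close()    # release the listening socket
        thread.join(timeout=5)  # make sure the background thread has finished
```

Tying the server's lifetime to a `with` block (or an equivalent `try`/`finally`) means a failed Spark job cannot leave a listening socket or a background thread behind on the driver.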

Related Classes/Methods:

Master Network Communication

A component within SparkModel that supports distributed prediction and evaluation by making the driver's master model (its architecture and trained weights) available to worker nodes.
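
The idea can be sketched under the assumption that the master network's weights are serialized once on the driver and each worker deserializes its own copy, which is the role Spark broadcast variables play. `ToyBroadcast` is a hypothetical stand-in for the object returned by `SparkContext.broadcast`. The point it illustrates is that workers can read the master weights but cannot change the driver's copy.

```python
import pickle
from typing import Dict, List


class ToyBroadcast:
    """Hypothetical stand-in for a Spark broadcast variable: pickled once, read by workers."""

    def __init__(self, value):
        self._payload = pickle.dumps(value)  # serialized once on the driver

    @property
    def value(self):
        # Every access returns a fresh copy, mimicking a worker deserializing the payload.
        # (Real PySpark caches the value per worker process, but never writes back to the driver.)
        return pickle.loads(self._payload)


def worker_predict(bc: ToyBroadcast, xs: List[float]) -> List[float]:
    """Hypothetical worker task: reads the master weights and scores its records."""
    weights: Dict[str, float] = bc.value
    preds = [weights["bias"] + weights["slope"] * x for x in xs]
    weights["slope"] = 0.0  # a local mutation that never reaches the driver
    return preds
```

Broadcasting the model once, instead of capturing it in every task closure, avoids re-shipping large weight arrays with each task.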

Related Classes/Methods: