graph LR
SparkModel["SparkModel"]
Distributed_Training_Orchestrator["Distributed Training Orchestrator"]
Distributed_Prediction_Orchestrator["Distributed Prediction Orchestrator"]
Distributed_Evaluation_Orchestrator["Distributed Evaluation Orchestrator"]
Parameter_Server_Startup["Parameter Server Startup"]
Parameter_Server_Shutdown["Parameter Server Shutdown"]
Master_Network_Communication["Master Network Communication"]
SparkModel -- "delegates to" --> Distributed_Training_Orchestrator
SparkModel -- "delegates to" --> Distributed_Prediction_Orchestrator
SparkModel -- "delegates to" --> Distributed_Evaluation_Orchestrator
Distributed_Training_Orchestrator -- "initiates" --> Parameter_Server_Startup
Distributed_Training_Orchestrator -- "calls" --> Parameter_Server_Shutdown
Distributed_Prediction_Orchestrator -- "utilizes" --> Master_Network_Communication
Distributed_Evaluation_Orchestrator -- "utilizes" --> Master_Network_Communication
The Spark Driver Orchestrator subsystem in Elephas is responsible for managing the overall distributed deep learning workflow from the Spark driver. This includes orchestrating data distribution, broadcasting models, and managing the lifecycle of the Parameter Server, which is crucial for parameter synchronization during distributed training.
The primary user-facing API and entry point for integrating Keras models with Spark. It encapsulates the Keras model and provides high-level methods for distributed training, prediction, and evaluation.
Related Classes/Methods:
The core orchestrator for the distributed model training workflow. It manages the entire training lifecycle from the Spark driver, including data partitioning, model distribution, and gradient aggregation.
Related Classes/Methods:
The internal method responsible for orchestrating distributed predictions across the Spark cluster, applying the trained model to new data in parallel.
Related Classes/Methods:
The internal method responsible for orchestrating distributed model evaluation on the Spark cluster, assessing model performance on distributed datasets.
Related Classes/Methods:
Manages the startup of the Parameter Server on the Spark driver. This server is critical for sharing and synchronizing model parameters during distributed training.
Related Classes/Methods:
Manages the graceful shutdown of the Parameter Server process on the Spark driver, ensuring proper resource release.
Related Classes/Methods:
A mechanism or component within SparkModel that facilitates distributed predictions and evaluations by providing model access or communication channels to worker nodes.
Related Classes/Methods: