graph LR
Controller["Controller"]
WorkerInfo["WorkerInfo"]
Controller_Factory["Controller Factory"]
Worker_Status_Checker["Worker Status Checker"]
Worker_Timeout_Handler["Worker Timeout Handler"]
No_Worker_Handler["No Worker Handler"]
Configuration_Loader["Configuration Loader"]
Controller -- "manages" --> WorkerInfo
Controller -- "invokes" --> Worker_Status_Checker
Controller -- "calls" --> Worker_Timeout_Handler
Controller -- "delegates to" --> No_Worker_Handler
Controller -- "utilizes" --> Configuration_Loader
Controller_Factory -- "instantiates" --> Controller
Controller_Factory -- "leverages" --> Configuration_Loader
The fastchat.serve subsystem is designed around a central Controller that orchestrates the management and interaction with distributed model workers. The Controller maintains a registry of WorkerInfo objects, representing the state and capabilities of each worker. To ensure robust operation, the Controller integrates with specialized handlers like the Worker Status Checker, Worker Timeout Handler, and No Worker Handler to manage worker health and availability. The system's initialization is streamlined through a Controller Factory, which encapsulates the setup logic, including the loading of configurations via the Configuration Loader. This architecture ensures efficient request routing, fault tolerance, and scalable management of model serving infrastructure.
The primary orchestrator managing model worker registration, deregistration, and request routing. It maintains the overall health and availability of the distributed model serving infrastructure.
Related Classes/Methods:
A data structure used by the Controller to store essential metadata for each registered model worker, including address, model capabilities, and last heartbeat.
Related Classes/Methods:
A factory function responsible for instantiating and initializing the Controller class, encapsulating setup logic.
Related Classes/Methods:
Actively queries the operational status and health of individual model workers, providing up-to-date information to the Controller.
Related Classes/Methods:
Manages scenarios where a model worker fails to respond within an expected timeframe, defining system behavior in such failure cases.
Related Classes/Methods:
Addresses situations where the Controller cannot find any suitable or available model worker to fulfill a client request.
Related Classes/Methods:
Parses and loads configuration settings, ensuring the Controller operates with the correct parameters.
Related Classes/Methods: