graph LR
API_Server_Gateway["API Server Gateway"]
Chat_Completion_Handler["Chat Completion Handler"]
Text_Completion_Handler["Text Completion Handler"]
Embeddings_Handler["Embeddings Handler"]
Remote_Service_Fetcher["Remote Service Fetcher"]
Worker_Address_Resolver["Worker Address Resolver"]
Generation_Parameter_Extractor["Generation Parameter Extractor"]
API_Error_Responder["API Error Responder"]
API_Server_Gateway -- "dispatches to" --> Chat_Completion_Handler
API_Server_Gateway -- "dispatches to" --> Text_Completion_Handler
API_Server_Gateway -- "dispatches to" --> Embeddings_Handler
Chat_Completion_Handler -- "calls" --> Generation_Parameter_Extractor
Chat_Completion_Handler -- "calls" --> Worker_Address_Resolver
Chat_Completion_Handler -- "calls" --> Remote_Service_Fetcher
Chat_Completion_Handler -- "calls" --> API_Error_Responder
Text_Completion_Handler -- "calls" --> Generation_Parameter_Extractor
Text_Completion_Handler -- "calls" --> Worker_Address_Resolver
Text_Completion_Handler -- "calls" --> Remote_Service_Fetcher
Text_Completion_Handler -- "calls" --> API_Error_Responder
Embeddings_Handler -- "calls" --> Worker_Address_Resolver
Embeddings_Handler -- "calls" --> Remote_Service_Fetcher
Embeddings_Handler -- "calls" --> API_Error_Responder
Worker_Address_Resolver -- "uses" --> Remote_Service_Fetcher
Worker_Address_Resolver -- "provides addresses to" --> Chat_Completion_Handler
Worker_Address_Resolver -- "provides addresses to" --> Text_Completion_Handler
Worker_Address_Resolver -- "provides addresses to" --> Embeddings_Handler
The API Server (OpenAI Compatible) subsystem is the primary interface for external applications: it translates OpenAI API requests into internal FastChat operations. It is defined mainly in fastchat/serve/openai_api_server.py and uses data structures from fastchat/protocol/openai_api_protocol.py.
The API Server Gateway is the main entry point for all incoming OpenAI-compatible API requests (e.g., chat completions, text completions, embeddings). It dispatches each request to the appropriate internal handler.
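The dispatch role described above can be sketched as a simple route table. This is only an illustration, not the code in openai_api_server.py: the handler stubs, the `ROUTES` table, the `dispatch` function, and the route paths (taken from the public OpenAI API convention) are all assumptions.

```python
from typing import Any, Callable, Dict

Payload = Dict[str, Any]


# Hypothetical handler stubs; real handlers would validate input and call a worker.
def handle_chat_completion(payload: Payload) -> Payload:
    return {"handled_by": "chat", "model": payload.get("model")}


def handle_text_completion(payload: Payload) -> Payload:
    return {"handled_by": "completion", "model": payload.get("model")}


def handle_embeddings(payload: Payload) -> Payload:
    return {"handled_by": "embeddings", "model": payload.get("model")}


# Assumed route paths, following the OpenAI API naming convention.
ROUTES: Dict[str, Callable[[Payload], Payload]] = {
    "/v1/chat/completions": handle_chat_completion,
    "/v1/completions": handle_text_completion,
    "/v1/embeddings": handle_embeddings,
}


def dispatch(path: str, payload: Payload) -> Payload:
    """Look up the handler for a request path and invoke it."""
    handler = ROUTES.get(path)
    if handler is None:
        # Unknown routes get an error payload instead of raising.
        return {"error": {"code": 404, "message": f"unknown route: {path}"}}
    return handler(payload)
```

Keeping the routing in one table means a new endpoint only needs one extra entry and one handler function.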
Related Classes/Methods:
The Chat Completion Handler processes requests for chat-based text generation. It parses parameters, validates the input, discovers an available model worker, and forwards the generation request to that worker.
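The four steps named above (validate, extract parameters, find a worker, forward the request) can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual handler: `resolve_worker` and `send_to_worker` are hypothetical injected callables, and the error shape, status codes, and default temperature are placeholders.

```python
from typing import Any, Callable, Dict, Optional

Payload = Dict[str, Any]


def error(code: int, message: str) -> Payload:
    """Build a small error payload (the shape is an assumption)."""
    return {"object": "error", "code": code, "message": message}


def handle_chat_completion(
    request: Payload,
    resolve_worker: Callable[[str], Optional[str]],
    send_to_worker: Callable[[str, Payload], Payload],
) -> Payload:
    # Step 1: validate the input.
    model = request.get("model")
    messages = request.get("messages")
    if not model or not isinstance(messages, list) or not messages:
        return error(400, "'model' and a non-empty 'messages' list are required")

    # Step 2: collect generation parameters (the default value is an assumption).
    params: Payload = {
        "model": model,
        "messages": messages,
        "temperature": request.get("temperature", 1.0),
        "max_tokens": request.get("max_tokens"),
    }

    # Step 3: discover a worker that serves the requested model.
    worker_address = resolve_worker(model)
    if worker_address is None:
        return error(503, f"no worker available for model {model!r}")

    # Step 4: forward the request to the chosen worker.
    return send_to_worker(worker_address, params)
```

Passing the resolver and sender in as arguments keeps the sketch testable without a running controller or worker.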
Related Classes/Methods:
The Text Completion Handler manages requests for traditional text completion. Its workflow mirrors that of the Chat Completion Handler but targets simpler, single-turn prompts.
Related Classes/Methods:
The Embeddings Handler processes requests to generate numerical embeddings from input text. It handles model selection and worker discovery, then dispatches the embedding request to the selected worker.
Related Classes/Methods:
The Remote Service Fetcher provides a generic mechanism for asynchronous communication with other FastChat services (e.g., the Controller and Model Workers). It hides the underlying network communication details from its callers.
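One way to picture such a fetcher is an async wrapper around a blocking JSON POST. The sketch below uses only the standard library and is not FastChat's implementation: `fetch_remote`, `_post_json_blocking`, and the pluggable `transport` parameter are hypothetical names introduced for illustration.

```python
import asyncio
import json
import urllib.request
from typing import Any, Callable, Dict

Payload = Dict[str, Any]


def _post_json_blocking(url: str, payload: Payload, timeout: float) -> Payload:
    """Send a JSON POST request and decode the JSON reply (blocking)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


async def fetch_remote(
    url: str,
    payload: Payload,
    timeout: float = 10.0,
    transport: Callable[[str, Payload, float], Payload] = _post_json_blocking,
) -> Payload:
    """Call a remote service without blocking the event loop.

    The blocking transport runs in the default thread pool executor, so
    callers only see a coroutine that takes a URL and a payload.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, transport, url, payload, timeout)
```

Because the transport is injectable, handlers can be exercised in tests with a fake function instead of a live Controller or worker.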
Related Classes/Methods:
The Worker Address Resolver queries the FastChat Controller service to obtain the network address of an available model worker that can serve a specific model.
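The lookup described above might look roughly like this. Everything here is an assumption made for illustration: the function name, the controller URL, the `/get_worker_address` endpoint path, and the `{"address": ...}` reply shape are placeholders, not confirmed details of the FastChat Controller.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Payload = Dict[str, Any]
Fetch = Callable[[str, Payload], Awaitable[Payload]]


async def resolve_worker_address(
    model: str,
    fetch: Fetch,
    controller_url: str = "http://controller.example:8000",  # placeholder URL
) -> str:
    """Ask the controller which worker can serve `model`.

    `fetch` is any coroutine function that POSTs a JSON payload to a URL and
    returns the decoded reply. The endpoint path and reply key are assumed.
    """
    reply = await fetch(controller_url + "/get_worker_address", {"model": model})
    address = reply.get("address", "")
    if not address:
        # An empty address is treated as "no worker registered for this model".
        raise LookupError(f"no available worker for model {model!r}")
    return address
```

Raising on an empty address lets each handler turn the failure into one uniform API error instead of forwarding a request to nowhere.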
Related Classes/Methods:
The Generation Parameter Extractor extracts and validates generation-specific parameters (e.g., temperature, max tokens) from incoming API requests, ensuring they conform to the expected types and constraints.
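A minimal sketch of this kind of extraction and validation is shown below. The function name, the default temperature, and the accepted ranges are assumptions chosen for the example; the real constraints are whatever the server enforces.

```python
from typing import Any, Dict


def extract_generation_params(request: Dict[str, Any]) -> Dict[str, Any]:
    """Pull generation parameters out of a request and check types and ranges."""
    temperature = request.get("temperature", 1.0)  # assumed default
    max_tokens = request.get("max_tokens")  # None means "not specified"
    problems = []

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
        problems.append("temperature must be a number")
    elif not 0.0 <= temperature <= 2.0:  # assumed range
        problems.append("temperature must be between 0 and 2")

    if max_tokens is not None:
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int):
            problems.append("max_tokens must be an integer")
        elif max_tokens <= 0:
            problems.append("max_tokens must be positive")

    if problems:
        raise ValueError("; ".join(problems))
    return {"temperature": float(temperature), "max_tokens": max_tokens}
```

Collecting every problem before raising gives the client one complete error message rather than one failure per round trip.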
Related Classes/Methods:
The API Error Responder generates standardized error responses for API clients, so that every endpoint reports failures in the same format.
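A uniform error format usually comes down to a single helper that every handler calls. The sketch below is illustrative only: the function name, the field names, and the default HTTP status are assumptions, not the schema defined in openai_api_protocol.py.

```python
import json
from typing import Tuple


def create_error_response(code: int, message: str, status: int = 400) -> Tuple[int, str]:
    """Return an (HTTP status, JSON body) pair in one fixed error shape."""
    body = {"object": "error", "code": code, "message": message}
    return status, json.dumps(body)
```

For example, `create_error_response(40301, "model not found", status=404)` and any other call produce bodies with the same three keys, so clients can parse every failure with one code path. The numeric code in that call is made up for the example.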
Related Classes/Methods: