graph LR
Evaluation_Pipeline_Core_Orchestrator_["Evaluation Pipeline (Core Orchestrator)"]
Model_Initialization["Model Initialization"]
Task_Request_Preparation["Task & Request Preparation"]
Evaluation_Execution_Engine["Evaluation Execution Engine"]
Model_Inference_Handler["Model Inference Handler"]
Metric_Computation["Metric Computation"]
Result_Persistence["Result Persistence"]
Result_Access_Display["Result Access & Display"]
Evaluation_Pipeline_Core_Orchestrator_ -- "initializes" --> Model_Initialization
Evaluation_Pipeline_Core_Orchestrator_ -- "initializes" --> Task_Request_Preparation
Evaluation_Pipeline_Core_Orchestrator_ -- "orchestrates" --> Evaluation_Execution_Engine
Evaluation_Execution_Engine -- "executes" --> Model_Inference_Handler
Evaluation_Execution_Engine -- "triggers" --> Metric_Computation
Evaluation_Execution_Engine -- "provides results to" --> Result_Persistence
Evaluation_Execution_Engine -- "provides results to" --> Result_Access_Display
The Evaluation Pipeline Core subsystem is the central orchestrator of the end-to-end evaluation process within lighteval. It manages the flow from running models with prepared prompts to collecting responses and initiating metric computations, handling both synchronous and asynchronous model runs.
Evaluation Pipeline (Core Orchestrator): The overarching component that initializes the evaluation environment, manages model loading, sets up tasks, and orchestrates the entire evaluation flow. It serves as the main entry point for initiating an evaluation run.
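The shape of such an orchestrator can be sketched as follows. This is a minimal illustration of "initialize, then run"; the class and method names (`MiniPipeline`, `_init_model`, `_init_tasks`) are hypothetical, not lighteval's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class MiniPipeline:
    """Illustrative orchestrator: initialize model and tasks, then evaluate."""
    model_name: str
    task_names: list
    log: list = field(default_factory=list)

    def _init_model(self):
        self.log.append(f"loaded model: {self.model_name}")

    def _init_tasks(self):
        self.log.append(f"prepared tasks: {', '.join(self.task_names)}")

    def run(self):
        # Entry point: all initialization happens before the evaluation loop.
        self._init_model()
        self._init_tasks()
        self.log.append("evaluation complete")
        return self.log

trace = MiniPipeline("demo-model", ["task_a", "task_b"]).run()
```

The point of the single `run()` entry point is that callers never sequence initialization steps themselves; the orchestrator guarantees the order.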
Model Initialization: Responsible for dynamically loading and configuring the Large Language Model (LLM) to be evaluated, preparing it for inference based on the provided evaluation parameters.
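Dynamic loading of this kind is commonly implemented with a backend registry keyed by a config field. A hedged sketch, assuming a hypothetical `MODEL_REGISTRY` and `DummyModel` (lighteval's real loader differs):

```python
# Hypothetical backend registry; only illustrates dispatching model
# construction on a config field.
MODEL_REGISTRY = {}

def register_backend(name):
    def deco(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return deco

@register_backend("dummy")
class DummyModel:
    def __init__(self, config):
        self.config = config

    def generate(self, prompt):
        return prompt.upper()  # stand-in for real inference

def load_model(config):
    backend = config["backend"]
    if backend not in MODEL_REGISTRY:
        raise ValueError(f"unknown backend: {backend}")
    return MODEL_REGISTRY[backend](config)

model = load_model({"backend": "dummy"})
```

The registry keeps model construction decoupled from the pipeline: adding a backend means registering one class, not editing the orchestrator.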
Task & Request Preparation: Prepares and manages the specific evaluation tasks and generates the corresponding input requests (prompts) that will be fed to the model. This component ensures that tasks are correctly formatted and ready for execution.
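Request preparation amounts to turning raw task samples into formatted prompts. A minimal sketch, with a hypothetical `Request` record and prompt template (not lighteval's actual request types):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    task: str
    sample_id: int
    prompt: str

def build_requests(task_name, samples, template="Q: {question}\nA:"):
    """Turn raw task samples into formatted prompt requests (illustrative)."""
    return [
        Request(task=task_name, sample_id=i, prompt=template.format(**s))
        for i, s in enumerate(samples)
    ]

reqs = build_requests(
    "qa_demo",
    [{"question": "2+2?"}, {"question": "Capital of France?"}],
)
```

Materializing all requests up front lets the execution engine treat inference as a pure "requests in, responses out" step.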
Evaluation Execution Engine: Drives the core evaluation loop. It orchestrates the sequence of operations, including dispatching model inference calls, collecting responses, and triggering metric computations. This is the heart of the evaluation process.
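The loop described above, inference followed by collection followed by scoring, can be condensed into a few lines. Everything here (`evaluate`, `model_fn`, `metric_fn`) is an illustrative stand-in, not the engine's real interface:

```python
def evaluate(model_fn, requests, metric_fn, references):
    """Core loop sketch: dispatch inference, collect responses, score them."""
    responses = [model_fn(r) for r in requests]  # inference calls
    scores = [metric_fn(resp, ref) for resp, ref in zip(responses, references)]
    return {"responses": responses, "score": sum(scores) / len(scores)}

# Toy run: the "model" just strips whitespace, the metric is exact match.
result = evaluate(
    model_fn=str.strip,
    requests=["  4  ", " Paris "],
    metric_fn=lambda resp, ref: float(resp == ref),
    references=["4", "Paris"],
)
```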
Model Inference Handler: Abstracts and dispatches model inference calls to the underlying LLM. It handles the specifics of interacting with different model backends, supporting both synchronous (_run_model_sync) and asynchronous (_run_model_async) execution.
Related Classes/Methods:
lighteval.pipeline.Pipeline:_run_model
lighteval.pipeline.Pipeline:_run_model_sync
lighteval.pipeline.Pipeline:_run_model_async
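The sync/async split can be sketched as a dispatcher that picks one of two code paths. The method names mirror the `Pipeline` hooks listed above, but the class and the method bodies are illustrative stand-ins, not lighteval's implementation:

```python
import asyncio

class InferenceHandler:
    """Sketch of dispatching between sync and async inference paths."""

    def __init__(self, is_async):
        self.is_async = is_async

    def _run_model_sync(self, prompts):
        # Placeholder for blocking, batch-by-batch inference.
        return [p[::-1] for p in prompts]

    async def _run_model_async(self, prompts):
        # Placeholder; a real async backend would await many requests
        # concurrently (e.g. against an inference server).
        await asyncio.sleep(0)
        return [p[::-1] for p in prompts]

    def _run_model(self, prompts):
        if self.is_async:
            return asyncio.run(self._run_model_async(prompts))
        return self._run_model_sync(prompts)

out_sync = InferenceHandler(is_async=False)._run_model(["abc"])
out_async = InferenceHandler(is_async=True)._run_model(["abc"])
```

Keeping both paths behind one `_run_model` entry point means the execution engine never needs to know which backend style is in use.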
Metric Computation: Calculates various evaluation metrics based on the model's generated responses and the ground-truth data associated with each task. This component is responsible for quantifying model performance.
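As a concrete (and deliberately simple) example of a metric, exact-match accuracy scores each response against its reference and aggregates the per-sample scores. The function names are illustrative, not lighteval's metric API:

```python
def exact_match(prediction, reference):
    """Score one response against its gold answer: 1.0 on match, else 0.0."""
    return float(prediction.strip() == reference.strip())

def aggregate(per_sample_scores):
    """Reduce per-sample scores to a single corpus-level number."""
    if not per_sample_scores:
        return 0.0
    return sum(per_sample_scores) / len(per_sample_scores)

pairs = [("Paris", "Paris"), ("Lyon", "Paris")]
scores = [exact_match(pred, ref) for pred, ref in pairs]
accuracy = aggregate(scores)
```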
Result Persistence: Manages the saving of evaluation results. This includes writing results to local storage and optionally pushing them to remote repositories such as the Hugging Face Hub or Amazon S3 for sharing and long-term storage.
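The local-storage half of persistence reduces to serializing a results mapping to disk. A minimal sketch (the `save_results` helper is hypothetical; remote pushes to the Hub or S3 would start from the same serialized artifact and are omitted here):

```python
import json
import tempfile
from pathlib import Path

def save_results(results, output_dir):
    """Serialize evaluation results to a local JSON file and return its path."""
    path = Path(output_dir) / "results.json"
    path.write_text(json.dumps(results, indent=2))
    return path

# Round-trip check in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    written = save_results({"task_a": {"accuracy": 0.5}}, d)
    reloaded = json.loads(written.read_text())
```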
Result Access & Display: Provides programmatic access to the final evaluation results and formats them for clear, concise display to the user, enabling easy analysis and comparison of model performance.
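Display typically means rendering the nested `{task: {metric: value}}` results as an aligned plain-text table. A sketch with a hypothetical `format_results` helper (not lighteval's actual table renderer):

```python
def format_results(results):
    """Render a {task: {metric: value}} mapping as an aligned text table."""
    rows = [("task", "metric", "value")]
    for task, metrics in sorted(results.items()):
        for metric, value in sorted(metrics.items()):
            rows.append((task, metric, f"{value:.4f}"))
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths))
        for row in rows
    )

table = format_results({"task_a": {"accuracy": 0.5}, "task_b": {"accuracy": 1.0}})
```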