[ENH] add time-series forecasting pipeline test-function #20

@SimonBlanke

Description

Time-series forecasting pipelines have a different structure from tabular ML pipelines because the feature engineering step creates the tabular representation from raw time-series data. This test function optimizes the full chain from feature extraction through model training and returns the forecast accuracy on a held-out temporal test set.

The search space parameters:

```python
{
    "n_lags": [3, 5, 7, 10, 14, 21],
    "rolling_window": [0, 3, 7, 14],
    "differencing": [0, 1, 2],
    "scaler": ["none", "standard", "minmax"],
    "model": ["ridge", "rf", "gb"],
    "model__regularization": [0.001, 0.01, 0.1, 1.0, 10.0],
}
```

The n_lags parameter controls how many past observations become input features. Setting rolling_window to 0 disables rolling mean/std features, while positive values add rolling statistics computed over that window size. The differencing parameter applies 0, 1, or 2 rounds of first-order differencing to the target before feature extraction. The model__regularization parameter maps to alpha for Ridge and, after discretization, to min_samples_leaf for the tree-based models.
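The feature-engineering step could be sketched as follows. This is a hypothetical helper for illustration, not the proposed implementation; in particular, computing the rolling statistics over the trailing window ending at the last lag is an assumption:

```python
import numpy as np

def make_features(y, n_lags, rolling_window, differencing):
    # Hypothetical sketch of the three feature-engineering parameters.
    y = np.asarray(y, dtype=float)

    # Apply 0, 1, or 2 rounds of first-order differencing to the target.
    for _ in range(differencing):
        y = np.diff(y)

    # Lag features: row t contains y[t : t + n_lags];
    # the forecast target is the next observation y[t + n_lags].
    n_rows = len(y) - n_lags
    X = np.stack([y[t : t + n_lags] for t in range(n_rows)])
    target = y[n_lags:]

    # Optional rolling mean/std over the trailing window (disabled when 0).
    if rolling_window > 0:
        mean = np.array(
            [y[max(0, t + n_lags - rolling_window): t + n_lags].mean()
             for t in range(n_rows)]
        )
        std = np.array(
            [y[max(0, t + n_lags - rolling_window): t + n_lags].std()
             for t in range(n_rows)]
        )
        X = np.column_stack([X, mean, std])
    return X, target
```

With `rolling_window=0` the matrix has exactly `n_lags` columns; a positive window appends two more (mean and std), and each round of differencing shortens the usable series by one observation.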

The implementation uses a temporal train/test split rather than cross-validation, because shuffled CV violates the temporal ordering of the data. The last 20% of the series is held out for evaluation. Feature construction is done either manually with numpy or with sktime. The score is the negative mean absolute error on the held-out period, so values closer to zero are better.
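A minimal sketch of the temporal split and scoring, using a synthetic sine series in place of the bundled datasets (Ridge, a fixed lag window, and the 80/20 cut follow the description above; the noise level and alpha are arbitrary choices):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

# Synthetic series standing in for a dataset like "sine_wave".
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 20, 200)) + 0.05 * rng.standard_normal(200)

# Tabularize with a fixed lag window (n_lags = 7 here).
n_lags = 7
X = np.stack([y[t : t + n_lags] for t in range(len(y) - n_lags)])
target = y[n_lags:]

# Temporal split: the last 20% of rows is the held-out period.
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = target[:split], target[split:]

model = Ridge(alpha=0.1).fit(X_train, y_train)
score = -mean_absolute_error(y_test, model.predict(X_test))
# score is the negative MAE on the held-out period;
# values closer to zero indicate better forecasts.
```

Because rows are never shuffled, the model is always evaluated on observations that come strictly after everything it was trained on.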

The constructor accepts a dataset parameter with the same time-series datasets as the existing forecasting functions (airline, energy, sine_wave). Only scikit-learn and numpy are needed under surfaces[ml].
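The mapping from the shared `model__regularization` value onto per-model hyperparameters, together with the scaler choice, could look like the sketch below. The function name and the particular discretization into `min_samples_leaf` are assumptions for illustration:

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def build_pipeline(params):
    # Hypothetical assembly of scaler + model from one search-space point.
    steps = []
    if params["scaler"] == "standard":
        steps.append(("scaler", StandardScaler()))
    elif params["scaler"] == "minmax":
        steps.append(("scaler", MinMaxScaler()))

    reg = params["model__regularization"]
    if params["model"] == "ridge":
        # For Ridge the value is used directly as alpha.
        model = Ridge(alpha=reg)
    else:
        # For tree-based models, discretize into an integer leaf size
        # (the factor of 10 is an assumed discretization).
        leaf = max(1, int(round(10 * reg)))
        cls = (RandomForestRegressor if params["model"] == "rf"
               else GradientBoostingRegressor)
        model = cls(min_samples_leaf=leaf, random_state=0)

    steps.append(("model", model))
    return Pipeline(steps)
```

Reusing one regularization key across all three models keeps the search space flat, which the existing test functions in this package also favor over conditional spaces.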
