[ENH] add regression preprocessing pipeline test-function #21

@SimonBlanke

In regression tasks, preprocessing choices often matter more than the model hyperparameters. Missing-value handling, feature scaling, and feature transformation interact with each other and with the downstream model, which creates a landscape with strong parameter dependencies. The proposed test function optimizes a full sklearn Pipeline, from imputation through prediction.

The search space parameters would be:

{
    "imputer_strategy": ["mean", "median", "most_frequent"],
    "scaler": ["standard", "minmax", "robust", "none"],
    "feature_transform": ["none", "polynomial_2", "polynomial_3", "log1p"],
    "feature_selection_k": [5, 10, 15, 20, "all"],
    "model": ["ridge", "lasso", "elastic_net", "gb"],
    "model__alpha": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0],
}

The implementation builds a sklearn Pipeline from the parameter values. When feature_transform is "none", the corresponding pipeline step is skipped. The feature_selection_k parameter controls SelectKBest with f_regression scoring, where "all" disables selection. The model__alpha parameter sets the regularization strength for Ridge, Lasso, and ElasticNet, but is ignored when model="gb" (GradientBoostingRegressor uses its own defaults).
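To make the parameter-to-step mapping concrete, here is a minimal sketch of how such a pipeline could be assembled. It is not the final implementation: the helper names `build_pipeline` and `signed_log1p` are hypothetical, the step order (imputer, scaler, transform, selection, model) is assumed from the order of the search space, skipping the scaler step for "none" is assumed by analogy with feature_transform, and the sign-preserving variant of log1p is an assumption made so that scaled features below -1 do not produce NaN.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (
    FunctionTransformer,
    MinMaxScaler,
    PolynomialFeatures,
    RobustScaler,
    StandardScaler,
)


def signed_log1p(X):
    # Assumption: a sign-preserving log1p, so negative inputs stay finite.
    return np.sign(X) * np.log1p(np.abs(X))


def build_pipeline(params, random_state=0):
    """Hypothetical helper: map one search-space point to a sklearn Pipeline."""
    steps = [("imputer", SimpleImputer(strategy=params["imputer_strategy"]))]

    # Assumption: "none" skips the scaler step, like feature_transform.
    scalers = {"standard": StandardScaler, "minmax": MinMaxScaler, "robust": RobustScaler}
    if params["scaler"] != "none":
        steps.append(("scaler", scalers[params["scaler"]]()))

    transform = params["feature_transform"]
    if transform.startswith("polynomial_"):
        degree = int(transform.rsplit("_", 1)[1])
        steps.append(("transform", PolynomialFeatures(degree=degree, include_bias=False)))
    elif transform == "log1p":
        steps.append(("transform", FunctionTransformer(signed_log1p)))
    # transform == "none": no step is added.

    # "all" disables selection; this sketch does not guard against k
    # exceeding the number of available features.
    k = params["feature_selection_k"]
    if k != "all":
        steps.append(("select", SelectKBest(score_func=f_regression, k=k)))

    alpha = params["model__alpha"]
    if params["model"] == "ridge":
        model = Ridge(alpha=alpha)
    elif params["model"] == "lasso":
        model = Lasso(alpha=alpha)
    elif params["model"] == "elastic_net":
        model = ElasticNet(alpha=alpha)
    else:
        # "gb": model__alpha is ignored, the regressor keeps its defaults.
        model = GradientBoostingRegressor(random_state=random_state)
    steps.append(("model", model))

    return Pipeline(steps)
```

One open point this sketch surfaces: on the diabetes dataset (10 features), feature_selection_k values of 15 or 20 are only valid after a polynomial transform, so the real implementation would need to decide how to treat those combinations.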

The constructor accepts a dataset parameter with the same regression datasets as the existing regressor functions (diabetes, california, friedman1, friedman2, linear). The diabetes dataset is the most interesting choice here because it has moderate dimensionality and benefits visibly from preprocessing, while the california housing dataset provides a harder problem where feature selection has a larger effect. The score is the mean cross-validated R², consistent with the existing regression functions. The only dependency is scikit-learn, under surfaces[ml].
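As an illustration of the scoring, the following sketch evaluates one fixed configuration on the diabetes dataset and reports the mean cross-validated R². The fold count (cv=5) and the chosen parameter values are assumptions for the example; the issue does not specify them.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# The diabetes dataset ships with scikit-learn, so no download is needed.
X, y = load_diabetes(return_X_y=True)

# One point in the search space: mean imputation, standard scaling,
# no feature transform, 5 selected features, Ridge with alpha=1.0.
pipeline = Pipeline(
    [
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
        ("select", SelectKBest(score_func=f_regression, k=5)),
        ("model", Ridge(alpha=1.0)),
    ]
)

# Assumption: 5-fold cross-validation; the score is the mean R2 over folds.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="r2")
score = scores.mean()
print(f"mean cross-validated R2: {score:.3f}")
```

Because the selector sits inside the pipeline, it is refit on each training fold, which keeps the feature selection from leaking information from the held-out fold into the score.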
