In regression tasks the preprocessing choices often matter more than the model hyperparameters. Missing value handling, feature scaling, and feature transformation interact with each other and with the downstream model in ways that create a landscape with strong parameter dependencies. This test function optimizes a full sklearn Pipeline from imputation through prediction.
The search space parameters would be:
```python
{
    "imputer_strategy": ["mean", "median", "most_frequent"],
    "scaler": ["standard", "minmax", "robust", "none"],
    "feature_transform": ["none", "polynomial_2", "polynomial_3", "log1p"],
    "feature_selection_k": [5, 10, 15, 20, "all"],
    "model": ["ridge", "lasso", "elastic_net", "gb"],
    "model__alpha": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0],
}
```
The implementation builds a sklearn `Pipeline` based on the parameter values. When `feature_transform` is `"none"` the corresponding pipeline step is skipped. The `feature_selection_k` parameter controls `SelectKBest` with `f_regression` scoring, where `"all"` disables selection. The `model__alpha` parameter applies to Ridge, Lasso, and ElasticNet as regularization strength but is ignored when `model="gb"` (`GradientBoostingRegressor` uses its own defaults).
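A minimal sketch of how that assembly could look; the `build_pipeline` helper and the exact step names are illustrative assumptions, not an existing API:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (
    FunctionTransformer,
    MinMaxScaler,
    PolynomialFeatures,
    RobustScaler,
    StandardScaler,
)

SCALERS = {"standard": StandardScaler, "minmax": MinMaxScaler, "robust": RobustScaler}


def build_pipeline(params: dict) -> Pipeline:
    """Hypothetical helper: assemble the Pipeline described by one search-space point."""
    steps = [("imputer", SimpleImputer(strategy=params["imputer_strategy"]))]

    if params["scaler"] != "none":
        steps.append(("scaler", SCALERS[params["scaler"]]()))

    transform = params["feature_transform"]
    if transform in ("polynomial_2", "polynomial_3"):
        degree = int(transform[-1])
        # include_bias=False avoids a constant column that f_regression cannot score
        steps.append(("transform", PolynomialFeatures(degree=degree, include_bias=False)))
    elif transform == "log1p":
        # log1p assumes inputs > -1; real code may need to clip or shift first
        steps.append(("transform", FunctionTransformer(np.log1p)))
    # "none" skips the transform step entirely

    k = params["feature_selection_k"]
    if k != "all":
        # k may still exceed the transformed feature count and need clamping
        steps.append(("select", SelectKBest(f_regression, k=k)))

    alpha = params["model__alpha"]
    models = {
        "ridge": lambda: Ridge(alpha=alpha),
        "lasso": lambda: Lasso(alpha=alpha),
        "elastic_net": lambda: ElasticNet(alpha=alpha),
        # gb ignores model__alpha and runs with GradientBoostingRegressor defaults
        "gb": lambda: GradientBoostingRegressor(),
    }
    steps.append(("model", models[params["model"]]()))
    return Pipeline(steps)
```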
The constructor accepts a `dataset` parameter with the same regression datasets as the existing regressor functions (diabetes, california, friedman1, friedman2, linear). The diabetes dataset is the most interesting choice here because it has moderate dimensionality and benefits visibly from preprocessing, while the california housing dataset provides a harder problem where feature selection has a larger effect. The score is the mean cross-validated R2, consistent with the existing regression functions. Only scikit-learn is needed under `surfaces[ml]`.
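For the scoring side, a hedged usage sketch on the diabetes dataset, reusing the hypothetical `build_pipeline` from above (the 5-fold split is an assumption):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

params = {
    "imputer_strategy": "median",
    "scaler": "standard",
    "feature_transform": "none",
    "feature_selection_k": "all",
    "model": "ridge",
    "model__alpha": 1.0,
}

# Mean cross-validated R2, matching the scoring of the existing regression functions
score = cross_val_score(build_pipeline(params), X, y, cv=5, scoring="r2").mean()
print(f"mean CV R2: {score:.3f}")
```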