Combined Algorithm Selection and Hyperparameter optimization (CASH) is the core problem behind Auto-sklearn, Auto-WEKA and FLAML. The optimizer must simultaneously choose which classifier to use and set its hyperparameters, where different algorithms have different parameter spaces. This test function exposes that joint selection problem as a single flat search space using parameter prefixing.
The search space combines algorithm selection with algorithm-specific hyperparameters:
```python
{
    "algorithm": ["knn", "dt", "rf", "svm", "gb"],
    "knn__n_neighbors": [3, 5, 7, 11, 15, 21, 31],
    "dt__max_depth": [None, 2, 5, 10, 20],
    "dt__min_samples_split": [2, 5, 10, 20],
    "rf__n_estimators": [10, 50, 100, 200],
    "rf__max_depth": [None, 5, 10, 20],
    "svm__C": [0.01, 0.1, 1.0, 10.0, 100.0],
    "svm__kernel": ["linear", "rbf", "poly"],
    "gb__n_estimators": [50, 100, 200],
    "gb__learning_rate": [0.01, 0.05, 0.1, 0.2],
    "gb__max_depth": [3, 5, 7],
}
```
The `_ml_objective` implementation selects the classifier based on the `algorithm` parameter, picks out only the matching prefixed hyperparameters (ignoring those that belong to other algorithms), evaluates the model with cross-validation, and returns the mean accuracy. Parameters of non-selected algorithms have no effect on the score, which creates large neutral regions in the landscape surrounding narrow algorithm-specific valleys.
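A minimal sketch of that selection-and-dispatch logic, assuming the dataset is stored as an `(X, y)` tuple and `cv` as an integer; the class name `CASHClassificationFunction` and the `_CLASSIFIERS` mapping are illustrative placeholders, not the actual implementation:

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Maps the "algorithm" value to a classifier class (illustrative helper).
_CLASSIFIERS = {
    "knn": KNeighborsClassifier,
    "dt": DecisionTreeClassifier,
    "rf": RandomForestClassifier,
    "svm": SVC,
    "gb": GradientBoostingClassifier,
}

class CASHClassificationFunction:  # hypothetical class name
    def __init__(self, dataset, cv=5):
        self.dataset = dataset  # assumed to be an (X, y) tuple in this sketch
        self.cv = cv

    def _ml_objective(self, params):
        algorithm = params["algorithm"]
        prefix = algorithm + "__"
        # Keep only hyperparameters belonging to the selected algorithm;
        # every other key in params is ignored and cannot change the score.
        kwargs = {
            key[len(prefix):]: value
            for key, value in params.items()
            if key.startswith(prefix)
        }
        model = _CLASSIFIERS[algorithm](**kwargs)
        X, y = self.dataset
        # Mean cross-validated accuracy, as in the existing classification functions.
        return cross_val_score(model, X, y, cv=self.cv).mean()
```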
The score is mean cross-validated accuracy, consistent with the existing classification functions. The constructor takes `dataset` and `cv`, following the same pattern as `RandomForestClassifierFunction`. The only dependency is scikit-learn, already covered by the `surfaces[ml]` extra.
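For illustration, a usage sketch against the class sketched above; `load_iris`, the `(X, y)` dataset tuple, and calling the private `_ml_objective` directly are assumptions made to keep the example short:

```python
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
func = CASHClassificationFunction(dataset=(X, y), cv=5)

# "svm" is selected, so any knn/dt/rf/gb keys present in the dict are ignored.
score = func._ml_objective({
    "algorithm": "svm",
    "svm__C": 1.0,
    "svm__kernel": "rbf",
})
print(score)  # mean cross-validated accuracy in [0, 1]
```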