Commit b6c4a37

Merge pull request #160 from KhiopsML/75-allow-single-column-dataframe-for-targets
Support single-column Pandas dataframes as targets in estimators
2 parents 92f609b + ebd1539 commit b6c4a37

66 files changed

Lines changed: 212 additions & 48905 deletions

khiops/sklearn/estimators.py

Lines changed: 8 additions & 8 deletions
@@ -1368,8 +1368,8 @@ def fit(self, X, y=None, **kwargs):
             first element of the list is the main table and the following are
             secondary ones joined to the main table using ``key`` estimator parameter.
 
-        y : :external:term:`array-like` of shape (n_samples,)
-            :external:term:`array-like` object containing the target values.
+        y : :external:term:`array-like` of shape (n_samples,) or
+            a `pandas.Dataframe` of shape (n_samples, 1) containing the target values.
 
             **Deprecated input modes** (will be removed in khiops-python 11):
                 - str: A path to a data table file for file-based ``dict`` dataset
@@ -1820,8 +1820,8 @@ def fit(self, X, y, **kwargs):
             first element of the list is the main table and the following are
             secondary ones joined to the main table using ``key`` estimator parameter.
 
-        y : :external:term:`array-like` of shape (n_samples,)
-            :external:term:`array-like` object containing the target values.
+        y : :external:term:`array-like` of shape (n_samples,) or
+            a `pandas.Dataframe` of shape (n_samples, 1) containing the target values
 
             **Deprecated input modes** (will be removed in khiops-python 11):
                 - str: A path to a data table file for file-based ``dict`` dataset
@@ -2147,8 +2147,8 @@ def fit(self, X, y=None, **kwargs):
             first element of the list is the main table and the following are
             secondary ones joined to the main table using ``key`` estimator parameter.
 
-        y : :external:term:`array-like` of shape (n_samples,)
-            :external:term:`array-like` object containing the target values.
+        y : :external:term:`array-like` of shape (n_samples,) or
+            a `pandas.Dataframe` of shape (n_samples, 1) containing the target values
 
             **Deprecated input modes** (will be removed in khiops-python 11):
                 - str: A path to a data table file for file-based ``dict`` dataset
@@ -2462,8 +2462,8 @@ def fit(self, X, y=None, **kwargs):
             first element of the list is the main table and the following are
             secondary ones joined to the main table using ``key`` estimator parameter.
 
-        y : :external:term:`array-like` of shape (n_samples,)
-            :external:term:`array-like` object containing the target values.
+        y : :external:term:`array-like` of shape (n_samples,) or
+            a `pandas.Dataframe` of shape (n_samples, 1) containing the target values
 
             **Deprecated input modes** (will be removed in khiops-python 11):
                 - str: A path to a data table file for file-based ``dict`` dataset
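
The docstring change above distinguishes two target shapes. As a rough illustration using only pandas (the column names and values are invented for the example, not taken from the commit), this is how a caller ends up with one form or the other:

```python
import pandas as pd

# Toy data; "age" and "class" are hypothetical column names.
df = pd.DataFrame({"age": [25, 40, 31], "class": ["less", "more", "less"]})
X = df.drop(columns=["class"])

# Single brackets select a Series of shape (n_samples,).
y_series = df["class"]

# Double brackets select a single-column DataFrame of shape (n_samples, 1).
y_frame = df[["class"]]

print(y_series.shape, y_frame.shape)

# According to the updated docstrings, either form should be accepted as
# the target, e.g. (not executed here, requires a Khiops installation):
#   KhiopsClassifier().fit(X, y_frame)
```

The double-bracket form is what many data-loading utilities return by default, which is presumably why accepting it directly is convenient.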

khiops/sklearn/tables.py

Lines changed: 13 additions & 3 deletions
@@ -691,11 +691,13 @@ def _check_input_mapping(self, X, y=None):
                 main_table_source, _ = list(X["tables"].values())[0]
             else:
                 main_table_source, _ = X["tables"][X["main_table"]]
-            if isinstance(main_table_source, pd.DataFrame) and not isinstance(
-                y, pd.Series
+            if (
+                isinstance(main_table_source, pd.DataFrame)
+                and not isinstance(y, pd.Series)
+                and not isinstance(y, pd.DataFrame)
             ):
                 raise TypeError(
-                    type_error_message("y", y, pd.Series)
+                    type_error_message("y", y, pd.Series, pd.DataFrame)
                     + " (X's tables are of type pandas.DataFrame)"
                 )
             if isinstance(main_table_source, str) and not isinstance(y, str):
@@ -1010,6 +1012,14 @@ def __init__(
                         f"Target series name '{target_column.name}' "
                         f"is already present in dataframe : {list(dataframe.columns)}"
                     )
+            elif isinstance(target_column, pd.DataFrame):
+                number_of_target_columns = len(target_column.columns)
+                if number_of_target_columns != 1:
+                    raise ValueError(
+                        "Target dataframe should contain exactly one column. "
+                        f"It contains {number_of_target_columns}."
+                    )
+                target_column = target_column.iloc[:, 0]
 
         # Initialize the attributes
         self.dataframe = dataframe
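
The added branch can be read in isolation as a small normalization step. The sketch below lifts it into a standalone function; `normalize_target` is a hypothetical name introduced only for this illustration and is not part of khiops. The real code runs inside `__init__` alongside other checks not shown here.

```python
import pandas as pd


def normalize_target(target_column):
    """Return the target as a Series (hypothetical helper, not khiops API)."""
    if isinstance(target_column, pd.DataFrame):
        number_of_target_columns = len(target_column.columns)
        if number_of_target_columns != 1:
            raise ValueError(
                "Target dataframe should contain exactly one column. "
                f"It contains {number_of_target_columns}."
            )
        # Positional selection of the only column yields a pandas Series
        # that keeps the column name and the index.
        target_column = target_column.iloc[:, 0]
    return target_column


# A single-column dataframe is reduced to a Series named after its column.
y_frame = pd.DataFrame({"class": ["less", "more", "less"]})
y_series = normalize_target(y_frame)

# A dataframe with two columns is rejected.
try:
    normalize_target(pd.DataFrame({"a": [1, 2], "b": [3, 4]}))
    rejected = False
except ValueError:
    rejected = True
```

Selecting by position with `iloc[:, 0]` rather than by name means the step does not depend on what the caller called the target column.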

tests/resources/sklearn/results/ref_json_reports/Adult/KhiopsClassifier/dataframe/AllReports.khj renamed to tests/resources/sklearn/results/ref_json_reports/Adult/KhiopsClassifier/AllReports.khj

File renamed without changes.

tests/resources/sklearn/results/ref_json_reports/Adult/KhiopsClassifier/file_dataset/AllReports.khj

Lines changed: 0 additions & 12168 deletions
This file was deleted.

tests/resources/sklearn/results/ref_json_reports/Adult/KhiopsEncoder/dataframe/AllReports.khj renamed to tests/resources/sklearn/results/ref_json_reports/Adult/KhiopsEncoder/AllReports.khj

File renamed without changes.
