graph LR
BorutaPy_API_Interface["BorutaPy API Interface"]
Core_Feature_Selection_Engine["Core Feature Selection Engine"]
Feature_Importance_Module["Feature Importance Module"]
Statistical_Testing_Module["Statistical Testing Module"]
Data_Transformation_Module["Data Transformation Module"]
External_Estimator["External Estimator"]
BorutaPy_API_Interface -- "Initiates Feature Selection" --> Core_Feature_Selection_Engine
Core_Feature_Selection_Engine -- "Requests Importance Calculation" --> Feature_Importance_Module
Feature_Importance_Module -- "Utilizes for Training/Prediction" --> External_Estimator
Core_Feature_Selection_Engine -- "Sends Importance Results for Testing" --> Statistical_Testing_Module
Core_Feature_Selection_Engine -- "Updates Selection Results" --> BorutaPy_API_Interface
BorutaPy_API_Interface -- "Triggers Data Filtering" --> Data_Transformation_Module
click BorutaPy_API_Interface href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/boruta_py/BorutaPy_API_Interface.md" "Details"
click Core_Feature_Selection_Engine href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/boruta_py/Core_Feature_Selection_Engine.md" "Details"
The BorutaPy library selects features by iteratively comparing the importance of each original feature with that of randomized "shadow" features. The BorutaPy API Interface is the entry point and orchestrates the entire workflow. The Core Feature Selection Engine drives the iterations, calling the Feature Importance Module to compute feature importances with an External Estimator. The Statistical Testing Module then confirms or rejects features, and the Data Transformation Module applies the selection results to the input data. This modular design separates concerns cleanly and lets BorutaPy work with a range of machine learning estimators.
BorutaPy API Interface
The primary user-facing component, handling initialization and exposing fit, transform, and fit_transform methods. It orchestrates the overall feature selection process.
Related Classes/Methods:
boruta.boruta_py.BorutaPy.__init__
boruta.boruta_py.BorutaPy.fit
boruta.boruta_py.BorutaPy.transform
boruta.boruta_py.BorutaPy.fit_transform
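To illustrate the fit / transform / fit_transform contract described above, here is a minimal pure-Python sketch. It is not BorutaPy's implementation: `SketchSelector` and `VarianceEstimator` are hypothetical names, the data is plain lists rather than arrays, and the selection rule (keep columns scoring above the mean importance) is a placeholder assumption standing in for the real algorithm.

```python
class VarianceEstimator:
    """Hypothetical stand-in for the external estimator.

    It ignores y and scores each column by its variance, exposing the
    scores as `feature_importances_` in the scikit-learn style.
    """

    def fit(self, rows, y):
        n = len(rows)
        self.feature_importances_ = []
        for col in zip(*rows):
            mean = sum(col) / n
            variance = sum((v - mean) ** 2 for v in col) / n
            self.feature_importances_.append(variance)
        return self


class SketchSelector:
    """Hypothetical selector exposing fit / transform / fit_transform."""

    def __init__(self, estimator):
        self.estimator = estimator
        self.support_ = None  # boolean mask, filled in by fit()

    def fit(self, rows, y):
        imps = self.estimator.fit(rows, y).feature_importances_
        # Placeholder rule (assumption): keep columns scoring above the mean.
        cutoff = sum(imps) / len(imps)
        self.support_ = [imp > cutoff for imp in imps]
        return self

    def transform(self, rows):
        if self.support_ is None:
            raise RuntimeError("call fit before transform")
        return [[v for v, keep in zip(row, self.support_) if keep] for row in rows]

    def fit_transform(self, rows, y):
        # Convenience method: fit, then filter the same data.
        return self.fit(rows, y).transform(rows)


rows = [[0.0, 5.0], [10.0, 5.0], [20.0, 5.0]]
y = [0, 1, 1]
selector = SketchSelector(VarianceEstimator())
reduced = selector.fit_transform(rows, y)
```

The point of the sketch is the division of labour: the selector owns the workflow and the mask, while the estimator only has to provide `fit` and an importance score per feature.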
Core Feature Selection Engine
Encapsulates the iterative Boruta algorithm's main loop, managing the flow of feature evaluation, comparison with shadow features, and decision-making for feature retention or rejection.
Related Classes/Methods:
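The iterate, compare, and decide flow described for this engine can be sketched as a small loop. This is an illustration under stated assumptions, not the library's code: `boruta_style_loop` and `toy_importances` are hypothetical names, importances come from a callback instead of an estimator, and the 90% / 10% hit-rate thresholds are a placeholder for the statistical test the library applies.

```python
import random


def boruta_style_loop(importance_fn, n_features, max_iter=50, min_iter=10, seed=0):
    """Toy version of the main loop: count 'hits' against shadows, then decide.

    importance_fn(rng) must return (real_importances, shadow_importances).
    """
    rng = random.Random(seed)
    hits = [0] * n_features
    status = ["tentative"] * n_features

    for iteration in range(1, max_iter + 1):
        real, shadow = importance_fn(rng)
        best_shadow = max(shadow)

        # A feature scores a hit when it beats the best shadow feature.
        for j in range(n_features):
            if status[j] == "tentative" and real[j] > best_shadow:
                hits[j] += 1

        # Placeholder decision rule (assumption): simple hit-rate thresholds.
        if iteration >= min_iter:
            for j in range(n_features):
                if status[j] != "tentative":
                    continue
                rate = hits[j] / iteration
                if rate >= 0.9:
                    status[j] = "confirmed"
                elif rate <= 0.1:
                    status[j] = "rejected"

        # Stop early once every feature has been decided.
        if "tentative" not in status:
            break

    return status, hits


def toy_importances(rng):
    # Feature 0 always beats the shadows, feature 1 never does.
    real = [1.0, 0.0]
    shadow = [rng.uniform(0.2, 0.6), rng.uniform(0.2, 0.6)]
    return real, shadow


status, hits = boruta_style_loop(toy_importances, n_features=2)
```

Features that are still undecided when `max_iter` is reached stay "tentative", which is why the loop tracks three states rather than two.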
Feature Importance Module
Responsible for generating randomized "shadow" features and computing feature importances for both original and shadow features using the provided external estimator.
Related Classes/Methods:
boruta.boruta_py.BorutaPy._add_shadows_get_imps
boruta.boruta_py.BorutaPy._get_imp
boruta.boruta_py.BorutaPy._get_shuffle
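As a rough illustration of shadow features, the sketch below appends a shuffled copy of every column and scores all columns with one importance measure. The function names are hypothetical, the data is plain lists, and absolute Pearson correlation is an assumed stand-in for the importances an external estimator would supply.

```python
import random


def add_shadows(rows, rng):
    """Return rows with one shuffled ("shadow") copy of every column appended."""
    columns = [list(col) for col in zip(*rows)]
    shadows = []
    for col in columns:
        shadow = col[:]
        rng.shuffle(shadow)  # permuting a column breaks its link to the target
        shadows.append(shadow)
    return [list(row) for row in zip(*(columns + shadows))]


def abs_correlation(xs, ys):
    """Stand-in importance score (assumption): absolute Pearson correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def importances_with_shadows(rows, y, rng):
    """Score original and shadow columns together, then split the scores."""
    extended = add_shadows(rows, rng)
    scores = [abs_correlation(col, y) for col in zip(*extended)]
    half = len(scores) // 2
    return scores[:half], scores[half:]


rng = random.Random(0)
x0 = [rng.random() for _ in range(50)]  # informative feature
x1 = [rng.random() for _ in range(50)]  # pure noise
rows = [[a, b] for a, b in zip(x0, x1)]
y = [2 * a + 1 for a in x0]

real, shadow = importances_with_shadows(rows, y, rng)
hits = [imp > max(shadow) for imp in real]
```

Each shadow column keeps the value distribution of its source column, so any importance it earns is due to chance. That makes the best shadow score a natural baseline for the real features.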
Statistical Testing Module
Performs statistical significance tests on feature importances, typically using False Discovery Rate (FDR) correction, to determine which features are truly important.
Related Classes/Methods:
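One way to picture a significance test with FDR correction is a binomial test on how often each feature beat the shadows, followed by a Benjamini-Hochberg correction. The sketch below makes those assumptions explicit; the function names are hypothetical, and the library's own procedure may differ in details such as additional corrections.

```python
from math import comb


def upper_tail(hits, n_iter):
    """P(X >= hits) for X ~ Binomial(n_iter, 0.5): evidence a feature is important."""
    return sum(comb(n_iter, k) for k in range(hits, n_iter + 1)) / 2 ** n_iter


def lower_tail(hits, n_iter):
    """P(X <= hits) for X ~ Binomial(n_iter, 0.5): evidence a feature is unimportant."""
    return sum(comb(n_iter, k) for k in range(0, hits + 1)) / 2 ** n_iter


def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean list marking p-values significant under BH FDR control."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            cutoff = rank  # largest rank that passes its threshold
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            significant[idx] = True
    return significant


def decide(hit_counts, n_iter, alpha=0.05):
    """Label each feature as confirmed, rejected, or tentative."""
    accept = benjamini_hochberg([upper_tail(h, n_iter) for h in hit_counts], alpha)
    reject = benjamini_hochberg([lower_tail(h, n_iter) for h in hit_counts], alpha)
    labels = []
    for is_accepted, is_rejected in zip(accept, reject):
        if is_accepted:
            labels.append("confirmed")
        elif is_rejected:
            labels.append("rejected")
        else:
            labels.append("tentative")
    return labels


# Three features after 20 iterations: always hit, hit half the time, never hit.
labels = decide([20, 10, 0], n_iter=20)
```

A feature that beats the shadows about half the time is indistinguishable from chance in either direction, so it remains tentative.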
Data Transformation Module
Applies the results of the feature selection process to input data, returning a dataset containing only the selected features (confirmed and/or tentative).
Related Classes/Methods:
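Applying the selection results amounts to filtering columns with boolean masks. The following sketch shows the idea on plain lists; `select_features`, `confirmed`, `tentative`, and `include_tentative` are hypothetical names chosen for illustration rather than the library's own.

```python
def select_features(rows, confirmed, tentative=None, include_tentative=False):
    """Keep only the selected columns of a row-major dataset.

    confirmed / tentative are boolean masks with one entry per column.
    """
    keep = list(confirmed)
    if include_tentative and tentative is not None:
        # Union of the two masks: confirmed OR tentative.
        keep = [c or t for c, t in zip(confirmed, tentative)]
    return [[value for value, k in zip(row, keep) if k] for row in rows]


rows = [[1, 2, 3], [4, 5, 6]]
confirmed = [True, False, False]
tentative = [False, False, True]

strict = select_features(rows, confirmed)
lenient = select_features(rows, confirmed, tentative, include_tentative=True)
```

Keeping the masks separate from the data means the same selection can be applied later to new data with the same column layout, such as a held-out test set.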
External Estimator
Represents the user-provided machine learning model (e.g., RandomForestClassifier) that BorutaPy wraps and uses internally to calculate feature importances. This is an external dependency to the boruta_py library and does not have a direct source code reference within the boruta_py project.
Related Classes/Methods: None