graph LR
autoviz_AutoViz_Class_AutoViz_Main["autoviz.AutoViz_Class.AutoViz_Main"]
autoviz_AutoViz_Utils_classify_print_vars["autoviz.AutoViz_Utils.classify_print_vars"]
autoviz_AutoViz_Utils_load_file_dataframe["autoviz.AutoViz_Utils.load_file_dataframe"]
autoviz_classify_method_classify_columns["autoviz.classify_method.classify_columns"]
autoviz_AutoViz_Class_AutoViz_Main -- "delegates to" --> autoviz_AutoViz_Utils_classify_print_vars
autoviz_AutoViz_Utils_classify_print_vars -- "calls" --> autoviz_AutoViz_Utils_load_file_dataframe
autoviz_AutoViz_Utils_classify_print_vars -- "directs data to" --> autoviz_classify_method_classify_columns
autoviz_AutoViz_Utils_load_file_dataframe -- "returns data to" --> autoviz_AutoViz_Utils_classify_print_vars
autoviz_classify_method_classify_columns -- "returns results to" --> autoviz_AutoViz_Utils_classify_print_vars
autoviz_AutoViz_Utils_classify_print_vars -- "provides prepared data to" --> autoviz_AutoViz_Class_AutoViz_Main
The AutoViz data preparation subsystem is driven by AutoViz_Class.AutoViz_Main, which starts the data-processing pipeline. The central component, AutoViz_Utils.classify_print_vars, manages the flow: it first calls AutoViz_Utils.load_file_dataframe to ingest the input and bring it into a standard form. Once the data is loaded, classify_print_vars passes a sample of it to classify_method.classify_columns, which identifies each variable's type and performs an initial data-quality assessment. The classification results are returned to classify_print_vars, which consolidates them with the loaded data and hands the prepared dataset and its metadata back to AutoViz_Class.AutoViz_Main for the visualization stages. This separation keeps data ingestion and variable classification in distinct components, with classify_print_vars as the single point that joins them.
AutoViz_Class.AutoViz_Main serves as the high-level orchestrator and the entry point from the main AutoViz API into the data preparation phase. It starts data loading and classification by delegating to the specialized utility components.
Related Classes/Methods: autoviz.AutoViz_Class.AutoViz_Main
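As a rough illustration of this delegation, the sketch below keeps the entry point free of any loading or classification code; it only wires a preparation step to a visualization step. The names `autoviz_main_sketch`, `PreparedData`, `prepare`, and `visualize` are hypothetical and do not reflect AutoViz's actual signatures, and plain dictionaries stand in for a pandas DataFrame.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical sketch, not AutoViz's API.


@dataclass
class PreparedData:
    """Hypothetical container for what the preparation phase hands back."""
    table: Dict[str, List[Any]]  # column name -> values
    var_types: Dict[str, List[str]] = field(default_factory=dict)  # bucket -> column names


def autoviz_main_sketch(source: Any,
                        prepare: Callable[[Any], PreparedData],
                        visualize: Callable[[PreparedData], None]) -> PreparedData:
    """Entry point that owns no preparation logic of its own."""
    prepared = prepare(source)  # delegate loading and classification entirely
    visualize(prepared)         # later stages consume the prepared bundle
    return prepared
```

Injecting `prepare` as a callable mirrors the "delegates to" edge in the graph: the entry point depends only on the shape of the returned bundle, not on how it was produced.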
AutoViz_Utils.classify_print_vars acts as the central manager of the data preparation pipeline. It coordinates data loading, triggers column classification, and aggregates the resulting metadata about the dataset, bridging the raw input data and the classification logic.
Related Classes/Methods: autoviz.AutoViz_Utils.classify_print_vars
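A minimal sketch of the manager's coordination, under stated assumptions: the loader and classifier are passed in as callables, a table is a dict mapping column names to value lists, and inputs longer than `max_rows_analyzed` rows are down-sampled before classification. The function name, the illustrative cap, the `cols_delete` key, and the choice to return the full (unsampled) table minus flagged columns are all assumptions, not AutoViz's documented behavior.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical sketch; not the real classify_print_vars signature.
Table = Dict[str, List[Any]]
VarTypes = Dict[str, List[str]]


def classify_print_vars_sketch(source: Any,
                               load: Callable[[Any], Table],
                               classify: Callable[[Table], VarTypes],
                               max_rows_analyzed: int = 1000,
                               seed: int = 0) -> Tuple[Table, VarTypes]:
    table = load(source)                                   # 1. ingest and standardize
    n_rows = len(next(iter(table.values()), []))
    if n_rows > max_rows_analyzed:                         # 2. sample large inputs
        # Sorted indices keep the sampled rows in their original order.
        picked = sorted(random.Random(seed).sample(range(n_rows), max_rows_analyzed))
        sample = {col: [vals[i] for i in picked] for col, vals in table.items()}
    else:
        sample = table
    var_types = classify(sample)                           # 3. classify the sample only
    dropped = set(var_types.get("cols_delete", []))        # 4. consolidate results
    prepared = {col: vals for col, vals in table.items() if col not in dropped}
    return prepared, var_types
```

Classifying a sample bounds the cost of type detection on large files, while the consolidation step applies the classifier's verdicts to the data that is passed on.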
AutoViz_Utils.load_file_dataframe reads data from various input sources (e.g., CSV, text, JSON, or Excel files) or directly from a pandas DataFrame. It brings the data into a standard, usable format for later analysis, which includes using nrows to read only a sample of rows and removing duplicate columns.
Related Classes/Methods: autoviz.AutoViz_Utils.load_file_dataframe
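The loader's responsibilities can be sketched with the standard library alone. This hypothetical `load_table_sketch` handles only delimited text (not JSON or Excel), passes an in-memory table through as a copy, applies `nrows` with `itertools.islice`, and treats a repeated header name as a duplicate column, keeping the first occurrence. Values stay as strings, leaving type detection to the classification step. AutoViz itself works with pandas, so its actual reading and de-duplication rules may differ.

```python
import csv
from itertools import islice
from typing import Any, Dict, Iterable, List, Optional, Union

# Hypothetical helper, not AutoViz's load_file_dataframe.
Table = Dict[str, List[Any]]


def load_table_sketch(source: Union[str, Iterable[str], Table],
                      sep: str = ",",
                      nrows: Optional[int] = None) -> Table:
    if isinstance(source, dict):
        # Already tabular in memory: return a shallow copy of each column.
        return {col: list(vals) for col, vals in source.items()}
    if isinstance(source, str):
        # Treat a string as a file path and re-dispatch on the open file.
        with open(source, newline="") as fh:
            return load_table_sketch(fh, sep=sep, nrows=nrows)
    reader = csv.reader(source, delimiter=sep)
    header = next(reader, [])                 # empty input -> no columns
    rows = list(islice(reader, nrows))        # nrows=None reads every row
    table: Table = {}
    for idx, name in enumerate(header):
        if name in table:
            continue                          # repeated header: keep first column only
        # Short rows are padded with None so every column has equal length.
        table[name] = [row[idx] if idx < len(row) else None for row in rows]
    return table
```

Because `csv.reader` accepts any iterable of lines, the same function works on an open file or on a list of strings, which keeps it easy to exercise without touching disk.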
classify_method.classify_columns implements the core logic for automatically identifying the data type and semantic role of each column in the loaded DataFrame. It distinguishes numerical, categorical, date, ID, boolean, and NLP (free-text) variables, and flags columns for deletion based on criteria such as a single unique value, a high share of missing values, mixed data types, or infinite values.
Related Classes/Methods: autoviz.classify_method.classify_columns
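The rule-based classification described above can be sketched as follows. Every threshold (`max_missing`, `id_ratio`, `nlp_min_words`), the bucket names, and the order in which rules are checked are illustrative assumptions rather than AutoViz's actual heuristics; the sketch also assumes values are hashable Python scalars and that each column lands in exactly one bucket.

```python
import math
from datetime import date
from typing import Any, Dict, List

# Hypothetical sketch, not AutoViz's classify_columns.
BUCKETS = ("numeric", "categorical", "date", "id", "boolean", "nlp", "cols_delete")


def _kind(v: Any) -> str:
    # bool is checked before int/float because bool subclasses int.
    if isinstance(v, bool):
        return "bool"
    if isinstance(v, (int, float)):
        return "num"
    if isinstance(v, date):  # datetime subclasses date
        return "date"
    return "str"


def _missing(v: Any) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def classify_columns_sketch(table: Dict[str, List[Any]],
                            max_missing: float = 0.5,
                            id_ratio: float = 0.95,
                            nlp_min_words: int = 5) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = {b: [] for b in BUCKETS}
    for col, vals in table.items():
        present = [v for v in vals if not _missing(v)]
        kinds = {_kind(v) for v in present}
        uniques = set(present)
        has_inf = any(isinstance(v, float) and math.isinf(v) for v in present)
        if (not present                                        # nothing but missing values
                or 1 - len(present) / len(vals) > max_missing  # too many missing
                or len(uniques) == 1                           # constant column
                or len(kinds) > 1                              # mixed data types
                or has_inf):                                   # infinite values
            result["cols_delete"].append(col)
            continue
        kind = kinds.pop()
        near_unique = len(uniques) >= id_ratio * len(present)
        if kind == "bool":
            bucket = "boolean"
        elif kind == "date":
            bucket = "date"
        elif kind == "num":
            all_int = all(isinstance(v, int) for v in present)
            bucket = "id" if all_int and near_unique else "numeric"
        else:
            avg_words = sum(len(str(v).split()) for v in present) / len(present)
            if avg_words >= nlp_min_words:
                bucket = "nlp"       # long free text
            elif near_unique:
                bucket = "id"        # mostly distinct short strings
            else:
                bucket = "categorical"
        result[bucket].append(col)
    return result
```

Checking deletion criteria first means a column is never both flagged for removal and assigned a type; near-unique integers and short strings are read as identifiers, while near-unique floats remain numeric.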