graph LR
DataTransformer["DataTransformer"]
Raw_Pandas_DataFrames["Raw Pandas DataFrames"]
NumPy_arrays["NumPy arrays"]
Pandas_DataFrames_Synthetic_Data_["Pandas DataFrames (Synthetic Data)"]
CTGAN_Model["CTGAN Model"]
Raw_Pandas_DataFrames -- "inputs_to" --> DataTransformer
DataTransformer -- "transforms_into" --> NumPy_arrays
NumPy_arrays -- "feeds_into" --> CTGAN_Model
CTGAN_Model -- "generates" --> NumPy_arrays
NumPy_arrays -- "transformed_back_by" --> DataTransformer
DataTransformer -- "outputs" --> Pandas_DataFrames_Synthetic_Data_
click CTGAN_Model href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/CTGAN/CTGAN_Model.md" "Details"
The CTGAN subsystem centers on two components: the DataTransformer and the CTGAN Model. Raw Pandas DataFrames enter through the DataTransformer, which converts the tabular data into NumPy arrays, a numerical format suitable for deep learning. The CTGAN Model trains on these arrays and generates synthetic data, also as NumPy arrays. The generated arrays are passed back to the DataTransformer, which applies the inverse transformation and returns the synthetic data as human-readable Pandas DataFrames (Synthetic Data). The architecture thus separates three roles: data preparation, model training and generation, and output formatting.
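The flow described above can be sketched as a small orchestration function. This is a minimal illustration under stated assumptions, not the project's code: the `Transformer` and `Generator` protocols, the `synthesize` function, and both stub classes are hypothetical names, plain Python lists and dicts stand in for NumPy arrays and Pandas DataFrames, and the stub model simply replays training rows rather than training a GAN.

```python
import random
from typing import Dict, List, Protocol, Sequence

Row = List[float]        # stand-in for one row of a NumPy array
Record = Dict[str, float]  # stand-in for one row of a Pandas DataFrame


class Transformer(Protocol):
    """Role of the data-preparation stage (hypothetical interface)."""
    def fit(self, records: Sequence[Record]) -> None: ...
    def transform(self, records: Sequence[Record]) -> List[Row]: ...
    def inverse_transform(self, rows: Sequence[Row]) -> List[Record]: ...


class Generator(Protocol):
    """Role of the generative-model stage (hypothetical interface)."""
    def fit(self, rows: Sequence[Row]) -> None: ...
    def sample(self, n: int) -> List[Row]: ...


def synthesize(records, transformer: Transformer, model: Generator, n: int):
    """Run the round trip: raw -> numeric -> model -> numeric -> raw."""
    transformer.fit(records)
    numeric = transformer.transform(records)        # data preparation
    model.fit(numeric)                              # model training
    generated = model.sample(n)                     # generation
    return transformer.inverse_transform(generated)  # output formatting


class PassthroughTransformer:
    """Hypothetical stub: turns each record into a list of floats and back."""
    def fit(self, records):
        self.columns = list(records[0].keys())

    def transform(self, records):
        return [[float(r[c]) for c in self.columns] for r in records]

    def inverse_transform(self, rows):
        return [dict(zip(self.columns, row)) for row in rows]


class ResamplingModel:
    """Hypothetical stub, NOT a GAN: replays training rows at random."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def fit(self, rows):
        self.rows = [list(r) for r in rows]

    def sample(self, n):
        return [list(self.rng.choice(self.rows)) for _ in range(n)]


raw = [{"age": 31, "income": 52000}, {"age": 45, "income": 61000}]
synthetic = synthesize(raw, PassthroughTransformer(), ResamplingModel(), n=3)
```

The point of the sketch is the separation of roles: `synthesize` never inspects column types or model internals, so either stage could be swapped out without touching the other.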
DataTransformer
This component handles the full data-preparation lifecycle for generative models. It learns the statistical distributions of the raw tabular data, transforms the data into the numerical format used for model training (NumPy arrays), and applies the inverse transformation to convert generated synthetic data back into its original, human-readable representation. It handles both continuous and discrete columns, preserving data integrity and compatibility throughout the synthesis pipeline.
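To make the fit, transform, and inverse-transform cycle concrete, here is a toy stand-in written in plain Python. It is a sketch under assumptions, not the real DataTransformer: the class name `ToyTransformer` and its method signatures are hypothetical, continuous columns use simple min-max scaling (the actual component may use a more elaborate normalization), and discrete columns use one-hot encoding.

```python
from typing import Dict, List, Sequence


class ToyTransformer:
    """Illustrative stand-in for a tabular data transformer."""

    def fit(self, records: Sequence[Dict], discrete_columns: Sequence[str]) -> None:
        # Learn per-column statistics: categories for discrete columns,
        # (min, max) for continuous ones.
        self.columns = list(records[0].keys())
        self.discrete = set(discrete_columns)
        self.categories = {}
        self.ranges = {}
        for col in self.columns:
            values = [r[col] for r in records]
            if col in self.discrete:
                self.categories[col] = sorted(set(values))
            else:
                self.ranges[col] = (min(values), max(values))

    def transform(self, records: Sequence[Dict]) -> List[List[float]]:
        rows = []
        for r in records:
            row = []
            for col in self.columns:
                if col in self.discrete:
                    # One-hot: one slot per known category.
                    row.extend(1.0 if r[col] == cat else 0.0
                               for cat in self.categories[col])
                else:
                    lo, hi = self.ranges[col]
                    span = (hi - lo) or 1.0  # avoid dividing by zero
                    row.append((r[col] - lo) / span)
            rows.append(row)
        return rows

    def inverse_transform(self, rows: Sequence[List[float]]) -> List[Dict]:
        records = []
        for row in rows:
            record, i = {}, 0
            for col in self.columns:
                if col in self.discrete:
                    cats = self.categories[col]
                    block = row[i:i + len(cats)]
                    # Pick the category with the largest activation.
                    record[col] = cats[block.index(max(block))]
                    i += len(cats)
                else:
                    lo, hi = self.ranges[col]
                    span = (hi - lo) or 1.0
                    record[col] = lo + row[i] * span
                    i += 1
            records.append(record)
        return records


raw = [
    {"age": 20.0, "city": "Oslo"},
    {"age": 40.0, "city": "Rome"},
    {"age": 30.0, "city": "Oslo"},
]
transformer = ToyTransformer()
transformer.fit(raw, discrete_columns=["city"])
numeric = transformer.transform(raw)
restored = transformer.inverse_transform(numeric)
```

Taking the arg-max over each one-hot block in `inverse_transform` matters because a generative model emits soft activations rather than exact zeros and ones, so the inverse step has to decide on a single category.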
Related Classes/Methods:
Raw Pandas DataFrames
Raw tabular data, typically provided as Pandas DataFrames, that serves as the initial input for fitting and transformation. It represents the external data source.
Related Classes/Methods: None
NumPy arrays
Numerical data in the form of NumPy arrays: the format the DataTransformer produces from the raw data and the deep learning model consumes as input.
Related Classes/Methods: None
Pandas DataFrames (Synthetic Data)
Structured Pandas DataFrames containing the generated synthetic data, restored to its original tabular representation by the DataTransformer.
Related Classes/Methods: None
CTGAN Model
The core generative adversarial network (GAN) model, which learns the underlying data distribution and generates synthetic data. For both training and generation it works on the numerical data prepared by the DataTransformer.
Related Classes/Methods: