```mermaid
graph LR
    DataSampler["DataSampler"]
    is_discrete_column["is_discrete_column"]
    sample_condvec["sample_condvec"]
    _random_choice_prob_index["_random_choice_prob_index"]
    DataSampler -- "orchestrates" --> sample_condvec
    DataSampler -- "uses" --> is_discrete_column
    sample_condvec -- "uses" --> _random_choice_prob_index
```
The Data Sampling subsystem lives in the ctgan.data_sampler module. The module prepares and samples the training data and generates the conditional vectors that guide synthetic data generation.
DataSampler is the primary orchestrator of the data sampling process. It identifies discrete columns, prepares the input data, and manages the generation of conditional vectors. It is the main interface through which other parts of the CTGAN model obtain sampled data and conditional information.
Related Classes/Methods:
is_discrete_column is a utility function that determines whether a given column in the dataset is discrete. The answer decides which encoding and sampling strategy is applied to the column during data preparation.
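The discreteness check described above can be sketched as follows. This is an illustration only: the SpanInfo tuple, its field names, and the rule "exactly one softmax span means discrete" are assumptions made for this example, not details confirmed by this document.

```python
from collections import namedtuple

# Hypothetical description of one output span of a transformed column:
# its width and the activation applied to it.
SpanInfo = namedtuple("SpanInfo", ["dim", "activation_fn"])


def is_discrete_column(column_info):
    """Return True if the column looks like a one-hot encoded discrete column.

    Assumption for this sketch: a discrete column is represented by exactly
    one span with a softmax activation, while a continuous column has more
    than one span (for example a scalar part plus a mode indicator).
    """
    return len(column_info) == 1 and column_info[0].activation_fn == "softmax"


discrete = [SpanInfo(dim=3, activation_fn="softmax")]
continuous = [SpanInfo(dim=1, activation_fn="tanh"),
              SpanInfo(dim=5, activation_fn="softmax")]
print(is_discrete_column(discrete), is_discrete_column(continuous))
```

Keeping the check as a small pure function makes it easy to reuse wherever the sampler needs to separate discrete from continuous columns.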
Related Classes/Methods:
sample_condvec generates the conditional vectors. These vectors steer the generative model (e.g., CTGAN's generator) toward synthetic data with specific characteristics, especially for discrete columns, so that the generated data follows the desired distributions.
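A minimal pure-Python sketch of conditional vector generation follows. It assumes that each row of a batch is conditioned on one randomly chosen discrete column and one category of that column, encoded as a one-hot vector over all categories of all discrete columns. The input format (category_probs), the returned mask, and the rng parameter are hypothetical choices for this example; a real implementation would likely be vectorized with NumPy.

```python
import random


def sample_condvec(category_probs, batch_size, rng=random):
    """Build one-hot conditional vectors for a batch.

    category_probs: one list of category probabilities per discrete column
    (a hypothetical input format chosen for this sketch).
    Returns (cond, mask), or None when there are no discrete columns.
    """
    if not category_probs:
        return None  # nothing to condition on

    # Offset of each column's first category inside the concatenated vector.
    offsets = [0]
    for probs in category_probs[:-1]:
        offsets.append(offsets[-1] + len(probs))
    n_categories = offsets[-1] + len(category_probs[-1])

    cond, mask = [], []
    for _ in range(batch_size):
        # Choose which discrete column this row is conditioned on.
        col = rng.randrange(len(category_probs))
        # Choose a category inside that column according to its probabilities.
        probs = category_probs[col]
        cat = rng.choices(range(len(probs)), weights=probs)[0]

        # One-hot over all categories of all discrete columns.
        row = [0.0] * n_categories
        row[offsets[col] + cat] = 1.0
        cond.append(row)

        # One-hot over columns, recording which column was chosen.
        mask_row = [0.0] * len(category_probs)
        mask_row[col] = 1.0
        mask.append(mask_row)
    return cond, mask


cond, mask = sample_condvec([[0.5, 0.5], [0.1, 0.2, 0.7]], batch_size=4)
```

Each row of cond has exactly one entry set to 1.0, and that entry always falls inside the slice of the column flagged in the matching row of mask.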
Related Classes/Methods:
_random_choice_prob_index is a low-level utility function that selects indices at random according to given probabilities. Because each index is drawn in proportion to its probability, the sampled conditional vectors stay diverse while still following the data distribution.
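Probabilistic index selection of this kind is commonly done by inverse-CDF sampling: accumulate the probabilities, draw a uniform random number, and take the first index whose cumulative mass exceeds the draw. The sketch below shows that technique in pure Python. The function name without the leading underscore, the prob_rows input format, and the rng parameter are assumptions for this example, and it is not claimed to match the module's actual implementation.

```python
import bisect
import itertools
import random


def random_choice_prob_index(prob_rows, rng=random):
    """Pick one index per row, with probability given by that row's entries."""
    chosen = []
    for probs in prob_rows:
        # Running totals turn the probabilities into a cumulative distribution.
        cumulative = list(itertools.accumulate(probs))
        # Scale the draw by the row total so rows that do not sum to
        # exactly 1 are still sampled proportionally.
        r = rng.random() * cumulative[-1]
        # First position whose cumulative mass is strictly greater than r.
        index = bisect.bisect_right(cumulative, r)
        # Guard against floating-point rounding at the upper edge.
        chosen.append(min(index, len(probs) - 1))
    return chosen


picks = random_choice_prob_index([[0.1, 0.9], [0.3, 0.3, 0.4]])
```

Entries with zero probability are never selected, because their cumulative mass does not rise above that of the preceding entry.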
Related Classes/Methods: