graph LR
    DataSampler["DataSampler"]
    is_discrete_column["is_discrete_column"]
    sample_condvec["sample_condvec"]
    _random_choice_prob_index["_random_choice_prob_index"]
    DataSampler -- "orchestrates" --> sample_condvec
    DataSampler -- "uses" --> is_discrete_column
    sample_condvec -- "uses" --> _random_choice_prob_index


Details

The Data Sampling subsystem lives in the ctgan.data_sampler module. The module prepares and samples training data and generates the conditional vectors that steer synthetic data generation.

DataSampler

The primary orchestrator of data sampling. It identifies discrete columns, prepares the input data, and generates conditional vectors. Other parts of the CTGAN model go through it to obtain sampled data and conditional information.

Related Classes/Methods:

is_discrete_column

A utility function that reports whether a given column is discrete. DataSampler relies on the answer during data preparation to decide which columns take part in conditional sampling.
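As a rough illustration, a check of this kind could be sketched as follows. The `SpanInfo` record and the rule "a single softmax span means discrete" are assumptions made for this example; this document does not state how the column metadata is laid out.

```python
from collections import namedtuple

# Hypothetical per-column metadata: each column is described by a list of
# spans, and each span records its width and the activation applied to it.
SpanInfo = namedtuple("SpanInfo", ["dim", "activation_fn"])


def sketch_is_discrete_column(column_info):
    """Return True if the column looks discrete under the assumed convention.

    Assumption: a discrete column is encoded as exactly one one-hot span
    with a softmax activation.
    """
    return len(column_info) == 1 and column_info[0].activation_fn == "softmax"


# A one-hot column with three categories versus a two-span continuous column.
discrete = [SpanInfo(3, "softmax")]
continuous = [SpanInfo(1, "tanh"), SpanInfo(5, "softmax")]
print(sketch_is_discrete_column(discrete), sketch_is_discrete_column(continuous))
```

Keeping the check as a small predicate lets the sampler filter the column list once, at construction time, instead of re-inspecting metadata on every draw.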

Related Classes/Methods:

sample_condvec

Generates the conditional vectors. Each vector tells the generative model (e.g., CTGAN's generator) which discrete-column value a synthetic row should take, which helps the generated data follow the desired category distributions.
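To make the idea concrete, here is a minimal sketch of conditional-vector sampling. It assumes the scheme usually described for CTGAN: pick one discrete column per row, pick one category in that column according to its probability, and mark it in a concatenated one-hot vector. The function name, argument layout, and return values are hypothetical and chosen for illustration only.

```python
import random


def sketch_sample_condvec(category_probs, batch_size, rng=random):
    """Sample `batch_size` conditional vectors.

    category_probs: one list of category probabilities per discrete column
    (an assumed input layout). Returns None when there are no discrete
    columns, otherwise (cond, mask, chosen).
    """
    if not category_probs:
        return None

    # Offset of each column's block inside the concatenated one-hot vector.
    offsets, total = [], 0
    for probs in category_probs:
        offsets.append(total)
        total += len(probs)

    cond, mask, chosen = [], [], []
    for _ in range(batch_size):
        # Choose a discrete column uniformly, then a category by probability.
        col = rng.randrange(len(category_probs))
        probs = category_probs[col]
        cat = rng.choices(range(len(probs)), weights=probs)[0]

        row = [0.0] * total
        row[offsets[col] + cat] = 1.0          # one-hot condition
        col_mask = [0.0] * len(category_probs)
        col_mask[col] = 1.0                    # which column was conditioned on

        cond.append(row)
        mask.append(col_mask)
        chosen.append((col, cat))
    return cond, mask, chosen
```

For example, `sketch_sample_condvec([[0.5, 0.5], [0.2, 0.3, 0.5]], 4)` yields four vectors of length 5, each with exactly one entry set to 1.0.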

Related Classes/Methods:

_random_choice_prob_index

A low-level utility function that selects indices at random according to given probabilities. When it picks the indices used in conditional vectors, every category with nonzero probability can be drawn, and over many draws the selections follow the supplied distribution.
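Probabilistic index selection of this kind is commonly done by inverse-CDF sampling: accumulate the probabilities, draw a uniform number, and take the first index whose cumulative value exceeds it. The sketch below shows that technique in plain Python; it is an illustration under that assumption, and the function name and signature are hypothetical.

```python
import bisect
import itertools
import random


def sketch_random_choice_prob_index(probs, rng=random):
    """Return index i with probability proportional to probs[i]."""
    cumulative = list(itertools.accumulate(probs))
    # Scale the uniform draw by the total so unnormalized weights also work.
    r = rng.random() * cumulative[-1]
    # First index whose cumulative probability is strictly greater than r.
    idx = bisect.bisect_right(cumulative, r)
    # Guard against floating-point rounding at the upper edge.
    return min(idx, len(probs) - 1)


# Over many draws the counts should roughly track the probabilities.
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(10_000):
    counts[sketch_random_choice_prob_index([0.1, 0.3, 0.6], rng)] += 1
print(counts)
```

Entries with zero probability are never selected, because their cumulative value equals that of the preceding entry and the search skips past them.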

Related Classes/Methods: