```mermaid
graph LR
    Blocking_Module["Blocking Module"]
    Blocking_Rule_Management["Blocking Rule Management"]
    Comparison_Module["Comparison Module"]
    Comparison_Level_Management["Comparison Level Management"]
    EM_Training_Session["EM Training Session"]
    Expectation_Maximisation_EM_Engine["Expectation Maximisation (EM) Engine"]
    Prediction_Module["Prediction Module"]
    Clustering_Module["Clustering Module"]
    Blocking_Module -- "obtains rules from" --> Blocking_Rule_Management
    Comparison_Module -- "is composed of" --> Comparison_Level_Management
    Comparison_Module -- "generates comparison vectors for" --> Prediction_Module
    EM_Training_Session -- "drives" --> Expectation_Maximisation_EM_Engine
    EM_Training_Session -- "uses probabilities from" --> Comparison_Level_Management
    Prediction_Module -- "receives comparison vectors from" --> Comparison_Module
    Prediction_Module -- "provides pairwise match probabilities to" --> Clustering_Module
    Clustering_Module -- "receives match probabilities from" --> Prediction_Module
```

Details

The Core Linkage Processing Engine subsystem is responsible for executing the main record linkage pipeline, encompassing candidate pair generation (blocking), attribute comparison, statistical model training (EM algorithm), match probability prediction, and record clustering.

Blocking Module

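Reduces the number of record pairs that must be compared by generating only the candidate pairs that satisfy at least one user-defined blocking rule, rather than comparing every record with every other record.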
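To illustrate the idea, here is a minimal sketch that assumes each blocking rule is just a tuple of column names whose values must be equal (the real rules are expressed as SQL, as described under Blocking Rule Management). The hypothetical `generate_candidate_pairs` function buckets records by each rule's key and takes the union of the resulting pairs, so a pair produced by several rules is still compared only once.

```python
from itertools import combinations


def generate_candidate_pairs(records, rules):
    """Return candidate (i, j) index pairs, i < j, that satisfy at least one rule.

    Hypothetical helper: each rule is a tuple of column names whose values must be
    equal and non-null for the pair to be generated.
    """
    candidates = set()
    for columns in rules:
        # Bucket record indices by this rule's blocking key
        buckets = {}
        for index, record in enumerate(records):
            key = tuple(record.get(column) for column in columns)
            if None in key:
                continue  # a missing key value never blocks two records together
            buckets.setdefault(key, []).append(index)
        # All pairs inside a bucket are candidates; the set drops pairs
        # already produced by an earlier rule
        for members in buckets.values():
            candidates.update(combinations(members, 2))
    return candidates


records = [
    {"first_name": "Ann", "surname": "Lee", "dob": "1990-01-01"},
    {"first_name": "Ann", "surname": "Li", "dob": "1990-01-01"},
    {"first_name": "Bob", "surname": "Lee", "dob": "1985-05-05"},
    {"first_name": "Ann", "surname": "Lee", "dob": None},
]
rules = [("first_name", "surname"), ("dob",)]
pairs = generate_candidate_pairs(records, rules)
print(sorted(pairs))  # 2 candidate pairs instead of all 6
```

Only pairs (0, 3) (same name) and (0, 1) (same date of birth) survive; record 3's missing date of birth keeps it out of the second rule's buckets.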

Related Classes/Methods:

Blocking Rule Management

Provides a structured and extensible way to define, create, and retrieve blocking rules, including generating their SQL representations.
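One possible shape for this component, sketched under assumptions: the `ExactMatchRule` and `BlockingRuleRegistry` classes, their method names, and the `l`/`r` aliases for the left and right records of a pair are all hypothetical. The sketch shows rules being defined, stored by name, retrieved, and rendered to SQL, with the registered rules joined by `OR` because a pair only needs to satisfy one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExactMatchRule:
    """Hypothetical rule type: two records block together if all columns are equal."""

    columns: tuple[str, ...]

    def to_sql(self) -> str:
        # 'l' and 'r' alias the left and right record of a candidate pair
        return " AND ".join(f"l.{column} = r.{column}" for column in self.columns)


class BlockingRuleRegistry:
    """Hypothetical registry for defining, storing and retrieving named rules."""

    def __init__(self):
        self._rules = {}

    def add(self, name: str, rule: ExactMatchRule) -> None:
        if name in self._rules:
            raise ValueError(f"blocking rule {name!r} already defined")
        self._rules[name] = rule

    def get(self, name: str) -> ExactMatchRule:
        return self._rules[name]

    def combined_sql(self) -> str:
        # A pair is a candidate if it satisfies any registered rule
        return " OR ".join(f"({rule.to_sql()})" for rule in self._rules.values())


registry = BlockingRuleRegistry()
registry.add("name", ExactMatchRule(("first_name", "surname")))
registry.add("dob", ExactMatchRule(("dob",)))
print(registry.get("name").to_sql())
# l.first_name = r.first_name AND l.surname = r.surname
print(registry.combined_sql())
# (l.first_name = r.first_name AND l.surname = r.surname) OR (l.dob = r.dob)
```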

Related Classes/Methods:

Comparison Module

Defines how each attribute is compared between two records. Each comparison assigns a record pair to one of its comparison levels, and the levels assigned by all comparisons together form the pair's comparison vector.
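The sketch below shows how a comparison can map a record pair to a level index and how several comparisons produce a comparison vector. It assumes levels are plain Python predicates checked in order (the first match wins) and that a level's index is simply its position in the list; `make_comparison` and `comparison_vector` are hypothetical names.

```python
from typing import Callable, Optional

# A comparison level pairs a label with a predicate over the two attribute values
Level = tuple[str, Callable[[Optional[str], Optional[str]], bool]]


def make_comparison(column: str, levels: list[Level]):
    """Hypothetical factory: return a function mapping a record pair to a level index.

    Levels are tested in order and the first matching one wins, so the last
    level should be a catch-all.
    """
    def compare(left: dict, right: dict) -> int:
        a, b = left.get(column), right.get(column)
        for index, (_label, predicate) in enumerate(levels):
            if predicate(a, b):
                return index
        raise ValueError(f"no level matched for column {column!r}")
    return compare


def comparison_vector(left: dict, right: dict, comparisons) -> list[int]:
    # One level index per comparison
    return [compare(left, right) for compare in comparisons]


first_name = make_comparison("first_name", [
    ("null", lambda a, b: a is None or b is None),
    ("exact match", lambda a, b: a == b),
    ("same initial", lambda a, b: a[:1] == b[:1]),
    ("else", lambda a, b: True),
])
dob = make_comparison("dob", [
    ("null", lambda a, b: a is None or b is None),
    ("exact match", lambda a, b: a == b),
    ("else", lambda a, b: True),
])

left = {"first_name": "Ann", "dob": "1990-01-01"}
right = {"first_name": "Anna", "dob": None}
print(comparison_vector(left, right, [first_name, dob]))  # [2, 0]
```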

Related Classes/Methods:

Comparison Level Management

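Specifies the conditions that define each level of agreement or disagreement for an attribute, together with the level's m probability (the probability of observing that level when the two records are a true match) and u probability (the probability of observing it when they are not).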
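A comparison level can be pictured as a small record holding its condition and its two probabilities. This sketch (the `ComparisonLevel` class, its fields, the SQL strings and the example numbers are all assumptions) also derives the level's Bayes factor, m/u, and its match weight, log2(m/u), which summarise how strongly the level points towards or away from a match.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ComparisonLevel:
    """Hypothetical container for one level of a comparison."""

    label: str
    sql_condition: str    # condition that places a pair in this level
    m_probability: float  # P(pair is in this level | the records match)
    u_probability: float  # P(pair is in this level | the records do not match)

    @property
    def bayes_factor(self) -> float:
        # How many times more likely this level is among matches than non-matches
        return self.m_probability / self.u_probability

    @property
    def match_weight(self) -> float:
        # log2 of the Bayes factor: positive favours a match, negative a non-match
        return math.log2(self.bayes_factor)


surname_levels = [
    ComparisonLevel("exact match", "l.surname = r.surname", 0.9, 0.01),
    ComparisonLevel("else", "ELSE", 0.1, 0.99),
]
# Within one comparison, the m (and u) probabilities form a distribution over levels
assert math.isclose(sum(level.m_probability for level in surname_levels), 1.0)
assert math.isclose(sum(level.u_probability for level in surname_levels), 1.0)
for level in surname_levels:
    print(f"{level.label}: Bayes factor {level.bayes_factor:.2f}, "
          f"match weight {level.match_weight:+.2f}")
```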

Related Classes/Methods:

EM Training Session

Manages the entire EM training process, including fetching data, running iterations, checking for convergence, and logging progress.
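The control flow of such a session might look like the sketch below. It is hypothetical: data fetching is omitted, `toy_step` stands in for one call into the EM engine, and the defaults of 25 iterations and a convergence tolerance of 1e-4 are assumed values. Convergence is declared when the largest absolute change in any parameter between two iterations falls below the tolerance.

```python
import logging

logger = logging.getLogger("em_training_session")


def run_training_session(step, initial_params, max_iterations=25, tolerance=1e-4):
    """Hypothetical driver: repeat `step` until the parameters stop changing.

    Returns (params, iterations_run, converged).
    """
    params = dict(initial_params)
    for iteration in range(1, max_iterations + 1):
        new_params = step(params)
        # Convergence test: the largest absolute change across all parameters
        largest_change = max(abs(new_params[name] - params[name]) for name in params)
        logger.info("Iteration %d: largest parameter change %.6f", iteration, largest_change)
        params = new_params
        if largest_change < tolerance:
            logger.info("Converged after %d iterations", iteration)
            return params, iteration, True
    logger.warning("Stopped after %d iterations without converging", max_iterations)
    return params, max_iterations, False


def toy_step(params):
    # Stand-in for one EM step: moves the parameter halfway towards 0.3 each call
    return {"lambda": (params["lambda"] + 0.3) / 2}


logging.basicConfig(level=logging.INFO)
params, iterations, converged = run_training_session(toy_step, {"lambda": 0.5})
print(params, iterations, converged)
```

Keeping the loop, convergence check and logging in the session, and the numerical update in a separate step function, mirrors the split between the EM Training Session and the EM Engine in the diagram above.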

Related Classes/Methods:

Expectation Maximisation (EM) Engine

The core statistical engine that iteratively estimates the m and u probabilities for each comparison level and the overall probability of two random records matching.
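The iteration such an engine performs can be sketched with the classic Fellegi-Sunter EM algorithm, assuming comparisons are conditionally independent given match status. The E-step computes each pair's posterior match probability from the current λ (the probability that two random records match), m and u; the M-step re-estimates all three from the expected counts. `em_iteration` is a hypothetical name, and the real engine may schedule its updates differently (for example, holding u fixed).

```python
def em_iteration(vectors, lam, m, u):
    """One EM iteration for a Fellegi-Sunter model with conditional independence.

    vectors: list of comparison vectors (one level index per comparison)
    lam:     prior probability that a random pair is a match
    m, u:    m[k][level] and u[k][level] for comparison k
    Returns the updated (lam, m, u).
    """
    # E-step: posterior probability that each pair is a match
    posteriors = []
    for gamma in vectors:
        p_match, p_non_match = lam, 1.0 - lam
        for k, level in enumerate(gamma):
            p_match *= m[k][level]
            p_non_match *= u[k][level]
        posteriors.append(p_match / (p_match + p_non_match))

    # M-step: re-estimate the parameters from the expected counts
    expected_matches = sum(posteriors)
    expected_non_matches = len(vectors) - expected_matches
    m_counts = [[0.0] * len(levels) for levels in m]
    u_counts = [[0.0] * len(levels) for levels in u]
    for w, gamma in zip(posteriors, vectors):
        for k, level in enumerate(gamma):
            m_counts[k][level] += w
            u_counts[k][level] += 1.0 - w
    new_m = [[count / expected_matches for count in row] for row in m_counts]
    new_u = [[count / expected_non_matches for count in row] for row in u_counts]
    return expected_matches / len(vectors), new_m, new_u


# Two binary comparisons (0 = disagree, 1 = agree)
vectors = [[1, 1]] * 20 + [[0, 0]] * 80 + [[1, 0]] * 10 + [[0, 1]] * 10
lam, m, u = 0.3, [[0.2, 0.8], [0.2, 0.8]], [[0.8, 0.2], [0.8, 0.2]]
for _ in range(50):
    lam, m, u = em_iteration(vectors, lam, m, u)
print(f"lambda={lam:.3f}, m_agree={m[0][1]:.3f}, u_agree={u[0][1]:.3f}")
```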

Related Classes/Methods:

Prediction Module

Computes the final match probability for each record pair from its comparison vector, the trained m and u probabilities, and the estimated prior probability that two random records match.
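Under the same Fellegi-Sunter assumptions, the prediction can be sketched as follows (the `predict_match` name and the example parameters are hypothetical). The match weight adds log2 of the prior odds λ/(1 - λ) to the log2 Bayes factor m/u of each observed level; converting the resulting odds back to a probability gives the same value as λ∏m / (λ∏m + (1 - λ)∏u).

```python
import math


def predict_match(gamma, lam, m, u):
    """Return (match_weight, match_probability) for one comparison vector.

    The match weight is log2 of the prior odds plus the log2 Bayes factor of each
    comparison level; the probability converts those odds back to [0, 1].
    """
    weight = math.log2(lam / (1.0 - lam))
    for k, level in enumerate(gamma):
        weight += math.log2(m[k][level] / u[k][level])
    odds = 2.0 ** weight
    return weight, odds / (1.0 + odds)


lam = 0.01
m = [[0.1, 0.9], [0.2, 0.8]]
u = [[0.95, 0.05], [0.9, 0.1]]
weight, probability = predict_match([1, 1], lam, m, u)
print(f"match weight {weight:.2f}, match probability {probability:.3f}")
```

Working in log2 units lets each level's contribution be read off as an additive score, which is easier to inspect than a product of small probabilities.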

Related Classes/Methods:

Clustering Module

Groups records into clusters, where each cluster represents a linked entity, based on the predicted match probabilities.
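A common way to turn pairwise probabilities into clusters is to keep the pairs at or above a threshold and take the connected components of the resulting graph. The sketch below does this with a union-find structure; `cluster_at_threshold`, the `>=` threshold test, and the example data are assumptions rather than the module's documented behaviour.

```python
def cluster_at_threshold(record_ids, scored_pairs, threshold):
    """Hypothetical helper: connected components over pairs at or above `threshold`.

    scored_pairs: iterable of (left_id, right_id, match_probability)
    Returns a list of clusters, each a sorted list of record ids.
    """
    parent = {record_id: record_id for record_id in record_ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for left, right, probability in scored_pairs:
        if probability >= threshold:
            root_left, root_right = find(left), find(right)
            if root_left != root_right:
                parent[root_right] = root_left  # merge the two clusters

    clusters = {}
    for record_id in record_ids:
        clusters.setdefault(find(record_id), []).append(record_id)
    return sorted(sorted(members) for members in clusters.values())


pairs = [("a", "b", 0.97), ("b", "c", 0.92), ("c", "d", 0.40), ("d", "e", 0.99)]
print(cluster_at_threshold(["a", "b", "c", "d", "e"], pairs, threshold=0.9))
# [['a', 'b', 'c'], ['d', 'e']]
```

Records `a` and `c` end up in the same cluster even though only `a`-`b` and `b`-`c` were scored above the threshold: connected components link records transitively.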

Related Classes/Methods: