```mermaid
graph LR
Blocking_Module["Blocking Module"]
Blocking_Rule_Management["Blocking Rule Management"]
Comparison_Module["Comparison Module"]
Comparison_Level_Management["Comparison Level Management"]
EM_Training_Session["EM Training Session"]
Expectation_Maximisation_EM_Engine["Expectation Maximisation (EM) Engine"]
Prediction_Module["Prediction Module"]
Clustering_Module["Clustering Module"]
Blocking_Module -- "obtains rules from" --> Blocking_Rule_Management
Comparison_Module -- "is composed of" --> Comparison_Level_Management
Comparison_Module -- "generates comparison vectors for" --> Prediction_Module
EM_Training_Session -- "drives" --> Expectation_Maximisation_EM_Engine
EM_Training_Session -- "uses probabilities from" --> Comparison_Level_Management
Prediction_Module -- "receives comparison vectors from" --> Comparison_Module
Prediction_Module -- "provides pairwise match probabilities to" --> Clustering_Module
Clustering_Module -- "receives match probabilities from" --> Prediction_Module
```
The Core Linkage Processing Engine subsystem executes the main record linkage pipeline: candidate pair generation (blocking), attribute comparison, statistical model training (EM algorithm), match probability prediction, and record clustering.
The Blocking Module reduces the number of record pairs to be compared by applying user-defined blocking rules.
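To make the effect of blocking concrete, here is a minimal sketch in plain Python. It is not Splink's implementation; the `candidate_pairs` helper and the sample records are hypothetical and only illustrate how a single blocking key shrinks the set of pairs.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, key):
    """Return index pairs of records that share the same blocking key value."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        value = key(rec)
        if value is not None:  # records with a missing key join no block
            blocks[value].append(idx)
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return pairs


# Hypothetical sample data.
records = [
    {"first_name": "ann", "surname": "lee"},
    {"first_name": "ann", "surname": "li"},
    {"first_name": "bob", "surname": "lee"},
    {"first_name": "cat", "surname": "roe"},
]

pairs = candidate_pairs(records, key=lambda r: r["first_name"])
all_pairs = len(records) * (len(records) - 1) // 2  # pairs without blocking
```

With this data, only one of the six possible pairs survives blocking on `first_name`. A stricter key gives fewer comparisons but risks missing true matches whose key values differ.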
Related Classes/Methods:
Blocking Rule Management provides a structured, extensible way to define, create, and retrieve blocking rules, and to generate their SQL representations.
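The idea of a rule object that can render itself as SQL might look roughly like the sketch below. The `ExactMatchRule` class is hypothetical, and the `l` / `r` table aliases are an assumption made for illustration rather than a statement about the library's generated SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExactMatchRule:
    """Hypothetical rule: a pair is a candidate if all listed columns are equal."""

    columns: tuple

    def to_sql(self) -> str:
        # "l" and "r" are assumed aliases for the two sides of the join.
        return " AND ".join(f'l."{c}" = r."{c}"' for c in self.columns)


rule = ExactMatchRule(columns=("first_name", "dob"))
sql = rule.to_sql()
```

Keeping the rule as a structured object and rendering SQL only on demand is what makes it possible to store, inspect, and combine rules before anything is executed.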
Related Classes/Methods:
splink.internals.blocking_rule_creator, splink.internals.blocking_rule_library, splink.internals.blocking_analysis
The Comparison Module defines how individual attributes are compared between two records and how the results of those comparisons are assembled into a comparison vector.
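As a rough illustration of a comparison vector, the sketch below assigns one integer level per attribute for a pair of records. The function names, the level numbering, and the prefix-based similarity rule are all assumptions chosen for brevity.

```python
def compare_exact(left, right):
    """Level 1 if both values are present and equal, otherwise level 0."""
    return 1 if left is not None and left == right else 0


def compare_prefix(left, right, n=3):
    """Level 2 for an exact match, 1 if the first n characters agree, else 0."""
    if left is None or right is None:
        return 0
    if left == right:
        return 2
    return 1 if left[:n] == right[:n] else 0


# One comparison function per attribute (hypothetical configuration).
COMPARISONS = {"dob": compare_exact, "surname": compare_prefix}


def comparison_vector(rec_l, rec_r):
    """Evaluate every configured comparison and collect the observed levels."""
    return tuple(fn(rec_l.get(col), rec_r.get(col)) for col, fn in COMPARISONS.items())


vector = comparison_vector(
    {"dob": "1990-01-01", "surname": "johnson"},
    {"dob": "1990-01-01", "surname": "johnston"},
)
```

Here `vector` is `(1, 1)`: the dates of birth match exactly, and the surnames differ but share their first three characters.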
Related Classes/Methods:
Comparison Level Management specifies the conditions that define each level of agreement or disagreement for an attribute, together with the m and u probabilities associated with each level.
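The role of m and u can be sketched with a small data class. In the Fellegi-Sunter framing, m is the probability of observing a level given that the records match, and u is the probability given that they do not; their ratio is a Bayes factor, and its base-2 logarithm is often called a match weight. The `Level` class and the numbers below are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Level:
    """Hypothetical container for one comparison level."""

    label: str
    m: float  # P(level observed | records match)
    u: float  # P(level observed | records do not match)

    @property
    def bayes_factor(self):
        return self.m / self.u

    @property
    def match_weight(self):
        return math.log2(self.bayes_factor)


# Illustrative values; within one comparison, m sums to 1 and so does u.
levels = [
    Level("exact match", m=0.9, u=0.01),
    Level("all other", m=0.1, u=0.99),
]
```

A level with m much larger than u is strong evidence for a match (positive match weight), while a level with u larger than m counts against it (negative match weight).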
Related Classes/Methods:
splink.internals.comparison_creator, splink.internals.comparison_level, splink.internals.comparison_level_library
The EM Training Session manages the entire EM training process: fetching data, running iterations, checking for convergence, and logging progress.
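The control flow of such a session can be sketched as a loop that repeatedly calls an update step, logs the largest parameter change, and stops once that change falls below a tolerance. Everything here is an assumption for illustration: `run_training`, its defaults, and the `toy_step` stand-in that replaces a real EM iteration.

```python
import logging

logger = logging.getLogger("em_session_sketch")


def run_training(step, params, max_iterations=25, tolerance=1e-4):
    """Call `step` until the largest parameter change is below `tolerance`."""
    history = [params]
    for iteration in range(1, max_iterations + 1):
        new_params = step(params)
        change = max(abs(new_params[k] - params[k]) for k in params)
        logger.info("iteration %d: max change %.6f", iteration, change)
        history.append(new_params)
        params = new_params
        if change < tolerance:
            return params, history, True  # converged
    return params, history, False  # hit the iteration cap


def toy_step(p):
    """Stand-in update: move halfway towards 0.8 (not a real EM iteration)."""
    return {"m": p["m"] + 0.5 * (0.8 - p["m"])}


final, history, converged = run_training(toy_step, {"m": 0.0})
```

Separating the loop from the update step mirrors the split in the diagram: the session owns iteration, convergence, and logging, while the engine owns the statistics.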
Related Classes/Methods:
The Expectation Maximisation (EM) Engine is the core statistical engine. It iteratively estimates the m and u probabilities for each comparison level, along with the overall probability that two randomly chosen records match.
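A single EM iteration can be sketched as follows. This is the textbook form for binary agreement vectors under a conditional independence assumption, not the library's SQL-based implementation; `em_iteration`, the starting values, and the synthetic vectors are all hypothetical.

```python
def em_iteration(vectors, lam, m, u):
    """One EM iteration for binary comparison vectors (1 = agree, 0 = disagree).

    lam is the probability that a random pair matches; m[i] and u[i] are the
    probabilities of agreement on comparison i for matches and non-matches.
    """
    # E-step: posterior match probability of each pair under current parameters.
    posteriors = []
    for vec in vectors:
        p_match, p_non = lam, 1.0 - lam
        for i, agrees in enumerate(vec):
            p_match *= m[i] if agrees else 1.0 - m[i]
            p_non *= u[i] if agrees else 1.0 - u[i]
        posteriors.append(p_match / (p_match + p_non))

    # M-step: re-estimate parameters as posterior-weighted frequencies.
    total_match = sum(posteriors)
    total_non = len(vectors) - total_match
    new_m, new_u = [], []
    for i in range(len(m)):
        agree_match = sum(w * v[i] for w, v in zip(posteriors, vectors))
        agree_non = sum((1.0 - w) * v[i] for w, v in zip(posteriors, vectors))
        new_m.append(agree_match / total_match)
        new_u.append(agree_non / total_non)
    return total_match / len(vectors), new_m, new_u


# Synthetic comparison vectors for two comparisons.
vectors = [(1, 1)] * 20 + [(0, 0)] * 70 + [(1, 0)] * 5 + [(0, 1)] * 5

lam, m, u = 0.5, [0.9, 0.9], [0.1, 0.1]
for _ in range(20):
    lam, m, u = em_iteration(vectors, lam, m, u)
```

The E-step asks how likely each pair is to be a match given the current parameters, and the M-step then treats those probabilities as soft labels when recomputing lam, m, and u.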
Related Classes/Methods:
splink.internals.expectation_maximisation, splink.internals.estimate_u, splink.internals.m_u_records_to_parameters
The Prediction Module computes the final match probability for each record pair from its comparison vector and the trained m and u probabilities.
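The calculation can be sketched with Bayes' rule in odds form: start from the prior odds of a match, multiply by one Bayes factor (m / u) per observed comparison level, and convert the result back to a probability. The `match_probability` function, the nested `levels` layout, and all numeric values are assumptions for illustration, and the sketch assumes the comparisons are conditionally independent.

```python
def match_probability(vector, levels, prior):
    """Combine the prior with one Bayes factor (m / u) per observed level."""
    odds = prior / (1.0 - prior)
    for comparison, observed_level in zip(levels, vector):
        m, u = comparison[observed_level]
        odds *= m / u
    return odds / (1.0 + odds)


# levels[i][k] = (m, u) for level k of comparison i; values are illustrative.
levels = [
    {0: (0.1, 0.99), 1: (0.9, 0.01)},  # dob: 0 = differs, 1 = exact match
    {0: (0.05, 0.9), 1: (0.95, 0.1)},  # surname: 0 = differs, 1 = exact match
]

p_agree = match_probability((1, 1), levels, prior=0.01)
p_disagree = match_probability((0, 0), levels, prior=0.01)
```

With these numbers, agreement on both attributes lifts a 1% prior to roughly 0.90, while disagreement on both pushes the probability far below the prior.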
Related Classes/Methods:
The Clustering Module groups records into clusters based on the predicted match probabilities, with each cluster representing one linked entity.
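One common way to turn pairwise probabilities into clusters is to keep the pairs above a threshold and take connected components of the resulting graph. The sketch below does this with a small union-find structure; it assumes that approach for illustration, and the `cluster` function, the threshold, and the sample pairs are hypothetical.

```python
def cluster(record_ids, scored_pairs, threshold):
    """Map each record id to a cluster id via connected components."""
    parent = {r: r for r in record_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, prob in scored_pairs:
        if prob >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Use the smaller id as the representative for determinism.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    return {r: find(r) for r in record_ids}


clusters = cluster(
    ["a", "b", "c", "d"],
    [("a", "b", 0.97), ("b", "c", 0.92), ("c", "d", 0.30)],
    threshold=0.9,
)
```

Records `a`, `b`, and `c` end up in one cluster even though `a` and `c` were never scored directly, because connected components link records transitively. That makes the threshold an important choice: a single weak link above it can merge two otherwise separate entities.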
Related Classes/Methods: