graph LR
    OpenAddress_JSON_Processor["OpenAddress JSON Processor"]
    OSM_XML_Processor["OSM XML Processor"]
    JSON_Address_List_Converter["JSON Address List Converter"]
    XML_Address_List_Converter["XML Address List Converter"]
    Training_XML_Formatter["Training XML Formatter"]
    OpenAddress_JSON_Processor -- "orchestrates" --> JSON_Address_List_Converter
    OpenAddress_JSON_Processor -- "orchestrates" --> Training_XML_Formatter
    OSM_XML_Processor -- "delegates to" --> XML_Address_List_Converter
    JSON_Address_List_Converter -- "feeds into" --> Training_XML_Formatter
    XML_Address_List_Converter -- "feeds into" --> Training_XML_Formatter

Details

The Training Data Processors subsystem handles the offline ingestion of raw data sources and their conversion into the XML format required to train or retrain the Probabilistic Tagging Engine (CRF). Because the CRF learns directly from this output, conversion errors here carry over into the trained model.

OpenAddress JSON Processor

Orchestrates the end-to-end conversion of raw OpenAddress JSON data into the final training XML format. It serves as the primary entry point for OpenAddress data processing within the training pipeline.
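The orchestration described above could be sketched as a thin wrapper that runs the two stages in order. This is a minimal illustration, not the project's actual entry point: the function name `process_openaddress_file` and the two stage callables passed in (`to_address_list`, `to_training_xml`) are hypothetical.

```python
from pathlib import Path


def process_openaddress_file(json_path, xml_path, to_address_list, to_training_xml):
    """Run both stages over one input file (hypothetical helper).

    to_address_list: callable taking JSON text, returning a list of addresses.
    to_training_xml: callable taking that list, returning training XML text.
    Returns the number of addresses written.
    """
    json_text = Path(json_path).read_text(encoding="utf-8")
    address_list = to_address_list(json_text)      # stage 1: JSON -> address list
    training_xml = to_training_xml(address_list)   # stage 2: address list -> XML
    Path(xml_path).write_text(training_xml, encoding="utf-8")
    return len(address_list)
```

Passing the stages in as arguments keeps the orchestrator unaware of either format, which mirrors the separation shown in the diagram.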

Related Classes/Methods:

OSM XML Processor

Parses OpenStreetMap (OSM) XML data, both natural and synthetic, and converts it into a standardized address list. It handles the OSM-specific XML structures and delegates the conversion itself to the XML Address List Converter.

Related Classes/Methods:

JSON Address List Converter

Transforms OpenAddress JSON input into an intermediate, standardized list of address components. This component normalizes the JSON structure into a consistent internal representation, decoupling the source format from subsequent processing steps.
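As a rough sketch of what such a normalization might look like, the example below turns a JSON array of address records into a list of addresses, each a list of (label, token) pairs. The input field names (`number`, `street`, `city`, `region`, `postcode`), the mapping table, and the function name are assumptions made for illustration; the real input schema and intermediate representation may differ.

```python
import json

# Hypothetical mapping from source JSON keys to usaddress-style labels.
FIELD_TO_LABEL = {
    "number": "AddressNumber",
    "street": "StreetName",
    "city": "PlaceName",
    "region": "StateName",
    "postcode": "ZipCode",
}


def json_to_address_list(json_text):
    """Convert a JSON array of records into a list of addresses.

    Each address is a list of (label, token) pairs in FIELD_TO_LABEL order.
    Records with no recognized fields are skipped.
    """
    addresses = []
    for record in json.loads(json_text):
        components = []
        for field, label in FIELD_TO_LABEL.items():
            value = str(record.get(field) or "").strip()
            # Split multi-word values so every token carries its own label.
            for token in value.split():
                components.append((label, token))
        if components:
            addresses.append(components)
    return addresses
```

Note the simplification: every token of a street value is labeled `StreetName` here, whereas a real converter would need to distinguish parts such as the street type.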

Related Classes/Methods:

XML Address List Converter

Parses OSM XML inputs (natural or synthetic data) and converts them into the same standardized list of address components that the JSON Address List Converter produces. This gives downstream steps a single intermediate data structure regardless of the XML source.
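To illustrate the idea, the sketch below reads OSM-style XML, where address parts appear as `<tag k="addr:..." v="..."/>` children of an element, and emits (label, token) pairs. The function name, the key-to-label mapping, and the output shape are assumptions for this example rather than the project's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from OSM address keys to usaddress-style labels.
OSM_KEY_TO_LABEL = {
    "addr:housenumber": "AddressNumber",
    "addr:street": "StreetName",
    "addr:city": "PlaceName",
    "addr:state": "StateName",
    "addr:postcode": "ZipCode",
}


def osm_xml_to_address_list(xml_text):
    """Convert OSM XML text into a list of addresses.

    Each address is a list of (label, token) pairs; elements without any
    recognized addr:* tag are skipped.
    """
    root = ET.fromstring(xml_text)
    addresses = []
    for element in root:  # top-level children such as node or way
        tags = {t.get("k"): t.get("v") for t in element.findall("tag")}
        components = []
        for key, label in OSM_KEY_TO_LABEL.items():
            for token in (tags.get(key) or "").split():
                components.append((label, token))
        if components:
            addresses.append(components)
    return addresses
```

Tags outside the mapping (for example `amenity`) are ignored, so only address-bearing elements reach the intermediate list.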

Related Classes/Methods:

Training XML Formatter

Takes the standardized address list (produced by either the JSON Address List Converter or the XML Address List Converter) and formats it into the final XML structure required by the usaddress training process. This is the last stage of data preparation for the Probabilistic Tagging Engine (CRF).
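A minimal sketch of this formatting step follows. It assumes the intermediate list holds (label, token) pairs per address and that the target layout is an `AddressCollection` root with one `AddressString` per address, each token wrapped in an element named after its label. The function name is hypothetical, and the exact layout should be checked against the project's existing training files.

```python
import xml.etree.ElementTree as ET


def address_list_to_training_xml(addresses):
    """Serialize a list of addresses into training XML text.

    addresses: list of addresses, each a list of (label, token) pairs.
    Tokens within one address are separated by a single space.
    """
    collection = ET.Element("AddressCollection")
    for components in addresses:
        address = ET.SubElement(collection, "AddressString")
        for index, (label, token) in enumerate(components):
            child = ET.SubElement(address, label)
            child.text = token
            # The tail is the text after the closing tag; skip it on the last token.
            if index < len(components) - 1:
                child.tail = " "
    return ET.tostring(collection, encoding="unicode")
```

For example, `address_list_to_training_xml([[("AddressNumber", "123"), ("StreetName", "Main")]])` yields a single `AddressString` containing the two labeled tokens separated by a space. Special characters in tokens, such as `&`, are escaped by ElementTree during serialization.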

Related Classes/Methods: