Skip to content

Latest commit

 

History

History
68 lines (37 loc) · 3.96 KB

File metadata and controls

68 lines (37 loc) · 3.96 KB
graph LR
    Dataset["Dataset"]
    Vocab["Vocab"]
    DependencyGraphConstructor["DependencyGraphConstructor"]
    ConstituencyGraphConstructor["ConstituencyGraphConstructor"]
    GraphData["GraphData"]
    Dataset -- "delegates text preprocessing to" --> Vocab
    Dataset -- "invokes graph construction methods from" --> DependencyGraphConstructor
    Dataset -- "invokes graph construction methods from" --> ConstituencyGraphConstructor
    Dataset -- "produces" --> GraphData
    Vocab -- "provides mappings used by" --> Dataset
    DependencyGraphConstructor -- "populates" --> GraphData
    ConstituencyGraphConstructor -- "populates" --> GraphData
    GraphData -- "returned to" --> Dataset
Loading

CodeBoardingDemoContact

Details

The graph4nlp data processing subsystem orchestrates the transformation of raw input data into a graph-based representation suitable for neural network consumption. The Dataset component initiates this pipeline, managing data ingestion and coordinating with Vocab for text preprocessing and DependencyGraphConstructor or ConstituencyGraphConstructor for generating specific graph topologies. The ultimate output of this subsystem is the GraphData object, which encapsulates the structured graph information for downstream model processing. This clear separation of concerns ensures modularity and facilitates the integration of various graph construction techniques.

Dataset

Acts as the primary orchestrator for the data pipeline within this subsystem. It handles raw data ingestion, coordinates preprocessing steps like tokenization and vocabulary building, and invokes specific graph construction modules to generate graph topologies. Its ultimate responsibility is to produce vectorized GraphData instances ready for model consumption.

Related Classes/Methods:

Vocab

Manages the vocabulary for text data. This includes building the vocabulary from raw text, providing mappings between words and numerical indices, and supporting the loading of pre-trained word embeddings. It's a crucial utility for text-to-ID conversion.

Related Classes/Methods:

DependencyGraphConstructor

Specializes in constructing dependency graphs from input text. It parses text to identify syntactic dependency relations and builds the corresponding graph structure, populating a GraphData instance with nodes (words) and edges (dependencies).

Related Classes/Methods:

ConstituencyGraphConstructor

Specializes in constructing constituency graphs from input text. It parses text to identify constituency structures (e.g., noun phrases, verb phrases) and builds the corresponding graph structure, populating a GraphData instance with phrase-level nodes and their hierarchical relationships.

Related Classes/Methods:

GraphData

Serves as the fundamental data structure for representing a single graph within the system. It encapsulates nodes, edges, and their associated features/attributes, providing a standardized format for graph manipulation and interaction across different components. It is the primary output of this subsystem.

Related Classes/Methods: