awesome-architecture-mds/ai-ml/graph4nlp/Data_Ingestion_Graph_Construction.md at main · CodeBoarding/awesome-architecture-mds

graph LR
    Dataset["Dataset"]
    Vocab["Vocab"]
    DependencyGraphConstructor["DependencyGraphConstructor"]
    ConstituencyGraphConstructor["ConstituencyGraphConstructor"]
    GraphData["GraphData"]
    Dataset -- "delegates text preprocessing to" --> Vocab
    Dataset -- "invokes graph construction methods from" --> DependencyGraphConstructor
    Dataset -- "invokes graph construction methods from" --> ConstituencyGraphConstructor
    Dataset -- "produces" --> GraphData
    Vocab -- "provides mappings used by" --> Dataset
    DependencyGraphConstructor -- "populates" --> GraphData
    ConstituencyGraphConstructor -- "populates" --> GraphData
    GraphData -- "returned to" --> Dataset

Details

The graph4nlp data processing subsystem transforms raw input data into graph representations that a neural network can consume. Dataset drives the pipeline: it ingests the data, works with Vocab for text preprocessing, and calls DependencyGraphConstructor or ConstituencyGraphConstructor to build the requested graph topology. The output of the subsystem is the GraphData object, which holds the structured graph information for downstream model processing. Keeping ingestion, vocabulary handling, and graph construction in separate components makes it easier to integrate other graph construction techniques.

Dataset

Acts as the primary orchestrator for the data pipeline within this subsystem. It handles raw data ingestion, coordinates preprocessing steps like tokenization and vocabulary building, and invokes specific graph construction modules to generate graph topologies. Its ultimate responsibility is to produce vectorized GraphData instances ready for model consumption.
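The orchestration role described above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `ToyDataset`, its method names, the order of the steps, and the `chain_edges` placeholder are assumptions made for illustration, not the actual interface of graph4nlp's Dataset.

```python
class ToyDataset:
    """Hypothetical orchestrator; the method names are illustrative only."""

    def __init__(self, raw_sentences, topology_builder):
        self.raw_sentences = raw_sentences
        self.topology_builder = topology_builder  # callable: tokens -> edge list
        self.word_to_index = {"<unk>": 0}
        self.graphs = []

    def build_topology(self):
        # Delegate the graph structure to whichever builder was supplied.
        for sentence in self.raw_sentences:
            tokens = sentence.split()
            self.graphs.append(
                {"tokens": tokens, "edges": self.topology_builder(tokens)}
            )

    def build_vocab(self):
        # Assign the next free index to every token not seen before.
        for sentence in self.raw_sentences:
            for token in sentence.split():
                self.word_to_index.setdefault(token, len(self.word_to_index))

    def vectorize(self):
        # Replace tokens by their indices so a model can consume them.
        for graph in self.graphs:
            graph["token_ids"] = [
                self.word_to_index.get(t, 0) for t in graph["tokens"]
            ]

    def process(self):
        self.build_topology()
        self.build_vocab()
        self.vectorize()
        return self.graphs


def chain_edges(tokens):
    """Placeholder topology: link each token to the next one."""
    return [(i, i + 1) for i in range(len(tokens) - 1)]


dataset = ToyDataset(["graphs help models", "models help"], chain_edges)
graphs = dataset.process()
```

Passing the topology builder in as a callable mirrors the idea that the orchestrator does not need to know which kind of graph is being built; a dependency or constituency builder could be swapped in for `chain_edges`.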

Related Classes/Methods:

graph4nlp.pytorch.data.dataset.Dataset:236-824

Vocab

Manages the vocabulary for text data. This includes building the vocabulary from raw text, providing mappings between words and numerical indices, and supporting the loading of pre-trained word embeddings. The rest of the pipeline relies on it for text-to-ID conversion.
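As a rough illustration of the word-to-index mapping, the sketch below builds a vocabulary from a few sentences and encodes a new one. `ToyVocab`, its special tokens, and the `min_freq` parameter are assumptions chosen for this example and do not reflect the library's Vocab interface; loading pre-trained embeddings is left out.

```python
from collections import Counter


class ToyVocab:
    """Hypothetical stand-in for a vocabulary; not graph4nlp's Vocab API."""

    PAD, UNK = "<pad>", "<unk>"

    def __init__(self, sentences, min_freq=1):
        counts = Counter(
            tok for sent in sentences for tok in sent.lower().split()
        )
        # Reserve the first two indices for padding and unknown tokens.
        self.index_to_word = [self.PAD, self.UNK]
        # Sort by descending frequency, then alphabetically, for a stable order.
        for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            if freq >= min_freq:
                self.index_to_word.append(word)
        self.word_to_index = {w: i for i, w in enumerate(self.index_to_word)}

    def encode(self, sentence):
        # Words outside the vocabulary fall back to the unknown-token index.
        unk = self.word_to_index[self.UNK]
        return [self.word_to_index.get(tok, unk) for tok in sentence.lower().split()]

    def decode(self, ids):
        return [self.index_to_word[i] for i in ids]


vocab = ToyVocab(["the cat sat", "the dog sat down"])
ids = vocab.encode("the bird sat")
```

Here "bird" was never seen while building the vocabulary, so it maps to the unknown-token index and decodes back to `<unk>`.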

Related Classes/Methods:

DependencyGraphConstructor

Specializes in constructing dependency graphs from input text. It parses text to identify syntactic dependency relations and builds the corresponding graph structure, populating a GraphData instance with nodes (words) and edges (dependencies).
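A minimal sketch of the node and edge layout of a dependency graph follows. It assumes the parse is already available: the arcs are hand-written, whereas a real pipeline would obtain them from a dependency parser. The function name and the dictionary layout are hypothetical and are not the constructor's actual output format.

```python
def build_dependency_graph(tokens, arcs):
    """Turn pre-parsed dependency arcs into node and edge lists.

    tokens: list of words, one node per word.
    arcs:   (head_index, relation, dependent_index) triples; hand-written
            here, whereas a real pipeline would get them from a parser.
    """
    nodes = [{"id": i, "token": tok} for i, tok in enumerate(tokens)]
    edges = []
    for head, relation, dependent in arcs:
        # Reject arcs that point at tokens that do not exist.
        if not (0 <= head < len(tokens) and 0 <= dependent < len(tokens)):
            raise ValueError(f"arc ({head}, {relation}, {dependent}) is out of range")
        edges.append({"src": head, "tgt": dependent, "relation": relation})
    return {"nodes": nodes, "edges": edges}


tokens = ["She", "reads", "books"]
arcs = [(1, "nsubj", 0), (1, "obj", 2)]
graph = build_dependency_graph(tokens, arcs)
```

Each edge runs from the head word to its dependent and carries the relation label, which is one common convention; the opposite direction would work equally well as long as it is applied consistently.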

Related Classes/Methods:

ConstituencyGraphConstructor

Specializes in constructing constituency graphs from input text. It parses text to identify constituency structures (e.g., noun phrases, verb phrases) and builds the corresponding graph structure, populating a GraphData instance with phrase-level nodes and their hierarchical relationships.
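To make the phrase-level hierarchy concrete, the sketch below converts a bracketed parse string into nodes and parent-to-child edges. The bracketed input is written by hand here (a real pipeline would get it from a constituency parser), and the function and its output layout are assumptions for illustration rather than the constructor's actual behavior.

```python
import re


def build_constituency_graph(bracketed):
    """Convert a bracketed parse such as "(S (NP She) (VP reads))" into a graph.

    Every phrase label and every word becomes a node; each edge points from a
    phrase to one of its direct children.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    nodes, edges = [], []
    stack = []            # ids of the phrase nodes that are currently open
    expect_label = False  # True right after "(", when a phrase label follows

    for tok in tokens:
        if tok == "(":
            expect_label = True
        elif tok == ")":
            stack.pop()
        else:
            node_id = len(nodes)
            kind = "phrase" if expect_label else "word"
            nodes.append({"id": node_id, "label": tok, "kind": kind})
            if stack:
                edges.append((stack[-1], node_id))
            if expect_label:
                stack.append(node_id)
                expect_label = False
    return nodes, edges


nodes, edges = build_constituency_graph("(S (NP She) (VP reads (NP books)))")
```

The stack tracks which phrases are still open, so each new node is attached to the innermost enclosing phrase. The sketch assumes balanced brackets and does no error recovery.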

Related Classes/Methods:

GraphData

Serves as the fundamental data structure for representing a single graph within the system. It encapsulates nodes, edges, and their associated features/attributes, providing a standardized format for graph manipulation and interaction across different components. It is the primary output of this subsystem.
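The idea of a single container for nodes, edges, and their attributes can be sketched as follows. `ToyGraph` and its methods are hypothetical and far simpler than the real class; they are meant only to show the kind of state such a structure holds, not graph4nlp's GraphData API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """Hypothetical minimal graph container; not graph4nlp's GraphData API."""

    node_attrs: list = field(default_factory=list)  # one dict per node
    edges: list = field(default_factory=list)       # (src, tgt) pairs
    edge_attrs: list = field(default_factory=list)  # one dict per edge

    def add_node(self, **attrs):
        # The node id is simply its position in the attribute list.
        self.node_attrs.append(attrs)
        return len(self.node_attrs) - 1

    def add_edge(self, src, tgt, **attrs):
        n = len(self.node_attrs)
        if not (0 <= src < n and 0 <= tgt < n):
            raise ValueError("edge endpoints must be existing node ids")
        self.edges.append((src, tgt))
        self.edge_attrs.append(attrs)

    def neighbors(self, node_id):
        # Outgoing neighbors only; edges are treated as directed.
        return [t for s, t in self.edges if s == node_id]


g = ToyGraph()
she = g.add_node(token="She")
reads = g.add_node(token="reads")
g.add_edge(reads, she, relation="nsubj")
```

Keeping attributes in lists that run parallel to the node and edge lists makes it straightforward to turn them into feature tensors later, which is one reason a standardized container is useful across components.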

Related Classes/Methods:

graph4nlp.pytorch.data.data.GraphData:54-1068
