graph LR
    Document_Ingestion_Preprocessing["Document Ingestion & Preprocessing"]
    Schema_Definition_Validation["Schema Definition & Validation"]
    LLM_Interaction_Prompt_Engineering["LLM Interaction & Prompt Engineering"]
    Extraction_Orchestration["Extraction Orchestration"]
    Extracted_Data_Management["Extracted Data Management"]
    Output_Serialization["Output & Serialization"]
    Document_Ingestion_Preprocessing -- "Provides cleaned and segmented document text for prompt context." --> LLM_Interaction_Prompt_Engineering
    Schema_Definition_Validation -- "Supplies structured schemas and validation rules for prompt generation." --> LLM_Interaction_Prompt_Engineering
    LLM_Interaction_Prompt_Engineering -- "Sends raw LLM responses for further processing and validation." --> Extraction_Orchestration
    Extraction_Orchestration -- "Sends extracted items for validation against defined schemas." --> Schema_Definition_Validation
    Schema_Definition_Validation -- "Stores validated `Aspects` and `Concepts`." --> Extracted_Data_Management
    Extracted_Data_Management -- "Provides the stored `Aspects` and `Concepts` for conversion into external formats." --> Output_Serialization
    click Document_Ingestion_Preprocessing href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/contextgem/Document_Ingestion_Preprocessing.md" "Details"
    click Schema_Definition_Validation href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/contextgem/Schema_Definition_Validation.md" "Details"
    click Extraction_Orchestration href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/contextgem/Extraction_Orchestration.md" "Details"
    click Extracted_Data_Management href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/contextgem/Extracted_Data_Management.md" "Details"

Details

contextgem is a modular, pipeline-driven framework for LLM-powered information extraction. The Document Ingestion & Preprocessing component turns raw documents into cleaned, segmented, LLM-ready text. That text, together with the dynamically defined schemas supplied by the Schema Definition & Validation component, goes to the LLM Interaction & Prompt Engineering component, which builds the prompts and runs the LLM queries. The Extraction Orchestration component receives the raw LLM responses, sends the extracted items to Schema Definition & Validation to be checked against the defined schemas, and manages the flow of extracted data. Validated Aspects and Concepts are stored in the Extracted Data Management component, which serves as the central source of truth. Finally, the Output & Serialization component converts the stored data into external formats. Each stage has a single responsibility and a defined hand-off to the next; this separation of concerns is intended to keep contextgem extensible and maintainable for complex information extraction tasks.
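The hand-offs described above can be sketched as a small wiring class. This is a minimal illustration, not contextgem's actual API: the `Pipeline` class, its field names, and the stubbed LLM call are all hypothetical, and each stage is reduced to a single callable.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical wiring of the six components; not contextgem's API."""

    preprocess: Callable[[str], str]    # Document Ingestion & Preprocessing
    build_prompt: Callable[[str], str]  # LLM Interaction & Prompt Engineering
    call_llm: Callable[[str], str]      # the LLM query itself
    validate: Callable[[str], dict]     # Schema Definition & Validation
    serialize: Callable[[dict], str]    # Output & Serialization
    store: dict = field(default_factory=dict)  # Extracted Data Management

    def run(self, raw_document: str) -> str:
        # Extraction Orchestration: drive each stage in order.
        text = self.preprocess(raw_document)
        response = self.call_llm(self.build_prompt(text))
        self.store.update(self.validate(response))
        return self.serialize(self.store)


pipeline = Pipeline(
    preprocess=lambda raw: " ".join(raw.split()),
    build_prompt=lambda text: "Return the contracting party as JSON.\n\n" + text,
    call_llm=lambda prompt: '{"party": "Acme Corp"}',  # stub, no real model
    validate=json.loads,
    serialize=lambda data: json.dumps(data, sort_keys=True),
)
output = pipeline.run("Agreement  with\nAcme Corp")
print(output)
```

Passing the stages in as callables mirrors the separation of concerns in the diagram: any one stage can be swapped without touching `run`.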

Document Ingestion & Preprocessing

Handles reading, cleaning, and segmenting raw documents.
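As a rough sketch of what cleaning and segmenting can involve, the functions below normalize whitespace and split text on blank lines. The function names and the paragraph-based segmentation rule are assumptions made for illustration; the source does not say how contextgem segments documents.

```python
import re


def clean(raw: str) -> str:
    """Normalize line endings and collapse runs of spaces or tabs."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


def segment(text: str) -> list[str]:
    """Split on blank lines and drop empty segments."""
    parts = re.split(r"\n\s*\n", text)
    return [part.strip() for part in parts if part.strip()]


raw = "Title\r\n\r\nFirst   paragraph,\tstill first.\r\n\r\n\r\nSecond paragraph.\r\n"
segments = segment(clean(raw))
print(segments)
```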

Related Classes/Methods:

Schema Definition & Validation

Defines and validates the structure of extracted information using Pydantic models.
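The idea of validating extracted items against a schema can be illustrated without any dependencies. The sketch below deliberately swaps Pydantic for a standard-library dataclass with hand-written checks, so it shows the pattern rather than the library's behavior. `ExtractedItem`, its fields, and `validate_items` are hypothetical names.

```python
from dataclasses import dataclass


class SchemaError(ValueError):
    """Raised when an extracted item does not match the schema."""


@dataclass(frozen=True)
class ExtractedItem:
    # Hypothetical fields, chosen for illustration only.
    name: str
    value: str
    confidence: float

    def __post_init__(self) -> None:
        if not isinstance(self.name, str) or not self.name:
            raise SchemaError("name must be a non-empty string")
        if not isinstance(self.value, str):
            raise SchemaError("value must be a string")
        if not isinstance(self.confidence, (int, float)):
            raise SchemaError("confidence must be a number")
        if not 0.0 <= self.confidence <= 1.0:
            raise SchemaError("confidence must be between 0 and 1")


def validate_items(candidates: list[dict]) -> tuple[list[ExtractedItem], list[str]]:
    """Return the items that pass validation plus one message per rejected item."""
    valid: list[ExtractedItem] = []
    errors: list[str] = []
    for index, candidate in enumerate(candidates):
        try:
            valid.append(ExtractedItem(**candidate))
        except (SchemaError, TypeError) as exc:  # TypeError: missing or extra keys
            errors.append(f"item {index}: {exc}")
    return valid, errors


candidates = [
    {"name": "party", "value": "Acme Corp", "confidence": 0.9},
    {"name": "term", "value": 12, "confidence": 0.8},  # wrong type for value
    {"name": "date", "value": "2024-01-01"},           # missing confidence
]
valid, errors = validate_items(candidates)
print(valid, errors)
```

Collecting errors instead of raising on the first failure lets a caller keep the valid items from a partially malformed LLM response.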

Related Classes/Methods:

LLM Interaction & Prompt Engineering

Manages communication with LLMs, including prompt construction and response handling.
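Prompt construction and response handling can be sketched as two small functions: one that renders field descriptions and a text segment into a prompt, and one that recovers a JSON object from a reply. The prompt wording, function names, and the JSON reply format are assumptions for illustration, not contextgem's actual prompts or parsing logic.

```python
import json


def build_prompt(segment: str, fields: dict[str, str]) -> str:
    """Render the requested fields and the source text into one prompt."""
    field_lines = [f"- {name}: {description}" for name, description in fields.items()]
    return (
        "Extract the following fields from the text and reply with a single JSON object.\n"
        + "\n".join(field_lines)
        + "\n\nText:\n"
        + segment
    )


def parse_response(raw: str) -> dict:
    """Parse the span from the first '{' to the last '}' as JSON.

    This tolerates replies that wrap the object in extra prose.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    return json.loads(raw[start : end + 1])


prompt = build_prompt("Agreement with Acme Corp.", {"party": "the contracting party"})
reply = 'Here is the result: {"party": "Acme Corp"} Hope that helps.'
parsed = parse_response(reply)
print(parsed)
```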

Related Classes/Methods:

Extraction Orchestration

Coordinates the entire extraction pipeline, from LLM interaction to data validation and storage.

Related Classes/Methods:

Extracted Data Management

Serves as the internal repository for validated Aspects and Concepts.
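A minimal in-memory repository along these lines might look as follows. The sketch assumes, purely for illustration, that concepts are grouped under a named aspect; the source does not describe how Aspects and Concepts relate, and `ExtractionStore` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Concept:
    name: str
    value: str


@dataclass
class Aspect:
    name: str
    concepts: dict[str, Concept] = field(default_factory=dict)


class ExtractionStore:
    """In-memory store keyed by aspect name, then by concept name."""

    def __init__(self) -> None:
        self._aspects: dict[str, Aspect] = {}

    def add(self, aspect_name: str, concept: Concept) -> None:
        aspect = self._aspects.setdefault(aspect_name, Aspect(aspect_name))
        aspect.concepts[concept.name] = concept  # a later write replaces an earlier one

    def get(self, aspect_name: str, concept_name: str) -> Optional[Concept]:
        aspect = self._aspects.get(aspect_name)
        return aspect.concepts.get(concept_name) if aspect else None


store = ExtractionStore()
store.add("Parties", Concept("party", "Acme"))
store.add("Parties", Concept("party", "Acme Corp"))
found = store.get("Parties", "party")
print(found)
```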

Related Classes/Methods:

Output & Serialization

Converts structured extracted data into various external formats.
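The source does not name the supported formats. As an assumed example, the sketch below serializes a list of flat records to JSON and CSV with the standard library; the record layout and function names are hypothetical.

```python
import csv
import io
import json

# Hypothetical flat records, one per extracted concept.
records = [
    {"aspect": "Parties", "concept": "party", "value": "Acme Corp"},
    {"aspect": "Terms", "concept": "duration", "value": "12 months"},
]


def to_json(rows: list[dict]) -> str:
    """Serialize records as indented JSON, keeping non-ASCII text readable."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_csv(rows: list[dict]) -> str:
    """Serialize records as CSV with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["aspect", "concept", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)
print(csv_text)
```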

Related Classes/Methods: