```mermaid
graph LR
    User_Interface_UI_Layer["User Interface (UI) Layer"]
    Application_Orchestration_Layer["Application Orchestration Layer"]
    Data_Ingestion_Processing_Layer["Data Ingestion & Processing Layer"]
    Vector_Store_Retrieval_Layer["Vector Store & Retrieval Layer"]
    LLM_Model_Management_Optimization_Layer["LLM Model Management & Optimization Layer"]
    LLM_Inference_Layer["LLM Inference Layer"]
    User_Interface_UI_Layer -- "sends user queries and updates to" --> Application_Orchestration_Layer
    Application_Orchestration_Layer -- "sends LLM responses and status updates back to" --> User_Interface_UI_Layer
    Application_Orchestration_Layer -- "triggers data ingestion in" --> Data_Ingestion_Processing_Layer
    Application_Orchestration_Layer -- "queries for relevant documents from" --> Vector_Store_Retrieval_Layer
    Vector_Store_Retrieval_Layer -- "returns retrieved document chunks to" --> Application_Orchestration_Layer
    Application_Orchestration_Layer -- "constructs and sends prompts to" --> LLM_Inference_Layer
    LLM_Inference_Layer -- "returns generated responses to" --> Application_Orchestration_Layer
    LLM_Model_Management_Optimization_Layer -- "provides optimized LLM engine files to" --> LLM_Inference_Layer
    Data_Ingestion_Processing_Layer -- "provides generated embeddings to" --> Vector_Store_Retrieval_Layer
    click User_Interface_UI_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/trt-llm-rag-linux/User_Interface_UI_Layer.md" "Details"
    click Application_Orchestration_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/trt-llm-rag-linux/Application_Orchestration_Layer.md" "Details"
    click Data_Ingestion_Processing_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/trt-llm-rag-linux/Data_Ingestion_Processing_Layer.md" "Details"
    click LLM_Model_Management_Optimization_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/trt-llm-rag-linux/LLM_Model_Management_Optimization_Layer.md" "Details"
    click LLM_Inference_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/trt-llm-rag-linux/LLM_Inference_Layer.md" "Details"
```
The trt-llm-rag-linux application implements a Retrieval-Augmented Generation (RAG) pattern, centered around an Application Orchestration Layer that manages the entire workflow. User interactions are handled by the User Interface (UI) Layer, which sends requests to the orchestrator. The orchestrator then interacts with the Data Ingestion & Processing Layer to prepare data and the Vector Store & Retrieval Layer to retrieve relevant context. For LLM operations, it leverages the LLM Model Management & Optimization Layer to prepare optimized models for the LLM Inference Layer, which performs the actual text generation. This layered approach ensures modularity, performance optimization, and clear separation of concerns for data handling, model management, and inference.
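The orchestration flow described above can be sketched as a single function. This is a hypothetical illustration of the RAG pattern, not the application's actual API; `retriever` and `llm` stand in for the Vector Store & Retrieval Layer and the LLM Inference Layer respectively.

```python
# Hypothetical sketch of the end-to-end RAG flow coordinated by the
# Application Orchestration Layer. All names are illustrative.

def answer_query(query, retriever, llm, top_k=3):
    """Retrieve context for `query`, build a prompt, and generate a reply."""
    # 1. Vector Store & Retrieval Layer: fetch relevant chunks.
    chunks = retriever(query, top_k)
    # 2. Construct the augmented prompt from the retrieved context.
    context = "\n\n".join(chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    # 3. LLM Inference Layer: generate the response.
    return llm(prompt)

# Toy stand-ins so the sketch runs end to end.
docs = ["TensorRT-LLM optimizes LLM inference on NVIDIA GPUs.",
        "FAISS stores embeddings for similarity search."]
retriever = lambda q, k: [d for d in docs if "FAISS" in d][:k]
llm = lambda p: "FAISS stores embeddings."  # placeholder for real inference
print(answer_query("What does FAISS do?", retriever, llm))
```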

### User Interface (UI) Layer

Provides the interactive graphical interface for users, handling input, displaying chat history, and managing model/dataset selections.

Related Classes/Methods:

### Application Orchestration Layer

The central control unit, coordinating data flow, managing application state, processing user requests, and orchestrating the end-to-end RAG pipeline.

Related Classes/Methods:

### Data Ingestion & Processing Layer

Responsible for acquiring raw data from various sources, parsing it, and preparing it for embedding into the vector store.

Related Classes/Methods:
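A core step in this layer is splitting parsed documents into chunks before embedding. The helper below is a minimal, hypothetical sketch of that step; the chunk size and overlap values are illustrative defaults, not the application's configuration.

```python
# Minimal sketch of ingestion-stage chunking: split raw text into
# overlapping character windows ready for embedding.

def chunk_text(text, chunk_size=200, overlap=50):
    """Return fixed-size character chunks; overlap preserves context
    across chunk boundaries so retrieval doesn't cut sentences apart."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 500, chunk_size=200, overlap=50)
print(len(chunks))  # 500 chars with step 150 -> starts at 0, 150, 300, 450
```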

### Vector Store & Retrieval Layer

Manages the storage of document embeddings (using FAISS) and efficiently retrieves relevant document chunks based on user queries.

Related Classes/Methods:
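Conceptually, retrieval here is exact nearest-neighbour search over stored embeddings, which is what a FAISS `IndexFlatL2` performs. The pure-Python stand-in below illustrates the operation without depending on the faiss package; the real layer would call the FAISS index instead.

```python
# Pure-Python stand-in for an IndexFlatL2-style lookup: rank stored
# embeddings by squared L2 distance to the query and return the top k.

def search(index, query, k=2):
    """Return indices of the k stored vectors closest to `query`."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    ranked = sorted(range(len(index)), key=lambda i: dist(index[i], query))
    return ranked[:k]

embeddings = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1]]
print(search(embeddings, [1.0, 1.0], k=2))  # → [1, 2]
```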

### LLM Model Management & Optimization Layer

Dedicated to the lifecycle management of LLMs, focusing on their optimization for NVIDIA TensorRT-LLM, including conversion, building, and serialization of LLM engines.

Related Classes/Methods:

### LLM Inference Layer

Provides the core functionality for interacting with the optimized LLM, responsible for prompt preparation, executing LLM inference, and streaming generated responses.

Related Classes/Methods:
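The two responsibilities named above, prompt preparation and streamed generation, can be sketched as follows. `generate_stream` is a toy generator standing in for the TensorRT-LLM runtime; the prompt template is purely illustrative.

```python
# Hedged sketch of the inference layer: format retrieved context into a
# prompt, then stream the reply token by token.

def build_prompt(question, contexts):
    """Assemble the final prompt sent to the optimized engine."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Use the context to answer.\nContext:\n{joined}\nQ: {question}\nA:"

def generate_stream(prompt):
    # Placeholder: a real engine yields tokens as they are decoded,
    # letting the UI display partial output immediately.
    for token in ["The", " answer", "."]:
        yield token

prompt = build_prompt("What is RAG?", ["RAG augments prompts with retrieval."])
answer = "".join(generate_stream(prompt))
print(answer)  # → "The answer."
```

Streaming matters here because engine-level decoding is incremental; yielding tokens as they arrive keeps the UI responsive during long generations.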