```mermaid
graph LR
    User_Interface_UI_Layer["User Interface (UI) Layer"]
    Application_Orchestration_Layer["Application Orchestration Layer"]
    Data_Ingestion_Processing_Layer["Data Ingestion & Processing Layer"]
    Vector_Store_Retrieval_Layer["Vector Store & Retrieval Layer"]
    LLM_Model_Management_Optimization_Layer["LLM Model Management & Optimization Layer"]
    LLM_Inference_Layer["LLM Inference Layer"]
    User_Interface_UI_Layer -- "sends user queries and updates to" --> Application_Orchestration_Layer
    Application_Orchestration_Layer -- "sends LLM responses and status updates back to" --> User_Interface_UI_Layer
    Application_Orchestration_Layer -- "triggers data ingestion in" --> Data_Ingestion_Processing_Layer
    Application_Orchestration_Layer -- "queries for relevant documents from" --> Vector_Store_Retrieval_Layer
    Vector_Store_Retrieval_Layer -- "returns retrieved document chunks to" --> Application_Orchestration_Layer
    Application_Orchestration_Layer -- "constructs and sends prompts to" --> LLM_Inference_Layer
    LLM_Inference_Layer -- "returns generated responses to" --> Application_Orchestration_Layer
    LLM_Model_Management_Optimization_Layer -- "provides optimized LLM engine files to" --> LLM_Inference_Layer
    Data_Ingestion_Processing_Layer -- "provides generated embeddings to" --> Vector_Store_Retrieval_Layer
    click User_Interface_UI_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/trt-llm-rag-linux/User_Interface_UI_Layer.md" "Details"
    click Application_Orchestration_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/trt-llm-rag-linux/Application_Orchestration_Layer.md" "Details"
    click Data_Ingestion_Processing_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/trt-llm-rag-linux/Data_Ingestion_Processing_Layer.md" "Details"
    click LLM_Model_Management_Optimization_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/trt-llm-rag-linux/LLM_Model_Management_Optimization_Layer.md" "Details"
    click LLM_Inference_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/trt-llm-rag-linux/LLM_Inference_Layer.md" "Details"
```
The trt-llm-rag-linux application implements a Retrieval-Augmented Generation (RAG) pattern, centered around an Application Orchestration Layer that manages the entire workflow. User interactions are handled by the User Interface (UI) Layer, which sends requests to the orchestrator. The orchestrator then interacts with the Data Ingestion & Processing Layer to prepare data and the Vector Store & Retrieval Layer to retrieve relevant context. For LLM operations, it leverages the LLM Model Management & Optimization Layer to prepare optimized models for the LLM Inference Layer, which performs the actual text generation. This layered approach ensures modularity, performance optimization, and clear separation of concerns for data handling, model management, and inference.
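The workflow just described can be sketched as a minimal orchestration loop. All function names and the keyword-matching "retrieval" below are illustrative stand-ins, not the project's actual API: the real system embeds queries and searches FAISS, and the real generation call drives a TensorRT-LLM engine.

```python
# Minimal sketch of the RAG orchestration flow described above.
# All names (retrieve, build_prompt, generate, answer) are illustrative
# stand-ins, not the actual trt-llm-rag-linux API.

def retrieve(query, store):
    """Vector Store & Retrieval Layer: return chunks matching query keywords.
    (A stand-in for embedding the query and searching FAISS.)"""
    words = query.lower().split()
    return [chunk for chunk in store if any(w in chunk.lower() for w in words)]

def build_prompt(query, chunks):
    """Application Orchestration Layer: assemble retrieved context into a prompt."""
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

def generate(prompt):
    """LLM Inference Layer: placeholder for the TensorRT-LLM engine call."""
    return f"[LLM response to prompt of {len(prompt)} chars]"

def answer(query, store):
    """Orchestrator: retrieve context, build the prompt, run inference."""
    chunks = retrieve(query, store)
    prompt = build_prompt(query, chunks)
    return generate(prompt)

store = ["TensorRT-LLM optimizes inference on NVIDIA GPUs.",
         "FAISS stores document embeddings for retrieval."]
print(answer("What optimizes inference?", store))
```

The key design point this illustrates is that only the orchestrator touches every other layer; the UI, retrieval, and inference layers never call one another directly.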
User Interface (UI) Layer
Provides the interactive graphical interface for users, handling input, displaying chat history, and managing model/dataset selections.
Application Orchestration Layer
The central control unit, coordinating data flow, managing application state, processing user requests, and orchestrating the end-to-end RAG pipeline.
Data Ingestion & Processing Layer
Responsible for acquiring raw data from various sources, parsing it, and preparing it for embedding into the vector store.
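The preparation step typically splits parsed documents into overlapping chunks before embedding. The chunker below is a generic illustration; the chunk size, overlap, and function name are assumptions, not the project's actual settings.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split raw text into overlapping chunks ready for embedding.
    chunk_size and overlap values are illustrative, not the project's defaults."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks

doc = "x" * 500
pieces = chunk_text(doc)
print(len(pieces), [len(p) for p in pieces])  # 4 chunks; last one is shorter
```

Overlap ensures a sentence cut at a chunk boundary still appears whole in at least one chunk, which improves retrieval quality.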
Vector Store & Retrieval Layer
Manages the storage of document embeddings (using FAISS) and efficiently retrieves relevant document chunks based on user queries.
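The project uses FAISS for this; the snippet below is a minimal NumPy stand-in showing the same idea (nearest-neighbour search by L2 distance over embedding vectors), not the actual FAISS-backed code. The dimensions and corpus size are toy values.

```python
import numpy as np

# Brute-force L2 nearest-neighbour search over embeddings -- a stand-in for
# what FAISS (e.g. a flat L2 index) does far more efficiently at scale.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 8))  # 100 document chunks, 8-dim embeddings (toy sizes)

def search(query_vec, embeddings, k=3):
    """Return indices of the k chunks closest to the query embedding."""
    dists = np.linalg.norm(embeddings - query_vec, axis=1)
    return np.argsort(dists)[:k]

# A query embedding that is a near-duplicate of chunk 42 should rank it first.
query = embeddings[42] + 0.01 * rng.normal(size=8)
print(search(query, embeddings))
```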
LLM Model Management & Optimization Layer
Dedicated to the lifecycle management of LLMs, focusing on their optimization for NVIDIA TensorRT-LLM, including conversion, building, and serialization of LLM engines.
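The conversion-and-build flow this layer manages resembles the standard two-stage TensorRT-LLM workflow sketched below. Treat the script location, directory paths, and flags as assumptions: they vary by model family and TensorRT-LLM version, and this is not the project's actual build script.

```shell
# Illustrative TensorRT-LLM engine-build flow (paths, script names, and flags
# are assumptions -- they differ per model family and TensorRT-LLM release).

# 1. Convert source model weights into a TensorRT-LLM checkpoint.
python convert_checkpoint.py \
    --model_dir ./hf_model \
    --output_dir ./trt_ckpt \
    --dtype float16

# 2. Build (compile and serialize) the optimized inference engine.
trtllm-build \
    --checkpoint_dir ./trt_ckpt \
    --output_dir ./trt_engine \
    --gemm_plugin float16
```

The serialized engine files in the output directory are what this layer hands to the LLM Inference Layer.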
LLM Inference Layer
Provides the core functionality for interacting with the optimized LLM: preparing prompts, executing LLM inference, and streaming generated responses.
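The streaming behaviour can be illustrated with a simple generator: the layer yields pieces of the response as they are produced rather than waiting for the full completion. The canned response and function name below are illustrative; the real implementation pulls tokens from the TensorRT-LLM engine.

```python
import time

def stream_generate(prompt, delay=0.0):
    """Illustrative token-streaming loop: yield response pieces one at a time,
    as the inference layer does when driving the TensorRT-LLM engine.
    The canned response stands in for real engine output."""
    response = "TensorRT-LLM generates tokens incrementally."
    for token in response.split():
        time.sleep(delay)  # real code waits on the engine's next token here
        yield token + " "

# The UI layer can render each piece as it arrives:
pieces = list(stream_generate("Why stream responses?"))
print("".join(pieces).strip())
```

Streaming lets the UI start rendering within the first-token latency instead of blocking for the full generation, which matters for long responses.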