graph LR
Langchain_Reranker_Adapter["Langchain Reranker Adapter"]
LlamaIndex_Reranker_Adapter["LlamaIndex Reranker Adapter"]
BCEmbedding_Reranker_Model["BCEmbedding Reranker Model"]
PDF_Data_Extractor["PDF Data Extractor"]
QA_Dataset_Filter["QA Dataset Filter"]
Datasets["Datasets"]
RAG_Pipelines["RAG Pipelines"]
RAG_Retrieval_Engine["RAG Retrieval Engine"]
Langchain_Reranker_Adapter -- "uses" --> BCEmbedding_Reranker_Model
LlamaIndex_Reranker_Adapter -- "uses" --> BCEmbedding_Reranker_Model
PDF_Data_Extractor -- "provides input for" --> RAG_Pipelines
QA_Dataset_Filter -- "refines" --> Datasets
RAG_Retrieval_Engine -- "serves" --> RAG_Pipelines
This subsystem of the BCEmbedding project supports Retrieval-Augmented Generation (RAG) through reranking and data preparation. At its core, the BCEmbedding Reranker Model reranks retrieved documents; the Langchain Reranker Adapter and the LlamaIndex Reranker Adapter expose it to their respective frameworks. Data preparation involves two components: the PDF Data Extractor, which extracts text from raw documents for the RAG Pipelines, and the QA Dataset Filter, which curates the Datasets for quality. The RAG Pipelines component orchestrates the overall RAG process and relies on the RAG Retrieval Engine to fetch documents.
The Langchain Reranker Adapter integrates the BCEmbedding Reranker Model into Langchain's document processing pipeline so that retrieved documents are reordered by relevance.
Related Classes/Methods:
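As a rough illustration of the adapter role described above, the sketch below wraps an arbitrary scoring function in a class whose single method takes documents and a query and returns the top-scoring documents. It loosely mirrors the shape of a Langchain document compressor, but `Document`, `RerankCompressor`, and `score_fn` are hypothetical stand-ins written for this example; they are not Langchain or BCEmbedding classes, and the real adapter may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Document:
    """Stand-in for a framework document: text plus free-form metadata."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


class RerankCompressor:
    """Hypothetical adapter: scores documents against a query and keeps the top_n."""

    def __init__(self, score_fn: Callable[[str, str], float], top_n: int = 3) -> None:
        self.score_fn = score_fn  # assumed signature: (query, text) -> relevance score
        self.top_n = top_n

    def compress_documents(self, documents: Sequence[Document], query: str) -> List[Document]:
        scored = []
        for doc in documents:
            score = self.score_fn(query, doc.page_content)
            # Copy the metadata so the caller's documents are not mutated.
            scored.append(Document(doc.page_content, {**doc.metadata, "relevance_score": score}))
        # Highest relevance first, then truncate.
        scored.sort(key=lambda d: d.metadata["relevance_score"], reverse=True)
        return scored[: self.top_n]
```

Keeping the scorer as an injected callable means the same adapter shape works with any reranker that can score a query against a passage.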
The LlamaIndex Reranker Adapter adapts the BCEmbedding Reranker Model for use as a LlamaIndex node post-processor, rescoring retrieved nodes to improve their relevance ordering.
Related Classes/Methods:
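Node post-processing, as described above, can be pictured as replacing each node's retrieval score with a reranker score and then re-sorting. The following is a minimal sketch under that assumption; `ScoredNode` and `postprocess_nodes` are hypothetical names for this example and are not LlamaIndex types.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class ScoredNode:
    """Stand-in for a retrieved node: text plus an optional score."""
    text: str
    score: Optional[float] = None


def postprocess_nodes(
    nodes: Sequence[ScoredNode],
    query: str,
    score_fn: Callable[[str, str], float],
    top_n: int = 2,
) -> List[ScoredNode]:
    """Rescore nodes with score_fn and return the top_n, best first."""
    # Build new nodes so the original retrieval scores stay untouched.
    rescored = [ScoredNode(n.text, score_fn(query, n.text)) for n in nodes]
    rescored.sort(key=lambda n: n.score, reverse=True)
    return rescored[:top_n]
```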
The BCEmbedding Reranker Model is the core model that provides document reranking; the framework-specific adapters build on it.
Related Classes/Methods: None
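Conceptually, reranking means scoring each (query, passage) pair and sorting the passages by that score. The sketch below shows only this contract; `overlap_score` is a toy word-overlap scorer assumed for illustration and has nothing to do with how the actual model computes relevance.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in scorer: fraction of query words that appear in the passage."""
    q_words = set(query.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def rerank(
    query: str,
    passages: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return passages sorted best first."""
    scored = [(p, score_fn(query, p)) for p in passages]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In practice `score_fn` would be backed by the reranker model rather than word overlap.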
The PDF Data Extractor extracts raw text from PDF documents, turning unstructured files into input for the RAG pipelines.
Related Classes/Methods:
The QA Dataset Filter curates and filters datasets so that they meet the quality requirements of Question-Answering (QA) tasks.
Related Classes/Methods:
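The source does not state which quality criteria the filter applies. Purely as an example of what such filtering can look like, the sketch below drops records with an empty question, a very short answer, or a question already seen; the function name, the `question`/`answer` field names, and the thresholds are all assumptions.

```python
from typing import Dict, Iterable, List


def filter_qa_records(
    records: Iterable[Dict[str, str]],
    min_answer_chars: int = 3,
) -> List[Dict[str, str]]:
    """Keep records with a non-empty, unseen question and a long-enough answer."""
    kept: List[Dict[str, str]] = []
    seen_questions = set()
    for rec in records:
        question = rec.get("question", "").strip()
        answer = rec.get("answer", "").strip()
        # Hypothetical rule 1: both fields must carry usable text.
        if not question or len(answer) < min_answer_chars:
            continue
        # Hypothetical rule 2: drop case-insensitive duplicate questions.
        key = question.lower()
        if key in seen_questions:
            continue
        seen_questions.add(key)
        kept.append({"question": question, "answer": answer})
    return kept
```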
The Datasets component represents collections of raw or processed data, primarily QA datasets, used for evaluation and training within the RAG framework.
Related Classes/Methods: None
The RAG Pipelines component is the high-level system that orchestrates retrieval and generation: it consumes the processed data and calls the retrieval engine to carry out RAG tasks.
Related Classes/Methods: None
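The orchestration described above can be sketched as three pluggable stages: retrieve candidates, rerank them, then generate an answer from the best ones. The function below is an illustrative composition only; `run_rag` and its callable parameters are hypothetical and do not reflect the project's actual pipeline code.

```python
from typing import Callable, List


def run_rag(
    query: str,
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, List[str]], str],
    top_n: int = 2,
) -> str:
    """Retrieve candidates, keep the top_n after reranking, then generate an answer."""
    candidates = retrieve(query)
    context = rerank(query, candidates)[:top_n]
    return generate(query, context)
```

Passing the stages as callables keeps the orchestration independent of any particular retriever, reranker, or language model.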
The RAG Retrieval Engine fetches the documents relevant to a query for RAG tasks and offers both synchronous and asynchronous interfaces.
Related Classes/Methods:
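One common way to offer both synchronous and asynchronous interfaces is to implement the blocking call once and have the async variant delegate to it in a worker thread. The sketch below assumes that approach; `KeywordRetriever`, `retrieve`, and `aretrieve` are hypothetical names, the keyword-overlap ranking is a toy, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
from typing import List


class KeywordRetriever:
    """Toy retriever ranking documents by how many query words they share."""

    def __init__(self, corpus: List[str]) -> None:
        self.corpus = corpus

    def retrieve(self, query: str, top_k: int = 2) -> List[str]:
        """Synchronous interface: return the top_k documents, best first."""
        terms = set(query.lower().split())
        ranked = sorted(
            self.corpus,
            key=lambda doc: len(terms & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]

    async def aretrieve(self, query: str, top_k: int = 2) -> List[str]:
        """Asynchronous interface: same result, computed off the event loop."""
        # Run the blocking call in a worker thread so the event loop stays responsive.
        return await asyncio.to_thread(self.retrieve, query, top_k)
```

Callers inside an event loop would use `await retriever.aretrieve(...)`, while scripts can call `retriever.retrieve(...)` directly.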