Skip to content

Progress-infinitely/InsightData

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 

Repository files navigation

InsightData

A natural language to SQL data analysis agent. Users ask business questions in natural language; the system extracts keywords, retrieves metadata context, generates and validates SQL, executes it, and streams back an LLM-generated summary with tabular results.

Architecture

User Question
    ↓
extract_keywords (jieba segmentation + LLM keyword extraction)
    ↓
┌──────────────────────────────────┐
│  recall_column  │  recall_value  │  recall_metric  │  (parallel)
└──────────────────────────────────┘
    ↓
merge_retrieved_info
    ↓
┌──────────────────────────────────┐
│  filter_table  │  filter_metric  │  (parallel)
└──────────────────────────────────┘
    ↓
add_extra_context → generate_sql → validate_sql
    ↓
execute_sql → [has result] → summarize_result → END
             → [no result]  → END

Each node retrieves schema metadata from MySQL (table structures, column descriptions, metric definitions) and vector search results from Qdrant/ES, then passes enriched context to the LLM for SQL generation. SQL is validated via EXPLAIN before execution. Results are summarized in natural language via LLM before streaming to the frontend.

Tech Stack

Layer Technology
Backend Python >= 3.12 · FastAPI · LangGraph · LangChain
Frontend Vue 3 · Vite
Storage MySQL · Qdrant · Elasticsearch
LLM / Embedding OpenAI-compatible API (DashScope / DeepSeek)
Keyword Extraction jieba (Chinese word segmentation)
ORM SQLAlchemy
Communication Server-Sent Events (SSE)

Project Structure

InsightData/
├── data-agent/                  # Python backend
│   ├── app/
│   │   ├── agent/               # LangGraph state graph and node implementations
│   │   │   ├── graph.py         # Graph definition (node wiring + conditional edges)
│   │   │   └── nodes/           # extract_keywords, recall_*, filter_*, generate_sql, etc.
│   │   ├── api/
│   │   │   ├── routers/         # FastAPI route handlers (query_router with SSE)
│   │   │   └── schemas/         # Request/response Pydantic models
│   │   ├── clients/             # Database + embedding client managers
│   │   ├── conf/                # Config structure definitions
│   │   ├── core/                # Lifespan management, logging, request context
│   │   ├── entities/            # Data entity definitions (column_info, metric_info, etc.)
│   │   ├── models/              # SQLAlchemy ORM models
│   │   ├── prompt/              # Prompt loading logic
│   │   ├── repositories/        # Data access layer (MySQL, Qdrant, ES)
│   │   ├── scripts/             # Meta knowledge initialization scripts
│   │   └── services/            # Business logic (query orchestration)
│   ├── conf/                    # Runtime config (app_config.yaml — gitignored)
│   ├── prompts/                 # LLM prompt templates (8 .prompt files)
│   ├── deployment/              # Docker compose + reference configs
│   ├── main.py                  # Application entry point
│   └── pyproject.toml           # Dependency definitions
├── data-agent-frontend/         # Vue 3 frontend
└── CLAUDE.md

Quick Start

Prerequisites

1. Start infrastructure

cd data-agent/deployment/docker
docker compose up -d
Service Address
MySQL localhost:3307
Elasticsearch localhost:9200
Kibana localhost:5601
Qdrant localhost:6333

2. Configure backend

cd data-agent
cp conf/app_config.sample.yaml conf/app_config.yaml

Edit conf/app_config.yaml:

  • MySQL credentials (db_meta / db_dw)
  • LLM API key and base URL (e.g., DashScope / DeepSeek)
  • Embedding service API key

3. Install dependencies

cd data-agent
uv sync --frozen

4. Initialize meta knowledge base

uv run python -m app.scripts.build_meta_knowledge -c conf/meta_config.yaml

This populates MySQL with table/column/metric definitions, builds Qdrant vector index, and creates ES search index.

5. Start backend

uv run uvicorn main:app --host 0.0.0.0 --port 8000 --reload

API docs: http://localhost:8000/docs

6. Start frontend

cd ../data-agent-frontend
npm ci
npm run dev

Open http://localhost:5173 and start asking business questions.

Example Questions

  • "统计2025年各地区的销售总额" (Total sales by region in 2025)
  • "哪个产品的销量最高" (Which product has the highest sales volume)
  • "各渠道的订单数量对比" (Order count comparison across channels)

Config Files

File Purpose
conf/app_config.yaml Runtime config with credentials (gitignored)
conf/app_config.sample.yaml Public sample config
conf/meta_config.yaml Meta knowledge base schema definitions
deployment/docker/docker-compose.yaml Infrastructure service orchestration

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages