From product requirements to live AWS infrastructure — graph-validated, Terraform-provisioned, GitHub-committed.
Built by Aakash Khepar, Bhavya Nimesh Shah, Aditya Jindal, and Gunbhir · Hackathon project, shipped end-to-end.
CloudForge turns a plain-English product requirements document (PRD) into a deployed AWS stack. It does not ask an LLM to invent an architecture. Instead, a knowledge graph of validated cloud patterns drives every decision — the LLM is used last, only to explain what the graph already derived.
The full pipeline runs in four stages, each streamed live to the browser:
- PRD Refinement — an AI agent extracts NFRs, asks targeted clarifying questions (traffic tiers, SLAs, compliance), and produces a structured requirements JSON.
- Architecture Planning — seven specialised sub-agents traverse a graph of cloud patterns, simulate load, detect failure modes, map compliance, and run deterministic architecture tests. A human-in-the-loop gate lets you review before proceeding.
- Code & Terraform Generation: a multi-subgraph LangGraph pipeline generates `main.tf`, `variables.tf`, `outputs.tf`, and application code per service, validates everything through TFLint + Checkov + `terraform validate`, and auto-fixes errors (up to 3 retries).
- Deploy: CloudFormation provisions your real AWS resources using temporary STS credentials. Generated code is committed directly to your GitHub repository.
- Graph-grounded architecture — NFRs are embedded, matched against a validated pattern graph via semantic retrieval, then verified by graph traversal. No architecture is LLM-invented.
- Deterministic rule engine: `arch_rules.py` runs SPOF detection, BFS cascade-risk analysis, and per-hop latency budgeting using a lookup table of 30+ AWS services. These checks are not probabilistic.
- Zero persistent cloud credentials: IAM role assumption (STS `AssumeRole`) at deploy time; any stored credentials are Fernet-encrypted (AES-128-CBC) and discarded after use.
- Full-pipeline SSE streaming: every agent stage (PRD extraction, architecture reasoning, file-by-file code generation, live deploy logs) streams to the browser via Server-Sent Events.
- GitHub-native delivery — generated scaffolds are committed under the user's own identity via GitHub App OAuth. No intermediary cloud storage.
- Provider-agnostic IaC: an intermediate topology graph is rendered through a provider factory (`providers/factory.py`); AWS is live today, GCP and Azure are architected in.
| Use Case | User | Outcome |
|---|---|---|
| Prototype a new SaaS backend | Solo developer | Architecture + Terraform + app scaffold in one session |
| Validate a PRD before sprint planning | Product / engineering team | Graph-tested architecture diagram with SPOF and compliance gaps surfaced |
| Generate IaC from an existing design | Platform / infra engineer | Validated Terraform HCL with Checkov CIS-AWS coverage |
| Explore multi-cloud cost tradeoffs | Architect | Provider-agnostic topology, cost estimates per service |
- Extracts non-functional requirements via local Ollama (Qwen)
- Optional web research via TinyFish with DuckDuckGo fallback
- Multi-choice clarification (2–4 options per question + freeform) modelled after GitHub Copilot planning mode
- Up to 6 clarification rounds; outputs structured requirements JSON
- Service discovery — maps NFRs to concrete AWS services, optionally via Terraform MCP
- Load simulation — estimates traffic tiers, scales components accordingly
- Resilience simulation — per-failure-mode blast radius analysis
- Compliance mapping — SOC 2, HIPAA, ISO 27001 gap detection
- Deterministic tests — SPOF, cascade risk (BFS), latency budget; CRITICAL violations trigger an automatic architecture retry (up to 3 times)
- Human-in-the-loop: LangGraph `interrupt()` pauses the graph for your review; accept or request changes
- Generates `main.tf`, `variables.tf`, `outputs.tf` per service
- Generates Python / TypeScript application code per service
- Validation subgraph: `terraform fmt` → `terraform validate` → TFLint → Checkov (CIS AWS Benchmark)
- LLM-driven fix loop: up to 3 retries per file before escalating
- Test generation subgraph — produces unit tests per generated module
- Renders topology graph to provider-specific CloudFormation / Terraform
- Temporary credentials via STS `AssumeRole`: no stored AWS access keys
- Per-resource live status events over SSE (`queued` → `provisioning` → `live`)
- One-click rollback via CloudFormation stack deletion
- Generated code committed to user's GitHub repo under their identity
- JWT access tokens (30 min) + refresh tokens (7 days)
- bcrypt password hashing; Fernet encryption for stored credentials
- SlowAPI rate limiting on all auth endpoints
- Server refuses to start if `JWT_SECRET_KEY` is the default placeholder
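The per-resource status events mentioned in the deploy bullets above (`queued` → `provisioning` → `live`) can be consumed with a few lines of client code. The following is a minimal sketch, not the project's client: it assumes each event arrives as a single SSE `data:` line carrying JSON, and the helper names `parse_sse_events` and `track_node_status` are hypothetical. Only the `node_status` payload shape (`type`, `nodeId`, `status`) comes from this README.

```python
import json


def parse_sse_events(lines):
    """Yield decoded JSON payloads from the `data:` lines of an SSE stream.

    Assumption: one JSON object per `data:` line. Blank lines and
    comment lines (starting with ':') are skipped.
    """
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                yield json.loads(payload)


def track_node_status(events):
    """Keep the latest status seen for each node id."""
    status = {}
    for event in events:
        if event.get("type") == "node_status":
            status[event["nodeId"]] = event["status"]
    return status


# Illustrative stream; real streams would come from the HTTP response body.
stream = [
    'data: {"type": "node_status", "nodeId": "lambda-1", "status": "queued"}',
    "",
    'data: {"type": "node_status", "nodeId": "lambda-1", "status": "provisioning"}',
    'data: {"type": "node_status", "nodeId": "lambda-1", "status": "live"}',
]
print(track_node_status(parse_sse_events(stream)))  # {'lambda-1': 'live'}
```

Because the tracker only keeps the most recent status per node, a UI built on it can re-render from the dictionary after every event without replaying the stream.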
| Layer | Technology | Purpose |
|---|---|---|
| Frontend framework | Next.js 15 (App Router), React 19, TypeScript | Full-stack web app |
| Canvas | `@xyflow/react` v12 | Interactive architecture diagram |
| State | Zustand v5 + persist middleware | Client-side session state |
| Animations | Framer Motion v11 | Stage transitions, live status glows |
| Styling | Tailwind CSS v4 | Utility-first design system |
| Backend framework | FastAPI 0.135+, Python 3.12+ | Async API + SSE streaming |
| Agent framework | LangGraph 1.1+ | Multi-node stateful agent graphs with interrupt() |
| LLM (local) | Ollama (Qwen 3.5) | PRD refinement, code generation |
| LLM (cloud) | Claude Sonnet via `langchain-anthropic` | Architecture planning |
| Graph database | Kuzu + sentence-transformers | Pattern storage, semantic retrieval |
| Database | MongoDB (Motor async) | Session, project, build, deployment persistence |
| IaC validation | TFLint, Checkov, `terraform validate` | HCL correctness + security scanning |
| Cloud SDK | Boto3, STS | CloudFormation provisioning, AssumeRole |
| Auth | PyJWT, bcrypt, Fernet (`cryptography`) | Token auth + credential encryption |
| Rate limiting | SlowAPI | Auth endpoint protection |
```mermaid
flowchart TD
    PRD[User PRD] --> A1
    subgraph A1["Agent 1: PRD Refinement (LangGraph)"]
        direction LR
        ui[user_input] --> res[research]
        res --> ws[web_search]
        ws --> res
        res --> ig{info_gate}
        ig -->|needs input| aw[await_user interrupt]
        aw --> res
        ig -->|sufficient| pl[plan]
        pl --> acc[acceptance]
    end
    A1 -->|structured requirements JSON| A2
    subgraph A2["Agent 2: Architecture Planner (LangGraph)"]
        direction LR
        ar[architecture] --> sd[service_discovery]
        sd --> as[arch_simulator]
        as --> rs[resilience_simulator]
        rs --> cp[compliance]
        cp --> at{arch_test}
        at -->|CRITICAL + retry < 3| ar
        at -->|pass| present[present interrupt]
        present -->|rejected| ar
        present -->|accepted| A2_END([END])
    end
    A2 --> A3
    subgraph A3["Agent 3: Code & Terraform Generator (LangGraph)"]
        direction LR
        pi[parse_input] --> tg[tf_generator]
        tg --> tv["tf_validation_loop\n(fmt→validate→TFLint→Checkov→fix)"]
        tv --> orc[orchestrator]
        orc --> asm[assembler]
    end
    A3 --> DEP
    subgraph DEP["Deploy"]
        cfn[CloudFormation provisioning]
        sts[STS AssumeRole]
        gh[GitHub App commit]
        sse[SSE live events]
        cfn --- sts
        cfn --- gh
        cfn --- sse
    end
```
1. Submit your PRD: paste a product requirements document into the editor. Agent 1 reads it, optionally searches the web for context, and extracts non-functional requirements (uptime, compliance, latency, traffic).
2. Answer clarifying questions: Agent 1 asks 2–6 targeted questions with predefined options and a freeform "Custom" option. Your answers are folded back into the next research iteration until the agent has enough signal.
3. Architecture is graph-derived: Agent 2 embeds the NFR set, retrieves matching patterns from a Kuzu knowledge graph, and traverses the graph to validate compatibility. Deterministic rules check for SPOFs, cascade blast radius, and latency budget. CRITICAL violations auto-retry the architecture loop.
4. Human review gate: LangGraph's `interrupt()` API pauses the pipeline and presents the architecture diagram with a summary of violations and component rationale. Accept it or request changes in plain English.
5. Terraform and code are generated and validated: Agent 3 generates HCL and application code per service, then runs `terraform validate`, TFLint, and Checkov in a validation subgraph. Errors feed an LLM fix loop (up to 3 retries per file).
6. Deploy to your AWS account: temporary STS credentials are obtained at deploy time via IAM role assumption. CloudFormation provisions each resource and streams status events live. Generated code is committed to your GitHub repository under your identity.
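The latency budget check in step 3 is described only at a high level, so here is one way a per-hop budget could be computed. This is a sketch under stated assumptions, not the contents of `arch_rules.py`: the latency figures, the `DEFAULT_HOP_MS` fallback, and the function names are all illustrative placeholders.

```python
# Illustrative per-hop latencies in milliseconds. These numbers are
# placeholders for the sketch, not measured or documented AWS values.
HOP_LATENCY_MS = {
    "api_gateway": 30,
    "lambda": 50,
    "dynamodb": 10,
    "rds": 20,
}
DEFAULT_HOP_MS = 25  # assumed fallback for services missing from the table


def path_latency(path):
    """Sum the estimated latency of every hop on a request path."""
    return sum(HOP_LATENCY_MS.get(service, DEFAULT_HOP_MS) for service in path)


def check_budget(path, budget_ms):
    """Compare a path's estimated latency against the NFR budget."""
    total = path_latency(path)
    return {"total_ms": total, "budget_ms": budget_ms, "ok": total <= budget_ms}


request_path = ["api_gateway", "lambda", "dynamodb"]
print(check_budget(request_path, budget_ms=100))
# {'total_ms': 90, 'budget_ms': 100, 'ok': True}
```

A table lookup like this gives the same answer on every run, which fits the README's claim that these checks are deterministic rather than LLM-driven.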
- Python 3.12+, `uv` package manager
- Node.js 20+
- MongoDB (local or Atlas)
- Ollama running locally
- AWS account with an IAM role configured for `AssumeRole`
- Anthropic API key (for Agent 2)
```bash
cd backend

# Install dependencies
uv sync

# Pull the local LLM model
ollama pull qwen3.5:latest
# Runs in the foreground; use a separate terminal if Ollama is not already running
ollama serve

# Copy and fill environment variables
cp .env.sample .env
# Required: JWT_SECRET_KEY, FERNET_KEY, ANTHROPIC_API_KEY, MONGODB_URL

# Generate keys
python -c "import secrets; print(secrets.token_hex(32))"  # → JWT_SECRET_KEY
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"  # → FERNET_KEY

# Start the API
uv run uvicorn app.main:app --reload --port 8000
```

```bash
cd frontend
npm install
npm run dev  # → http://localhost:3000
```

| Variable | Required | Description |
|---|---|---|
| `JWT_SECRET_KEY` | Yes | 64-char hex secret. Server refuses to start without it. |
| `FERNET_KEY` | Yes | Fernet key for encrypting stored cloud credentials |
| `ANTHROPIC_API_KEY` | Yes | Claude Sonnet API key (Agent 2) |
| `MONGODB_URL` | Yes | MongoDB connection string |
| `OLLAMA_BASE_URL` | Yes | Ollama endpoint (default: `http://localhost:11434`) |
| `QWEN_MODEL` | No | Local model for Agent 1/3 (default: `qwen3.5:latest`) |
| `GITHUB_CLIENT_ID` | No | GitHub OAuth App client ID |
| `GITHUB_CLIENT_SECRET` | No | GitHub OAuth App client secret |
| `ENABLE_WEB_SEARCH` | No | Enable TinyFish/DuckDuckGo research in Agent 1 |
```bash
cd backend

# Standalone smoke test (Agent 1, includes multi-choice option handling)
uv run python -m app.agents.agent1.standalone_smoke_test

# pytest
uv run pytest
```

```bash
# 1. Register and login
curl -X POST http://localhost:8000/auth/register \
  -H "Content-Type: application/json" \
  -d '{"username": "alice", "password": "secret123"}'

TOKEN=$(curl -s -X POST http://localhost:8000/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "alice", "password": "secret123"}' | jq -r .access_token)

# 2. Create a project
PROJECT_ID=$(curl -s -X POST http://localhost:8000/projects/ \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-api"}' | jq -r .id)

# 3. Start PRD refinement (SSE stream)
curl -N -X POST "http://localhost:8000/workflows/prd/v2/start/$PROJECT_ID" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prd_text": "Build a SaaS image API on AWS with multi-tenant auth, SOC 2, 99.9% uptime."}'
```

```jsonc
// NFR constraint extracted
{"type": "constraint", "chip": {"label": "99.9% uptime", "category": "availability"}}

// Clarification questions (with options)
{"type": "questions", "questions_with_options": [{"question": "Expected traffic?", "options": [...]}]}

// Architecture diagram ready
{"type": "complete", "architecture_diagram": {"nodes": [...], "connections": [...]}}

// File generated
{"type": "file", "path": "main.tf", "content": "...", "language": "hcl"}

// Resource status during deploy
{"type": "node_status", "nodeId": "lambda-1", "status": "live"}
```

| Decision | Rationale | Tradeoff |
|---|---|---|
| Knowledge graph for architecture | Eliminates LLM hallucinations on infrastructure decisions; graph traversal is deterministic and auditable | Graph must be kept current as AWS services evolve |
| LangGraph with `interrupt()` | Native human-in-the-loop without custom state machines; resumes across HTTP requests via `MemorySaver` checkpoint | Requires `thread_id` per session; adds checkpoint storage overhead |
| Kuzu for graph storage | Embeddable property graph DB, no separate server; supports Cypher queries | Less ecosystem tooling than Neo4j |
| STS AssumeRole, no stored keys | Eliminates a whole class of credential-leak risk | Requires users to pre-configure a trust policy in their AWS account |
| Fernet for credential storage | AES-128-CBC, well-audited Python-native; keys can be rotated | Symmetric — key compromise exposes all stored credentials |
| Multi-subgraph Agent 3 | `tf_validation_loop` and `code_generation_loop` are isolated subgraphs with their own state; cleaner retry semantics | Requires explicit state mapping at subgraph boundaries |
| Ollama for Agent 1 & 3 | Keeps PRD refinement and code generation free (no API cost) | Cold-start latency on first token; quality ceiling below Claude |
| Provider factory pattern | Topology graph is cloud-agnostic; only the render layer knows the target provider | Currently only the AWS renderer is fully implemented |
Graph-grounded architecture reasoning. Rather than prompting an LLM to "design an architecture," CloudForge retrieves patterns from a structured knowledge graph using semantic similarity on the extracted NFR set, then validates compatibility via graph traversal. The LLM only runs as the final step to produce a human-readable explanation. This makes architecture decisions auditable and reproducible — re-running the same PRD with the same graph produces the same recommendation.
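To make the retrieval step concrete, the sketch below ranks patterns by cosine similarity between an NFR embedding and stored pattern embeddings. It is an illustration only: the three-dimensional vectors, the pattern names, and the `retrieve_patterns` helper are hypothetical, and the real pipeline is described as using sentence-transformers embeddings stored in Kuzu rather than hand-written lists.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_patterns(nfr_vec, patterns, top_k=2):
    """Return the top_k pattern names, highest similarity first.

    Ties are broken by name so the ranking is reproducible.
    """
    scored = sorted(
        ((cosine(nfr_vec, vec), name) for name, vec in patterns.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [name for _, name in scored[:top_k]]


# Hypothetical toy embeddings, chosen only to make the example readable.
PATTERNS = {
    "serverless-api": [0.9, 0.1, 0.2],
    "multi-az-rds": [0.1, 0.9, 0.3],
    "static-site-cdn": [0.2, 0.1, 0.9],
}
nfr_embedding = [0.8, 0.3, 0.1]

print(retrieve_patterns(nfr_embedding, PATTERNS))
# ['serverless-api', 'multi-az-rds']
```

With fixed embeddings and a deterministic tie-break, the same input always yields the same ranking, which is the reproducibility property the paragraph above relies on.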
Deterministic rule engine as a hard gate. arch_rules.py implements SPOF detection (single stateful nodes with multiple compute inputs), BFS-based cascade failure propagation, and per-hop latency budget analysis using a lookup table of ~30 AWS service categories. These checks run before any human review and can trigger an automatic architecture retry loop. The engine is purely algorithmic — no LLM involved.
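The two graph checks can be sketched in a few lines. This is a minimal illustration, assuming the architecture is a list of `(caller, dependency)` edges and each node has a `kind` label; the function names and the sample topology are hypothetical and are not taken from `arch_rules.py`. The SPOF rule follows the definition given above: a stateful node with more than one compute input.

```python
from collections import deque


def find_spofs(kinds, edges):
    """Stateful nodes that more than one compute node depends on."""
    callers = {}
    for src, dst in edges:
        if kinds.get(src) == "compute" and kinds.get(dst) == "stateful":
            callers.setdefault(dst, set()).add(src)
    return sorted(node for node, srcs in callers.items() if len(srcs) > 1)


def cascade(failed, edges):
    """BFS over reversed edges: every node that transitively depends on `failed`."""
    dependents = {}
    for src, dst in edges:
        dependents.setdefault(dst, []).append(src)
    seen, queue = {failed}, deque([failed])
    while queue:
        node = queue.popleft()
        for upstream in dependents.get(node, []):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return seen - {failed}


# Hypothetical topology: a gateway calls an API; the API and a worker share one DB.
KINDS = {"gateway": "edge", "api": "compute", "worker": "compute", "db": "stateful"}
EDGES = [("gateway", "api"), ("api", "db"), ("worker", "db")]

print(find_spofs(KINDS, EDGES))       # ['db']
print(sorted(cascade("db", EDGES)))   # ['api', 'gateway', 'worker']
```

Both functions are plain graph traversals over the topology, so the same input always produces the same violations.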
Self-healing IaC generation. Agent 3 runs generated Terraform through a four-stage validation pipeline and feeds failures back to the LLM with structured error context for targeted fixes. The retry loop runs up to three times per file before escalating, making the generation process robust to common HCL mistakes without human intervention.
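The retry behaviour described above reduces to a bounded validate-and-fix loop. The sketch below shows that control flow only. The `validate` and `fix` callables are hypothetical stand-ins for the real validation pipeline and the LLM fix call, and the toy implementations exist purely so the example runs.

```python
MAX_RETRIES = 3  # matches the "up to three times per file" limit described above


def validate_with_fixes(source, validate, fix, max_retries=MAX_RETRIES):
    """Run validate, then fix and re-validate until clean or out of retries.

    Returns (final_source, retries_used, remaining_errors). A non-empty
    remaining_errors list means the caller should escalate.
    """
    errors = validate(source)
    retries = 0
    while errors and retries < max_retries:
        source = fix(source, errors)
        retries += 1
        errors = validate(source)
    return source, retries, errors


# Toy stand-ins: the "validator" flags TODO markers, the "fixer" resolves one per call.
def toy_validate(src):
    return ["unresolved placeholder"] if "TODO" in src else []


def toy_fix(src, errors):
    return src.replace("TODO", "ok", 1)


print(validate_with_fixes("a=TODO b=TODO", toy_validate, toy_fix))
# ('a=ok b=ok', 2, [])
```

Returning the retry count alongside the result also makes it cheap to record the fix-loop retry distribution.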
Zero-credential deploy model. CloudForge never asks for or stores AWS access keys. Deploy-time credentials are obtained exclusively via IAM role assumption (STS AssumeRole), and any intermediate credential material is discarded after the CloudFormation stack call returns. Credentials that must be stored (user-initiated deployments) are encrypted with Fernet and decrypted only at call time.
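As an illustration of the role-assumption step, the sketch below wraps the STS `AssumeRole` call and reshapes its response into keyword arguments for a downstream client. It is an assumption-laden sketch, not CloudForge's code: the function name, the `cloudforge-deploy` session name, and the 15-minute duration are hypothetical choices. The client is passed in so the function works with a real `boto3.client("sts")` or a test double.

```python
def assume_deploy_role(sts_client, role_arn,
                       session_name="cloudforge-deploy",  # hypothetical name
                       duration_seconds=900):             # 15 min, the STS minimum
    """Exchange a role ARN for short-lived credentials via STS AssumeRole.

    `sts_client` is anything exposing boto3's `assume_role` method.
    The returned dict uses the keyword names boto3 clients accept.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

With boto3 installed, usage would look roughly like `creds = assume_deploy_role(boto3.client("sts"), role_arn)` followed by `boto3.client("cloudformation", **creds)`. Keeping the credentials in a local variable that goes out of scope after the stack call is one simple way to honour the "discarded after use" property.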
| Metric | Why It Matters |
|---|---|
| Architecture-to-deploy success rate | Measures end-to-end reliability of the full pipeline |
| Terraform validation pass rate (first attempt) | Indicator of code generation quality before fix retries |
| SPOF / violation detection rate per PRD | Validates usefulness of the deterministic rule engine |
| Mean time from PRD submission to live stack | Core product performance indicator |
| Fix loop retry distribution (0 / 1 / 2 / 3 retries) | Guides investment in LLM prompt quality vs. validation tooling |
- Typed state everywhere: LangGraph state schemas use Pydantic models and `TypedDict`; invalid state transitions are caught at compile time.
- Fail-fast server startup: FastAPI `lifespan` validates `JWT_SECRET_KEY` and `FERNET_KEY` before accepting any requests.
- Structured logging: all agents use Python `logging`; deploy events are persisted to the `deployments` MongoDB collection.
- IaC security scanning: Checkov runs the CIS AWS Benchmark against every generated Terraform file before deployment.
- Rate limiting — all auth endpoints are protected by SlowAPI (5 requests/minute for registration).
- Smoke tests: `standalone_smoke_test.py` exercises Agent 1 across multiple cloud providers and option-selection modes without requiring a running server.
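The fail-fast startup check listed above can be pictured as a small pure function that a FastAPI `lifespan` handler calls before serving traffic. The sketch below is hypothetical: the placeholder strings in `PLACEHOLDER_SECRETS` and the function name are assumptions, since the README does not state the actual default value or how the check is implemented.

```python
# Assumed placeholder values; the project's real default is not documented here.
PLACEHOLDER_SECRETS = {"", "change-me", "your-secret-key"}


def check_startup_settings(env):
    """Raise RuntimeError if required secrets are missing or left at a placeholder.

    `env` is any mapping of variable names to values, for example os.environ.
    """
    problems = []
    if env.get("JWT_SECRET_KEY", "") in PLACEHOLDER_SECRETS:
        problems.append("JWT_SECRET_KEY is missing or still a placeholder")
    if not env.get("FERNET_KEY"):
        problems.append("FERNET_KEY is missing")
    if problems:
        raise RuntimeError("; ".join(problems))


# A complete configuration passes silently.
check_startup_settings({"JWT_SECRET_KEY": "a" * 64, "FERNET_KEY": "example-key"})
```

Taking the environment as a parameter rather than reading `os.environ` directly keeps the check easy to unit test.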
- GCP and Azure renderers — the provider factory is in place; completing the GCP and Azure Terraform renderers would make the pipeline fully multi-cloud.
- Persistent graph updates — allow the architecture knowledge graph to be extended with org-specific patterns and past deployment outcomes.
- Cost estimation integration: the `cost_fetchers/` module has AWS, GCP, and Azure stubs; wiring real pricing API data into the architecture review step would surface cost tradeoffs before provisioning.
- Agent 2 streaming: architecture planning currently resolves in one shot; streaming sub-agent progress (service discovery → simulation → compliance → test) would improve perceived responsiveness.
- Drift detection — compare the generated Terraform state with the actual CloudFormation stack on subsequent sessions and surface configuration drift.
CloudForge was built at a hackathon to address a real friction point: generating cloud infrastructure requires deep expertise, but the decisions are largely pattern-matching against known constraints. By encoding those patterns in a graph and running deterministic validation before any LLM call, CloudForge makes infrastructure generation auditable rather than probabilistic — closer to a compiler than a chatbot.




