CloudForge

From product requirements to live AWS infrastructure — graph-validated, Terraform-provisioned, GitHub-committed.

Built by Aakash Khepar, Bhavya Nimesh Shah, Aditya Jindal, and Gunbhir · Hackathon project, shipped end-to-end.

Overview

CloudForge turns a plain-English product requirements document (PRD) into a deployed AWS stack. It does not ask an LLM to invent an architecture. Instead, a knowledge graph of validated cloud patterns drives every decision — the LLM is used last, only to explain what the graph already derived.

The full pipeline runs in four stages, each streamed live to the browser:

PRD Refinement — an AI agent extracts NFRs, asks targeted clarifying questions (traffic tiers, SLAs, compliance), and produces a structured requirements JSON.
Architecture Planning — seven specialised sub-agents traverse a graph of cloud patterns, simulate load, detect failure modes, map compliance, and run deterministic architecture tests. A human-in-the-loop gate lets you review before proceeding.
Code & Terraform Generation — a multi-subgraph LangGraph pipeline generates main.tf, variables.tf, outputs.tf, and application code per service, validates everything through TFLint + Checkov + terraform validate, and auto-fixes errors (up to 3 retries).
Deploy — CloudFormation provisions your real AWS resources using temporary STS credentials. Generated code is committed directly to your GitHub repository.

Preview

[Screenshots: PRD Refinement, Architecture Planning, Code & Terraform Generation, Live Deploy]

Highlights

Graph-grounded architecture — NFRs are embedded, matched against a validated pattern graph via semantic retrieval, then verified by graph traversal. No architecture is LLM-invented.
Deterministic rule engine — arch_rules.py runs SPOF detection, BFS cascade-risk analysis, and per-hop latency budgeting using a lookup table of 30+ AWS services. These checks are not probabilistic.
Zero persistent cloud credentials: temporary credentials come from IAM role assumption (STS AssumeRole) at deploy time and are discarded after use; any credentials that must be stored are Fernet-encrypted (AES-128-CBC with HMAC-SHA256).
Full-pipeline SSE streaming: every agent stage (PRD extraction, architecture reasoning, file-by-file code generation, live deploy logs) streams to the browser via Server-Sent Events.
GitHub-native delivery — generated scaffolds are committed under the user's own identity via GitHub App OAuth. No intermediary cloud storage.
Provider-agnostic IaC — an intermediate topology graph is rendered through a provider factory (providers/factory.py); AWS is live today, GCP and Azure are architected in.
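
The provider-agnostic rendering idea can be sketched as a small registry keyed by provider name. This is only an illustration under assumptions, not the contents of providers/factory.py: the `Renderer` protocol, `AwsRenderer`, `get_renderer`, the node kinds, and the topology shape are all hypothetical names chosen for the example.

```python
from typing import Callable, Protocol


class Renderer(Protocol):
    def render(self, topology: dict) -> str: ...


class AwsRenderer:
    """Hypothetical renderer: turns topology nodes into empty Terraform stubs."""

    # Assumed mapping from abstract node kinds to AWS resource types.
    KIND_TO_RESOURCE = {"function": "aws_lambda_function", "queue": "aws_sqs_queue"}

    def render(self, topology: dict) -> str:
        blocks = []
        for node in topology["nodes"]:
            resource = self.KIND_TO_RESOURCE[node["kind"]]
            blocks.append(f'resource "{resource}" "{node["id"]}" {{}}')
        return "\n".join(blocks)


# Only AWS is registered, mirroring the "AWS is live today" status.
_RENDERERS: dict[str, Callable[[], Renderer]] = {"aws": AwsRenderer}


def get_renderer(provider: str) -> Renderer:
    """Look up a renderer; unregistered providers fail loudly."""
    try:
        return _RENDERERS[provider]()
    except KeyError:
        raise NotImplementedError(f"no renderer registered for {provider!r}") from None


topology = {"nodes": [{"id": "api", "kind": "function"}, {"id": "jobs", "kind": "queue"}]}
rendered = get_renderer("aws").render(topology)
```

With this shape, adding a provider means registering one more renderer; the topology itself would not need to change.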

Use Cases

Use Case	User	Outcome
Prototype a new SaaS backend	Solo developer	Architecture + Terraform + app scaffold in one session
Validate a PRD before sprint planning	Product / engineering team	Graph-tested architecture diagram with SPOF and compliance gaps surfaced
Generate IaC from an existing design	Platform / infra engineer	Validated Terraform HCL with Checkov CIS-AWS coverage
Explore multi-cloud cost tradeoffs	Architect	Provider-agnostic topology, cost estimates per service

Features

PRD Refinement (Agent 1)

Extracts non-functional requirements via local Ollama (Qwen)
Optional web research via TinyFish with DuckDuckGo fallback
Multi-choice clarification (2–4 options per question + freeform) modelled after GitHub Copilot planning mode
Up to 6 clarification rounds; outputs structured requirements JSON

Architecture Planning (Agent 2)

Service discovery — maps NFRs to concrete AWS services, optionally via Terraform MCP
Load simulation — estimates traffic tiers, scales components accordingly
Resilience simulation — per-failure-mode blast radius analysis
Compliance mapping — SOC 2, HIPAA, ISO 27001 gap detection
Deterministic tests — SPOF, cascade risk (BFS), latency budget; CRITICAL violations trigger an automatic architecture retry (up to 3 times)
Human-in-the-loop — LangGraph interrupt() pauses the graph for your review; accept or request changes

Code & Terraform Generation (Agent 3)

Generates main.tf, variables.tf, outputs.tf per service
Generates Python / TypeScript application code per service
Validation subgraph: terraform fmt → terraform validate → TFLint → Checkov (CIS AWS Benchmark)
LLM-driven fix loop — up to 3 retries per file before escalating
Test generation subgraph — produces unit tests per generated module

Deployment

Renders topology graph to provider-specific CloudFormation / Terraform
Temporary credentials via STS AssumeRole — no stored AWS access keys
Per-resource live status events over SSE (queued → provisioning → live)
One-click rollback via CloudFormation stack deletion
Generated code committed to user's GitHub repo under their identity
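
The per-resource status lifecycle can be sketched as a tiny state progression whose changes are serialised as SSE frames. The event fields (`type`, `nodeId`, `status`) follow the `node_status` event shown under SSE event types; the helper names `advance` and `format_sse` are hypothetical, and the exact wire framing CloudForge uses is an assumption.

```python
import json

# Lifecycle named in the deployment feature list: queued → provisioning → live.
ORDER = ["queued", "provisioning", "live"]


def advance(current: str) -> str:
    """Return the next status; 'live' is terminal in this sketch."""
    index = ORDER.index(current)
    if index == len(ORDER) - 1:
        raise ValueError(f"{current!r} is already terminal")
    return ORDER[index + 1]


def format_sse(node_id: str, status: str) -> str:
    """Serialise one status change as a Server-Sent Events frame."""
    payload = {"type": "node_status", "nodeId": node_id, "status": status}
    return f"data: {json.dumps(payload)}\n\n"


# Walk one resource through its whole lifecycle, collecting a frame per step.
status = "queued"
frames = [format_sse("lambda-1", status)]
while status != "live":
    status = advance(status)
    frames.append(format_sse("lambda-1", status))
```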

Security

JWT access tokens (30 min) + refresh tokens (7 days)
bcrypt password hashing; Fernet encryption for stored credentials
SlowAPI rate limiting on all auth endpoints
Server refuses to start if JWT_SECRET_KEY is the default placeholder
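
The fail-fast startup check can be sketched as a plain function that raises before the server accepts traffic. This is a hedged illustration: the `PLACEHOLDER` value and the function names are hypothetical, and the real placeholder is whatever the project's .env.sample ships with.

```python
# Hypothetical placeholder; the real default lives in the project's .env.sample.
PLACEHOLDER = "change-me"


def validate_secrets(env: dict[str, str]) -> None:
    """Raise if a required secret is missing or still set to the default."""
    jwt_secret = env.get("JWT_SECRET_KEY", "")
    if not jwt_secret or jwt_secret == PLACEHOLDER:
        raise RuntimeError("JWT_SECRET_KEY is unset or still the placeholder")
    if not env.get("FERNET_KEY"):
        raise RuntimeError("FERNET_KEY is unset")


def startup_ok(env: dict[str, str]) -> bool:
    """Convenience wrapper that reports the check result as a boolean."""
    try:
        validate_secrets(env)
    except RuntimeError:
        return False
    return True
```

In a FastAPI app, a check like `validate_secrets(dict(os.environ))` would typically run at the top of the lifespan handler, so a misconfigured server exits before serving any request.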

Tech Stack

Layer	Technology	Purpose
Frontend framework	Next.js 15 (App Router), React 19, TypeScript	Full-stack web app
Canvas	`@xyflow/react` v12	Interactive architecture diagram
State	Zustand v5 + persist middleware	Client-side session state
Animations	Framer Motion v11	Stage transitions, live status glows
Styling	Tailwind CSS v4	Utility-first design system
Backend framework	FastAPI 0.135+, Python 3.12+	Async API + SSE streaming
Agent framework	LangGraph 1.1+	Multi-node stateful agent graphs with `interrupt()`
LLM — local	Ollama (Qwen 3.5)	PRD refinement, code generation
LLM — cloud	Claude Sonnet via `langchain-anthropic`	Architecture planning
Graph database	Kuzu + sentence-transformers	Pattern storage, semantic retrieval
Database	MongoDB (Motor async)	Session, project, build, deployment persistence
IaC validation	TFLint, Checkov, `terraform validate`	HCL correctness + security scanning
Cloud SDK	Boto3, STS	CloudFormation provisioning, AssumeRole
Auth	PyJWT, bcrypt, Fernet (`cryptography`)	Token auth + credential encryption
Rate limiting	SlowAPI	Auth endpoint protection

Architecture

flowchart TD
    PRD[User PRD] --> A1

    subgraph A1["Agent 1 — PRD Refinement (LangGraph)"]
        direction LR
        ui[user_input] --> res[research]
        res --> ws[web_search]
        ws --> res
        res --> ig{info_gate}
        ig -->|needs input| aw[await_user interrupt]
        aw --> res
        ig -->|sufficient| pl[plan]
        pl --> acc[acceptance]
    end

    A1 -->|structured requirements JSON| A2

    subgraph A2["Agent 2 — Architecture Planner (LangGraph)"]
        direction LR
        ar[architecture] --> sd[service_discovery]
        sd --> as[arch_simulator]
        as --> rs[resilience_simulator]
        rs --> cp[compliance]
        cp --> at{arch_test}
        at -->|CRITICAL + retry < 3| ar
        at -->|pass| present[present interrupt]
        present -->|rejected| ar
        present -->|accepted| A2_END([END])
    end

    A2 --> A3

    subgraph A3["Agent 3 — Code & Terraform Generator (LangGraph)"]
        direction LR
        pi[parse_input] --> tg[tf_generator]
        tg --> tv["tf_validation_loop\n(fmt→validate→TFLint→Checkov→fix)"]
        tv --> orc[orchestrator]
        orc --> asm[assembler]
    end

    A3 --> DEP

    subgraph DEP["Deploy"]
        cfn[CloudFormation provisioning]
        sts[STS AssumeRole]
        gh[GitHub App commit]
        sse[SSE live events]
        cfn --- sts
        cfn --- gh
        cfn --- sse
    end

How It Works

Submit your PRD — paste a product requirements document into the editor. Agent 1 reads it, optionally searches the web for context, and extracts non-functional requirements (uptime, compliance, latency, traffic).
Answer clarifying questions — Agent 1 asks 2–6 targeted questions with predefined options and a freeform "Custom" option. Your answers are folded back into the next research iteration until the agent has enough signal.
Architecture is graph-derived — Agent 2 embeds the NFR set, retrieves matching patterns from a Kuzu knowledge graph, and traverses the graph to validate compatibility. Deterministic rules check for SPOFs, cascade blast radius, and latency budget. CRITICAL violations auto-retry the architecture loop.
Human review gate — LangGraph's interrupt() API pauses the pipeline and presents the architecture diagram with a summary of violations and component rationale. Accept it or request changes in plain English.
Terraform and code are generated and validated — Agent 3 generates HCL and application code per service, then runs terraform validate, TFLint, and Checkov in a validation subgraph. Errors feed an LLM fix loop (up to 3 retries per file).
Deploy to your AWS account — temporary STS credentials are obtained at deploy time via IAM role assumption. CloudFormation provisions each resource and streams status events live. Generated code is committed to your GitHub repository under your identity.

Setup

Prerequisites

Python 3.12+, uv package manager
Node.js 20+
MongoDB (local or Atlas)
Ollama running locally
AWS account with an IAM role configured for AssumeRole
Anthropic API key (for Agent 2)

Backend

cd backend

# Install dependencies
uv sync

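# Start the Ollama server (it runs in the foreground, so use a separate terminal or background it)
ollama serve &

# Pull the local LLM model (requires the server to be running)
ollama pull qwen3.5:latest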

# Copy and fill environment variables
cp .env.sample .env
# Required: JWT_SECRET_KEY, FERNET_KEY, ANTHROPIC_API_KEY, MONGODB_URL

# Generate keys
python -c "import secrets; print(secrets.token_hex(32))"          # → JWT_SECRET_KEY
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"  # → FERNET_KEY

# Start the API
uv run uvicorn app.main:app --reload --port 8000

Frontend

cd frontend
npm install
npm run dev      # → http://localhost:3000

Environment Variables

Variable	Required	Description
`JWT_SECRET_KEY`	Yes	64-char hex secret. Server refuses to start without it.
`FERNET_KEY`	Yes	Fernet key for encrypting stored cloud credentials
`ANTHROPIC_API_KEY`	Yes	Claude Sonnet API key (Agent 2)
`MONGODB_URL`	Yes	MongoDB connection string
`OLLAMA_BASE_URL`	No	Ollama endpoint (default: `http://localhost:11434`)
`QWEN_MODEL`	No	Local model for Agent 1/3 (default: `qwen3.5:latest`)
`GITHUB_CLIENT_ID`	No	GitHub OAuth App client ID
`GITHUB_CLIENT_SECRET`	No	GitHub OAuth App client secret
`ENABLE_WEB_SEARCH`	No	Enable TinyFish/DuckDuckGo research in Agent 1

Run Tests

cd backend

# Standalone smoke test (Agent 1, includes multi-choice option handling)
uv run python -m app.agents.agent1.standalone_smoke_test

# pytest
uv run pytest

Usage

API — Start a PRD workflow (SSE stream)

# 1. Register and log in
curl -X POST http://localhost:8000/auth/register \
  -H "Content-Type: application/json" \
  -d '{"username": "alice", "password": "secret123"}'

TOKEN=$(curl -s -X POST http://localhost:8000/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "alice", "password": "secret123"}' | jq -r .access_token)

# 2. Create a project
PROJECT_ID=$(curl -s -X POST http://localhost:8000/projects/ \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-api"}' | jq -r .id)

# 3. Start PRD refinement (SSE stream)
curl -N -X POST "http://localhost:8000/workflows/prd/v2/start/$PROJECT_ID" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prd_text": "Build a SaaS image API on AWS with multi-tenant auth, SOC 2, 99.9% uptime."}'

SSE event types

// NFR constraint extracted
{"type": "constraint", "chip": {"label": "99.9% uptime", "category": "availability"}}

// Clarification questions (with options)
{"type": "questions", "questions_with_options": [{"question": "Expected traffic?", "options": [...]}]}

// Architecture diagram ready
{"type": "complete", "architecture_diagram": {"nodes": [...], "connections": [...]}}

// File generated
{"type": "file", "path": "main.tf", "content": "...", "language": "hcl"}

// Resource status during deploy
{"type": "node_status", "nodeId": "lambda-1", "status": "live"}
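
A client consuming these events might look like the sketch below. It assumes the stream uses standard `data:` frames separated by blank lines, each carrying one JSON payload; the function name `iter_events` and the sample stream are illustrative, not part of the CloudForge codebase.

```python
import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE frame (a frame ends at a blank line)."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE allows a single optional space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_parts:
            yield json.loads("\n".join(data_parts))
            data_parts = []


# Sample stream built from two of the event shapes listed above.
stream = [
    'data: {"type": "constraint", "chip": {"label": "99.9% uptime", "category": "availability"}}\n',
    "\n",
    'data: {"type": "node_status", "nodeId": "lambda-1", "status": "live"}\n',
    "\n",
]
events = list(iter_events(stream))
live_nodes = [e["nodeId"] for e in events if e["type"] == "node_status" and e["status"] == "live"]
```

Dispatching on the `type` field keeps the client simple: each stage of the pipeline adds event types without changing the transport.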

Key Decisions

Decision	Rationale	Tradeoff
Knowledge graph for architecture	Eliminates LLM hallucinations on infrastructure decisions; graph traversal is deterministic and auditable	Graph must be kept current as AWS services evolve
LangGraph with `interrupt()`	Native human-in-the-loop without custom state machines; resumes across HTTP requests via MemorySaver checkpoint	Requires thread_id per session; adds checkpoint storage overhead
Kuzu for graph storage	Embeddable property graph DB, no separate server; supports Cypher queries	Less ecosystem tooling than Neo4j
STS AssumeRole, no stored keys	Eliminates a whole class of credential-leak risk	Requires users to pre-configure a trust policy in their AWS account
Fernet for credential storage	AES-128-CBC with HMAC-SHA256 authentication; well-audited and Python-native; keys can be rotated	Symmetric: key compromise exposes all stored credentials
Multi-subgraph Agent 3	`tf_validation_loop` and `code_generation_loop` are isolated subgraphs with their own state; cleaner retry semantics	Requires explicit state mapping at subgraph boundaries
Ollama for Agent 1 & 3	Keeps PRD refinement and code generation free (no API cost)	Cold-start latency on first token; quality ceiling below Claude
Provider factory pattern	Topology graph is cloud-agnostic; only the render layer knows the target provider	Currently only the AWS renderer is fully implemented
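
The trust policy mentioned in the STS AssumeRole row is a standard IAM document. The sketch below builds one in Python to show its general shape. The account ID and external ID are placeholders, and whether CloudForge requires an external ID condition is an assumption, not something the project states.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy letting one AWS account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Assumption: an external ID condition guards against
                # confused-deputy access; the source does not say it is used.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_trust_policy("123456789012", "example-external-id")
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON would be attached to the IAM role as its trust relationship; the deploying service would then request temporary credentials for that role through STS.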

Innovation / Notable Work

Graph-grounded architecture reasoning. Rather than prompting an LLM to "design an architecture," CloudForge retrieves patterns from a structured knowledge graph using semantic similarity on the extracted NFR set, then validates compatibility via graph traversal. The LLM only runs as the final step to produce a human-readable explanation. This makes architecture decisions auditable and reproducible: given the same extracted NFR set and the same graph, the pipeline produces the same recommendation.
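
The retrieve-then-validate flow can be illustrated with a toy example. Everything here is a stand-in: the three-dimensional vectors replace real sentence embeddings, the pattern names and compatibility edges are invented, and the in-memory dictionaries replace the Kuzu graph. The point is only that both steps are deterministic functions of their inputs.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy 3-dimensional "embeddings"; a real system would use a sentence-embedding model.
PATTERNS = {
    "multi_az_database": [0.9, 0.1, 0.0],
    "cdn_edge_cache": [0.1, 0.9, 0.1],
    "encrypted_storage": [0.0, 0.2, 0.9],
}

# Hypothetical compatibility edges between patterns (treated as undirected).
COMPATIBLE = {("multi_az_database", "encrypted_storage")}


def retrieve(query: list[float], k: int = 2) -> list[str]:
    """Return the k pattern names most similar to the query vector."""
    ranked = sorted(PATTERNS, key=lambda name: cosine(query, PATTERNS[name]), reverse=True)
    return ranked[:k]


def all_compatible(names: list[str]) -> bool:
    """Check that every pair of retrieved patterns is linked in the graph."""
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return all((a, b) in COMPATIBLE or (b, a) in COMPATIBLE for a, b in pairs)


nfr_vector = [0.8, 0.0, 0.6]  # stands in for an embedded NFR set
matches = retrieve(nfr_vector)
valid = all_compatible(matches)
```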

Deterministic rule engine as a hard gate. arch_rules.py implements SPOF detection (single stateful nodes with multiple compute inputs), BFS-based cascade failure propagation, and per-hop latency budget analysis using a lookup table of ~30 AWS service categories. These checks run before any human review and can trigger an automatic architecture retry loop. The engine is purely algorithmic — no LLM involved.
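
A simplified sketch of the three checks follows. It is not the code in arch_rules.py: the topology, the node classifications, the latency figures, and the 100 ms budget are all invented for illustration, and the SPOF rule is reduced to "a stateful node that more than one compute node depends on".

```python
from collections import deque

# Hypothetical topology: edges point from caller to dependency.
EDGES = {
    "api_gateway": ["lambda_a", "lambda_b"],
    "lambda_a": ["rds_primary"],
    "lambda_b": ["rds_primary"],
    "rds_primary": [],
}
STATEFUL = {"rds_primary"}
COMPUTE = {"lambda_a", "lambda_b"}
# Illustrative per-hop latencies in milliseconds, not measured values.
LATENCY_MS = {"api_gateway": 10, "lambda_a": 50, "lambda_b": 50, "rds_primary": 5}


def find_spofs() -> list[str]:
    """Stateful nodes that more than one compute node depends on."""
    spofs = []
    for node in sorted(STATEFUL):
        callers = [src for src, deps in EDGES.items() if node in deps and src in COMPUTE]
        if len(callers) > 1:
            spofs.append(node)
    return spofs


def cascade(failed: str) -> set[str]:
    """BFS over reversed edges: every node that transitively depends on `failed`."""
    reverse = {n: [s for s, deps in EDGES.items() if n in deps] for n in EDGES}
    seen: set[str] = set()
    queue = deque([failed])
    while queue:
        for caller in reverse[queue.popleft()]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


def path_latency(path: list[str]) -> int:
    """Sum the per-hop latency along one request path."""
    return sum(LATENCY_MS[node] for node in path)


spofs = find_spofs()
blast_radius = cascade("rds_primary")
within_budget = path_latency(["api_gateway", "lambda_a", "rds_primary"]) <= 100
```

Because each check is a pure function over the topology, the same graph always yields the same violations, which is what allows the result to gate the pipeline.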

Self-healing IaC generation. Agent 3 runs generated Terraform through a four-stage validation pipeline and feeds failures back to the LLM with structured error context for targeted fixes. The retry loop runs up to three times per file before escalating, making the generation process robust to common HCL mistakes without human intervention.
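
The bounded retry loop can be sketched independently of any particular validator or model. In this minimal example the `validate` and `fix` callables are toy stand-ins for the validation pipeline and the LLM, and `generate_with_fixes` is a hypothetical name; only the retry limit of three comes from the description above.

```python
from typing import Callable

MAX_RETRIES = 3  # the "up to three times per file" limit


def generate_with_fixes(
    source: str,
    validate: Callable[[str], list[str]],
    fix: Callable[[str, list[str]], str],
) -> tuple[str, int, bool]:
    """Validate, then apply up to MAX_RETRIES fixes. Returns (source, retries, ok)."""
    errors = validate(source)
    retries = 0
    while errors and retries < MAX_RETRIES:
        source = fix(source, errors)
        retries += 1
        errors = validate(source)
    return source, retries, not errors


# Toy stand-ins: the "validator" flags unbalanced braces and the "fixer"
# appends a closing brace. A real fixer would pass the errors to an LLM.
def toy_validate(src: str) -> list[str]:
    return [] if src.count("{") == src.count("}") else ["unbalanced braces"]


def toy_fix(src: str, errors: list[str]) -> str:
    return src + "}"


fixed, retries, ok = generate_with_fixes('resource "aws_s3_bucket" "b" {', toy_validate, toy_fix)
```

When `ok` comes back false after three attempts, the caller is the one that escalates; keeping that decision outside the loop makes the retry distribution easy to record.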

Zero-credential deploy model. CloudForge never asks for or stores AWS access keys. Deploy-time credentials are obtained exclusively via IAM role assumption (STS AssumeRole), and any intermediate credential material is discarded after the CloudFormation stack call returns. Credentials that must be stored (user-initiated deployments) are encrypted with Fernet and decrypted only at call time.

Potential Metrics to Track

Metric	Why It Matters
Architecture-to-deploy success rate	Measures end-to-end reliability of the full pipeline
Terraform validation pass rate (first attempt)	Indicator of code generation quality before fix retries
SPOF / violation detection rate per PRD	Validates usefulness of the deterministic rule engine
Mean time from PRD submission to live stack	Core product performance indicator
Fix loop retry distribution (0 / 1 / 2 / 3 retries)	Guides investment in LLM prompt quality vs. validation tooling

Quality

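Typed state everywhere: LangGraph state schemas use Pydantic models and TypedDict. Pydantic-backed schemas reject malformed state at runtime, and graph wiring errors (such as an edge to an undefined node) surface when the graph is compiled.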
Fail-fast server startup — FastAPI lifespan validates JWT_SECRET_KEY and FERNET_KEY before accepting any requests.
Structured logging — all agents use Python logging; deploy events are persisted to the deployments MongoDB collection.
IaC security scanning — Checkov runs the CIS AWS Benchmark against every generated Terraform file before deployment.
Rate limiting — all auth endpoints are protected by SlowAPI (5 requests/minute for registration).
Smoke tests — standalone_smoke_test.py exercises Agent 1 across multiple cloud providers and option-selection modes without requiring a running server.

Roadmap

GCP and Azure renderers — the provider factory is in place; completing the GCP and Azure Terraform renderers would make the pipeline fully multi-cloud.
Persistent graph updates — allow the architecture knowledge graph to be extended with org-specific patterns and past deployment outcomes.
Cost estimation integration — the cost_fetchers/ module has AWS, GCP, and Azure stubs; wiring real pricing API data into the architecture review step would surface cost tradeoffs before provisioning.
Agent 2 streaming — architecture planning currently resolves in one shot; streaming sub-agent progress (service discovery → simulation → compliance → test) would improve perceived responsiveness.
Drift detection — compare the generated Terraform state with the actual CloudFormation stack on subsequent sessions and surface configuration drift.

About

CloudForge was built at a hackathon to address a real friction point: generating cloud infrastructure requires deep expertise, but the decisions are largely pattern-matching against known constraints. By encoding those patterns in a graph and running deterministic validation before any LLM call, CloudForge makes infrastructure generation auditable rather than probabilistic — closer to a compiler than a chatbot.
