ak-asu/cloudforge


CloudForge

From product requirements to live AWS infrastructure — graph-validated, Terraform-provisioned, GitHub-committed.

Python FastAPI Next.js LangGraph Terraform MongoDB

Built by Aakash Khepar, Bhavya Nimesh Shah, Aditya Jindal, and Gunbhir · Hackathon project, shipped end-to-end.


Overview

CloudForge turns a plain-English product requirements document (PRD) into a deployed AWS stack. It does not ask an LLM to invent an architecture. Instead, a knowledge graph of validated cloud patterns drives every decision — the LLM is used last, only to explain what the graph already derived.

The full pipeline runs in four stages, each streamed live to the browser:

  1. PRD Refinement — an AI agent extracts non-functional requirements (NFRs), asks targeted clarifying questions (traffic tiers, SLAs, compliance), and produces a structured requirements JSON.
  2. Architecture Planning — seven specialised sub-agents traverse a graph of cloud patterns, simulate load, detect failure modes, map compliance, and run deterministic architecture tests. A human-in-the-loop gate lets you review before proceeding.
  3. Code & Terraform Generation — a multi-subgraph LangGraph pipeline generates main.tf, variables.tf, outputs.tf, and application code per service, validates everything through TFLint + Checkov + terraform validate, and auto-fixes errors (up to 3 retries).
  4. Deploy — CloudFormation provisions your real AWS resources using temporary STS credentials. Generated code is committed directly to your GitHub repository.

Preview

[Screenshots: Hero; PRD Refinement; Architecture Planning; Code & Terraform Generation; Live Deploy]

Highlights

  • Graph-grounded architecture — NFRs are embedded, matched against a validated pattern graph via semantic retrieval, then verified by graph traversal. No architecture is LLM-invented.
  • Deterministic rule engine — arch_rules.py runs SPOF detection, BFS cascade-risk analysis, and per-hop latency budgeting using a lookup table of 30+ AWS services. These checks are not probabilistic.
  • Zero persistent cloud credentials — IAM role assumption (STS AssumeRole) at deploy time; any stored credentials are Fernet-encrypted (AES-128-CBC) and discarded after use.
  • Full-pipeline SSE streaming — every agent stage — PRD extraction, architecture reasoning, file-by-file code generation, live deploy logs — streams to the browser via Server-Sent Events.
  • GitHub-native delivery — generated scaffolds are committed under the user's own identity via GitHub App OAuth. No intermediary cloud storage.
  • Provider-agnostic IaC — an intermediate topology graph is rendered through a provider factory (providers/factory.py); AWS is live today, GCP and Azure are architected in.

Use Cases

| Use Case | User | Outcome |
|---|---|---|
| Prototype a new SaaS backend | Solo developer | Architecture + Terraform + app scaffold in one session |
| Validate a PRD before sprint planning | Product / engineering team | Graph-tested architecture diagram with SPOF and compliance gaps surfaced |
| Generate IaC from an existing design | Platform / infra engineer | Validated Terraform HCL with Checkov CIS-AWS coverage |
| Explore multi-cloud cost tradeoffs | Architect | Provider-agnostic topology, cost estimates per service |

Features

PRD Refinement (Agent 1)

  • Extracts non-functional requirements via local Ollama (Qwen)
  • Optional web research via TinyFish with DuckDuckGo fallback
  • Multi-choice clarification (2–4 options per question + freeform) modelled after GitHub Copilot planning mode
  • Up to 6 clarification rounds; outputs structured requirements JSON

Architecture Planning (Agent 2)

  • Service discovery — maps NFRs to concrete AWS services, optionally via Terraform MCP
  • Load simulation — estimates traffic tiers, scales components accordingly
  • Resilience simulation — per-failure-mode blast radius analysis
  • Compliance mapping — SOC 2, HIPAA, ISO 27001 gap detection
  • Deterministic tests — SPOF, cascade risk (BFS), latency budget; CRITICAL violations trigger an automatic architecture retry (up to 3 times)
  • Human-in-the-loop — LangGraph interrupt() pauses the graph for your review; accept or request changes
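As an illustration of the latency-budget test listed above, here is a minimal sketch of a per-hop budget check. The table values, the `DEFAULT_HOP_MS` fallback, and the function names are assumptions made for this example; the actual lookup table and rule code in arch_rules.py are not shown in this README.

```python
# Hypothetical per-hop latency estimates in milliseconds (illustrative values only).
HOP_LATENCY_MS = {
    "api_gateway": 30,
    "lambda": 50,
    "dynamodb": 10,
    "s3": 100,
}
DEFAULT_HOP_MS = 25  # assumed fallback for services missing from the table


def path_latency_ms(path: list[str]) -> int:
    """Sum the estimated latency of every hop on a request path."""
    return sum(HOP_LATENCY_MS.get(service, DEFAULT_HOP_MS) for service in path)


def check_latency_budget(path: list[str], budget_ms: int) -> dict:
    """Return a record marking the path CRITICAL if it exceeds the budget."""
    total = path_latency_ms(path)
    return {
        "path": path,
        "total_ms": total,
        "budget_ms": budget_ms,
        "severity": "CRITICAL" if total > budget_ms else "OK",
    }


result = check_latency_budget(["api_gateway", "lambda", "dynamodb"], budget_ms=80)
print(result["total_ms"], result["severity"])  # 90 CRITICAL
```

Because the check is a plain sum over a fixed table, the same topology always yields the same verdict, which is what allows a CRITICAL result to trigger an automatic retry without another model call.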

Code & Terraform Generation (Agent 3)

  • Generates main.tf, variables.tf, outputs.tf per service
  • Generates Python / TypeScript application code per service
  • Validation subgraph: terraform fmt → terraform validate → TFLint → Checkov (CIS AWS Benchmark)
  • LLM-driven fix loop — up to 3 retries per file before escalating
  • Test generation subgraph — produces unit tests per generated module

Deployment

  • Renders topology graph to provider-specific CloudFormation / Terraform
  • Temporary credentials via STS AssumeRole — no stored AWS access keys
  • Per-resource live status events over SSE (queued → provisioning → live)
  • One-click rollback via CloudFormation stack deletion
  • Generated code committed to user's GitHub repo under their identity

Security

  • JWT access tokens (30 min) + refresh tokens (7 days)
  • bcrypt password hashing; Fernet encryption for stored credentials
  • SlowAPI rate limiting on all auth endpoints
  • Server refuses to start if JWT_SECRET_KEY is the default placeholder

Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Frontend framework | Next.js 15 (App Router), React 19, TypeScript | Full-stack web app |
| Canvas | @xyflow/react v12 | Interactive architecture diagram |
| State | Zustand v5 + persist middleware | Client-side session state |
| Animations | Framer Motion v11 | Stage transitions, live status glows |
| Styling | Tailwind CSS v4 | Utility-first design system |
| Backend framework | FastAPI 0.135+, Python 3.12+ | Async API + SSE streaming |
| Agent framework | LangGraph 1.1+ | Multi-node stateful agent graphs with interrupt() |
| LLM — local | Ollama (Qwen 3.5) | PRD refinement, code generation |
| LLM — cloud | Claude Sonnet via langchain-anthropic | Architecture planning |
| Graph database | Kuzu + sentence-transformers | Pattern storage, semantic retrieval |
| Database | MongoDB (Motor async) | Session, project, build, deployment persistence |
| IaC validation | TFLint, Checkov, terraform validate | HCL correctness + security scanning |
| Cloud SDK | Boto3, STS | CloudFormation provisioning, AssumeRole |
| Auth | PyJWT, bcrypt, Fernet (cryptography) | Token auth + credential encryption |
| Rate limiting | SlowAPI | Auth endpoint protection |

Architecture

flowchart TD
    PRD[User PRD] --> A1

    subgraph A1["Agent 1 — PRD Refinement (LangGraph)"]
        direction LR
        ui[user_input] --> res[research]
        res --> ws[web_search]
        ws --> res
        res --> ig{info_gate}
        ig -->|needs input| aw[await_user interrupt]
        aw --> res
        ig -->|sufficient| pl[plan]
        pl --> acc[acceptance]
    end

    A1 -->|structured requirements JSON| A2

    subgraph A2["Agent 2 — Architecture Planner (LangGraph)"]
        direction LR
        ar[architecture] --> sd[service_discovery]
        sd --> as[arch_simulator]
        as --> rs[resilience_simulator]
        rs --> cp[compliance]
        cp --> at{arch_test}
        at -->|CRITICAL + retry < 3| ar
        at -->|pass| present[present interrupt]
        present -->|rejected| ar
        present -->|accepted| A2_END([END])
    end

    A2 --> A3

    subgraph A3["Agent 3 — Code & Terraform Generator (LangGraph)"]
        direction LR
        pi[parse_input] --> tg[tf_generator]
        tg --> tv["tf_validation_loop\n(fmt→validate→TFLint→Checkov→fix)"]
        tv --> orc[orchestrator]
        orc --> asm[assembler]
    end

    A3 --> DEP

    subgraph DEP["Deploy"]
        cfn[CloudFormation provisioning]
        sts[STS AssumeRole]
        gh[GitHub App commit]
        sse[SSE live events]
        cfn --- sts
        cfn --- gh
        cfn --- sse
    end

How It Works

  1. Submit your PRD — paste a product requirements document into the editor. Agent 1 reads it, optionally searches the web for context, and extracts non-functional requirements (uptime, compliance, latency, traffic).

  2. Answer clarifying questions — Agent 1 asks 2–6 targeted questions with predefined options and a freeform "Custom" option. Your answers are folded back into the next research iteration until the agent has enough signal.

  3. Architecture is graph-derived — Agent 2 embeds the NFR set, retrieves matching patterns from a Kuzu knowledge graph, and traverses the graph to validate compatibility. Deterministic rules check for SPOFs, cascade blast radius, and latency budget. CRITICAL violations auto-retry the architecture loop.

  4. Human review gate — LangGraph's interrupt() API pauses the pipeline and presents the architecture diagram with a summary of violations and component rationale. Accept it or request changes in plain English.

  5. Terraform and code are generated and validated — Agent 3 generates HCL and application code per service, then runs terraform validate, TFLint, and Checkov in a validation subgraph. Errors feed an LLM fix loop (up to 3 retries per file).

  6. Deploy to your AWS account — temporary STS credentials are obtained at deploy time via IAM role assumption. CloudFormation provisions each resource and streams status events live. Generated code is committed to your GitHub repository under your identity.


Setup

Prerequisites

  • Python 3.12+, uv package manager
  • Node.js 20+
  • MongoDB (local or Atlas)
  • Ollama running locally
  • AWS account with an IAM role configured for AssumeRole
  • Anthropic API key (for Agent 2)

Backend

cd backend

# Install dependencies
uv sync

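# Start Ollama, then pull the local LLM model.
# `ollama pull` needs a running server, so run `ollama serve` first
# (in a separate terminal, unless Ollama already runs as a background service).
ollama serve
ollama pull qwen3.5:latest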

# Copy and fill environment variables
cp .env.sample .env
# Required: JWT_SECRET_KEY, FERNET_KEY, ANTHROPIC_API_KEY, MONGODB_URL

# Generate keys
python -c "import secrets; print(secrets.token_hex(32))"          # → JWT_SECRET_KEY
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"  # → FERNET_KEY

# Start the API
uv run uvicorn app.main:app --reload --port 8000

Frontend

cd frontend
npm install
npm run dev      # → http://localhost:3000

Environment Variables

| Variable | Required | Description |
|---|---|---|
| JWT_SECRET_KEY | Yes | 64-char hex secret. Server refuses to start without it. |
| FERNET_KEY | Yes | Fernet key for encrypting stored cloud credentials |
| ANTHROPIC_API_KEY | Yes | Claude Sonnet API key (Agent 2) |
| MONGODB_URL | Yes | MongoDB connection string |
| OLLAMA_BASE_URL | Yes | Ollama endpoint (default: http://localhost:11434) |
| QWEN_MODEL | No | Local model for Agent 1/3 (default: qwen3.5:latest) |
| GITHUB_CLIENT_ID | No | GitHub OAuth App client ID |
| GITHUB_CLIENT_SECRET | No | GitHub OAuth App client secret |
| ENABLE_WEB_SEARCH | No | Enable TinyFish/DuckDuckGo research in Agent 1 |

Run Tests

cd backend

# Standalone smoke test (Agent 1, includes multi-choice option handling)
uv run python -m app.agents.agent1.standalone_smoke_test

# pytest
uv run pytest

Usage

API — Start a PRD workflow (SSE stream)

# 1. Register and log in
curl -X POST http://localhost:8000/auth/register \
  -H "Content-Type: application/json" \
  -d '{"username": "alice", "password": "secret123"}'

TOKEN=$(curl -s -X POST http://localhost:8000/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "alice", "password": "secret123"}' | jq -r .access_token)

# 2. Create a project
PROJECT_ID=$(curl -s -X POST http://localhost:8000/projects/ \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-api"}' | jq -r .id)

# 3. Start PRD refinement (SSE stream)
curl -N -X POST "http://localhost:8000/workflows/prd/v2/start/$PROJECT_ID" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prd_text": "Build a SaaS image API on AWS with multi-tenant auth, SOC 2, 99.9% uptime."}'

SSE event types

// NFR constraint extracted
{"type": "constraint", "chip": {"label": "99.9% uptime", "category": "availability"}}

// Clarification questions (with options)
{"type": "questions", "questions_with_options": [{"question": "Expected traffic?", "options": [...]}]}

// Architecture diagram ready
{"type": "complete", "architecture_diagram": {"nodes": [...], "connections": [...]}}

// File generated
{"type": "file", "path": "main.tf", "content": "...", "language": "hcl"}

// Resource status during deploy
{"type": "node_status", "nodeId": "lambda-1", "status": "live"}

Key Decisions

| Decision | Rationale | Tradeoff |
|---|---|---|
| Knowledge graph for architecture | Eliminates LLM hallucinations on infrastructure decisions; graph traversal is deterministic and auditable | Graph must be kept current as AWS services evolve |
| LangGraph with interrupt() | Native human-in-the-loop without custom state machines; resumes across HTTP requests via MemorySaver checkpoint | Requires thread_id per session; adds checkpoint storage overhead |
| Kuzu for graph storage | Embeddable property graph DB, no separate server; supports Cypher queries | Less ecosystem tooling than Neo4j |
| STS AssumeRole, no stored keys | Eliminates a whole class of credential-leak risk | Requires users to pre-configure a trust policy in their AWS account |
| Fernet for credential storage | AES-128-CBC, well-audited Python-native; keys can be rotated | Symmetric — key compromise exposes all stored credentials |
| Multi-subgraph Agent 3 | tf_validation_loop and code_generation_loop are isolated subgraphs with their own state; cleaner retry semantics | Requires explicit state mapping at subgraph boundaries |
| Ollama for Agent 1 & 3 | Keeps PRD refinement and code generation free (no API cost) | Cold-start latency on first token; quality ceiling below Claude |
| Provider factory pattern | Topology graph is cloud-agnostic; only the render layer knows the target provider | Currently only the AWS renderer is fully implemented |

Innovation / Notable Work

Graph-grounded architecture reasoning. Rather than prompting an LLM to "design an architecture," CloudForge retrieves patterns from a structured knowledge graph using semantic similarity on the extracted NFR set, then validates compatibility via graph traversal. The LLM only runs as the final step to produce a human-readable explanation. This makes architecture decisions auditable and reproducible — re-running the same PRD with the same graph produces the same recommendation.
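The retrieval step can be pictured as a nearest-neighbour search over pattern embeddings. The sketch below uses hand-written three-dimensional vectors and pure-Python cosine similarity as stand-ins; the pattern names, the vector axes, and `top_patterns` are hypothetical, and the project itself is described as using sentence-transformers embeddings with a Kuzu graph.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_patterns(query: list[float], patterns: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k most similar pattern names, breaking ties by name."""
    ranked = sorted(patterns, key=lambda name: (-cosine(query, patterns[name]), name))
    return ranked[:k]


# Toy "embeddings" on hypothetical axes: availability, compliance, throughput.
PATTERNS = {
    "multi_az_rds": [0.9, 0.3, 0.1],
    "serverless_api": [0.2, 0.1, 0.9],
    "encrypted_audit_log": [0.1, 0.9, 0.1],
}
nfr_vector = [0.8, 0.5, 0.1]
print(top_patterns(nfr_vector, PATTERNS))  # ['multi_az_rds', 'encrypted_audit_log']
```

Sorting on the pair (negative similarity, name) makes the ranking fully deterministic even when two patterns score the same, which is one simple way to get the reproducibility property claimed above.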

Deterministic rule engine as a hard gate. arch_rules.py implements SPOF detection (single stateful nodes with multiple compute inputs), BFS-based cascade failure propagation, and per-hop latency budget analysis using a lookup table of ~30 AWS service categories. These checks run before any human review and can trigger an automatic architecture retry loop. The engine is purely algorithmic — no LLM involved.
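The two graph checks named here can be sketched on a toy topology. The node names, the `kind` and `replicas` fields, and both function names are assumptions for this example; it follows the SPOF definition given above (a single stateful node with multiple compute inputs) but is not the code in arch_rules.py.

```python
from collections import deque

# Toy topology: edges point from caller to dependency.
NODES = {
    "api": {"kind": "compute", "replicas": 2},
    "worker": {"kind": "compute", "replicas": 2},
    "db": {"kind": "stateful", "replicas": 1},
    "cache": {"kind": "stateful", "replicas": 2},
}
EDGES = [("api", "db"), ("worker", "db"), ("api", "cache")]


def find_spofs(nodes: dict, edges: list[tuple[str, str]]) -> list[str]:
    """Single-replica stateful nodes that more than one compute node depends on."""
    spofs = []
    for name, meta in nodes.items():
        if meta["kind"] != "stateful" or meta["replicas"] > 1:
            continue
        callers = {src for src, dst in edges if dst == name and nodes[src]["kind"] == "compute"}
        if len(callers) > 1:
            spofs.append(name)
    return sorted(spofs)


def cascade(failed: str, edges: list[tuple[str, str]]) -> list[str]:
    """BFS over reversed edges: every node that transitively depends on `failed`."""
    dependents: dict[str, list[str]] = {}
    for src, dst in edges:
        dependents.setdefault(dst, []).append(src)
    seen, queue = {failed}, deque([failed])
    while queue:
        for caller in dependents.get(queue.popleft(), []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return sorted(seen - {failed})


print(find_spofs(NODES, EDGES))  # ['db']
print(cascade("db", EDGES))      # ['api', 'worker']
```

Here `db` is flagged because it has one replica and two compute callers, while `cache` is not because it is replicated; the BFS then reports which services would be affected if `db` failed.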

Self-healing IaC generation. Agent 3 runs generated Terraform through a four-stage validation pipeline and feeds failures back to the LLM with structured error context for targeted fixes. The retry loop runs up to three times per file before escalating, making the generation process robust to common HCL mistakes without human intervention.

Zero-credential deploy model. CloudForge never asks for or stores AWS access keys. Deploy-time credentials are obtained exclusively via IAM role assumption (STS AssumeRole), and any intermediate credential material is discarded after the CloudFormation stack call returns. Credentials that must be stored (user-initiated deployments) are encrypted with Fernet and decrypted only at call time.


Potential Metrics to Track

| Metric | Why It Matters |
|---|---|
| Architecture-to-deploy success rate | Measures end-to-end reliability of the full pipeline |
| Terraform validation pass rate (first attempt) | Indicator of code generation quality before fix retries |
| SPOF / violation detection rate per PRD | Validates usefulness of the deterministic rule engine |
| Mean time from PRD submission to live stack | Core product performance indicator |
| Fix loop retry distribution (0 / 1 / 2 / 3 retries) | Guides investment in LLM prompt quality vs. validation tooling |

Quality

  • Typed state everywhere — LangGraph state schemas use Pydantic models and TypedDict, so malformed state is caught early, by static type checkers (TypedDict) or by Pydantic validation at runtime.
  • Fail-fast server startup — FastAPI lifespan validates JWT_SECRET_KEY and FERNET_KEY before accepting any requests.
  • Structured logging — all agents use Python logging; deploy events are persisted to the deployments MongoDB collection.
  • IaC security scanning — Checkov runs the CIS AWS Benchmark against every generated Terraform file before deployment.
  • Rate limiting — all auth endpoints are protected by SlowAPI (5 requests/minute for registration).
  • Smoke tests — standalone_smoke_test.py exercises Agent 1 across multiple cloud providers and option-selection modes without requiring a running server.

Roadmap

  • GCP and Azure renderers — the provider factory is in place; completing the GCP and Azure Terraform renderers would make the pipeline fully multi-cloud.
  • Persistent graph updates — allow the architecture knowledge graph to be extended with org-specific patterns and past deployment outcomes.
  • Cost estimation integration — the cost_fetchers/ module has AWS, GCP, and Azure stubs; wiring real pricing API data into the architecture review step would surface cost tradeoffs before provisioning.
  • Agent 2 streaming — architecture planning currently resolves in one shot; streaming sub-agent progress (service discovery → simulation → compliance → test) would improve perceived responsiveness.
  • Drift detection — compare the generated Terraform state with the actual CloudFormation stack on subsequent sessions and surface configuration drift.

About

CloudForge was built at a hackathon to address a real friction point: generating cloud infrastructure requires deep expertise, but the decisions are largely pattern-matching against known constraints. By encoding those patterns in a graph and running deterministic validation before any LLM call, CloudForge makes infrastructure generation auditable rather than probabilistic — closer to a compiler than a chatbot.
