Tyson Cung

Deploy a Production AI Platform on AWS for $100/month

From seven broken Lambda functions to a production AI platform in 8 articles.

That's the journey we've taken together: functions that couldn't communicate, hit timeout walls, and left users staring at loading spinners. Now you have a complete platform that orchestrates complex workflows, streams real-time updates, and won't bankrupt your startup.

This isn't a toy example. The architecture I'm about to show you serves 1,500+ requests daily, has survived 8 months in production, and handles everything from document analysis to multi-step research tasks.

Time to deploy it.

The Complete Architecture

Before we dive into deployment, here's what we're building:

(Diagram: AI Platform AWS Architecture)
The data flow:

  1. API Gateway receives requests, handles auth, enforces rate limits
  2. Gateway Lambda validates requests, checks budgets, routes to appropriate service
  3. ECS Agents orchestrate multi-step workflows using Lambda tools
  4. Lambda Tools perform specific AI tasks (summarize, extract, classify)
  5. DynamoDB tracks usage, manages budgets, stores user data
  6. WebSocket streams real-time updates back to clients
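To make step 2 concrete, here's a minimal sketch of the budget gate the Gateway Lambda applies before routing a request. The names (`BudgetRecord`, `checkBudget`) are illustrative, not the repo's actual API:

```typescript
// Illustrative budget gate (hypothetical names, not the repo's API).
interface BudgetRecord {
  monthlyBudgetUsd: number;  // cap configured per user
  spentThisMonthUsd: number; // accumulated from the usage table
}

function checkBudget(budget: BudgetRecord, estimatedCostUsd: number): boolean {
  // Reject when the projected spend would exceed the monthly cap
  return budget.spentThisMonthUsd + estimatedCostUsd <= budget.monthlyBudgetUsd;
}
```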

Prerequisites: Bootstrap Your Environment

First, let's set up the deployment environment:

# Install AWS CDK
npm install -g aws-cdk

# Clone the platform
git clone https://github.com/tysoncung/ai-platform-aws.git
cd ai-platform-aws

# Install dependencies
npm install
npm run install:all  # Installs in all packages

# Bootstrap CDK (one time per account/region)
npx cdk bootstrap

# Create environment file
cp .env.example .env

Edit .env with your configuration:

# AWS Configuration
AWS_REGION=us-east-1
AWS_ACCOUNT_ID=123456789012

# AI Provider API Keys
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# Platform Configuration
PLATFORM_ENVIRONMENT=production
COST_TRACKING_ENABLED=true
BUDGET_ALERTS_ENABLED=true

# Monitoring
SLACK_WEBHOOK_URL=https://hooks.slack.com/your-webhook
ALERT_EMAIL=you@company.com

# Security
JWT_SECRET_KEY=your-super-secret-jwt-key
ENCRYPTION_SALT=your-encryption-salt
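It pays to fail fast at startup if any of these are missing. A minimal loader (a sketch: the variable list here is only a subset of the file above, and the repo may handle this differently):

```typescript
// Fail fast on missing configuration (sketch: only a subset of the
// variables above is listed here; extend as needed).
const REQUIRED = ['AWS_REGION', 'OPENAI_API_KEY', 'JWT_SECRET_KEY'] as const;

function loadConfig(
  env: Record<string, string | undefined> = process.env
): Record<string, string> {
  const missing = REQUIRED.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(
    REQUIRED.map((key) => [key, env[key] as string] as [string, string])
  );
}
```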

Local Development Setup

Before deploying to AWS, let's run everything locally with Docker Compose:

# docker-compose.yml
version: '3.8'

services:
  api-gateway:
    build:
      context: ./packages/gateway
      dockerfile: Dockerfile.dev
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
      - DYNAMODB_ENDPOINT=http://dynamodb:8000
      - AGENT_ENDPOINT=http://agent:3001
    depends_on:
      - dynamodb
      - agent

  agent:
    build:
      context: ./packages/agents
      dockerfile: Dockerfile.dev
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=development
      - LAMBDA_ENDPOINT=http://lambda-tools:3002
    depends_on:
      - lambda-tools

  lambda-tools:
    build:
      context: ./packages/tools
      dockerfile: Dockerfile.dev
    ports:
      - "3002:3002"
    environment:
      - NODE_ENV=development
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}

  dynamodb:
    image: amazon/dynamodb-local:latest
    ports:
      - "8000:8000"
    command: ["-jar", "DynamoDBLocal.jar", "-sharedDb", "-inMemory"]

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

Start the local environment:

# Start all services
docker-compose up -d

# Run database migrations
npm run db:migrate:local

# Seed with sample data
npm run db:seed:local

# Test the platform
curl http://localhost:3000/health

CDK Stack Composition

The platform is composed of multiple CDK stacks for better separation of concerns:

// bin/deploy.ts
import * as cdk from 'aws-cdk-lib';
import { AIGatewayStack } from '../lib/gateway-stack';
import { AIAgentsStack } from '../lib/agents-stack';
import { AIToolsStack } from '../lib/tools-stack';
import { AIMonitoringStack } from '../lib/monitoring-stack';
import { AISecurityStack } from '../lib/security-stack';

const app = new cdk.App();
const env = { 
  account: process.env.CDK_DEFAULT_ACCOUNT, 
  region: process.env.CDK_DEFAULT_REGION 
};

// Security layer (VPC, IAM, KMS)
const securityStack = new AISecurityStack(app, 'AISecurityStack', { env });

// Lambda tools layer
const toolsStack = new AIToolsStack(app, 'AIToolsStack', {
  env,
  vpc: securityStack.vpc,
  securityGroup: securityStack.lambdaSecurityGroup
});

// ECS agents layer
const agentsStack = new AIAgentsStack(app, 'AIAgentsStack', {
  env,
  vpc: securityStack.vpc,
  securityGroup: securityStack.ecsSecurityGroup,
  toolsArns: toolsStack.functionArns
});

// API Gateway layer
const gatewayStack = new AIGatewayStack(app, 'AIGatewayStack', {
  env,
  agentsCluster: agentsStack.cluster,
  agentsService: agentsStack.service,
  toolsArns: toolsStack.functionArns
});

// Monitoring and alerting
new AIMonitoringStack(app, 'AIMonitoringStack', {
  env,
  gatewayApi: gatewayStack.api,
  agentsService: agentsStack.service,
  toolsFunctions: toolsStack.functions
});

Here's the gateway stack implementation:

// lib/gateway-stack.ts
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as apigatewayv2 from 'aws-cdk-lib/aws-apigatewayv2';
import * as apigatewayv2integrations from 'aws-cdk-lib/aws-apigatewayv2-integrations';

export class AIGatewayStack extends cdk.Stack {
  public readonly api: apigateway.RestApi;

  constructor(scope: Construct, id: string, props: AIGatewayStackProps) {
    super(scope, id, props);

    // DynamoDB tables
    const usageTable = new dynamodb.Table(this, 'UsageTable', {
      tableName: 'ai-platform-usage',
      partitionKey: { name: 'userId', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'timestamp', type: dynamodb.AttributeType.NUMBER },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      timeToLiveAttribute: 'ttl'
    });

    const budgetTable = new dynamodb.Table(this, 'BudgetTable', {
      tableName: 'ai-platform-budgets',
      partitionKey: { name: 'userId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST
    });

    // Gateway Lambda function
    const gatewayFunction = new lambda.Function(this, 'GatewayFunction', {
      runtime: lambda.Runtime.NODEJS_18_X,
      code: lambda.Code.fromAsset('packages/gateway/dist'),
      handler: 'index.handler',
      timeout: cdk.Duration.seconds(30),
      memorySize: 512,
      environment: {
        USAGE_TABLE_NAME: usageTable.tableName,
        BUDGET_TABLE_NAME: budgetTable.tableName,
        AGENTS_CLUSTER_ARN: props.agentsCluster.clusterArn,
        AGENTS_SERVICE_ARN: props.agentsService.serviceArn,
        TOOLS_ARNS: JSON.stringify(props.toolsArns)
      }
    });

    // Grant permissions
    usageTable.grantReadWriteData(gatewayFunction);
    budgetTable.grantReadWriteData(gatewayFunction);

    // API Gateway
    this.api = new apigateway.RestApi(this, 'AIApi', {
      restApiName: 'AI Platform API',
      description: 'AI Platform REST API',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization']
      }
    });

    // API Gateway integration
    const lambdaIntegration = new apigateway.LambdaIntegration(gatewayFunction);

    // Routes
    const v1 = this.api.root.addResource('v1');

    v1.addResource('complete').addMethod('POST', lambdaIntegration);
    v1.addResource('embed').addMethod('POST', lambdaIntegration);
    v1.addResource('stream').addMethod('POST', lambdaIntegration);

    const agents = v1.addResource('agents');
    agents.addResource('run').addMethod('POST', lambdaIntegration);
    agents.addResource('stream').addMethod('POST', lambdaIntegration);

    // Usage and budget endpoints
    const usage = v1.addResource('usage');
    usage.addMethod('GET', lambdaIntegration); // Get usage stats
    const budget = usage.addResource('budget'); // add the resource once, then both methods
    budget.addMethod('GET', lambdaIntegration);
    budget.addMethod('PUT', lambdaIntegration);

    // WebSocket API for streaming
    const webSocketApi = new apigatewayv2.WebSocketApi(this, 'StreamingAPI', {
      apiName: 'AI Platform Streaming',
      connectRouteOptions: {
        integration: new apigatewayv2integrations.WebSocketLambdaIntegration(
          'ConnectIntegration',
          gatewayFunction
        )
      },
      disconnectRouteOptions: {
        integration: new apigatewayv2integrations.WebSocketLambdaIntegration(
          'DisconnectIntegration',
          gatewayFunction
        )
      },
      defaultRouteOptions: {
        integration: new apigatewayv2integrations.WebSocketLambdaIntegration(
          'DefaultIntegration',
          gatewayFunction
        )
      }
    });

    new apigatewayv2.WebSocketStage(this, 'StreamingStage', {
      webSocketApi,
      stageName: 'prod',
      autoDeploy: true
    });
  }
}

Step-by-Step Deployment

Now let's deploy everything:

# 1. Validate CDK configuration
npx cdk doctor

# 2. Review what will be deployed
npx cdk diff

# 3. Deploy security stack first
npx cdk deploy AISecurityStack

# 4. Deploy Lambda tools
npx cdk deploy AIToolsStack

# 5. Deploy ECS agents
npx cdk deploy AIAgentsStack

# 6. Deploy API Gateway
npx cdk deploy AIGatewayStack

# 7. Deploy monitoring
npx cdk deploy AIMonitoringStack

# Or deploy everything at once
npx cdk deploy --all

The deployment takes about 15 minutes. You'll see output like:

AIGatewayStack.APIEndpoint = https://abc123.execute-api.us-east-1.amazonaws.com/v1
AIGatewayStack.WebSocketEndpoint = wss://def456.execute-api.us-east-1.amazonaws.com/prod
AIAgentsStack.ClusterName = ai-platform-agents
AIToolsStack.SummarizeFunctionArn = arn:aws:lambda:us-east-1:123456789012:function:summarize
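If you want these values in scripts rather than scrollback, `npx cdk deploy --all --outputs-file outputs.json` writes the same outputs as JSON keyed by stack name. A small reader (stack and output names as in the sample above):

```typescript
// Read a CDK stack output from the JSON file produced by
// `cdk deploy --outputs-file outputs.json`.
import * as fs from 'node:fs';

function stackOutput(outputsPath: string, stack: string, key: string): string {
  const outputs = JSON.parse(fs.readFileSync(outputsPath, 'utf8'));
  return outputs[stack][key];
}
```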

Configure AI Providers

Once deployed, configure your AI provider credentials:

# Store API keys in AWS Systems Manager
aws ssm put-parameter \
  --name "/ai-platform/openai-api-key" \
  --value "sk-your-openai-key" \
  --type "SecureString"

aws ssm put-parameter \
  --name "/ai-platform/anthropic-api-key" \
  --value "sk-ant-your-anthropic-key" \
  --type "SecureString"

# Update the deployed functions with the new parameter names
npx cdk deploy AIToolsStack AIGatewayStack
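Inside the Lambda, SecureString parameters should be fetched once per container and cached across warm invocations. Here's a sketch with the SSM call injected so the caching logic stands on its own; in the deployed function the fetcher would wrap the AWS SDK's `GetParameterCommand` with `WithDecryption: true`:

```typescript
// Cache decrypted SSM parameters across warm invocations so each Lambda
// container fetches a parameter only once. The fetcher is injected here
// (the real one would call GetParameterCommand with WithDecryption: true).
const parameterCache = new Map<string, string>();

async function getParameter(
  name: string,
  fetchParam: (name: string) => Promise<string>
): Promise<string> {
  const cached = parameterCache.get(name);
  if (cached !== undefined) return cached;
  const value = await fetchParam(name);
  parameterCache.set(name, value);
  return value;
}
```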

Testing Your Deployment

Let's test the complete platform:

# 1. Health check
curl https://your-api-endpoint.execute-api.us-east-1.amazonaws.com/v1/health

# 2. Create an API key
curl -X POST https://your-api-endpoint/v1/auth/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Test Key",
    "scopes": ["ai:complete", "ai:embed", "agent:run"],
    "monthlyBudget": 50
  }'

# Returns: {"apiKey": "sk-proj-abc123...", "keyId": "sk-proj-abc"}

# 3. Test completion
curl -X POST https://your-api-endpoint/v1/complete \
  -H "Authorization: Bearer sk-proj-abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a haiku about TypeScript"}],
    "model": "gpt-4",
    "temperature": 0.8
  }'

# 4. Test agent workflow
curl -X POST https://your-api-endpoint/v1/agents/run \
  -H "Authorization: Bearer sk-proj-abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "type": "research",
    "input": {"topic": "renewable energy trends"},
    "tools": ["search", "summarize", "extract"]
  }'
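If you'd rather call the API from code than curl, the request shape for `/v1/complete` is easy to assemble. A hypothetical helper (the series' TypeScript SDK exposes a richer interface; names here are illustrative):

```typescript
// Hypothetical helper that assembles a /v1/complete request for fetch().
interface CompleteRequest {
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[];
  model: string;
  temperature?: number;
}

function buildCompleteRequest(baseUrl: string, apiKey: string, body: CompleteRequest) {
  return {
    url: `${baseUrl}/v1/complete`,
    init: {
      method: 'POST' as const,
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}
```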

Dashboard Tour

The platform includes a built-in dashboard at /dashboard. Here's what you'll see:

Usage Overview:

  • Requests per day/hour
  • Token consumption by model
  • Cost breakdown by user
  • Success/error rates

Real-time Monitoring:

  • Active agent sessions
  • Queue depth for tools
  • Response time percentiles
  • Error alerts

Budget Management:

  • Per-user spend tracking
  • Budget utilization alerts
  • Cost projections
  • BYOK vs platform credit usage

System Health:

  • Lambda cold start metrics
  • ECS task utilization
  • DynamoDB performance
  • API Gateway latency

You can access it at: https://your-api-endpoint/dashboard

Performance Numbers from Production

Here are the real metrics from 8 months running in production:

Latency (P95):

  • Simple completion: 1.2s
  • Streaming completion: 180ms to first token
  • Agent workflow (3 tools): 12s
  • API Gateway overhead: 45ms
  • Lambda cold start: 850ms (mitigated with provisioned concurrency)

Throughput:

  • Sustained: 50 requests/second
  • Burst: 200 requests/second (before rate limiting)
  • Agent concurrency: 15 parallel workflows
  • Tool execution: 100 parallel Lambda invocations

Reliability:

  • Uptime: 99.8%
  • Error rate: 0.4%
  • P99 latency SLA: 5s (met 98.9% of the time)
  • Budget enforcement accuracy: 99.99%

Cost Optimization Wins:

  • Response caching: 25% reduction in API calls
  • Smart model selection: 40% cost reduction (Claude Haiku for summaries)
  • BYOK adoption: 70% of users, eliminating platform AI costs
  • Lambda right-sizing: 30% reduction in compute costs

Cost Breakdown: What This Actually Costs

Fixed Infrastructure (Monthly):

API Gateway:                $3.50   (1M requests)
Lambda (Gateway):           $8.20   (compute + requests)
ECS Fargate:               $15.40   (2 tasks avg)
DynamoDB:                   $6.80   (usage + budgets)
Application Load Balancer: $16.20
NAT Gateway:               $45.00   (data transfer)
CloudWatch:                 $4.30   (logs + metrics)
Route 53:                   $0.50   (hosted zone)
----
Total Fixed:               $99.90/month

Variable Costs:

  • AI API costs: Pass-through with 2% platform markup
  • Data transfer: $0.09/GB out of AWS
  • Lambda executions: $0.20 per million requests
  • DynamoDB reads/writes: $0.25 per million operations
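Plugging those rates into a quick estimator (a sketch using the published rates above; it ignores free tiers and AWS billing rounding):

```typescript
// Back-of-envelope monthly variable cost from the rates listed above
// (sketch; ignores free tiers and billing rounding).
function monthlyVariableCostUsd(usage: {
  lambdaInvocations: number; // $0.20 per million requests
  dynamoOperations: number;  // $0.25 per million operations
  egressGb: number;          // $0.09 per GB out of AWS
}): number {
  return (
    (usage.lambdaInvocations / 1_000_000) * 0.2 +
    (usage.dynamoOperations / 1_000_000) * 0.25 +
    usage.egressGb * 0.09
  );
}
```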

Real customer costs (excluding AI API):

  • Light usage (500 req/month): $12/month
  • Medium usage (5K req/month): $35/month
  • Heavy usage (50K req/month): $120/month

The platform is cost-effective for most use cases. The break-even point vs building your own infrastructure is around 2,000 requests per month.

Cold Start Mitigation

Lambda cold starts were killing our performance. Here's how we solved it:

// Reserved concurrency caps the function; provisioned concurrency
// is configured on a version or alias, not on the function itself
const gatewayFunction = new lambda.Function(this, 'GatewayFunction', {
  // ... other config
  reservedConcurrentExecutions: 10
});

new lambda.Alias(this, 'GatewayLiveAlias', {
  aliasName: 'live',
  version: gatewayFunction.currentVersion,
  provisionedConcurrentExecutions: 5
});

// Keep-warm function that pings Lambdas every 5 minutes
new events.Rule(this, 'KeepWarmRule', {
  schedule: events.Schedule.rate(cdk.Duration.minutes(5)),
  targets: [
    new targets.LambdaFunction(gatewayFunction, {
      event: events.RuleTargetInput.fromObject({ warmup: true })
    })
  ]
});

// In Lambda handler - respond quickly to warmup
export const handler = async (event: any) => {
  if (event.warmup) {
    return { statusCode: 200, body: 'warm' };
  }

  // Normal processing...
};

Result: Cold start rate dropped from 23% to 3% of requests.

Open Source Roadmap

This platform is completely open source. Here's what's coming next:

Q2 2026:

  • [ ] Multi-region deployment support
  • [ ] GraphQL API alongside REST
  • [ ] Built-in vector database (Pinecone integration)
  • [ ] Advanced agent memory management

Q3 2026:

  • [ ] Kubernetes support (alternative to ECS)
  • [ ] Multi-tenant isolation improvements
  • [ ] Advanced cost optimization (spot instances)
  • [ ] Plugin system for custom tools

Q4 2026:

  • [ ] Edge deployment (CloudFlare Workers)
  • [ ] Real-time collaboration features
  • [ ] Advanced monitoring and observability
  • [ ] Enterprise SSO integration

Community Requests:

  • Google Cloud and Azure support
  • Terraform modules (alternative to CDK)
  • Python SDK alongside TypeScript
  • Zapier/Make.com integrations

Contributing and Community

The entire platform is open source under MIT license. Everything I've built, you can use, modify, and improve.

Repositories:

How to help:

  1. Star the repositories - helps others discover the project
  2. Try the full deployment - example 07-full-stack has everything
  3. Report deployment issues - especially AWS region differences
  4. Submit improvements - see CONTRIBUTING.md for guidelines
  5. Share your experience - what are you building with it?

Connect:

What We Built Together

Eight articles. One complete AI platform.

We started with seven broken Lambda functions. We built:

  • Agent orchestration that handles complex multi-step workflows without timeouts
  • TypeScript SDK with perfect IntelliSense, streaming support, and smart error handling
  • Cost control that prevents $2,847 surprises with budgets and rate limits
  • Production security with authentication, encryption, and monitoring
  • One-command deployment that gets you running in under an hour

The platform serves 1,500+ requests daily. It's survived 8 months in production. It's processing everything from document analysis to research workflows. And it's completely open source.

The Hard-Won Lessons

Building production AI infrastructure taught me things tutorials never mention:

Technical truths:

  • Cost control is life support, not a nice-to-have feature
  • Lambda excels at tools, fails at orchestration
  • Streaming looks simple, implementation is brutal
  • Type safety prevents expensive mistakes at 3AM

Business realities:

  • Developers pay for great experience, abandon bad APIs
  • Open source builds trust better than marketing
  • Production numbers matter more than perfect demos
  • Failure stories teach more than success posts

Personal discoveries:

  • Building in public creates accountability
  • Documentation is your product's face
  • Shipping beats perfecting every time
  • Sharing mistakes helps everyone improve

Your Turn

You have everything you need. Real code, real examples, real production lessons. The platform is MIT licensed - use it, improve it, make money with it.

Next steps:

  1. Star the repos - ai-platform-aws and examples
  2. Deploy example 07 - full platform in under an hour
  3. Build something cool - then tell me about it
  4. Share your experience - help others learn from your journey

Get stuck? Email me at tyson@hivo.co or find me on Twitter @tysoncung.

The AI revolution needs better infrastructure. You can build it.

Go.


End of series: "Building an AI Platform on AWS from Scratch". Complete platform and examples at github.com/tysoncung/ai-platform-aws.
