Tyson Cung

Deploy a Production AI Platform on AWS for $100/month

From seven broken Lambda functions to a production AI platform in 8 articles.

That's the journey we've taken together: functions that couldn't communicate, hit timeout walls, and left users staring at loading spinners. Now you have a complete platform that orchestrates complex workflows, streams real-time updates, and won't bankrupt your startup.

This isn't a toy example. The architecture I'm about to show you serves 1,500+ requests daily, has survived 8 months in production, and handles everything from document analysis to multi-step research tasks.

Time to deploy it.

The Complete Architecture

Before we dive into deployment, here's what we're building:

(Diagram: AI Platform AWS Architecture)
The data flow:

  1. API Gateway receives requests, handles auth, enforces rate limits
  2. Gateway Lambda validates requests, checks budgets, routes to appropriate service
  3. ECS Agents orchestrate multi-step workflows using Lambda tools
  4. Lambda Tools perform specific AI tasks (summarize, extract, classify)
  5. DynamoDB tracks usage, manages budgets, stores user data
  6. WebSocket streams real-time updates back to clients
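To make step 2 concrete, here's a minimal sketch of the budget gate the Gateway Lambda applies before routing a request. The names (`BudgetRecord`, `checkBudget`) are illustrative, not the repo's actual API:

```typescript
// Illustrative budget gate (hypothetical names, not the repo's API).
interface BudgetRecord {
  monthlyBudgetUsd: number;  // cap configured per user
  spentThisMonthUsd: number; // accumulated from the usage table
}

function checkBudget(budget: BudgetRecord, estimatedCostUsd: number): boolean {
  // Reject when the projected spend would exceed the monthly cap
  return budget.spentThisMonthUsd + estimatedCostUsd <= budget.monthlyBudgetUsd;
}
```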

Prerequisites: Bootstrap Your Environment

First, let's set up the deployment environment:

# Install AWS CDK
npm install -g aws-cdk

# Clone the platform
git clone https://github.com/tysoncung/ai-platform-aws.git
cd ai-platform-aws

# Install dependencies
npm install
npm run install:all  # Installs in all packages

# Bootstrap CDK (one time per account/region)
npx cdk bootstrap

# Create environment file
cp .env.example .env

Edit .env with your configuration:

# AWS Configuration
AWS_REGION=us-east-1
AWS_ACCOUNT_ID=123456789012

# AI Provider API Keys
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# Platform Configuration
PLATFORM_ENVIRONMENT=production
COST_TRACKING_ENABLED=true
BUDGET_ALERTS_ENABLED=true

# Monitoring
SLACK_WEBHOOK_URL=https://hooks.slack.com/your-webhook
ALERT_EMAIL=you@company.com

# Security
JWT_SECRET_KEY=your-super-secret-jwt-key
ENCRYPTION_SALT=your-encryption-salt
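It pays to fail fast at startup if any of these are missing. A minimal loader (a sketch: the variable list here is only a subset of the file above, and the repo may handle this differently):

```typescript
// Fail fast on missing configuration (sketch: only a subset of the
// variables above is listed here; extend as needed).
const REQUIRED = ['AWS_REGION', 'OPENAI_API_KEY', 'JWT_SECRET_KEY'] as const;

function loadConfig(
  env: Record<string, string | undefined> = process.env
): Record<string, string> {
  const missing = REQUIRED.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(
    REQUIRED.map((key) => [key, env[key] as string] as [string, string])
  );
}
```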

Local Development Setup

Before deploying to AWS, let's run everything locally with Docker Compose:

# docker-compose.yml
version: '3.8'

services:
  api-gateway:
    build:
      context: ./packages/gateway
      dockerfile: Dockerfile.dev
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
      - DYNAMODB_ENDPOINT=http://dynamodb:8000
      - AGENT_ENDPOINT=http://agent:3001
    depends_on:
      - dynamodb
      - agent

  agent:
    build:
      context: ./packages/agents
      dockerfile: Dockerfile.dev
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=development
      - LAMBDA_ENDPOINT=http://lambda-tools:3002
    depends_on:
      - lambda-tools

  lambda-tools:
    build:
      context: ./packages/tools
      dockerfile: Dockerfile.dev
    ports:
      - "3002:3002"
    environment:
      - NODE_ENV=development
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}

  dynamodb:
    image: amazon/dynamodb-local:latest
    ports:
      - "8000:8000"
    command: ["-jar", "DynamoDBLocal.jar", "-sharedDb", "-inMemory"]

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

Start the local environment:

# Start all services
docker-compose up -d

# Run database migrations
npm run db:migrate:local

# Seed with sample data
npm run db:seed:local

# Test the platform
curl http://localhost:3000/health

CDK Stack Composition

The platform is composed of multiple CDK stacks for better separation of concerns:

// bin/deploy.ts
import * as cdk from 'aws-cdk-lib';
import { AIGatewayStack } from '../lib/gateway-stack';
import { AIAgentsStack } from '../lib/agents-stack';
import { AIToolsStack } from '../lib/tools-stack';
import { AIMonitoringStack } from '../lib/monitoring-stack';
import { AISecurityStack } from '../lib/security-stack';

const app = new cdk.App();
const env = { 
  account: process.env.CDK_DEFAULT_ACCOUNT, 
  region: process.env.CDK_DEFAULT_REGION 
};

// Security layer (VPC, IAM, KMS)
const securityStack = new AISecurityStack(app, 'AISecurityStack', { env });

// Lambda tools layer
const toolsStack = new AIToolsStack(app, 'AIToolsStack', {
  env,
  vpc: securityStack.vpc,
  securityGroup: securityStack.lambdaSecurityGroup
});

// ECS agents layer
const agentsStack = new AIAgentsStack(app, 'AIAgentsStack', {
  env,
  vpc: securityStack.vpc,
  securityGroup: securityStack.ecsSecurityGroup,
  toolsArns: toolsStack.functionArns
});

// API Gateway layer
const gatewayStack = new AIGatewayStack(app, 'AIGatewayStack', {
  env,
  agentsCluster: agentsStack.cluster,
  agentsService: agentsStack.service,
  toolsArns: toolsStack.functionArns
});

// Monitoring and alerting
new AIMonitoringStack(app, 'AIMonitoringStack', {
  env,
  gatewayApi: gatewayStack.api,
  agentsService: agentsStack.service,
  toolsFunctions: toolsStack.functions
});

Here's the gateway stack implementation:

// lib/gateway-stack.ts
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as apigatewayv2 from 'aws-cdk-lib/aws-apigatewayv2';
import * as apigatewayv2integrations from 'aws-cdk-lib/aws-apigatewayv2-integrations';

export class AIGatewayStack extends cdk.Stack {
  public readonly api: apigateway.RestApi;

  constructor(scope: Construct, id: string, props: AIGatewayStackProps) {
    super(scope, id, props);

    // DynamoDB tables
    const usageTable = new dynamodb.Table(this, 'UsageTable', {
      tableName: 'ai-platform-usage',
      partitionKey: { name: 'userId', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'timestamp', type: dynamodb.AttributeType.NUMBER },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      timeToLiveAttribute: 'ttl'
    });

    const budgetTable = new dynamodb.Table(this, 'BudgetTable', {
      tableName: 'ai-platform-budgets',
      partitionKey: { name: 'userId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST
    });

    // Gateway Lambda function
    const gatewayFunction = new lambda.Function(this, 'GatewayFunction', {
      runtime: lambda.Runtime.NODEJS_18_X,
      code: lambda.Code.fromAsset('packages/gateway/dist'),
      handler: 'index.handler',
      timeout: cdk.Duration.seconds(30),
      memorySize: 512,
      environment: {
        USAGE_TABLE_NAME: usageTable.tableName,
        BUDGET_TABLE_NAME: budgetTable.tableName,
        AGENTS_CLUSTER_ARN: props.agentsCluster.clusterArn,
        AGENTS_SERVICE_ARN: props.agentsService.serviceArn,
        TOOLS_ARNS: JSON.stringify(props.toolsArns)
      }
    });

    // Grant permissions
    usageTable.grantReadWriteData(gatewayFunction);
    budgetTable.grantReadWriteData(gatewayFunction);

    // API Gateway
    this.api = new apigateway.RestApi(this, 'AIApi', {
      restApiName: 'AI Platform API',
      description: 'AI Platform REST API',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization']
      }
    });

    // API Gateway integration
    const lambdaIntegration = new apigateway.LambdaIntegration(gatewayFunction);

    // Routes
    const v1 = this.api.root.addResource('v1');

    v1.addResource('complete').addMethod('POST', lambdaIntegration);
    v1.addResource('embed').addMethod('POST', lambdaIntegration);
    v1.addResource('stream').addMethod('POST', lambdaIntegration);

    const agents = v1.addResource('agents');
    agents.addResource('run').addMethod('POST', lambdaIntegration);
    agents.addResource('stream').addMethod('POST', lambdaIntegration);

    // Usage and budget endpoints
    const usage = v1.addResource('usage');
    usage.addMethod('GET', lambdaIntegration); // Get usage stats
    const budget = usage.addResource('budget'); // add the resource once, then both methods
    budget.addMethod('GET', lambdaIntegration);
    budget.addMethod('PUT', lambdaIntegration);

    // WebSocket API for streaming
    const webSocketApi = new apigatewayv2.WebSocketApi(this, 'StreamingAPI', {
      apiName: 'AI Platform Streaming',
      connectRouteOptions: {
        integration: new apigatewayv2integrations.WebSocketLambdaIntegration(
          'ConnectIntegration',
          gatewayFunction
        )
      },
      disconnectRouteOptions: {
        integration: new apigatewayv2integrations.WebSocketLambdaIntegration(
          'DisconnectIntegration',
          gatewayFunction
        )
      },
      defaultRouteOptions: {
        integration: new apigatewayv2integrations.WebSocketLambdaIntegration(
          'DefaultIntegration',
          gatewayFunction
        )
      }
    });

    new apigatewayv2.WebSocketStage(this, 'StreamingStage', {
      webSocketApi,
      stageName: 'prod',
      autoDeploy: true
    });
  }
}

Step-by-Step Deployment

Now let's deploy everything:

# 1. Validate CDK configuration
npx cdk doctor

# 2. Review what will be deployed
npx cdk diff

# 3. Deploy security stack first
npx cdk deploy AISecurityStack

# 4. Deploy Lambda tools
npx cdk deploy AIToolsStack

# 5. Deploy ECS agents
npx cdk deploy AIAgentsStack

# 6. Deploy API Gateway
npx cdk deploy AIGatewayStack

# 7. Deploy monitoring
npx cdk deploy AIMonitoringStack

# Or deploy everything at once
npx cdk deploy --all

The deployment takes about 15 minutes. You'll see output like:

AIGatewayStack.APIEndpoint = https://abc123.execute-api.us-east-1.amazonaws.com/v1
AIGatewayStack.WebSocketEndpoint = wss://def456.execute-api.us-east-1.amazonaws.com/prod
AIAgentsStack.ClusterName = ai-platform-agents
AIToolsStack.SummarizeFunctionArn = arn:aws:lambda:us-east-1:123456789012:function:summarize
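If you want these values in scripts rather than scrollback, `npx cdk deploy --all --outputs-file outputs.json` writes the same outputs as JSON keyed by stack name. A small reader (stack and output names as in the sample above):

```typescript
// Read a CDK stack output from the JSON file produced by
// `cdk deploy --outputs-file outputs.json`.
import * as fs from 'node:fs';

function stackOutput(outputsPath: string, stack: string, key: string): string {
  const outputs = JSON.parse(fs.readFileSync(outputsPath, 'utf8'));
  return outputs[stack][key];
}
```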

Configure AI Providers

Once deployed, configure your AI provider credentials:

# Store API keys in AWS Systems Manager
aws ssm put-parameter \
  --name "/ai-platform/openai-api-key" \
  --value "sk-your-openai-key" \
  --type "SecureString"

aws ssm put-parameter \
  --name "/ai-platform/anthropic-api-key" \
  --value "sk-ant-your-anthropic-key" \
  --type "SecureString"

# Update the deployed functions with the new parameter names
npx cdk deploy AIToolsStack AIGatewayStack
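Inside the Lambda, SecureString parameters should be fetched once per container and cached across warm invocations. Here's a sketch with the SSM call injected so the caching logic stands on its own; in the deployed function the fetcher would wrap the AWS SDK's `GetParameterCommand` with `WithDecryption: true`:

```typescript
// Cache decrypted SSM parameters across warm invocations so each Lambda
// container fetches a parameter only once. The fetcher is injected here
// (the real one would call GetParameterCommand with WithDecryption: true).
const parameterCache = new Map<string, string>();

async function getParameter(
  name: string,
  fetchParam: (name: string) => Promise<string>
): Promise<string> {
  const cached = parameterCache.get(name);
  if (cached !== undefined) return cached;
  const value = await fetchParam(name);
  parameterCache.set(name, value);
  return value;
}
```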

Testing Your Deployment

Let's test the complete platform:

# 1. Health check
curl https://your-api-endpoint.execute-api.us-east-1.amazonaws.com/v1/health

# 2. Create an API key
curl -X POST https://your-api-endpoint/v1/auth/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Test Key",
    "scopes": ["ai:complete", "ai:embed", "agent:run"],
    "monthlyBudget": 50
  }'

# Returns: {"apiKey": "sk-proj-abc123...", "keyId": "sk-proj-abc"}

# 3. Test completion
curl -X POST https://your-api-endpoint/v1/complete \
  -H "Authorization: Bearer sk-proj-abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a haiku about TypeScript"}],
    "model": "gpt-4",
    "temperature": 0.8
  }'

# 4. Test agent workflow
curl -X POST https://your-api-endpoint/v1/agents/run \
  -H "Authorization: Bearer sk-proj-abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "type": "research",
    "input": {"topic": "renewable energy trends"},
    "tools": ["search", "summarize", "extract"]
  }'
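If you'd rather call the API from code than curl, the request shape for `/v1/complete` is easy to assemble. A hypothetical helper (the series' TypeScript SDK exposes a richer interface; names here are illustrative):

```typescript
// Hypothetical helper that assembles a /v1/complete request for fetch().
interface CompleteRequest {
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[];
  model: string;
  temperature?: number;
}

function buildCompleteRequest(baseUrl: string, apiKey: string, body: CompleteRequest) {
  return {
    url: `${baseUrl}/v1/complete`,
    init: {
      method: 'POST' as const,
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}
```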

Dashboard Tour

The platform includes a built-in dashboard at /dashboard. Here's what you'll see:

Usage Overview:

  • Requests per day/hour
  • Token consumption by model
  • Cost breakdown by user
  • Success/error rates

Real-time Monitoring:

  • Active agent sessions
  • Queue depth for tools
  • Response time percentiles
  • Error alerts

Budget Management:

  • Per-user spend tracking
  • Budget utilization alerts
  • Cost projections
  • BYOK vs platform credit usage

System Health:

  • Lambda cold start metrics
  • ECS task utilization
  • DynamoDB performance
  • API Gateway latency

You can access it at: https://your-api-endpoint/dashboard

Performance Numbers from Production

Here are the real metrics from 8 months running in production:

Latency (P95):

  • Simple completion: 1.2s
  • Streaming completion: 180ms to first token
  • Agent workflow (3 tools): 12s
  • API Gateway overhead: 45ms
  • Lambda cold start: 850ms (mitigated with provisioned concurrency)

Throughput:

  • Sustained: 50 requests/second
  • Burst: 200 requests/second (before rate limiting)
  • Agent concurrency: 15 parallel workflows
  • Tool execution: 100 parallel Lambda invocations

Reliability:

  • Uptime: 99.8%
  • Error rate: 0.4%
  • P99 latency SLA: 5s (met 98.9% of the time)
  • Budget enforcement accuracy: 99.99%

Cost Optimization Wins:

  • Response caching: 25% reduction in API calls
  • Smart model selection: 40% cost reduction (Claude Haiku for summaries)
  • BYOK adoption: 70% of users, eliminating platform AI costs
  • Lambda right-sizing: 30% reduction in compute costs

Cost Breakdown: What This Actually Costs

Fixed Infrastructure (Monthly):

API Gateway:                $3.50   (1M requests)
Lambda (Gateway):           $8.20   (compute + requests)
ECS Fargate:               $15.40   (2 tasks avg)
DynamoDB:                   $6.80   (usage + budgets)
Application Load Balancer: $16.20
NAT Gateway:               $45.00   (data transfer)
CloudWatch:                 $4.30   (logs + metrics)
Route 53:                   $0.50   (hosted zone)
----
Total Fixed:               $99.90/month

Variable Costs:

  • AI API costs: Pass-through with 2% platform markup
  • Data transfer: $0.09/GB out of AWS
  • Lambda executions: $0.20 per million requests
  • DynamoDB reads/writes: $0.25 per million operations
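Plugging those rates into a quick estimator (a sketch using the published rates above; it ignores free tiers and AWS billing rounding):

```typescript
// Back-of-envelope monthly variable cost from the rates listed above
// (sketch; ignores free tiers and billing rounding).
function monthlyVariableCostUsd(usage: {
  lambdaInvocations: number; // $0.20 per million requests
  dynamoOperations: number;  // $0.25 per million operations
  egressGb: number;          // $0.09 per GB out of AWS
}): number {
  return (
    (usage.lambdaInvocations / 1_000_000) * 0.2 +
    (usage.dynamoOperations / 1_000_000) * 0.25 +
    usage.egressGb * 0.09
  );
}
```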

Real customer costs (excluding AI API):

  • Light usage (500 req/month): $12/month
  • Medium usage (5K req/month): $35/month
  • Heavy usage (50K req/month): $120/month

The platform is cost-effective for most use cases. The break-even point vs building your own infrastructure is around 2,000 requests per month.

Cold Start Mitigation

Lambda cold starts were killing our performance. Here's how we solved it:

// Reserved concurrency caps the function; provisioned concurrency
// is configured on a version or alias, not on the function itself
const gatewayFunction = new lambda.Function(this, 'GatewayFunction', {
  // ... other config
  reservedConcurrentExecutions: 10
});

new lambda.Alias(this, 'GatewayLiveAlias', {
  aliasName: 'live',
  version: gatewayFunction.currentVersion,
  provisionedConcurrentExecutions: 5
});

// Keep-warm function that pings Lambdas every 5 minutes
new events.Rule(this, 'KeepWarmRule', {
  schedule: events.Schedule.rate(cdk.Duration.minutes(5)),
  targets: [
    new targets.LambdaFunction(gatewayFunction, {
      event: events.RuleTargetInput.fromObject({ warmup: true })
    })
  ]
});

// In Lambda handler - respond quickly to warmup
export const handler = async (event: any) => {
  if (event.warmup) {
    return { statusCode: 200, body: 'warm' };
  }

  // Normal processing...
};

Result: Cold start rate dropped from 23% to 3% of requests.

Open Source Roadmap

This platform is completely open source. Here's what's coming next:

Q2 2026:

  • [ ] Multi-region deployment support
  • [ ] GraphQL API alongside REST
  • [ ] Built-in vector database (Pinecone integration)
  • [ ] Advanced agent memory management

Q3 2026:

  • [ ] Kubernetes support (alternative to ECS)
  • [ ] Multi-tenant isolation improvements
  • [ ] Advanced cost optimization (spot instances)
  • [ ] Plugin system for custom tools

Q4 2026:

  • [ ] Edge deployment (CloudFlare Workers)
  • [ ] Real-time collaboration features
  • [ ] Advanced monitoring and observability
  • [ ] Enterprise SSO integration

Community Requests:

  • Google Cloud and Azure support
  • Terraform modules (alternative to CDK)
  • Python SDK alongside TypeScript
  • Zapier/Make.com integrations

Contributing and Community

The entire platform is open source under MIT license. Everything I've built, you can use, modify, and improve.

Repositories:

How to help:

  1. Star the repositories - helps others discover the project
  2. Try the full deployment - example 07-full-stack has everything
  3. Report deployment issues - especially AWS region differences
  4. Submit improvements - see CONTRIBUTING.md for guidelines
  5. Share your experience - what are you building with it?

Connect:

What We Built Together

Eight articles. One complete AI platform.

We started with seven broken Lambda functions. We built:

  • Agent orchestration that handles complex multi-step workflows without timeouts
  • TypeScript SDK with perfect IntelliSense, streaming support, and smart error handling
  • Cost control that prevents $2,847 surprises with budgets and rate limits
  • Production security with authentication, encryption, and monitoring
  • One-command deployment that gets you running in under an hour

The platform serves 1,500+ requests daily. It's survived 8 months in production. It's processing everything from document analysis to research workflows. And it's completely open source.

The Hard-Won Lessons

Building production AI infrastructure taught me things tutorials never mention:

Technical truths:

  • Cost control is life support, not a nice-to-have feature
  • Lambda excels at tools, fails at orchestration
  • Streaming looks simple, implementation is brutal
  • Type safety prevents expensive mistakes at 3AM

Business realities:

  • Developers pay for great experience, abandon bad APIs
  • Open source builds trust better than marketing
  • Production numbers matter more than perfect demos
  • Failure stories teach more than success posts

Personal discoveries:

  • Building in public creates accountability
  • Documentation is your product's face
  • Shipping beats perfecting every time
  • Sharing mistakes helps everyone improve

Your Turn

You have everything you need. Real code, real examples, real production lessons. The platform is MIT licensed - use it, improve it, make money with it.

Next steps:

  1. Star the repos - ai-platform-aws and examples
  2. Deploy example 07 - full platform in under an hour
  3. Build something cool - then tell me about it
  4. Share your experience - help others learn from your journey

Get stuck? Email me at tyson@hivo.co or find me on Twitter @tysoncung.

The AI revolution needs better infrastructure. You can build it.

Go.


End of series: "Building an AI Platform on AWS from Scratch". Complete platform and examples at github.com/tysoncung/ai-platform-aws.
