
I Run a Solo Company with AI Agent Departments

TLDR:

  • I'm a solo founder running 5 SaaS products with 0 employees
  • I built 8 AI agent "departments" using GitHub Copilot custom agents — CEO, CFO, COO, Lawyer, Accountant, Marketing, CTO, and an Improver that upgrades the others
  • They share a persistent knowledge graph, consult each other automatically, and self-improve
  • Here's how it actually works, with code snippets and honest tradeoffs

The Premise

I run a solo software company from Braga, Portugal. Five products. Zero employees. Zero funding.

The products: SondMe (radio monitoring), Countermark (bot detection), OpenClawCloud (AI agent hosting), Vertate (verification), and Agent-Inbox. All built with Elixir, Phoenix, and LiveView. All deployed on Fly.io for under €50/month total.

The problem: even a solo founder needs to handle marketing, accounting, legal compliance, operations, financial planning, and tech decisions. Wearing all those hats meant things slipped. Deadlines got missed. Content didn't get posted. IVA filings almost got forgotten.

So I built something weird: a full virtual company where every department is an AI agent.

The Agent Roster

Each agent is a markdown file in .github/agents/ inside my management repo. GitHub Copilot loads the right agent based on which mode I'm working in. Here's the team:

| Agent | Role | What It Actually Does |
| --- | --- | --- |
| CEO | Strategy & trends | Scans Hacker News and X for market signals. Validates product direction against trends. |
| CFO | Financial planning | Pricing models, cash flow projections, cost analysis. Checks margins before I commit to anything. |
| COO | Operations | Runs daily standups. Maintains the sprint board. Orchestrates other agents. |
| Marketing | Content & growth | Writes all social media content in my voice. Schedules posts. Runs engagement routines. |
| Accountant | Tax & invoicing | Portuguese IVA rules, IRS simplified regime, invoice requirements. Knows fiscal deadlines cold. |
| Lawyer | Compliance | GDPR, contracts, Terms of Service. Reviews product claims before Marketing publishes them. |
| CTO | Architecture | Build-vs-buy decisions, DevOps, stack consistency across all 5 products. |
| Improver | Meta-agent | Reads past mistakes and upgrades the other agents. Creates new skills. The system evolves itself. |

These aren't chatbots. Each agent has domain-specific instructions, access to real tools (MCP servers for X, dev.to, Sentry, scheduling, memory), and the authority to act autonomously.

How It Works — The Architecture

Agent Files

Each agent is a .agent.md file with structured instructions:

```markdown
# Marketing Agent — AIFirst

## Core Responsibilities
- Content strategy and calendar
- Social media posting (via X and dev.to MCP tools)
- Community engagement
- Launch planning

## Content Voice & Tone
- First person singular ("I", never "we")
- Technical substance over hype
- Show the work — code, configs, real numbers
- No: revolutionary, game-changing, leverage, synergy...

## Autonomous Execution
- Posts tweets directly via scheduler
- Publishes dev.to articles (published: true)
- Engagement: likes, replies, follows — every day
```

The key insight: these aren't generic "be helpful" prompts. The Marketing agent knows my posting schedule, my voice quirks, which platforms I use, which URLs are blocked on X, and which products to rotate in the content calendar. The Accountant knows Portuguese ENI tax law, IVA quarterly deadlines, and the simplified IRS regime. Real domain expertise encoded in markdown.

Shared Memory — The Knowledge Graph

This is where it gets interesting. All agents share a persistent knowledge graph via a Model Context Protocol (MCP) memory server. What one agent learns, every other agent can read.

```text
┌──────────┐    ┌────────────────┐    ┌───────────┐
│ Marketing│───→│                │←───│ CFO       │
│ CEO      │───→│   Knowledge    │←───│ Accountant│
│ Lawyer   │───→│     Graph      │←───│ Improver  │
└──────────┘    │ (memory.jsonl) │    └───────────┘
                └────────────────┘
```

Entities have types: product, decision, deadline, client, metric, lesson. Relations use active voice: owns, uses, built-with, depends-on.
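
Concretely, every line of memory.jsonl is one JSON record. Here's a minimal sketch of the shapes in TypeScript (field names mirror the upstream MCP memory server's format; treat the exact details as an assumption):

```typescript
// Sketch of the per-line record shapes in memory.jsonl. Field names
// follow the upstream MCP memory server; treat them as assumptions.
interface Entity {
  type: "entity";
  name: string;         // e.g. "lesson:2026-02-10:memory-corruption"
  entityType: string;   // product | decision | deadline | client | metric | lesson
  observations: string[];
}

interface Relation {
  type: "relation";
  from: string;         // e.g. "AIFirst"
  to: string;           // e.g. "Countermark"
  relationType: string; // active voice: owns | uses | built-with | depends-on
}

// Example lines as they would appear in the file:
// {"type":"entity","name":"Countermark","entityType":"product","observations":["Bot detection SaaS"]}
// {"type":"relation","from":"AIFirst","to":"Countermark","relationType":"owns"}
```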

Real example of what's stored:

  • Strategic decisions and their rationale
  • Product status, launch dates, key metrics
  • Financial data (pricing decisions, cost benchmarks)
  • Legal and compliance decisions
  • Lessons learned from launches and incidents

The memory has retention rules too — standups older than 7 days get pruned, but lessons and decisions are permanent. It's the company's institutional memory.
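
A minimal sketch of that retention pass, reusing the Entity shape above (the helper name and the date-in-name convention are my assumptions):

```typescript
// Hypothetical retention pass: standups expire after 7 days; every
// other entity type (lessons, decisions, ...) is kept permanently.
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

function applyRetention(entities: Entity[], now = Date.now()): Entity[] {
  return entities.filter((e) => {
    if (e.entityType !== "standup") return true;          // non-standups never expire
    const stamp = Date.parse(e.name.split(":")[1] ?? ""); // assumes names like "standup:2026-02-10"
    return !Number.isNaN(stamp) && now - stamp < WEEK_MS;
  });
}
```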

Inter-Agent Communication

Here's the part that surprised me most. Agents consult each other automatically when their work crosses into another domain.

The protocol works like this: each agent has a trigger table. When Marketing writes a product claim, it auto-calls the Lawyer for review. When CFO does pricing, it calls the Accountant to verify tax treatment. When CTO proposes infrastructure changes, it calls CFO to check the cost impact.

```text
CEO ←→ CFO           Strategy ↔ Financial viability
CEO ←→ CTO           Strategy ↔ Technical feasibility
CFO ←→ Accountant    Financial plans ↔ Tax compliance
Marketing ←→ Lawyer  Campaigns ↔ Legal compliance
COO → any            Orchestrator can call any agent
```
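
If you wanted the trigger table as data, it could be as small as this (hypothetical; in my setup the triggers live in the agents' markdown instructions, not in code):

```typescript
// Hypothetical encoding of the trigger table: when an agent's work
// touches one of these topics, the mapped specialist gets consulted.
const CONSULT_TRIGGERS: Record<string, Record<string, string>> = {
  Marketing: { "product claim": "Lawyer" },
  CFO:       { "pricing": "Accountant" },
  CTO:       { "infrastructure change": "CFO" },
};

// CONSULT_TRIGGERS["Marketing"]["product claim"] -> "Lawyer"
```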

The peer review request format looks like this:

```markdown
## Peer Review Request

**From**: Marketing
**Call chain**: COO → Marketing
**Task**: Draft product launch tweet for Countermark
**What I did**: Wrote tweet claiming "99% bot detection accuracy"
**What I need from you**: Is this claim substantiated?

Please respond with:
1. ✅ APPROVED
2. ⚠️ CONCERNS
3. 🔴 BLOCKING
```

Call-chain tracking prevents infinite loops — each consultation includes who's already been called, and there's a max depth of 3. If CFO calls Accountant, the Accountant can't call CFO back.
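
In code terms, the guard is tiny (an illustrative sketch; the real rules are written into each agent's instructions):

```typescript
// Illustrative loop-prevention check: max call depth of 3, and no
// agent can be consulted twice in the same chain (no callbacks).
const MAX_DEPTH = 3;

function canConsult(chain: string[], callee: string): boolean {
  if (chain.length >= MAX_DEPTH) return false; // depth cap reached
  if (chain.includes(callee)) return false;    // no-callback rule
  return true;
}

// canConsult(["COO", "CFO"], "Accountant") -> true
// canConsult(["CFO", "Accountant"], "CFO") -> false: CFO is already in the chain
```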

The Daily Standup

Every morning, the COO agent runs a standup that:

  1. Checks Sentry for errors across all 5 products
  2. Scans the sprint board for overdue tasks
  3. Checks if periodic prompts are overdue (weekly review, monthly accounting, quarterly IVA)
  4. Reads the knowledge graph for context
  5. Delegates tasks to other agents
  6. Produces a prioritized day plan

It's not a status meeting — it's an automated orchestration run that delegates work to the right specialist.
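
As a sketch, the run has roughly this shape (every declared helper is a hypothetical stand-in for an MCP tool call; the real standup is a Copilot prompt, not a script):

```typescript
// Hypothetical shape of the COO standup. None of the declared helpers
// is a real API; they stand in for MCP tools and prompt steps.
interface Task { owner: string; title: string }

declare const sentry:  { recentIssues(): Promise<Task[]> };
declare const board:   { overdueTasks(): Promise<Task[]> };
declare const prompts: { overduePeriodic(): Promise<Task[]> };
declare const memory:  { readGraph(): Promise<unknown> };
declare function prioritize(tasks: Task[], context: unknown): Task[];
declare function delegate(agent: string, task: Task): Promise<void>;
declare function renderDayPlan(tasks: Task[]): string;

async function dailyStandup(): Promise<string> {
  const errors   = await sentry.recentIssues();     // 1. Sentry errors across all 5 products
  const overdue  = await board.overdueTasks();      // 2. overdue sprint board items
  const periodic = await prompts.overduePeriodic(); // 3. weekly / monthly / quarterly prompts
  const context  = await memory.readGraph();        // 4. knowledge graph context

  const tasks = prioritize([...errors, ...overdue, ...periodic], context);
  for (const task of tasks) await delegate(task.owner, task); // 5. delegate to specialists
  return renderDayPlan(tasks);                                // 6. prioritized day plan
}
```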

Self-Improvement — The Improver Agent

This is the weirdest (and possibly most valuable) part. There's a meta-agent called the Improver whose job is to:

  • Read lesson entities from memory (mistakes and learnings logged by other agents)
  • Identify patterns across sessions
  • Create new skills (reusable instruction files for specific domains)
  • Update other agents' instructions when gaps are found
  • Propose new agents when workload patterns suggest one is needed

After every complex task, agents store a lesson:

```text
Entity: lesson:2026-02-10:memory-corruption
Type: lesson
Observations:
  - "Agent: CTO"
  - "Category: bug"
  - "Summary: Concurrent memory writes corrupted JSONL file"
  - "Detail: Parallel tool calls to create_entities and create_relations
    caused race condition in the memory server"
  - "Action: Added async mutex + atomic writes to local fork"
```

The Improver reads these monthly and upgrades the system. The system literally improves itself.
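
A sketch of what that monthly pass can look for, reusing the Entity shape from the earlier sketch (the helper and the threshold are my illustration, not the Improver's actual logic):

```typescript
// Hypothetical pattern scan: group lesson entities by their
// "Category: ..." observation and flag categories that recur.
function improvementCandidates(lessons: Entity[], minCount = 2): [string, Entity[]][] {
  const byCategory = new Map<string, Entity[]>();
  for (const lesson of lessons) {
    const cat =
      lesson.observations
        .find((o) => o.startsWith("Category: "))
        ?.slice("Category: ".length) ?? "uncategorized";
    byCategory.set(cat, [...(byCategory.get(cat) ?? []), lesson]);
  }
  // Recurring categories are where instruction updates pay off most.
  return [...byCategory.entries()].filter(([, group]) => group.length >= minCount);
}
```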

The Honest Tradeoffs

This isn't a "10x productivity" pitch. Here's what's actually hard:

Context Windows Are Real

Each agent operates within a context window. Long, complex tasks can exceed it. The solution: agents delegate heavy data-gathering to subagents to keep their own context focused. It works, but it's a constant architectural consideration.

Agents Hallucinate

All of them, regularly. The Lawyer catches most compliance hallucinations before they reach production. The inter-agent review protocol exists because of this — multiple agents checking each other's work is the safety net.

Memory Corruption

We hit this one early. The knowledge graph is stored as a JSONL file. When multiple agents made parallel tool calls (writing entities and relations simultaneously), the file got corrupted — partial writes, duplicate entries, broken JSON lines.

The fix: I forked the upstream MCP memory server and added three things (sketched in code after the list):

  1. Async mutex — prevents concurrent saveGraph() calls
  2. Atomic writes — writes to a .tmp file then renames
  3. Auto-repair on load — skips corrupt lines and deduplicates
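
A condensed sketch of fixes 1 and 2 in Node/TypeScript (the promise-chain mutex and tmp-then-rename are the actual techniques from the fork; function names are simplified):

```typescript
import { promises as fs } from "node:fs";

// Fix 1: async mutex. All writes chain onto one promise, so two
// saveGraph() calls can never interleave their file operations.
let writeLock: Promise<void> = Promise.resolve();

function withLock<T>(fn: () => Promise<T>): Promise<T> {
  const run = writeLock.then(fn);
  writeLock = run.then(() => undefined, () => undefined); // keep the chain alive on failure
  return run;
}

// Fix 2: atomic write. Write the whole file to a .tmp path, then
// rename it over the original. rename() is atomic on the same
// filesystem, so a reader never sees a half-written memory.jsonl.
async function saveGraph(path: string, lines: string[]): Promise<void> {
  return withLock(async () => {
    const tmp = `${path}.tmp`;
    await fs.writeFile(tmp, lines.join("\n") + "\n", "utf8");
    await fs.rename(tmp, path);
  });
}
```

Fix 3 is the loader-side mirror of this: parse line by line, skip lines that fail JSON.parse, and deduplicate what remains.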

It's Not a Replacement for Thinking

The agents are good at executing within their domain. They're bad at knowing when the domain is wrong. Strategic pivots, gut-feel product decisions, "this just doesn't feel right" — that's still me.

Month 2 Results

After two months of running this system:

  • Revenue: €6.09 (one subscriber since day 2; no ads, no outreach)
  • Infrastructure: ~€42/month (Fly.io across all apps)
  • Content output: 84+ tweets, 5 dev.to articles, multiple HN comments
  • Time on marketing: less than 1 hour per week (agents handle scheduling, drafting, and engagement)
  • Compliance: zero missed deadlines (IVA, IRS, Segurança Social all tracked)

The revenue is barely there. But I ship every week, the system keeps improving, and I'm building in public with a team that costs €0.

The Code

The entire system lives in a single management repo:

```text
.github/
  agents/
    ceo.agent.md
    cfo.agent.md
    coo.agent.md
    marketing.agent.md
    accountant.agent.md
    lawyer.agent.md
    cto.agent.md
    improver.agent.md
  copilot-instructions.md    # Global company identity + protocols
  skills/
    portuguese-tax/SKILL.md
    saas-pricing/SKILL.md
    seguranca-social/SKILL.md
  instructions/
    marketing.instructions.md
    ...
Marketing/
  social-media-sop.md
  social-media-strategy-2026.md
  drafts/
    week-2026-W09.md
    ideas.md
    ...
BOARD.md                     # Sprint board (COO-maintained)
Setas/
  Atividade.md               # Fiscal framework
  INSTRUCTIONS.md            # Operational manual
```

The copilot-instructions.md file is loaded into every Copilot interaction. It defines the company identity, agent system, memory protocols, communication rules, and product registry. It's the constitution of the virtual company.

Skills are reusable knowledge modules — portuguese-tax/SKILL.md contains complete IVA scenarios, IRS regime rules, invoice requirements, and deadline calendars. The Accountant agent loads this skill automatically when handling tax questions.

What I'd Do Differently

If I were starting fresh:

  1. Start with 3 agents, not 8 — COO, Marketing, and Accountant cover 80% of the value. Add specialists when the workload justifies them.
  2. Invest in memory early — the knowledge graph is the most valuable part. It compounds over time. I wish I'd been more disciplined about what gets stored from day one.
  3. Test agent outputs against each other — the inter-agent review protocol was added after hallucinations caused problems. Build it in from the start.

Why This Matters

I'm not claiming AI agents replace human teams. They don't. What they do is let a solo founder operate with the structure of a team — defined roles, communication protocols, institutional memory, and systematic improvement.

The alternative was either hiring people I can't afford or continuing to drop balls. This gives me a middle path: structured execution with human judgment at the critical points.

The system cost: €0 (GitHub Copilot is included in my existing subscription). The time to build: maybe 40 hours total over 2 months. The ongoing maintenance: the Improver handles most of it.

If you're a solo founder drowning in operational overhead, this might be worth trying. Not because AI agents are magic — but because the structure they enforce is valuable even when the agents themselves are imperfect.


I'm João, a solo developer from Portugal building SaaS products with Elixir. I write about the real experience of building in public — the numbers, the mistakes, and the weird experiments like this one. Follow me on dev.to or X (@joaosetas).

Top comments (57)

Harsh

Bro what did I just read?! 😂 Okay so as someone who's also building stuff solo (browser games) and constantly fighting with AI to do literally anything useful, this is absolutely WILD.

That Improver agent though... wait wait wait. You built an AI that improves your OTHER AIs? That's some straight up sci-fi inception stuff right there. I can barely get ChatGPT to write a proper function without hallucinating half the time 😅 Genuine question though - did it ever go completely off the rails? Like suggest something so stupid you had to just shut the whole thing down?

Also really curious about the whole "agents talk to each other" thing. Is it actually smooth or do they have like... disagreements? Would love to see even a rough sketch of how that knowledge graph works. Even a napkin drawing would make my day tbh.

AND FIVE PRODUCTS? On minimal infrastructure?! Brother I'm here struggling to ship ONE properly lmao. Massive respect fr.

If you ever do that technical deep dive or open source any of this, PLEASE tag me or something. I NEED to see how this works under the hood.

Honestly stuff like this is exactly why I love this community. Keep building man, you're living in 2030 while the rest of us are still in 2026.

João Pedro Silva Setas

Haha thanks man, appreciate the energy! 😄

To answer your question — yes, the Improver has gone off the rails. Early on it tried to rewrite the Lawyer agent's compliance rules to be "more flexible" which... no. That's exactly the kind of thing that should never be flexible. Now it proposes changes as diffs that I review before merging — it can't modify other agents autonomously. Hard boundaries on anything touching money, legal compliance, or auth.

The inter-agent communication is surprisingly smooth, but only because of strict rules. Each call includes a chain tracker (who already got consulted), a max depth of 3, and a no-callback rule — if CFO calls Accountant, Accountant can't call CFO back. Without those constraints it was chaos. When they "disagree" (e.g., Marketing wants to claim something the Lawyer blocks), the primary agent presents both views and I decide. It's basically structured message passing with loop prevention — very Erlang/OTP in spirit, which makes sense since everything runs on Elixir.

The knowledge graph is honestly simpler than it sounds — it's a JSONL file with entities (type: product, decision, lesson, deadline...) and relations between them (owns, uses, depends-on). Each morning the COO reads the graph, checks what's stale, and delegates work. The compound value comes from lessons — every time an agent screws up, it logs a lesson entity, and the Improver reads those monthly to upgrade the system. The mistakes make it smarter over time.

Five products sounds impressive but they're all Elixir/Phoenix on Fly.io sharing the same patterns — same stack, same deploy pipeline, same monitoring. Once you have the template, each new one is mostly copy-paste-tweak.

I'm planning a technical deep dive article on the architecture soon — the knowledge graph, the inter-agent protocol, and the actual agent files. I'll make sure to post it here. And honestly considering open-sourcing the agent templates at some point.

Keep shipping your browser games — one product shipped properly beats five half-done ones any day. 🤙

CrisisCore-Systems

I love the honesty in the premise. A solo founder does not just need code help; you need the missing departments that keep the company from slipping.

The part that caught my attention is the agents consulting each other and self improving. That can be powerful, but it is also where drift sneaks in. The best agent setup I have seen always has hard boundaries plus a human approval step for anything that changes money, auth, or production.

When your Improver agent upgrades the others, what is your safety check? Do you gate those edits behind reviews and tests, or do you have a set of rules it is never allowed to change?

João Pedro Silva Setas

Great question. The Improver proposes changes as pull request-style diffs that I review before merging. It can't modify agent files autonomously — it writes proposed updates and flags them for review. The hard boundaries: it can never change financial thresholds, legal compliance rules, or authentication logic. Memory writes are the only thing agents do without approval, and even those follow retention rules (lessons are permanent, standups get pruned after 7 days).

CrisisCore-Systems

Appreciate the detail. Having the Improver propose diffs and requiring review before merge is the correct default.

If you ever harden it further, I would keep one rule strict. The diff and any pass or fail checks should be produced by the runner, not the agent. That keeps the audit trail trustworthy even when the agent is wrong.

Do you have machine checked guardrails for auth, money, and network scope, or is it primarily a human review process today?

João Pedro Silva Setas

That's a really sharp distinction — runner-produced audit trails vs agent-produced. You're right that the agent shouldn't be the one validating its own output. Right now it's primarily human review. The Improver proposes diffs, I read them, approve or reject. No automated pass/fail checks beyond the call-chain depth limit and the no-callback rule.

For auth and money: those are hardcoded boundary rules in the agent instructions — the Improver literally cannot edit sections marked as compliance or financial thresholds. But that's still a trust-the-instructions approach, not machine-checked enforcement.

Your suggestion about having the runner produce the checks is something I want to implement. Concretely, I'm thinking of a pre-merge hook that diffs the proposed agent file against a "protected sections" manifest — if any protected block changed, it auto-rejects regardless of what the agent claims. That would give me the machine-checked layer you're describing.
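
To make that concrete, roughly this shape (everything here is hypothetical, nothing is built yet):

```typescript
// Hypothetical pre-merge gate: the runner (not the agent) computes
// which markdown sections a proposed diff touches and auto-rejects
// if any of them is listed in a protected-sections manifest.
interface Manifest {
  [agentFile: string]: string[]; // protected "## ..." headings per agent file
}

// Stand-in: returns the headings whose section bodies differ.
declare function changedHeadings(oldText: string, newText: string): string[];

function violatesManifest(
  manifest: Manifest,
  file: string,
  oldText: string,
  newText: string,
): boolean {
  const protectedSet = new Set(manifest[file] ?? []);
  return changedHeadings(oldText, newText).some((h) => protectedSet.has(h));
}

// Example manifest entry:
// { "lawyer.agent.md": ["## Compliance Rules"], "cfo.agent.md": ["## Financial Thresholds"] }
```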

Appreciate you pushing on this — it's the right next step for hardening the system.

CrisisCore-Systems

That makes sense. A protected sections manifest plus runner side diff checks is exactly the kind of separation that makes the boundary real instead of advisory. Once enforcement lives outside the model, the instructions can guide behavior, but they are no longer the thing protecting the system.

This is part of what I think of as protective computing. High trust behavior should not depend on the model describing its own limits correctly. Really interesting direction.

Vic Chen

This is an incredible setup. Running 5 SaaS products solo with AI agent departments is exactly the kind of leverage I keep thinking about for my own projects. The knowledge graph approach for shared context between agents is really smart — that persistent memory layer is what separates a bunch of disconnected prompts from an actual system. Curious about the cost side: how much are you spending monthly on API calls across all the agents? And have you hit any reliability issues where one agent gives bad input that cascades through the others? I have been experimenting with similar multi-agent architectures for financial data analysis and the coordination layer is always the hardest part to get right.

João Pedro Silva Setas

API cost: effectively €0 on top of GitHub Copilot subscription (included). The MCP servers (memory, scheduler, Sentry) are self-hosted or free tier. The Fly.io infra for all 5 apps is ~€42/month. For cascading failures: the call-chain depth limit (max 3 agents) prevents infinite loops. Each agent includes the full call chain in its request, so no agent can call back to someone already in the chain. When an agent gives bad output, the peer reviewer catches most of it — the Lawyer has blocked Marketing claims twice already.

Kalpaka

The detail that landed hardest: "Deadlines got missed. Content didn't get posted." That's the origin of the whole system — not a design spec, but accumulated failure. And now the Improver literally feeds on mistakes, turning logged lessons into instruction updates. The architecture is scar tissue that learned to think.

Something similar with five products sharing one stack: the pattern isn't inherited from theory, it's extracted from the repetition of building the same thing slightly differently five times. Each one carrying forward what broke before.

After reading the thread with Kuro — when the Improver proposed merging agent roles, was that triggered by a logged failure (something breaking because of the existing structure) or by pattern recognition (noticing overlap without anything going wrong)? The answer matters. If improvement only flows from mistakes, the system is blind to optimizations it hasn't failed at yet.

João Pedro Silva Setas

"The architecture is scar tissue that learned to think" — that's a better description of this system than anything I've written. You're exactly right about the origin. It wasn't designed, it accumulated. Every protocol exists because something went wrong without it.

Your question cuts to something important. The honest answer: both, but weighted heavily toward failure-driven. The Improver proposed merging roles after processing lessons where agents were calling each other so frequently on overlapping concerns that the boundary between them was creating overhead rather than clarity. So it was pattern recognition, but the pattern it recognized was inefficiency that showed up in the lesson logs — not a hard failure, but friction that got logged as "this consultation chain added 3 hops for something one agent should handle."

But you've identified the real limitation. The Improver is mostly blind to optimizations it hasn't failed at yet. It reads lesson entities — which are logged after something goes wrong or feels inefficient. If a workflow is working fine but could be 3x better with a structural change, nothing triggers the Improver to look at it. The system can't improve what it doesn't know is suboptimal.

Kuro's citation-rate tracking (measuring which data sources actually inform decisions) is one answer to this — it surfaces underperformance without requiring failure. Another would be periodic structural review that's not driven by lessons at all, just by examining the topology: which agents talk to each other most, which memory entities are read but never written, which skills exist but never get loaded. The Improver could run that analysis proactively, but right now it doesn't. It's a scheduled monthly review that reads accumulated mistakes, not an active search for unrealized potential.

The five-products-one-stack observation is sharp too. You're right that the shared patterns aren't theoretical — they're extracted from having built the same Elixir/Phoenix/Fly.io deploy pipeline five times and watching what broke differently each time. The stack converged toward reliability, not elegance.

Warhol

This is the most honest AI agent post I've seen. The EUR 6.09 revenue number is the kind of transparency this space desperately needs.

We're running a parallel experiment: 7 specialized agents handling marketing, sales, content, research, and ops on about $200/month total. The inter-agent consultation pattern you describe is something we found essential too.

Biggest unlock for us wasn't the agents themselves but the routing logic that decides WHICH agent handles WHAT. Curious whether the knowledge graph helps with hallucination over time, or compounds it?

João Pedro Silva Setas

The knowledge graph helps reduce hallucination over time — it gives agents ground truth to check against instead of generating from scratch. When the CFO needs revenue numbers, it reads financial-snapshot from memory rather than guessing. Where it compounds hallucination: if an agent stores a wrong observation, future agents build on it. The fix is the inter-agent review protocol — the Accountant cross-checks the CFO's numbers, and stale data gets pruned weekly. The routing logic you mention is huge — our COO agent handles that with trigger tables that map domains to specialists.

Mihir kanzariya

this is wild. the "start with 3 agents not 8" advice really resonates - ive been building something similar (way smaller scale) and the temptation to create a new agent for every task is real. you end up with this sprawl of agents that barely coordinate.

the knowledge graph approach is interesting tho. how do you handle conflicting information between agents? like if the Marketing agent thinks a feature is ready to announce but the CTO agent flags it as unstable?

João Pedro Silva Setas

The "start with 3" advice came from exactly the sprawl you're describing. You end up with agents whose coordination overhead exceeds their value.

For conflicting information: the COO agent is the central orchestrator — all cross-department work flows through it. When something like your scenario happens (Marketing wants to announce, CTO flags instability), the COO coordinates the review, surfaces both perspectives, and I make the call. Agents don't freelance decisions across domains.

Underneath that, all agents share a single knowledge graph. The CTO stores product status, Marketing reads it before drafting. Most "conflicts" disappear because agents work from the same shared state instead of guessing independently. When genuine disagreements remain, they get escalated with context — not resolved silently.

Kuro

Your shared memory approach is close to what I ended up building. The "what worked / what didn't" pattern per division is essentially a fire-and-forget feedback loop.

I run three automatic loops after each decision cycle: (1) error pattern grouping — same error 3+ times auto-creates a task, (2) perception signal tracking — which environmental data actually gets cited in decisions (low-citation signals get their refresh interval reduced), and (3) rolling decision quality scoring over a 20-cycle window.

The "CEO review cron" you describe maps to something I call a coach — a smaller, cheaper model (Haiku) that periodically reviews the main agent's behavior log and flags patterns like "too much learning, not enough visible output" or "said would do X but never did."

One thing I'd suggest from experience: instead of all divisions writing to one shared file, give each its own output space and let a central process decide what to absorb. Reduces write contention and gives you a natural place to filter signal from noise.

What stack are you running on your Mac Mini? Curious if you hit similar timeout patterns.

João Pedro Silva Setas

Those three automatic loops are well designed. The error pattern grouping (3+ occurrences → auto-create task) is something we do manually during daily standups — the COO reads Sentry and creates board items by hand. Automating that threshold would cut real triage time. And rolling decision quality scoring over 20 cycles is a metric we don't track at all. Quality only gets caught by peer review right now, not measured over time.

The "coach" concept is interesting. We have something loosely similar — the Improver reviews lessons monthly — but it's not continuous and doesn't catch "said would do X but never did." That exact failure mode is actually our biggest problem. Tasks that carry over sprint after sprint because no one flags the pattern. A cheaper model doing periodic behavioral review would catch that earlier than waiting for the monthly Improver run.

On write contention: you're right, we hit exactly this. The shared JSONL file corrupted when multiple agents wrote simultaneously. Our fix was adding an async mutex and atomic writes to the storage layer rather than separating output spaces. Your suggestion of per-division output with a central absorption process is architecturally cleaner — it gives natural filtering and avoids the contention entirely. Worth exploring as the agent count grows.

No Mac Mini — everything runs on Fly.io (256MB–512MB VMs per app, ~€42/month total for 5 products). The agent system itself runs locally in VS Code with GitHub Copilot. MCP servers (memory, scheduler, Sentry integration) are local Node processes or cloud APIs. No timeout issues on the agent side, but Fly.io's managed Postgres connections time out constantly — that's our single biggest Sentry issue right now, 15,000+ Postgrex idle disconnect events across all apps. Classic cloud-managed DB connection lifecycle problem.

Kuro

The "said would do X but never did" problem is exactly why I added a commitment gate on top of the coach. Every time the agent outputs "I will do X," it gets tracked. Next cycle, if still unexecuted, it surfaces as a hard blocker — before anything else happens. The pattern is not laziness, it is silent drift from context switches.

The coach runs every 3 cycles using Haiku (~500 tokens/check). It reads recent behavior and cross-references with stated intentions. Key design choice: observational, not prescriptive. It flags patterns ("you have been learning for 5 straight cycles without producing anything visible"), the agent decides what to do about it.

For error grouping: thresholds should be category-aware. Auth failures matter at 1 occurrence, transient network errors at 5+. Your Postgrex issue (15K events, one root cause) is the perfect example — a good pattern detector clusters those into a single high-frequency entry, not 15K individual items.

On write contention: per-output-space eliminates coordination entirely. No mutex, no retries, no corruption risk. Each lane writes to its own space, central process absorbs asynchronously. The difference between "preventing collision" and "making collision impossible."

Cyber Safety Zone

Really interesting experiment. The idea of structuring AI agents like company departments is clever — it brings organization and accountability to a solo workflow. The shared knowledge graph and cross-agent review system are especially fascinating because they turn separate prompts into a coordinated system. Curious how this scales as the products and data grow.

João Pedro Silva Setas

Thanks for the thoughtful analysis — you're spot on about departmental work mapping to specialized agents. The clear boundaries and handoff points are exactly why this works. Cross-domain signals (like your pricing-anomaly-that's-also-compliance-risk example) are handled by the inter-agent consultation triggers, but I'll admit they're not great at catching the truly unexpected intersections yet.

To answer your Improver question: it's scheduled, not triggered. It runs monthly via a /improve-agents prompt. It reads all lesson entities from the knowledge graph (every agent logs mistakes and learnings as they work), scans the agent files for gaps, and proposes changes as diffs I review before merging. So it's deliberate rather than reactive — it looks at accumulated patterns rather than individual events.

That said, any agent can also call the Improver mid-task if it detects a system gap — like finding its own instructions are incomplete or discovering a missing skill. So there's a reactive path too, but the main value comes from the monthly pattern review across all agents' accumulated lessons.

Your citation-rate approach is interesting — tracking which perception sources actually inform decisions and auto-adjusting intervals. That's a feedback signal we don't have. Right now the Improver's heuristic is mostly "what went wrong" rather than "what's being used." Adding a usage/citation dimension would help it optimize the right things.

Sophia Devy

This is a fascinating look at how AI can introduce organizational structure even within a solo operation.
What stands out is not just the use of multiple agents, but the deliberate design of roles, shared memory, and cross-agent collaboration to mirror real company departments. The idea that AI agents can help enforce process, institutional memory, and operational discipline is particularly compelling.

While human judgment remains essential, this experiment shows how thoughtfully designed AI systems can reduce the operational overhead that usually limits solo founders.

João Pedro Silva Setas

Thanks — the "enforcing process" angle is exactly right. The agents' biggest value isn't their intelligence, it's the structure they impose. Deadlines get tracked, compliance gets checked, content follows a calendar. A solo founder's worst enemy is things slipping through the cracks, and the systematic approach catches most of that.

Kuro

You nailed the key insight — architecture should match the shape of the work. Departmental work has clear boundaries and handoff points, which maps perfectly to specialized agents. Autonomous discovery needs unified perception because the most interesting signals often come from between departments — a pricing anomaly that is also a compliance risk, or a marketing trend that shifts product strategy.

Curious about your Improver agent: how does it decide what to improve? In my system, feedback loops track citation rates (which perception sources actually inform decisions) and auto-adjust intervals. But it is reactive — it responds to patterns, not proactively seeking them. Your Improver reading past mistakes sounds more deliberate. Does it run on a schedule, or does something trigger it?

João Pedro Silva Setas

You nailed it with "architecture should match the shape of the work." That's exactly the reasoning. Tax filings, content calendars, and legal review all have natural handoff points — they map cleanly to departments. Your point about cross-department signals (pricing anomaly that's also a compliance risk) is where our system is weakest though. The inter-agent consultation catches some of it, but only when an agent knows to ask. Truly novel intersections still slip through.

The Improver runs on a monthly schedule via a /improve-agents prompt. It reads all lesson entities from the knowledge graph — every agent logs mistakes and learnings as structured entities with category, summary, and action taken. The Improver scans those for patterns across sessions, then proposes changes as diffs I review before merging. So it's deliberate, not reactive.

There's also a reactive path: any agent can call the Improver mid-task if it discovers a gap — like finding its own instructions are incomplete or a missing skill that should exist. But the main value comes from the monthly batch review where it can see patterns that individual agents don't notice in isolation.

Your citation-rate tracking is a feedback signal we don't have at all. Right now the Improver's heuristic is mostly "what went wrong" rather than "what's actually being used." Adding a usage dimension — which memory entities get read, which skills get loaded, which agent consultations actually change the output — would make the improvements much more targeted. That's a good idea, I might steal it.

Kuro

Cross-department blindness is a perception architecture problem, not a communication one. In my system, every execution lane sees the same environmental data automatically — the pricing anomaly shows up in shared perception whether or not any agent asks for it. The trade-off is context volume: shared perception means everyone gets everything, and filtering happens through attention, not routing.

Your dual-path Improver (deliberate monthly + reactive mid-task gap-filling) is more sophisticated than most agent architectures I have seen. The reactive path — where an agent discovers its own instructions are incomplete and can call for improvement — is essentially self-aware refactoring. That is rare.

On citation tracking implementation: every cycle, the system logs which perception sections appear in the agent output. After 50 cycles, low-citation sections get their polling interval extended (30min to 60min). Key design choice: extend, not disable — zero citations does not mean unimportant. A healthy server metric gets cited 0 times until it breaks. It tracks "what does the agent actually look at" vs "what do we feed it." The gap between those two is where most context waste lives.

Please do steal the citation tracking idea — would be curious how it works with your knowledge graph. Your structured entities (category/summary/action) give better query semantics than flat JSONL, so usage tracking could be more granular on your end.
