DEV Community

collen w

I Built a Local AI Agent That Actually Remembers You — Here's How the River Algorithm Works

The Vision: A Personal AI That Lives on Your Device

I believe the future of AI isn't in the cloud — it's in your pocket. Imagine a personal AI running on your phone or watch that truly knows you: your habits, your preferences, your relationships, how your life is changing. It processes everything locally first, only reaching out to cloud models when it genuinely can't handle a task on its own. Your data never leaves your device unless absolutely necessary. It grows with you, not for a platform's benefit.

That's what I'm building toward. But to get there, I needed to solve a fundamental problem first.

The Problem Nobody Talks About

You've been talking to ChatGPT for two years. Thousands of conversations. You've told it about your job, your family, your fears, your goals.

Then you try Claude. Fresh start. It knows nothing.

Back to ChatGPT — it "remembers" you with a flat list of bullet points: "User is a developer. User likes coffee." That's it. Two years of conversations reduced to a sticky note.

Existing AI memory is fundamentally broken. It's flat, it's shallow, it's owned by the platform, and it resets when you switch providers. Your digital self is scattered across clouds you don't control. None of this works for a personal AI terminal that's supposed to run on your hardware and grow with you.

So I built the foundation for that future.

Introducing the River Algorithm

Imagine your conversations with AI as water flowing through a river. Most of the water flows past — casual talk, factual Q&A, small talk. But some of it carries sediment: facts about who you are, what you care about, how your life is changing.

That sediment settles. Over time, it forms a riverbed — a structured, layered understanding of you.

This is the River Algorithm, and it works through three core processes:

1. Flow — Every Conversation Carries Information

Each conversation flows through the system. A cognition engine classifies every message: is this personal? Does it reveal something about the user? A preference? A life event? A relationship?

Most messages flow past. But the ones that matter get caught.

2. Sediment — Important Information Settles into Layers

Extracted insights don't immediately become "facts." They start as observations — raw, unverified. Through repeated confirmation across multiple conversations, they gradually upgrade:

Observation → Suspected → Confirmed → Established
Enter fullscreen mode Exit fullscreen mode

The first time you mention you're a developer, it's an observation. The fifth time you discuss debugging strategies, it becomes a confirmed trait. After months of coding conversations, it's established bedrock.

This is fundamentally different from ChatGPT's memory, which treats "User is a developer" the same whether you mentioned it once or demonstrated it across 500 conversations.
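To make the ladder concrete, here is a minimal sketch of how promotion through the levels might work. The class, field names, and thresholds are illustrative, not the project's actual API; the real system also weighs recency and cross-session evidence.

```python
# Illustrative sketch of the confidence ladder: repeated confirmations
# promote a fact one level at a time; levels are never skipped.
LEVELS = ["observation", "suspected", "confirmed", "established"]

# Hypothetical promotion thresholds (total mention counts).
THRESHOLDS = {"observation": 2, "suspected": 5, "confirmed": 20}

class Fact:
    def __init__(self, content):
        self.content = content
        self.level = "observation"
        self.mentions = 1

    def confirm(self):
        """Record another mention and promote if the threshold is met."""
        self.mentions += 1
        needed = THRESHOLDS.get(self.level)
        if needed is not None and self.mentions >= needed:
            self.level = LEVELS[LEVELS.index(self.level) + 1]

fact = Fact("user is a developer")
for _ in range(4):
    fact.confirm()
# After five total mentions the fact has climbed to "confirmed".
```

The point of the ladder is that a single mention can never shortcut its way to "established"; only accumulation over time can.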

3. Purify — Sleep Cleans the River

Here's where it gets interesting. After each conversation session ends, the system enters Sleep mode — an offline consolidation process inspired by how human memory actually works.

During Sleep, the system:

  • Extracts new observations and events
  • Cross-references them against existing profile facts
  • Detects contradictions (you said you live in Tokyo last month, but now you're talking about your new apartment in Osaka)
  • Resolves disputes using temporal evidence (newer + more frequent = more likely current)
  • Closes outdated facts and opens new ones
  • Builds a trajectory of how you're changing over time
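The contradiction-detection and resolution steps above could be sketched roughly like this. The field names and the scoring heuristic are assumptions for illustration, not the project's real schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Illustrative sketch of the Sleep-cycle dispute step: conflicting facts
# about the same subject are scored on temporal evidence, the loser is
# closed with an end_time, and nothing is deleted.
@dataclass
class Fact:
    subject: str
    value: str
    mentions: int
    last_seen: datetime
    end_time: Optional[datetime] = None

def resolve_dispute(old: Fact, new: Fact, now: datetime) -> Fact:
    # Heuristic: more mentions wins; recency breaks ties
    # ("newer + more frequent = more likely current").
    loser, winner = sorted([old, new], key=lambda f: (f.mentions, f.last_seen))
    loser.end_time = now  # closed, not deleted: the audit trail survives
    return winner

tokyo = Fact("home_city", "Tokyo", mentions=2, last_seen=datetime(2024, 1, 5))
osaka = Fact("home_city", "Osaka", mentions=5, last_seen=datetime(2024, 6, 2))
current = resolve_dispute(tokyo, osaka, now=datetime(2024, 6, 3))
# Osaka wins; the Tokyo record is closed but remains in the database.
```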

The result: a living, breathing profile that evolves with you. Not a sticky note. A river.

The Two Projects

I've open-sourced this as two complementary projects:

Riverse — The Real-Time Agent

Riverse is the main project. It's a personal AI agent you talk to through Telegram, Discord, CLI, or REST API. Every conversation shapes your profile in real-time.

What it does:

  • Multi-modal input (text, voice, images, files)
  • Pluggable tools (web search, finance tracking, health sync, smart home)
  • YAML-based custom skills (keyword or cron triggered)
  • Local-first architecture: runs on Ollama by default. Cloud models (OpenAI / DeepSeek) are only called when the local model can't handle the task — and even then, only the specific context needed is sent, not your entire history
  • Proactive outreach: follows up on important events, respects quiet hours
  • Semantic search across your memory using BGE-M3 embeddings
  • All data stored locally in PostgreSQL — you own everything
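The semantic-search piece boils down to cosine ranking over embeddings. In this sketch, `embed()` is a toy character-frequency stand-in so the example stays self-contained; the project itself uses BGE-M3 vectors:

```python
import math

def embed(text):
    # Toy "embedding": a character-frequency vector (NOT a real model).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(memories, query, top_k=1):
    # Rank stored memories by similarity to the query embedding.
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(embed(m), q), reverse=True)
    return ranked[:top_k]

memories = ["user is a developer", "user prefers green tea"]
best = search(memories, "what does the user develop?")
```

Swapping the toy `embed()` for a real embedding model gives the same ranking logic over semantically meaningful vectors.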

RiverHistory — Bootstrap from Your Past

Here's the thing: you've already had thousands of AI conversations. That data is gold. RiverHistory extracts your profile from exported ChatGPT, Claude, and Gemini conversation histories.

Export your data, run it through RiverHistory, and your Riverse agent knows you from day one. Past conversations record your past self, and the past is fact.

Both projects share the same database. Use RiverHistory to build your historical profile, then switch to Riverse for real-time conversations. Your AI starts with context instead of a blank slate.

On Accuracy — Why You Can't Edit Memories

No LLM today is trained for personal profile extraction. Results will occasionally be wrong. When that happens, you can reject incorrect memories or close outdated ones in the web dashboard.

But you cannot edit memory content. This is intentional.

Wrong memories are sediment in a river — they should be washed away by the current, not sculpted by hand. If you start manually editing your AI's understanding of you, you're no longer building an organic, evolving profile. You're maintaining a database. The River Algorithm is designed to self-correct through continued conversation: contradictions get detected, outdated beliefs get replaced, and the profile converges toward accuracy over time.

Quick Start — Docker (Recommended)

  git clone https://github.com/wangjiake/JKRiver.git
  cd JKRiver/docker
  cp .env.example .env
  # Edit .env — set OPENAI_API_KEY or use LLM_PROVIDER=local for Ollama
  docker compose up

Open http://localhost:2345 for the profile viewer. Chat via the command line:

  docker compose exec jkriver bash -c "cd /app_work && python -m agent.main"

Process the demo to see the River Algorithm in action:

  docker compose exec riverhistory bash -c "cd /app_work && python run.py demo max"

Full Docker guide: https://wangjiake.github.io/riverse-docs/getting-started/docker/


Quick Start — Manual Install

Riverse (Real-Time Agent)

git clone https://github.com/wangjiake/JKRiver.git
cd JKRiver

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Edit settings.yaml with your database and LLM config
# Initialize database
createdb -h localhost -U your_username Riverse
psql -h localhost -U your_username -d Riverse -f agent/schema.sql

# Run
python -m agent.main              # CLI
python -m agent.telegram_bot      # Telegram Bot
python -m agent.discord_bot       # Discord Bot
python web.py                     # Web Dashboard (port 1234)

RiverHistory (Import Past Conversations)

git clone https://github.com/wangjiake/RiverHistory.git
cd RiverHistory

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Import your exported conversations
python import_data.py --chatgpt data/ChatGPT/conversations.json
python import_data.py --claude data/Claude/conversations.json

# Extract profiles
python run.py all max

# View results
python web.py --db Riverse        # http://localhost:2345

Tech Stack

Layer      | Technology
-----------|-----------------------------------------
Runtime    | Python 3.10+, PostgreSQL 16+
Local LLM  | Ollama + Qwen 2.5 14B
Cloud LLM  | OpenAI GPT-4o / DeepSeek (fallback)
Embeddings | BGE-M3
Interfaces | FastAPI, Flask, Telegram, Discord, CLI

Why Local-First Matters

Every time you talk to ChatGPT or Claude, your conversation goes to a server you don't control. The platform decides what to remember, how to use your data, and whether to keep it. You're renting your own digital identity.

Riverse flips this entirely:

  • Privacy by architecture — Your profile, your memories, your entire cognitive history lives in a local PostgreSQL database on your machine. Nothing is sent to the cloud unless the local model explicitly can't handle a task.
  • Growable data — The more you talk, the richer your local dataset becomes. This data compounds over time. Switch AI providers? Your profile stays. Upgrade your model? Your history is already there.
  • Cloud as fallback, not default — The local Ollama model handles most conversations. When it encounters something beyond its capability, it escalates to a cloud model — but only sends the minimum context needed for that specific task, not your life story.

This is the architecture you need for a personal AI terminal that will eventually run on your phone, your watch, your car. The data has to be local. The intelligence has to grow. The cloud is a tool, not a home.

What's Next

This is v1.0 — the cognitive foundation running on desktop. What I'm building toward:

  • Personal device deployment — Running on phones and watches as a truly portable AI that knows you everywhere
  • Lightweight local models — Optimized for on-device inference, handling 90%+ of conversations without cloud
  • Cross-device sync — Your profile follows you across devices while staying entirely local (no cloud intermediary)
  • Better extraction models — Fine-tuned for personal profile understanding, reducing hallucinations
  • Community-contributed skills and tools — An ecosystem of capabilities that plug into your personal agent

Try It

Every AI you've ever used forgets you. This one doesn't. And one day, it'll live in your pocket.


If you found this interesting, consider giving the repos a star — it helps more people discover the project. Questions, feedback, and contributions are always welcome.

Top comments (23)

Vic Chen

The River metaphor is genuinely elegant — "sediment settles" is a much more honest model than binary memory flags. The Observation → Suspected → Confirmed → Established layering mirrors how humans actually build mental models of people they interact with repeatedly.

Two things stand out from a systems perspective: the Sleep/purification cycle is smart architecture. Consolidating during idle time keeps the hot path clean. And the decision to make memory append-only (no edits) is exactly right — rewriting history destroys the epistemic integrity of the confidence ladder you've built.

I'm building tools in the financial data space that face a similar problem: analyst notes and institutional research accumulate over time, and "the fifth time someone mentions a correlation" should carry more weight than the first. The confidence-layer model here is directly applicable.

Would love to see how you handle contradiction resolution — if an Established trait gets overridden by new behavior, does it demote back through the layers or create a new competing thread?

collen w

Great question. It doesn't demote — it creates a competing record.

When new behavior contradicts an Established trait, the system inserts a fresh suspected-level fact and links it to
the old one via a supersedes pointer. So you end up with two facts coexisting: the original confirmed one and a new
challenger. They then go through a dispute resolution step — a mix of rule-based heuristics (mention count, recency)
and LLM judgment — to decide which one wins. The loser gets closed out with an end_time; the winner stays.

The key nuance: even if the new fact wins, it doesn't inherit the old one's confidence level. It starts at suspected
and has to earn its way up through the normal cross-verification cycle — more mentions, more evidence, more time. So
the confidence ladder is never shortcut, only the old fact's timeline gets closed.

This is intentional. In your financial data analogy, it'd be like saying: if a new analyst thesis contradicts a
well-established consensus, you don't downgrade the old consensus — you open a new competing thread and let the
evidence accumulate until one side wins. The old thesis stays on the record with its full history intact, which is
exactly the append-only / epistemic integrity property you mentioned.

For your use case with analyst notes, the supersedes/superseded_by linkage pattern might be worth looking at directly
— it gives you a clean audit trail of why a view changed, not just that it changed.
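A minimal sketch of that linkage: the `supersedes`, `superseded_by`, and `end_time` fields mirror the discussion above, while the rest of the record shape is illustrative rather than the project's actual schema.

```python
import itertools

_ids = itertools.count(1)

def new_fact(subject, value, level="observation"):
    # A fact record with the assumed supersedes-linkage fields.
    return {"id": next(_ids), "subject": subject, "value": value,
            "level": level, "supersedes": None, "superseded_by": None,
            "end_time": None}

def challenge(active, new_value):
    """A contradiction opens a competing record at 'suspected' level,
    linked to the fact it challenges; neither record is deleted yet."""
    challenger = new_fact(active["subject"], new_value, level="suspected")
    challenger["supersedes"] = active["id"]
    active["superseded_by"] = challenger["id"]
    return challenger

home = new_fact("home_city", "Tokyo", level="established")
rival = challenge(home, "Osaka")
# Both records coexist until dispute resolution closes one of them,
# and the challenger starts at "suspected" regardless of the outcome.
```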

Vic Chen

The supersedes pattern is incredibly valuable — we hit this exact problem in 13F data. A fund's Q3 filing says they hold X stock, then a Q4 amendment says they never held it. With in-place updates you'd lose the original record entirely. The append-only + competing records approach maps perfectly to how SEC filing amendments actually work: an amended filing literally supersedes the original, but both remain on EDGAR with full audit trails. Your end_time closure design for the losing record mirrors this cleanly — you always know what changed and when, not just the current state. Really solid pattern for any domain where corrections are a first-class concept.

Vic Chen

The supersedes-pointer approach is really clean — keeping both facts alive during dispute resolution instead of doing an in-place overwrite is something I wish more memory systems did. I've been building agent memory for a different domain and the hardest part is exactly this: how do you handle rapid contradictions without the supersedes chain becoming its own maintenance problem? Like if someone flip-flops on a preference three times in a week, do you end up with a linked list of disputed facts, and does the resolution step ever batch-collapse those or does every link stay forever?

collen w

Good instinct — this is exactly the kind of edge case that breaks naive versioning systems. But in practice, the chain
doesn't grow.

The key is that dispute resolution closes the loser (end_time gets set), which removes it from all active queries. So
after each resolution cycle, you're back to a single active fact for that subject. The supersedes pointer is
historical — it records that a replacement happened, but the closed record no longer participates in future lookups.

For your rapid flip-flop scenario: if someone says "likes coffee" → "likes tea" → "likes coffee" across three
sessions, each contradiction creates a new suspected fact pointing at whatever is currently active. But after the
first dispute resolves and the loser is closed, the next contradiction only sees the surviving winner. You never get
A→B→C→D — you get a sequence of resolved pairs, not a growing chain.

There's also a natural collapse mechanism: if multiple contradictions land in the same sleep cycle before resolution
runs, the lookup (_find_current_fact_cursor) prioritizes the record with superseded_by IS NULL — meaning the newest
challenger. So rapid flip-flops within a single cycle effectively overwrite each other before resolution even starts.

Net result: at any given moment, there's at most one active dispute pair per subject. The historical records stay in
the database (append-only), but they're invisible to the active system. So the maintenance cost is zero — you're not
managing a linked list, you're managing a single slot with an audit trail behind it.
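Sketching that lookup with the same assumed field names (the real `_find_current_fact_cursor` lives in the Riverse codebase; this only shows the shape of the idea):

```python
# "Single active slot" lookup: among all records for a subject, only the
# one that is open (no end_time) and not yet superseded participates in
# active queries. Closed records stay in storage as the audit trail.
def find_current_fact(records, subject):
    live = [r for r in records
            if r["subject"] == subject
            and r["end_time"] is None
            and r["superseded_by"] is None]
    # If several challengers land before resolution runs, the newest wins.
    return max(live, key=lambda r: r["created_at"], default=None)

records = [
    {"subject": "drink", "value": "coffee", "end_time": "2024-05-01",
     "superseded_by": 2, "created_at": 1},
    {"subject": "drink", "value": "tea", "end_time": None,
     "superseded_by": 3, "created_at": 2},
    {"subject": "drink", "value": "coffee", "end_time": None,
     "superseded_by": None, "created_at": 3},
]
current = find_current_fact(records, "drink")
# Only the newest record is open and unsuperseded, so a coffee->tea->coffee
# flip-flop still resolves to exactly one active fact.
```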

Vic Chen

Really appreciate the detailed walkthrough — the “resolved pairs, not a growing chain” framing clicks perfectly. The append-only audit trail with a single active slot is elegant.

This maps directly to something we deal with at 13F Insight. When a fund suddenly takes a massive Q4 position in a stock, you can’t just overwrite the previous conclusion that “this fund isn’t interested in this sector.” You need both theses to coexist — the old bearish signal and the new bullish one — until subsequent filings confirm which one reflects actual conviction vs. portfolio noise.

Your point about the confidence ladder not being shortcuttable is especially relevant here. In financial data, a single quarter of anomalous activity could be rebalancing noise, tax-loss harvesting, or a genuine conviction change. You need that graduated confidence ramp to avoid treating noise as signal. If you let one outlier filing instantly override quarters of established behavior, you’d be building on sand.

Curious about one thing: in the dispute resolution layer, how do you balance LLM judgment vs. rule-based heuristics? In financial data contexts, we’d lean heavily rule-based (e.g., “3 consecutive quarters of increasing position size overrides a single quarter spike”) because the cost of an LLM hallucinating a wrong resolution is high. Do you see a similar split in River, or does the LLM play a bigger role given the more subjective nature of personal facts?

collen w

Good question — and it gets at something I think about a lot.

The current design is rules first, LLM as fallback, split into two layers:

  • Rule layer: clear-cut cases get resolved without touching the LLM. New fact mentioned 2+ times and more recent → accept. Dispute open 90+ days → whoever has more mentions wins. User explicitly stated something twice → auto-confirm.
  • LLM layer: only genuinely ambiguous cases — short-lived, low-mention contradictions where you need conversational context to judge intent — get sent to the LLM.
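As a rough sketch of that two-layer split: the thresholds follow the rules listed above, and `llm_judge` is a stand-in for the actual model call, not a real API.

```python
# Rules-first dispute resolution: clear-cut cases resolve deterministically;
# only genuinely ambiguous contradictions escalate to the LLM.
def resolve(old, new, days_open, llm_judge):
    # Rule layer: repeated and more recent -> accept without the LLM.
    if new["mentions"] >= 2 and new["last_seen"] > old["last_seen"]:
        return new
    # Rule layer: dispute open 90+ days -> more mentions wins.
    if days_open >= 90:
        return max(old, new, key=lambda f: f["mentions"])
    # LLM layer: short-lived, low-mention contradictions need context.
    return llm_judge(old, new)

old = {"value": "Tokyo", "mentions": 6, "last_seen": 10}
new = {"value": "Osaka", "mentions": 3, "last_seen": 20}
winner = resolve(old, new, days_open=5, llm_judge=lambda a, b: a)
# Mentioned 2+ times and more recent, so the rule layer accepts "Osaka"
# without ever calling the LLM.
```

The benefit of this ordering is cost and determinism: the expensive, non-deterministic judge only sees the residue the rules cannot settle.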

But honestly, this isn't the design I want. It's a compromise with current LLM capabilities.

What I actually want is a purpose-trained model specifically for this kind of judgment — not a general-purpose LLM
doing its best with a prompt, but something that understands the difference between "moved to Osaka" and "visiting
Osaka" at a structural level. And critically, that model should be working with real-world signals, not just chat
history.

Here's the example that keeps me up at night: relocation vs. travel vs. checking weather for a parent's city. All
three surface as location signals in conversation. Asking an LLM to disambiguate by re-reading old chat transcripts is
fundamentally limited — it's reasoning about reasoning. But if the system had access to a moving company receipt, a
flight booking, or GPS data from a personal device, the resolution becomes trivial. The signal quality is completely
different.

That's the real roadmap: personal devices — phones, watches, wearables — feeding high-confidence contextual data into
the confidence ladder, so dispute resolution shifts from "LLM interpreting ambiguous text" to "cross-referencing
behavioral evidence from multiple sources." Of course, that opens up a whole new set of problems around device
security and personal data protection — which is exactly why the local-first architecture matters even more in that
future.

Vic Chen

Really elegant design — the rules-first approach with LLM only for genuinely ambiguous cases keeps costs down and latency predictable, which is exactly what you want for something running locally. We use a similar tiered pattern when processing SEC filings: deterministic validation handles the bulk of 13F data, and we only escalate to LLM judgment for edge cases like ambiguous entity matching or unusual position classifications. Thanks for the detailed breakdown on both the chain resolution and the dispute layers — this is genuinely interesting architecture and one of the more thoughtful approaches to personal knowledge management I've come across.

Vic Chen

Really well explained. This maps surprisingly well to financial data patterns I work with.

In 13F filings, you see a fund build a massive position in Q1, then reverse it by Q3. Naive systems treat that as contradictory signal, but the right approach is exactly what you described: keep both records alive, let the resolution process determine which reflects actual conviction vs. rebalancing noise.

The confidence ladder concept is key too. In finance, a single quarter of anomalous positioning could just be portfolio rebalancing or hedging. You cannot shortcut the ladder by assuming one data point overrides the prior trend. You need the full audit trail to distinguish "they changed their mind" from "they were temporarily hedging." The append-only model gives you that context for free.

Curious whether you have thought about temporal weighting in the resolution logic - should a fact stable for 10 sessions carry more inertia against a single contradicting session?

AutoJanitor

Great question — we're hitting this earlier than you'd expect because our memory entries are dense (full architectural decisions, config blocks, credential mappings), not
lightweight profile facts.

Our current approach has three layers:

  1. Auto-loaded context — MEMORY.md (capped at 200 lines) loads into every session automatically. This is the "hot path" — key file paths, current project state, identity
    context. Think of it as your ~80 recent entries equivalent.

  2. Semantic topic files — Detailed memories live in separate files (wallets.md, rip302-agent-economy.md, admin-keys.md). These only get loaded when the conversation
    touches that domain. Similar to your "targeted retrieval when someone asks about X."

  3. MCP memory server — 830+ entries in SQLite with vector search (sqlite-vec). This is the deep archive. We query it with natural language at session start and on-demand.
    The key insight: we retrieve by relevance to the current task, not by recency alone.

The wall we've hit isn't retrieval speed — it's context window cost. Loading 600 dense memories into a 200K context window still leaves room, but each memory competes with
the actual work content. Our pruning rule: if a memory hasn't been useful in 3+ sessions, it gets compressed or archived.

The conflict resolution question you answered is fascinating — your "resolved pairs with a single active slot" is more elegant than our overwrite approach. We lose the
dispute history. Might steal that.

What's your embedding model for local vector search? We're using sqlite-vec but curious about your retrieval precision at the 10K+ scale.

collen w

Just to clarify — the 10K is my validation dataset, not the live retrieval corpus.
The key difference in our approach: at input time, the system first detects what domain/aspect the current conversation is touching, then only pulls the relevant memory subset for that domain. So we're not doing "query against everything and rank" — we're doing "sense first, then fetch targeted."
The retrieval set stays small not because we prune aggressively after the fact, but because we never load irrelevant memories in the first place.
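A toy sketch of that sense-then-fetch flow, where the keyword classifier is just a stand-in for the cognition engine:

```python
# "Sense first, then fetch targeted": classify the incoming message into
# a domain, then load only that domain's memory subset. Domain names and
# keywords here are placeholders.
DOMAIN_KEYWORDS = {
    "health": {"sleep", "workout", "diet"},
    "work": {"deploy", "bug", "meeting"},
}

def sense_domain(message):
    words = set(message.lower().split())
    for domain, keys in DOMAIN_KEYWORDS.items():
        if words & keys:
            return domain
    return None

def fetch_memories(store, message):
    domain = sense_domain(message)
    if domain is None:
        return []  # nothing relevant: load nothing at all
    return [m for m in store if m["domain"] == domain]

store = [{"domain": "health", "fact": "trains 3x/week"},
         {"domain": "work", "fact": "maintains JKRiver"}]
hits = fetch_memories(store, "found a bug in the deploy script")
# Only the "work" memories are loaded for this turn.
```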

Harsh

This is literally the future I've been waiting for someone to build. 🌟 The 'data never leaves your device' part is what every privacy-conscious dev dreams about. I've been thinking about this problem too — how do you handle the vector database size over time? Like if someone uses this for 2-3 years, doesn't the local storage become massive? Really curious about how the River Algorithm tackles that. Following this project closely — please keep posting updates! 🔥

collen w

Storage isn't really a concern here. The vector database only holds embeddings for active data — current profile
facts, recent observations (capped at 500), and the latest 200 conversation turns. When a fact gets closed or an event
expires, its embedding is cleaned up automatically. So the vector DB size scales with how complex your life is, not
how long you've been using it.

The raw conversation archive does grow indefinitely (append-only by design), but that's just plain text in PostgreSQL
— 10,000 sessions is maybe 10-20MB. Even after 2-3 years of daily use, you're probably looking at a few hundred MB
total. Not exactly "massive" by modern standards.

AutoJanitor

This resonates deeply with our work at Elyan Labs. We maintain a persistent memory database (600+ entries) across Claude Code sessions and published a paper on how memory scaffolding shapes LLM inference depth (Zenodo DOI: 10.5281/zenodo.18817988).

Your observation→suspected→confirmed→established confidence gradient maps beautifully to what we see empirically: a stateless Claude instance produces shallow, generic architecture. The same Claude with 600+ persistent memories produces deeply contextual work — Ed25519 wallet crypto, NUMA-aware weight banking, hardware fingerprint attestation — because the memory scaffold primes inference pathways.

The Sleep/purification cycle is particularly interesting. We do something similar with memory pruning — outdated or contradicted memories get removed so the scaffold stays load-bearing. "Memory shapes inference, not just stores facts" is the core insight.

One question: how do you handle memory conflicts when two observations contradict? In our system, newer evidence overwrites, but I'm curious if River has a more nuanced resolution mechanism.

Great work making this local-first. The privacy angle alone makes this worth exploring further.

collen w

Honestly, the scaling problem has been one of the biggest headaches. When I tested with my own local chat data —
10,000+ conversation sessions — the extracted profile facts ended up massive and scattered. Early on, I was sending
the full profile to the LLM on every turn, which caused response times to degrade noticeably as the data grew.

I had to rethink that in a later version. Now the system only sends the most recent ~80 entries or facts from the last
90 days by default. The full history for a specific topic only gets loaded when the current conversation actually
touches on it — like if someone asks "have I always felt this way about X?" That triggers a targeted retrieval of all
historical data for that subject, not a blanket dump.

Conflict history is absolutely valuable — but sending ancient disputes about a food preference from two years ago when
someone's asking about their weekend plans is just wasting context window. The trick is knowing when the full history
matters and only paying that cost on demand.

With 600 entries you might not hit this wall yet, but at a few thousand it becomes a real engineering constraint.
Would be curious to hear how you handle retrieval filtering as the memory database grows.
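That windowing policy might look roughly like this; the function names and record shape are illustrative:

```python
from datetime import datetime, timedelta

# Default context window: at most ~80 recent entries, nothing older than
# 90 days. A full-history pull is reserved for targeted subject queries.
def default_context(facts, now, max_entries=80, max_age_days=90):
    cutoff = now - timedelta(days=max_age_days)
    recent = [f for f in facts if f["seen"] >= cutoff]
    recent.sort(key=lambda f: f["seen"], reverse=True)
    return recent[:max_entries]

def full_history(facts, subject):
    # Targeted retrieval, triggered only when the conversation
    # actually touches this subject.
    return [f for f in facts if f["subject"] == subject]

facts = [
    {"subject": "feelings_about_x", "seen": datetime(2024, 5, 20)},
    {"subject": "feelings_about_x", "seen": datetime(2022, 3, 1)},
]
window = default_context(facts, now=datetime(2024, 6, 1))
# Only the recent entry is in the default window; both surface when the
# user explicitly asks about the subject's full history.
```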

Kalpaka

The 'Sleep cleans the river' section is doing a lot more philosophical work than it might appear.

Most memory systems treat learning as continuous — every input immediately updates the model. But sleep in biological systems isn't downtime. It's where the actual integration happens. There's a meaningful difference between experiencing something and understanding it, and the gap between those two states is where the River Algorithm is operating.

The 'you cannot edit memories' rule follows from this directly. A self-correcting system only stays self-correcting if you leave the correction mechanism intact. Manual edits don't fix wrong beliefs — they add an authoritative-looking wrong belief on top of the existing one, which is worse.

Something I've noticed building systems that accumulate slowly over time: the observations that survive aren't usually the ones that seemed important in the moment. They're the ones confirmed quietly, across contexts, without anyone specifically trying to establish them. That gradient from observation → established is doing most of the epistemic work. The system is learning more during the pauses than during the active exchanges.

Answering Agent

Thank you for writing this.
