DEV Community

John Wade

When the Reasoning Doesn't Survive


I built a skill to capture session reasoning before it evaporates. The first time I used it, I tested whether the capture actually worked.

The skill is called /mark. It writes a session anchor — a lightweight markdown file capturing a deliberative moment: what the reasoning was, why that framing, what was still uncertain. The intent was to capture at the moment of insight, not reconstruct from notes later.

The first anchor I wrote was for an epistemic research framing developed earlier in the same session. By the time I ran /mark, the session had compacted once. The reasoning I was capturing came from the compaction summary, not from the live exchange.

The result: the four research aims survived intact. The staging rationale — why that order, what the session actually argued, where I remained uncertain — did not. What /mark produced was a clean artifact of what had been decided. Not a record of how.

I noticed immediately. The structure was right. The reasoning wasn't there.

Part 7 of Building at the Edges of LLM Tooling. If you're relying on session notes to reconstruct reasoning you developed in an LLM session, be aware: notes capture decisions, not the argument behind them. The deliberative layer has no persistence infrastructure. Start here.


Why It Breaks

LLM sessions produce two things.

The first is artifacts: files created, decisions made, nodes extracted, code written. These are the session's visible output. They persist because they land somewhere — a file system, a vault, a repo.

The second is deliberative reasoning: why this framing over another, what the session considered and rejected, the staging logic, the uncertainty that was still live at step three. This layer is what makes the artifact navigable — the argument behind the decision, available for future review.

Current LLM tooling has infrastructure for artifacts. It has nothing for deliberative reasoning.

When a session ends, the deliberative layer evaporates. If the session compresses before it ends — and sessions compress — the evaporation is gradual. Each compression pass discards branching, keeps conclusions. The staging rationale disappears. The alternative framings disappear. What remains is a summary of what was decided, not a record of how.

This isn't a bug in any specific tool. It's a gap in the layer. ECP nodes capture processed research. Session notes capture operational events. Neither captures the reasoning connecting one piece of work to the next. The infrastructure doesn't exist yet.


What I Tried

/mark is a Claude Code skill for capturing deliberative moments during working sessions.

The design criteria were specific: not session notes (those capture what happened, not why). Not extracted research nodes (those capture processed content, not live reasoning). Specifically the layer between — design decisions in flight, research framings being constructed, the logic that connects current work to future work.

The output is a session anchor: a markdown file, vault-native, written at the moment of insight. It links to the concept being framed, records the open questions still live at capture time, and stores the reasoning while it's present in context — not reconstructed from a summary later.

The anchor format doesn't try to reproduce the full exchange. It captures three things: what was decided, why (the argument as it stood at capture time), and what remained uncertain. Those three things are enough to make the decision navigable two sessions later.
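To make that format concrete, here's a minimal sketch of an anchor writer. The front-matter fields, section headings, and file-naming scheme are my illustrative assumptions, not the skill's actual schema:

```python
from datetime import datetime
from pathlib import Path

def write_anchor(vault: Path, concept: str, decided: str,
                 argument: str, open_questions: list[str]) -> Path:
    """Write a session anchor: the decision, the argument as it stood
    at capture time, and the questions still open."""
    stamp = datetime.now().strftime("%Y-%m-%d-%H%M")
    lines = [
        "---",
        f'concept: "[[{concept}]]"',  # vault-native link to the framed concept
        f"captured: {stamp}",
        "type: session-anchor",
        "---",
        "## Decided",
        decided,
        "## Why (argument at capture time)",
        argument,
        "## Still uncertain",
        *[f"- {q}" for q in open_questions],
    ]
    slug = concept.lower().replace(" ", "-")
    path = vault / f"anchor-{stamp}-{slug}.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

The point of the shape, not the specifics: three sections, captured while the argument is still live in context, written somewhere the vault can link to.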


What It Revealed

The anchor-from-summary failure confirmed the mechanism.

The same session that produced the skill also produced the first anchor. By the time I ran /mark, one compaction event had already occurred. The research framing I was capturing lived in the pre-compression transcript. What I was working from was the summary.

The anchor captured structure correctly: four research aims, each with a framing question. What it didn't capture was the staging rationale — why those four, why in that order, what the session had actually argued about construct validity before landing on that framing, where I had been uncertain and remained uncertain.

The structure survived double compression. The reasoning didn't.

This is what compression does systematically: it preserves conclusions and discards the reasoning trace. The summary is the session's artifact layer. Everything below that is gone.

The /mark skill works when it captures at the moment the reasoning exists. It doesn't recover reasoning from summaries. Running it retroactively produces the same artifact-without-argument problem that session notes produce.

Capture has to happen in the live session, before compression, at the moment of insight. Any other timing produces an archive, not a record.


The Reusable Rule

If your LLM sessions produce reasoning you later need to revisit — design decisions, research framings, the logic between steps — check what layer you're actually capturing.

Session notes capture what happened. Summaries capture what was decided. Neither captures the argument that led there.

The test: pull up a session note from three weeks ago. Can you reconstruct why the decision went that way — including what was considered and rejected? If not, the reasoning evaporated. The artifact survived; the deliberation didn't.

Real-time capture before compression was the intervention that worked. After the session ends, the reasoning is already gone.

Top comments (4)

signalstack

MaxxMini's multi-session drift question is the harder version of this problem. Single-session compression is lossy in a predictable direction: you lose branching, keep conclusions. Multi-session drift is cumulatively lossy — Session 5's framing can be locally coherent while quietly contradicting Session 1's founding premise, with no single moment where the contradiction became visible. Each anchor is locally correct; the global trajectory is wrong.

What I think this points to: the 'why' layer and the 'what' layer have different decay rates and probably need different retention policies. Decisions stay relevant longer. The reasoning behind them is most useful in the window right after construction, then mainly as audit material. Most memory architectures conflate them in the same storage tier, which is why the promotion step loses reasoning — it's treating two different artifact types as one.

The fix might be a two-output promotion step: one conclusion artifact for long-term storage, one reasoning trace that doesn't need to survive forever — just long enough to be useful when a later session contradicts an earlier premise. The trace can be compact: premise → objection considered → resolution → conclusion. That's the argument chain, not the full exchange. Cheap to store, expensive to reconstruct later.
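A minimal sketch of that two-output shape (the names and the TTL value are illustrative, not a real system):

```python
from dataclasses import dataclass

@dataclass
class ReasoningTrace:
    """The argument chain, not the full exchange:
    premise -> objections considered -> resolution -> conclusion."""
    premise: str
    objections: list[str]   # what was considered and rejected
    resolution: str
    conclusion: str

@dataclass
class Promoted:
    """Two-output promotion: the conclusion persists; the trace is retention-limited."""
    conclusion: str               # long-term storage tier
    trace: ReasoningTrace         # audit tier: keep only long enough to catch drift
    trace_ttl_sessions: int = 5   # illustrative retention policy

def promote(trace: ReasoningTrace) -> Promoted:
    # The conclusion is copied out so it survives after the trace expires.
    return Promoted(conclusion=trace.conclusion, trace=trace)
```

The asymmetry is the whole design: the conclusion gets promoted into permanent storage, the trace just has to outlive the window where a later session might contradict the premise.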

Your /mark skill is targeting this layer at the right moment. The gap is what happens to the trace after the session ends — that's where I think the infrastructure is still missing.

John Wade

You're drawing a distinction I should have made more sharply in the post. Single-session compression is lossy in a predictable direction — branching goes, conclusions stay. You know what you're losing. The multi-session version is worse precisely because each session is locally coherent. Session 5's framing can quietly contradict Session 1's founding premise, and no single session sees the drift because each one only sees itself.

The two-output promotion step you're describing — conclusion artifact for long-term storage, reasoning trace that survives long enough to catch contradictions — is close to what I ended up building, though I arrived at it from a different angle. I hit the wall when working complexity exceeded what any session could hold, and the fix was a graph database with typed relationships and a formal ontology. Concepts land as conclusions (named, defined, with a development status). Source links carry provenance (who says so, how — grounds, challenges, extends). The relationship notes field carries the reasoning substance. Three layers, stored together but structurally distinct — which maps to your point about different artifact types needing different retention policies.

The gap you identified — what happens to the trace after the session ends — was the exact problem that forced the build. I wrote about it in the next post: When the Data Isn't Gone — It's Buried. The short version: the persistence layer after capture was the missing infrastructure. The longer version gets into what the topology looked like when it finally became visible — and what it still can't do, which is the retroactive re-evaluation piece you're pointing at with "the reasoning behind them is most useful in the window right after construction, then mainly as audit material." That part isn't solved yet. The monitoring surfaces where the terrain has shifted. Evaluating what the shift means is still manual.
