
Two weeks of autonomous AI memory: what actually happened.

Published May 30, 2026

The setup

A local LLM running as an autonomous curator. 38 active memories, 14 agent definitions, 4 businesses' worth of context. The curator runs every 15 minutes with no human oversight. The constitution plus git history is the safety net. This is what happened.

Week 1: the ugly truth

Day 1, first five hours: 5 false positives. The model was flagging cross-references between memories that shared tags like "website" or "marketing", which is almost everything when you run 4 businesses. The JSON output was malformed about half the time. We switched models, raised the tag threshold from 3 to 5 matching tags, added a blocklist for generic terms, and hardened the prompts. By the end of week 1: 140 cycles, and zero errors since the fix.
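The tag filter described above can be sketched roughly like this. It is an illustration, not the actual EIDARA code: the names `GENERIC_TAGS`, `shared_specific_tags` and `should_cross_reference` are hypothetical, and dropping blocklisted tags before counting the overlap is an assumption about how the two fixes combine.

```python
# Hypothetical sketch of the week-1 cross-reference filter; names are illustrative.

# Generic terms that show up on almost every memory. "website" and "marketing"
# come from the post; a real blocklist would presumably be longer.
GENERIC_TAGS = frozenset({"website", "marketing"})

# Minimum number of shared tags before two memories get linked
# (raised from 3 to 5 during week 1).
MIN_SHARED_TAGS = 5


def shared_specific_tags(tags_a, tags_b, blocklist=GENERIC_TAGS):
    """Return the tags both memories share, ignoring case and blocklisted terms."""
    a = {t.strip().lower() for t in tags_a}
    b = {t.strip().lower() for t in tags_b}
    return (a & b) - blocklist


def should_cross_reference(tags_a, tags_b, threshold=MIN_SHARED_TAGS):
    """Flag a cross-reference only when enough specific tags overlap."""
    return len(shared_specific_tags(tags_a, tags_b)) >= threshold
```

With this shape, two memories that share only "website" and "marketing" never link, no matter how low the threshold is set.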

Week 2: it got interesting

With the curator stable, we added the facts layer. Any AI can now contribute a fact via MCP. The curator picks it up, deduplicates it, finds where it belongs, and integrates it — or discards it with a reason. We seeded it with 5 test facts: 4 were stored correctly, 1 was merged with an existing entry. Zero errors.
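The store, merge or discard flow can be sketched as below. This is a simplified stand-in under stated assumptions: the real curator uses the model to decide where a fact belongs, while this sketch deduplicates with plain text normalization. `FactStore`, `Decision` and `ingest` are hypothetical names, not part of EIDARA or MCP.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    action: str  # "stored", "merged", or "discarded"
    reason: str  # every outcome carries a reason, including discards


@dataclass
class FactStore:
    # Maps a normalized fact to the original texts folded into it.
    entries: dict = field(default_factory=dict)

    @staticmethod
    def normalize(text):
        """Crude dedup key: lowercase, collapse whitespace, drop trailing periods."""
        return " ".join(text.lower().split()).rstrip(".")

    def ingest(self, text):
        """Store a new fact, merge a duplicate, or discard an empty one."""
        key = self.normalize(text)
        if not key:
            return Decision("discarded", "empty fact")
        if key in self.entries:
            self.entries[key].append(text)
            return Decision("merged", "matches an existing entry")
        self.entries[key] = [text]
        return Decision("stored", "new fact")
```

Returning a `Decision` for every fact, rather than silently dropping rejects, is what makes each cycle auditable afterwards.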

Bugs we actually found and fixed

(1) The model had a "thinking mode" that was consuming the response budget and returning empty answers. Fixed with one API parameter.

(2) Each compile cycle was throwing about 40 background distractions, small noise from how the autonomous side coordinates its work. Fixed by making all of that run silently.

(3) The curator couldn't distinguish between routine maintenance and structural changes that need a human architect. We added an escalation signal — a dedicated queue that gets surfaced in the compiled brain. Three tiers: routine (curate silently), data (add to neurons), structural (escalate).
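The three-tier routing from fix (3) might look something like this. It is a sketch only: the `Tier` enum, the `Curator` class and its attributes are hypothetical names, and how a change gets assigned to a tier (presumably by the model) is left out.

```python
from collections import deque
from enum import Enum


class Tier(Enum):
    ROUTINE = "routine"        # curate silently
    DATA = "data"              # add to neurons
    STRUCTURAL = "structural"  # escalate to a human architect


class Curator:
    def __init__(self):
        self.escalation_queue = deque()  # surfaced in the compiled brain
        self.neurons = []                # hypothetical store for data changes
        self.silent_count = 0            # routine changes handled without output

    def handle(self, change, tier):
        """Route one change according to its tier and report what was done."""
        if tier is Tier.STRUCTURAL:
            self.escalation_queue.append(change)
            return "escalated"
        if tier is Tier.DATA:
            self.neurons.append(change)
            return "added"
        self.silent_count += 1
        return "curated"
```

The point of the separate queue is that structural changes are never applied by the curator itself. They wait until a human reads them.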

The numbers

Cadence: started hourly, now every 15 minutes. Total autonomous cycles: 300+. Errors after the first-day fix: zero. New constitutional rules added to govern the autonomous curator: 4. Facts processed via MCP in the first 24 hours: multiple sessions contributing in parallel, all integrated cleanly. System uptime since activation: 12 days, no human intervention needed.

The upgrade we didn't make

During the test phase, we identified a more capable model in the same family — better JSON reliability, stronger multilingual handling, same memory footprint. Migration would take five minutes. We chose not to.

The reason is simple: if you change the model while measuring system behavior, you don't know whether an anomaly comes from the new model or from a bug in the pipeline. Contaminated data. We're running the current model until the test phase closes, collecting clean decisions from real traffic. The upgrade happens after, with a proper A/B comparison over hundreds of real curator decisions. Do more with less first. Then measure the upgrade.
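One way to run that comparison later, sketched under assumptions: replay the same logged inputs through both models and compare the decisions pairwise. The post does not say how the A/B test will actually be run, and `compare_decisions` is a hypothetical helper.

```python
from collections import Counter


def compare_decisions(baseline, candidate):
    """Compare two models' decisions on the same inputs, pairwise by position."""
    if len(baseline) != len(candidate):
        raise ValueError("both runs must cover the same inputs")
    if not baseline:
        raise ValueError("need at least one decision to compare")
    # Count each (baseline decision, candidate decision) pair that differs.
    disagreements = Counter(
        (b, c) for b, c in zip(baseline, candidate) if b != c
    )
    agreed = len(baseline) - sum(disagreements.values())
    return {
        "total": len(baseline),
        "agreement_rate": agreed / len(baseline),
        "disagreements": dict(disagreements),
    }
```

Because the baseline decisions come from a pipeline that did not change during the test phase, any disagreement can be attributed to the model swap and not to a pipeline bug.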

The open-source LLM landscape moves every 4-6 weeks. We re-scan monthly. There will always be something newer. The discipline is knowing when not to chase it.

What's next

The curator works. The facts channel works. The MCP integration works. What's missing is the senses: the ability to capture context from the screen and from audio, without the user having to type anything. That's the next build. But the architecture doesn't change. Each new sense is just another source of atomic facts feeding into the same curator pipeline. One pattern, scaled up.

EIDARA v2 is free and open-source. What comes after is being built in private. If you want to follow the progress, the insights page is where it happens.

— Javier
