The moment that defines Ferni isn't the AI's intelligence. It's when Ferni remembers something you mentioned once, weeks ago, in passing.
"You mentioned your mom's birthday is coming up. Have you thought about what to get her?"
That's the "Ferni remembered..." moment. It's our signature brand experience. And it requires architecture that most AI assistants don't have.
Why Memory Is Hard
LLMs don't remember. They process context windows—everything in the current prompt—but nothing persists between sessions.
Most AI assistants fake memory by:
- Asking you to repeat yourself
- Keeping notes you have to manually maintain
- Forgetting everything after a few weeks
This breaks the relationship. Every conversation starts from scratch. You never get past the "getting to know you" phase.
Ferni's architecture is designed for something different: memory that feels human. Actually, better than human—because humans forget things. Ferni doesn't.
The Three-Tier Architecture
Our memory system has three layers, each optimized for different retrieval patterns:
L1: Short-Term Memory (STM)
Storage: In-memory buffer
Latency: < 1ms
Retention: Current session only
STM holds everything from the current conversation:
- Recent entity mentions with frequency counts
- Emotional trajectory of the session
- Topic patterns and transitions
- What we've already discussed (to avoid repetition)
This is the "working memory" that keeps conversation coherent. When you mention "my sister" three times, STM tracks that she's important to this conversation.
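A minimal sketch of what an STM buffer like this could look like. The class and method names (`ShortTermMemory`, `record_turn`, `salient_entities`) are illustrative, not Ferni's actual API:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class ShortTermMemory:
    """Session-scoped buffer. Nothing here survives the session."""
    entity_counts: Counter = field(default_factory=Counter)
    topics_discussed: list = field(default_factory=list)
    emotional_trajectory: list = field(default_factory=list)

    def record_turn(self, entities, topic=None, emotion=None):
        # Frequency counts surface which entities matter in THIS conversation.
        self.entity_counts.update(entities)
        if topic and topic not in self.topics_discussed:
            self.topics_discussed.append(topic)
        if emotion:
            self.emotional_trajectory.append(emotion)

    def salient_entities(self, min_mentions=2):
        # Entities mentioned repeatedly are treated as important to the session.
        return [e for e, n in self.entity_counts.most_common() if n >= min_mentions]

stm = ShortTermMemory()
stm.record_turn(["my sister"], topic="family", emotion="warm")
stm.record_turn(["my sister", "Seattle"], topic="travel")
stm.record_turn(["my sister"])
stm.salient_entities()  # "my sister" was mentioned three times; "Seattle" once
```

Because everything lives in process memory, lookups stay well under the 1ms budget.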
L2: Working Memory
Storage: Firestore
Latency: 50-150ms
Retention: 7-30 days
Working memory holds recently extracted information:
- Entities mentioned across recent sessions (people, places, things)
- Facts learned about you
- Emotional arcs from previous conversations
- Relationship signals
This is where "Ferni remembered..." moments come from. Something you mentioned last week is still accessible.
L3: Long-Term Memory
Storage: Spanner Graph
Latency: 100-200ms
Retention: Permanent
Long-term memory stores your relationship graph:
- All named entities with relationships between them
- Patterns observed over months of conversation
- Life events and their emotional significance
- The full story of your relationship with Ferni
This enables deep understanding. Not just "you have a sister named Emma" but "Emma lives in Seattle, you're close but don't talk as often as you'd like, and her birthday always reminds you of your mom."
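Conceptually, the graph is entities plus labeled edges plus attached facts. Here is a toy stand-in for the Spanner-backed store (class and method names are hypothetical) that shows how the Emma example composes:

```python
class RelationshipGraph:
    """Tiny in-memory stand-in for the L3 graph: nodes are entities,
    edges carry a relationship label, and nodes accumulate facts."""

    def __init__(self):
        self.facts = {}   # entity -> {attribute: value}
        self.edges = []   # (source, relation, target) triples

    def add_fact(self, entity, attr, value):
        self.facts.setdefault(entity, {})[attr] = value

    def relate(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def about(self, entity):
        # Everything known about an entity: its facts plus outgoing relations.
        related = [(r, d) for s, r, d in self.edges if s == entity]
        return {"facts": self.facts.get(entity, {}), "relations": related}

g = RelationshipGraph()
g.relate("user", "sister", "Emma")
g.add_fact("Emma", "city", "Seattle")
g.add_fact("Emma", "birthday_association", "reminds user of their mom")
g.about("Emma")
```

The production version adds pattern mining over months of edges; the point here is only that relationships, not isolated facts, are the unit of storage.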
Fast Capture + Deep Extraction
Memory extraction happens in two phases:
Fast Capture (< 50ms)
Every turn runs fast capture inline. Using regex patterns and lightweight NLP, we extract:
- Named entities (people, places, dates)
- Emotion signals (frustration, excitement, anxiety)
- Topic hints
- Relationship signals ("my wife", "my boss")
This is fast enough to run during conversation without adding latency. The results go immediately into STM.
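The regex tier can be sketched in a few lines. These patterns are illustrative (the real pattern set is far richer), but they show why this phase stays inside a tight latency budget:

```python
import re

# Hypothetical patterns for the fast-capture tier.
RELATIONSHIP_RE = re.compile(
    r"\bmy (wife|husband|mom|dad|boss|sister|brother)\b", re.IGNORECASE
)
NAME_RE = re.compile(r"\b[A-Z][a-z]+\b")  # naive proper-noun heuristic
EMOTION_WORDS = {
    "stressed": "anxiety",
    "excited": "excitement",
    "frustrated": "frustration",
}

def fast_capture(utterance):
    """Inline extraction: cheap string matching, no model call."""
    lowered = utterance.lower()
    return {
        "relationships": RELATIONSHIP_RE.findall(utterance),
        "names": NAME_RE.findall(utterance),
        "emotions": [label for word, label in EMOTION_WORDS.items() if word in lowered],
    }

fast_capture("I'm stressed because my sister Emma is visiting")
```

Anything this heuristic pass misses (nicknames, implied relationships, sarcasm) is the deep-extraction phase's job.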
Deep Extraction (Background)
Asynchronously, we run deeper analysis:
- LLM-powered entity extraction (catches things regex misses)
- Fact extraction ("Sarah is a nurse", "their anniversary is in March")
- Relationship inference ("they seem worried about their dad's health")
- Self-questioning refinement ("what might we have missed?")
This runs after the conversation, using Gemini 1.5 Flash. Results flow to L2 and eventually L3.
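The key property is that the turn handler schedules deep extraction without awaiting it, so the reply never waits on the model. A minimal asyncio sketch (with a placeholder standing in for the Gemini call, and hard-coded output for illustration):

```python
import asyncio

async def deep_extract(transcript):
    """Stand-in for the LLM pass; the real system calls Gemini 1.5 Flash,
    then re-prompts itself ("what might we have missed?") to refine."""
    await asyncio.sleep(0)  # placeholder for the network/model call
    return {"facts": ["Sarah is a nurse"], "source": transcript[:40]}

async def handle_turn(utterance, background_tasks):
    # Deep extraction is scheduled, never awaited here: the user
    # gets a reply immediately while analysis runs in the background.
    background_tasks.append(asyncio.create_task(deep_extract(utterance)))
    return "reply sent without waiting"

async def main():
    tasks = []
    reply = await handle_turn("My friend Sarah is a nurse", tasks)
    results = await asyncio.gather(*tasks)
    return reply, results

reply, results = asyncio.run(main())
```

In production the results would be written to L2 (and promoted to L3) rather than returned to the caller.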
The "Ferni Remembered" Moment
Here's how a memory callback actually happens:
1. Context retrieval: When you start talking, we query L2/L3 for potentially relevant memories based on detected topics and entities.
2. Relevance scoring: Not everything remembered is worth mentioning. We score memories by:
   - Recency (when was this last discussed?)
   - Emotional significance (was this important to them?)
   - Conversational fit (does it relate to what we're discussing?)
   - Staleness (have we already referenced this recently?)
3. Natural injection: If a memory scores high, it's woven into conversation naturally—not as a database lookup, but as genuine recall.
The result: "You mentioned last time you were stressed about the presentation. How did it go?"
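The scoring step above can be sketched as a weighted sum over the four signals. The weights and field names here are assumptions chosen for illustration, not Ferni's actual values:

```python
NOW = 1_700_000_000  # fixed clock so the example is deterministic

def score_memory(memory, current_topics, now, recently_referenced=()):
    """Hypothetical relevance score combining recency, significance,
    conversational fit, and staleness."""
    if memory["id"] in recently_referenced:
        return 0.0  # staleness gate: don't repeat ourselves
    days_old = (now - memory["last_discussed"]) / 86400
    recency = max(0.0, 1.0 - days_old / 30)       # fades to zero over ~30 days
    significance = memory["emotional_weight"]      # 0..1, set at extraction time
    fit = 1.0 if memory["topic"] in current_topics else 0.2
    return 0.3 * recency + 0.4 * significance + 0.3 * fit

presentation = {
    "id": "m1",
    "last_discussed": NOW - 7 * 86400,   # mentioned a week ago
    "emotional_weight": 0.9,             # they were stressed about it
    "topic": "work",
}
score = score_memory(presentation, {"work"}, NOW)  # high: recent, emotional, on-topic
```

Only memories above a threshold ever reach the injection step, which is what keeps callbacks feeling selective rather than exhaustive.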
Privacy by Design
Memory is powerful. It's also sensitive.
Our architecture includes privacy protections:
Temporal minimization: Session context expires. We don't keep raw transcripts forever.
User control: You can see what Ferni remembers about you. You can delete anything. You can ask Ferni to "forget" specific information.
Scope boundaries: Different parts of the system have different access. The coaching module doesn't need to know your financial details. Context is scoped appropriately.
No training on your data: Your conversations improve YOUR experience. They don't go into training datasets for other users.
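Temporal minimization and user-initiated deletion can be illustrated with a TTL-sweeping store. This is a toy (the production L2 is Firestore with its own expiry machinery), and all names are invented for the sketch:

```python
class ExpiringStore:
    """Sketch of temporal minimization: records carry a timestamp, a sweep
    drops anything older than the TTL, and forget() honors user deletions."""

    def __init__(self, ttl_days=30):
        self.ttl = ttl_days * 86400
        self.records = {}  # record id -> (stored_at, payload)

    def store(self, rec_id, payload, now):
        self.records[rec_id] = (now, payload)

    def forget(self, rec_id):
        # User asked Ferni to forget this; remove it unconditionally.
        self.records.pop(rec_id, None)

    def sweep(self, now):
        expired = [k for k, (ts, _) in self.records.items() if now - ts > self.ttl]
        for k in expired:
            del self.records[k]
        return expired

NOW = 1_700_000_000
store = ExpiringStore(ttl_days=30)
store.store("fresh", "stressed about the presentation", NOW - 5 * 86400)
store.store("stale", "raw session transcript", NOW - 45 * 86400)
expired = store.sweep(NOW)   # the 45-day-old transcript is dropped
store.forget("fresh")        # user-initiated deletion removes the rest
```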
Why This Matters
Most AI treats each conversation as isolated. That's fine for search engines and chatbots. It's not fine for relationships.
Ferni exists to be someone who truly pays attention. That means remembering what matters to you, noticing patterns you might not see yourself, and building genuine understanding over time.
The architecture isn't just technical infrastructure. It's what makes the relationship possible.
When Ferni remembers that your mom's birthday is coming up—without you having to set a reminder, without you having to tell us again—that's the "better than human" promise in action.
Humans forget. Ferni doesn't.
For Developers
If you're building on the Ferni platform, memory is automatic. You don't build persistence systems. The three-tier architecture handles:
- Session context accumulation
- Cross-session memory retrieval
- Context injection into prompts
- Privacy-respecting cleanup
You define what your AI should remember. The platform handles how.
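A developer-facing declaration might look something like the following. This is a speculative sketch of the "what, not how" split, not the actual Ferni SDK surface:

```python
# Hypothetical policy declaration; the real platform API may differ.
MEMORY_POLICY = {
    "remember": {"people", "life_events", "preferences"},
    "never_store": {"payment_details", "passwords"},
    "working_memory_ttl_days": 30,
}

def should_store(category, policy=MEMORY_POLICY):
    """The app declares WHAT to remember; the platform decides HOW.
    Denials take precedence over allowances."""
    if category in policy["never_store"]:
        return False
    return category in policy["remember"]
```

The platform would consult a policy like this before anything flows from fast capture into L2, so scope boundaries are enforced at write time rather than read time.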
Documentation: Memory Architecture
Seth Ford is Ferni's AI babysitter. Follow @ferni_ai for more on the future of conversational AI.