Technical Brief

AKLUS
A long-term cognitive system.
Not a chatbot, not a RAG app.
AKLUS is a memory-first AI companion. The LLM is the muscle — the memory layer is the brain. Every conversation deepens a structured model of the user: their goals, patterns, and reflections, accumulated across years, not turns. It speaks, listens, and thinks on its own overnight.
Local-first MVP cost: €0 net new
MVP timeline: 8 weeks
Memory model: 5 types
Local inference hardware: RTX 5090
Hybrid retrieval: 6 signals
01 What it is — the concept, the core idea
02 Tech stack — local-first architecture, layer by layer
03 Memory architecture — 5 types, lifecycle, retrieval
04 Scope — what's possible now vs research-grade
05 Team — roles, owners, responsibilities
06 Costs — 3-month MVP, three scenarios
07 MVP Timeline — 8-week plan, week by week

AKLUS is a memory-first system. The LLM is the muscle; the memory layer is the brain. Every conversation deepens a structured model of the user — their goals, patterns, and reflections — across years, not turns.

Most AI assistants have no memory between sessions. AKLUS is the opposite: the memory layer is the product. The LLM is just the generation surface — smart, but stateless by itself. AKLUS makes it stateful.

The system is designed around three principles:

The result is an AI companion that knows you — not from your current session, but from the accumulated record of your conversations, goals, and patterns over time. This is not a chatbot with context window tricks. It's a real memory system.

What it is not

The differentiator

Most memory-augmented systems bolt memory onto an LLM as a feature. AKLUS is architected memory-first: the schema, the lifecycle, the retrieval logic, the reflection loop — these are the product. The LLM is swappable. The memory layer is not.

Local-first by default. Existing hardware in-house: Hostinger VPS + a workstation with RTX 5090 (16 GB VRAM) and 64 GB RAM. The GPU box runs local LLM, TTS, STT, and embedding inference. Cloud APIs are kept as a small fallback budget for quality-critical paths.

Local-first system architecture
Browser / Client: React + Next.js · Tailwind CSS · Chat UI · Timeline · Reflection feed · STT recording · Audio playback
↓ REST / SSE streaming · Cloudflare Tunnel
Hostinger VPS: FastAPI · LangGraph · APScheduler · Postgres 16 + pgvector · Redis · Langfuse (observability) · Authentik (auth) · MinIO (object storage) · GlitchTip (errors)
↕ Tailscale / LAN (inference requests)
RTX 5090 Workstation (16 GB VRAM · 64 GB RAM): Ollama with Qwen 2.5 14B (4-bit) · bge-large-en-v1.5 (embeddings) · Coqui XTTS v2 (TTS) · faster-whisper (STT) · Claude API fallback (reflections) · ElevenLabs fallback (voice)
Cloudflare DNS + Tunnel · AWS SES (email)

Full layer breakdown

Layer | Local-first (primary) | Cloud alternative | Trade-off
Backend | Python | (same) | No diff.
Frontend | React + Next.js | (same) | No diff.
LLM | Qwen 2.5 14B (4-bit quant) or Llama 3.1 8B via Ollama / vLLM on the RTX 5090 | Claude Sonnet 4.5 (reflections), GPT-4o-mini (orchestration) | Local: zero per-token cost + full data privacy. 16 GB VRAM caps us at ~14B quantized; weaker than Claude on nuanced reflection (~70–85% of quality).
Structured memory DB | PostgreSQL on the Hostinger VPS | RDS / Supabase | Local: free, but manual backups and WAL config. Already in-house.
Vector search | pgvector on the same Postgres | Pinecone / Qdrant Cloud | No separate service. Sufficient up to millions of vectors.
Orchestration | LangGraph (OSS library) | (same) | Library runs anywhere. No diff.
Embeddings | bge-large-en-v1.5 or nomic-embed-text via Sentence Transformers on the GPU box | Voyage AI (voyage-4) | Local: free. ~5–10% lower retrieval quality on MTEB benchmarks; uses ~2 GB VRAM.
Memory framework | Custom build on Postgres + pgvector. Self-host mem0 OSS or Letta OSS as starting template. | mem0.ai cloud (~€17/mo), letta.com cloud (~€18/mo) | Memory IS the product; we don't want to outsource it. Both OSS versions run on the VPS.
Background jobs | Celery + Redis on the VPS | (same) | Already local.
Observability | Langfuse self-hosted on the VPS | LangSmith ($39/seat/mo) | Langfuse OSS is feature-complete. Saves ~€72/mo; eats ~30 min/week of upkeep.
Voice (TTS) | Coqui XTTS v2 or F5-TTS on the RTX 5090 | ElevenLabs Pro (€91/mo) | Local: ~94% of ElevenLabs quality on naturalness; weaker on emotional consistency at scale.
Avatar | Custom Unreal Engine + MetaHuman + Audio2Face. | ElevenLabs LiveAvatar / Hedra (~€0.09–€0.90 / min) | Three delivery paths; see callout below.
Hosting | Hostinger VPS + RTX 5090 workstation. Cloudflare Tunnel for routing. | AWS Fargate + RDS + ElastiCache (~€118/mo) | Local: hardware already paid. Single-region, single-box, manual ops.
Auth | Authentik or Supertokens self-hosted | Clerk / Auth0 (~€23/mo) | Saves €23/mo; adds Docker stack to maintain.
Object storage | MinIO self-hosted on the VPS | S3 + CloudFront (~€14/mo) | Saves €14/mo; lose CDN edge speed for voice/avatar assets.
Error tracking | GlitchTip self-hosted (Sentry-compatible) | Sentry cloud (~€24/mo) | Saves €24/mo; adds Docker containers to maintain.
Transactional email | (cannot go local cleanly) | AWS SES or Postmark (~€2–€14/mo) | Self-hosting SMTP is a deliverability nightmare. Use a provider.
DNS + CDN | Cloudflare (free) | (same) | Free tier is fine.
Avatar delivery — three paths.
Path | Who renders | Customer needs | AKLUS cost | Trade-off
A. Customer-GPU Unreal | Customer's machine | Dedicated GPU (RTX 3060+ / M1 Pro+), 16 GB RAM | €0 / min | Best quality + zero ongoing cost, but cuts off low-spec users.
B. Cloud-GPU Unreal (streamed) | Our cloud GPU, streamed as video | Just internet + video decode | ~€0.01 / min per session | Accessible to anyone, but scales linearly with concurrent users. ~€450/mo at 50 users × 30 min/day.
C. ElevenLabs LiveAvatar / HeyGen | Their pooled GPUs | Just internet + video decode | ~€0.09/min (LiveAvatar) to €0.90/min (HeyGen) | No infra needed, locked to preset avatars. ~€4,000/mo at same usage on LiveAvatar.
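The monthly figures in the table follow from simple usage arithmetic. A minimal sketch of that calculation, using the per-minute rates and the 50 users × 30 min/day profile quoted above; the 30-day month is an assumption, since the brief does not state one.

```python
# Monthly avatar cost per delivery path, from the per-minute rates in the table.
USERS = 50
MINUTES_PER_USER_PER_DAY = 30
DAYS_PER_MONTH = 30  # assumption: the brief does not state the month length

RATES_EUR_PER_MIN = {
    "A. Customer-GPU Unreal": 0.00,
    "B. Cloud-GPU Unreal (streamed)": 0.01,
    "C. ElevenLabs LiveAvatar": 0.09,
}


def monthly_cost(rate_eur_per_min: float) -> float:
    """Total monthly cost in EUR for the usage profile above."""
    minutes = USERS * MINUTES_PER_USER_PER_DAY * DAYS_PER_MONTH
    return round(minutes * rate_eur_per_min, 2)


costs = {path: monthly_cost(rate) for path, rate in RATES_EUR_PER_MIN.items()}
for path, cost in costs.items():
    print(f"{path}: ~EUR {cost:,.0f}/mo")
```

Under these assumptions Path B comes to €450/mo and Path C to €4,050/mo, which the table rounds to ~€4,000.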

Recommended for premium-persona MVP: Path A (customer-GPU Unreal). Target users have the hardware, per-session cost stays at zero. Keep Path C wired in as fallback.

What cannot realistically be local.
  • ElevenLabs-tier voice consistency — local TTS hits ~94% on naturalness but loses on long-form emotional consistency. If voice is core to the product, keep a small ElevenLabs plan as fallback.

Five memory types, separated by lifecycle, retrieval shape, and update rules. This is the schema that makes the system actually know someone over time.

Type | Purpose | Example
Episodic | Specific events and conversations | "User felt burned out after client meeting."
Semantic | Abstracted truths from many episodes | "User dislikes micromanagement."
Procedural | How the user works | "User performs best in short focused bursts."
Goal | Long-term objectives | "Reach passive income through apps."
Reflection | AI-generated observations | "User overthinks before publishing."
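One way the five types could be modeled in the Python backend is sketched below. The field names follow the memories table listed in the week 2 tasks (type, content, embedding, importance, created_at, source_message_id, metadata); the class names, defaults, and validation are assumptions for illustration, not a settled schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class MemoryType(str, Enum):
    EPISODIC = "episodic"      # specific events and conversations
    SEMANTIC = "semantic"      # abstracted truths from many episodes
    PROCEDURAL = "procedural"  # how the user works
    GOAL = "goal"              # long-term objectives
    REFLECTION = "reflection"  # AI-generated observations


@dataclass
class Memory:
    type: MemoryType
    content: str
    importance: int = 3  # 1-5 scale; the default of 3 is an assumption
    embedding: Optional[List[float]] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    source_message_id: Optional[int] = None
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.type = MemoryType(self.type)  # accept plain strings such as "goal"
        if not 1 <= self.importance <= 5:
            raise ValueError("importance must be between 1 and 5")


examples = [
    Memory("episodic", "User felt burned out after client meeting."),
    Memory(MemoryType.GOAL, "Reach passive income through apps.", importance=5),
]
```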

Conversation turn — the intelligence loop

Every message runs through a LangGraph graph. The system retrieves relevant memories before generating, then extracts new memories from the response and writes them back. Memory compounds with every turn.

How each conversation turn works — the memory loop
INPUT: user message
→ RETRIEVE: hybrid retrieval (6 weighted signals), reading from the memory DB
→ ASSEMBLE: build context (memories + history)
→ GENERATE: LLM + TTS (Qwen 14B / Claude); response + TTS per turn
→ EXTRACT + STORE: 5 memory types (embed · score · write to DB)
MEMORY DB (Postgres + pgvector): episodic · semantic · procedural · goal · reflection · embeddings · importance
Nightly: reflect · consolidate · dedup
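The shape of this loop can be sketched in plain Python. This is not LangGraph code: each function stands in for one graph node, the keyword-overlap retrieval and the stubbed LLM reply are placeholders, and every name here is hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


def retrieve(state: State) -> State:
    """Stand-in for hybrid retrieval: naive keyword overlap with stored memories."""
    words = set(str(state["user_message"]).lower().split())
    hits = [m for m in state["memory_db"] if words & set(m.lower().split())]
    return {**state, "retrieved": hits}


def assemble(state: State) -> State:
    """Build the prompt from retrieved memories plus the new message."""
    context = "\n".join(f"- {m}" for m in state["retrieved"])
    prompt = f"Known about user:\n{context}\n\nUser: {state['user_message']}"
    return {**state, "prompt": prompt}


def generate(state: State) -> State:
    """Stub for the LLM call (local Qwen or the Claude fallback)."""
    reply = f"(reply grounded in {len(state['retrieved'])} memories)"
    return {**state, "reply": reply}


def extract_and_store(state: State) -> State:
    """Stub extraction: store the raw user message as one new memory."""
    state["memory_db"].append(str(state["user_message"]))
    return state


PIPELINE: List[Callable[[State], State]] = [retrieve, assemble, generate, extract_and_store]


def run_turn(user_message: str, memory_db: List[str]) -> State:
    state: State = {"user_message": user_message, "memory_db": memory_db}
    for node in PIPELINE:
        state = node(state)
    return state


db = ["User dislikes micromanagement"]
result = run_turn("My manager keeps up the micromanagement", db)
```

After one turn the store holds two entries, so the next turn retrieves against a larger memory set. That is the sense in which memory compounds.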

Background jobs (nightly)

Retrieval strategy: hybrid, 6 signals

Every retrieval blends six weighted signals. The composite score determines which memories surface for the current turn. Everything is logged so the system is debuggable.

6-signal hybrid retrieval — scored per query
01 Semantic similarity: Cosine distance between query embedding and memory embeddings via pgvector HNSW index.
02 Recency: Exponential decay on created_at. Recent memories score higher unless overridden by importance.
03 Importance score: LLM-rated 1–5 at write time, plus heuristics. Goal memories always get a boost. Persisted in DB.
04 Emotional relevance: Valence tagging at extraction time. Emotionally-weighted memories surface when context is emotionally charged.
05 Goal relevance: Active goal memories boosted whenever the conversation touches them. Keeps long-horizon objectives visible.
06 Relationship graph: Edges between related memories (episode → abstracted semantic). Graph traversal expands the recall set. Deferred to v2.
Weighted composite score → top-k memories injected into context
Every retrieval logs individual signal scores. Tune weights as real usage patterns emerge.
Observability is critical. We need to know why memories were retrieved, why reflections happened, and what influenced outputs. Otherwise debugging is impossible.
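A weighted composite over the first five signals might look like the sketch below. The weights, the 30-day half-life, and all names are assumptions chosen for illustration; the brief only specifies the signals themselves and that per-signal scores are logged. The relationship-graph signal is left out because it is deferred to v2.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical starting weights; they are meant to be tuned from logged usage.
WEIGHTS = {"semantic": 0.4, "recency": 0.2, "importance": 0.2, "emotional": 0.1, "goal": 0.1}
HALF_LIFE_DAYS = 30.0  # assumption: recency score halves every 30 days


@dataclass
class Candidate:
    content: str
    similarity: float       # cosine similarity in [0, 1] from vector search
    age_days: float         # derived from created_at
    importance: int         # LLM-rated 1-5 at write time
    emotional_match: float  # 0-1, from valence tagging
    goal_match: float       # 0-1, overlap with active goals


def signal_scores(c: Candidate) -> Dict[str, float]:
    """Normalize every signal to [0, 1] so the weights are comparable."""
    return {
        "semantic": c.similarity,
        "recency": 0.5 ** (c.age_days / HALF_LIFE_DAYS),  # exponential decay
        "importance": (c.importance - 1) / 4,
        "emotional": c.emotional_match,
        "goal": c.goal_match,
    }


def rank(candidates: List[Candidate], k: int = 3) -> List[dict]:
    """Return the top-k memories, keeping per-signal scores for the retrieval log."""
    scored = []
    for c in candidates:
        signals = signal_scores(c)
        total = sum(WEIGHTS[name] * value for name, value in signals.items())
        scored.append({"content": c.content, "score": round(total, 4), "signals": signals})
    scored.sort(key=lambda row: row["score"], reverse=True)
    return scored[:k]


old_similar = Candidate("old but similar", 0.9, 365, 1, 0.0, 0.0)
recent_goal = Candidate("recent goal", 0.7, 1, 5, 0.0, 1.0)
ranked = rank([old_similar, recent_goal])
```

With these weights the recent goal memory outranks the older memory even though the older one is more semantically similar. Returning the `signals` dict alongside the score is what makes each selection explainable from the logs.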

A clear-eyed view of what current LLMs and memory systems can deliver today, and what remains research-grade.

Possible now (MVP)

  • Persistent memory across sessions
  • Long-term user profiles
  • Reflection generation
  • Goal tracking
  • Pattern detection
  • Weekly insights
  • Strategic questioning
  • Cross-session continuity
  • Adaptive tone
  • Context-aware coaching

Hard / research-grade

  • Deep emotional understanding
  • Robust forgetting
  • Hallucination-free psychology
  • Lifelong identity modeling
  • Truly autonomous reasoning

Benchmarks still show weakness in long-horizon memory, memory updates, stale memory removal, and temporal reasoning. We build around these limits, not through them.

In the MVP vs deferred to v2

In the MVP (8 weeks)

  • Chat UI with streaming
  • Local LLM (Qwen 2.5 14B via Ollama)
  • Postgres + pgvector
  • 5 memory types (episodic, semantic, procedural, goal, reflection)
  • Memory extraction on each turn
  • Hybrid retrieval (start 3 signals, grow to 5)
  • LangGraph conversation graph
  • Nightly reflection + consolidation job
  • Memory timeline + reflection feed UI
  • Goal tracking
  • TTS voice replies (assistant speaks)
  • Voice input / STT (talk to it)
  • Basic auth (single user)
  • Simple tracing + a small eval harness

Deferred to v2

  • Unreal Engine avatar (a whole project on its own)
  • Full 6-signal retrieval with relationship graph
  • Robust forgetting + merge heuristics
  • Multi-user, teams, billing
  • ElevenLabs LiveAvatar / phoneme stream
  • Langfuse self-host (simple logging first)
  • Celery + Redis (scheduled script first)
  • Mobile / desktop wrapper (web first)

AI Systems Engineer / Architect

For MVP, Davide or Ash can handle this. For production, we may need a specialist in this domain later.

Owns: memory abstraction, behavioral modeling, reflection quality, evaluation frameworks, cognitive architectures, memory pipelines, orchestration, retrieval systems, agent workflows.

Profile: applied AI infrastructure engineer, not a pure ML scientist.

Role cards

AI Systems Engineer / Architect
MVP: 2 owners · Production: + hire
  • Memory abstraction · behavioral modeling · reflection quality
  • Evaluation frameworks · cognitive architectures · memory pipelines
  • Orchestration · retrieval · agent workflows
  • Voice + avatar pipeline (memory context → ElevenLabs → phoneme stream)
Applied AI infra profile, not a pure ML scientist.
Owners: Davide · Ash · + Hire (production)

AI Behavior / Cognitive Design
2 owners (with Ash)
  • Personality, tone, conversational rhythm · how the system "thinks" and responds
  • Reflection prompts · memory framing · emotional calibration
  • Behavioral guardrails · psychological surface · what the AI should and shouldn't do
Shapes how AKLUS feels as a presence, not just what it does.
Owners: Ace · Alessandra

Backend Engineer
2 owners
  • APIs · databases · scaling · auth · queues · realtime systems · data pipelines
Owners: Davide · Ash

Frontend / Product Engineer
2 owners
  • Conversation UI · timeline UI · memory visualization · reflection interfaces
  • Frictionless journaling · emotional tone balance
  • Avatar assistant UI (phoneme/viseme animation, lip-sync, idle states)
Owners: Ash · Gray

Product Designer
3 owners
Shapes the product's psychological surface. The system must feel: thoughtful, trustworthy, calm, non-invasive, useful without being creepy. Most AI apps fail here.
  • Avatar character design (look, expressions, idle behavior, on-brand presence)
Owners: Ash · Gray · Ace

QA
3 owners
  • Test each feature · memory consistency · reflection quality · regressions
Owners: Erica · Gray · Elena

Team Lead
1 owner
  • Communication · task management · deadlines · reviews and tests
Owner: Gray

Ownership summary

Area | Owners
AI / memory architecture | New hire (lead) · Davide · Ash
AI behavior / cognitive design | Ace · Alessandra (with Ash)
Local LLM + TTS + embeddings inference | New hire (lead) · Davide
Backend | Davide · Ash
Frontend / Product engineering | Ash · Gray
Voice + avatar pipeline (backend) | New hire (lead) · Davide
Avatar (Unreal + MetaHuman + Audio2Face) | Contractor / Unreal specialist · Ash · Gray (integration)
Avatar character design | Ace · Gray · Ash
Self-hosted ops (Langfuse, GlitchTip, Authentik, MinIO, backups) | Davide · Ash
Product design | Ash · Gray · Ace
QA | Erica · Gray · Elena
Team lead | Gray

Three scenarios: local-first leans on in-house hardware and self-hosted services; cloud-managed leans on AWS + paid APIs; hybrid keeps local as primary and uses cloud only for quality-critical paths. Beta-scale: ~50 active users, moderate LLM traffic, nightly reflection jobs, voice + avatar on conversation surfaces.

Excludes sunk costs already in-house (Hostinger VPS, RTX 5090 workstation, Cloudflare, AWS SES, domains, dev tooling, design tools). These are not counted as incremental AKLUS cost.

Scenario summary

Scenario | Monthly | 3-month total
Local-first: existing hardware, self-hosted services, no paid APIs | €0 net new | €0 net new
Cloud-managed: AWS + Claude + ElevenLabs + LangSmith + Clerk | ~€757 | ~€2,271
Hybrid (recommended): local primary, cloud only for quality-critical paths | ~€225–€380 | ~€680–€1,140

Hardware caps: 16 GB VRAM limits us to Qwen 2.5 14B (4-bit) or Llama 3.1 8B. TTS, embeddings, and LLM share the GPU — concurrency is the binding constraint. Single point of failure: one workstation.

Cloud-managed breakdown (alternative)

Category | Monthly | 3-month total
AI / LLM API (Claude + GPT-4o-mini + Voyage) | €212 | €636
Voice + avatar (ElevenLabs Pro + LiveAvatar) | €165 | €495
AWS infrastructure (Fargate + RDS + ElastiCache + S3/CloudFront) | €118 | €354
Observability + reliability (LangSmith + Sentry + uptime) | €113 | €339
Auth (Clerk) | €23 | €69
Subtotal | €631 | €1,893
Contingency buffer (20%) | €126 | €378
Cloud-managed total | ~€757 / mo | ~€2,271
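The totals can be re-derived from the category lines. A short check, assuming the contingency is rounded to the nearest euro before the 3-month multiplication (the rounding rule is an inference from the published figures, not stated in the brief):

```python
# Monthly cloud-managed costs in EUR, copied from the breakdown above.
MONTHLY_EUR = {
    "AI / LLM API": 212,
    "Voice + avatar": 165,
    "AWS infrastructure": 118,
    "Observability + reliability": 113,
    "Auth (Clerk)": 23,
}
CONTINGENCY_RATE = 0.20
MONTHS = 3

subtotal = sum(MONTHLY_EUR.values())
contingency = round(subtotal * CONTINGENCY_RATE)  # rounded to whole euros
monthly_total = subtotal + contingency
three_month_total = monthly_total * MONTHS

print(f"Subtotal: EUR {subtotal}/mo")
print(f"Contingency: EUR {contingency}/mo")
print(f"Total: EUR {monthly_total}/mo, EUR {three_month_total} over {MONTHS} months")
```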

Optional adds

Item | Notes | Monthly est.
mem0.ai cloud (if not self-hosting OSS) | swap-out, decide after spike | €17
letta.com cloud (memory-native alt) | swap-out, decide after spike | €18
Heavier API fallback (Claude on full reflection load) | shifts toward cloud-managed scenario | +€100–€180
Cloud GPU rental during traffic spikes (RunPod / Vast.ai) | RTX A6000 / A100 hourly | €0.50–€2 / hour

Cost levers

Text + voice (TTS and STT), no avatar in this phase. The goal is a working brain: persistent memory, smart retrieval, and overnight reflections, used daily on our own hardware. Planned at 8 weeks with a 1–2 week buffer, so realistically expect 8–10 weeks.

Roadmap

Phases (W1–W10): Foundation + Memory · Intelligence · Quality + Product · Voice + Ship · Buffer
Tracks: Infrastructure · Memory system · Retrieval + LangGraph · Reflections · Product UI · Voice (TTS + STT) · QA + Eval
Milestones: End-to-end works (memory + retrieval) · Reflections live (nightly job running) · Full product UI (daily use ready) · Voice + shipped (MVP dogfooded)
Legend: Primary focus · Supporting work · Buffer · Milestone

Day-1 tech locks (decide once, do not revisit)

Layer | Choice | Why
Backend | FastAPI (Python), async, SSE streaming | Fast to write, great for LLM streaming, one language end to end.
Frontend | Next.js + React, Tailwind | Web first. Wrap in Tauri later if a desktop app is needed.
LLM | Ollama with Qwen 2.5 14B (4-bit) on RTX 5090 | Fastest path to a local API. Swap to vLLM only if throughput hurts.
Fallback LLM | Claude API, behind a feature flag | For reflections that need more nuance. Keep it optional.
Embeddings | bge-large-en-v1.5 via sentence-transformers on GPU | Free, local, strong. ~2 GB VRAM alongside the LLM.
DB | Postgres 16 + pgvector on Hostinger VPS | One store for structured memory and vectors. Already in-house.
Orchestration | LangGraph | The conversation graph (retrieve, generate, extract, store) lives here.
TTS | Coqui XTTS v2 on RTX 5090 (ElevenLabs as fallback) | Local first and free. Chunk LLM stream into sentences, synthesize as it generates.
STT | faster-whisper on RTX 5090 | Local, fast, accurate. Record in browser, transcribe server-side.
Scheduled jobs | APScheduler or cron script (not Celery yet) | Nightly reflection is one job. Celery is overkill for MVP.
Auth | JWT + bcrypt, single user to start | Do not burn days on auth. Lock it down properly in v2.
Tracing | Structured logging to Postgres (every retrieval, prompt, output) | Get the data first. Langfuse can read it later.

Week by week

Week 1: Foundation and thin vertical slice (End-to-end conversation)
Goal: type a message, get a local-LLM reply, conversation persists. Nothing smart yet, but the full pipe is connected.

Tasks

  • Repo, monorepo or two folders, env config, Makefile / scripts
  • FastAPI app, health check, settings, Postgres connection
  • Postgres + pgvector on the VPS; create users, conversations, messages tables
  • Ollama on the RTX 5090 serving Qwen 2.5 14B; expose to the VPS over Cloudflare Tunnel or Tailscale
  • Chat endpoint: POST message, stream tokens back (SSE)
  • Next.js chat screen: input, streaming bubble, message history from DB
  • Persist every user and assistant message
Done when: you can hold a multi-turn conversation in the browser, the model runs on your GPU, and refreshing the page reloads the history from Postgres.
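For the streaming chat endpoint, the wire format is the part most easily gotten wrong. The sketch below frames LLM tokens as Server-Sent Events using only the standard library. The JSON payload shape and the closing `done` event are hypothetical conventions, not part of the brief.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Frame each token as one SSE message: a data line followed by a blank line."""
    for token in tokens:
        # JSON-encode so newlines inside a token cannot break the SSE framing.
        yield f"data: {json.dumps({'token': token})}\n\n"
    # Hypothetical end-of-stream marker so the client knows when to stop.
    yield "event: done\ndata: {}\n\n"


events = list(sse_events(["Hel", "lo\n"]))
for event in events:
    print(repr(event))
```

In FastAPI, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`. The full assistant message would still be persisted once the stream completes.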
Week 2: Memory write and basic retrieval (It starts to remember)
Goal: after each turn, extract memories, embed them, and pull relevant ones back into the next prompt. Continuity across sessions.

Tasks

  • Embedding service: bge-large on the GPU, batch endpoint, cache
  • memories table: type, content, embedding (vector), importance, created_at, source_message_id, metadata JSONB
  • Memory extraction step: after each turn, a prompt asks the LLM to extract candidate memories (start with episodic + semantic + goal) as structured JSON
  • Write extracted memories with embeddings
  • pgvector similarity search (cosine, IVFFlat or HNSW index)
  • Inject top-k retrieved memories into the system prompt before generation
  • Manual test: tell it a fact in session A, start session B, confirm it recalls
Done when: you tell it something on Monday, come back the next day in a fresh session, and it brings the fact up naturally.
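Because the extraction step relies on the LLM returning structured JSON, the output needs validating before anything is written. A minimal sketch, assuming the model is prompted to return a JSON array of objects with `type` and `content` keys (that output contract is an assumption, as is the function name):

```python
import json
from typing import Dict, List

# Week 2 starts with three of the five memory types.
ALLOWED_TYPES = {"episodic", "semantic", "goal"}


def parse_candidates(llm_output: str) -> List[Dict[str, str]]:
    """Parse the LLM's JSON output, dropping anything malformed or off-schema."""
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError:
        return []  # unparseable output yields no memories rather than a crash
    if not isinstance(data, list):
        return []
    valid = []
    for item in data:
        if not isinstance(item, dict):
            continue
        mem_type = item.get("type")
        content = item.get("content")
        if (
            isinstance(mem_type, str)
            and mem_type in ALLOWED_TYPES
            and isinstance(content, str)
            and content.strip()
        ):
            valid.append({"type": mem_type, "content": content.strip()})
    return valid


raw = (
    '[{"type": "goal", "content": " Reach passive income through apps. "},'
    ' {"type": "mood", "content": "x"},'
    ' {"type": "episodic"}]'
)
candidates = parse_candidates(raw)
```

Only the first item survives: the second has an unknown type and the third has no content.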
Week 3: Memory types, hybrid retrieval, LangGraph (Structured + smart)
Goal: all 5 memory types modeled, retrieval blends multiple signals, the whole turn flows through a LangGraph graph.

Tasks

  • Finalize the 5 memory types with tagging rules
  • Importance scoring at write time (LLM-rated 1–5, plus heuristics)
  • Hybrid retrieval scorer: semantic similarity + recency decay + importance (add emotional and goal relevance if time)
  • Rewrite the turn as a LangGraph graph: retrieve → assemble context → generate → extract → store
  • Dedup on write (skip near-duplicate memories by cosine threshold)
  • Log every retrieval: which memories, what scores, why selected
Done when: retrieval surfaces the right memory for the moment (not just the most semantically similar), and you can read a log explaining why each memory was chosen.
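The dedup-on-write task reduces to a cosine check against existing embeddings. The sketch below does it in pure Python for clarity; in the real system the comparison would more likely be a single pgvector nearest-neighbour query. The 0.95 threshold is a hypothetical starting point, not a value from the brief.

```python
import math
from typing import List, Sequence

DEDUP_THRESHOLD = 0.95  # hypothetical; tune against real extracted memories


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_near_duplicate(
    new_vec: Sequence[float],
    existing_vecs: List[Sequence[float]],
    threshold: float = DEDUP_THRESHOLD,
) -> bool:
    """True if the new embedding is within the threshold of any stored one."""
    return any(cosine(new_vec, vec) >= threshold for vec in existing_vecs)


stored = [[0.99, 0.05], [0.0, 1.0]]
skip_write = is_near_duplicate([1.0, 0.0], stored)
keep_write = not is_near_duplicate([0.7, 0.7], stored)
```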
Week 4: Reflections and the nightly job (It thinks on its own)
Goal: overnight, the system reviews recent episodic memories and produces reflections and abstracted semantic memories. This is the magic moment.

Tasks

  • Scheduled job (APScheduler) running nightly per user
  • Reflection generation: summarize recent episodes, detect patterns, write reflection memories
  • Consolidation: promote repeated episodic signals into semantic memories; merge near-duplicates
  • Weekly insight summary (a short digest)
  • Reflections stored with provenance (which episodes triggered them) — explainability built in
  • Optional: route reflection generation to Claude behind the flag for higher quality
Done when: you use it for a few days, run the nightly job, and it produces at least one reflection about you that feels accurate and was not explicitly stated.
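Consolidation with provenance can be sketched as a grouping step. The example assumes each episodic memory carries a `theme` tag assigned at extraction time and that three repeats trigger promotion; both are hypothetical. In practice the semantic memory's wording would come from the LLM, not a template.

```python
from collections import defaultdict
from typing import Dict, List

MIN_REPEATS = 3  # hypothetical promotion threshold


def consolidate(episodes: List[Dict]) -> List[Dict]:
    """Promote themes seen in enough episodes into semantic memories.

    Each promoted memory records the episode ids that triggered it, so the
    reflection feed can show where an abstraction came from.
    """
    ids_by_theme = defaultdict(list)
    for episode in episodes:
        ids_by_theme[episode["theme"]].append(episode["id"])

    promoted = []
    for theme, ids in ids_by_theme.items():
        if len(ids) >= MIN_REPEATS:
            promoted.append(
                {
                    "type": "semantic",
                    "content": f"Recurring pattern: {theme}",  # placeholder wording
                    "source_episode_ids": ids,
                }
            )
    return promoted


episodes = [
    {"id": 1, "theme": "burnout after client meetings"},
    {"id": 2, "theme": "burnout after client meetings"},
    {"id": 3, "theme": "enjoys morning focus blocks"},
    {"id": 4, "theme": "burnout after client meetings"},
]
promoted = consolidate(episodes)
```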
Week 5: Reflection quality and cognitive design (The soul of the product)
Goal: make the reflections genuinely good, not generic. This is where the product lives or dies.

Tasks

  • Define personality and reflection tone with Ace and Alessandra: how it speaks, how blunt it is, when it stays quiet
  • Iterate the extraction and reflection prompts against real journaling data
  • Build the eval harness early: a golden set of reflection cases scored for accuracy, safety, and usefulness (LLM-as-judge + human review)
  • A/B local Qwen vs Claude on reflection quality; lock the routing rule
  • Tune importance and emotional-relevance signals based on what surfaces well
  • Guardrails: avoid harmful, presumptuous, or clinically-toned statements
Done when: over a week of real use, the reflections feel insightful and on-tone more often than not, and a generic or wrong one is the exception, not the rule.
Week 6: Product surfaces and real-time UI (Feels like a product)
Goal: the user can see and steer the memory. Timeline, reflection feed, goal tracking. Everything updates in place without page reloads.

Tasks

  • Memory timeline: browse what it remembers, filter by type, search
  • Reflection feed: accept, reject, or edit reflections (rejections feed back into quality)
  • Goal tracking view: active goals, progress notes
  • In-place updates with loading spinners; restore on error
  • Basic auth + a real login screen
  • Empty states and onboarding (first-run journaling prompt)
Done when: a non-technical person can open it, journal, see what it learned about them, and accept or reject a reflection, all without a confusing moment.
Week 7: Voice in and out (It speaks and listens)
Goal: the assistant replies in voice and accepts voice input, with latency low enough to feel conversational.

Tasks

  • TTS: Coqui XTTS v2 on the RTX 5090 behind a simple synth endpoint (ElevenLabs API as fallback)
  • Chunk the LLM token stream into sentences and synthesize each as it arrives — audio starts before the full reply is done
  • STT: record audio in the browser, transcribe with faster-whisper server-side, feed the text into the same chat pipeline
  • Audio playback UI: play, pause, mute toggle, autoplay setting; push-to-talk or mic button for input
  • Pick or clone one voice; consistent with the personality from week 5
  • Latency pass: pipeline the stages so first spoken word lands in ~1 second
Done when: you can talk to it and it talks back, and the back-and-forth feels conversational rather than like waiting on a computer.
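The sentence-chunking step is what lets audio start before the reply is finished. A minimal sketch, assuming sentences end at `.`, `!`, or `?` followed by whitespace; this simple rule will also split on abbreviations such as "e.g.", so a real implementation would need something more careful.

```python
import re
from typing import Iterable, Iterator

# Split after sentence-ending punctuation that is followed by whitespace.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as the token stream finishes them."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = SENTENCE_BOUNDARY.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


stream = ["Hi", " there.", " How", " are", " you?", " Fine"]
chunks = list(sentences_from_tokens(stream))
```

Each yielded sentence can be sent to the TTS endpoint immediately. A sentence is only released once the next token confirms the boundary, which costs one token of latency.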
Week 8: Harden, evaluate, ship (Stable and real)
Goal: make it reliable enough to use daily, deploy it, back it up, and live in it.

Tasks

  • Run the full eval harness; fix regressions in memory recall and reflection quality
  • Error handling: LLM, TTS, and STT timeouts, retries, graceful degradation
  • Tracing review: confirm you can debug any odd answer from logs
  • Performance pass: retrieval latency, prompt size, GPU memory with LLM + embeddings + XTTS + Whisper sharing the card
  • Deploy on the VPS, automated nightly DB backup to Cloudflare R2
  • Dogfood: use it as your own daily journaling tool, fix what annoys you
Done when: you have used it daily for at least 5 days, the nightly job runs unattended, backups work, and you trust it with your own reflections.
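The retry and graceful-degradation task can be sketched as a provider chain: try the local service a few times with backoff, then fall through to the next provider. All names and the retry counts are hypothetical, and each provider callable is assumed to enforce its own timeout.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallback(
    providers: Sequence[Callable[[], T]],
    retries: int = 2,
    backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each provider in order, retrying with exponential backoff before moving on."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries):
            try:
                return provider()
            except Exception as exc:  # in practice, catch timeout/connection errors only
                last_error = exc
                if attempt < retries - 1:
                    sleep(backoff_s * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error


calls = {"local": 0, "cloud": 0}


def local_tts() -> str:
    calls["local"] += 1
    raise TimeoutError("local TTS timed out")


def cloud_tts() -> str:
    calls["cloud"] += 1
    return "cloud audio"


audio = call_with_fallback([local_tts, cloud_tts], sleep=lambda seconds: None)
```

The same wrapper shape would apply to the LLM (local Qwen, then Claude behind the flag) and to TTS (XTTS, then ElevenLabs).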
Buffer (weeks 9–10). Built-in slack. If reflection quality needs more iteration, or voice latency or VRAM sharing fights us, this is where it gets absorbed without moving the plan. If everything lands on time, this is early polish or a head start on the avatar.

Risks and mitigations

Risk | How I will handle it
Memory extraction quality is poor (junk memories) | Spend real time on the extraction prompt in week 2 and add importance filtering and dedup early. This is the highest-leverage prompt in the system.
16 GB VRAM too tight for LLM + embeddings + XTTS together | Run embeddings on CPU, quantize the LLM harder, or run XTTS on CPU. If it still fights, use ElevenLabs API for voice and keep the GPU for LLM. With one user during MVP, concurrency is not the bottleneck yet.
Retrieval returns the wrong memories | Log every retrieval with its scores from day one of week 3. Cannot tune what cannot be seen.
Reflections feel generic or wrong | Ground every reflection in specific episodes with provenance. Route to Claude fallback if local model quality lags.
Scope creep toward the avatar | The avatar is explicitly v2. Will not touch it during 8 weeks; starting it would put the memory core at risk.
Week lost to infra (Ollama, tunnel, pgvector) | Timebox setup to the first two days of week 1. If something fights, fall back to the simplest working option and move on.

What we can expect from the MVP

A single-user web app where you journal and converse with a local LLM that remembers you across sessions, organizes what it learns into 5 memory types, retrieves the right context at the right time, and generates accurate reflections about you overnight. It replies in voice and listens to voice. You can see, accept, and reject what it remembers. It runs on our own hardware, backs itself up, and is used daily. No avatar yet. The brain works and it speaks.