
A typed, governed, replayable memory layer.

Every agent request flows through OctaMem before it reaches a model. Retrieval is structured, capture is automatic, and every read is traceable. This is what we mean when we say memory as infrastructure.

Request ledger

Watch memory become infrastructure.

Each request moves through a controlled path before it becomes model context: identity, retrieval, policy, and render.

request.opened

The request arrives with previous_context, actor, and budget.

Nothing reaches the model until the memory layer knows who is asking, what they can see, and how much context it can spend.

/actor: agent:sales-copilot

/previous_context: sales-q1

/budget: 3,200 tokens

Records selected

  • fact

    HIPAA-safe defaults required for this account.

  • decision

    V2 proposal rejected because reporting exposed PHI.

  • rule

    Route contract deltas through legal above $10k.

Context packet

Preserve approved scope. Redact PHI. Cite rejected V2 reporting. Capture final launch decision after response.
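The ledger above reduces to a budget-aware selection step: filter records down to the actor's visible scopes, then pack them until the token budget is spent. A minimal sketch in plain Python; `Record` and `select_records` are hypothetical names for illustration, not the OctaMem API.

```python
from dataclasses import dataclass

@dataclass
class Record:
    type: str    # "fact" | "decision" | "rule"
    text: str
    scope: str   # e.g. "sales-q1"
    tokens: int  # approximate token cost of rendering this record

def select_records(records, actor_scopes, budget):
    """Drop records outside the actor's scopes, then pack under the token budget."""
    visible = [r for r in records if r.scope in actor_scopes]
    packet, spent = [], 0
    for r in visible:
        if spent + r.tokens <= budget:
            packet.append(r)
            spent += r.tokens
    return packet, spent

records = [
    Record("fact", "HIPAA-safe defaults required for this account.", "sales-q1", 12),
    Record("decision", "V2 proposal rejected because reporting exposed PHI.", "sales-q1", 14),
    Record("rule", "Route contract deltas through legal above $10k.", "legal-review", 11),
]
packet, spent = select_records(records, actor_scopes={"sales-q1"}, budget=3200)
# The legal-scoped rule never enters the packet: the actor cannot see it.
```

The scope check runs before the budget check, so out-of-scope records never compete for context space at all.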

OctaMem sits between agents and the models they call. Every request flows through OctaMem to be enriched with relevant memory before it reaches the LLM. Every response is captured back as typed memory, audited, and made available to every agent on the account.

The memory layer holds three independent stores. Semantic stores facts. Episodic stores events. Procedural stores behavior. Each is optimized for the shape of knowledge it preserves, and together they replace the workarounds nobody trusts in production: prompt stuffing, Redis caches, transcript replay, and vector recall over raw conversations.
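How the three stores split by record type can be sketched as a routing table. The store names come from the paragraph above; `route` and the rejection behavior are illustrative assumptions, not the product's dispatch logic.

```python
# Hypothetical routing table: each typed record lands in the store shaped for it.
STORE_FOR_TYPE = {
    "fact": "semantic",      # stable knowledge about the world
    "decision": "episodic",  # things that happened, anchored in time
    "rule": "procedural",    # how agents should behave
}

def route(record_type: str) -> str:
    """Return the target store, rejecting untyped records outright."""
    if record_type not in STORE_FOR_TYPE:
        raise ValueError(f"untyped record rejected: {record_type!r}")
    return STORE_FOR_TYPE[record_type]
```

`route("decision")` resolves to the episodic store; an unknown type raises instead of silently landing in an untyped blob.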

  • i.

    Recall.

    Typed retrieval with provenance. Every record carries a source, a timestamp, and a scope. Not a vector blob.

  • ii.

    Inject.

    Memory is rendered as structured context, not stuffed into the prompt. The model sees what matters in the right shape.

  • iii.

    Capture.

    Decisions become memory, with full lineage from the documents and conversations that produced them.

  • iv.

    Govern.

    Policy, audit, and access are enforced at the memory layer. Not the prompt, not the application code.
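Step ii, rendering memory as structured context rather than a flat prompt, might look like this in outline. The section layout and field names (`type`, `text`, `source`) are assumptions for the sketch, not the actual render format.

```python
def render_context(records):
    """Group records by type and emit labelled sections, not one flat string."""
    sections = {"fact": [], "decision": [], "rule": []}
    for r in records:
        sections[r["type"]].append(f"- {r['text']} (source: {r['source']})")
    return "\n".join(
        f"## {name}s\n" + "\n".join(lines)
        for name, lines in sections.items()
        if lines  # skip empty sections
    )

packet = render_context([
    {"type": "fact", "text": "HIPAA-safe defaults required.", "source": "crm:acct-81"},
    {"type": "rule", "text": "Route contract deltas through legal above $10k.", "source": "policy:legal-7"},
])
```

Because every line keeps its source reference, the rendered packet stays traceable even after it becomes model context.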

§ 03 Retrieval

Scoped, typed, replayable.

Recall is a typed query, not a similarity search. You ask for what you need by scope, type, and time, and you get records with sources you can trace back to the documents or conversations that produced them.

  • query: natural-language question or keywords
  • previous_context: scopes results to a thread, topic, or tenant
  • api_key: per-environment, per-team isolation
  • details(): live storage, plan limits, balance
agents/recall_q1.py
# Retrieve memory for a specific conversation or topic.
results = client.get(
    query="What did the customer reject in v2?",
    previous_context="Q1 product planning",
)

# Each item carries content, metadata, and provenance.
for item in results:
    print(item.id, item.created_at, item.content)
Each item carries id, content, created_at, and provenance.
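Provenance is what makes a record traceable back to the document or conversation that produced it. A sketch of walking such a chain, assuming a hypothetical `derived_from` pointer on each item:

```python
# Hypothetical provenance index: each item may point at the item it was derived from.
index = {
    "conv-17": {"id": "conv-17", "content": "Q1 planning transcript", "derived_from": None},
    "dec-4": {"id": "dec-4", "content": "V2 reporting rejected: exposed PHI", "derived_from": "conv-17"},
}

def trace(item, index):
    """Walk derived_from links from a record back to its originating source."""
    chain = [item]
    while item.get("derived_from"):
        item = index[item["derived_from"]]
        chain.append(item)
    return chain

chain = trace(index["dec-4"], index)
# chain runs from the decision back to the conversation that produced it
```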
§ 04 Policy & audit

Enforced at the storage layer. Not the prompt.

The model can’t leak what it never saw. Read scopes, write scopes, and redactions are applied inside OctaMem before a single token is rendered into context. The audit log is append-only and hash-chained: every entry references the previous one, so tampering is detectable.

  • Role-based scopes (department, project, agent)
  • Field-level redaction (PII, PHI, pricing, IP)
  • Immutable audit log with hash-chained entries
  • Retention policies with automatic expiry
policy/legal_scope.py
# Policy is enforced at the storage layer, not the prompt.
client.policy.set(
    previous_context="legal-review",
    rules=[
        # Only legal team agents can write under this label.
        Allow(role="legal-agent", action="write"),
        # Customer-services agents can read but not see PII.
        Allow(role="cs-agent", action="read", redact=["pii"]),
        # Everything else is denied by default.
        DenyAll(),
    ],
)
Same policy DSL applies to REST, MCP, Python, and JS clients.
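The hash-chained audit log described above is a standard construction and easy to verify independently: each entry's hash covers its payload plus the previous entry's hash, so editing any historical entry breaks every later link. A self-contained sketch of the idea, not OctaMem internals:

```python
import hashlib, json

def append_entry(log, payload):
    """Append an entry whose hash covers the payload and the previous entry's hash."""
    prev = log[-1]["hash"] if log else "0" * 64
    body = {"payload": payload, "prev": prev}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "hash": digest})

def verify(log):
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev = "0" * 64
    for entry in log:
        body = {"payload": entry["payload"], "prev": entry["prev"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != digest:
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, {"actor": "cs-agent", "action": "read", "redact": ["pii"]})
append_entry(log, {"actor": "legal-agent", "action": "write"})
assert verify(log)

log[0]["payload"]["action"] = "write"  # tamper with history
assert not verify(log)                 # the broken chain exposes it
```

Canonical JSON (`sort_keys=True`) matters here: the same payload must always hash to the same digest, regardless of key order.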
§ 06 How it compares

Memory layer. Not a vector wrapper.

The shortest way to explain OctaMem is the things it’s deliberately not. Competitors solve retrieval accuracy. OctaMem solves retrieval accountability.

|                    | OCTAMEM                          | VECTOR DB           | CHAT MEMORY LIBS     | PROMPT STUFFING    |
|--------------------|----------------------------------|---------------------|----------------------|--------------------|
| Typed records      | Semantic / episodic / procedural | Untyped vectors     | Conversation buffers | Inline string      |
| Source provenance  | Every record source-linked       | Optional metadata   | Session-bound        | None               |
| Audit log          | Append-only, hash-chained        | Custom build        | Usually missing      | None               |
| Policy enforcement | Storage layer · redaction        | Application layer   | Application layer    | Prompt-engineered  |
| Model-agnostic     | Any LLM, any SDK                 | Yes                 | Often model-coupled  | Per-model          |
| Cost shape         | Linear in storage                | Linear + embed cost | Token-based          | Compounds per call |

fig. 4 · positioning matrix. Six dimensions that matter when an audit team asks.

§ 07 Deployment


Cloud.

Multi-tenant cloud with regional residency. Default for Team and Business plans.

us-east · eu-west · ap-southeast


VPC.

Single-tenant cloud in your own VPC. Customer-managed network and firewall.

AWS · GCP · Azure


On-prem.

Self-contained deployment for air-gapped or regulated environments.

Kubernetes · OpenShift · bare metal