Security & data handling — Cerebro

Last updated: 2026-05-20. This is the honest, current-state document — not a marketing pitch. If something here doesn't match what the code does, treat the code as the source of truth and open an issue.

If you're a customer evaluating Cerebro and have a question this doesn't cover, email ashish.dhiman@frozo.ai and we'll answer in plain English, not legalese.


What Cerebro does with your data

Cerebro ingests text from your connected systems (Slack, GitHub, Linear, Notion, meetings, webhooks), runs each item through an LLM-based classifier to decide whether it is a memory worth keeping, then stores the keepers in your org's Postgres database for your team and your AI agents to query.

Three places your data flows:

┌─────────────────┐     ┌────────────────┐     ┌──────────────────┐
│ Your upstream   │ ──> │ Cerebro        │ ──> │ Your Postgres    │
│ (Slack/GH/etc.) │     │ classifier     │     │ (multi-tenant,   │
│                 │     │ (Anthropic via │     │  RLS-scoped to   │
│                 │     │  OpenRouter)   │     │  your org)       │
└─────────────────┘     └────────────────┘     └──────────────────┘
                              │
                              └─> ✗ Anthropic doesn't train on it
                                  ✗ We don't train on it
                                  ⚠ OpenRouter logs may persist
                                    (see "OpenRouter" below)

What we send to the LLM (the honest version)

When a connector ingests an item, we send to Claude Haiku (via OpenRouter by default):

  • The title of the item (e.g. PR title, message first line, Linear issue summary)
  • The body (full message, PR description, page content, transcript chunk)
  • Minimal metadata (source kind, author handle, timestamp)

We use the response to classify the item into one of seven memory buckets (decision, observation, todo, learning, summary, entity, question) or to mark it as noise. The classification result is kept in our audit log; the original text persists in your vault only if you approve it from /pending.
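
The payload and the classification outcome described above can be pictured with a small type sketch. The names used here (ClassifierPayload, MemoryBucket, parseClassification) are illustrative assumptions, not identifiers from the Cerebro codebase; only the field list and the seven bucket names come from this document.

```typescript
// Illustrative types only; names are assumptions, not Cerebro's actual schema.
const MEMORY_BUCKETS = [
  "decision", "observation", "todo", "learning",
  "summary", "entity", "question",
] as const;

type MemoryBucket = (typeof MEMORY_BUCKETS)[number];
type Classification = MemoryBucket | "noise";

// What leaves the connector for the classifier, per the list above.
interface ClassifierPayload {
  title: string;
  body: string;
  metadata: {
    sourceKind: string;   // e.g. "slack", "github"
    authorHandle: string;
    timestamp: string;    // ISO 8601
  };
}

// Normalise a raw model label. Treating unrecognised labels as noise is an
// assumption of this sketch, chosen so an unknown label is never stored.
function parseClassification(raw: string): Classification {
  const label = raw.trim().toLowerCase();
  return (MEMORY_BUCKETS as readonly string[]).indexOf(label) !== -1
    ? (label as MemoryBucket)
    : "noise";
}
```

Failing closed to "noise" is one reasonable design choice here; a real classifier path might instead retry or flag the item for review.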

We do not send:

  • Your Supabase service-role key
  • Your other connectors' tokens
  • Anything from another customer's data (RLS-enforced)
  • The contents of memories you've already approved (those don't re-flow to the LLM unless you explicitly trigger contradiction detection or summarization)

We do send:

  • Content from any system whose connector is enabled. If a Slack channel is connected, every message in it is sent for classification. If a GitHub repo is connected, every PR + issue body is sent.

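If your team posts secrets in Slack (API keys, customer PII, credentials) and that channel is connected, those messages will pass through the classifier. A regex-based redaction sweep masks common patterns before sending (see "Mitigations already shipped"), but it will not catch everything.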

Third-party commitments

Anthropic (Claude API)

  • Training: Anthropic's commercial API terms commit to not training models on customer API traffic. This applies to direct API use and to OpenRouter pass-through.
  • Retention: API logs retained ~30 days for abuse review, then deleted (unless flagged for trust & safety review).
  • Region: US by default. EU + India regions available via Bedrock (see roadmap).
  • Compliance: SOC 2 Type II, HIPAA BAA available on enterprise plans.

OpenRouter (gateway, current default)

  • Training: pass-through to Anthropic, so Anthropic's no-train applies. OpenRouter themselves don't train.
  • Retention: this is the weak link. OpenRouter's default logging policy is documented at openrouter.ai/docs#privacy — they may log prompts unless you set headers: { "X-OpenRouter-No-Log": "true" } per request, or globally disable logging in your OpenRouter account settings.
  • Why we use it: unified billing across models, soft failover, per-org model overrides. For privacy-conscious deployments, we recommend BYOK with Anthropic-direct mode, which skips OpenRouter entirely (see "Mitigations already shipped").

Supabase (your vault's storage)

  • Hosting: Supabase Cloud on AWS. Our current deployment uses aws-us-east-1 for the shared dev project. Production tenants will get region choice.
  • Encryption: at-rest (AES-256) and in-transit (TLS 1.3).
  • RLS: every table that holds tenant data has row-level security policies enforcing org membership. Service-role bypass is restricted to the connector classifier path (server-side only, never exposed to the browser bundle).
  • Compliance: SOC 2 Type II, HIPAA available on team plan.

Railway (web app hosting)

  • What lives here: the Next.js server. No customer memory data is stored on Railway — everything reads from Supabase at request time.
  • What flows through: encrypted HTTPS requests, environment variables (including the OpenRouter / Supabase service-role keys we need at runtime).

What's enforced by the database

Every Cloud table that holds tenant data — memories, projects, workspaces, connectors, connector sync state, audit entries — is scoped by Postgres Row-Level Security keyed off your auth.uid() and memberships table. Even if our code had a query bug, the DB would refuse to return another tenant's row. The RLS policies are in supabase/migrations/ and reviewed on every PR touching them.
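
The effect of those policies can be modelled in a few lines. This is a TypeScript illustration of the visibility rule, not the SQL in supabase/migrations/; the row shape and the names Membership, TenantRow, and visibleRows are assumptions made for the sketch.

```typescript
// Models the rule "a row is visible only to members of its org".
// Real enforcement happens in Postgres policies, not in application code.
interface Membership { userId: string; orgId: string; }
interface TenantRow { id: string; orgId: string; }

function visibleRows<T extends TenantRow>(
  userId: string,            // stands in for auth.uid()
  memberships: Membership[], // stands in for the memberships table
  rows: T[],
): T[] {
  // Collect every org the user belongs to.
  const orgs = new Set(
    memberships.filter((m) => m.userId === userId).map((m) => m.orgId),
  );
  // A row survives only if its org is in that set.
  return rows.filter((row) => orgs.has(row.orgId));
}
```

Because the database applies the filter itself, an application query that forgets its own org condition still cannot return another tenant's rows.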

The DPDP / GDPR per-subject erasure cascade (commit 0ed2371, design doc at docs/superpowers/specs/2026-05-19-dpdp-erasure-cascade-design.md in the OSS repo) handles right-to-erasure requests in a single cascade through markdown files, embeddings, full-text index, and audit log. It does not cascade into the LLM provider's logs — that's a hard constraint of using a third-party API, addressed only by BYOK + your own provider relationship.
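
A cascade of this kind can be sketched as a loop over every store that may hold the subject's data. SubjectStore, ErasureReport, eraseSubject, and memoryStore are hypothetical names for illustration; the actual design lives in the spec referenced above and may differ.

```typescript
// Hypothetical sketch of a per-subject erasure cascade.
interface SubjectStore {
  name: string;
  // Removes everything tied to the subject; returns how many records were removed.
  erase(subjectId: string): number;
}

interface ErasureReport {
  subjectId: string;
  removed: Record<string, number>;
  // Third-party LLM provider logs are out of reach, so they are listed explicitly.
  outOfScope: string[];
}

function eraseSubject(subjectId: string, stores: SubjectStore[]): ErasureReport {
  const removed: Record<string, number> = {};
  for (const store of stores) {
    removed[store.name] = store.erase(subjectId);
  }
  return { subjectId, removed, outOfScope: ["llm-provider-logs"] };
}

// In-memory stand-in for a real store, used only to exercise the sketch.
function memoryStore(name: string, owners: string[]): SubjectStore {
  let records = owners.slice();
  return {
    name,
    erase(subjectId: string): number {
      const before = records.length;
      records = records.filter((owner) => owner !== subjectId);
      return before - records.length;
    },
  };
}
```

Returning a report with an explicit out-of-scope list keeps the limitation visible to whoever handles the erasure request.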

Mitigations available today

If you're worried about LLM data exposure, three options exist now:

  1. Don't connect the sensitive channels. Connectors are opt-in per source. If your #exec-strategy Slack channel shouldn't flow through, just don't add it to the connector config.

  2. Use the OSS self-host route. github.com/frozo-ai/frozo-vault-mem runs locally, no connectors, no LLM calls unless you wire them up. Brings everything inside your network. You lose the company-wide ingest, but the brain stays on your machine.

  3. Disable a connector after evaluating. You can ingest a sample, see what gets classified, decide if you trust the flow, then turn the connector off if not. No retroactive data sharing beyond what was already classified during the eval.

Mitigations already shipped

  1. BYOK (bring your own key) — per-org Anthropic or OpenRouter key, stored encrypted in Supabase Vault, decrypted only by the service-role connector classifier path. Configure under /admin/llm. Your billing, your logs.

  2. Anthropic-direct mode — pick "Anthropic (direct)" in the BYOK provider radio to bypass OpenRouter entirely and call the Anthropic API directly. Removes the intermediary logging risk.

  3. PII redaction (always on) — every text payload sent to the classifier passes through a regex sweep that masks emails, phone numbers, SSNs, India PAN/Aadhaar numbers, credit card numbers, IPv4 addresses, and known API-key prefixes. This is defense in depth: it does not catch trade secrets, but it neutralises the obvious leakage path. Source: packages/connectors-core/src/pii.ts. Tested.
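
A regex sweep like the one in item 3 could look roughly as follows. This is a simplified sketch, not the contents of packages/connectors-core/src/pii.ts: the patterns, the placeholder labels, and the "sk-" key prefix are assumptions chosen for illustration, and only four of the listed categories are covered.

```typescript
// Simplified regex sweep; patterns and placeholder labels are assumptions.
const PII_PATTERNS: Array<[string, RegExp]> = [
  ["EMAIL", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["IPV4", /\b(?:\d{1,3}\.){3}\d{1,3}\b/g],
  // India PAN: five letters, four digits, one letter.
  ["PAN", /\b[A-Z]{5}\d{4}[A-Z]\b/g],
  // Hypothetical API-key shape: "sk-" followed by 16+ alphanumerics.
  ["API_KEY", /\bsk-[A-Za-z0-9]{16,}/g],
];

// Replace every match of every pattern with a bracketed label.
function redact(text: string): string {
  return PII_PATTERNS.reduce(
    (out, [label, pattern]) => out.replace(pattern, `[${label}]`),
    text,
  );
}
```

Pattern matching of this kind only recognises data with a predictable shape, which is why free-form sensitive content such as pricing plans or strategy notes passes through unchanged.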

Mitigations on the roadmap

  1. AWS Bedrock provider — call Claude via the customer's own AWS account / region / CloudTrail. The classifier call lives entirely inside their VPC. Effort: ~2 days.

  2. Per-connector privacy mode — let admins mark a connector as "metadata-only" (ingest titles + author + timestamp, never the full body). Useful for high-sensitivity channels that still need basic tracking. Effort: ~1 day.

  3. Self-host local LLM endpoint UI — point the classifier at a local Ollama / vLLM endpoint via the BYOK form. Works today through OpenRouter-compat URLs but no dedicated UI affordance yet. Effort: ~half-day.
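
The "metadata-only" mode in item 2 could work along the following lines. Since the feature is not built yet, everything here is a hypothetical sketch: the PrivacyMode values, the IngestItem shape, and applyPrivacyMode are invented for illustration.

```typescript
// Hypothetical per-connector privacy setting.
type PrivacyMode = "full" | "metadata-only";

interface IngestItem {
  title: string;
  body: string;
  author: string;
  timestamp: string;
}

// In metadata-only mode the body is blanked before the item goes anywhere else,
// so only title, author, and timestamp are ingested.
function applyPrivacyMode(item: IngestItem, mode: PrivacyMode): IngestItem {
  return mode === "metadata-only" ? { ...item, body: "" } : item;
}
```

Applying the filter inside the connector, before classification, would keep the full body away from both the LLM provider and the vault.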

Reporting a security issue

Email ashish.dhiman@frozo.ai with subject [SECURITY] Cerebro — <short description>.

We respond within 48 hours. Severe issues (data leakage, RLS bypass, auth bypass) get a same-day acknowledgement and a 7-day target fix window. Lower-severity issues get a tracked issue + a fix in the next minor release.

No bug bounty program yet — we're a pre-revenue project. We will absolutely credit you in release notes and patch the issue.

Compliance footprint

Standard     | Status | Notes
GDPR         | ✅     | Erasure cascade live since 0ed2371. EU residency pending Bedrock provider.
DPDP (India) | ✅     | Same erasure cascade. India residency pending Bedrock provider.
SOC 2        | ⏳     | Inheriting from Supabase + Railway + Anthropic. No top-of-stack audit yet — pre-revenue.
HIPAA        | ⏳     | Underlying providers offer BAAs. We don't market for HIPAA workloads until BYOK ships.
ISO 27001    | ⏳     | Inherit from providers. No top-of-stack cert yet.

We will not pretend to have certifications we don't have. If your buying process needs an auditor's report on top of ours, that's a conversation we should have before you sign.


This document is part of the OSS vault-mem and proprietary vault-cloud repos. The OSS repo's SECURITY.md covers the self-host posture; this file covers the hosted Cloud product.