v1.0.0: First Stable Release

The first public release of Beekeeper: threat intelligence for autonomous coding agents. It intercepts every tool call, package install, file access, and network egress before it executes, evaluates it against corroboration-based threat intelligence, and blocks or quarantines threats across 17 agent harnesses.

Overview

Beekeeper v1.0.0 is the first public release. It is a single static Go binary (beekeeper) that mediates autonomous-agent tool calls before they execute and evaluates them against unified, corroboration-based threat intelligence, fail-closed by default. A hijacked or off-task agent cannot successfully act on the developer's machine without Beekeeper deciding to permit it.

This release consolidates the project's full internal development history into one stable surface: the corroboration engine, sensitive-path enforcement, the package-manager nudge, editor-extension defense, cross-platform behavioral monitoring, an opt-in prompt-injection sidecar, policy-as-code, the TUI dashboard, config self-protection, and a validated release gate. There are no earlier public versions to migrate from.

Provenance. Beekeeper ships on GitHub Releases with the canonical go install path and the signed release artifacts described below. The verification commands let you audit the provenance story yourself before you trust the binary.

Highlights

1. Fail-closed hook handler + corroboration policy engine

beekeeper check evaluates tool calls against an mmap catalog index under hard caps (1 MB stdin / 5 s / 256 MB). The pure internal/policy corroboration engine applies a three-tier decision: one trusted source warns, two sources block, and three sources block and recommend quarantine. The sources are Bumblebee, OSV, and Socket. Per-severity thresholds let a critical match escalate at a lower source count, while an all-versions-wildcard guard and degraded-source suppression keep a single poisoned source from forcing a block. The same engine is called identically from the hook handler, the MCP gateway, and the Sentry correlation layer.
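The three-tier rule can be sketched in a few lines of Go. This is an illustration only, not the internal/policy code: the Match and Decision types are hypothetical, per-severity thresholds and the wildcard guard are left out, and treating a degraded source as "does not count" is an assumption about what suppression means.

```go
package main

import "fmt"

// Decision is ordered so that a larger value is more restrictive.
type Decision int

const (
	Allow Decision = iota
	Warn
	Block
	BlockAndQuarantine
)

func (d Decision) String() string {
	return [...]string{"allow", "warn", "block", "block+quarantine"}[d]
}

// Match is a hypothetical record of one source reporting a package.
type Match struct {
	Source   string // e.g. "bumblebee", "osv", "socket"
	Degraded bool   // assumed flag: the source is currently unhealthy
}

// Corroborate counts distinct, non-degraded sources and maps the count
// to the three tiers: 1 warns, 2 block, 3 or more block and quarantine.
func Corroborate(matches []Match) Decision {
	seen := map[string]bool{}
	for _, m := range matches {
		if m.Degraded {
			continue // assumption: degraded sources never corroborate
		}
		seen[m.Source] = true
	}
	switch n := len(seen); {
	case n >= 3:
		return BlockAndQuarantine
	case n == 2:
		return Block
	case n == 1:
		return Warn
	default:
		return Allow
	}
}

func main() {
	fmt.Println(Corroborate([]Match{{Source: "osv"}}))
	fmt.Println(Corroborate([]Match{{Source: "osv"}, {Source: "socket"}}))
	// The same source reported twice still counts once.
	fmt.Println(Corroborate([]Match{{Source: "osv"}, {Source: "osv"}}))
}
```

Counting distinct sources rather than raw matches is what makes the rule corroboration-based: a source that repeats itself does not escalate the decision.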

On a block the hook emits the per-harness deny contract that the agent actually honors (exit 2 plus hookSpecificOutput for Claude Code, {"action":"block"} for Hermes, and so on). Unknown or unconfigured harness IDs fail closed.

2. Multi-harness enforcement across 17 agents

Hook installers cover 17 agent harnesses across three honesty tiers: 10 Tier-1 agents get full exit-2 deny enforcement (Claude Code, Codex, Cursor, Augment, CodeBuddy, Qwen Code, Gemini CLI, Copilot, Antigravity, Windsurf); 3 Tier-2 agents work with documented caveats (Hermes is structurally fail-open, Cline is macOS/Linux only, OpenCode misses subagent task calls); and 4 Tier-3 agents (Kilo, Trae, Continue, OpenClaw) are covered only through the MCP gateway, with their native tools left explicitly unguarded. Only Claude Code is live-verified end to end; the other 16 are contract-shape tested and listed in a signed manual validation register (see Highlight 10).

3. Sensitive-path enforcement

policy.EvaluatePath blocks agent reads and shell-redirect writes of credential paths outside the working directory: ~/.ssh, ~/.aws, ~/.cargo/credentials, .env globs, and editor MCP config directories. Canonicalization closes evasion gaps: tilde and $VAR / ${VAR} / %VAR% expansion, symlink and ancestor-symlink resolution, and Windows alternate-data-stream and trailing-dot variants. Path decisions are merged most-restrictive-wins, so an allowlist can never downgrade a credential-read block.
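A minimal Go sketch of the idea, assuming Unix-style paths. It is not policy.EvaluatePath: the function names, the Verdict type, and the two-entry directory list are hypothetical, and it covers only tilde expansion, $VAR / ${VAR} expansion, and ".." cleaning. Symlink resolution, %VAR%, and the Windows variants are deliberately left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// sensitiveDirs lists two of the home-relative credential locations named above.
var sensitiveDirs = []string{".ssh", ".aws"}

// canonicalize expands $VAR / ${VAR} and a leading tilde, then cleans the
// path so "a/../b" style detours collapse. getenv is injected for testing.
func canonicalize(p, home string, getenv func(string) string) string {
	p = os.Expand(p, getenv)
	if p == "~" || strings.HasPrefix(p, "~/") {
		p = filepath.Join(home, strings.TrimPrefix(p, "~"))
	}
	return filepath.Clean(p)
}

// isSensitive reports whether the canonical path is a credential directory
// or lies inside one.
func isSensitive(p, home string, getenv func(string) string) bool {
	c := canonicalize(p, home, getenv)
	for _, d := range sensitiveDirs {
		root := filepath.Join(home, d)
		if c == root || strings.HasPrefix(c, root+string(filepath.Separator)) {
			return true
		}
	}
	return false
}

// Verdict is ordered so that a larger value is more restrictive.
type Verdict int

const (
	Allow Verdict = iota
	Warn
	Block
)

// merge implements most-restrictive-wins: an Allow can never lower a Block.
func merge(vs ...Verdict) Verdict {
	out := Allow
	for _, v := range vs {
		if v > out {
			out = v
		}
	}
	return out
}

func main() {
	home := "/home/dev" // hypothetical home directory for the demo
	env := func(k string) string {
		if k == "HOME" {
			return home
		}
		return ""
	}
	paths := []string{
		"~/.ssh/id_rsa",
		"${HOME}/.aws/credentials",
		"/home/dev/project/../.ssh/config",
		"/home/dev/project/main.go",
	}
	for _, p := range paths {
		fmt.Printf("%-36s sensitive=%v\n", p, isSensitive(p, home, env))
	}
	fmt.Println("allowlist + path block merges to Block:", merge(Allow, Block) == Block)
}
```

Comparing only after canonicalization is the point: the three sensitive paths in the demo are spelled three different ways but resolve under the same credential directories.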

4. Package-manager nudge and supply-chain matching

A single pure internal/pkgparse package catalog-matches npm, pnpm, bun, and yarn installs alike, including chained and env-prefixed commands. The internal/nudge subsystem advises (soft) or rewrites and blocks (hard) installs toward hardened package managers, with a detection-independent block mode that does not fail open. First hook install enables supply-chain enforcement by default.
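To illustrate what matching "chained and env-prefixed commands" involves, here is a small Go sketch. It is not the internal/pkgparse implementation: ParseInstalls, the Install type, and the verb table are hypothetical, and the splitter ignores shell quoting, which a production parser would have to handle.

```go
package main

import (
	"fmt"
	"strings"
)

// Install is one detected package-manager install (hypothetical type).
type Install struct {
	Manager  string
	Packages []string
}

// installVerbs maps each manager to the subcommands treated as installs.
var installVerbs = map[string]map[string]bool{
	"npm":  {"install": true, "i": true, "add": true},
	"pnpm": {"install": true, "i": true, "add": true},
	"yarn": {"add": true},
	"bun":  {"install": true, "i": true, "add": true},
}

// splitChain splits a command line on &&, ||, ; and |.
// It does not understand quoting; this is a simplification.
func splitChain(cmd string) []string {
	r := strings.NewReplacer("&&", "\n", "||", "\n", ";", "\n", "|", "\n")
	return strings.Split(r.Replace(cmd), "\n")
}

// isEnvAssign reports whether s looks like a NAME=value prefix.
func isEnvAssign(s string) bool {
	return strings.IndexByte(s, '=') > 0 && !strings.HasPrefix(s, "-")
}

// ParseInstalls returns every install found in a possibly chained command.
func ParseInstalls(cmd string) []Install {
	var out []Install
	for _, seg := range splitChain(cmd) {
		fields := strings.Fields(seg)
		// Skip leading environment assignments such as CI=1.
		for len(fields) > 0 && isEnvAssign(fields[0]) {
			fields = fields[1:]
		}
		if len(fields) < 2 {
			continue
		}
		verbs, ok := installVerbs[fields[0]]
		if !ok || !verbs[fields[1]] {
			continue
		}
		inst := Install{Manager: fields[0]}
		for _, f := range fields[2:] {
			if !strings.HasPrefix(f, "-") { // drop flags; flag values are not modeled
				inst.Packages = append(inst.Packages, f)
			}
		}
		out = append(out, inst)
	}
	return out
}

func main() {
	cmd := "CI=1 npm install left-pad && pnpm add -D some-pkg; echo done"
	for _, in := range ParseInstalls(cmd) {
		fmt.Println(in.Manager, in.Packages)
	}
}
```

Routing all four managers through one parser means one code path to audit: a chained command cannot smuggle an install past the matcher just because it follows an unrelated first command.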

5. Editor-extension defense

Agent --install-extension calls are intercepted before the extension lands. An fsnotify watcher monitors the extension directories, and the watch, scan, and quarantine workflow closes the Nx Console-class attack surface where a compromised agent silently installs a trojanized extension.

6. Cross-platform Sentry (opt-in, detection-only)

A privileged behavioral monitor, opt-in via beekeeper protect install, correlates process, file, and network events on Linux (eBPF and fanotify), macOS (eslogger, no entitlement), and Windows (ETW, no CGO). Its rule set spans SENTRY-001 through SENTRY-008: credential-file clusters, credential-CLI bursts, first-outbound phone-home, fresh-extension correlation, exfiltration-signature fusion, an agent-CLI credential cluster (006), a generalized exfil fusion with no fresh-extension precondition (007), and persistence-location writes (008). Scope covers both editor-extension and agent-CLI process trees, so standalone-terminal agents are in scope, not just editor extensions. File-write events are ingested on all three platforms; DNS queries are ingested on Linux and Windows. The Sentry is detection-only: it writes audit records, it does not quarantine or kill.

7. Background catalog sync

Threat-intel freshness is automatic. Alongside the manual beekeeper catalogs sync, an unprivileged per-user daemon (beekeeper catalogs daemon install) syncs on an interval (default 2 hours, clamped to a 2 to 24 hour range) using conditional ETag requests, via a systemd user timer, a macOS LaunchAgent, or a Windows scheduled task. The interval is a self-defended setting: a project-layer config cannot disable it or loosen the cadence.
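The interval clamp and the conditional request can be sketched as follows. This is a hedged illustration rather than the daemon's code: the function names and the example URL are hypothetical, and falling back to the default when no interval is set is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

const (
	defaultInterval = 2 * time.Hour
	minInterval     = 2 * time.Hour
	maxInterval     = 24 * time.Hour
)

// clampInterval forces a requested sync interval into the 2 to 24 hour range.
// A zero or negative value is treated as "unset" (an assumption).
func clampInterval(d time.Duration) time.Duration {
	switch {
	case d <= 0:
		return defaultInterval
	case d < minInterval:
		return minInterval
	case d > maxInterval:
		return maxInterval
	}
	return d
}

// newConditionalRequest builds a GET that carries If-None-Match when an
// ETag from a previous sync is cached, so an unchanged catalog costs a 304.
func newConditionalRequest(url, etag string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	return req, nil
}

// catalogChanged reports whether the response carries a new catalog body.
func catalogChanged(statusCode int) bool {
	return statusCode != http.StatusNotModified
}

func main() {
	fmt.Println(clampInterval(30 * time.Minute))
	fmt.Println(clampInterval(72 * time.Hour))

	// example.invalid is a placeholder host; no request is sent here.
	req, err := newConditionalRequest("https://example.invalid/catalog.json", `"abc123"`)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("If-None-Match"))
}
```

Clamping on read, rather than rejecting the config, is one way to make the setting hard to weaken: a looser value from a lower-trust layer simply has no effect beyond the allowed range.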

8. LlamaFirewall prompt-injection sidecar (opt-in, experimental, local)

An optional supervised Python sidecar scores agent tool output with PromptGuard 2 and agent-generated code with CodeShield. Inference is fully local: no API key and no third-party cloud (the earlier Together AI AlignmentCheck path was removed entirely). It is non-blocking by default and fails closed on crash, missing model, or scan error. The gated 22M model is bootstrapped per operator via beekeeper llamafirewall install. The full sidecar runs on Linux and macOS; native Windows is unsupported because CodeShield's semgrep dependency has no Windows build.

9. Self-protection, policy-as-code, TUI, and audit

Because the agent runs as the file owner, the tool-call hook is the layer that can stop it from tampering with Beekeeper: agents cannot read or write the state directory, overwrite the binary, remove their own hook entry (content-aware, so other hooks stay editable), or invoke Beekeeper's mutating subcommands through Bash. Declarative JSON policies (policy validate/test/list) are enforced live across check/gateway/watch/scan over a five-layer config merge. A Bubble Tea v2 TUI dashboard surfaces live activity, alerts, catalog freshness, scan, policy, quarantine, and health. Every decision is written to a single NDJSON audit log (beekeeper.ndjson, owner-only, not rotated) with optional syslog, OTLP, and HTTPS sinks and an audit query/tail/export CLI.
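As an illustration of the audit format, the sketch below appends one JSON object per line to an owner-only file. The record fields and function names are hypothetical; only the NDJSON shape and the owner-only permission come from the description above, and the 0600 mode is meaningful on Unix-like systems only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// AuditRecord is a hypothetical record shape; the real schema is not shown here.
type AuditRecord struct {
	Time     time.Time `json:"time"`
	Harness  string    `json:"harness"`
	Tool     string    `json:"tool"`
	Decision string    `json:"decision"`
}

// encodeLine renders a record as exactly one NDJSON line: JSON plus "\n".
// json.Marshal escapes newlines inside strings, so the line stays intact.
func encodeLine(r AuditRecord) ([]byte, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// appendRecord appends one line to path, creating the file with mode 0600
// (owner read/write only) if it does not exist yet.
func appendRecord(path string, r AuditRecord) error {
	line, err := encodeLine(r)
	if err != nil {
		return err
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	if _, err := f.Write(line); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	// A temp-dir path is used so the demo does not touch real state.
	path := filepath.Join(os.TempDir(), "beekeeper-example.ndjson")
	rec := AuditRecord{Time: time.Now().UTC(), Harness: "claude-code", Tool: "Bash", Decision: "block"}
	if err := appendRecord(path, rec); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("appended one record to", path)
}
```

One object per line keeps the log appendable and greppable, and lets a tail or export tool process records without parsing the whole file.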

10. Validated release gate

Beekeeper ships with an auditable validation posture rather than an asserted one. A coverage gate accounts for every production Go file as tested or as a reason-coded no-test allowlist entry, and fails closed on unjustified growth. A 17-harness conformance suite golden-file-tests every installer config and deny contract. A cross-platform CI matrix covers two Linux kernels, macOS, and Windows, including eBPF, eslogger, ETW, and Unix peer-cred auth. Five fuzz targets (policy engine, IPC parser, catalog parser, MCP parser, and the Sentry rule evaluator) run as a blocking release gate. What cannot be automated, a live block on each of the 16 non-Claude-Code harnesses and the gated-model sidecar end-to-end, is captured in a signed manual validation register. See the security posture for the full tiering and the documented known gaps.

11. Self-defense from day one

  • Reproducible builds (-trimpath -buildvcs=false -mod=readonly)
  • Keyless cosign signing via GitHub Actions OIDC
  • SLSA Level 3 provenance (slsa-github-generator@v2.1.0)
  • CycloneDX SBOM attached to the release
  • Public threat model (docs/THREAT-MODEL.md)
  • A separately-hosted, separately-keyed (Ed25519) beekeeper-self compromise feed so Beekeeper can refuse to run a tampered build of itself
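The last item, a separately keyed Ed25519 feed, can be pictured with the sketch below. Everything about the feed format is an assumption made for illustration (one hex SHA-256 digest per line, with a detached signature over the raw bytes), as is the choice to refuse when the feed fails verification. newKeyPair and signFeed play the publisher's role only so the example runs.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// newKeyPair and signFeed stand in for the feed publisher.
func newKeyPair() (ed25519.PublicKey, ed25519.PrivateKey, error) {
	return ed25519.GenerateKey(nil) // nil selects crypto/rand
}

func signFeed(priv ed25519.PrivateKey, feed []byte) []byte {
	return ed25519.Sign(priv, feed)
}

// digestOf returns the hex SHA-256 of a binary's bytes.
func digestOf(binary []byte) string {
	sum := sha256.Sum256(binary)
	return hex.EncodeToString(sum[:])
}

// verifyFeed checks a detached Ed25519 signature over the raw feed bytes.
// The length check avoids the panic ed25519.Verify raises on a bad key size.
func verifyFeed(pub ed25519.PublicKey, feed, sig []byte) bool {
	if len(pub) != ed25519.PublicKeySize {
		return false
	}
	return ed25519.Verify(pub, feed, sig)
}

// shouldRefuse reports whether a binary should refuse to run: either the
// feed cannot be trusted (assumed fail-closed) or its digest is listed.
func shouldRefuse(pub ed25519.PublicKey, feed, sig, binary []byte) bool {
	if !verifyFeed(pub, feed, sig) {
		return true
	}
	digest := digestOf(binary)
	for _, line := range strings.Split(string(feed), "\n") {
		if strings.TrimSpace(line) == digest {
			return true
		}
	}
	return false
}

func main() {
	pub, priv, err := newKeyPair()
	if err != nil {
		panic(err)
	}
	bad := []byte("tampered build")
	feed := []byte(digestOf(bad) + "\n")
	sig := signFeed(priv, feed)

	fmt.Println("listed build refused:  ", shouldRefuse(pub, feed, sig, bad))
	fmt.Println("clean build refused:   ", shouldRefuse(pub, feed, sig, []byte("clean build")))
	tampered := []byte(string(feed) + "x")
	fmt.Println("tampered feed refused: ", shouldRefuse(pub, tampered, sig, []byte("clean build")))
}
```

Keeping the feed key separate from the release-signing identity matters here: a compromise of the release pipeline alone would not let an attacker also forge a feed that vouches for the tampered build.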

Download and verify

When the v1.0.0 release is published, it will ship reproducibly built, cosign-signed (keyless via GitHub Actions OIDC), with SLSA Level 3 provenance and a CycloneDX SBOM. Verify it as follows once it is available:

GitHub Release v1.0.0

Release assets include signed binaries, checksums, SLSA L3 provenance, and a CycloneDX SBOM.

Verification

Step 1: Download release assets

gh -R home-beekeeper/beekeeper release download v1.0.0 \
  --pattern "checksums.txt" \
  --pattern "checksums.txt.sigstore.json"

Step 2: Verify cosign signature

cosign verify-blob \
  --bundle checksums.txt.sigstore.json \
  --certificate-identity-regexp '^https://github\.com/home-beekeeper/beekeeper/' \
  --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
  checksums.txt

Step 3: Verify SLSA L3 provenance

slsa-verifier verify-artifact beekeeper \
  --provenance-path beekeeper.intoto.jsonl \
  --source-uri github.com/home-beekeeper/beekeeper

A CycloneDX SBOM (*.cdx.json) is also attached to the release for dependency auditing. These commands run against the published GitHub release.
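A verified checksums.txt is only useful if the downloaded binary is then compared against it. The Go sketch below shows that comparison, assuming the file uses the common sha256sum layout ("<hex digest>  <filename>" per line). The layout and the function names are assumptions, not something this page specifies.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// expectedDigest finds the digest recorded for name in sha256sum-style text.
// A leading "*" on the filename (binary-mode marker) is ignored.
func expectedDigest(checksums, name string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(checksums))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && strings.TrimPrefix(fields[1], "*") == name {
			return strings.ToLower(fields[0]), true
		}
	}
	return "", false
}

// verifyChecksum hashes data and compares it with the entry for name.
// A missing entry is an error: no entry means nothing was vouched for.
func verifyChecksum(checksums, name string, data []byte) error {
	want, ok := expectedDigest(checksums, name)
	if !ok {
		return fmt.Errorf("no checksum entry for %q", name)
	}
	sum := sha256.Sum256(data)
	if got := hex.EncodeToString(sum[:]); got != want {
		return fmt.Errorf("checksum mismatch for %q: got %s, want %s", name, got, want)
	}
	return nil
}

func main() {
	// SHA-256 of empty input, used so the demo needs no files on disk.
	const emptySHA = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
	checksums := emptySHA + "  beekeeper\n"

	fmt.Println(verifyChecksum(checksums, "beekeeper", nil))
	fmt.Println(verifyChecksum(checksums, "beekeeper", []byte("modified")))
	fmt.Println(verifyChecksum(checksums, "missing-asset", nil))
}
```

In practice the two inputs would be the contents of checksums.txt, after its signature has verified, and the bytes of the downloaded asset.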

Known limitations

Beekeeper documents its gaps alongside its posture so you do not develop false confidence. The headline limitations in this release: Hermes is structurally fail-open; Tier-3 native tools (Kilo, Trae, Continue, OpenClaw) are unguarded outside the MCP gateway; only Claude Code is live-verified; binding the gateway to a non-loopback interface exposes it over plaintext HTTP; release_age and lifecycle_script_allowlist policy rules are accepted but not enforced; and the Sentry ingests DNS queries but no correlation rule consumes them yet, so DNS-tunnel exfiltration is not detected. The complete list, with mitigations, is on the security posture page.

Internal development history

For maintainers, the work in this release was built across internal milestones (corroboration and the standalone harness, runtime behavioral hardening, the public docs site, runtime hardening II, and full-system validation). Those internal version tags are not public releases; v1.0.0 is the first and, at launch, the only available version. The parked Pollen Windows-inventory fork is tracked separately and ships on its own cadence.
