Documentation

Design, generate, and run sovereign AI agents

Overview

Forge is an AI agent factory. You design an agent through a conversation with the Forge architect at myagentos.ai/create. The architect captures your context, wires the capabilities your agent needs, and emits a downloadable agent zip you run on your own infrastructure with your own keys.

Architect (web)

LLM-powered designer that runs the Phase 1.5 interview and emits a scout-config spec.

Generator (Python)

Reads the spec, builds the workspace, renders the runner template, zips the result.

Runtime (in your agent)

The vendored scout_runtime your agent ships with — REPL, gateway, capabilities.

BYOK end-to-end. You bring your own LLM keys, your own per-capability credentials, your own hardware. Forge writes the agent and gets out of the way. Every byte that runs your agent lives in the zip you downloaded — no phone-home, no telemetry, no vendor lock-in.

Getting Started

Build Your First Agent

Five steps from idea to a running agent on your machine.

  1. Go to myagentos.ai/create
  2. Enter your LLM API key (Anthropic, OpenAI, Gemini, or Grok)
  3. Describe what you want. The architect runs Phase 1.5 — a 5-wave context interview that captures who you are, what you want the agent to do, and how you want it to behave. This step is what makes the agent yours instead of generic.
  4. Confirm the design when the architect summarizes it back
  5. Click Build, download the zip
Quick test: "Build me an end-of-day inbox triage agent that drafts replies and surfaces what slipped." The architect will interview you, wire all 28 capabilities, generate the agent, and have you running it locally within minutes.

API Keys (BYOK)

Forge is bring-your-own-key end to end. There are two key surfaces to understand.

Provider | Key format | Cost model
Anthropic | sk-ant-... | Pay per token (Claude family)
OpenAI | sk-... | Pay per token (GPT family)
Gemini | AI... | Free tier + paid
Grok (xAI) | xai-... | Pay per token
Custom (Ollama, llama.cpp) | Any / none | Free (local)

The architect's key powers the design conversation at myagentos.ai. It only lives in your browser session — Forge never stores it server-side.

The agent's key powers your agent's reasoning at runtime. It goes in the .env file inside the downloaded zip.

In sovereign mode you don't need a runtime key at all — the LLM runs locally inside the container via Ollama.

Phase 1.5 Context Interview

This is what separates Forge from generic agent builders. Before the architect generates anything, it interviews you in 5 waves to capture 27 context fields. The answers get embedded into your agent's workspace files so the agent boots with you already known — no generic "How can I help you?" greeting.

F31 finalize gate: the architect cannot ship the agent until all 5 required fields are captured: user_name, user_role, user_communication_style, response_length_preference, and primary_objective. The other 22 are strongly encouraged but optional.
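A minimal sketch of what a gate like F31 could check, assuming the captured context arrives as a plain dict keyed by field name. The function names here are hypothetical and are not taken from the Forge codebase; only the five field names come from the text above.

```python
# Hypothetical sketch of an F31-style check; not the actual Forge implementation.
REQUIRED_FIELDS = (
    "user_name",
    "user_role",
    "user_communication_style",
    "response_length_preference",
    "primary_objective",
)


def missing_required_fields(context: dict) -> list[str]:
    """Return the required Phase 1.5 fields that are absent or blank."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(context.get(name) or "").strip()
    ]


def can_finalize(context: dict) -> bool:
    """The agent can ship only when no required field is missing."""
    return not missing_required_fields(context)
```

Returning the list of missing names, rather than a bare boolean, lets the caller tell the user exactly which questions are still open.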

Wave 1 — Who you are (8 questions)

Name, role, communication style, decision style, pet peeves, expertise areas, expertise gaps, timezone. Becomes your USER.md.

Wave 2 — Domain context (6 questions)

Primary objective, current workflow, pain points, ideal intervention point, escalation rules, audit requirements. Joins the Wave 1 answers in your USER.md.

Wave 3 — Operational constraints (6 questions)

Data sources, output destinations, integration constraints, data sensitivity, budget, availability. Becomes your AGENTS.md.

Wave 4 — Voice (4 questions)

Voice role models, voice anti-patterns, humor tolerance, default response length. Becomes your SOUL.md.

Wave 5 — Hard rules (3 questions)

What the agent must NEVER do without confirmation, what it IS authorized to do unprompted, what triggers immediate escalation. Becomes your STANDING_ORDERS.md.

After your answers, the architect emits a scout-config JSON block. The web layer parses it and posts it to the generator, which embeds every captured field into your workspace files.

Your First Download

The download is a self-contained Python package. Unzip it, create a venv, install, fill in .env, and run.

macOS / Linux

unzip agent.zip
cd agent
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env
# edit .env to add ANTHROPIC_API_KEY (and any per-capability keys
# the architect wired — Slack, GitHub, Notion, etc.)

# REPL mode (default)
python -m <agent-name>

# Gateway mode (for scout-tui or other external clients)
python -m <agent-name> --gateway

Windows (PowerShell)

Expand-Archive agent.zip -DestinationPath .
cd agent
py -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .
Copy-Item .env.example .env
# edit .env to add ANTHROPIC_API_KEY (and any per-capability keys
# the architect wired — Slack, GitHub, Notion, etc.)
notepad .env

# REPL mode (default)
py -m <agent-name>

# Gateway mode (for scout-tui or other external clients)
py -m <agent-name> --gateway
Need Python?

macOS: brew install python or download from python.org.

Linux: usually pre-installed; otherwise sudo apt install python3 python3-venv (Ubuntu/Debian) or sudo dnf install python3 (Rocky/Fedora).

Windows: install from python.org with "Add to PATH" checked, or winget install Python.Python.3. If PowerShell blocks Activate.ps1, run Set-ExecutionPolicy -Scope CurrentUser RemoteSigned once.

See Gateway server and scout-tui client for the multi-client setup.

Platform Support

Forge agents run on Linux, macOS, and Windows. Every generated agent ships with cross-platform code — the shell capability detects Windows and uses cmd.exe, the python capability uses whichever interpreter you're already running, and all file paths use pathlib.
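The three cross-platform habits described above can be sketched with the standard library alone. This is an illustration under assumptions, not the vendored runtime code: the helper names are hypothetical, and /bin/sh is an assumed choice for non-Windows hosts.

```python
import platform
import sys
from pathlib import Path


def default_shell_argv(command: str) -> list[str]:
    """Wrap a command line for the host's shell (cmd.exe on Windows)."""
    if platform.system() == "Windows":
        return ["cmd.exe", "/c", command]
    # Assumption: a POSIX shell at /bin/sh on Linux and macOS.
    return ["/bin/sh", "-c", command]


def python_argv(script: Path) -> list[str]:
    """Reuse the interpreter that is already running the agent."""
    return [sys.executable, str(script)]


def notes_path(root: Path) -> Path:
    """Build paths with pathlib so separators are right on every OS."""
    return root / "workspace" / "notes.md"
```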

Platform | Status | Notes
Linux | ✓ Full | All 28 capabilities work. Tested target.
macOS | ✓ Full | All 28 work. TTS uses native say.
Windows 10/11 | ✓ Full | All 28 work. Use PowerShell commands above.
WSL on Windows | ✓ Full | If you prefer Unix tooling on a Windows box.

The myagentos.ai website (architect chat, design, download) works in any modern browser — Chrome, Edge, Firefox, Safari — on any OS. Building the agent is OS-independent; only the "run the agent" step touches your machine, and that step has first-class support on all three.

Switching machines? Your agent is portable. Copy the folder to the new machine. Copying .venv is optional, since you can rebuild it there with one command. Workspace state (memory, drafts, logs) is plain markdown + sqlite, so it is fully portable across OSes.

The 28 Capabilities

Capabilities are the menu the architect proposes during design. Each one is a discrete piece of functionality (memory, scheduling, email, Slack, web search, code execution, and so on) that gets wired into your agent unless you opt out.

F34 doctrine: wire all 28 by default. The architect wires every catalog capability into your agent unless you explicitly tell it to skip one ("skip Slack, skip Notion"). Omission must be explicit, and you must give a reason. The default of "wire everything" exists because incrementally-discovered capabilities tend to be more useful than incrementally-removed ones.
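One way to picture the wire-by-default rule is a function that starts from the full catalog and removes only capabilities that were skipped with a stated reason. This is a sketch under assumptions: the function name and the shape of the skips mapping are hypothetical, and the real spec format is not shown in this document.

```python
def resolve_capabilities(catalog: list[str], skips: dict[str, str]) -> list[str]:
    """Wire every catalog capability unless it is skipped with a reason.

    `skips` maps a capability name to the user's reason for omitting it
    (hypothetical shape, for illustration only).
    """
    unknown = sorted(set(skips) - set(catalog))
    if unknown:
        raise ValueError(f"not in catalog: {unknown}")
    unexplained = sorted(name for name, reason in skips.items() if not reason.strip())
    if unexplained:
        raise ValueError(f"skip needs a reason: {unexplained}")
    return [name for name in catalog if name not in skips]
```

With an empty skips mapping the result is the whole catalog, which matches the default described above.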

Core (5)

The primitives every agent should have — memory, scheduling, proactive behavior, procedural skills, user-set rules.

persistent_memory

Persistent memory (A-MEM)

Durable memory across sessions. Stored locally as embeddings + graph links via fastembed (ONNX, no PyTorch). Works air-gapped after first model fetch.

Use: Remembers that you prefer concise replies and that 'the Atlas project' is the migration plan you described last week.

scheduled_commitments

Scheduled follow-ups

The agent schedules its own follow-ups. When it decides 'I should check back in 3 days,' a commitment is queued and surfaced on the heartbeat. No external scheduler.

Use: On Monday you mention you're waiting on a vendor reply. On Thursday the agent surfaces the commitment and asks if you've heard back.
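The Monday-to-Thursday example can be sketched as a small in-memory queue that a heartbeat tick polls. The class and method names are hypothetical, and the real capability presumably persists commitments to disk, which this sketch omits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Commitment:
    note: str
    due: datetime
    done: bool = False


class CommitmentQueue:
    """Hypothetical in-memory queue; persistence is left out of this sketch."""

    def __init__(self) -> None:
        self._items: list[Commitment] = []

    def schedule(self, note: str, now: datetime, delay: timedelta) -> Commitment:
        """Queue a follow-up that becomes due `delay` after `now`."""
        item = Commitment(note=note, due=now + delay)
        self._items.append(item)
        return item

    def due(self, now: datetime) -> list[Commitment]:
        """Called on each heartbeat tick: open commitments whose time has come."""
        return [c for c in self._items if not c.done and c.due <= now]
```

Passing `now` in explicitly, instead of reading the clock inside the class, keeps the queue easy to test.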

heartbeat_loop

Heartbeat / proactive check-ins

Drives proactive behavior on a configurable interval. Without it the agent only acts when you speak first.

Use: Every hour the agent reviews commitments. Every morning it offers a one-line summary of overnight changes.

skills

Skills system (procedural memory)

The agent loads SKILL.md playbooks for recurring task types — trigger conditions, numbered steps, pitfalls. Required for any agent doing repeatable workflows.

Use: You say 'deploy the API.' Agent matches the 'deploy-api' skill, runs its steps, reports each outcome.

standing_orders

Standing orders

Mutable user-set runtime rules with top priority in system prompt assembly. Different from SOUL.md (voice) — these are hard constraints you add after the agent ships.

Use: "Always check Calendar before suggesting meeting times." "Never send emails after 8pm without confirmation."

Execution (5)

Letting the agent actually do things — spawn sub-agents, run code, orchestrate workflows, isolate risky operations.

subagent_spawning

Sub-agent orchestration

Spawn child agents for parallel or isolated work. Useful for reasoning-heavy subtasks that would flood the parent's context, or for running multiple workstreams concurrently.

Use: You ask for a research brief on 3 competitors. Agent spawns 3 sub-agents in parallel, parent synthesizes.

python_execution

Python code execution

Write and run Python in a subprocess with the agent's installed packages available (pandas, numpy, requests, etc. if wired). 30s default timeout, 200KB output cap. Working directory isolated from the host.

Use: You paste a CSV and ask 'what's the trend in column 3?' Agent writes pandas code, runs it, surfaces the answer.
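The timeout and output cap described above map directly onto the standard subprocess module. The sketch below is an assumption about the general shape, not the shipped capability: the function name and result dict are hypothetical, and a temporary working directory is only a convenience, not real isolation from the host.

```python
import subprocess
import sys
import tempfile

TIMEOUT_SECONDS = 30            # default timeout from the text above
OUTPUT_CAP_BYTES = 200 * 1024   # 200KB output cap from the text above


def run_python(code: str, timeout: float = TIMEOUT_SECONDS) -> dict:
    """Run code in a child interpreter inside a throwaway working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                cwd=workdir,
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising.
            return {"ok": False, "output": "", "error": "timeout"}
    # Truncate combined stdout + stderr to the cap before decoding.
    output = (proc.stdout + proc.stderr)[:OUTPUT_CAP_BYTES]
    return {
        "ok": proc.returncode == 0,
        "output": output.decode("utf-8", errors="replace"),
        "error": None,
    }
```

Using sys.executable means the child sees the same installed packages as the agent's own environment.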

shell_execution

Shell command execution

Run shell commands in a sandboxed subprocess. Marked as a dangerous tool requiring explicit per-call approval when require_approval is set.

Use: You say 'show me git log for this repo.' Agent runs git log and surfaces the output.

flows

Multi-step workflows

Named flows that chain tool calls with explicit state, retries, and rollback. Checkpoint-persisted so they survive crashes.

Use: 'Incident-triage' flow: ack alert → fetch logs → look up runbook → page on-call → create post-mortem doc. Resumes from last checkpoint if interrupted.
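Checkpoint-based resumption can be sketched in a few lines: record each finished step and the current state to a file, and skip recorded steps on the next run. This is an illustrative sketch only. The function name and checkpoint layout are hypothetical, and retries and rollback are deliberately left out.

```python
import json
from pathlib import Path


def run_flow(steps, checkpoint: Path, state=None):
    """Run (name, function) steps in order, saving progress after each one.

    Each step takes the current state dict and returns the next one.
    State must be JSON-serializable in this sketch.
    """
    saved = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    done = saved.get("done", [])
    state = saved.get("state", state or {})
    for name, step in steps:
        if name in done:
            continue  # finished before the interruption
        state = step(state)
        done.append(name)
        checkpoint.write_text(json.dumps({"done": done, "state": state}))
    return state
```

If a step raises, the checkpoint still holds everything up to the previous step, so calling run_flow again with the same file resumes from there.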

sandbox

Sandboxed code execution

Isolated execution environment for python_execution and shell_execution. Limits filesystem access, network, CPU time, and memory. Per-capability config — sandbox python tightly while leaving shell more permissive, or vice versa.

Use: Agent runs user-submitted Python from chat. Sandbox limits to 30s, no network, read-only /tmp, 256MB RAM.

Communication (6)

Channels the agent can read from and write to — email, chat, SMS, webhooks, the gateway, voice.

email_imap

Email (IMAP read + SMTP send)

Read and send via standard IMAP/SMTP. Works with Gmail (app password), Fastmail, ProtonMail Bridge, self-hosted. Agent never sees the raw password.

Use: Agent watches a designated mailbox, forwards order confirmations to accounting, replies to common questions, escalates the rest.

slack_integration

Slack (post + read + DM)

Post messages, read channel history, send DMs. Uses Slack's bot API with a workspace token. The agent becomes a first-class Slack participant.

Use: Agent posts deploy notifications, monitors #help for FAQ-able questions, DMs on-call when alerts spike.

sms_messaging

SMS (Twilio)

Send and receive SMS via Twilio. Good for low-bandwidth proactive notifications and users who prefer text over chat apps.

Use: Agent texts you when a long-running task finishes or when a calendar event is starting.

webhook_receiver

HTTP webhook receiver

Exposes an HTTP endpoint for incoming webhooks. FastAPI + uvicorn. Useful for integrations with services that POST events.

Use: Agent receives GitHub PR webhooks and reviews each one. Or Stripe payment events and updates the accounting log.

gateway

HTTP + WebSocket gateway

Network surface for multi-client access. Exposes the agent via HTTP REST + WebSocket so external clients (scout-tui, browser dashboards, IDE plugins) can connect. Default bind 127.0.0.1:7891. Required if you want to reach the agent from anywhere other than the local REPL.

Use: You want the polished scout-tui experience. Agent starts the gateway on localhost:7891; scout-tui connects. Multiple clients can connect simultaneously.

tts

Text-to-speech (voice output)

Voice output via OpenAI, ElevenLabs, MiniMax, Edge, or local `say` on macOS. Auto-routes based on configured backend. Audio streams to speakers, saves as files, or returns via the gateway.

Use: Heartbeat detects a critical alert. Agent speaks 'alert: production database CPU at 95% for 3 minutes' through speakers, then posts the same text to Slack.

Data (4)

Reading and writing data — local files, the live web, structured storage.

file_ops

Local file read/write

Read and write within a configured directory tree. Strict path-escape rejection — agent can't reach outside its sandbox. Configurable sandbox_root + max_file_size_kb.

Use: Agent maintains a project journal in ~/projects/notes/, summarizes meeting transcripts into structured notes, edits config files on request.
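Path-escape rejection is commonly done by resolving the requested path and checking that it still sits under the sandbox root. The sketch below shows that general technique with pathlib. The function name is hypothetical and this is not the capability's actual code; only sandbox_root comes from the text above.

```python
from pathlib import Path


def resolve_in_sandbox(sandbox_root: Path, requested: str) -> Path:
    """Resolve `requested` under the sandbox root, rejecting any escape."""
    root = sandbox_root.resolve()
    # Resolving collapses ".." segments and follows symlinks; an absolute
    # `requested` replaces the root entirely and fails the check below.
    target = (root / requested).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"path escapes sandbox: {requested}")
    return target
```

Checking after resolution, rather than scanning the string for "..", also covers symlinks that point outside the tree.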

web_search

Web search

Query the live web. Pluggable provider (Tavily, Brave, Exa, SerpAPI). The agent uses this when it needs current information beyond its training data.

Use: You ask "what was the latest funding round for Acme Corp?" Agent searches, reads top results, summarizes.

web_extract

Web page content extraction

Fetch a URL and extract its main content as clean markdown. httpx + markdownify. Falls back to raw HTML when the extractor can't find a main region.

Use: You share a URL. Agent fetches and summarizes, or extracts a specific data point.

sqlite_database

Local SQLite database

Maintain structured records in local SQLite. Schema and queries are agent-driven — you describe what to track and the agent picks the schema. Works air-gapped.

Use: Agent maintains an expense tracker: each receipt you mention gets logged with date, vendor, amount, category.
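The expense tracker example might look like the following with the standard sqlite3 module. The table layout and function names are assumptions for illustration. In practice the agent picks its own schema, as described above.

```python
import sqlite3


def open_tracker(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the tracker database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS expenses (
               id INTEGER PRIMARY KEY,
               spent_on TEXT NOT NULL,
               vendor TEXT NOT NULL,
               amount_cents INTEGER NOT NULL,
               category TEXT NOT NULL
           )"""
    )
    return conn


def log_expense(conn, spent_on, vendor, amount_cents, category) -> None:
    """Insert one receipt; `with conn` commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO expenses (spent_on, vendor, amount_cents, category) "
            "VALUES (?, ?, ?, ?)",
            (spent_on, vendor, amount_cents, category),
        )


def totals_by_category(conn) -> dict:
    """Sum spend per category, in cents."""
    rows = conn.execute(
        "SELECT category, SUM(amount_cents) FROM expenses "
        "GROUP BY category ORDER BY category"
    )
    return dict(rows.fetchall())
```

Storing amounts as integer cents avoids floating-point rounding in the totals.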

Integration (6)

Plugging the agent into the SaaS tools you already use.

github_integration

GitHub (issues + PRs + repos)

Read and write GitHub via REST + a PAT or GitHub App. Create/update issues, comment on PRs, read repo contents, run actions.

Use: Agent watches a repo's issues, auto-labels them by topic, drafts PR review comments, syncs your TODO list with the tracker.

calendar_integration

Calendar (Google + iCal)

Read events from Google Calendar or an iCal feed. Optionally create events. Useful for scheduling agents and agents that need to be aware of your day.

Use: Agent knows your meeting schedule and proactively asks if you want a prep brief 15 minutes before each meeting.

notion_integration

Notion (read + write pages)

Read and write Notion pages, databases, and properties. Good for agents that maintain knowledge bases or project trackers in Notion.

Use: Agent maintains a CRM-lite Notion database, adding rows when you mention new contacts and updating fields as it learns.

linear_integration

Linear (issue tracking)

Read and write Linear issues, comments, and projects via GraphQL. For engineering-focused agents.

Use: Agent triages new bugs into the right team's queue, drafts initial repro notes, links related issues.

acp

ACP (Agent Coordination Protocol)

Lets Claude Code, OpenAI Codex CLI, and other ACP-compatible tools drive your agent as a sub-agent. stdio + JSON-RPC — no HTTP overhead. Useful when you want the agent reachable inside your IDE workflow.

Use: You're in Claude Code and want your agent to handle code review on a side branch. Claude Code connects via ACP, hands off, surfaces findings in the same session.

plugins

Plugin system

Extensible plugin loader with cryptographic signing. External packages add custom tools, hooks, or workflows. Signed plugins verify against trusted keys; unsigned ones require explicit consent.

Use: You install a community-built 'jira-integration' plugin. It signs against a known key, gets auto-trusted, registers new tools.

Observability (2)

Seeing what your agent did, what it cost, and why.

conversation_logging

Conversation logging

Every turn logged to disk as JSON lines. Useful for debugging, training-data collection, and audit trails.

Use: When the agent's behavior surprises you, you review the exact prompt + response in <state_dir>/conversations/.
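JSON lines logging is simple enough to sketch in full: one JSON object per line, appended per turn. The record fields (ts, role, content) and function names below are assumptions. The actual log schema is not specified in this document.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_turn(log_file: Path, role: str, content: str) -> None:
    """Append one turn to the log as a single JSON line."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "content": content,
    }
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_turns(log_file: Path) -> list[dict]:
    """Load every logged turn, skipping blank lines."""
    with log_file.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each line is independent, a log that was cut off mid-write loses at most its last record.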

metric_tracking

Usage metrics (tokens + cost)

Tracks LLM token usage and estimated cost per session. Useful for BYOK spend monitoring and spotting runaway loops early.

Use: You ask "how much did our chat today cost?" Agent reports tokens by model and a dollar estimate.

Workspace Files

Every generated agent ships with a workspace/ directory holding up to five markdown files (STANDING_ORDERS.md is only emitted if you captured hard rules in Phase 1.5). They drive how the agent thinks, sounds, and behaves. You can read and edit any of them directly; they're yours.

USER.md

Stores: factual user profile (the Wave 1 + Wave 2 answers). Edited by: the agent updates it as it learns; you can edit directly anytime. Read: every turn, injected into the system prompt.

# USER.md

- Name: Dan
- Role: Runs the CIQ Startup Program. Serial founder; values speed and signal over polish.
- Communication style: terse
- Decision style: intuition-led, justifies after
- Pet peeves: "I'd be happy to help!", fake humility, em-dash overuse
- Expertise areas: GTM, startup ops, AI infra
- Expertise gaps: low-level Rust, GPU kernels
- Timezone: America/New_York

# Domain
- Primary objective: end-of-day inbox triage with draft replies
- Current workflow: scroll Gmail at 5pm, miss things, draft poorly under fatigue
- Pain points: response latency on key threads, dropped commitments
- Ideal intervention point: 5pm summary + drafts ready for review

SOUL.md

Stores: the agent's voice, tone, and values (the Wave 4 answers). Edited by: set at design time; you edit if voice drifts. Read: every turn.

# SOUL.md

## Voice
Terse. Lead with the answer. No filler openings ("Great question!"),
no apologies for things that aren't your fault.

## Role models
- Patrick Collison on Twitter — short, exact, no posturing
- A senior engineer who's seen this before and isn't impressed

## Anti-patterns
- "I'd be happy to help!"
- Excessive disclaimers
- LinkedIn-speak
- Em-dash overuse

## Humor
Dry. Occasional. Never forced.

## Default response length
1-3 sentences for casual questions. Long form only when explicitly asked.

IDENTITY.md

Stores: agent name, archetype, catchphrases, emoji. Edited by: set at generation; rarely changes. Read: boot only.

# IDENTITY.md

- Name: Flint
- Archetype: laconic operator
- Catchphrase: "Acknowledged."
- Emoji: 🪨
- One-line self-description: end-of-day inbox triage, drafts ready by 5pm

AGENTS.md

Stores: operational manual — how the agent behaves, which channels it reaches you on, hard ops rules (the Wave 3 answers). Edited by: set at generation; edit when ops change. Read: every turn.

# AGENTS.md

## Data sources
- Gmail (IMAP)
- Calendar (Google)
- Linear (read-only)

## Output destinations
- Draft emails to Gmail drafts folder
- 5pm summary to Slack DM
- Critical escalations via SMS

## Availability
Always-on background. Heartbeat every 30 minutes during business hours.

## Data sensitivity
Customer email contents — never log to conversation_logging, never include in summaries to third parties.

## Token budget
$5/day cap. Stop and alert if exceeded.

STANDING_ORDERS.md

Stores: hard rules, auto-actions, escalation triggers (the Wave 5 answers). Only emitted if you captured these in Phase 1.5. Edited by: you, anytime. Top priority in system prompt. Read: every turn.

# STANDING_ORDERS.md

## Never without confirmation
- Send any outbound message (email, SMS, Slack post)
- Modify a calendar event
- Spend more than $1 of LLM tokens on a single task

## Authorized without asking
- Read inbox, calendar, Linear
- Draft emails (save to drafts only)
- Log activity to local SQLite

## Escalate immediately via SMS
- Any error in send path
- Token spend within 20% of daily cap
- Heartbeat missed for >2 hours
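To make the "read every turn" and "top priority" notes concrete, here is a sketch of how the per-turn workspace files could be combined into a system prompt. It assumes "top priority" means STANDING_ORDERS.md is placed first, and the function name is hypothetical. The runtime's real assembly logic is not shown in this document.

```python
from pathlib import Path

# Files the text above marks as read every turn. IDENTITY.md is boot-only.
# Assumption: "top priority" is modeled here as "comes first".
PER_TURN_FILES = ("STANDING_ORDERS.md", "USER.md", "SOUL.md", "AGENTS.md")


def assemble_system_prompt(workspace: Path) -> str:
    """Concatenate the per-turn workspace files that exist, in priority order."""
    sections = []
    for name in PER_TURN_FILES:
        path = workspace / name
        if path.exists():  # STANDING_ORDERS.md may not have been emitted
            sections.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join(sections)
```

Re-reading the files on each call is what lets a hand edit to a workspace file take effect on the next turn.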

Running Your Agent

One package, four ways to interact: REPL, gateway server, scout-tui client, or ACP integration into another CLI.

REPL (default)

The default mode. python -m <agent-name> opens an interactive prompt. Type a message and the agent replies in its configured voice. Press Ctrl-D to exit. Type /help to see slash commands.

$ python -m flint

flint v0.1 — laconic operator. Acknowledged.
Workspace loaded: USER.md, SOUL.md, IDENTITY.md, AGENTS.md, STANDING_ORDERS.md
Capabilities: 28 wired
Provider: anthropic / claude-opus-4-8

> what's on my plate today
3 threads waiting >24h. 1 calendar conflict at 14:00. Drafts ready in Gmail.

> draft a reply to the vendor email
Drafted. Saved to Gmail drafts. 4 lines, declines politely, asks for revised quote by Friday.

>

Model selection

Three ways to pick which model your agent runs. They compose — design-time choice flows into runtime, and runtime overrides whatever was baked in.

1. At design time (myagentos.ai)

When you enter your API key, the modal shows a model picker for the detected provider. Pick once, save with Remember, and that choice powers both Phase 1.5 and your generated agent's .env.example, as a pre-filled LLM_MODEL= line.

Recommended: claude-opus-4-8 for Anthropic, gpt-5 for OpenAI, gemini-2-5-pro for Gemini. The picker auto-detects your provider from the key prefix.
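Prefix-based detection can be sketched from the key formats listed in the API Keys section (sk-ant-, sk-, AI, xai-). The function name and return values are hypothetical, and the web picker's real logic is not shown here. The one detail that matters is ordering: sk-ant- must be tested before the shorter sk- prefix.

```python
# Ordered so the more specific "sk-ant-" is checked before "sk-".
KEY_PREFIXES = (
    ("sk-ant-", "anthropic"),
    ("xai-", "grok"),
    ("AI", "gemini"),
    ("sk-", "openai"),
)


def detect_provider(api_key: str):
    """Return a provider name for a key, or None when no prefix matches."""
    key = api_key.strip()
    for prefix, provider in KEY_PREFIXES:
        if key.startswith(prefix):
            return provider
    return None  # e.g. a custom local endpoint with no key format
```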

2. Before boot (.env)

Set LLM_MODEL=<model-id> in your agent's .env file. Generated agents ship with this line pre-filled if you picked a model at design time; you can edit it any time. The runner reads it before instantiating the provider.

.env

# LLM Provider
ANTHROPIC_API_KEY=sk-ant-...
LLM_MODEL=claude-opus-4-8
GATEWAY_AUTH_TOKEN=flint-dev-token-123
...

3. At runtime (REPL slash command)

Swap models mid-conversation. /model shows the current model + everything your provider supports. /model <name> switches. The next turn uses the new model; conversation history is preserved.

> /model

  Provider: anthropic
  Current model: claude-opus-4-8
  Available:
    * claude-opus-4-8
      claude-opus-4-7
      claude-sonnet-4-5
      claude-sonnet-4-20250514
      claude-haiku-4-5

  Switch with: /model <name>

> /model claude-haiku-4-5

  Model switched to: claude-haiku-4-5

> quick — what's blocking the launch?
Three things: vendor contract (signed yesterday), legal review (waiting on
Sarah), and staging deploy (in CI, 12 minutes left).

Useful patterns: drop to Haiku for cheap sub-agent dispatch, step up to Opus for complex synthesis turns, A/B test providers without restarting your agent.

Gateway server

For multi-client access — scout-tui, browser dashboards, IDE plugins. --gateway starts a uvicorn server on 127.0.0.1:7891 by default with HTTP routes for /v1/health, /v1/hello, /v1/rpc and a WebSocket at /v1/ws for chat.

$ python -m flint --gateway
[flint] loading workspace…
[flint] wiring capabilities (28)…
[flint] gateway listening on http://127.0.0.1:7891
[flint] WARNING: no GATEWAY_AUTH_TOKEN set; gateway accepts unauthenticated connections.
[flint] press Ctrl-C to stop

Set GATEWAY_AUTH_TOKEN in .env to require token auth. Override the bind with GATEWAY_HOST and GATEWAY_PORT. Set GATEWAY_HOST=0.0.0.0 only behind a real auth token — never expose the gateway publicly without one.
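For picking a suitably long GATEWAY_AUTH_TOKEN, Python's secrets module is one reasonable option. The helper names below are hypothetical. The constant-time comparison illustrates a common practice and is not a statement about how the gateway itself checks tokens.

```python
import secrets


def new_gateway_token(num_bytes: int = 32) -> str:
    """Generate a URL-safe random token to paste into .env."""
    return secrets.token_urlsafe(num_bytes)


def tokens_match(expected: str, presented: str) -> bool:
    """Compare two tokens without leaking timing information."""
    return secrets.compare_digest(expected.encode(), presented.encode())


if __name__ == "__main__":
    print(f"GATEWAY_AUTH_TOKEN={new_gateway_token()}")
```

Use the same generated value for the agent's GATEWAY_AUTH_TOKEN and the client's SCOUT_GATEWAY_TOKEN.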

scout-tui client

A separate Ink/React terminal client that connects to a running agent over the gateway WebSocket. Polished banner, connection status, command history.

# In a terminal where the agent gateway is reachable
export SCOUT_GATEWAY_URL=ws://127.0.0.1:7891/v1/ws
export SCOUT_GATEWAY_TOKEN=<same token as agent's GATEWAY_AUTH_TOKEN>

# Run the TUI
node dist/entry.js
# (or: npm install -g scout-tui, then: scout-tui)

Multiple TUI clients can connect to the same gateway — useful for screen-sharing the agent during a call.

ACP integration

Agent Coordination Protocol lets other CLIs — Claude Code, Codex, OpenCode — drive your agent as a sub-agent via stdio + JSON-RPC. Run the agent in ACP mode and configure the calling CLI to spawn it.

# Run the agent as an ACP server over stdio
python -m flint --acp --stdio

# In Claude Code (~/.claude.json), register flint as an ACP agent
# (advanced — see the ACP capability docs in your agent's workspace/)
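As a rough picture of "stdio + JSON-RPC", the sketch below encodes and decodes JSON-RPC 2.0 messages as one JSON object per line. The newline-delimited framing and the method name in the usage note are assumptions for illustration. Consult the ACP capability docs in your agent's workspace for the actual message set.

```python
import json


def encode_request(request_id: int, method: str, params: dict) -> bytes:
    """Serialize a JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_message(line: bytes) -> dict:
    """Parse one line from the peer and check the protocol version."""
    message = json.loads(line.decode("utf-8"))
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    return message
```

A caller would write encode_request(1, "example/ping", {}) to the agent's stdin and pass each stdout line to decode_message, where "example/ping" is a made-up method name.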

Deployment

Five ways to run your agent. All start from the same downloaded zip.

Local Python

Python 3.10+. Simplest option.

Container

Rocky Linux 9. ~180MB. Podman or Docker.

Sovereign

Bundled LLM. No internet needed.

Local Python

Fastest path to running your agent. Requires Python 3.10+.

unzip agent.zip && cd agent
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env
# Add your API keys

# REPL
python -m <agent-name>

# Gateway (for scout-tui)
python -m <agent-name> --gateway

Rocky Linux Container

Requires Podman (recommended) or Docker. The container ships with Python 3.11 and all wired-capability deps pre-installed.

unzip agent.zip && cd agent
cp .env.example .env
# Add your API keys

# Build the Rocky Linux container (~180MB)
./container/build.sh

# Run with the REPL attached
./container/run.sh --env ./.env

# Run as a gateway daemon
./container/run.sh --env ./.env --gateway --detach

Fully Sovereign

Bundles Ollama and a local LLM into the container image. Once built, no internet is needed. No API keys for the LLM. No data leaves your machine.

unzip agent.zip && cd agent
cp .env.example .env
# Add per-capability keys only; no LLM key is needed in sovereign mode

# Build with bundled model (~2-6GB image)
./container/build.sh --sovereign --model llama3.2:3b

# Run — fully offline from this point
./container/run.sh --env ./.env
Model | Size | RAM | Speed (CPU) | Best for
llama3.2:3b | ~2GB | 4GB+ | ~10 tok/s | Most agents, fast, lightweight
phi3:mini | ~2.3GB | 4GB+ | ~8 tok/s | Strong reasoning
mistral:7b | ~4GB | 8GB+ | ~5 tok/s | Best quality on CPU
llama3.1:8b | ~4.7GB | 8GB+ | ~4 tok/s | Newest Llama
gemma2:9b | ~5.4GB | 12GB+ | ~3 tok/s | Google's best small model
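The RAM column of the model table can be turned into a quick pre-build check. The helper below is a hypothetical convenience, not part of the build script. It only encodes the minimum-RAM figures from the table.

```python
# (Ollama tag, minimum RAM in GB) taken from the model table.
SOVEREIGN_MODELS = (
    ("llama3.2:3b", 4),
    ("phi3:mini", 4),
    ("mistral:7b", 8),
    ("llama3.1:8b", 8),
    ("gemma2:9b", 12),
)


def models_that_fit(ram_gb: float) -> list[str]:
    """List the table's models whose minimum RAM is within `ram_gb`."""
    return [tag for tag, min_ram in SOVEREIGN_MODELS if ram_gb >= min_ram]
```

Treat the result as a starting point: the table's figures are minimums, and other processes on the host need memory too.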

Cloud Deployment

Deploy to any cloud provider. You don't need your own hardware.

CPU VPS ($5-10/month)

Best for most agents. A 3B sovereign model runs fine on CPU. Works on Hetzner, DigitalOcean, Vultr, Linode.

# Build locally
./container/build.sh --sovereign --model llama3.2:3b

# Save and copy to VPS
podman save my-agent:sovereign -o my-agent.tar
scp my-agent.tar .env user@your-vps:~/

# On the VPS
ssh user@your-vps
podman load -i my-agent.tar
podman run -d --name my-agent --restart=always \
    -v ~/.env:/app/.env:ro \
    -v ~/data:/app/data \
    -p 127.0.0.1:7891:7891 \
    my-agent:sovereign --gateway

GPU Cloud

For 7B+ models or low-latency needs. Lambda Labs ($0.80/hr), RunPod ($0.39/hr), Vast.ai ($0.15/hr).

podman push my-agent:sovereign ghcr.io/yourname/my-agent:sovereign

# On GPU instance
podman pull ghcr.io/yourname/my-agent:sovereign
podman run -d --name my-agent \
    --device nvidia.com/gpu=all \
    -v ./.env:/app/.env:ro \
    my-agent:sovereign --gateway

Air-Gapped

For classified or disconnected environments. Build on an internet-connected machine; transport via secure media.

# On internet-connected machine
./container/build.sh --sovereign --model llama3.2:3b
podman save my-agent:sovereign -o my-agent.tar
# Copy my-agent.tar + .env to USB

# On air-gapped machine
podman load -i my-agent.tar
podman run -d --name my-agent \
    -v ./.env:/app/.env:ro \
    my-agent:sovereign
No internet at any point on the target machine. The LLM, Python runtime, and all deps are baked into the image.

Configuration

.env file

The generated .env.example is grouped by category: provider keys, per-capability credentials, gateway settings, sandbox config, observability. Copy it to .env and fill in what your agent needs.

# ─── Provider ───────────────────────────────────────────
ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=AI...

# ─── Gateway (for --gateway mode) ──────────────────────
GATEWAY_HOST=127.0.0.1
GATEWAY_PORT=7891
GATEWAY_AUTH_TOKEN=change-me-to-something-long

# ─── Per-capability credentials ─────────────────────────
# Slack
SLACK_BOT_TOKEN=xoxb-...
# Email
IMAP_HOST=imap.gmail.com
IMAP_USER=you@gmail.com
IMAP_PASSWORD=<app-password>
# GitHub
GITHUB_TOKEN=ghp_...
# Notion
NOTION_API_KEY=secret_...
# Linear
LINEAR_API_KEY=lin_api_...
# Calendar
GOOGLE_CALENDAR_CREDENTIALS=./credentials.json
# Twilio
TWILIO_ACCOUNT_SID=AC...
TWILIO_AUTH_TOKEN=...
TWILIO_FROM_NUMBER=+1...

# ─── Web search (optional) ──────────────────────────────
# TAVILY_API_KEY=tvly-...
# BRAVE_SEARCH_API_KEY=...

# ─── Observability ──────────────────────────────────────
# LOG_LEVEL=INFO

Comments in .env.example link to where each provider issues keys. The generator pulls these from each capability's env_docs_urls field.
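Before first boot it can help to check which values in .env still hold the placeholders from .env.example. The sketch below is a hypothetical helper, not the runner's loader. It handles only simple KEY=VALUE lines and treats values ending in "...", starting with "<", or starting with "change-me" as unfilled, based on the sample file above.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unfilled_keys(values: dict) -> list[str]:
    """Return keys whose values are empty or still look like placeholders."""
    def is_placeholder(value: str) -> bool:
        return (
            value == ""
            or value.endswith("...")
            or value.startswith("<")
            or value.startswith("change-me")
        )
    return sorted(key for key, value in values.items() if is_placeholder(value))
```

Running unfilled_keys(parse_env(open(".env").read())) lists what still needs a real value.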

Sovereign Models

In sovereign mode you bundle a model into the container image. The model runs locally via Ollama. No runtime LLM key needed.

# Common choices
./container/build.sh --sovereign --model llama3.2:3b      # Fast, lightweight
./container/build.sh --sovereign --model phi3:mini         # Strong reasoning
./container/build.sh --sovereign --model mistral:7b        # Best quality on CPU
./container/build.sh --sovereign --model llama3.1:8b       # Newest Llama
Any model on ollama.com/library works. Pass the tag to --model.

Architecture

Three components, one direction: web designs → Python generates → your runtime runs.

Web

Architect chat → scout-config

Generator

Workspace + zip

Runtime

Your agent runs

1. Web (myagentos.ai)

The architect chat is a Next.js app backed by an LLM (the user-provided key). It runs the Phase 1.5 interview, makes capability decisions, then emits a scout-config JSON block. spec-bridge.ts parses the block out of the chat stream and posts it to /api/generate-agent.

2. Generator (Python)

scout_architect validates the spec. The F30 capability-decision gate rejects unaccounted-for catalog capabilities. The F31 context gate rejects specs missing the 5 required Phase 1.5 fields. scout_generator builds the workspace (USER.md, SOUL.md, IDENTITY.md, AGENTS.md, STANDING_ORDERS.md), renders the runner template, vendors scout_runtime into _vendored/, and zips it.

3. Runtime (in your agent)

Every agent ships with scout_runtime vendored — no external runtime dependency. On boot it loads the workspace, wires the capabilities the spec requested, and starts either the REPL or the gateway depending on the CLI flag. You own everything in the zip.

Sovereignty by design. No phone-home, no telemetry, no remote config fetch. The agent works the same on an air-gapped machine as it does online (minus network-dependent capabilities like web_search and Slack).

Troubleshooting

"AnthropicProvider requires an api_key"

Set ANTHROPIC_API_KEY in .env. Or use --provider with the env var name to point at a different key (F45).

"task executor not available"

Your gateway is missing the CLI handler. Re-download a fresh agent post-PR #73 (F47). The fix wires TaskExecutor + the CLI handler into the gateway during boot.

WebSocket 404 on /v1/ws

pip install 'uvicorn[standard]'

You're missing the WebSocket extras (F46). The standard uvicorn install ships without them.

"no gateway token found" / 403 from scout-tui

Set GATEWAY_AUTH_TOKEN in the agent's .env and SCOUT_GATEWAY_TOKEN in the env where you run scout-tui. They must match.

"FINALIZE BLOCKED" in architect chat

Missing Phase 1.5 fields or capability decisions. The architect will list exactly which (F30 = capability decisions, F31 = context fields). Answer the missing questions or explicitly omit the missing capabilities and retry.

ModuleNotFoundError on agent boot

Most often pip install -e . wasn't run inside the active venv; activate the venv and run it again. A stray NODE_ENV=production in your environment can also poison an unrelated npm step, but that is relevant only for the web layer; agents are pure Python.

Container exits immediately

podman logs <agent-name>

Usually missing .env, invalid API key, or an import error from a capability whose extras weren't installed. Logs will name it.

Sovereign: Ollama won't start

The bundled model needs to fit in RAM. A 3B model needs 4GB+, a 7B needs 8GB+. Try a smaller model:

./container/build.sh --sovereign --model llama3.2:3b

Vercel deploy timing

The web layer (myagentos.ai/create) deploys in ~60-90s from merge to live. If you just merged a PR and don't see the change yet, give it the full window.

Sovereign AI agents. Designed by you. Owned by you.