Documentation

Providers

Agenvoy supports ten LLM providers behind a unified Agent.Send() interface.

Supported list

Provider Config name Notes
Anthropic Claude claude Messages API; parallel tool use enabled by default
OpenAI openai Chat Completions / Responses API
OpenAI Codex codex OAuth login (uses your ChatGPT / Codex account, no API key); SSE streaming; auto prompt-cache key (sha256(instructions))
Google Gemini gemini gemini-2.x / 3.x families
GitHub Copilot copilot Requires GitHub OAuth (one-shot login flow)
Nvidia NIM nvidia Llama, Mistral, and other open-weight hosted models
xAI Grok grok grok-4 / grok-3 families incl. grok-code-fast-1; non-streaming HTTP client
DeepSeek deepseek deepseek-chat (tool use) and deepseek-reasoner (CoT, temperature disabled); non-streaming HTTP client
OpenRouter openrouter Aggregator — routes to 200+ models from multiple providers via a single API key
Compat compat Any custom OpenAI-compatible endpoint

Model discovery — since v0.27.3 model lists are fetched live from each provider's API during agen model add / TUI /model global. The static JSON catalogs under configs/jsons/providors/ have been removed.

Provider configuration

agen model add          # Interactive provider/model add
agen model remove       # Interactive provider/model remove
agen model list         # List registered models
agen model dispatcher   # Choose the dispatcher model
agen model reasoning    # Set dispatcher reasoning effort: low / medium / high / xhigh

Credentials (API keys, OAuth tokens) are stored in the OS keychain under service agenvoy, never in plain JSON.

Dispatcher model

The dispatcher LLM decides which worker model handles each task. It is invoked through SelectAgent() before Execute() enters its iteration loop, receiving the user input plus a hint about any matched skill.

Configure via agen model dispatcher (model selection) and agen model reasoning (reasoning effort).

Streaming

Only openaiCodex uses SSE for response streaming (parseSSEStream accumulates argsBuf per item_id). Other providers receive the full response in one shot per turn.

Parallel tool calls

Prompt caching

openaiCodex/send.go computes sha256(instructions) and sends it as prompt_cache_key. Anthropic and OpenAI both honor automatic prefix caching at >=1024 tokens, so no explicit cache markers are needed.

Adding a custom OpenAI-compatible endpoint

Use the compat provider type and point at any endpoint that accepts the OpenAI Chat Completions schema. URL convention follows Zed: enter the URL up to /v1 (e.g. http://192.168.1.10:4000/v1, Ollama default http://localhost:11434/v1). compat/send.go appends only /chat/completions.

/providor → name: VLLM
            URL:  http://192.168.1.10:4000/v1
            API key: <bearer token, or blank>
            Model: gemma3-27b-it          (becomes compat[VLLM]@gemma3-27b-it)

Storage split (URL vs key)

What Where API
URL ~/.config/agenvoy/config.json compats[].URL session.UpsertCompat / session.GetCompatURL
API key OS keychain keychain.Set("COMPAT_<NAME>_API_KEY", value)

compat.New reads URL via session.GetCompatURL(instanceName). There is no COMPAT_<NAME>_URL keychain key (intentionally removed).

Tested compat targets

Target Works Notes
Ollama Yes default http://localhost:11434/v1
LM Studio Yes
vLLM Yes --enable-auto-tool-choice --tool-call-parser <name> for tool use
llama.cpp server Yes
LiteLLM proxy Yes virtual key as Bearer token
Groq / Together / DeepInfra / OpenRouter / Fireworks Yes
Azure OpenAI No needs api-key header (not Bearer) + ?api-version= query — not supported
Reasoning-only models (o1, deepseek-r1, QwQ) Partial compat sends temperature: 0.2 hardcoded; some servers 422

Send timeout (3 layers)

Send-side timeout has three independent layers, each catching a different failure mode:

Layer Value Catches Where
Transport ResponseHeaderTimeout 15s (SSE only) Backend stuck before returning headers (healthy SSE returns <1s; high load <=5s; 15s aligns with ProbeTimeout) openaiCodex/new.go::newHTTPClient()
http.Client.Timeout 10m Full request (headers + body) per-provider client
execute.go::AgentSendTimeout env AGENT_SEND_TIMEOUT_SECONDS, default 600s Exec-layer ceiling via context.WithTimeout internal/agents/exec/execute.go

All providers use 10m client timeout. Only the codex SSE transport adds a ResponseHeaderTimeout layer; non-SSE providers and compat omit it.

HTTP client factory split

Provider category Factory Config
Cloud non-SSE (claude / copilot / gemini / nvidia / openai / openrouter) provider.NewHTTPClient() Timeout=10m
Cloud SSE (openaiCodex) openaiCodex/new.go::newHTTPClient() Timeout=10m + ResponseHeaderTimeout=15s
Local / self-hosted (compat) inline &http.Client{Timeout: 10 * time.Minute} no ResponseHeaderTimeout — Ollama / vLLM / llama.cpp cold-start may hold 30-90s before headers; 15s would 100% false-positive

Local compat is not routed through the factory by design. Cold-start tolerance is non-negotiable for self-hosted backends.

Retry semantics

OAuth device-code polling (copilot/login.go) uses a separate http.Client{Timeout: 30s} per poll — zero timeout would let GitHub OAuth backend hang and lock the entire login flow.


[!NOTE] This document was auto-generated by Claude after reading the full source code.