Providers

Agenvoy supports ten LLM providers behind a unified Agent.Send() interface.

Supported list

Provider	Config name	Notes
Anthropic Claude	`claude`	Messages API; parallel tool use enabled by default
OpenAI	`openai`	Chat Completions / Responses API
OpenAI Codex	`codex`	OAuth login (uses your ChatGPT / Codex account, no API key); SSE streaming; auto prompt-cache key (`sha256(instructions)`)
Google Gemini	`gemini`	gemini-2.x / 3.x families
GitHub Copilot	`copilot`	Requires GitHub OAuth (one-shot login flow)
Nvidia NIM	`nvidia`	Llama, Mistral, and other open-weight hosted models
xAI Grok	`grok`	grok-4 / grok-3 families incl. `grok-code-fast-1`; non-streaming HTTP client
DeepSeek	`deepseek`	`deepseek-chat` (tool use) and `deepseek-reasoner` (CoT, temperature disabled); non-streaming HTTP client
OpenRouter	`openrouter`	Aggregator — routes to 200+ models from multiple providers via a single API key
Compat	`compat`	Any custom OpenAI-compatible endpoint

Model discovery: since v0.27.3, model lists are fetched live from each provider's API during agen model add and in the TUI via /model global. The static JSON catalogs under configs/jsons/providors/ have been removed.

Provider configuration

agen model add          # Interactive provider/model add
agen model remove       # Interactive provider/model remove
agen model list         # List registered models
agen model dispatcher   # Choose the dispatcher model
agen model reasoning    # Set dispatcher reasoning effort: low / medium / high / xhigh

Credentials (API keys, OAuth tokens) are stored in the OS keychain under service agenvoy, never in plain JSON.

Dispatcher model

The dispatcher LLM decides which worker model handles each task. It is invoked through SelectAgent() before Execute() enters its iteration loop, receiving the user input plus a hint about any matched skill.

Configure via agen model dispatcher (model selection) and agen model reasoning (reasoning effort).

Streaming

Only openaiCodex uses SSE for response streaming (parseSSEStream accumulates argsBuf per item_id). Other providers receive the full response in one shot per turn.
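
The per-item accumulation can be pictured with a minimal sketch. It is not the project's parseSSEStream: the event struct, the function names, and the choice to filter on the response.function_call_arguments.delta event type are assumptions made for illustration. It only shows the idea of keeping one argument buffer per item_id so that interleaved tool calls do not mix.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// argDelta is a hypothetical, trimmed-down event shape holding only the
// fields needed to group argument fragments by item_id.
type argDelta struct {
	Type   string `json:"type"`
	ItemID string `json:"item_id"`
	Delta  string `json:"delta"`
}

// accumulateArgs reads "data: ..." SSE lines and concatenates the argument
// fragments of each item_id into its own buffer.
func accumulateArgs(r io.Reader) (map[string]string, error) {
	bufs := map[string]*strings.Builder{}
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		payload, ok := strings.CutPrefix(sc.Text(), "data: ")
		if !ok || payload == "[DONE]" {
			continue // skip event:, comment, and blank lines
		}
		var ev argDelta
		if err := json.Unmarshal([]byte(payload), &ev); err != nil {
			return nil, fmt.Errorf("decode SSE payload: %w", err)
		}
		if ev.Type != "response.function_call_arguments.delta" {
			continue // not a tool-argument fragment
		}
		b, exists := bufs[ev.ItemID]
		if !exists {
			b = &strings.Builder{}
			bufs[ev.ItemID] = b
		}
		b.WriteString(ev.Delta)
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	out := make(map[string]string, len(bufs))
	for id, b := range bufs {
		out[id] = b.String()
	}
	return out, nil
}

// accumulateArgsString is a convenience wrapper for in-memory streams.
func accumulateArgsString(s string) (map[string]string, error) {
	return accumulateArgs(strings.NewReader(s))
}

func main() {
	stream := `data: {"type":"response.function_call_arguments.delta","item_id":"a","delta":"{\"x\":"}` + "\n" +
		`data: {"type":"response.function_call_arguments.delta","item_id":"a","delta":"1}"}` + "\n"
	args, err := accumulateArgsString(stream)
	if err != nil {
		panic(err)
	}
	fmt.Println(args["a"])
}
```

Keying the buffers by item_id matters once parallel tool calls are enabled, because fragments of different calls may arrive interleaved in the same stream.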

Parallel tool calls

Claude Messages API: parallel tool use is on by default
OpenAI Responses API: parallel_tool_calls=true is left enabled
Regardless of provider, the agenvoy execution engine still serializes commit (Pass 3) and respects per-tool concurrency markers

Prompt caching

openaiCodex/send.go computes sha256(instructions) and sends it as prompt_cache_key. Anthropic and OpenAI both honor automatic prefix caching at >=1024 tokens, so no explicit cache markers are needed.
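
A minimal sketch of deriving such a key is shown below. The function name promptCacheKey and the lowercase hex encoding are assumptions; the text above only states that the key is sha256(instructions).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// promptCacheKey returns the SHA-256 digest of the instructions.
// Hex encoding is an assumption made for this sketch.
func promptCacheKey(instructions string) string {
	sum := sha256.Sum256([]byte(instructions))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Identical instructions always map to the same key, so repeated
	// turns with an unchanged system prompt can reuse one cache entry.
	fmt.Println(promptCacheKey("You are a helpful agent."))
}
```

Because the key depends only on the instructions text, any edit to the instructions yields a different key and therefore a fresh cache entry.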

Adding a custom OpenAI-compatible endpoint

Use the compat provider type and point it at any endpoint that accepts the OpenAI Chat Completions schema. The URL convention follows Zed: enter the URL up to and including /v1 (e.g. http://192.168.1.10:4000/v1, or the Ollama default http://localhost:11434/v1). compat/send.go appends only /chat/completions.

/providor → name: VLLM
            URL:  http://192.168.1.10:4000/v1
            API key: <bearer token, or blank>
            Model: gemma3-27b-it          (becomes compat[VLLM]@gemma3-27b-it)
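
The URL convention and the resulting model label can be sketched as follows. Both helper names are hypothetical, and trimming a trailing slash before appending the path is an assumption of this sketch rather than documented behavior of compat/send.go.

```go
package main

import (
	"fmt"
	"strings"
)

// chatCompletionsURL appends only /chat/completions to a base URL that
// already ends in /v1. Trailing-slash trimming is an assumption.
func chatCompletionsURL(base string) string {
	return strings.TrimRight(base, "/") + "/chat/completions"
}

// modelLabel formats the registered name in the compat[NAME]@MODEL form
// shown in the example above.
func modelLabel(instance, model string) string {
	return fmt.Sprintf("compat[%s]@%s", instance, model)
}

func main() {
	fmt.Println(chatCompletionsURL("http://192.168.1.10:4000/v1"))
	fmt.Println(modelLabel("VLLM", "gemma3-27b-it"))
}
```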

Storage split (URL vs key)

What	Where	API
URL	`~/.config/agenvoy/config.json` `compats[].URL`	`session.UpsertCompat` / `session.GetCompatURL`
API key	OS keychain	`keychain.Set("COMPAT_<NAME>_API_KEY", value)`

compat.New reads URL via session.GetCompatURL(instanceName). There is no COMPAT_<NAME>_URL keychain key (intentionally removed).

Tested compat targets

Target	Works	Notes
Ollama	Yes	default `http://localhost:11434/v1`
LM Studio	Yes
vLLM	Yes	`--enable-auto-tool-choice --tool-call-parser <name>` for tool use
llama.cpp server	Yes
LiteLLM proxy	Yes	virtual key as Bearer token
Groq / Together / DeepInfra / OpenRouter / Fireworks	Yes
Azure OpenAI	No	needs `api-key` header (not `Bearer`) + `?api-version=` query — not supported
Reasoning-only models (o1, deepseek-r1, QwQ)	Partial	compat sends a hardcoded `temperature: 0.2`; some servers reject it with HTTP 422

Send timeout (3 layers)

The send-side timeout has three independent layers, each catching a different failure mode:

Layer	Value	Catches	Where
Transport `ResponseHeaderTimeout`	`15s` (SSE only)	Backend stuck before returning headers (healthy SSE returns <1s; high load <=5s; 15s aligns with `ProbeTimeout`)	`openaiCodex/new.go::newHTTPClient()`
`http.Client.Timeout`	`10m`	Full request (headers + body)	per-provider client
`execute.go::AgentSendTimeout`	env `AGENT_SEND_TIMEOUT_SECONDS`, default `600s`	Exec-layer ceiling via `context.WithTimeout`	`internal/agents/exec/execute.go`

All providers use a 10m client timeout. Only the codex SSE transport adds a ResponseHeaderTimeout layer; non-SSE providers and compat omit it.
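
The exec-layer ceiling can be sketched as below. The helper name sendTimeout and the fallback rule (use the default when the variable is empty, non-numeric, or not positive) are assumptions; only the variable name AGENT_SEND_TIMEOUT_SECONDS, the 600s default, and the use of context.WithTimeout come from the table above.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"strconv"
	"time"
)

const defaultSendTimeout = 600 * time.Second

// sendTimeout converts the raw env value into a duration, falling back to
// the default for empty, non-numeric, or non-positive input (assumed rule).
func sendTimeout(raw string) time.Duration {
	secs, err := strconv.Atoi(raw)
	if err != nil || secs <= 0 {
		return defaultSendTimeout
	}
	return time.Duration(secs) * time.Second
}

func main() {
	limit := sendTimeout(os.Getenv("AGENT_SEND_TIMEOUT_SECONDS"))

	// The context deadline bounds the whole send, independently of the
	// HTTP client's own Timeout.
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()

	deadline, _ := ctx.Deadline()
	fmt.Println("send must finish within", time.Until(deadline).Round(time.Second))
}
```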

HTTP client factory split

Provider category	Factory	Config
Cloud non-SSE (claude / copilot / gemini / nvidia / openai / openrouter)	`provider.NewHTTPClient()`	`Timeout=10m`
Cloud SSE (openaiCodex)	`openaiCodex/new.go::newHTTPClient()`	`Timeout=10m` + `ResponseHeaderTimeout=15s`
Local / self-hosted (compat)	inline `&http.Client{Timeout: 10 * time.Minute}`	no `ResponseHeaderTimeout`: Ollama / vLLM / llama.cpp cold starts may hold the connection for 30-90s before sending headers, so a 15s limit would always produce false positives

Local compat is not routed through the factory by design. Cold-start tolerance is non-negotiable for self-hosted backends.
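
The difference between the SSE client and the compat client can be sketched with the standard library alone. The function names here are hypothetical, and cloning http.DefaultTransport is a choice made for this sketch, not necessarily what openaiCodex/new.go does.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newSSEClient bounds both the whole request (10m) and the wait for
// response headers (15s).
func newSSEClient() *http.Client {
	// Clone the default transport so proxy and connection-pool defaults
	// are kept (an assumption of this sketch).
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.ResponseHeaderTimeout = 15 * time.Second
	return &http.Client{Timeout: 10 * time.Minute, Transport: t}
}

// newCompatClient sets only the overall timeout, so a slow cold start on
// a self-hosted backend is not cut off while headers are pending.
func newCompatClient() *http.Client {
	return &http.Client{Timeout: 10 * time.Minute}
}

// headerTimeout reports the ResponseHeaderTimeout of a client, or 0 when
// the client has no *http.Transport of its own.
func headerTimeout(c *http.Client) time.Duration {
	if t, ok := c.Transport.(*http.Transport); ok {
		return t.ResponseHeaderTimeout
	}
	return 0
}

func main() {
	fmt.Println("sse header timeout:", headerTimeout(newSSEClient()))
	fmt.Println("compat header timeout:", headerTimeout(newCompatClient()))
}
```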

Retry semantics

sendFailCount accumulates unconditionally for timeout/network errors (payload didn't reach the model; signature comparison is meaningless)
For content-level errors (parse failure, 4xx with body, garbage response), retry is sig-based — same payload signature increments the counter; different signature resets it
sendFailCount >= MaxRetry (default 3) triggers the MaxRetry-exhausted path, which emits sendText + EventDone with a branch-specific message (timeout / context-length / generic)
During retries (sendFailCount < MaxRetry) only slog.Warn is emitted; no chat event surfaces (avoids noisy "retrying 1/3, 2/3" spam — only the final outcome reaches the user)

OAuth device-code polling (copilot/login.go) uses a separate http.Client{Timeout: 30s} per poll. With a zero timeout, a hung GitHub OAuth backend would block the entire login flow.

[!NOTE] This document was auto-generated by Claude after reading the full source code.