Providers
hermes-router routes across a pool of providers. You only need one key to start — add more (and more providers) to stay online longer. You can stack quota by creating multiple keys per provider, and by signing up with multiple Google/GitHub accounts.
Add keys with hr auth add <provider> (see configuration.md for where
they’re stored).
Free providers
Section titled “Free providers”| Provider | Free tier | Sign up |
|---|---|---|
| Gemini | Generous per-minute limits | aistudio.google.com |
| OpenRouter | 50 requests/day per key | openrouter.ai |
| SambaNova | Free, fast Llama models | cloud.sambanova.ai |
| GitHub Models | Free with any GitHub account | github.com/settings/tokens |
| Cerebras | Fast inference, free tier | cloud.cerebras.ai |
| Groq | Fast inference, free tier | console.groq.com |
| Mistral | Free tier | console.mistral.ai |
| Cohere | 1,000 calls/mo per key | dashboard.cohere.com |
| Z.ai (GLM) | ~1k requests/day | z.ai |
| Naga AI | 100 requests/day per key | naga.ac |
| NVIDIA NIM | 40 requests/min per key | build.nvidia.com |
| Hugging Face | ~$0.10/mo credit (PRO: $2/mo) — 45k+ models | huggingface.co/settings/tokens |
Hugging Face note: one token reaches 45,000+ models across many inference partners via an OpenAI-compatible endpoint. The free credit is small, so it’s best as an extra in the pool (the router fails over to other providers when it runs out). The default model uses the
:cheapestsuffix to stretch the credit; change it withHUGGINGFACE_MODEL.
Paid providers
Section titled “Paid providers”Add your existing API key; the router handles everything else.
| Provider | Default model | API keys |
|---|---|---|
| OpenAI | gpt-4o-mini | platform.openai.com |
| Anthropic | claude-haiku-4-5 | console.anthropic.com |
Anthropic’s API uses a different wire format from OpenAI. hermes-router translates automatically — your app sends the same OpenAI-format request regardless of which provider handles it.
Codex (ChatGPT subscription)
Section titled “Codex (ChatGPT subscription)”Codex lets you use your ChatGPT subscription (Plus/Pro/Go) for completions instead of a pay-per-token API key. It doesn’t use an API key — it authenticates with OAuth tokens, so setup is different:
codex login # one-time, with the official Codex CLI (opens browser / device flow)hr auth import-codex # copy the login into the router (reads ~/.codex/auth.json)hr restartThe router stores the account under codex_accounts in auth.json, refreshes the access
token automatically before it expires, and translates your OpenAI-format requests to the
Codex Responses API transparently. Add several accounts (run hr auth import-codex after
logging into each) and pair with hr mode sequential to drain one account’s quota before the
next. Override the model with CODEX_MODEL (default gpt-5.5).
⚠️ Terms of service: routing ChatGPT subscription quota through a proxy is a gray area in OpenAI’s terms and could risk your account. Use your own accounts, at your own discretion.
Kimi (Moonshot coding plan)
Section titled “Kimi (Moonshot coding plan)”The Kimi coding plan (Moonshot) is a subscription, but — unlike Codex — it authenticates
with a normal API key (sk-...), not OAuth. Its endpoint is OpenAI-compatible, so it adds
like any other provider:
hr auth add kimi # paste your Kimi/Moonshot keyhr restartDefaults to https://api.kimi.com/coding/v1 with model kimi-for-coding. Using the standard
Moonshot API instead of the coding plan? Point it elsewhere with KIMI_BASE_URL
(e.g. https://api.moonshot.ai/v1) and set KIMI_MODEL to a model like kimi-k2-0905-preview.
Get a key at platform.kimi.ai / platform.moonshot.ai.
Local models (Ollama / LM Studio / llama.cpp)
Section titled “Local models (Ollama / LM Studio / llama.cpp)”Run a model on your own machine and route to it — free, private, and fast, with the cloud
providers as automatic fallback. Any OpenAI-compatible local server works (Ollama, LM Studio,
llama.cpp’s server, vLLM…). It’s keyless, so there’s nothing to add with hr auth add —
just point the router at it:
# e.g. with Ollama: ollama serve && ollama pull llama3.1hr model set local llama3.1 # writes LOCAL_MODEL; enables the local providerhr restartOr set it directly in .env:
LOCAL_BASE_URL=http://localhost:11434/v1 # Ollama default (LM Studio: http://localhost:1234/v1)LOCAL_MODEL=llama3.1 # comma-separate for multi-model failover# LOCAL_EMBED_MODEL=nomic-embed-text # optional: also serve /v1/embeddings locallyThe provider turns on as soon as LOCAL_BASE_URL or LOCAL_MODEL is set.
Conversation mode — send the model id hermes-router:fast (or the header
X-Hermes-Profile: fast) and the router prefers your local model for short/casual turns,
falling back to the cloud pool for heavier requests. Plain hermes-router keeps the normal
smart routing across every provider.
Valid provider names
Section titled “Valid provider names”Use these names with hr auth add, hr model set, and the <PROVIDER>_* environment
variables:
gemini, openrouter, sambanova, github_models, cerebras, groq, mistral,
cohere, zai, naga, nvidia, huggingface, kimi, openai, anthropic, codex,
local.
Per-provider capabilities
Section titled “Per-provider capabilities”Each provider’s model is probed at startup for function-calling and reasoning
support; results show up in hr status and /v1/status. See
usage.md for how those affect tool routing, and
configuration.md for the override variables.