# `gemini gemma` — Automated Local Model Routing Setup Local model routing uses a local Gemma 3 1B model running on your machine to classify and route user requests. It routes simple requests (like file reads) to Gemini Flash and complex requests (like architecture discussions) to Gemini Pro. > [!NOTE] > This is an experimental feature currently under active development. ## What is this? This feature saves cloud API costs by using local inference for task classification instead of a cloud-based classifier. It adds a few milliseconds of local latency but can significantly reduce the overall token usage for hosted models. ## Quick start ```bash # One command does everything: downloads runtime, pulls model, configures settings, starts server gemini gemma setup ``` You'll be prompted to accept the Gemma Terms of Use. The model is ~1 GB. After setup, **just use the CLI normally** — routing happens automatically on every request. ## Commands | Command | What it does | | --------------------- | -------------------------------------------------------------- | | `gemini gemma setup` | Full install (binary + model + settings + server start) | | `gemini gemma status` | Health check — shows what's installed and running | | `gemini gemma start` | Start the LiteRT server (auto-starts on CLI launch by default) | | `gemini gemma stop` | Stop the LiteRT server | | `gemini gemma logs` | Tail the server logs to see routing requests live | | `/gemma` | In-session status check (type it inside the CLI) | ## Verifying it works 1. Run `gemini gemma status` — all checks should show green 2. Open two terminals: - Terminal 1: `gemini gemma logs` (watch for incoming requests) - Terminal 2: use the CLI normally 3. You should see classification requests appear in the logs as you interact with the CLI 4. The `/gemma` slash command inside a session shows a quick status panel ## Setup flags ```bash gemini gemma setup --port 8080 # custom port gemini gemma setup --no-start # don't start server after install gemini gemma setup --force # re-download everything gemini gemma setup --skip-model # binary only, skip the 1GB model download ``` ## How it works under the hood - Local Gemma classifies each request as "simple" or "complex" (~100ms) - Simple → Flash, Complex → Pro - If the local server is down, the CLI silently falls back to the cloud classifier — no errors, no disruption ## Disabling Set `enabled: false` in settings or just run `gemini gemma stop` to turn off the server: ```json { "experimental": { "gemmaModelRouter": { "enabled": false } } } ``` ## Advanced setup If you are in an environment where the `gemini gemma setup` command cannot automatically download binaries (for example, behind a strict corporate firewall), you can perform the setup manually. For more information, see the [Manual Local Model Routing Setup guide](./local-model-routing.md).