docs(core): add automated gemma setup guide (#26233)

Co-authored-by: Samee Zahid <sameez@google.com>
This commit is contained in:
Samee Zahid
2026-04-29 16:08:54 -07:00
committed by GitHub
parent fa1a7c10bd
commit 8cec567064
4 changed files with 103 additions and 12 deletions
+3 -3
@@ -34,11 +34,11 @@ Gemini CLI will use a locally-running **Gemma** model to make routing decisions
(instead of sending routing requests to a hosted model). This can help
reduce costs associated with hosted model usage while offering similar routing
decision latency and quality.
To use this feature, the local Gemma model **must** be served behind a
Gemini API-compatible HTTP endpoint configured in `settings.json`.
+The easiest way to set this up is using the automated `gemini gemma setup`
+command.
For more details on how to configure local model routing, see
-[Local Model Routing](../core/local-model-routing.md).
+[`gemini gemma` — Local Model Routing Setup](../core/gemma-setup.md).
### Model selection precedence
+83
@@ -0,0 +1,83 @@
# `gemini gemma` — Automated Local Model Routing Setup
Local model routing uses a local Gemma 3 1B model running on your machine to
classify and route user requests. It routes simple requests (like file reads) to
Gemini Flash and complex requests (like architecture discussions) to Gemini Pro.
<!-- prettier-ignore -->
> [!NOTE]
> This is an experimental feature currently under active development.
## What is this?
This feature reduces cloud API costs by using local inference for task
classification instead of a cloud-based classifier. It adds a small amount of
local latency (roughly 100 ms per classification) but can significantly reduce
overall token usage for hosted models.
## Quick start
```bash
# One command does everything: downloads runtime, pulls model, configures settings, starts server
gemini gemma setup
```
You'll be prompted to accept the Gemma Terms of Use. The model is ~1 GB.
After setup, **just use the CLI normally** — routing happens automatically on
every request.
## Commands
| Command | What it does |
| --------------------- | -------------------------------------------------------------- |
| `gemini gemma setup` | Full install (binary + model + settings + server start) |
| `gemini gemma status` | Health check — shows what's installed and running |
| `gemini gemma start` | Start the LiteRT server (auto-starts on CLI launch by default) |
| `gemini gemma stop` | Stop the LiteRT server |
| `gemini gemma logs` | Tail the server logs to see routing requests live |
| `/gemma` | In-session status check (type it inside the CLI) |
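Taken together, a typical first session might look like the sketch below. It
uses only the commands from the table and assumes default flags throughout.

```bash
# Install the runtime and model, then confirm everything is healthy.
gemini gemma setup
gemini gemma status

# Watch routing traffic while you use the CLI in another terminal
# (Ctrl-C to stop tailing), then shut the server down when finished.
gemini gemma logs
gemini gemma stop
```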
## Verifying it works
1. Run `gemini gemma status` — all checks should show green
2. Open two terminals:
- Terminal 1: `gemini gemma logs` (watch for incoming requests)
- Terminal 2: use the CLI normally
3. You should see classification requests appear in the logs as you interact
with the CLI
4. The `/gemma` slash command inside a session shows a quick status panel
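If you would rather probe the endpoint directly than watch the logs, a request
along these lines may work. Treat it as a sketch only: it assumes the server
was started with `--port 8080` (see the flags below) and that the model is
exposed as `gemma-3-1b-it` via the standard Gemini API `generateContent` route.
Both are assumptions, so check `gemini gemma status` for the actual endpoint.

```bash
# Hypothetical direct probe of the local server. The port (8080) and model
# name (gemma-3-1b-it) are assumptions; confirm with `gemini gemma status`.
curl -s http://localhost:8080/v1beta/models/gemma-3-1b-it:generateContent \
  -H 'Content-Type: application/json' \
  -d '{"contents": [{"parts": [{"text": "Hello"}]}]}'
```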
## Setup flags
```bash
gemini gemma setup --port 8080 # custom port
gemini gemma setup --no-start # don't start server after install
gemini gemma setup --force # re-download everything
gemini gemma setup --skip-model # binary only, skip the ~1 GB model download
```
## How it works under the hood
- Local Gemma classifies each request as "simple" or "complex" (~100ms)
- Simple → Flash, Complex → Pro
- If the local server is down, the CLI silently falls back to the cloud
classifier — no errors, no disruption
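As a rough outside-in illustration of that fallback (assuming
`gemini gemma status` exits non-zero when the server is unreachable, which is
an assumption about its exit-code behavior, not something documented here):

```bash
# Illustrative only: the CLI performs this check and fallback internally,
# so you never need to run anything like this yourself.
if gemini gemma status > /dev/null 2>&1; then
  echo "Local server healthy: requests are classified locally."
else
  echo "Local server down: the CLI silently uses the cloud classifier."
fi
```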
## Disabling
To disable routing, set `enabled: false` in `settings.json`, or run
`gemini gemma stop` to turn off the server:
```json
{ "experimental": { "gemmaModelRouter": { "enabled": false } } }
```
## Advanced setup
If you are in an environment where the `gemini gemma setup` command cannot
automatically download binaries (for example, behind a strict corporate
firewall), you can perform the setup manually.
For more information, see the
[Manual Local Model Routing Setup guide](./local-model-routing.md).
+3 -2
@@ -15,8 +15,9 @@ requests sent from `packages/cli`. For a general overview of Gemini CLI, see the
modular GEMINI.md import feature using @file.md syntax.
- **[Policy Engine](../reference/policy-engine.md):** Use the Policy Engine for
fine-grained control over tool execution.
-- **[Local Model Routing (experimental)](./local-model-routing.md):** Learn how
-  to enable use of a local Gemma model for model routing decisions.
+- **[Local Model Routing (experimental)](./gemma-setup.md):** Learn how to
+  enable use of a local Gemma model for model routing decisions using the
+  automated setup command.
## Role of the core
+14 -7
@@ -1,22 +1,29 @@
-# Local Model Routing (experimental)
+# Manual Local Model Routing Setup (experimental)
Gemini CLI supports using a local model for
[routing decisions](../cli/model-routing.md). When configured, Gemini CLI will
use a locally-running **Gemma** model to make routing decisions (instead of
sending routing requests to a hosted model).
+<!-- prettier-ignore -->
+> [!NOTE]
+> This is an experimental feature currently under active development.
+<!-- prettier-ignore -->
+> [!IMPORTANT]
+> **Recommended:** A fully automated setup command is now available. Use the
+> [`gemini gemma` Setup Guide](./gemma-setup.md) instead of following these
+> manual steps.
This feature can help reduce costs associated with hosted model usage while
offering similar routing decision latency and quality.
-> **Note: Local model routing is currently an experimental feature.**
-## Setup
+## Manual Setup
Using a Gemma model for routing decisions requires an instance of the model to
be running locally on your machine, served behind an HTTP endpoint
-and accessed via the Gemini API.
-To serve the Gemma model, follow these steps:
+and accessed via the Gemini API. If you cannot use the `gemini gemma setup`
+command, follow these manual steps:
### Download the LiteRT-LM runtime