From 8cec56706476e73db5a12df6080d6e1b09fc3c56 Mon Sep 17 00:00:00 2001
From: Samee Zahid
Date: Wed, 29 Apr 2026 16:08:54 -0700
Subject: [PATCH] docs(core): add automated gemma setup guide (#26233)

Co-authored-by: Samee Zahid
---
 docs/cli/model-routing.md        |  6 +--
 docs/core/gemma-setup.md         | 83 ++++++++++++++++++++++++++++++++
 docs/core/index.md               |  5 +-
 docs/core/local-model-routing.md | 21 +++++---
 4 files changed, 103 insertions(+), 12 deletions(-)
 create mode 100644 docs/core/gemma-setup.md

diff --git a/docs/cli/model-routing.md b/docs/cli/model-routing.md
index c9ec073a64..a29dd98a9b 100644
--- a/docs/cli/model-routing.md
+++ b/docs/cli/model-routing.md
@@ -34,11 +34,11 @@ Gemini CLI will use a locally-running **Gemma** model to make routing decisions
reduce costs associated with hosted model usage while offering similar routing
decision latency and quality.

-In order to use this feature, the local Gemma model **must** be served behind a
-Gemini API and accessible via HTTP at an endpoint configured in `settings.json`.
+The easiest way to set this up is to use the automated `gemini gemma setup`
+command.

For more details on how to configure local model routing, see
-[Local Model Routing](../core/local-model-routing.md).
+[`gemini gemma` — Automated Local Model Routing Setup](../core/gemma-setup.md).

### Model selection precedence

diff --git a/docs/core/gemma-setup.md b/docs/core/gemma-setup.md
new file mode 100644
index 0000000000..166bdc885b
--- /dev/null
+++ b/docs/core/gemma-setup.md
@@ -0,0 +1,83 @@
# `gemini gemma` — Automated Local Model Routing Setup

Local model routing uses a local Gemma 3 1B model running on your machine to
classify user requests, routing simple requests (like file reads) to Gemini
Flash and complex requests (like architecture discussions) to Gemini Pro.

> [!NOTE]
> This is an experimental feature currently under active development.

## What is this?

This feature saves cloud API costs by using local inference for task
classification instead of a cloud-based classifier. It adds roughly 100 ms of
local latency but can significantly reduce the overall token usage for hosted
models.

## Quick start

```bash
# One command does everything: downloads runtime, pulls model, configures settings, starts server
gemini gemma setup
```

You'll be prompted to accept the Gemma Terms of Use. The model is ~1 GB.

After setup, **just use the CLI normally** — routing happens automatically on
every request.

## Commands

| Command               | What it does                                                   |
| --------------------- | -------------------------------------------------------------- |
| `gemini gemma setup`  | Full install (binary + model + settings + server start)        |
| `gemini gemma status` | Health check — shows what's installed and running              |
| `gemini gemma start`  | Start the LiteRT server (auto-starts on CLI launch by default) |
| `gemini gemma stop`   | Stop the LiteRT server                                         |
| `gemini gemma logs`   | Tail the server logs to see routing requests live              |
| `/gemma`              | In-session status check (type it inside the CLI)               |

## Verifying it works

1. Run `gemini gemma status` — all checks should show green
2. Open two terminals:
   - Terminal 1: `gemini gemma logs` (watch for incoming requests)
   - Terminal 2: use the CLI normally
3. You should see classification requests appear in the logs as you interact
   with the CLI
4. Type `/gemma` inside a session to see a quick status panel
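
For a direct smoke test of the server itself, you can send it a request by
hand. This is a minimal sketch that makes three assumptions you should verify
against `gemini gemma status` and the logs: the server listens on port 8080,
it exposes the standard Gemini API `generateContent` route, and the model is
named `gemma-3-1b-it`:

```bash
# Hypothetical direct request to the local LiteRT server. The port, route, and
# model name are assumptions; confirm them with `gemini gemma status`.
curl -s http://localhost:8080/v1beta/models/gemma-3-1b-it:generateContent \
  -H 'Content-Type: application/json' \
  -d '{"contents": [{"parts": [{"text": "read the file README.md"}]}]}'
```

A JSON response (rather than a connection error) means the server is up and
answering Gemini-API-style requests.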
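
You can also exercise the silent-fallback path described below under "How it
works under the hood" using only the documented commands:

```bash
# Take the local server down; the CLI should keep working, routing via the
# cloud classifier with no errors or disruption.
gemini gemma stop

# Use the CLI normally here, then bring local routing back:
gemini gemma start
gemini gemma status   # all checks should show green again
```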

## Setup flags

```bash
gemini gemma setup --port 8080    # custom port
gemini gemma setup --no-start     # don't start the server after install
gemini gemma setup --force       # re-download everything
gemini gemma setup --skip-model   # binary only, skip the ~1 GB model download
```

## How it works under the hood

- Local Gemma classifies each request as "simple" or "complex" (~100 ms)
- Simple → Flash, Complex → Pro
- If the local server is down, the CLI silently falls back to the cloud
  classifier — no errors, no disruption

## Disabling

Set `enabled: false` in settings to turn the feature off, or run
`gemini gemma stop` to stop just the server:

```json
{ "experimental": { "gemmaModelRouter": { "enabled": false } } }
```

## Advanced setup

If you are in an environment where the `gemini gemma setup` command cannot
automatically download binaries (for example, behind a strict corporate
firewall), you can perform the setup manually.

For more information, see the
[Manual Local Model Routing Setup guide](./local-model-routing.md).
diff --git a/docs/core/index.md b/docs/core/index.md
index 2724e8e922..ca10cc6e48 100644
--- a/docs/core/index.md
+++ b/docs/core/index.md
@@ -15,8 +15,9 @@ requests sent from `packages/cli`. For a general overview of Gemini CLI, see the
  modular GEMINI.md import feature using @file.md syntax.
- **[Policy Engine](../reference/policy-engine.md):** Use the Policy Engine for
  fine-grained control over tool execution.
-- **[Local Model Routing (experimental)](./local-model-routing.md):** Learn how
-  to enable use of a local Gemma model for model routing decisions.
+- **[Local Model Routing (experimental)](./gemma-setup.md):** Learn how to
+  enable the use of a local Gemma model for model routing decisions using the
+  automated setup command.

## Role of the core

diff --git a/docs/core/local-model-routing.md b/docs/core/local-model-routing.md
index 220ee13c46..3ab3709ed1 100644
--- a/docs/core/local-model-routing.md
+++ b/docs/core/local-model-routing.md
@@ -1,22 +1,29 @@
-# Local Model Routing (experimental)
+# Manual Local Model Routing Setup (experimental)

Gemini CLI supports using a local model for
[routing decisions](../cli/model-routing.md). When configured, Gemini CLI will
use a locally-running **Gemma** model to make routing decisions (instead of
sending routing decisions to a hosted model).
+
+> [!NOTE]
+> This is an experimental feature currently under active development.
+
+> [!IMPORTANT]
+> **Recommended:** We now provide a fully automated setup command. Use the
+> [`gemini gemma` Setup Guide](./gemma-setup.md) instead of following these
+> manual steps.

This feature can help reduce costs associated with hosted model usage while
offering similar routing decision latency and quality.

-> **Note: Local model routing is currently an experimental feature.**
-
-## Setup
+## Manual Setup

Using a Gemma model for routing decisions requires that an implementation of a
Gemma model be running locally on your machine, served behind an HTTP endpoint
-and accessed via the Gemini API.
-
-To serve the Gemma model, follow these steps:
+and accessed via the Gemini API. If you cannot use the `gemini gemma setup`
+command, follow these manual steps:

### Download the LiteRT-LM runtime