mirror of
https://github.com/google-gemini/gemini-cli.git
synced 2026-05-01 07:24:38 -07:00
docs(core): add automated gemma setup guide (#26233)
Co-authored-by: Samee Zahid <sameez@google.com>
@@ -34,11 +34,11 @@ Gemini CLI will use a locally-running **Gemma** model to make routing decisions
reduce costs associated with hosted model usage while offering similar routing
decision latency and quality.

In order to use this feature, the local Gemma model **must** be served behind a
Gemini API and accessible via HTTP at an endpoint configured in `settings.json`.
The easiest way to set this up is using the automated `gemini gemma setup`
command.
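
As a rough illustration, the resulting `settings.json` entry might look like the following. Only the `experimental.gemmaModelRouter.enabled` key appears elsewhere in these docs; the `endpoint` field name and the port are hypothetical placeholders, not confirmed configuration keys:

```json
{
  "experimental": {
    "gemmaModelRouter": {
      "enabled": true,
      "endpoint": "http://localhost:8080"
    }
  }
}
```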

For more details on how to configure local model routing, see
-[Local Model Routing](../core/local-model-routing.md).
+[`gemini gemma` — Local Model Routing Setup](../core/gemma-setup.md).

### Model selection precedence
@@ -0,0 +1,83 @@
# `gemini gemma` — Automated Local Model Routing Setup

Local model routing uses a local Gemma 3 1B model running on your machine to
classify and route user requests. It routes simple requests (like file reads) to
Gemini Flash and complex requests (like architecture discussions) to Gemini Pro.

<!-- prettier-ignore -->
> [!NOTE]
> This is an experimental feature currently under active development.

## What is this?

This feature saves cloud API costs by using local inference for task
classification instead of a cloud-based classifier. It adds a few milliseconds
of local latency but can significantly reduce the overall token usage for hosted
models.

## Quick start

```bash
# One command does everything: downloads the runtime, pulls the model,
# configures settings, and starts the server.
gemini gemma setup
```

You'll be prompted to accept the Gemma Terms of Use. The model is ~1 GB.

After setup, **just use the CLI normally** — routing happens automatically on
every request.

## Commands

| Command               | What it does                                                   |
| --------------------- | -------------------------------------------------------------- |
| `gemini gemma setup`  | Full install (binary + model + settings + server start)        |
| `gemini gemma status` | Health check — shows what's installed and running              |
| `gemini gemma start`  | Start the LiteRT server (auto-starts on CLI launch by default) |
| `gemini gemma stop`   | Stop the LiteRT server                                         |
| `gemini gemma logs`   | Tail the server logs to see routing requests live              |
| `/gemma`              | In-session status check (type it inside the CLI)               |

## Verifying it works

1. Run `gemini gemma status` — all checks should show green.
2. Open two terminals:
   - Terminal 1: `gemini gemma logs` (watch for incoming requests)
   - Terminal 2: use the CLI normally
3. You should see classification requests appear in the logs as you interact
   with the CLI.
4. The `/gemma` slash command inside a session shows a quick status panel.

## Setup flags

```bash
gemini gemma setup --port 8080   # custom port
gemini gemma setup --no-start    # don't start the server after install
gemini gemma setup --force       # re-download everything
gemini gemma setup --skip-model  # binary only; skip the 1 GB model download
```

## How it works under the hood

- Local Gemma classifies each request as "simple" or "complex" (~100 ms)
- Simple → Flash, Complex → Pro
- If the local server is down, the CLI silently falls back to the cloud
  classifier — no errors, no disruption
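
The fallback path above can be sketched in TypeScript. This is an illustrative toy, not the CLI's actual code: `classifyLocally`, `classifyInCloud`, `pickModel`, and the word-count heuristic are all hypothetical stand-ins.

```typescript
// Illustrative sketch only: these names are hypothetical stand-ins, not
// gemini-cli internals.

type Complexity = "simple" | "complex";

// Pretend call to the local Gemma server; throws when the server is "down".
async function classifyLocally(prompt: string): Promise<Complexity> {
  if (prompt.length === 0) throw new Error("local server unavailable");
  // Toy heuristic standing in for the local model's classification.
  return prompt.trim().split(/\s+/).length < 10 ? "simple" : "complex";
}

// Pretend call to the hosted (cloud) classifier used as the fallback.
async function classifyInCloud(_prompt: string): Promise<Complexity> {
  return "complex";
}

// Try local classification first; on any failure, silently fall back to the
// cloud classifier so the user never sees an error.
async function pickModel(prompt: string): Promise<"gemini-flash" | "gemini-pro"> {
  let complexity: Complexity;
  try {
    complexity = await classifyLocally(prompt);
  } catch {
    complexity = await classifyInCloud(prompt);
  }
  return complexity === "simple" ? "gemini-flash" : "gemini-pro";
}
```

The key design point is that the `catch` swallows the local failure entirely; routing quality degrades gracefully to the cloud classifier instead of surfacing an error.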

## Disabling

Set `enabled: false` in settings, or just run `gemini gemma stop` to turn off
the server:

```json
{ "experimental": { "gemmaModelRouter": { "enabled": false } } }
```

## Advanced setup

If you are in an environment where the `gemini gemma setup` command cannot
automatically download binaries (for example, behind a strict corporate
firewall), you can perform the setup manually.

For more information, see the
[Manual Local Model Routing Setup guide](./local-model-routing.md).

@@ -15,8 +15,9 @@ requests sent from `packages/cli`. For a general overview of Gemini CLI, see the
  modular GEMINI.md import feature using @file.md syntax.
- **[Policy Engine](../reference/policy-engine.md):** Use the Policy Engine for
  fine-grained control over tool execution.
-- **[Local Model Routing (experimental)](./local-model-routing.md):** Learn how
-  to enable use of a local Gemma model for model routing decisions.
+- **[Local Model Routing (experimental)](./gemma-setup.md):** Learn how to
+  enable use of a local Gemma model for model routing decisions using the
+  automated setup command.

## Role of the core
@@ -1,22 +1,29 @@
|
||||
# Local Model Routing (experimental)
|
||||
# Manual Local Model Routing Setup (experimental)
|
||||
|
||||
Gemini CLI supports using a local model for
|
||||
[routing decisions](../cli/model-routing.md). When configured, Gemini CLI will
|
||||
use a locally-running **Gemma** model to make routing decisions (instead of
|
||||
sending routing decisions to a hosted model).
|
||||
|
||||
<!-- prettier-ignore -->
|
||||
> [!NOTE]
|
||||
> This is an experimental feature currently under active development.
|
||||
|
||||
<!-- prettier-ignore -->
|
||||
> [!IMPORTANT]
|
||||
> **Recommended:** We now provide a fully automated setup command. We recommend
|
||||
> using the [`gemini gemma` Setup Guide](./gemma-setup.md) instead of following
|
||||
> these manual steps.
|
||||
|
||||
This feature can help reduce costs associated with hosted model usage while
|
||||
offering similar routing decision latency and quality.
|
||||
|
||||
> **Note: Local model routing is currently an experimental feature.**
|
||||
|
||||
## Setup
|
||||
## Manual Setup
|
||||
|
||||
Using a Gemma model for routing decisions requires that an implementation of a
|
||||
Gemma model be running locally on your machine, served behind an HTTP endpoint
|
||||
and accessed via the Gemini API.
|
||||
|
||||
To serve the Gemma model, follow these steps:
|
||||
and accessed via the Gemini API. If you cannot use the `gemini gemma setup`
|
||||
command, follow these manual steps:
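
Concretely, "accessed via the Gemini API" means the local server must accept Gemini-style `generateContent` calls, i.e. a POST to `/v1beta/models/<model>:generateContent`. For illustration, a minimal request body has this shape (the routing prompt text here is made up, not the CLI's actual classification prompt):

```json
{
  "contents": [
    {
      "role": "user",
      "parts": [{ "text": "Classify the following request as simple or complex: ..." }]
    }
  ]
}
```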

### Download the LiteRT-LM runtime