Gemini CLI: Quotas and pricing

Gemini CLI offers a generous free tier that covers many individual developers' use cases. For enterprise or professional usage, or if you need a higher quota, several options are available depending on how you authenticate.

For a high-level comparison of available subscriptions and to select the right quota for your needs, see the Plans page.

Overview

This article outlines the specific quotas and pricing applicable to Gemini CLI when using different authentication methods.

Generally, there are three categories to choose from:

  • Free Usage: Ideal for experimentation and light use.
  • Paid Tier (fixed price): For individual developers or enterprises who need more generous daily quotas and predictable costs.
  • Pay-As-You-Go: The most flexible option for professional use, long-running tasks, or when you need full control over your usage.

Free usage

Access to Gemini CLI begins with a generous free tier, perfect for experimentation and light use.

Your free usage is governed by the following limits, which depend on your authentication method.

Log in with Google (Gemini Code Assist for individuals)

This tier applies to users who authenticate with their Google account to access Gemini Code Assist for individuals. It includes:

  • 1000 model requests / user / day
  • 60 model requests / user / minute
  • Model requests will be made across the Gemini model family as determined by Gemini CLI.

Learn more at Gemini Code Assist for Individuals Limits.

Log in with Gemini API Key (unpaid)

If you are using a Gemini API key, you can also benefit from a free tier. This includes:

  • 250 model requests / user / day
  • 10 model requests / user / minute
  • Model requests to Flash model only.

Learn more at Gemini API Rate Limits.
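
To put the per-day and per-minute limits listed above in perspective, the following sketch computes how quickly each daily allowance would run out at the maximum per-minute rate, and what average rate would spread it over a workday. The figures are the ones quoted in this article. The `FreeTier` class, the function names, and the 8-hour workday are illustrative assumptions and are not part of Gemini CLI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreeTier:
    """Hypothetical container for the limits quoted in this article."""
    name: str
    requests_per_day: int
    requests_per_minute: int


# Limits as quoted above; check the linked limits pages for current values.
TIERS = [
    FreeTier("Log in with Google", requests_per_day=1000, requests_per_minute=60),
    FreeTier("Gemini API key (unpaid)", requests_per_day=250, requests_per_minute=10),
]


def minutes_until_daily_cap(tier: FreeTier) -> float:
    """Minutes of sustained maximum-rate usage before the daily cap is reached."""
    return tier.requests_per_day / tier.requests_per_minute


def sustainable_rate_per_hour(tier: FreeTier, working_hours: float = 8.0) -> float:
    """Average requests per hour that spread the daily cap over a workday."""
    return tier.requests_per_day / working_hours


if __name__ == "__main__":
    for tier in TIERS:
        print(
            f"{tier.name}: daily cap reached after "
            f"{minutes_until_daily_cap(tier):.1f} min at full rate; "
            f"about {sustainable_rate_per_hour(tier):.0f} requests/hour over 8 h"
        )
```

In other words, the per-minute limit is rarely the binding one: at full rate, the Google login tier would reach its daily cap in under 17 minutes and the unpaid API key tier in 25 minutes.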

Log in with Vertex AI (Express Mode)

Vertex AI offers an Express Mode without the need to enable billing. This includes:

  • 90 days before you need to enable billing.
  • Quotas and models are variable and specific to your account.

Learn more at Vertex AI Express Mode Limits.

Paid tier: Higher limits for a fixed cost

If you use up your initial number of requests, you can continue to benefit from Gemini CLI by upgrading to a paid subscription. See the Plans page for the available subscriptions.

Pay as you go

If you hit your daily request limits or exhaust your Gemini Pro quota even after upgrading, the most flexible solution is to switch to a pay-as-you-go model, where you pay for the specific amount of processing you use. This is the recommended path for uninterrupted access.

To do this, log in using a Gemini API key or Vertex AI.

Vertex AI (regular mode)

An enterprise-grade platform for building, deploying, and managing AI models, including Gemini. It offers enhanced security, data governance, and integration with other Google Cloud services.

  • Quota: Governed by a dynamic shared quota system or pre-purchased provisioned throughput.
  • Cost: Based on model and token usage.

Learn more at Vertex AI Dynamic Shared Quota and Vertex AI Pricing.

Gemini API key

Ideal for developers who want to quickly build applications with the Gemini models. This is the most direct way to use the models.

  • Quota: Varies by pricing tier.
  • Cost: Varies by pricing tier and model/token usage.

Learn more at Gemini API Rate Limits and Gemini API Pricing.

It's important to highlight that when using an API key, you pay per token and per call. This can be more expensive for many small calls with few tokens, but it's the only way to ensure your workflow isn't interrupted by reaching a quota limit.
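
As a rough illustration of token-based billing, the sketch below estimates a bill from token counts. It assumes input and output tokens are priced separately per million tokens. The prices used here are made-up placeholders, not actual Gemini rates, and `estimate_cost_usd` is a hypothetical helper. Consult the pricing pages linked above for real figures.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate a pay-as-you-go bill from token counts and per-million prices."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


if __name__ == "__main__":
    # Placeholder prices in USD per million tokens (NOT real Gemini pricing).
    INPUT_PRICE = 1.00
    OUTPUT_PRICE = 4.00

    # Hypothetical workload: 200 calls, each with 3,000 input and 500 output tokens.
    calls = 200
    cost = estimate_cost_usd(
        input_tokens=calls * 3_000,
        output_tokens=calls * 500,
        input_price_per_million=INPUT_PRICE,
        output_price_per_million=OUTPUT_PRICE,
    )
    print(f"Estimated cost for {calls} calls: ${cost:.2f}")
```

Swapping in the published prices for your model and tier turns this into a quick budgeting check before starting a long-running task.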

Gemini for Workspace plans

These plans currently apply only to Google's web-based Gemini products (for example, the Gemini web app or the Flow video editor). They do not apply to the API usage that powers Gemini CLI. Support for these plans is under active consideration.

Check usage and limits

You can check your current token usage and applicable limits using the /stats model command. This command provides a snapshot of your current session's token usage, as well as information about the limits associated with your current quota.

For more information on the /stats command and its subcommands, see the Command Reference.

A summary of model usage is also presented on exit at the end of a session.

Tips to avoid high costs

When using a pay-as-you-go plan, be mindful of your usage to avoid unexpected costs.

  • Be selective with suggestions: Before accepting a suggestion, especially for a computationally intensive task like refactoring a large codebase, consider if it's the most cost-effective approach.
  • Use precise prompts: You are paying per call, so think about the most efficient way to get your desired result. A well-crafted prompt can often get you the answer you need in a single call, rather than multiple back-and-forth interactions.
  • Monitor your usage: Use the /stats model command to track your token usage during a session. This can help you stay aware of your spending in real time.
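
The point about precise prompts can be made concrete with a small token-count sketch. It assumes that every turn of a conversation resends the prior prompts and replies as input, which is a simplification and may not match how Gemini CLI manages context (caching or compression could lower the real numbers). The function name and token counts are hypothetical.

```python
def total_input_tokens(prompt_tokens: list[int], reply_tokens: list[int]) -> int:
    """Sum input tokens over a conversation in which every turn resends history.

    Assumption: each request carries all earlier prompts and replies as input.
    """
    if len(prompt_tokens) != len(reply_tokens):
        raise ValueError("Each prompt needs a matching reply length")

    total = 0
    history = 0
    for prompt, reply in zip(prompt_tokens, reply_tokens):
        total += history + prompt  # this turn's input: prior history plus new prompt
        history += prompt + reply  # the reply becomes part of later inputs
    return total


if __name__ == "__main__":
    # One detailed prompt that gets the answer in a single call.
    single = total_input_tokens([400], [800])

    # Three shorter prompts that reach the same answer over several turns.
    multi = total_input_tokens([150, 150, 150], [800, 800, 800])

    print(f"Single call: {single} input tokens")
    print(f"Three turns: {multi} input tokens")
```

Under these assumptions, the three-turn exchange sends 3,300 input tokens versus 400 for the single detailed prompt, even though each individual prompt is shorter.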