How much does it cost to run ChatGPT in Google Sheets via the API?

Using gpt-4o-mini, OpenAI charges $0.15 per million input tokens and $0.60 per million output tokens as of June 2026. A classification pass over 1,000 rows with 50-word descriptions per row costs roughly $0.04. Full gpt-4o runs $2.50 per million input tokens. Most finance use cases (expense reclassification, data standardization, variance commentary) don't need gpt-4o's reasoning depth. gpt-4o-mini handles them at a fraction of the cost.
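The arithmetic behind that estimate can be sketched as a small function. This is a rough calculator, not a billing tool: the words-to-tokens ratio, the prompt overhead, and the output length are all assumptions, and `estimateRunCostUsd` is a hypothetical name. Only the per-million prices come from the figures above.

```javascript
// Rough cost estimate for a bulk classification run.
// ASSUMPTION: about 0.75 English words per token, a common rule of thumb.
const TOKENS_PER_WORD = 4 / 3;

function estimateRunCostUsd(opts) {
  // Input tokens: the row text plus the fixed instruction sent with every row.
  const inputTokensPerRow =
    (opts.wordsPerRow + opts.promptOverheadWords) * TOKENS_PER_WORD;
  const inputTokens = opts.rows * inputTokensPerRow;
  const outputTokens = opts.rows * opts.outputTokensPerRow;
  // Prices are quoted per million tokens.
  return (inputTokens * opts.inputPricePerMillion +
          outputTokens * opts.outputPricePerMillion) / 1e6;
}

const estimate = estimateRunCostUsd({
  rows: 1000,
  wordsPerRow: 50,
  promptOverheadWords: 40,    // ASSUMPTION: instruction plus system prompt
  outputTokensPerRow: 10,     // ASSUMPTION: a short label per row
  inputPricePerMillion: 0.15, // gpt-4o-mini input price quoted above
  outputPricePerMillion: 0.60 // gpt-4o-mini output price quoted above
});
```

With these assumptions the estimate comes to about $0.02, the same few-cents order of magnitude as the figure quoted above. Longer instructions or wordier outputs push it up, so it's worth plugging in your own prompt length.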

Why do my ASK_GPT outputs change between model refreshes?

Two causes. First, if temperature is above 0, the API is probabilistic by design. The OpenAI documentation explicitly states outputs can differ for identical inputs. Set temperature to 0 in your payload for any financial classification task. Second, custom functions recalculate when upstream cells change, triggering new API calls that can return slightly different phrasings even at temperature 0. The practical fix: once you're satisfied with a classification run, paste the column as values to lock the outputs. Treat GPT outputs like imported external data, not live formulas.
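If you'd rather script the paste-as-values step than do it by hand, a minimal sketch looks like this. `freezeAsValues` and `freezeSelection` are hypothetical names. The helper only assumes the range exposes `getValues()` and `setValues()`, as an Apps Script Range does.

```javascript
// Replace formulas in a range with their current values so later
// recalculations cannot change them.
function freezeAsValues(range) {
  const values = range.getValues(); // computed results, not the formulas
  range.setValues(values);          // writing values overwrites the formulas
  return values.length;             // number of rows frozen
}

// Hypothetical menu action: freeze whatever range is currently selected.
function freezeSelection() {
  return freezeAsValues(SpreadsheetApp.getActiveRange());
}
```

Run it only after every cell has finished calculating. Any cell still showing a loading state would be frozen in that state.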

Can ChatGPT in Google Sheets reference data across multiple tabs?

Not directly via a custom function. Each `=ASK_GPT()` call receives only what you pass as a string. You can concatenate cross-tab values manually, e.g., `"Revenue: " & 'P&L'!C14 & " | Budget: " & Assumptions!$B$3`, which works for a few reference points. It breaks down for genuine cross-tab reasoning where the model needs to understand relationships between ranges. A sidebar agent that reads the sheet directly handles multi-tab analysis without the string-concatenation workaround.

What's the rate limit I'll hit running ChatGPT formulas across a large dataset?

Standard OpenAI accounts start on Tier 1, capped at 60 requests per minute. If you're running a custom function across 300+ rows simultaneously, you'll hit quota errors and see `#ERROR!` values in your sheet. A dragged formula has no loop to slow down, so the fix is to move the bulk run into an Apps Script loop: add `Utilities.sleep(1000)` between calls and process 50-60 rows at a time. Tier 2 raises the limit to 5,000 requests per minute, which is sufficient for most bulk processing. You reach Tier 2 automatically after $50 of cumulative API spend.
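Fixed pauses reduce how often you hit the limit, but they don't help once a call has already failed. A common complement, sketched here as an addition rather than something the setup above requires, is to retry with an exponentially growing delay. `withBackoff` is a hypothetical helper. The `sleep` argument is injectable and falls back to `Utilities.sleep` inside Apps Script.

```javascript
// Retry fn() with exponential backoff: wait 1s, 2s, 4s, ... between attempts.
function withBackoff(fn, maxAttempts, sleep) {
  maxAttempts = maxAttempts || 5;
  sleep = sleep || function (ms) { Utilities.sleep(ms); };
  let delayMs = 1000;
  for (let attempt = 1; ; attempt++) {
    try {
      return fn();
    } catch (err) {
      // Give up and surface the last error once attempts are exhausted.
      if (attempt >= maxAttempts) throw err;
      sleep(delayMs);
      delayMs *= 2;
    }
  }
}
```

Inside a batch script you might wrap each call as `withBackoff(function () { return ASK_GPT(prompt, systemPrompt); })`. This sketch retries on any error, so a production version would probably retry only on rate-limit responses and fail fast on a bad API key.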

Is it safe to send financial model data through the OpenAI API?

OpenAI's API terms state that data submitted via the API is not used for model training by default, unlike the free ChatGPT web interface. That said, data is transmitted to OpenAI's servers. For anything covered by your company's data policy (customer PII, unreleased deal terms, M&A target data), check with your compliance team before sending it through. Most finance teams handle this by stripping identifying information before running GPT classification: anonymize company names in the prompt, keep only the numerical or categorical data you actually need classified.
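The anonymization step can be as simple as swapping known names for placeholders before the prompt is built. The sketch below assumes you already have a list of sensitive names. `redactNames` is a hypothetical helper that does literal, case-insensitive matching only, so it won't catch misspellings or abbreviations and doesn't replace a compliance review.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace each known name with a placeholder. Returns the redacted text plus
// the mapping needed to restore the names locally after classification.
function redactNames(text, names) {
  const mapping = {};
  let redacted = text;
  // Longest names first so "Acme Holdings" is replaced before "Acme".
  const sorted = names.slice().sort(function (a, b) { return b.length - a.length; });
  sorted.forEach(function (name, i) {
    const placeholder = 'ENTITY_' + (i + 1);
    const pattern = new RegExp(escapeRegExp(name), 'gi');
    const replaced = redacted.replace(pattern, placeholder);
    if (replaced !== redacted) {
      mapping[placeholder] = name;
      redacted = replaced;
    }
  });
  return { text: redacted, mapping: mapping };
}
```

The mapping stays in your script and is never sent to the API, so you can translate placeholders back after the results land in the sheet.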

ChatGPT in Google Sheets: 3 Methods for Finance (2026)

Three Ways to Wire ChatGPT into Google Sheets

Method 1: The IMPORTXML Hack (Don't)

There's a category of tutorials that routes ChatGPT requests through IMPORTXML pointed at some unofficial proxy. They mostly stopped working in 2024. Google's security model treats external service calls differently now, and any solution that doesn't go through the official OpenAI API is one terms-of-service update away from breaking silently.

Don't build anything real on these.

Method 2: Apps Script + OpenAI API (The Practical Option)

This is the approach that actually works. You write a small Google Apps Script function that calls the OpenAI API directly, then expose it as a custom function in your sheet. As of June 2026, gpt-4o-mini costs $0.15 per million input tokens and $0.60 per million output tokens. Running a classification pass on 1,000 rows of 50-word descriptions costs roughly $0.04.

Here's the core function. Drop it into Extensions → Apps Script:

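// Store your API key in Script Properties - never hardcode it
const API_KEY = PropertiesService.getScriptProperties().getProperty('OPENAI_API_KEY');

/**
 * Sends a prompt to the OpenAI chat completions endpoint and returns the reply.
 * @param {string} prompt The user message.
 * @param {string} [systemPrompt] Optional system message.
 * @return {string} The model's reply, trimmed.
 * @customfunction
 */
function ASK_GPT(prompt, systemPrompt) {
  if (!API_KEY) throw new Error('Set OPENAI_API_KEY in Script Properties first.');

  // systemPrompt is optional; defaults to generic assistant
  systemPrompt = systemPrompt || 'You are a helpful assistant. Be concise.';

  const payload = {
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: prompt }
    ],
    max_tokens: 150,
    temperature: 0  // 0 = most consistent outputs; critical for financial classification
  };

  const options = {
    method: 'post',
    contentType: 'application/json',
    headers: { Authorization: 'Bearer ' + API_KEY },
    payload: JSON.stringify(payload),
    muteHttpExceptions: true  // check the status ourselves so the cell shows the API's message
  };

  const response = UrlFetchApp.fetch('https://api.openai.com/v1/chat/completions', options);
  const json = JSON.parse(response.getContentText());
  if (response.getResponseCode() !== 200) {
    // Surface rate-limit or authentication errors instead of a generic fetch failure
    throw new Error('OpenAI API error ' + response.getResponseCode() + ': ' +
      (json.error ? json.error.message : 'unknown error'));
  }
  return json.choices[0].message.content.trim();
}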

With that wired up, you can call it like a native formula:

=ASK_GPT("Classify this expense as COGS or OpEx: " & B2, "Return only 'COGS' or 'OpEx'. No explanation.")

Or pull context from another tab:

=ASK_GPT("Segment: " & 'Revenue'!C2 & ". Product: " & 'Revenue'!D2 & ". Classify as enterprise or SMB.", "Return 'Enterprise' or 'SMB' only.")

One real constraint: Google Apps Script caps execution at 6 minutes per run (a custom function called from a cell has to return within 30 seconds), and the OpenAI API enforces 60 requests per minute on standard Tier 1 accounts. Running ASK_GPT across 500 rows simultaneously will hit quota errors. The fix is to run the classification from a script loop with Utilities.sleep(1000) between calls and write the results back to a column, rather than dragging the formula down every row.
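The loop-with-sleep approach can be sketched as follows. It processes one batch per run so each run stays well inside the execution cap, and it remembers where it stopped. `classifyBatch` and `runNextBatch` are hypothetical names, and the column layout (descriptions in B, results in C, header in row 1) and the `NEXT_INDEX` property are assumptions for illustration.

```javascript
// Classify up to batchSize rows starting at startIndex, pausing between calls.
// rows is a 2D array as returned by Range.getValues().
function classifyBatch(rows, classifyRow, options) {
  options = options || {};
  const startIndex = options.startIndex || 0;
  const batchSize = options.batchSize || 50;
  const pauseMs = options.pauseMs === undefined ? 1000 : options.pauseMs;
  const sleep = options.sleep || function (ms) { Utilities.sleep(ms); };

  const end = Math.min(startIndex + batchSize, rows.length);
  const results = [];
  for (let i = startIndex; i < end; i++) {
    if (i > startIndex) sleep(pauseMs);   // space out consecutive API calls
    results.push([classifyRow(rows[i])]); // one-column rows, ready for setValues
  }
  return { results: results, nextIndex: end, done: end >= rows.length };
}

// Hypothetical driver: classifies column B into column C, 50 rows per run,
// storing its position in Script Properties between runs.
function runNextBatch() {
  const sheet = SpreadsheetApp.getActiveSheet();
  const props = PropertiesService.getScriptProperties();
  const startIndex = Number(props.getProperty('NEXT_INDEX') || 0);
  const lastRow = sheet.getLastRow();
  if (lastRow < 2) return; // nothing below the header row
  const rows = sheet.getRange(2, 2, lastRow - 1, 1).getValues();
  const out = classifyBatch(rows, function (row) {
    return ASK_GPT('Classify this expense as COGS or OpEx: ' + row[0],
                   "Return only 'COGS' or 'OpEx'. No explanation.");
  }, { startIndex: startIndex });
  if (out.results.length > 0) {
    sheet.getRange(2 + startIndex, 3, out.results.length, 1).setValues(out.results);
  }
  props.setProperty('NEXT_INDEX', String(out.done ? 0 : out.nextIndex));
}
```

Because the results are written as plain values, this route also sidesteps recalculation: nothing re-fires when an upstream cell changes.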

Temperature matters here. The OpenAI API documentation states that "outputs may differ even for identical inputs" at temperature above 0, which is a problem in any financial workflow where a reclassification needs to stay consistent across quarterly refreshes. Set temperature to 0 for anything that feeds into a model.

Method 3: A Sidebar Agent

Apps Script functions are stateless. Each call is a fresh conversation: the model has no memory of what it classified in row 47 when it's working on row 48. For simple row-by-row tasks, that's fine. For anything requiring reasoning across your sheet structure (why this formula breaks, how to restructure a waterfall, how to bridge two tabs), stateless API calls don't cut it.

A sidebar agent stays resident while you work. It reads your active range, understands your sheet structure, and can chain multiple operations without you re-explaining context each time.

What ChatGPT in Google Sheets Actually Does Well for Finance

The honest answer: narrow, repeatable text tasks on structured data.

Expense reclassification. You have 800 vendor line items imported from your ERP. Half are miscategorized. A formula like =ASK_GPT("Reclassify to standard chart of accounts: " & A2 & " | Current category: " & B2, "Return the corrected account name only.") processes the whole column in a few minutes. It beats a VLOOKUP table when the inputs are free-text and messy.

Narrative generation for board packs. The numbers are done; you need two sentences on the $2.1M revenue miss in Q2. =ASK_GPT("Write a 2-sentence CFO commentary: Revenue was $18.4M vs $20.5M budget. Primary driver: enterprise deal slippage of " & C2 & " deals.") won't be perfect, but it's faster than staring at a blank cell.

Data standardization. Country names entered as "US", "United States", "USA", and "U.S." by different team members. =ASK_GPT("Standardize to ISO 3166-1 alpha-2: " & A2) normalizes the column without a 40-row SWITCH formula.

Tagging large transaction logs. Contribution margin analysis by SKU across 5,000 rows with inconsistent product descriptions. GPT handles the fuzzy matching faster than you can build the regex.

What these have in common: single-column operations on text that doesn't require cross-tab reasoning.

Where ChatGPT in Google Sheets Breaks (and Why It Matters for Models)

This is where the "just ask ChatGPT" enthusiasm runs into reality.

Non-determinism. Even at temperature 0, minor prompt variations produce different outputs. If you're using ASK_GPT to classify revenue as ARR vs. non-recurring, and the prompt changes slightly between quarterly refreshes, you'll get silent reclassifications that only surface when your ARR bridge doesn't tie. The OpenAI API is a probability engine made more deterministic, not fully deterministic.
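One way to keep reclassifications from staying silent is to diff each new run against the last frozen one before it feeds the model. A minimal sketch, assuming each row has a stable ID (an invoice number, say) and that both runs are available as plain ID-to-label objects. `findReclassifications` is a hypothetical helper.

```javascript
// List every row whose label changed between two classification runs.
// previous and current map a stable row ID to its label.
function findReclassifications(previous, current) {
  const changes = [];
  Object.keys(current).forEach(function (id) {
    // Rows that are new this period are not reclassifications.
    if (!Object.prototype.hasOwnProperty.call(previous, id)) return;
    if (previous[id] !== current[id]) {
      changes.push({ id: id, before: previous[id], after: current[id] });
    }
  });
  return changes;
}
```

An empty result means the new run agrees with last quarter on every row both runs share. Anything else is a short list to review by hand before the ARR bridge is rebuilt.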

No awareness of your model structure. A custom function call doesn't know that your SUMIFS in 'Returns Analysis'!E14 references a range you just restructured. It can't flag that your DCF terminal value assumes a 2.5% perpetuity growth rate that contradicts the macro assumptions on the Inputs tab. Reasoning about relationships between cells and tabs requires actually reading the sheet.

Recalculation instability. Custom functions recalculate when dependent cells change. Edit an upstream cell with ASK_GPT called 50 times downstream and you've triggered 50 API calls. At $2.50 per million input tokens for full gpt-4o, that adds up. More importantly, it adds latency to every model edit.
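A cache keyed on the prompt can soften this: identical prompts get the stored answer instead of a fresh API call. The sketch below assumes a cache object with `get(key)` and `put(key, value, ttlSeconds)`, which is the shape of the caches returned by Apps Script's CacheService. `cachedAsk` and `sha256Key` are hypothetical helpers, and the hash is injectable so the logic can run outside Apps Script.

```javascript
// Default key function for Apps Script: a SHA-256 digest keeps keys short
// no matter how long the prompt is.
function sha256Key(text) {
  const bytes = Utilities.computeDigest(Utilities.DigestAlgorithm.SHA_256, text);
  return Utilities.base64Encode(bytes);
}

// Return the cached answer for this prompt if there is one; otherwise call
// the model once and store the result.
function cachedAsk(prompt, systemPrompt, deps) {
  const hash = deps.hash || sha256Key;
  const key = 'gpt:' + hash(systemPrompt + '\n' + prompt);
  const hit = deps.cache.get(key);
  if (hit !== null && hit !== undefined) return hit;
  const answer = deps.ask(prompt, systemPrompt);
  // 21600 seconds (6 hours) is the longest expiry CacheService accepts.
  deps.cache.put(key, answer, deps.ttlSeconds || 21600);
  return answer;
}
```

In a sheet you might call it as `cachedAsk(prompt, systemPrompt, { ask: ASK_GPT, cache: CacheService.getScriptCache() })`. The cache is best-effort and expires, so it cuts repeat calls during a working session but doesn't replace freezing final outputs as values.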

Context window limits. You can't pass an entire 8-tab financial model into a cell formula. If your analysis needs to cross-reference 'P&L'!C:C, 'Balance Sheet'!E:E, and 'Cash Flow'!D:D simultaneously, a custom function can't do it without ugly string concatenation that strips the structure anyway.

For anything beyond column-level text processing, the custom function approach hits a ceiling fast.

When a Sidebar Agent Is the Right Tool

For tasks where you need the model to understand what's in the sheet (not just transform a cell value), a sidebar agent is a different category of tool.

ModelMonkey sits in the Google Sheets sidebar and reads your active context. You can ask it: "The EBITDA margin in 'P&L'!F22 doesn't match what rolls up from 'Segment'!H:H - find where the bridge breaks." It runs the actual reads, traces the references, and surfaces what's off. That's not something a =ASK_GPT() call can do.

The two workflows aren't mutually exclusive. Use Apps Script custom functions for bulk column operations where stateless processing is fine. Use a sidebar agent for model-level reasoning where context matters.