Data Integration

Google Sheets DuckDB Integration: What It Actually Does

Marc Sean · April 4, 2026 · 7 min read

The pitch is real SQL — joins, window functions, aggregations — running against live spreadsheet data. The reality is more nuanced, and worth understanding before you wire it up.

What DuckDB Actually Adds to Google Sheets

Google Sheets has QUERY(), which speaks a limited SQL dialect. It handles basic SELECT and WHERE, but stumbles on anything involving multiple source ranges, aggregations across joins, or window functions. DuckDB doesn't have those restrictions — it's a full OLAP engine that supports standard SQL, including CTEs, lateral joins, and QUALIFY for filtering window function results.

The architectural difference matters. QUERY() evaluates inside the Sheets formula engine, which means it's limited to what that engine exposes. DuckDB runs as a separate in-process database that reads your sheet data as tabular input. Your spreadsheet becomes a data source rather than the execution environment.

In practice, this means you can do things like:

  • Join two ranges from different sheets on a common key column
  • Compute rolling averages or running totals with OVER (PARTITION BY ... ORDER BY ...)
  • Filter aggregated results with HAVING instead of wrapping in a subquery
  • Run GROUP BY on multiple dimensions at once and pivot the output
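As a concrete example of that last point about QUALIFY: it filters on a window function's result without a wrapping subquery. A sketch, assuming the range data is available as a table named sales with region, product, and revenue columns (illustrative names, not a fixed schema):

```sql
-- Keep the top 3 products by revenue within each region,
-- filtering directly on the window function's output
SELECT region, product, revenue,
       ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC) AS rn
FROM sales
QUALIFY rn <= 3;
```

The same filter in Sheets' QUERY() would require materializing the ranking in helper columns first.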

A financial analyst building a contribution margin analysis across product lines and regions — the kind of thing that currently lives in a 12-tab workbook with VLOOKUP chains — can collapse that into a single SQL query against the raw data ranges.
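A sketch of what that collapse can look like, assuming the raw data has been loaded into DuckDB as two tables, sales (product_id, region, revenue, variable_cost) and products (product_id, product_line) — illustrative names, not a prescribed schema:

```sql
-- Contribution margin by product line and region in a single pass,
-- replacing a chain of VLOOKUPs and intermediate calculation tabs
SELECT
  p.product_line,
  s.region,
  SUM(s.revenue)                   AS revenue,
  SUM(s.revenue - s.variable_cost) AS contribution,
  ROUND(100.0 * SUM(s.revenue - s.variable_cost) / SUM(s.revenue), 1) AS margin_pct
FROM sales s
JOIN products p ON s.product_id = p.product_id
GROUP BY p.product_line, s.region
ORDER BY margin_pct DESC;
```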

How the Integration Works

There are a few ways to get DuckDB operating against Sheets data, covered in more detail in How to Connect DuckDB to Google Sheets. The short version: you're either exporting data to a format DuckDB can read (CSV, Parquet), or you're using an integration layer that handles the data bridging automatically.
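The export path is the simplest to sketch: dump the sheet to CSV, then point DuckDB at the file. The filename here is illustrative:

```sql
-- read_csv_auto infers column names and types from the exported file
SELECT region, SUM(amount) AS total_amount
FROM read_csv_auto('sheet_export.csv')
GROUP BY region
ORDER BY total_amount DESC;
```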

The more interesting approach is embedding DuckDB directly in the toolchain — so the query engine has live access to sheet ranges rather than stale exports. This is what ModelMonkey does internally: a sheet_range() table function registered as a DuckDB UDF lets you reference spreadsheet data in A1 notation directly inside SQL queries.

SELECT 
  category,
  SUM(revenue) as total_revenue,
  AVG(margin_pct) as avg_margin
FROM sheet_range('P&L!B2:E500')
GROUP BY category
ORDER BY total_revenue DESC

That query runs against live spreadsheet data. No export step, no stale CSV. The sheet_range() function fetches the range at query time and returns it as a typed tabular source DuckDB can operate on.

Column types get inferred from the data — numbers come through as DOUBLE, booleans as BOOLEAN, and anything mixed defaults to VARCHAR. As of April 2026, DuckDB's type inference handles most finance-oriented data well, though you'll occasionally need explicit casts when a column has header text mixed with numeric data.
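When inference lands on VARCHAR for a column that is mostly numeric, an explicit cast cleans it up. A sketch using DuckDB's TRY_CAST — the range reference and column name are illustrative:

```sql
-- TRY_CAST returns NULL instead of erroring on non-numeric values,
-- so a stray header or note in the column won't break the query
SELECT SUM(TRY_CAST(amount AS DOUBLE)) AS total
FROM sheet_range('Data!A1:B500')
WHERE TRY_CAST(amount AS DOUBLE) IS NOT NULL;
```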

The Limits That Actually Matter

Google Sheets caps at 10 million cells. DuckDB can handle datasets orders of magnitude larger — but inside a Sheets integration, you're bounded by what Sheets itself can hold. According to Google's Sheets documentation, that 10M cell limit is hard. So the scenario where you'd need to "promote" the workload to a server-side database (MotherDuck, for instance) just doesn't arise if your data lives entirely in Sheets.

What does matter is output size. Returning a 50,000-row result set inline to a spreadsheet is slow and usually pointless — you wanted the aggregate, not the raw rows. A sensible integration caps inline results at around 200 rows and routes larger outputs to a destination sheet. ModelMonkey enforces this: queries return 200 rows by default, and larger outputs require specifying output:"sheet" to write results directly to a named range. This keeps the LLM context clean while still making full result sets accessible.
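In practice that means writing the query to return the aggregate you actually want rather than the rows behind it. A sketch with illustrative range and column names:

```sql
-- One row per region instead of 50,000 raw order rows
SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
FROM sheet_range('Orders!A2:F50000')
GROUP BY region
ORDER BY revenue DESC
LIMIT 200;  -- stays under the inline result cap
```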

Query timeouts are the other practical limit. Complex joins against large ranges can take several seconds. The DuckDB instance runs in-memory, which is fast, but the bottleneck is usually fetching the range data from the Sheets API — that round trip adds latency proportional to range size.

Where the Native Embedding Wins

The export-and-query approach (dump to CSV, run DuckDB locally, import results back) works, but it has a friction problem: the data is stale the moment you export it. If you're building analysis that needs to refresh as the spreadsheet updates, you're back to manual steps.

An embedded integration where DuckDB reads ranges at query time solves this. The query always reflects the current state of the sheet. This matters most for operational dashboards — inventory trackers, sales pipelines, appointment schedules — where the underlying data changes daily and the analysis needs to stay current without manual intervention.

The other advantage is provenance. When DuckDB runs as part of an AI agent workflow (as it does in ModelMonkey), the system can track which ranges were accessed for each query. That lets the agent explain its reasoning: "I queried ranges P&L!B2:E500 and Products!A2:C200 and joined on product_id." Compare that to a formula in a cell — you know the output, but the derivation is buried in nested functions.

What a Real Workflow Looks Like

Consider an operations analyst with a surgery scheduling dashboard: one sheet has procedure codes and duration estimates, another has actual start/end times, a third has surgeon assignments. Calculating on-time performance by surgeon and procedure type is genuinely hard with formulas. It involves multiple VLOOKUP chains, intermediate calculation columns, and pivot tables that need manual refreshing.

With DuckDB:

SELECT 
  s.surgeon_name,
  p.procedure_type,
  COUNT(*) as total_cases,
  ROUND(
    100.0 * SUM(CASE WHEN a.start_time <= s.scheduled_start THEN 1 ELSE 0 END) / COUNT(*), 
    1
  ) as on_time_pct
FROM sheet_range('Actuals!A2:D500') a
JOIN sheet_range('Schedule!A2:C500') s ON a.case_id = s.case_id
JOIN sheet_range('Procedures!A2:B100') p ON s.procedure_code = p.code
GROUP BY s.surgeon_name, p.procedure_type
HAVING COUNT(*) >= 5
ORDER BY on_time_pct ASC

That's one query. No intermediate columns, no VLOOKUP chains, no manually refreshed pivot. The analyst asks for it, it runs against the current sheet data, and the results land in a summary sheet.

The formula equivalent would take an afternoon to build and another afternoon to debug when someone adds a surgeon with a name that has a comma in it.

DuckDB in Google Sheets adds real SQL capability — joins, window functions, aggregations — against live spreadsheet data. Google's 10M cell limit means you're not running petabyte queries, but you're also not exporting and reimporting CSVs. The embedded approach (where DuckDB reads ranges directly via a UDF) beats the export approach because data stays current. The practical constraints are inline result size (cap at ~200 rows and write larger outputs to a destination sheet) and fetch latency for large ranges.

ModelMonkey ships this as a built-in capability — the duckdb.query tool is available in the agent sidebar and works against any ranges in your spreadsheet. You describe the analysis you need, the agent writes and runs the SQL, and the results appear in your sheet.

Try ModelMonkey free for 14 days — it works in both Google Sheets and Excel.


Frequently Asked Questions