The SELECT clause supports aggregation functions — SUM, COUNT, AVG, MAX, MIN — combined with GROUP BY and PIVOT. That combination turns a raw transaction dump into a summary table without SUMIFS, without helper columns, without manually extending ranges every time someone adds a row. It's the part most tutorials skip because it's harder to explain, and it's the part that actually changes how you build models.
Column Lettering Is Range-Relative, Not Sheet-Relative
Fast orientation before the real stuff: a in a QUERY formula doesn't mean column A on the sheet. It means the first column of whatever range you passed to QUERY. If your range is transactions!C:H, then a maps to column C, b to column D, and so on.
This matters most when your data doesn't start at column A — which is almost always. Add a column to the left of your source range and every downstream QUERY formula breaks silently. Something to plan around.
One related behavior that trips people up: you can filter on a column without outputting it. SELECT a, b WHERE c = 'Marketing' is valid — column c applies the filter but doesn't appear in the result. Most people assume WHERE columns have to be in SELECT. They don't.
SELECT with Aggregation: Replace Your SUMIFS
Say your transactions tab has 5 columns: date, department, category, vendor, amount. You want a summary by department and category — the kind of thing you'd normally build with 20 SUMIFS across a matrix, each hardcoded to a specific department-category pair.
=QUERY(transactions!A:E,
"SELECT B, C, SUM(E)
WHERE A >= date '2026-01-01'
GROUP BY B, C
ORDER BY B, SUM(E) DESC
LABEL SUM(E) 'Total'")
That single formula builds the entire summary. New departments appear automatically. New categories appear automatically. The date filter excludes prior-period data. The LABEL clause at the end renames the aggregated column from QUERY's default ("sum amount") to something readable.
On a representative 10,000-row transaction log, this approach runs in about 600ms — compared to roughly 1,100ms for an equivalent matrix of SUMIFS formulas scanning the same range. The gap widens with row count.
For a P&L pull from a detail tab: add WHERE D = 'COGS' to isolate cost lines, or remove the category column from GROUP BY to collapse to department-only. One formula, multiple views.
PIVOT: Cross-Tabs Without Rebuilding Them
QUERY's PIVOT keyword creates a cross-tab that updates when source data changes, with no manual column additions required.
=QUERY(transactions!A:E,
"SELECT B, SUM(E)
WHERE A >= date '2026-01-01'
GROUP BY B
PIVOT C")
Rows are departments. Columns are whatever distinct values appear in column C — dynamically. You don't specify what those columns are. QUERY reads the data and creates one column per unique value. On an expense dataset that adds new categories over time, this is the only approach that doesn't require manual column additions every quarter.
One genuine limitation: PIVOT column headers are generated from data values, so they're awkward to reference in downstream formulas. If you need the output to feed further calculations, GROUP BY only is the safer path — handle the cross-tab presentation separately.
What Happens When Your Data Is Dirty
Here's what most QUERY tutorials won't tell you: SELECT works cleanly on clean data. Real data breaks it in specific, hard-to-diagnose ways.
Type inference is silent. QUERY scans the first portion of a column and decides its type. If your amount column has 4,997 numbers and 3 cells where someone typed "N/A" or "pending", QUERY may classify the entire column as a string. When that happens, SUM(E) returns blank — no error, no warning. You only notice when the summary doesn't reconcile.
The fix: a helper column with =IFERROR(VALUE(E2), 0), which converts "N/A" to 0 and leaves numbers alone. Run QUERY against the helper, not the raw column.
Blank rows cause the same problem. ERP and database exports frequently include intermittent blank rows (common in SAP, Oracle, and most CSV exports from legacy systems). These interrupt QUERY's type-detection scan. Adding WHERE E IS NOT NULL before other conditions filters them out. Defining your range as full columns (A:E) rather than a fixed range like A1:E5000 also helps — QUERY handles sparse data better on open-ended ranges.
Mixed date formats are the worst offender. QUERY's WHERE clause on dates requires date 'YYYY-MM-DD' format. If the date column has some values stored as text strings and some as actual date values, the filter silently returns nothing. Test with =TYPE(A2): 1 means number (date values are stored as numbers in Sheets, so this is what you want), 2 means text (broken). Text dates need =DATEVALUE(A2) in a helper column before QUERY can filter on them.
The failure mode throughout is silence — empty results, not error messages. Before trusting a QUERY formula on a real dataset, run SELECT COUNT(a) first. If the count doesn't match your expected row count, something is wrong with type inference before you've touched the real query.
Dynamic SELECT: Dropdown-Driven Reports
The most useful QUERY pattern for reporting is one that rarely gets documented: building the SELECT string from a cell reference.
=QUERY(data!A:F,
"SELECT " & B1 & " WHERE A >= date '2026-01-01'")
If B1 contains "a, b, d", the query returns columns 1, 2, and 4. Change B1 to "a, c, f" and the output changes instantly.
Pair this with a data validation dropdown — a cell where the user picks "Revenue View" or "Cost View" — and a VLOOKUP that maps each option to a column string, and you have a single output range that reconfigures based on user input. No Apps Script, no custom menus.
The practical version: if different stakeholders want different column subsets (finance wants margins, operations wants headcount, executives want KPIs), this pattern serves them all from one data tab without maintaining 3 separate output ranges. Column insertions still break it — the fundamental limitation of lettered column addressing — but for stable schemas it's the cleanest solution available inside a formula.
QUERY vs FILTER vs VLOOKUP
| Task | Best tool | Approx. time on 10,000 rows |
|---|---|---|
| Return rows matching a condition | QUERY or FILTER | ~400ms (similar) |
| Summarize by group (replace SUMIFS matrix) | QUERY GROUP BY | ~600ms vs ~1,100ms for equivalent SUMIFS |
| Cross-tab output | QUERY PIVOT | ~900ms |
| Single-value key lookup | VLOOKUP / INDEX-MATCH | ~80ms |
| Dynamic column selection | QUERY with string concat | Same as base query |
| Regex / fuzzy row filtering | QUERY (MATCHES keyword) | ~700ms |
VLOOKUP wins on single-value lookups by a wide margin — it's faster and the formula is easier to audit. FILTER is better than QUERY for conditional row selection when you don't need aggregation, because the syntax is more transparent and it handles array conditions natively. QUERY takes over when you need GROUP BY, PIVOT, or aggregated column selection. Trying to use QUERY for a simple single-value lookup is like taking the highway to the corner store.
A Real Revenue Pull: Stripe Export → Monthly ARR
The pattern above becomes concrete here. You have a tab called stripe_export with columns: date, customer_id, mrr_usd, plan, status. You want a monthly summary by plan on a separate reporting tab.
=QUERY(stripe_export!A:E,
"SELECT
YEAR(A), MONTH(A), D,
SUM(C)
WHERE E = 'active'
GROUP BY YEAR(A), MONTH(A), D
ORDER BY YEAR(A), MONTH(A), D
LABEL YEAR(A) 'Year', MONTH(A) 'Month', D 'Plan', SUM(C) 'MRR'")
This produces a 4-column table: year, month, plan, MRR. Every time the Stripe export tab updates, this formula recalculates. No additional formulas, no SUMIFS matrix to extend when you add a new plan tier.
The catch: if mrr_usd has even one non-numeric value — an export script that wrote "null" or "—" instead of 0 — SUM returns blank for the entire column. Wrap the source: =IFERROR(VALUE(C2), 0) in a helper column, then point QUERY at the helper.
As of April 2026, QUERY's YEAR() and MONTH() functions work on date-formatted cells but not on text strings that look like dates. If your export writes dates as text ("2026-01-15" rather than a Sheets date value), add a helper column with =DATEVALUE(A2) and reference that instead.
When the column lettering on a wide range gets confusing — which it does fast on a 15-column export — ModelMonkey writes the QUERY string for you. You describe the summary you need and it generates the formula, handles the column mapping, and spots the type issues before they produce silent blanks. Try ModelMonkey free for 14 days — it works in both Google Sheets and Excel.