Gartner puts the average organizational cost of poor data quality at $12.9 million per year. Finance and ops analysts spend 60–80% of their time on data preparation rather than analysis, according to research published in the Harvard Business Review. Most of that prep work is manual, repetitive, and solvable with the right formula stack.
What Google Sheets Cleans Natively (and What It Doesn't)
Google added a Data Cleanup menu in 2022, accessible via Data → Data cleanup. It handles whitespace trimming and duplicate removal with one click — genuinely useful for quick wins. But it doesn't fix numbers stored as text, normalize capitalization, strip non-printing characters, or handle mixed data types. Those require formulas.
| Problem | Native Menu Fix | Formula Required |
|---|---|---|
| Leading/trailing spaces | Yes (Trim whitespace) | TRIM() |
| Duplicate rows | Yes (Remove duplicates) | UNIQUE() |
| Numbers as text | No | VALUE(), --, paste-as-values |
| Inconsistent caps | No | PROPER(), UPPER(), LOWER() |
| Non-printing characters | No | CLEAN() |
| Mixed date formats | No | DATEVALUE() + manual parsing |
As of April 2026, Google's official Sheets documentation confirms the Data Cleanup menu was introduced in 2022 alongside the MAP function — but neither addresses type coercion or encoding issues automatically.
Problem 1: Whitespace
The most common data cleaning issue, and the easiest to miss. A cell containing " Revenue " won't match "Revenue" in a VLOOKUP, SUMIF, or pivot table filter. Gartner's research on data quality consistently ranks whitespace and encoding errors as the top cause of lookup failures in spreadsheet models.
TRIM() removes leading, trailing, and duplicate internal spaces:
=TRIM(A2)
For a whole column, wrap it in ARRAYFORMULA:
=ARRAYFORMULA(TRIM(A2:A500))
Note: TRIM handles standard spaces (character 32). It won't catch non-breaking spaces (character 160), which paste in from web pages. For those, you need SUBSTITUTE combined with TRIM:
=TRIM(SUBSTITUTE(A2, CHAR(160), " "))
The article Google Sheets Trim Whitespace: Menu vs Formula covers the trade-offs between the menu option and formula approach in more detail.
Problem 2: Numbers Stored as Text
This one breaks financial models silently. A column of revenue figures imported from a CRM or exported from a database often arrives formatted as text — the cells are left-aligned, the SUM returns 0, and there's no error to alert you. In multi-sheet P&L models where a "Revenue" tab feeds into "FCFF" and "Returns Analysis," one column of text-formatted numbers cascades into wrong valuations.
The fastest fix is coercing with double-negative:
=--A2
Or use VALUE() explicitly:
=VALUE(A2)
To test whether a cell is secretly text, use:
=ISNUMBER(A2)
If that returns FALSE on what looks like a number, you have a type problem. For currency strings like "$1,250.00", strip the symbol and comma before converting:
=VALUE(SUBSTITUTE(SUBSTITUTE(A2,"$",""),",",""))
Problem 3: Duplicate Rows
UNIQUE() returns a deduplicated array — clean and non-destructive since it writes to a new range rather than modifying the source:
=UNIQUE(A2:D500)
The Data → Data cleanup → Remove duplicates menu modifies in place and prompts you to select which columns define uniqueness. For a model where row identity is defined by a composite key (say, client ID + transaction date), the menu option is easier to configure than formula logic.
For conditional deduplication — keeping the most recent record per client, for instance — you need a SORT + UNIQUE combination:
=UNIQUE(SORT(A2:D500, 3, FALSE))
This sorts by column 3 descending (newest first), then deduplicates, effectively keeping the latest record per unique row.
Problem 4: Inconsistent Capitalization
"new york", "New York", "NEW YORK" — three values that mean the same thing but won't group correctly in a pivot table. Standardize with PROPER(), UPPER(), or LOWER() depending on the target format:
=PROPER(A2) → "New York"
=UPPER(A2) → "NEW YORK"
=LOWER(A2) → "new york"
PROPER() capitalizes the first letter of each word. Watch out: it will also capitalize words after hyphens and apostrophes, so "o'brien" becomes "O'Brien" correctly but "mcdonald's" becomes "Mcdonald'S" incorrectly. For names with unusual capitalization rules, you're better off building a lookup table of exceptions.
Problem 5: Special Characters and Non-Printing Characters
CLEAN() strips non-printing characters (ASCII codes 0–31). This matters most for data exported from legacy systems, PDF-to-spreadsheet conversions, or anything that passes through an API where encoding gets mangled.
=CLEAN(A2)
Combine with TRIM for a single-pass clean:
=TRIM(CLEAN(A2))
Google's Sheets API documentation notes that imported data often retains line feed characters (CHAR(10)) that CLEAN removes — this is the formula equivalent of what PDF-to-spreadsheet tools should handle upstream but often don't.
Chaining Formulas for Multi-Problem Data
Real-world imported data rarely has just one problem. A contacts list from a CRM might have leading spaces, mixed caps, and non-printing characters simultaneously. Chain the functions:
=PROPER(TRIM(CLEAN(A2)))
For numbers with currency symbols and whitespace:
=VALUE(TRIM(SUBSTITUTE(SUBSTITUTE(CLEAN(A2),"$",""),",","")))
This handles the full stack: strip non-printing characters, remove symbols, remove commas, strip whitespace, then coerce to number. Ugly, but accurate.
For cleaning an entire column of 500+ rows, Google's MAP function (released in 2022 alongside LAMBDA) lets you apply any formula across a range without ARRAYFORMULA limitations:
=MAP(A2:A500, LAMBDA(cell, PROPER(TRIM(CLEAN(cell)))))
This is cleaner than ARRAYFORMULA for complex lambdas because each cell processes independently without the row-matching constraints that cause ARRAYFORMULA to break with multi-argument functions.
When Formulas Aren't Enough
Some cleaning problems don't fit neatly into formula logic: phone numbers in 12 different formats, addresses that need geocoding before normalization, or 50,000-row transaction logs where you need to classify descriptions into spending categories.
For those scenarios, ModelMonkey can handle the classification and transformation work conversationally — describe the cleaning logic in plain language ("normalize all phone numbers to E.164 format" or "flag rows where the date is outside Q1 2025"), and it writes the formulas or runs the transformation directly in your sheet.