Integrations5 min read

SQL Queries with DuckDB

Query your spreadsheet data and external databases using SQL

Overview

ModelMonkey includes a built-in SQL engine powered by DuckDB, allowing you to run SQL queries directly on your spreadsheet data. You can also query external databases (PostgreSQL, BigQuery) and even join data across multiple sources.

This is useful when you need precise control over data transformations, want to perform complex aggregations, or need to combine data from different sources in ways that natural language might not express clearly.

Tip

You don't need to write SQL yourself. Just describe what you want in plain English, and ModelMonkey will generate and execute the appropriate SQL query for you.

When SQL Queries Are Used

ModelMonkey automatically uses SQL queries when your request involves:

Complex Aggregations
Grouping data, calculating sums, averages, counts, or other statistical operations across many rows.

Filtering and Sorting
Finding specific records based on multiple conditions, or sorting large datasets by various criteria.

Joining Data
Combining data from your spreadsheet with external databases, or merging multiple sheets together.

Data Transformations
Reformatting, calculating new columns, or restructuring data in ways that require row-by-row processing.

You can also explicitly ask ModelMonkey to "use SQL" or "run a SQL query" if you prefer the precision of SQL for a particular operation.

Querying Spreadsheet Data

Your spreadsheet data is accessible through the sheet_range() function. This function takes an A1 notation reference and returns the data as a queryable table.

Important Requirements

The first row of your range must contain column headers
Column names are derived from the header row
Empty headers result in columns named like "column_1", "column_2", etc.

Supported Range Formats

Full sheet: sheet_range('Sheet1!A:Z')
Specific range: sheet_range('Sales!A1:D100')
Named ranges with special characters are automatically escaped

text

-- Query all data from a sheet
SELECT * FROM sheet_range('Sheet1!A1:D100')

-- Filter and aggregate
SELECT category, SUM(amount) as total
FROM sheet_range('Sales!A1:E500')
WHERE date >= '2024-01-01'
GROUP BY category

-- Sort by multiple columns
SELECT * FROM sheet_range('Customers!A:F')
ORDER BY region ASC, revenue DESC

Example spreadsheet queries

Querying External Databases

If you have external data connections configured, you can query them using their connection name:

PostgreSQL
Reference tables as connection_name.schema.table. The default schema is usually "public".

BigQuery
Reference tables as connection_name.dataset.table.

External database queries work exactly like standard SQL. You can use all the features supported by DuckDB, including joins, subqueries, window functions, and CTEs.

Tip

Set up your external connections first in the External Data Connections panel before querying them.

text

-- Query a PostgreSQL table
SELECT * FROM my_postgres.public.customers
WHERE country = 'USA'

-- Query a BigQuery table
SELECT event_date, COUNT(*) as events
FROM my_bigquery.analytics.page_views
GROUP BY event_date

-- Join PostgreSQL with BigQuery
SELECT p.name, b.total_events
FROM my_postgres.public.products p
JOIN my_bigquery.analytics.product_stats b
  ON p.id = b.product_id

Example external database queries

Joining Spreadsheet and Database Data

One of the most powerful features is the ability to join data across different sources. You can combine spreadsheet data with external databases in a single query.

Common Use Cases

Enrich spreadsheet records with customer data from your database
Compare local tracking data against your data warehouse
Validate spreadsheet entries against external reference tables
Combine sales data from sheets with product catalog from database

text

-- Join spreadsheet orders with database customers
SELECT
  s.order_id,
  s.order_date,
  s.amount,
  c.customer_name,
  c.email,
  c.tier
FROM sheet_range('Orders!A1:D500') s
JOIN my_postgres.public.customers c
  ON s.customer_id = c.id
WHERE s.amount > 1000

-- Cross-database aggregation
SELECT
  c.country,
  COUNT(*) as order_count,
  SUM(s.amount) as total_sales
FROM sheet_range('Sales!A1:C1000') s
JOIN my_postgres.public.customers c
  ON s.customer_id = c.id
GROUP BY c.country
ORDER BY total_sales DESC

Example cross-source joins

Query Output Options

Query results can be delivered in two ways:

Inline Results (Default)
Results are returned directly to ModelMonkey for analysis and reasoning. This is useful for exploratory queries where you want to understand the data before deciding what to do with it.

Returns up to 20 rows by default (configurable up to 100)
Includes column types and metadata
Ideal for quick data exploration

Sheet Output
Results are written directly to a new sheet in your spreadsheet. This is useful when you want to keep the query results for further work.

Creates a new sheet with your specified name
Includes all rows (no row limit)
Header row added automatically
Perfect for creating reports or staging data

Tip

Ask ModelMonkey to "save the results to a new sheet" or "write the output to a sheet called Results" to use sheet output mode.

Data Type Handling

ModelMonkey automatically infers column types from your spreadsheet data:

Supported Types

Numbers: Integers and decimals are detected and treated as numeric types
Booleans: TRUE/FALSE values are recognized as booleans
Text: Everything else is treated as text (VARCHAR)
Dates: Date values in cells are typically handled as text; use SQL date functions to parse them

Type Inference Process
ModelMonkey samples the first several rows of data to determine the appropriate type for each column. If a column contains mixed types, it defaults to text to avoid data loss.

text

-- Parse date strings in queries
SELECT
  order_id,
  strptime(order_date, '%Y-%m-%d') as parsed_date,
  amount
FROM sheet_range('Orders!A1:C100')
WHERE strptime(order_date, '%Y-%m-%d') >= DATE '2024-01-01'

-- Cast numeric strings
SELECT
  product_name,
  CAST(price AS DOUBLE) as numeric_price
FROM sheet_range('Products!A1:B50')

Working with data types

Performance Considerations

Query Execution

Queries have a 30-second timeout
Maximum result set is 16,384 rows for safety
Large spreadsheet ranges are loaded into memory for querying

Best Practices

Specify precise ranges instead of entire columns (e.g., A1:D500 instead of A:D)
Use WHERE clauses to filter early and reduce data processed
For very large datasets, consider using sheet output to avoid memory issues
Index large external database tables for faster joins

Caching
Query results are not cached between requests. Each query re-reads the current spreadsheet data, ensuring you always get the latest values.

Warning

Very large ranges (tens of thousands of rows) may slow down query execution. Consider filtering data or using external databases for large-scale analytics.

Supported SQL Features

ModelMonkey supports a wide range of SQL features through DuckDB:

Standard Operations

SELECT, FROM, WHERE, ORDER BY, LIMIT
GROUP BY with aggregate functions (SUM, COUNT, AVG, MIN, MAX, etc.)
DISTINCT and HAVING clauses
UNION, INTERSECT, EXCEPT

Joins

INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN
Cross joins and self joins
Multiple table joins

Advanced Features

Subqueries and CTEs (WITH clauses)
Window functions (ROW_NUMBER, RANK, LAG, LEAD, etc.)
CASE expressions
String functions, date functions, mathematical functions

Not Supported

INSERT, UPDATE, DELETE (queries are read-only)
CREATE TABLE, ALTER TABLE (no schema modifications)
Stored procedures or user-defined functions

Example Queries

Here are some practical examples of SQL queries you might use:

text

-- Monthly sales summary
SELECT
  strftime(strptime(date, '%Y-%m-%d'), '%Y-%m') as month,
  COUNT(*) as num_orders,
  SUM(amount) as total_revenue,
  AVG(amount) as avg_order_value
FROM sheet_range('Orders!A1:D1000')
GROUP BY month
ORDER BY month DESC

-- Find duplicate entries
SELECT email, COUNT(*) as count
FROM sheet_range('Contacts!A1:E500')
GROUP BY email
HAVING COUNT(*) > 1

-- Top customers by spending
SELECT
  customer_id,
  SUM(amount) as total_spent,
  COUNT(*) as num_orders
FROM sheet_range('Orders!A1:D1000')
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10

-- Running total with window function
SELECT
  date,
  amount,
  SUM(amount) OVER (ORDER BY date) as running_total
FROM sheet_range('Transactions!A1:C100')

Practical SQL query examples

Troubleshooting

"Column not found" errors

Check that your header row contains the expected column names
Column names are case-sensitive
Spaces in column names work but may need quoting in complex queries

"Invalid range" errors

Verify the sheet name is spelled correctly
Ensure the range exists and contains data
Check for special characters in sheet names that need escaping

"Query timeout" errors

Simplify your query or reduce the data range
Add WHERE clauses to filter data earlier
Consider using sheet output for large result sets

"Connection not found" errors

Verify the external connection name is correct
Check that the connection is active in the External Data Connections panel
Ensure you have permission to access the external database

Empty or unexpected results

Check data types match your WHERE conditions
Verify date formats match what's in your spreadsheet
Look for leading/trailing spaces in text data

SQL Queries with DuckDB

Overview

When SQL Queries Are Used

Querying Spreadsheet Data

Querying External Databases

Joining Spreadsheet and Database Data

Query Output Options

Data Type Handling

Performance Considerations

Supported SQL Features

Example Queries

Troubleshooting

Related Documentation

External Data Connections

Spreadsheet Editing