Integrations6 min read

External Data Connections

Connect to PostgreSQL and BigQuery to query external data alongside your spreadsheets

Overview

External Data Connections let you query databases directly from ModelMonkey. Instead of manually exporting data or writing complex import scripts, you can ask ModelMonkey to pull data from your PostgreSQL or BigQuery databases and combine it with your spreadsheet data.

Once connected, you can ask questions like "Compare our Google Sheets sales data with customer demographics from PostgreSQL" or "Find all orders from last month and join them with user profiles from our database."

Tip

Your credentials are encrypted using enterprise-grade envelope encryption with Google Cloud KMS. ModelMonkey never stores plain-text passwords.

Warning

External data connections are currently available in Google Sheets only. Excel support is coming soon.

Supported Data Sources

ModelMonkey currently supports two external data sources:

PostgreSQL
Connect to any PostgreSQL database accessible over the internet. This includes self-hosted PostgreSQL, Amazon RDS, Google Cloud SQL, Azure Database for PostgreSQL, and other managed PostgreSQL services.

Google BigQuery
Connect to Google BigQuery datasets using a service account. This is ideal for large-scale analytics workloads and data warehouse queries.

Both connection types support read-only queries to protect your production data from accidental modifications.

Creating a Connection

To create a new external data connection:

  1. Open the External Data Connections panel from the ModelMonkey sidebar
  2. Click the "New Connection" button
  3. Select your data source type (PostgreSQL or BigQuery)
  4. Enter your connection details
  5. Test the connection to verify it works
  6. Save the connection

The setup wizard guides you through each step and validates your configuration before saving.

PostgreSQL Connection Setup

To connect to a PostgreSQL database, you need the following information:

Required Fields

  • Connection Name: A friendly name to identify this connection (e.g., "Production Analytics DB")
  • Host: The database server hostname or IP address
  • Port: The PostgreSQL port (default: 5432)
  • Database: The name of the database to connect to
  • Username: Your database username
  • Password: Your database password

Optional Fields

  • Schema: The default schema to use (default: "public")
  • SSL Mode: How to handle SSL connections
    • Disable: No SSL (not recommended for production)
    • Allow: Use SSL if available
    • Prefer: Try SSL first, fall back to non-SSL (default)
    • Require: Always use SSL

Warning

For security, use a read-only database user with access limited to the tables you need to query.

text
Connection Name: Production Analytics
Host: analytics-db.example.com
Port: 5432
Database: analytics
Schema: public
SSL Mode: Require
Username: readonly_user

Example PostgreSQL configuration

BigQuery Connection Setup

To connect to Google BigQuery, you need a service account with appropriate permissions:

Required Fields

  • Connection Name: A friendly name to identify this connection
  • Project ID: Your Google Cloud project ID (found in the Google Cloud Console)
  • Service Account JSON: A JSON key file for a service account with BigQuery access

Optional Fields

  • Dataset ID: A default dataset to query from
  • Location: The data location (US, EU, or a specific region)

Required IAM Roles
Your service account needs these permissions:

  • BigQuery Data Viewer: Read table metadata and data
  • BigQuery Job User: Create and execute queries
  • BigQuery Read Session User: Read data via the Storage API

Tip

Create a dedicated service account for ModelMonkey with only the permissions listed above. Never use your personal credentials or overly-permissive service accounts.

Creating a BigQuery Service Account

To create a service account for BigQuery access:

  1. Go to the Google Cloud Console (console.cloud.google.com)
  2. Navigate to IAM & Admin → Service Accounts
  3. Click "Create Service Account"
  4. Enter a name like "modelmonkey-readonly"
  5. Grant the following roles:
    • BigQuery Data Viewer
    • BigQuery Job User
    • BigQuery Read Session User
  6. Click "Done" to create the account
  7. Click on the new service account, then go to the "Keys" tab
  8. Click "Add Key" → "Create new key" → "JSON"
  9. Download the JSON file and upload it to ModelMonkey

Keep this JSON file secure. It provides access to your BigQuery data.

Warning

Never share your service account JSON file or commit it to version control. If compromised, delete the key immediately from Google Cloud Console.

Testing Connections

Before saving a connection, ModelMonkey validates that it can reach your database:

What the Test Checks

  • Network connectivity to the host
  • Authentication with your credentials
  • Permission to query the database

Test Results
A successful test shows:

  • Server version information
  • Available schemas (PostgreSQL)
  • Number of accessible tables

If the test fails, you'll see a specific error message explaining what went wrong. Common issues include:

  • Incorrect hostname or port
  • Wrong username or password
  • Firewall blocking the connection
  • Missing database permissions

Security & Credential Management

ModelMonkey uses enterprise-grade security to protect your database credentials:

Envelope Encryption
Your credentials are encrypted using a unique Data Encryption Key (DEK) generated for each connection. The DEK itself is encrypted by Google Cloud KMS before storage. This two-layer approach means your credentials are never stored in plain text.

SSRF Protection
ModelMonkey blocks connections to private IP ranges, localhost, and cloud metadata endpoints. This prevents potential security vulnerabilities even if malicious input is provided.

Secure Memory Handling
Decrypted credentials are held in secure memory buffers and automatically cleared when no longer needed.

Read-Only Access
BigQuery connections are explicitly configured as read-only. For PostgreSQL, we recommend using database users with SELECT-only permissions.

Using Your Connections

Once a connection is active, you can query it by simply asking ModelMonkey in natural language:

Example Requests

  • "Show me the top 10 customers by revenue from our PostgreSQL database"
  • "Compare our Google Sheets sales data with customer demographics from the external database"
  • "Find all orders from last month and join them with user profiles"
  • "Analyze trends in our BigQuery analytics data"

ModelMonkey automatically:

  1. Connects to your external database securely
  2. Translates your request into SQL
  3. Executes the query
  4. Returns results that can be written to your spreadsheet

You can combine data from multiple sources in a single request, letting you join spreadsheet data with external database records.

Tip

Reference your connections by name in your requests. For example, "from the Production Analytics connection" helps ModelMonkey know which database to query.

Managing Connections

Viewing Connections
All your connections appear in the External Data Connections panel. Each connection shows:

  • Connection name
  • Provider type (PostgreSQL or BigQuery)
  • Host or project information
  • Active/Inactive status
  • Creation date

Connection Status

  • Active: The connection is available for queries
  • Inactive: The connection exists but is disabled

Deleting Connections
To remove a connection:

  1. Click the menu icon (three dots) on the connection card
  2. Select "Delete"
  3. Confirm the deletion

Deleting a connection permanently removes it and its encrypted credentials. This action cannot be undone.

Troubleshooting

Connection Test Fails

  • Verify your hostname is correct and publicly accessible
  • Check that your database port is open in your firewall
  • Confirm your username and password are correct
  • For BigQuery, ensure your service account has the required roles

Queries Return Errors

  • Check that your database user has SELECT permission on the tables you're querying
  • Verify the table and schema names are correct
  • For BigQuery, ensure your service account can access the specified dataset

Connection Times Out

  • Your database may be behind a firewall that blocks ModelMonkey's servers
  • Check if your cloud provider requires IP allowlisting
  • Consider using a database proxy or VPN if direct access isn't possible

"SSRF Attempt Detected" Error

  • You cannot connect to private IP addresses or localhost
  • Use a publicly accessible hostname for your database
  • Cloud-hosted databases (RDS, Cloud SQL, etc.) typically have public endpoints available

Best Practices

Use Dedicated Read-Only Users
Create a database user specifically for ModelMonkey with only SELECT permissions. This limits the blast radius if credentials are ever compromised.

Enable SSL/TLS
Always use SSL mode "Require" for PostgreSQL connections in production. BigQuery connections are always encrypted.

Limit Table Access
Grant access only to the tables and schemas ModelMonkey needs to query. Avoid using superuser or admin accounts.

Use Descriptive Connection Names
Name your connections clearly (e.g., "Production Analytics", "Staging Orders DB") so you can easily reference them in requests.

Review Connections Regularly
Periodically audit your connections and remove any that are no longer needed. Fewer active connections means a smaller security surface.

Related Documentation