Method 1: GA4 BigQuery Export → PostgreSQL
Google's native BigQuery export is the cleanest data source GA4 offers. It logs every event at the hit level: no sampling, no aggregation, no API quotas. Google's documentation describes the export as unsampled, event-level data, and it carries detail the reporting API never exposes, such as user_pseudo_id and the raw event_params on every event.
The free tier covers 10GB of active storage and the first 1TB of queries per month. For most mid-size businesses, that's enough to run GA4 exports indefinitely without a BigQuery bill.
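A quick back-of-envelope check on that claim (the 150MB daily table size is a hypothetical figure for a mid-size property, not a benchmark):

```python
DAILY_TABLE_MB = 150   # hypothetical daily export size for a mid-size site
DAYS_PER_MONTH = 30

# If the nightly sync scans each daily table once in full:
monthly_scan_gb = DAILY_TABLE_MB * DAYS_PER_MONTH / 1024
print(f"{monthly_scan_gb:.1f} GB scanned per month")  # ~4.4 GB, far under the 1TB free query tier
```

Even at ten times that table size, the monthly scan stays two orders of magnitude below the free-tier query quota.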
Step 1: Enable the BigQuery link in GA4.
Admin → BigQuery Links → Link. Choose a GCP project, pick your export frequency (streaming or daily), and confirm. Daily is usually fine unless you need intraday event data.
Step 2: Pull from BigQuery into PostgreSQL using Python.
from google.cloud import bigquery
import psycopg2
import os

BQ_PROJECT = "your-gcp-project"
BQ_DATASET = "analytics_123456789"  # "analytics_" + your GA4 property ID
PG_CONN = os.environ["PG_CONN_STRING"]

def sync_ga4_events(date_str: str):
    """Load one daily export table (events_YYYYMMDD) into PostgreSQL."""
    bq = bigquery.Client(project=BQ_PROJECT)
    pg = psycopg2.connect(PG_CONN)
    # event_params is a repeated field, so scalar parameters are extracted
    # with UNNEST subqueries. In the export schema the session id lives
    # under the 'ga_session_id' key.
    query = f"""
        SELECT
            event_date,
            event_name,
            user_pseudo_id,
            geo.country AS country,
            traffic_source.source AS source,
            traffic_source.medium AS medium,
            (SELECT value.int_value FROM UNNEST(event_params)
             WHERE key = 'ga_session_id') AS session_id,
            (SELECT value.string_value FROM UNNEST(event_params)
             WHERE key = 'page_location') AS page_location
        FROM `{BQ_PROJECT}.{BQ_DATASET}.events_{date_str}`
        WHERE event_name IN ('page_view', 'session_start', 'purchase')
    """
    rows = list(bq.query(query).result())
    with pg.cursor() as cur:
        # Note: DO NOTHING only deduplicates if the table has a unique
        # constraint across these columns; with only the BIGSERIAL key,
        # re-running the same day inserts duplicates.
        cur.executemany(
            """
            INSERT INTO ga4_events_20260418
                (event_date, event_name, user_pseudo_id, country,
                 source, medium, session_id, page_location)
            VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
            ON CONFLICT DO NOTHING
            """,
            [(r.event_date, r.event_name, r.user_pseudo_id, r.country,
              r.source, r.medium, r.session_id, r.page_location)
             for r in rows]
        )
    pg.commit()
    pg.close()
Step 3: Create the target table in PostgreSQL.
CREATE TABLE ga4_events_20260418 (
    id BIGSERIAL PRIMARY KEY,
    event_date DATE NOT NULL,
    event_name TEXT NOT NULL,
    user_pseudo_id TEXT,
    country TEXT,
    source TEXT,
    medium TEXT,
    session_id BIGINT,
    page_location TEXT,
    inserted_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_ga4_events_date ON ga4_events_20260418 (event_date);
CREATE INDEX idx_ga4_events_source ON ga4_events_20260418 (source, medium);
Once loaded, a query like this gives you channel-level revenue per user by joining your CRM data (note the date filter belongs in WHERE, since a predicate on the left table inside a LEFT JOIN's ON clause doesn't restrict rows):

SELECT
    g.source,
    g.medium,
    COUNT(DISTINCT g.user_pseudo_id) AS unique_users,
    SUM(o.order_value) AS revenue,
    SUM(o.order_value)
        / NULLIF(COUNT(DISTINCT g.user_pseudo_id), 0) AS revenue_per_user
FROM ga4_events_20260418 g
LEFT JOIN crm_orders o
    ON g.user_pseudo_id = o.ga_client_id
WHERE g.event_name = 'session_start'
  AND g.event_date BETWEEN '2026-01-01' AND '2026-03-31'
GROUP BY 1, 2
ORDER BY revenue DESC;
Method 2: GA4 Data API → PostgreSQL
The Data API is the right tool when you don't have BigQuery set up, or when you only need aggregated metrics (sessions, conversions, revenue by channel) rather than raw hit-level events.
According to Google's GA4 Data API quota documentation (as of April 2026), the free tier allows 200,000 requests per project per day. For a daily sync job that pulls 30-day rolling data, you'll use fewer than 50 requests.
The tradeoff: aggregated data is sampled above certain thresholds, and you lose access to raw event parameters. For board-pack-level reporting ($4.2M ARR, 38.5% gross margin, CAC by channel) that's usually fine. For funnel analysis or cohort work, reach for BigQuery instead.
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    RunReportRequest, DateRange, Dimension, Metric
)
import psycopg2
import os

GA4_PROPERTY_ID = "properties/123456789"
PG_CONN = os.environ["PG_CONN_STRING"]

def sync_channel_report(start_date: str, end_date: str):
    client = BetaAnalyticsDataClient()
    request = RunReportRequest(
        property=GA4_PROPERTY_ID,
        date_ranges=[DateRange(start_date=start_date, end_date=end_date)],
        dimensions=[
            # API name is sessionDefaultChannelGroup (the UI label is
            # "Session default channel grouping")
            Dimension(name="sessionDefaultChannelGroup"),
            Dimension(name="date"),
        ],
        metrics=[
            Metric(name="sessions"),
            Metric(name="totalRevenue"),
            Metric(name="conversions"),  # deprecated in favor of "keyEvents"
        ],
    )
    response = client.run_report(request)

    pg = psycopg2.connect(PG_CONN)
    with pg.cursor() as cur:
        for row in response.rows:
            channel = row.dimension_values[0].value
            date = row.dimension_values[1].value  # returned as YYYYMMDD
            sessions = int(row.metric_values[0].value)
            revenue = float(row.metric_values[1].value)
            conversions = int(row.metric_values[2].value)
            cur.execute(
                """
                INSERT INTO ga4_channel_daily
                    (report_date, channel, sessions, revenue, conversions)
                VALUES (%s, %s, %s, %s, %s)
                ON CONFLICT (report_date, channel)
                DO UPDATE SET
                    sessions = EXCLUDED.sessions,
                    revenue = EXCLUDED.revenue,
                    conversions = EXCLUDED.conversions,
                    updated_at = NOW()
                """,
                (date, channel, sessions, revenue, conversions),
            )
    pg.commit()
    pg.close()
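The upsert above needs a ga4_channel_daily table whose primary key covers (report_date, channel), or ON CONFLICT has no arbiter to match. A minimal sketch of that table, with column types that are my assumption rather than anything GA4 prescribes:

```python
# Hypothetical one-time setup for the sync_channel_report target table.
GA4_CHANNEL_DAILY_DDL = """
CREATE TABLE IF NOT EXISTS ga4_channel_daily (
    report_date DATE NOT NULL,
    channel     TEXT NOT NULL,
    sessions    BIGINT,
    revenue     NUMERIC(14, 2),
    conversions BIGINT,
    updated_at  TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (report_date, channel)
)
"""

def create_channel_table(conn_string: str) -> None:
    import psycopg2  # deferred so the DDL string is importable without the driver
    with psycopg2.connect(conn_string) as pg, pg.cursor() as cur:
        cur.execute(GA4_CHANNEL_DAILY_DDL)
```

The composite primary key doubles as the ON CONFLICT arbiter, so re-running the sync for an overlapping window updates rows in place instead of duplicating them.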
A 90-day rolling report syncs in roughly 8-12 seconds, comfortably fast enough for a nightly cron job. Run it at 02:00 UTC, after GA4 has finished processing the previous day's data.
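A crontab entry for that schedule might look like this (the script path and log location are placeholders, and the server's timezone is assumed to be UTC):

```shell
# m h dom mon dow  command  --  nightly GA4 channel sync at 02:00 UTC
0 2 * * * /usr/bin/python3 /opt/etl/ga4_channel_sync.py >> /var/log/ga4_sync.log 2>&1
```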
Method 3: Managed ETL Connector (Airbyte / Fivetran)
If you'd rather not babysit a Python script, both Airbyte and Fivetran maintain GA4 → PostgreSQL connectors that handle incremental syncs, schema evolution, and API version changes.
Airbyte's open-source version is free to self-host; their cloud tier starts at $0.50 per 1,000 monthly active rows (MAR), which for a mid-size e-commerce site typically runs $8-40/month. As of April 2026, Airbyte's GA4 connector supports both the Admin API and the Data API source, with configurable sync frequency down to 1 hour.
Fivetran's GA4 connector lands in the same pricing ballpark but adds automatic type-casting and dbt-compatible schema normalization out of the box, which is worth it if your downstream models are already in dbt.
The setup is essentially: authorize GA4, point at your PostgreSQL host, pick your sync window. No code unless you want custom transformations.
Comparison: Three Methods Side by Side
| Method | Cost | Setup Time | Data Granularity | Maintenance |
|---|---|---|---|---|
| BigQuery Export | Free (within GCP free tier) | 2-4 hours | Hit-level, unsampled | Low: script runs on cron |
| GA4 Data API | Free (200K req/day) | 1-2 hours | Aggregated, sampled | Low: one Python file |
| Managed ETL (Airbyte/Fivetran) | $8-40/month (cloud) or self-hosted | 30-60 min | Aggregated or hit-level (connector-dependent) | Near-zero: vendor manages |
The BigQuery route wins on data quality. The managed ETL wins on time-to-value. The Data API wins if you need aggregated channel metrics and want to avoid third-party dependencies.
Querying the Data for FP&A Work
Once GA4 events are in PostgreSQL, you can join them against your CRM, billing system, or ad spend tables without any API rate limits or sampling concerns.
A CAC analysis across channels:
WITH ga4_sessions AS (
    SELECT
        source,
        medium,
        DATE_TRUNC('month', event_date) AS month,
        COUNT(DISTINCT user_pseudo_id) AS unique_users
    FROM ga4_events_20260418
    WHERE event_name = 'session_start'
      AND event_date >= '2026-01-01'
    GROUP BY 1, 2, 3
),
ad_spend AS (
    SELECT
        source,
        medium,
        DATE_TRUNC('month', spend_date) AS month,
        SUM(spend_usd) AS total_spend
    FROM marketing_spend
    GROUP BY 1, 2, 3
)
SELECT
    g.source,
    g.medium,
    g.month,
    g.unique_users,
    a.total_spend,
    a.total_spend / NULLIF(g.unique_users, 0) AS cac
FROM ga4_sessions g
LEFT JOIN ad_spend a
    USING (source, medium, month)
ORDER BY cac ASC NULLS LAST;
For a company doing $4.2M ARR at 38.5% gross margin, a 6.6x gap between paid social and organic CAC ($312 vs. $47, with referral lower still at $28) is exactly the kind of insight that doesn't survive aggregation in GA4's native reports. The raw PostgreSQL join surfaces it in seconds.
If you're pulling that data into Google Sheets for your board pack, ModelMonkey connects Google Sheets directly to PostgreSQL โ so the query above runs live in a cell, no export required.