Engineering

Building Multi-Tenant Analytics on Cloudflare's Edge

Analytics Engine, Durable Objects, R2, D1, and Cloudflare for SaaS: the infrastructure behind ClickStream.

March 2026 • 20 min read

Why Cloudflare's Edge

ClickStream processes behavioral data in under 3ms. That's not a rounding error -- it's a hard requirement. Behavioral scores need to be available in the same page load to enable real-time personalization and intervention. Round-tripping to an origin server in us-east-1 adds 50-200ms of latency. Edge computing eliminates that entirely.

Cloudflare's developer platform provides the building blocks: Workers for compute, Analytics Engine for time-series storage, Durable Objects for stateful coordination, KV for configuration, R2 for object storage, and D1 for relational data. Each has tradeoffs. This post covers how we use each one and the gotchas we've encountered.

Analytics Engine: The Write Path

Analytics Engine (AE) is Cloudflare's purpose-built time-series analytics store. It's designed for high-volume write ingestion with SQL-based querying. ClickStream uses AE as the primary write path for all behavioral events.

The Schema Constraint: 20 Fields

Analytics Engine has a hard schema limit: 20 blob fields (strings, up to 256 bytes each) and 20 double fields (64-bit floats). That's it. No arrays, no nested objects, no variable-length schemas.

For an analytics platform that tracks 26 behavioral scores plus identity signals, UTM parameters, device info, and page metadata, 20 fields is not enough -- unless you get creative.

Pipe-Delimited Encoding

ClickStream packs multiple values into single blob fields using pipe-delimited encoding:

// Analytics Engine schema: ClickStream event
{
    // Blob fields (20 max, 256 bytes each)
    blob1: siteId,                                    // "site_abc123"
    blob2: visitorId,                                 // "v_lq8x2f3k_abc123def"
    blob3: sessionId,                                 // "s_m1n2o3p4"
    blob4: eventType,                                 // "page_view"
    blob5: pageUrl,                                   // "/pricing"
    blob6: referrer,                                  // "https://google.com/search?q=..."
    blob7: identitySignals,                           // "cookie:v_abc|hem:b4c9|gclid:EAI..."
    blob8: utmParams,                                 // "google|cpc|brand|identity|ad1"
    blob9: deviceInfo,                                // "desktop|Chrome|macOS|1920x1080"
    blob10: geoInfo,                                  // "US|CA|San Francisco|37.77|-122.42"

    // Double fields (20 max, 64-bit float)
    double1: intentScore,                             // 73.0
    double2: engagementScore,                         // 85.0
    double3: frustrationScore,                        // 12.0
    double4: purchaseTimingScore,                     // 61.0
    double5: churnRiskScore,                          // 8.0
    double6: contentAffinityScore,                    // 44.0
    double7: velocityScore,                           // 78.0
    double8: loyaltyScore,                            // 55.0
    double9: conversionProbScore,                     // 67.0
    double10: sessionQualityScore,                    // 72.0
    double11: attentionScore,                         // 80.0
    double12: navEfficiencyScore,                     // 63.0
    double13: priceSensitivityScore,                  // 34.0
    double14: socialProofRespScore,                   // 47.0
    double15: botProbabilityScore,                    // 2.0
    double16: channelAttributionScore,                // 88.0
    double17: scrollDepthPercent,                     // 75.5
    double18: timeOnPageMs,                           // 45230.0
    double19: clickCount,                             // 8.0
    double20: timestamp                               // 1741612800000.0
}
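
The write itself goes through the Analytics Engine binding's writeDataPoint() call, which takes parallel blobs and doubles arrays plus an index. Here's a minimal sketch of the packing step; the binding name, event fields, and joinPipe() helper are illustrative, not ClickStream's production code:

// Sketch: pack multi-value fields and write one AE data point.
// joinPipe() is a hypothetical helper; writeDataPoint() is the
// Analytics Engine binding API (blobs/doubles/indexes arrays).
const joinPipe = (...parts) => parts.map(p => String(p ?? '')).join('|');

function writeEvent(env, e) {
    env.CLICKSTREAM_EVENTS.writeDataPoint({   // binding name is an assumption
        indexes: [e.siteId],                   // index key (one per data point)
        blobs: [
            e.siteId,
            e.visitorId,
            e.sessionId,
            e.eventType,
            e.pageUrl,
            e.referrer,
            joinPipe(`cookie:${e.visitorId}`, `hem:${e.hem}`, `gclid:${e.gclid}`),
            joinPipe(e.utmSource, e.utmMedium, e.utmCampaign, e.utmTerm, e.utmContent),
            joinPipe(e.deviceType, e.browser, e.os, e.resolution),
            joinPipe(e.country, e.region, e.city, e.lat, e.lon)
        ],
        doubles: [
            e.intentScore, e.engagementScore, e.frustrationScore,
            // ...remaining scores follow the schema order above...
            e.scrollDepthPercent, e.timeOnPageMs, e.clickCount,
            Date.now() // double20: timestamp
        ]
    });
}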

SQL Gotchas

Querying pipe-delimited fields in AE's SQL requires string functions. A few gotchas we've hit:

-- Extracting UTM source from pipe-delimited blob8
-- Format: "source|medium|campaign|term|content"
SELECT
    blob1 AS site_id,
    -- Extract first segment (utm_source)
    SUBSTRING(blob8, 1, POSITION('|' IN blob8) - 1) AS utm_source,
    COUNT(*) AS events,
    AVG(double1) AS avg_intent_score
FROM clickstream_events
WHERE
    blob1 = 'site_abc123'
    AND timestamp >= NOW() - INTERVAL '7' DAY
GROUP BY blob1, utm_source
ORDER BY events DESC

-- GOTCHA: Analytics Engine SQL is NOT full SQL.
-- No JOINs. No subqueries. No CTEs.
-- Aggregations only: COUNT, SUM, AVG, MIN, MAX, QUANTILE
-- String functions are limited: SUBSTRING, POSITION, LENGTH

Analytics Engine is a write-optimized store with read-side limitations. It handles millions of writes per second with zero configuration. But if you need JOINs or complex queries, you need to export to a real analytical database.
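
One more read-side wrinkle: the Worker binding for AE is write-only, so queries go over Cloudflare's SQL HTTP endpoint. The query() calls in this post can be read as a thin wrapper over that endpoint; a minimal sketch, assuming ACCOUNT_ID and an analytics-read API token are stored as Worker secrets:

// Sketch: query Analytics Engine via its SQL-over-HTTP endpoint.
// ACCOUNT_ID and AE_API_TOKEN are assumed Worker secrets/vars.
async function queryAE(env, sql) {
    const resp = await fetch(
        `https://api.cloudflare.com/client/v4/accounts/${env.ACCOUNT_ID}/analytics_engine/sql`,
        {
            method: 'POST',
            headers: { 'Authorization': `Bearer ${env.AE_API_TOKEN}` },
            body: sql // the SQL query is sent as the raw request body
        }
    );
    if (!resp.ok) throw new Error(`AE query failed: ${resp.status}`);
    return resp.json(); // JSON result with meta/data-style fields
}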

Multi-Tenant Isolation

ClickStream is a multi-tenant platform. Each customer's data must be isolated. In Analytics Engine, tenant isolation is achieved through the blob1 field (site ID) as a mandatory filter on every query.

// Worker: enforce tenant isolation on every AE query
async function queryAnalytics(env, siteId, query) {
    // CRITICAL: Always inject the site_id filter server-side --
    // this prevents cross-tenant data leakage.
    // Assumes the incoming query has no WHERE clause of its own;
    // sanitize() (defined elsewhere) escapes quotes in the site ID.
    const safeQuery = query.replace(
        'FROM clickstream_events',
        `FROM clickstream_events WHERE blob1 = '${sanitize(siteId)}'`
    );

    return await env.ANALYTICS_ENGINE.query(safeQuery);
}

At the API layer, every request is authenticated with a site-specific API key. The API extracts the site ID from the key and injects it into every query. There's no way for a customer to query another customer's data because the site ID filter is enforced server-side, not client-side.
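
A sketch of that extraction step, assuming keys are stored hashed in the api_keys table described in the D1 section below (the env.DB binding name is ours):

// Sketch: resolve a site ID from a presented API key.
// Only a SHA-256 hash of the key is stored, so we hash and look up.
async function resolveSiteId(request, env) {
    const key = request.headers.get('Authorization')?.replace('Bearer ', '');
    if (!key) return null;

    // Hash the presented key with Web Crypto
    const digest = await crypto.subtle.digest(
        'SHA-256', new TextEncoder().encode(key)
    );
    const keyHash = [...new Uint8Array(digest)]
        .map(b => b.toString(16).padStart(2, '0')).join('');

    // Look up the hash in D1; null means an invalid key
    const row = await env.DB
        .prepare('SELECT site_id FROM api_keys WHERE key_hash = ?')
        .bind(keyHash)
        .first();

    return row ? row.site_id : null;
}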

Durable Objects: Real-Time Streaming

Durable Objects (DOs) are Cloudflare's solution for stateful edge compute. Each DO instance is a single-threaded JavaScript actor with persistent storage, created near the first caller that requests it.

ClickStream uses Durable Objects for two things: real-time streaming dashboards and session aggregation.

Real-Time Dashboard Streaming

// Durable Object: Real-time event stream per site
export class SiteEventStream {
    constructor(state, env) {
        this.state = state;
    }

    async fetch(request) {
        if (request.headers.get('Upgrade') === 'websocket') {
            const pair = new WebSocketPair();
            const [client, server] = Object.values(pair);

            // Hibernation-aware accept: the runtime tracks this socket
            // even while the DO is evicted from memory
            this.state.acceptWebSocket(server);
            // Attach per-connection metadata that survives hibernation
            // (an in-memory Map would be lost when the DO sleeps)
            server.serializeAttachment({ connectedAt: Date.now() });

            return new Response(null, { status: 101, webSocket: client });
        }

        // Incoming event from the tracking worker
        if (request.method === 'POST') {
            const event = await request.json();

            // Broadcast to all connected dashboard clients;
            // getWebSockets() includes sockets restored after hibernation
            for (const ws of this.state.getWebSockets()) {
                try {
                    ws.send(JSON.stringify(event));
                } catch (e) {
                    // Client disconnected, will be cleaned up
                }
            }

            return new Response('OK');
        }

        return new Response('Method not allowed', { status: 405 });
    }

    // Hibernation API close handler: no in-memory cleanup needed,
    // since the runtime tracks hibernatable sockets for us
    async webSocketClose(ws, code, reason, wasClean) {}
}

The Hibernation API

Durable Objects are billed for active wall-clock time. Without hibernation, a DO serving real-time dashboards would be billed 24/7 even when no events are flowing. The Hibernation API lets the DO sleep between events, waking only when a WebSocket message arrives or a new connection is made.

This reduces DO costs from ~$30/month per always-on instance to pennies per actual usage. For a multi-tenant platform with thousands of sites, hibernation is the difference between viable and prohibitively expensive.

KV: Configuration Caching

Workers KV stores site configuration: which behavioral models are enabled, consent settings, SDK version, custom event definitions, and API keys.

// KV structure for site config
// Key: "config:site_abc123"
// Value:
{
    "site_id": "site_abc123",
    "domain": "example.com",
    "cname": "t.example.com",
    "models_enabled": [
        "intent", "engagement", "frustration", "purchase_timing",
        "churn_risk", "content_affinity", "velocity", "loyalty",
        "conversion_prob", "session_quality", "attention",
        "nav_efficiency", "price_sensitivity", "social_proof",
        "bot_probability", "channel_attribution"
    ],
    "consent_required": true,
    "cookie_max_age": 31536000,
    "sdk_version": "1.4.2",
    "parquet_export_enabled": true,
    "r2_namespace": "site-abc123"
}
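
Reading the config on the hot path is a single KV get. A minimal sketch, with the binding name assumed; the cacheTtl option keeps the value in the local PoP cache so repeated reads skip the central lookup:

// Sketch: fetch site config from KV with edge caching.
// 'json' parses the stored value; cacheTtl (seconds, min 60)
// caches it at this PoP, trading freshness for read latency.
async function getSiteConfig(env, siteId) {
    const config = await env.KV.get(`config:${siteId}`, {
        type: 'json',
        cacheTtl: 60
    });
    if (!config) throw new Error(`No config for ${siteId}`);
    return config;
}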

The Consistency Model

KV is eventually consistent with a propagation delay of up to 60 seconds globally. This is fine for configuration (you don't need instant config updates) but would be terrible for real-time event data. That's why events go to Analytics Engine (immediate write availability) and not KV.

KV is a read-optimized store with weak write consistency. Use it for data that changes rarely and is read on every request (config, API keys, feature flags). Never use it for event streams or counters.

R2: The Data Lake

R2 is Cloudflare's S3-compatible object storage with zero egress fees. ClickStream uses R2 as the managed data lake, exporting behavioral data in Parquet format on a configurable schedule.

Parquet Export Pipeline

// Scheduled worker: Export AE data to R2 as Parquet
export default {
    async scheduled(event, env, ctx) {
        // Query last hour of events from Analytics Engine
        const events = await env.ANALYTICS_ENGINE.query(`
            SELECT *
            FROM clickstream_events
            WHERE timestamp >= NOW() - INTERVAL '1' HOUR
        `);

        // Group by site_id for multi-tenant isolation
        // (groupBySiteId, toParquet, formatDate are internal helpers)
        const bySite = groupBySiteId(events);

        for (const [siteId, siteEvents] of Object.entries(bySite)) {
            // Convert to Parquet using parquet-wasm
            const parquetBuffer = await toParquet(siteEvents);

            // Write to ClickStream's R2 data lake, prefixed by tenant
            const key = `${siteId}/${formatDate()}/events_${Date.now()}.parquet`;
            await env.R2_DATA_LAKE.put(key, parquetBuffer, {
                httpMetadata: { contentType: 'application/octet-stream' },
                customMetadata: {
                    siteId,
                    eventCount: String(siteEvents.length),
                    exportedAt: new Date().toISOString()
                }
            });
        }
    }
};
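
On the read side, customer exports only ever touch the tenant's own prefix. A sketch of a listing helper under the same key layout (the endpoint wiring around it is omitted):

// Sketch: list one tenant's Parquet exports from R2.
// The siteId prefix doubles as the tenant isolation boundary.
async function listExports(env, siteId, cursor) {
    const listing = await env.R2_DATA_LAKE.list({
        prefix: `${siteId}/`,   // never list outside the tenant's prefix
        cursor,                 // pagination token from a previous call
        limit: 100
    });
    return {
        files: listing.objects.map(o => ({
            key: o.key,
            size: o.size,
            uploaded: o.uploaded
        })),
        cursor: listing.truncated ? listing.cursor : null
    };
}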

D1: Relational Data

D1 is Cloudflare's SQLite-at-the-edge database. ClickStream uses D1 for relational data that doesn't fit AE or KV: customer accounts, billing records, site registrations, API key management, and schema migrations.

Migration Strategy

-- Migration: 001_create_sites.sql
CREATE TABLE IF NOT EXISTS sites (
    id TEXT PRIMARY KEY,
    domain TEXT NOT NULL UNIQUE,
    cname TEXT,
    owner_email TEXT NOT NULL,
    plan TEXT DEFAULT 'free',
    created_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE IF NOT EXISTS api_keys (
    key_hash TEXT PRIMARY KEY,
    site_id TEXT NOT NULL REFERENCES sites(id),
    name TEXT,
    permissions TEXT DEFAULT 'read',
    created_at TEXT DEFAULT (datetime('now')),
    last_used_at TEXT
);

CREATE INDEX IF NOT EXISTS idx_api_keys_site ON api_keys(site_id);

Local vs. Remote D1

D1's local development experience uses a local SQLite file. Remote D1 runs on Cloudflare's infrastructure. The schemas are the same, but the data is separate. Migrations must be applied to both:

# Apply migration locally
wrangler d1 execute clickstream-db --local --file=migrations/001_create_sites.sql

# Apply migration to production
wrangler d1 execute clickstream-db --remote --file=migrations/001_create_sites.sql

# GOTCHA: --local and --remote are COMPLETELY separate databases
# There is no automatic sync between them

Cloudflare for SaaS: Custom Hostnames

Cloudflare for SaaS (CF4SaaS) is the infrastructure that makes the CNAME cookie architecture work. It allows ClickStream to provision SSL certificates for customer subdomains automatically.

How Custom Hostnames Work

  1. Customer creates a CNAME record: t.example.com → edge.clickstream.com
  2. ClickStream's API calls CF4SaaS to register t.example.com as a custom hostname (see the sketch after the routing code below)
  3. Cloudflare automatically completes DCV (Domain Control Validation) and provisions an SSL certificate
  4. Traffic to t.example.com hits ClickStream's Worker, which reads the Host header to identify the customer

// Worker: Route requests by Host header
export default {
    async fetch(request, env) {
        const host = request.headers.get('Host');
        // host = "t.example.com" (customer's CNAME subdomain)

        // Look up site config by custom hostname
        const siteConfig = await env.KV.get(`hostname:${host}`, 'json');

        if (!siteConfig) {
            return new Response('Unknown hostname', { status: 404 });
        }

        // Now we know which customer this request belongs to
        const siteId = siteConfig.site_id;

        // Handle tracking request, set cookie, etc.
        return handleTrackingRequest(request, siteId, siteConfig, env);
    }
};
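
Step 2 above, the registration call, is a plain REST request against Cloudflare's Custom Hostnames API on the zone that edge.clickstream.com lives in. A hedged sketch; the zone ID and token are assumed to be Worker secrets:

// Sketch: register a customer's CNAME as a CF4SaaS custom hostname.
// ZONE_ID / CF_API_TOKEN are assumed Worker secrets.
async function registerCustomHostname(env, hostname) {
    const resp = await fetch(
        `https://api.cloudflare.com/client/v4/zones/${env.ZONE_ID}/custom_hostnames`,
        {
            method: 'POST',
            headers: {
                'Authorization': `Bearer ${env.CF_API_TOKEN}`,
                'Content-Type': 'application/json'
            },
            body: JSON.stringify({
                hostname,                           // e.g. "t.example.com"
                ssl: { method: 'http', type: 'dv' } // automatic DCV + cert
            })
        }
    );
    const result = await resp.json();
    if (!result.success) throw new Error(`Registration failed for ${hostname}`);
    return result.result; // includes hostname and SSL validation status
}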

The Host Header Pattern

The Host header is the key to multi-tenancy with custom hostnames. When t.example.com CNAMEs to edge.clickstream.com, the HTTP request arrives at ClickStream's Worker with Host: t.example.com. This lets the Worker identify the customer without any path prefix or query parameter convention.

This is cleaner than alternatives like edge.clickstream.com/site/abc123 (which leaks the site ID in URLs) or abc123.edge.clickstream.com (which requires wildcard DNS).

Edge Caching Strategy

Not everything needs to be computed fresh on every request. ClickStream caches aggressively at the edge:

Resource                 | Cache Location      | TTL                               | Invalidation
SDK JavaScript           | Cloudflare CDN      | 1 hour (+ stale-while-revalidate) | Cache purge on deploy
Site configuration       | Workers KV          | 60 seconds (KV propagation)       | KV write
Behavioral model weights | Worker global scope | Worker lifetime (~5 min)          | Worker restart
Visitor identity         | First-party cookie  | 365 days                          | Cookie expiry or deletion
Dashboard data           | Cache API (per-PoP) | 30 seconds                        | TTL expiry

The Cache API (distinct from the CDN cache) lets us cache AE query results at each Cloudflare PoP. A dashboard that refreshes every 30 seconds only hits AE once per 30 seconds per PoP, not once per viewer. For a dashboard with 50 concurrent viewers across 20 PoPs, that's 20 AE queries/30s instead of 50.
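
A minimal sketch of that pattern; the cache key scheme and the queryAE() helper (from the SQL API sketch above) are ours:

// Sketch: cache an AE query result at this PoP for 30 seconds.
// caches.default is the Workers Cache API; entries are per-PoP.
async function sha256(text) {
    const digest = await crypto.subtle.digest(
        'SHA-256', new TextEncoder().encode(text)
    );
    return [...new Uint8Array(digest)]
        .map(b => b.toString(16).padStart(2, '0')).join('');
}

async function cachedDashboardQuery(env, siteId, sql) {
    // Key the cache entry on the tenant and the exact query text
    const cacheKey = new Request(
        `https://cache.internal/dashboard/${siteId}/${await sha256(sql)}`
    );
    const cache = caches.default;

    const hit = await cache.match(cacheKey);
    if (hit) return hit.json();

    const rows = await queryAE(env, sql); // see the SQL API sketch above
    await cache.put(cacheKey, new Response(JSON.stringify(rows), {
        headers: { 'Cache-Control': 'max-age=30' } // per-PoP TTL
    }));
    return rows;
}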

Architecture Summary

Here's how all the pieces fit together:

Component           | Cloudflare Service | Purpose
Event ingestion     | Workers            | Receive tracking events, compute behavioral scores
Event storage       | Analytics Engine   | Time-series store for all events and scores
Real-time streaming | Durable Objects    | WebSocket-based live dashboards
Configuration       | KV                 | Site config, API keys, feature flags
Data lake           | R2                 | Parquet data lake with customer export capability
Relational data     | D1                 | Accounts, billing, site registration
Custom hostnames    | CF for SaaS        | CNAME cookie architecture, SSL provisioning
Edge caching        | Cache API + CDN    | SDK delivery, query result caching

Every component runs at the edge. There is no origin server. There is no us-east-1. A visitor in Tokyo hits a Cloudflare PoP in Tokyo, their behavioral scores are computed in Tokyo, and the event is written from Tokyo. The 3ms latency target is achievable precisely because there's no cross-continent round-trip.

The edge isn't where you cache things. The edge is where you compute things. When your entire analytics pipeline runs at the edge, latency becomes a rounding error.
