Whitepaper

Cross-Device Identity Resolution

Identity graph data model, graph resolution algorithm, deterministic vs probabilistic matching, confidence scoring, and the D1 schema that powers it all.

ClickStream Research · March 2026 · 20 min read

Abstract

A single customer uses 3.6 devices on average. They browse on their phone during their commute, research on a work laptop, and convert on a home desktop. Without cross-device identity resolution, each session looks like a different person, which inflates unique visitor counts, breaks attribution, and fractures behavioral profiles. This whitepaper details ClickStream's identity graph data model (nodes and edges), the graph resolution algorithm, the distinction between deterministic and probabilistic matching, the confidence scoring system, edge type definitions, the D1 database schema, and the complete identity resolution flow from anonymous visitor to unified customer profile.

What This Means for You

When you view a visitor profile in the Visitors tab at einstein.clickstream.com, you see a unified timeline across all their devices — desktop, mobile, tablet — stitched together automatically. No login is required from the visitor. The platform resolves identity using deterministic signals (cookies, hashed emails) and probabilistic matching (behavioral biometrics, IP clustering). This whitepaper explains the resolution algorithms that make it work.

Table of Contents

  1. The Cross-Device Problem
  2. Identity Graph Data Model
  3. Node Types
  4. Edge Types and Confidence
  5. Deterministic Matching
  6. Probabilistic Matching
  7. Graph Resolution Algorithm
  8. D1 Database Schema
  9. Identity Resolution Flow
  10. Cross-Site Identity (Multi-Domain)
  11. Conclusion

1. The Cross-Device Problem

The average customer journey spans multiple devices and sessions. Consider this scenario: a user sees a paid search ad on their phone, clicks through, browses for 2 minutes, and leaves. Later that day, they search for the brand on their work laptop, read pricing documentation for 10 minutes, and sign up for a webinar using their work email. Two days later, they visit the site on their home desktop, log in, and complete a purchase.

Without identity resolution, analytics reports this as three separate visitors with three separate journeys. The paid search ad gets no attribution. The webinar signup is unconnected to the purchase. The behavioral profile is fragmented across three incomplete records.

Cross-device identity resolution is the process of connecting these fragmented signals into a single, unified customer profile. It is the prerequisite for accurate attribution, complete behavioral profiling, and reliable customer analytics.

2. Identity Graph Data Model

ClickStream models identity as a directed graph where nodes represent identity signals and edges represent observed relationships between signals. This graph-based approach is more flexible than a flat table because it naturally handles many-to-many relationships (one person can have multiple devices, and one device can be shared by multiple people).

2.1 Graph Structure

    // Conceptual graph structure
    Graph {
      nodes: [
        { id: "cookie_v_m2x7k...", type: "cookie" },
        { id: "cookie_v_p8r2j...", type: "cookie" },
        { id: "cookie_v_k4n9w...", type: "cookie" },
        { id: "email_a1b2c3...", type: "email_hash" },
        { id: "fp_x7y8z9...", type: "device signature" },
        { id: "ip_192.168.1.0/24", type: "ip_cluster" }
      ],
      edges: [
        { from: "cookie_v_m2x7k...", to: "email_a1b2c3...", type: "login", confidence: 0.99 },
        { from: "cookie_v_p8r2j...", to: "email_a1b2c3...", type: "form_submit", confidence: 0.95 },
        { from: "cookie_v_k4n9w...", to: "ip_192.168.1.0/24", type: "ip_match", confidence: 0.40 }
      ]
    }
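To make the many-to-many point concrete, here is a minimal sketch that types an edge and looks up a node's neighbors. The names `IdentityEdge` and `neighborsOf` and the sample IDs are illustrative assumptions, not ClickStream's actual types or data.

```typescript
// Illustrative edge type; field names follow the conceptual structure above.
interface IdentityEdge {
  from: string;
  to: string;
  type: string;
  confidence: number;
}

// Return the IDs of all nodes that share an edge with `nodeId`,
// treating edges as traversable in both directions.
function neighborsOf(nodeId: string, edges: IdentityEdge[]): string[] {
  const out = new Set<string>();
  for (const e of edges) {
    if (e.from === nodeId) out.add(e.to);
    if (e.to === nodeId) out.add(e.from);
  }
  return Array.from(out);
}

// Hypothetical sample data: one email seen on two browsers,
// and one browser (a shared device) linked to two emails.
const exampleEdges: IdentityEdge[] = [
  { from: "cookie_a", to: "email_1", type: "login", confidence: 0.99 },
  { from: "cookie_b", to: "email_1", type: "form_submit", confidence: 0.95 },
  { from: "cookie_b", to: "email_2", type: "login", confidence: 0.99 },
];

console.log(neighborsOf("email_1", exampleEdges));  // the two cookies
console.log(neighborsOf("cookie_b", exampleEdges)); // the two emails
```

A flat visitor table would need either duplicated rows or a join table to express the same two relationships.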

3. Node Types

| Node Type | Identifier | Persistence | Uniqueness | Source |
|---|---|---|---|---|
| Cookie | First-party visitor ID | 400 days | Unique per browser+device | Server-set cookie via CNAME proxy |
| Email Hash | SHA-256 of normalized email | Permanent | Unique per person (ideally) | Form submission, login, checkout |
| Phone Hash | SHA-256 of normalized phone | Permanent | Unique per person | Form submission, account creation |
| Device Signature | Hash of device attributes | Session to days | Medium (shared across similar devices) | Screen, timezone, language, platform, GPU |
| IP Cluster | IP address or /24 subnet | Dynamic | Low (shared across household/office) | Request headers |
| External ID | CRM ID, SSO ID, etc. | Permanent | Unique (from external system) | API integration, tag parameter |
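The email hash node depends on normalizing the address before hashing, so that the same address typed differently maps to one node. The sketch below assumes a simple rule (trim whitespace, lowercase), because the exact rules are not specified here. It uses the Web Crypto API, which is available in Cloudflare Workers and recent Node.js releases. The function names are hypothetical.

```typescript
// Assumed normalization: trim surrounding whitespace and lowercase.
function normalizeEmail(raw: string): string {
  return raw.trim().toLowerCase();
}

// SHA-256 of the normalized address, returned as lowercase hex.
async function hashEmail(raw: string): Promise<string> {
  const bytes = new TextEncoder().encode(normalizeEmail(raw));
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => ("0" + b.toString(16)).slice(-2))
    .join("");
}

// Two spellings of the same address resolve to the same node ID.
Promise.all([hashEmail("  Alice@Example.COM "), hashEmail("alice@example.com")])
  .then(([a, b]) => console.log(a === b, a.length)); // true 64
```

Because only the hash is stored, the graph can link devices by email without retaining the address itself.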

4. Edge Types and Confidence

Edges connect two nodes and carry a confidence score (0.0–1.0) representing the system's belief that the two nodes belong to the same person. Edge types determine the initial confidence:

| Edge Type | From Node | To Node | Base Confidence | Description |
|---|---|---|---|---|
| login | Cookie | Email Hash | 0.99 | User logged in with this email on this cookie |
| form_submit | Cookie | Email Hash | 0.95 | User submitted a form with this email |
| checkout | Cookie | Email Hash | 0.98 | User completed checkout with this email |
| same_session | Cookie | Device Signature | 0.80 | Cookie and device signature observed in same session |
| ip_match | Cookie | IP Cluster | 0.40 | Cookie observed from this IP range |
| ip_temporal | Cookie | Cookie | 0.30 | Two cookies from same IP within short time window |
| signature_match | Cookie | Cookie | 0.60 | Two cookies with same device signature |
| external_link | Cookie | External ID | 0.95 | CRM/ad-tech system linked this cookie to external ID |
| email_match | Email Hash | Email Hash | 1.00 | Same email hash (identity anchor) |
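One way to encode the rule that the edge type determines the initial confidence is a typed lookup table. The sketch below copies the base confidences from the edge-type table in this section. The helper `makeEdge` and the sample node IDs are hypothetical.

```typescript
// Base confidences as listed in the edge-type table.
const BASE_CONFIDENCE = {
  login: 0.99,
  form_submit: 0.95,
  checkout: 0.98,
  same_session: 0.80,
  ip_match: 0.40,
  ip_temporal: 0.30,
  signature_match: 0.60,
  external_link: 0.95,
  email_match: 1.00,
} as const;

type EdgeType = keyof typeof BASE_CONFIDENCE;

interface IdentityEdge {
  from: string;
  to: string;
  type: EdgeType;
  confidence: number;
}

// Hypothetical helper: create an edge carrying its type's base confidence.
function makeEdge(from: string, to: string, type: EdgeType): IdentityEdge {
  return { from, to, type, confidence: BASE_CONFIDENCE[type] };
}

const loginEdge = makeEdge("cookie_example", "email_example", "login");
console.log(loginEdge.confidence); // 0.99
```

Deriving `EdgeType` from the table means a misspelled edge type fails at compile time instead of silently producing an edge with no confidence.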

5. Deterministic Matching

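Deterministic matching creates high-confidence edges based on explicit identity signals. These edges are created when the visitor takes an action that directly reveals their identity: logging in, submitting a form with an email address, or completing checkout (the login, form_submit, and checkout edge types in Section 4).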

Deterministic edges are the strongest signals in the identity graph. A single deterministic edge can resolve a previously anonymous visitor into a known customer and link all their historical behavioral data to that customer profile.

6. Probabilistic Matching

Probabilistic matching creates lower-confidence edges based on statistical signals that suggest two nodes may belong to the same person:

6.1 IP Clustering

When two cookies are observed from the same IP address within a short time window (e.g., 24 hours), they may belong to the same person using different browsers or devices. Base confidence: 0.30 (low, because IP addresses are shared across households and offices).
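A minimal sketch of this check, under stated assumptions: IPv4 only, the 24-hour window taken from the example above, and hypothetical names (`CookieSighting`, `ipClusterId`, `ipTemporalEdge`). The cluster ID format mirrors the `ip_192.168.1.0/24` node in Section 2.1.

```typescript
interface CookieSighting {
  cookieId: string;
  ip: string;     // IPv4 dotted quad
  seenAt: number; // Unix timestamp in seconds
}

interface TemporalEdge {
  from: string;
  to: string;
  type: "ip_temporal";
  confidence: number;
}

const WINDOW_SECONDS = 24 * 60 * 60; // assumed "short time window"

// Collapse an IPv4 address to the ID of its /24 cluster node.
function ipClusterId(ip: string): string {
  const parts = ip.split(".");
  if (parts.length !== 4) throw new Error("not an IPv4 address: " + ip);
  return "ip_" + parts[0] + "." + parts[1] + "." + parts[2] + ".0/24";
}

// Emit an ip_temporal edge when two different cookies share an IP
// within the window; otherwise return null.
function ipTemporalEdge(a: CookieSighting, b: CookieSighting): TemporalEdge | null {
  const sameCookie = a.cookieId === b.cookieId;
  const sameIp = a.ip === b.ip;
  const withinWindow = Math.abs(a.seenAt - b.seenAt) <= WINDOW_SECONDS;
  if (sameCookie || !sameIp || !withinWindow) return null;
  return { from: a.cookieId, to: b.cookieId, type: "ip_temporal", confidence: 0.30 };
}

const phone = { cookieId: "cookie_phone", ip: "203.0.113.77", seenAt: 1_700_000_000 };
const laptop = { cookieId: "cookie_laptop", ip: "203.0.113.77", seenAt: 1_700_003_600 };
console.log(ipClusterId(phone.ip));         // ip_203.0.113.0/24
console.log(ipTemporalEdge(phone, laptop)); // ip_temporal edge, confidence 0.3
```

At 0.30 this edge stays below the 0.50 traversal threshold used in Section 7, so on its own it never merges two clusters.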

6.2 Signature Matching

When two cookies share the same device signature (screen resolution, timezone, language, platform, GPU renderer), they may be the same device with cleared cookies. Base confidence: 0.60 (medium, because device signatures are not perfectly unique).
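As an illustration, a device signature can be derived by joining the attributes in a fixed order and hashing the result. The hash function is not specified here. The sketch uses 32-bit FNV-1a purely to stay dependency-free, and the `DeviceAttributes` shape is an assumption. A real deployment would likely want a longer hash to reduce collisions.

```typescript
interface DeviceAttributes {
  screen: string;      // e.g. "1920x1080"
  timezone: string;    // e.g. "Europe/Berlin"
  language: string;    // e.g. "en-US"
  platform: string;    // e.g. "MacIntel"
  gpuRenderer: string; // renderer string reported by the browser
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Fixed field order so equal attributes always yield the same signature.
function deviceSignature(attrs: DeviceAttributes): string {
  const canonical = [
    attrs.screen,
    attrs.timezone,
    attrs.language,
    attrs.platform,
    attrs.gpuRenderer,
  ].join("|");
  return "fp_" + fnv1a(canonical);
}

const beforeClear: DeviceAttributes = {
  screen: "1920x1080",
  timezone: "Europe/Berlin",
  language: "en-US",
  platform: "MacIntel",
  gpuRenderer: "Example GPU",
};
const afterClear: DeviceAttributes = { ...beforeClear }; // same device, new cookie
console.log(deviceSignature(beforeClear) === deviceSignature(afterClear)); // true
```

Two identical laptop models in the same office would produce the same signature too, which is why the base confidence stays at 0.60.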

6.3 Behavioral Similarity

When two cookies exhibit similar behavioral patterns (same content affinity, similar navigation paths, similar engagement scores) from the same geographic region, they may be the same person. Base confidence: 0.20 (very low, used only as supporting evidence).
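The similarity measure is not defined here. As one possible reading, the sketch below compares content-affinity vectors with cosine similarity and emits a 0.20 edge only when the region matches. The metric, the 0.9 threshold, and every name are assumptions made for illustration.

```typescript
type Affinity = { [topic: string]: number };

interface BehaviorProfile {
  cookieId: string;
  region: string;
  affinity: Affinity; // e.g. pageviews per content category
}

// Cosine similarity between two sparse vectors; 0 if either is empty.
function cosineSimilarity(a: Affinity, b: Affinity): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const k of Object.keys(a)) {
    normA += a[k] * a[k];
    if (k in b) dot += a[k] * b[k];
  }
  for (const k of Object.keys(b)) normB += b[k] * b[k];
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

const SIMILARITY_THRESHOLD = 0.9; // assumed cut-off
const BEHAVIORAL_CONFIDENCE = 0.20;

// Returns a low-confidence edge, or null when the profiles do not qualify.
function behavioralEdge(a: BehaviorProfile, b: BehaviorProfile) {
  if (a.cookieId === b.cookieId || a.region !== b.region) return null;
  if (cosineSimilarity(a.affinity, b.affinity) < SIMILARITY_THRESHOLD) return null;
  return { from: a.cookieId, to: b.cookieId, type: "behavioral", confidence: BEHAVIORAL_CONFIDENCE };
}

const p1 = { cookieId: "cookie_1", region: "DE-BE", affinity: { pricing: 3, docs: 4 } };
const p2 = { cookieId: "cookie_2", region: "DE-BE", affinity: { pricing: 3, docs: 4 } };
console.log(behavioralEdge(p1, p2)); // edge with confidence 0.2
```

Whatever metric is used, the resulting edge is meant to feed the confidence boosting in Section 6.4, not to link two cookies on its own.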

6.4 Confidence Boosting

Probabilistic confidence increases when multiple independent signals corroborate each other:

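    interface Edge {
      confidence: number;
    }

    function boostConfidence(edges: Edge[]): number {
      if (edges.length === 0) return 0;
      // Sort descending so the strongest edge is the starting point and the
      // rest are treated as corroborating edges, regardless of input order.
      const sorted = edges.map(e => e.confidence).sort((a, b) => b - a);
      let confidence = sorted[0];
      // Each additional corroborating edge boosts confidence, capped at 0.95.
      // The cap never lowers a starting confidence that is already above it.
      for (const c of sorted.slice(1)) {
        const boost = c * (1 - confidence) * 0.5;
        confidence = Math.max(confidence, Math.min(confidence + boost, 0.95));
      }
      return confidence;
    }

    // Example: IP match (0.40) + signature match (0.60) + behavioral (0.20)
    // Start at the strongest edge: 0.60
    // + 0.40 * (1 - 0.60) * 0.5 = 0.080  -> 0.680
    // + 0.20 * (1 - 0.68) * 0.5 = 0.032  -> 0.712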

7. Graph Resolution Algorithm

The graph resolution algorithm merges nodes that are connected by edges exceeding a confidence threshold. The algorithm runs incrementally on every new edge creation:

7.1 Algorithm Steps

  1. New edge arrives: A new identity signal creates an edge (e.g., a login event creates a cookie-to-email edge).
  2. Transitive closure: Find all nodes reachable from either endpoint of the new edge through edges with confidence > 0.50.
  3. Cluster identification: Group all reachable nodes into a single identity cluster.
  4. Primary identity selection: Select the highest-fidelity node as the cluster's primary identifier (preference order: email hash > phone hash > external ID > oldest cookie).
  5. Profile merge: Aggregate behavioral scores, session histories, and attribution data across all nodes in the cluster.
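Steps 2 through 4 can be sketched as a breadth-first search over edges above the threshold, followed by a ranked pick of the primary node. This is an in-memory illustration with hypothetical names and sample data. It does not show how the production system stores or queries the graph.

```typescript
interface IdentityNode {
  id: string;
  type: string;
  firstSeen: number; // Unix timestamp
}

interface IdentityEdge {
  from: string;
  to: string;
  confidence: number;
}

const MERGE_THRESHOLD = 0.5;
// Preference order from step 4; other node types rank last.
const PRIMARY_PREFERENCE = ["email_hash", "phone_hash", "external", "cookie"];

// Step 2-3: collect every node reachable through edges with confidence > 0.50.
function reachableFrom(start: string, edges: IdentityEdge[]): Set<string> {
  const adjacency = new Map<string, string[]>();
  const link = (a: string, b: string) => {
    const list = adjacency.get(a);
    if (list) list.push(b);
    else adjacency.set(a, [b]);
  };
  for (const e of edges) {
    if (e.confidence > MERGE_THRESHOLD) {
      link(e.from, e.to);
      link(e.to, e.from);
    }
  }
  const seen = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const next of adjacency.get(current) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return seen;
}

// Step 4: best-ranked type wins; ties go to the oldest node.
function selectPrimary(nodes: IdentityNode[]): IdentityNode {
  const rank = (n: IdentityNode) => {
    const i = PRIMARY_PREFERENCE.indexOf(n.type);
    return i === -1 ? PRIMARY_PREFERENCE.length : i;
  };
  return nodes.slice().sort((a, b) => rank(a) - rank(b) || a.firstSeen - b.firstSeen)[0];
}

const sampleNodes: IdentityNode[] = [
  { id: "cookie_phone", type: "cookie", firstSeen: 100 },
  { id: "cookie_laptop", type: "cookie", firstSeen: 200 },
  { id: "email_1", type: "email_hash", firstSeen: 200 },
  { id: "ip_office", type: "ip_cluster", firstSeen: 100 },
];
const sampleEdges: IdentityEdge[] = [
  { from: "cookie_phone", to: "email_1", confidence: 0.99 },  // login
  { from: "cookie_laptop", to: "email_1", confidence: 0.95 }, // form_submit
  { from: "cookie_laptop", to: "ip_office", confidence: 0.40 }, // ip_match, below threshold
];

const cluster = reachableFrom("cookie_phone", sampleEdges);
const members = sampleNodes.filter((n) => cluster.has(n.id));
console.log(members.map((n) => n.id)); // both cookies and the email; ip_office is excluded
console.log(selectPrimary(members).id); // email_1
```

The 0.40 IP edge is ignored during traversal, so a shared office network does not pull unrelated visitors into the cluster.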

7.2 Conflict Resolution

When the algorithm detects that two previously separate identity clusters should be merged (e.g., a visitor logs in on their phone and the email hash connects their phone cookie to their desktop cookie), it executes a cluster merge:

    async function mergeIdentityClusters(
      clusterA: string,
      clusterB: string,
      db: D1Database
    ): Promise<void> {
      if (clusterA === clusterB) return; // nothing to merge

      // The lexicographically smaller ID becomes primary. This matches
      // "older = primary" only if cluster IDs sort by creation time.
      const primary = clusterA < clusterB ? clusterA : clusterB;
      const secondary = clusterA < clusterB ? clusterB : clusterA;

      // Run the updates as one batch so a failure cannot leave the clusters
      // half-merged (D1 executes a batch as a single transaction).
      await db.batch([
        // Update all nodes in secondary cluster to point to primary
        db.prepare(
          `UPDATE identity_nodes SET cluster_id = ? WHERE cluster_id = ?`
        ).bind(primary, secondary),
        // Merge behavioral score aggregates
        db.prepare(
          `UPDATE visitor_scores SET cluster_id = ? WHERE cluster_id = ?`
        ).bind(primary, secondary),
        // Re-attribute sessions
        db.prepare(
          `UPDATE sessions SET cluster_id = ? WHERE cluster_id = ?`
        ).bind(primary, secondary),
      ]);
    }

8. D1 Database Schema

The identity graph is stored in Cloudflare D1, a SQLite database at the edge. The schema consists of three core tables:

8.1 identity_nodes

    CREATE TABLE identity_nodes (
      id         TEXT PRIMARY KEY,   -- Node identifier (cookie ID, email hash, etc.)
      node_type  TEXT NOT NULL,      -- 'cookie', 'email_hash', 'phone_hash', 'device signature', 'ip_cluster', 'external'
      cluster_id TEXT NOT NULL,      -- Identity cluster this node belongs to
      site_id    TEXT NOT NULL,      -- Customer site identifier
      first_seen INTEGER NOT NULL,   -- Unix timestamp of first observation
      last_seen  INTEGER NOT NULL,   -- Unix timestamp of most recent observation
      metadata   TEXT,               -- JSON blob with node-specific data
      created_at INTEGER NOT NULL DEFAULT (unixepoch())
    );

    CREATE INDEX idx_nodes_cluster ON identity_nodes(cluster_id);
    CREATE INDEX idx_nodes_site ON identity_nodes(site_id);
    CREATE INDEX idx_nodes_type ON identity_nodes(node_type, site_id);

8.2 identity_edges

    CREATE TABLE identity_edges (
      id         INTEGER PRIMARY KEY AUTOINCREMENT,
      from_node  TEXT NOT NULL REFERENCES identity_nodes(id),
      to_node    TEXT NOT NULL REFERENCES identity_nodes(id),
      edge_type  TEXT NOT NULL,      -- 'login', 'form_submit', 'checkout', 'ip_match', etc.
      confidence REAL NOT NULL,      -- 0.0 to 1.0
      evidence   TEXT,               -- JSON with supporting data
      created_at INTEGER NOT NULL DEFAULT (unixepoch()),
      UNIQUE(from_node, to_node, edge_type)
    );

    CREATE INDEX idx_edges_from ON identity_edges(from_node);
    CREATE INDEX idx_edges_to ON identity_edges(to_node);
    CREATE INDEX idx_edges_confidence ON identity_edges(confidence);
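Because of the UNIQUE(from_node, to_node, edge_type) constraint, observing the same signal twice has to update the existing row instead of inserting a duplicate. The sketch below builds a SQLite upsert for that case. Keeping the higher of the two confidences is an assumed policy, and `buildEdgeUpsert` is a hypothetical helper that returns only the SQL and parameters.

```typescript
interface EdgeInput {
  fromNode: string;
  toNode: string;
  edgeType: string;
  confidence: number;
  evidence?: unknown; // serialized to JSON for the evidence column
}

interface PreparedUpsert {
  sql: string;
  params: (string | number | null)[];
}

// Build an INSERT that falls back to UPDATE when the edge already exists.
function buildEdgeUpsert(edge: EdgeInput): PreparedUpsert {
  const sql = [
    "INSERT INTO identity_edges (from_node, to_node, edge_type, confidence, evidence)",
    "VALUES (?, ?, ?, ?, ?)",
    "ON CONFLICT(from_node, to_node, edge_type)",
    // Assumed policy: never lower an edge's stored confidence.
    "DO UPDATE SET confidence = MAX(confidence, excluded.confidence),",
    "              evidence = excluded.evidence",
  ].join("\n");
  const evidence = edge.evidence === undefined ? null : JSON.stringify(edge.evidence);
  return {
    sql,
    params: [edge.fromNode, edge.toNode, edge.edgeType, edge.confidence, evidence],
  };
}

const upsert = buildEdgeUpsert({
  fromNode: "cookie_example",
  toNode: "email_example",
  edgeType: "login",
  confidence: 0.99,
  evidence: { source: "login_form" },
});
console.log(upsert.sql);
console.log(upsert.params);
```

In a Worker, the result would be passed to D1 with `db.prepare(upsert.sql).bind(...upsert.params).run()`, the same prepare/bind/run pattern used in Section 7.2.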

8.3 identity_clusters

    CREATE TABLE identity_clusters (
      cluster_id       TEXT PRIMARY KEY,
      site_id          TEXT NOT NULL,
      primary_node_id  TEXT,           -- Highest-fidelity node (email hash preferred)
      primary_type     TEXT,           -- Type of primary node
      node_count       INTEGER DEFAULT 1,
      edge_count       INTEGER DEFAULT 0,
      first_seen       INTEGER NOT NULL,
      last_seen        INTEGER NOT NULL,
      total_sessions   INTEGER DEFAULT 0,
      total_pageviews  INTEGER DEFAULT 0,
      avg_engagement   REAL DEFAULT 0,
      max_intent       REAL DEFAULT 0,
      conversion_count INTEGER DEFAULT 0,
      created_at       INTEGER NOT NULL DEFAULT (unixepoch())
    );

    CREATE INDEX idx_clusters_site ON identity_clusters(site_id);

9. Identity Resolution Flow

The complete identity resolution flow executes on every incoming event at the edge:

  1. Cookie received: Edge worker reads the first-party cookie. If no cookie exists, a new one is created.
  2. Node lookup: The worker queries D1 for an existing node matching this cookie ID.
  3. New visitor path: If no node exists, create a new node, a new single-node cluster, and assign the cluster ID.
  4. Returning visitor path: If the node exists, update last_seen and proceed to edge evaluation.
  5. Edge evaluation: Check if the event contains any identity signals (email in form data, login event, etc.). If yes, create deterministic edges.
  6. Probabilistic evaluation: Compare device signature, IP, and behavioral signals against recent nodes. Create probabilistic edges where applicable.
  7. Graph resolution: If new edges connect previously separate clusters, execute cluster merge.
  8. Profile update: Update the cluster's aggregate behavioral scores, session count, and last-seen timestamp.
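The flow above can be approximated with an in-memory store to show how the steps fit together. This sketch covers steps 1 through 5 and step 7. Probabilistic evaluation (step 6) and aggregate updates (step 8) are omitted. The class, its numeric cluster IDs, and the event shape are assumptions for illustration only.

```typescript
interface IncomingEvent {
  cookieId?: string;  // absent on a first visit
  emailHash?: string; // present on login or form submission
  timestamp: number;
}

class InMemoryIdentityStore {
  private clusterOf = new Map<string, number>(); // node ID -> cluster ID
  private lastSeen = new Map<string, number>();
  private nextCluster = 1;
  private nextCookie = 1;

  // Steps 2-4: look up the node, creating a single-node cluster if needed.
  private ensureNode(nodeId: string, ts: number): number {
    let cluster = this.clusterOf.get(nodeId);
    if (cluster === undefined) {
      cluster = this.nextCluster++;
      this.clusterOf.set(nodeId, cluster);
    }
    this.lastSeen.set(nodeId, ts);
    return cluster;
  }

  // Step 7: the lower (earlier-created) cluster ID absorbs the other.
  private merge(a: number, b: number): number {
    if (a === b) return a;
    const primary = Math.min(a, b);
    const secondary = Math.max(a, b);
    this.clusterOf.forEach((cluster, node) => {
      if (cluster === secondary) this.clusterOf.set(node, primary);
    });
    return primary;
  }

  handle(event: IncomingEvent): { cookieId: string; clusterId: number } {
    // Step 1: read the cookie, or issue a new one.
    const cookieId = event.cookieId ?? "cookie_" + this.nextCookie++;
    let clusterId = this.ensureNode(cookieId, event.timestamp);
    // Step 5: an email hash in the event creates a deterministic link.
    if (event.emailHash !== undefined) {
      const emailCluster = this.ensureNode(event.emailHash, event.timestamp);
      clusterId = this.merge(clusterId, emailCluster);
    }
    return { cookieId, clusterId };
  }
}

// The scenario from Section 1, with made-up IDs.
const store = new InMemoryIdentityStore();
const phoneVisit = store.handle({ cookieId: "c_phone", timestamp: 1 });
const laptopSignup = store.handle({ cookieId: "c_laptop", emailHash: "email_x", timestamp: 2 });
const desktopLogin = store.handle({ cookieId: "c_desktop", emailHash: "email_x", timestamp: 3 });
console.log(laptopSignup.clusterId === desktopLogin.clusterId); // true
console.log(phoneVisit.clusterId === desktopLogin.clusterId);   // false
```

In this run the phone visit stays in its own cluster because it never produced a deterministic signal. Linking it would depend on the probabilistic edges from step 6 reaching the merge threshold.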

10. Cross-Site Identity (Multi-Domain)

Some customers operate multiple domains (e.g., marketing-site.com and app.product.com). ClickStream supports cross-site identity resolution while maintaining data isolation:

10.1 Shared Email Hash

When a visitor logs in on one domain and the same email hash appears on another domain (both belonging to the same customer), the identity graph creates a cross-site edge. This requires both domains to be configured under the same ClickStream account.

10.2 Cross-Site Edge Confidence

| Signal | Cross-Site Confidence | Notes |
|---|---|---|
| Same email hash | 0.99 | Strongest cross-site signal |
| Same phone hash | 0.95 | Strong cross-site signal |
| Same external ID | 0.95 | CRM or SSO integration |
| Same device signature + IP | 0.50 | Moderate (could be shared device) |
| Same IP only | 0.15 | Very weak (office networks) |
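Combining the same-account requirement from Section 10.1 with the confidences in this section, a cross-site check might look like the sketch below. The `SiteConfig` shape, the signal keys, and the function name are hypothetical. The values are the ones listed in this section.

```typescript
// Cross-site confidences as listed in this section.
const CROSS_SITE_CONFIDENCE = {
  email_hash: 0.99,
  phone_hash: 0.95,
  external_id: 0.95,
  signature_and_ip: 0.50,
  ip_only: 0.15,
} as const;

type CrossSiteSignal = keyof typeof CROSS_SITE_CONFIDENCE;

interface SiteConfig {
  domain: string;
  accountId: string; // the ClickStream account that owns the domain
}

// Returns the edge confidence, or null when the two sites belong to
// different accounts and must stay isolated.
function crossSiteConfidence(
  a: SiteConfig,
  b: SiteConfig,
  signal: CrossSiteSignal
): number | null {
  if (a.accountId !== b.accountId) return null;
  return CROSS_SITE_CONFIDENCE[signal];
}

const marketing: SiteConfig = { domain: "marketing-site.com", accountId: "acct_1" };
const app: SiteConfig = { domain: "app.product.com", accountId: "acct_1" };
const unrelated: SiteConfig = { domain: "other.example", accountId: "acct_2" };

console.log(crossSiteConfidence(marketing, app, "email_hash"));       // 0.99
console.log(crossSiteConfidence(marketing, unrelated, "email_hash")); // null
```

Checking the account before looking up any confidence keeps the isolation rule in one place, so no signal can link domains owned by different customers, however strong it is.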

11. Conclusion

Cross-device identity resolution transforms fragmented analytics into unified customer intelligence. ClickStream's graph-based identity model — with typed nodes, weighted edges, confidence scoring, and incremental resolution — handles the complexity of real-world customer journeys where one person uses multiple devices, clears cookies, switches browsers, and interacts across multiple company domains.

The key architectural decisions that make this work are: first-party cookies as the primary anchor (high persistence, high confidence), deterministic matching via email hash as the cross-device bridge, probabilistic matching as a supplementary signal with conservative confidence thresholds, and incremental graph resolution that executes on every event without batch processing.

The D1 schema is designed for edge-first execution: simple, denormalized where needed for performance, and partitioned by site for data isolation. The identity graph is not a data warehouse feature that runs nightly. It is a real-time system that resolves identity on every page view, every form submission, and every login event — at the edge, in under 3ms.

