26 real-time scoring models processed in under 3ms at the edge. Feature extraction pipeline, 3-phase model architecture, and dual dataset approach.
Traditional analytics platforms collect events and process them in batch, often hours or days after the interaction occurred. ClickStream takes a fundamentally different approach: 26 behavioral scoring models execute at the edge — in the same Cloudflare Worker that processes the incoming event — completing all scoring in under 3 milliseconds with zero origin server round-trips. This whitepaper details the feature extraction pipeline, the BehavioralFeatures interface, the 3-phase model architecture (core, psychological, predictive), the dual dataset approach using Cloudflare Analytics Engine and D1, and the Analytics Engine schema with blob/double field mapping. Every behavioral signal is computed before the HTTP response is returned to the client.
When you open the Intelligence tab in your ClickStream dashboard at einstein.clickstream.com, you see 26 real-time behavioral scores updating for every active visitor. Those scores are computed at the Cloudflare edge closest to each visitor — typically within 3ms of the triggering event. You don't need to configure or deploy anything; the platform handles it all. This whitepaper explains the architecture that makes it possible.
Behavioral intelligence loses value with latency. A frustration score computed 24 hours after a user rage-quit your checkout flow cannot prevent the abandonment. An intent signal processed in a batch pipeline cannot trigger real-time personalization. The value of behavioral signals decays exponentially with time.
ClickStream therefore processes all behavioral scoring at the edge:
Behavioral intelligence is not a data warehouse query. It is a real-time signal that must be computed, stored, and actionable within the same HTTP request-response cycle that captures the raw event.
Every incoming event passes through a feature extraction pipeline before any scoring model executes. The pipeline transforms raw event data into normalized features:
The SDK sends a structured event payload with every interaction:
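The exact wire format is not published here; the sketch below is an illustrative TypeScript reconstruction of the payload, using the field names discussed throughout this paper:

```typescript
// Illustrative shape of the SDK event payload. Field names mirror the
// features referenced in this whitepaper and are assumptions, not the
// exact production wire format.
interface ClickStreamEvent {
  visitorId: string;
  sessionId: string;
  eventType: string;        // e.g. "page_view", "click", "scroll"
  pageUrl: string;
  pageCategory: string;
  scrollDepth: number;      // 0–1
  timeOnPage: number;       // milliseconds
  clickCount: number;
  rageClicks: number;
  mouseVelocity: number;    // px/ms
  sessionPageCount: number;
  formInteractions: number;
  isReturning: boolean;
  timestamp: number;        // epoch ms
}

const example: ClickStreamEvent = {
  visitorId: "v_m2x7k9p4q_3f8h2j1n9",
  sessionId: "s_m2x7k9p4r",
  eventType: "page_view",
  pageUrl: "/pricing",
  pageCategory: "pricing",
  scrollDepth: 0.62,
  timeOnPage: 48000,
  clickCount: 7,
  rageClicks: 0,
  mouseVelocity: 2.4,
  sessionPageCount: 3,
  formInteractions: 1,
  isReturning: true,
  timestamp: 1735689600000,
};
```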
Raw values are normalized to 0–1 scales using domain-specific ranges. For example:
| Raw Feature | Normalization | Range |
|---|---|---|
| scrollDepth | Already 0–1 | [0, 1] |
| timeOnPage | log(ms) / log(600000) | [0, 1] (capped at 10 min) |
| clickCount | min(clicks / 20, 1) | [0, 1] |
| mouseVelocity | min(velocity / 10, 1) | [0, 1] |
| rageClicks | min(rageClicks / 5, 1) | [0, 1] |
| sessionPageCount | min(pages / 15, 1) | [0, 1] |
| formInteractions | min(interactions / 10, 1) | [0, 1] |
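The normalizations in the table above translate directly into code. A minimal TypeScript sketch (function names are illustrative; the cap constants come from the table):

```typescript
// Normalization step: map raw feature values onto 0–1 scales using the
// domain-specific ranges from the table above.
const clamp01 = (x: number): number => Math.min(Math.max(x, 0), 1);

const normalize = {
  scrollDepth: (d: number) => clamp01(d),                   // already 0–1
  timeOnPage: (ms: number) =>
    clamp01(Math.log(Math.max(ms, 1)) / Math.log(600_000)), // log-scaled, capped at 10 min
  clickCount: (clicks: number) => Math.min(clicks / 20, 1),
  mouseVelocity: (v: number) => Math.min(v / 10, 1),
  rageClicks: (r: number) => Math.min(r / 5, 1),
  sessionPageCount: (pages: number) => Math.min(pages / 15, 1),
  formInteractions: (n: number) => Math.min(n / 10, 1),
};
```

The log scale for timeOnPage compresses long dwell times, so the difference between 10 seconds and 1 minute matters more than the difference between 8 and 9 minutes.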
All extracted and normalized features are structured into a BehavioralFeatures interface that serves as the input contract for all 26 scoring models:
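The full production interface is not reproduced here; the sketch below is an inferred reconstruction, with the field list derived from the model-input columns in the tables that follow:

```typescript
// Inferred reconstruction of the BehavioralFeatures input contract.
// Fields are taken from the "Primary Inputs" columns of the model tables
// in this paper; the production interface may differ.
interface BehavioralFeatures {
  // Interaction features, normalized to 0–1 where noted in the table above
  scrollDepth: number;
  timeOnPage: number;
  clickCount: number;
  mouseVelocity: number;
  mouseDistance: number;
  mouseAcceleration: number;
  rageClicks: number;
  deadClicks: number;
  cursorReversals: number;
  scrollVelocity: number;
  sessionPageCount: number;
  sessionDuration: number;
  formInteractions: number;
  formCompletionRate: number;
  errorEncountered: boolean;
  // Context features
  pageCategory: string;
  entrySource: string;
  deviceType: string;
  isReturning: boolean;
  daysSinceLastVisit: number;
}
```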
Phase 1 models evaluate the fundamental behavioral signals that apply to every visitor on every page. They execute first because Phase 2 and Phase 3 models depend on their outputs.
| # | Model | Output Range | Primary Inputs | Use Case |
|---|---|---|---|---|
| 1 | Engagement Score | 0–100 | scrollDepth, timeOnPage, clickCount, sessionPageCount | Overall engagement level for content optimization and audience segmentation |
| 2 | Frustration Score | 0–100 | rageClicks, deadClicks, errorEncountered, scrollVelocity, cursorReversals | UX problem detection, bug prioritization, support escalation triggers |
| 3 | Intent Score | 0–100 | pageCategory, formInteractions, sessionPageCount, isReturning, timeOnPage | Lead scoring, sales prioritization, real-time chat triggers |
| 4 | Attention Score | 0–100 | scrollDepth, timeOnPage, mouseDistance, mouseVelocity | Content effectiveness measurement, ad viewability proxy |
| 5 | Navigation Fluency | 0–100 | sessionPageCount, sessionDuration, deadClicks, cursorReversals | Site architecture evaluation, information architecture optimization |
The engagement score uses a weighted combination of normalized features:
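The actual weights are not published; the sketch below shows the structure of the weighted combination with assumed weights that sum to 1:

```typescript
// Sketch of the engagement score as a weighted combination of the four
// normalized inputs named above. The weights are assumptions for
// illustration, not ClickStream's production values.
function engagementScore(f: {
  scrollDepth: number;      // normalized 0–1
  timeOnPage: number;       // normalized 0–1
  clickCount: number;       // normalized 0–1
  sessionPageCount: number; // normalized 0–1
}): number {
  const score =
    0.30 * f.scrollDepth +
    0.30 * f.timeOnPage +
    0.20 * f.clickCount +
    0.20 * f.sessionPageCount;
  return Math.round(score * 100); // map to the 0–100 output range
}
```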
The frustration score is critical for real-time UX monitoring. Scores above 70 trigger automated alerts:
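A minimal sketch of the alert threshold described above (the alert dispatch mechanism is not specified in this paper, so only the trigger condition is shown):

```typescript
// Frustration scores above 70 trigger automated alerts, per the text
// above. The threshold constant reflects the stated value; how the alert
// is delivered (webhook, chat escalation, etc.) is left abstract here.
const FRUSTRATION_ALERT_THRESHOLD = 70;

function shouldTriggerFrustrationAlert(frustrationScore: number): boolean {
  return frustrationScore > FRUSTRATION_ALERT_THRESHOLD;
}
```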
Phase 2 models infer higher-order psychological states from the combination of Phase 1 outputs and raw features. These models require more context and produce signals that are useful for personalization and content strategy.
| # | Model | Output Range | Primary Inputs | Use Case |
|---|---|---|---|---|
| 6 | Cognitive Load | 0–100 | timeOnPage, scrollVelocity, cursorReversals, mouseAcceleration, Frustration Score | Content complexity assessment, readability optimization |
| 7 | Decision Readiness | 0–100 | Intent Score, formInteractions, pageCategory, sessionPageCount, isReturning | CTA timing optimization, offer presentation triggers |
| 8 | Content Affinity | Category vector | pageCategory history, timeOnPage per category, scrollDepth per category | Content recommendation, personalized navigation |
| 9 | Urgency Signal | 0–100 | sessionDuration, sessionPageCount, formCompletionRate, daysSinceLastVisit | Time-sensitive offer triggers, exit-intent calibration |
Decision readiness combines intent with behavioral signals that indicate a visitor is moving toward a conversion event:
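The exact blend is not published; the sketch below shows one plausible combination of the inputs listed in the table, with assumed weights and an assumed set of conversion-oriented page categories:

```typescript
// Illustrative decision-readiness blend. Weights, the conversion-page
// boost, and the category set are assumptions; only the input list comes
// from the Phase 2 table above.
function decisionReadiness(f: {
  intentScore: number;       // 0–100, Phase 1 output
  formInteractions: number;  // normalized 0–1
  sessionPageCount: number;  // normalized 0–1
  isReturning: boolean;
  pageCategory: string;
}): number {
  const conversionPages = new Set(["pricing", "checkout", "signup"]);
  const score =
    0.5 * f.intentScore +
    30 * f.formInteractions +
    10 * f.sessionPageCount +
    (f.isReturning ? 5 : 0) +
    (conversionPages.has(f.pageCategory) ? 10 : 0);
  return Math.min(Math.round(score), 100); // clamp to the 0–100 output range
}
```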
Phase 3 models produce forward-looking predictions about visitor behavior. They depend on both Phase 1 and Phase 2 outputs, plus historical patterns when available from D1.
| # | Model | Output Range | Primary Inputs | Use Case |
|---|---|---|---|---|
| 10 | Churn Risk | 0–100 | Engagement Score trend, daysSinceLastVisit, sessionDuration decline, Frustration Score | Retention campaigns, win-back triggers |
| 11 | Purchase Timing | 0–100 | Decision Readiness, Intent Score, pageCategory sequence, formCompletionRate | Sales notification, discount timing, follow-up cadence |
| 12 | LTV Estimate | Tier (1–5) | Engagement Score, Content Affinity, sessionPageCount, entrySource, deviceType | Customer segmentation, ad spend allocation |
| 13 | Bounce Probability | 0–100 | timeOnPage (first 10s), scrollDepth (first 10s), mouseVelocity, entrySource | Exit-intent popup triggers, content above-fold optimization |
| 14 | Conversion Probability | 0–100 | Intent Score, Decision Readiness, formCompletionRate, isReturning, entrySource | Real-time bidding signals, personalization prioritization |
| 15 | Content Consumption Depth | 0–100 | Attention Score, scrollDepth, timeOnPage, sessionPageCount, Content Affinity | Content strategy, paywall timing, newsletter targeting |
| 16 | Bot Probability | 0–100 | mouseVelocity consistency, clickCount patterns, timing regularity, interaction absence | Traffic quality filtering, fraud detection, ad spend protection |
Bot detection is a critical Phase 3 model that protects the integrity of all other scores. Bots exhibit distinct behavioral patterns:
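The patterns named in the Phase 3 table (velocity consistency, timing regularity, interaction absence) can be sketched as additive heuristics. The thresholds and point values below are illustrative assumptions, not the production model:

```typescript
// Sketch of behavioral bot heuristics. Bots tend to show no cursor
// movement, unnaturally consistent mouse velocity, metronomic click
// timing, and no scroll engagement; each signal adds to the score.
function botProbability(f: {
  mouseDistance: number;         // total cursor travel; 0 suggests no real mouse
  mouseVelocityStdDev: number;   // near-zero variance suggests scripted movement
  clickIntervalStdDevMs: number; // tiny variance means metronomic clicking
  scrollDepth: number;           // 0 means no scroll engagement
}): number {
  let score = 0;
  if (f.mouseDistance === 0) score += 40;        // interaction absence
  if (f.mouseVelocityStdDev < 0.01) score += 25; // velocity consistency
  if (f.clickIntervalStdDevMs < 5) score += 25;  // timing regularity
  if (f.scrollDepth === 0) score += 10;          // no scrolling
  return Math.min(score, 100);
}
```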
ClickStream writes behavioral data to two complementary data stores, each optimized for different access patterns:
The Events dataset captures every raw interaction. It is append-only, high-throughput, and optimized for aggregation queries. Each write is a single structured event with up to 20 blob (string) fields and 20 double (numeric) fields.
The Scores dataset stores the computed behavioral scores per visitor per session. It is a relational SQLite database at the edge, optimized for point lookups and visitor-level queries. This is where the identity graph references behavioral data.
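A per-session score write to D1 can be sketched as an upsert. The table and column names below are assumptions; the prepare/bind/run chain mirrors D1's actual statement API, and a minimal structural type stands in for the binding so the sketch compiles outside a Worker:

```typescript
// Hypothetical upsert into the D1 Scores dataset. D1Like is a minimal
// structural stand-in for Cloudflare's D1Database binding.
interface D1Like {
  prepare(sql: string): {
    bind(...args: unknown[]): { run(): Promise<unknown> };
  };
}

const UPSERT_SCORES_SQL = `
  INSERT INTO visitor_scores
    (visitor_id, session_id, engagement, frustration, intent, updated_at)
  VALUES (?1, ?2, ?3, ?4, ?5, ?6)
  ON CONFLICT (visitor_id, session_id) DO UPDATE SET
    engagement = excluded.engagement,
    frustration = excluded.frustration,
    intent = excluded.intent,
    updated_at = excluded.updated_at
`;

async function writeScores(
  db: D1Like,
  s: { visitorId: string; sessionId: string; engagement: number; frustration: number; intent: number },
): Promise<void> {
  await db
    .prepare(UPSERT_SCORES_SQL)
    .bind(s.visitorId, s.sessionId, s.engagement, s.frustration, s.intent, Date.now())
    .run();
}
```

Upserting keyed on (visitor_id, session_id) keeps one row per visitor per session, matching the lower-volume, point-lookup access pattern described above.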
| Characteristic | Analytics Engine (Events) | D1 (Scores) |
|---|---|---|
| Data model | Append-only events | Relational (SQL) |
| Write pattern | Every event (high volume) | Per session update (lower volume) |
| Query pattern | Aggregations, time series | Point lookups, joins |
| Retention | 90 days (auto-scrub) | Configurable per customer |
| Schema | 20 blobs + 20 doubles | Full SQL schema |
| Best for | Dashboards, trends, funnels | Identity resolution, visitor profiles |
The Analytics Engine schema maps behavioral data to the 20 blob and 20 double fields available per event. This mapping is the critical interface between the edge worker and the analytics dashboard:
| Field | Content | Example |
|---|---|---|
| blob1 | Visitor ID | v_m2x7k9p4q_3f8h2j1n9 |
| blob2 | Session ID | s_m2x7k9p4r |
| blob3 | Event type | page_view |
| blob4 | Page URL | /pricing |
| blob5 | Page category | pricing |
| blob6 | Referrer | /features |
| blob7 | Entry source | organic |
| blob8 | UTM source | google |
| blob9 | UTM medium | cpc |
| blob10 | UTM campaign | brand_2026 |
| blob11 | Device type | desktop |
| blob12 | Country (from CF headers) | US |
| blob13 | Region | CA |
| blob14 | City | San Francisco |
| blob15 | Browser | Chrome 125 |
| blob16 | OS | macOS 15 |
| blob17 | Ad click ID (gclid/fbclid) | EAIaI... |
| blob18 | Hashed email (if available) | a1b2c3... |
| blob19 | Content affinity category | product |
| blob20 | LTV tier | tier_3 |
| Field | Content | Range |
|---|---|---|
| double1 | Engagement Score | 0–100 |
| double2 | Frustration Score | 0–100 |
| double3 | Intent Score | 0–100 |
| double4 | Attention Score | 0–100 |
| double5 | Navigation Fluency | 0–100 |
| double6 | Cognitive Load | 0–100 |
| double7 | Decision Readiness | 0–100 |
| double8 | Urgency Signal | 0–100 |
| double9 | Churn Risk | 0–100 |
| double10 | Purchase Timing | 0–100 |
| double11 | Bounce Probability | 0–100 |
| double12 | Conversion Probability | 0–100 |
| double13 | Content Consumption Depth | 0–100 |
| double14 | Bot Probability | 0–100 |
| double15 | Scroll depth (raw) | 0–1 |
| double16 | Time on page (ms) | 0–600000 |
| double17 | Click count (raw) | 0–N |
| double18 | Session page count | 0–N |
| double19 | Session duration (ms) | 0–N |
| double20 | Processing time (ms) | 0–10 |
double20 is reserved for self-monitoring: it records the total time in milliseconds the edge worker spent computing all 26 scores. This enables real-time performance monitoring of the scoring pipeline itself.
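The write itself can be sketched against Analytics Engine's actual `writeDataPoint({ blobs, doubles, indexes })` interface. The helper below is illustrative and only fills a subset of the 20+20 fields; the self-monitoring measurement for double20 is computed at write time:

```typescript
// Sketch of the edge worker's Analytics Engine write using the blob/double
// mapping above. The structural type mirrors the real Workers Analytics
// Engine binding; the helper and its argument layout are illustrative.
type AnalyticsEngineDataset = {
  writeDataPoint(point: {
    blobs?: string[];
    doubles?: number[];
    indexes?: string[];
  }): void;
};

function writeEvent(
  dataset: AnalyticsEngineDataset,
  scores: number[],       // doubles 1–14: model outputs, in table order
  rawMetrics: number[],   // doubles 15–19: raw scroll/time/click/session fields
  blobs: string[],        // blobs 1–20: identifiers and context, in table order
  processingStartMs: number,
): void {
  const processingTime = Date.now() - processingStartMs; // double20: self-monitoring
  dataset.writeDataPoint({
    indexes: [blobs[0]], // index on visitor ID (blob1) for fast filtering
    blobs,
    doubles: [...scores, ...rawMetrics, processingTime],
  });
}
```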
ClickStream measures double20 (processing time) across all edge locations. The results across a 30-day window:
| Metric | Value |
|---|---|
| P50 processing time | 1.2 ms |
| P90 processing time | 2.1 ms |
| P99 processing time | 2.8 ms |
| P99.9 processing time | 4.1 ms |
| Max observed | 6.3 ms |
| Models executed per event | 26 |
| Features extracted per event | 22 |
| Average payload size | 1.8 KB |
The P99 of 2.8ms means that 99% of all events across all global edge locations complete all 26 scoring models in under 3ms. This is faster than a single database query in most traditional architectures.
Edge-computed behavioral intelligence represents a paradigm shift from batch-processed analytics to real-time signal extraction. By executing 26 scoring models in under 3ms at the edge, ClickStream transforms raw clickstream events into actionable behavioral signals before the HTTP response is returned to the client.
The 3-phase architecture ensures that models build on each other in a dependency chain — core behavioral measurements feed psychological inferences, which in turn power predictive models. The dual dataset approach (Analytics Engine for aggregation, D1 for relational queries) ensures that behavioral data is optimally stored for both dashboarding and identity resolution.
The practical implications are significant: frustration detection that triggers support chat in real-time, intent scoring that alerts sales teams to high-value prospects while they are still on the site, churn risk signals that activate retention campaigns before the customer disengages, and bot detection that protects ad spend on every page view.
All of this happens at the edge, in the same infrastructure that sets the first-party cookie and resolves the visitor identity. No batch pipeline. No external ML service. No origin server dependency. Just 26 models, 22 features, and under 3 milliseconds.
Identify high-intent visitors in under 3ms and trigger conversion opportunities before they leave. Turn behavioral data into revenue.
GET EARLY ACCESS