AES-256-GCM per-site encryption, dual-hash PII strategy, 90-day auto-scrubbing, Parquet exports from ClickStream-managed storage, and GDPR/CCPA compliance by design.
Data sovereignty is not a feature; it is an architectural decision that must be made at the foundation of any analytics platform. ClickStream's data architecture ensures that personally identifiable information (PII) is encrypted at rest with AES-256-GCM using per-site encryption keys, hashed with a dual-hash strategy (SHA-256 for internal identity resolution, MD5 for ad-tech compatibility), automatically scrubbed after a configurable 90-day TTL, and exportable in Parquet format from ClickStream-managed Cloudflare R2 infrastructure. This whitepaper details every layer of the privacy architecture, from the encryption envelope to GDPR Article 17 (right to erasure) compliance, CCPA opt-out mechanics, and Schrems II positioning for EU-US data transfers.
Most analytics platforms operate on a trust model: the customer sends visitor data to vendor-controlled infrastructure, hopes the vendor handles it properly, and has no meaningful audit capability. This model has failed repeatedly, through data breaches at third-party analytics providers, unauthorized data sharing with advertising networks, and regulatory penalties for customers who delegated data processing to non-compliant vendors.
ClickStream's data sovereignty architecture addresses four threat categories:
Every customer site in ClickStream receives a unique 256-bit AES encryption key. All PII is encrypted in the edge worker before it is written to any data store.
| Aspect | Implementation |
|---|---|
| Key generation | 256-bit random key generated at site onboarding via Web Crypto API |
| Key storage | Cloudflare Workers KV, encrypted at rest by Cloudflare |
| Key rotation | Optional quarterly rotation; old keys retained for decryption of historical data |
| Key access | Only the edge worker serving that customer's CNAME has access to that key |
| Key deletion | Customer can request key destruction, rendering all stored data permanently undecryptable |
| Algorithm | AES-256-GCM with 96-bit IV and 128-bit authentication tag |
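
The rotation and destruction rows above imply a versioned key ring: new writes use the newest key, older versions stay available for reading historical records, and dropping every version leaves stored ciphertext unreadable. A minimal Python sketch of that bookkeeping follows. The `SiteKeyRing` class and its method names are hypothetical, the ring lives in memory rather than in Workers KV, and the AES-256-GCM call itself is left out; only the 256-bit key and 96-bit IV sizes come from the table.

```python
import secrets

KEY_BYTES = 32  # 256-bit key, per the table above
IV_BYTES = 12   # 96-bit IV, per the table above


class SiteKeyRing:
    """Hypothetical in-memory key ring for one site (not ClickStream's API)."""

    def __init__(self) -> None:
        self._keys: dict[int, bytes] = {}
        self._current = 0
        self.rotate()  # create version 1 at onboarding

    def rotate(self) -> int:
        """Add a fresh random key; older versions stay readable."""
        self._current += 1
        self._keys[self._current] = secrets.token_bytes(KEY_BYTES)
        return self._current

    def current(self) -> tuple[int, bytes]:
        """Return (version, key) to use for new writes."""
        return self._current, self._keys[self._current]

    def get(self, version: int) -> bytes:
        """Look up the key that a stored record was encrypted with."""
        if version not in self._keys:
            raise KeyError(f"key version {version} is unavailable")
        return self._keys[version]

    def destroy_all(self) -> None:
        """Drop every key version; existing ciphertext becomes unreadable."""
        self._keys.clear()


def new_iv() -> bytes:
    """Generate a fresh random 96-bit IV for one encryption operation."""
    return secrets.token_bytes(IV_BYTES)
```

In a design like this, each stored record would carry the key version it was written with, so that reads after a rotation can still find the right key. AES-GCM also requires that an IV is never reused with the same key, which is why the sketch draws a new random IV per operation.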
Not all data requires encryption. ClickStream classifies fields into three categories:
| Category | Examples | Treatment |
|---|---|---|
| PII (encrypted) | Email, phone, IP address, hashed email | AES-256-GCM encrypted before storage |
| Pseudonymous (hashed) | Visitor ID, session ID, device signature | Already pseudonymous; stored as-is |
| Behavioral (plaintext) | Scroll depth, click count, page URL, scores | Non-identifying; stored in plaintext for analytics |
The encryption boundary is the critical design decision: encrypt everything that could identify a natural person, hash everything that serves as an identity linkage, and leave behavioral data unencrypted for analytics performance.
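
The boundary rule can be expressed as a small routing step that runs before anything is stored. The Python sketch below is illustrative only: the field names are assumed spellings of the examples in the table, `split_by_treatment` is a hypothetical helper, and sending unknown fields to encryption is a conservative default chosen for this sketch rather than documented ClickStream behavior.

```python
# Field names are illustrative; the categories follow the table above.
PII_FIELDS = {"email", "phone", "ip_address"}
PSEUDONYMOUS_FIELDS = {"visitor_id", "session_id", "device_signature"}
BEHAVIORAL_FIELDS = {"scroll_depth", "click_count", "page_url", "scores"}


def split_by_treatment(event: dict) -> dict:
    """Route each field of an event to the treatment its category requires."""
    routed = {"encrypt": {}, "store_as_is": {}, "plaintext": {}}
    for name, value in event.items():
        if name in PSEUDONYMOUS_FIELDS:
            routed["store_as_is"][name] = value
        elif name in BEHAVIORAL_FIELDS:
            routed["plaintext"][name] = value
        else:
            # Known PII and any unrecognized field are both encrypted
            # (failing closed is an assumption of this sketch).
            routed["encrypt"][name] = value
    return routed
```

For example, an event carrying `email`, `visitor_id`, and `scroll_depth` would come back with one field in each bucket.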
When a visitor provides an email address (through form submission, login, or checkout), ClickStream generates two one-way hashes:
The SHA-256 hash is used exclusively within ClickStream's identity graph. It links visitor sessions, resolves cross-device identities, and powers the behavioral profile. SHA-256 is cryptographically secure and computationally infeasible to reverse.
The MD5 hash exists solely for compatibility with ad-tech platforms. Most demand-side platforms expect MD5-hashed email addresses as their identity key. While MD5 is not cryptographically secure (collision attacks exist), it is an industry standard for this specific use case and does not expose raw PII.
| Attribute | SHA-256 (Internal) | MD5 (Ad-Tech) |
|---|---|---|
| Purpose | Identity graph resolution | Ad platform audience matching |
| Security level | Cryptographically secure | Industry-standard (not collision-resistant) |
| Stored where | D1 identity graph | Only sent to ad platforms on demand |
| Reversible | No | Theoretically via rainbow tables (mitigated by salting in transit) |
| Shared externally | Never | Only to configured ad-tech partners |
The raw email address is encrypted with AES-256-GCM and stored in the PII envelope. The hashes exist independently. Even if the encryption key is lost, the hashes remain functional for identity resolution.
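
A minimal sketch of the dual-hash step, using Python's standard `hashlib`. The normalization rule (trim whitespace, lowercase) is an assumption; this document does not state how addresses are canonicalized before hashing, and `dual_hash` is a hypothetical helper name.

```python
import hashlib


def normalize_email(email: str) -> str:
    """Trim and lowercase so one address always hashes the same way (assumed rule)."""
    return email.strip().lower()


def dual_hash(email: str) -> dict:
    """Return both hex digests for one email address."""
    data = normalize_email(email).encode("utf-8")
    return {
        "sha256": hashlib.sha256(data).hexdigest(),  # internal identity graph
        "md5": hashlib.md5(data).hexdigest(),        # ad-tech audience matching
    }
```

Whatever normalization is used, it has to be applied identically on every write. Otherwise the same address yields different digests and the identity linkage breaks.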
All behavioral event data has a default time-to-live (TTL) of 90 days. After 90 days, event-level data is automatically and irreversibly deleted from both the Analytics Engine and D1.
| Data Type | TTL | After Expiry |
|---|---|---|
| Raw event data (page views, clicks, etc.) | 90 days | Permanently deleted from Analytics Engine |
| Behavioral scores per session | 90 days | Deleted from D1; aggregated scores retained |
| IP addresses (encrypted) | 90 days | Deleted; IP-based geo already resolved at write time |
| Identity graph nodes | Configurable (default: 400 days) | Node and all edges deleted |
| Aggregated analytics | No TTL | Retained indefinitely (non-identifying) |
| Parquet exports | Customer-controlled | Stored on ClickStream's R2 infrastructure; customers can export and manage their own retention |
Auto-scrubbing is implemented as a Cloudflare Cron Trigger that runs daily.
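
The selection logic a daily scrub job might apply can be sketched in a few lines of Python. This is an illustration of the retention table, not the production Worker: the record shape (`type`, `written_at`), the type names, and the `scrub` function are hypothetical, and all timestamps are assumed to share one timezone.

```python
from datetime import datetime, timedelta

# TTLs in days, taken from the retention table; None means no expiry.
TTL_DAYS = {
    "raw_event": 90,
    "session_score": 90,
    "ip_address": 90,
    "identity_node": 400,  # configurable default
    "aggregate": None,
}


def is_expired(data_type: str, written_at: datetime, now: datetime) -> bool:
    """True when a record has outlived the TTL for its data type."""
    ttl = TTL_DAYS[data_type]
    if ttl is None:
        return False
    return now - written_at > timedelta(days=ttl)


def scrub(records: list[dict], now: datetime) -> tuple[list[dict], list[dict]]:
    """Split records into (kept, deleted) for one daily run."""
    kept, deleted = [], []
    for record in records:
        expired = is_expired(record["type"], record["written_at"], now)
        (deleted if expired else kept).append(record)
    return kept, deleted
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps each run deterministic and easy to test.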
Customers can configure automatic daily exports of their analytics data to ClickStream-managed Cloudflare R2 infrastructure in Apache Parquet format. This ensures full data portability and export freedom.
| Attribute | Value |
|---|---|
| Format | Apache Parquet (columnar) |
| Compression | Snappy |
| Partitioning | By date (YYYY/MM/DD/) |
| Schema | Mirrors Analytics Engine blob/double mapping |
| PII handling | PII fields remain encrypted in export |
| File size | ~50–200 MB per day per site (depends on traffic) |
| Storage location | ClickStream-managed R2 infrastructure |
| Retention | Configurable; customers can export data at any time for their own retention needs |
Customers can query Parquet exports directly using standard tools: DuckDB, Apache Spark, Pandas, Polars, Amazon Athena, or any SQL engine that supports Parquet. This eliminates vendor lock-in — even if a customer leaves ClickStream, they retain all their historical data in an open format.
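
Because exports are partitioned by date, a reader only needs the prefixes for the days it cares about. The Python sketch below builds those prefixes in the `YYYY/MM/DD/` layout from the table. The helper names are hypothetical, and any bucket name, site prefix, or file name beneath a partition would be an assumption, since this document does not specify them.

```python
from datetime import date, timedelta


def partition_prefix(day: date) -> str:
    """Build the date partition prefix in the YYYY/MM/DD/ layout."""
    return f"{day.year:04d}/{day.month:02d}/{day.day:02d}/"


def prefixes_between(start: date, end: date) -> list[str]:
    """List partition prefixes for every day from start to end, inclusive."""
    days = (end - start).days
    return [partition_prefix(start + timedelta(days=i)) for i in range(days + 1)]
```

The resulting prefixes can then be joined to the export location and handed to whichever Parquet reader is in use, so a query over one week touches only that week's files.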
ClickStream classifies all data into four tiers with progressively stricter handling requirements:
| Tier | Classification | Examples | Encryption | TTL | Export |
|---|---|---|---|---|---|
| Tier 1 | Direct PII | Email, phone, name | AES-256-GCM | 90 days (or custom) | Encrypted in export |
| Tier 2 | Indirect PII | IP address, precise geolocation | AES-256-GCM | 90 days | Encrypted or omitted |
| Tier 3 | Pseudonymous | Visitor ID, session ID, hashes | None (already pseudonymous) | 400 days | Included |
| Tier 4 | Behavioral / Aggregate | Scores, page URLs, timestamps | None | 90 days (events) / indefinite (aggregates) | Included |
ClickStream's architecture addresses GDPR requirements structurally, not through policy alone:
| GDPR Article | Requirement | ClickStream Implementation |
|---|---|---|
| Art. 5(1)(e) | Storage limitation | 90-day TTL with automated scrubbing |
| Art. 5(1)(f) | Integrity and confidentiality | AES-256-GCM encryption at rest |
| Art. 6 | Lawful basis | CMP integration for consent management |
| Art. 7 | Conditions for consent | SDK checks consent before setting cookies |
| Art. 15 | Right of access | API endpoint for data subject access requests (DSAR) |
| Art. 17 | Right to erasure | Cascade delete across D1 and Workers KV within 72 hours; Analytics Engine events are excluded at query time until their TTL expires |
| Art. 20 | Data portability | Parquet exports in open format |
| Art. 25 | Data protection by design | Encryption, hashing, TTL, and minimization are architectural defaults |
| Art. 28 | Processor obligations | Standard DPA provided; sub-processor list: Cloudflare only |
| Art. 32 | Security of processing | AES-256-GCM + TLS 1.3 in transit + isolated worker environments |
| Art. 33 | Breach notification | Automated anomaly detection on data access patterns |
| Art. 35 | Data protection impact assessment | DPIA template provided to customers |
California's CCPA and its successor CPRA impose specific requirements for opt-out mechanics and data selling prohibitions:
ClickStream detects the Global Privacy Control signal (navigator.globalPrivacyControl) and treats it as a valid opt-out. No cookies are set and only session-level, non-identifying data is collected.
The Schrems II ruling invalidated the EU-US Privacy Shield, creating uncertainty around transatlantic data transfers. ClickStream's architecture is designed to minimize transfer risk:
Cloudflare's edge network processes data at the nearest edge location to the visitor. For EU visitors, data is processed at EU edge locations (Frankfurt, Amsterdam, Paris, Dublin, etc.). The data does not need to transit to US data centers for processing.
| Measure | Implementation |
|---|---|
| Encryption in transit | TLS 1.3 for all connections |
| Encryption at rest | AES-256-GCM with per-site keys |
| Pseudonymization | SHA-256 hashing of PII at ingestion |
| Data minimization | Only necessary fields collected; 90-day TTL |
| Access control | No human access to encrypted PII without customer authorization |
| Transparency | Sub-processor list limited to Cloudflare; updated with notice |
Enterprise customers can configure ClickStream to restrict data processing to EU-only edge locations. This is implemented via Cloudflare's Data Localization Suite, which pins data storage and processing to specific geographic regions. When enabled, no visitor data from EU visitors leaves EU jurisdiction, even in encrypted form.
When a data subject exercises their right to erasure (GDPR Art. 17 or CCPA right to delete), ClickStream executes a cascade deletion across all data stores:
| Data Store | Deletion Time | Method |
|---|---|---|
| D1 (relational) | < 1 hour | SQL DELETE with cascade |
| Workers KV (PII envelope) | < 1 hour | KV delete |
| Analytics Engine (events) | Up to 90 days (TTL) | Query-time exclusion + natural expiry |
| Parquet exports | N/A (customer-controlled) | Deletion manifest provided |
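
The cascade in the table mixes three behaviors: immediate deletion (D1, Workers KV), query-time exclusion until natural expiry (Analytics Engine), and a manifest handed to the customer (Parquet exports). A minimal in-memory Python sketch of how those could fit together is shown below. The `ErasureSketch` class, its attributes, and the use of a visitor ID as the erasure key are all hypothetical stand-ins, not ClickStream's implementation.

```python
class ErasureSketch:
    """In-memory stand-ins for the four stores; all names are hypothetical."""

    def __init__(self) -> None:
        self.d1_rows: dict[str, dict] = {}      # relational rows keyed by visitor ID
        self.kv_pii: dict[str, bytes] = {}      # encrypted PII envelopes
        self.events: list[dict] = []            # Analytics Engine events
        self.excluded_ids: set[str] = set()     # query-time exclusion list
        self.deletion_manifest: list[str] = []  # IDs to purge from Parquet exports

    def erase(self, visitor_id: str) -> None:
        """Apply the cascade for one data subject."""
        self.d1_rows.pop(visitor_id, None)         # D1: delete immediately
        self.kv_pii.pop(visitor_id, None)          # KV: delete immediately
        self.excluded_ids.add(visitor_id)          # events: hide until TTL expiry
        self.deletion_manifest.append(visitor_id)  # exports: customer applies this

    def query_events(self) -> list[dict]:
        """Return events, skipping any that belong to an erased subject."""
        return [e for e in self.events if e["visitor_id"] not in self.excluded_ids]
```

The exclusion set is what makes the Analytics Engine row workable: the events still exist physically until their TTL runs out, but no query path returns them in the meantime.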
Data sovereignty cannot be bolted on after the fact. It must be architectural. ClickStream's approach — per-site AES-256-GCM encryption, dual-hash PII strategy, 90-day automatic scrubbing, Parquet exports from managed infrastructure, and structural GDPR/CCPA compliance — ensures that data sovereignty is not a marketing claim but a cryptographic reality.
The key principles are simple: encrypt everything identifying, hash everything that serves as a linkage, automatically delete everything beyond the retention window, and give customers full export capability in open formats. No vendor lock-in. No data hostage situations. No compliance theater.
In a regulatory landscape that is tightening globally — the EU AI Act, Brazil's LGPD, India's DPDP Act, China's PIPL — organizations need analytics infrastructure that is not just compliant today but architecturally resilient to future regulation. ClickStream's privacy-first architecture provides that foundation.