Whitepaper

Data Sovereignty & PII Architecture

AES-256-GCM per-site encryption, dual-hash PII strategy, 90-day auto-scrubbing, Parquet exports from ClickStream-managed storage, and GDPR/CCPA compliance by design.

ClickStream Research · March 2026 · 20 min read

Abstract

Data sovereignty is not a feature; it is an architectural decision that must be made at the foundation of any analytics platform. ClickStream's data architecture ensures that personally identifiable information (PII) is encrypted at rest with AES-256-GCM using per-site encryption keys, hashed with a dual-hash strategy (SHA-256 for internal identity resolution, MD5 for ad-tech compatibility), automatically scrubbed after a configurable 90-day TTL, and exportable in Parquet format from ClickStream-managed Cloudflare R2 infrastructure. This whitepaper details every layer of the privacy architecture, from the encryption envelope to GDPR Article 17 (right to erasure) compliance, CCPA opt-out mechanics, and Schrems II positioning for EU-US data transfers.

Table of Contents

  1. The Data Sovereignty Threat Model
  2. AES-256-GCM Per-Site Encryption
  3. Dual-Hash PII Strategy
  4. 90-Day TTL Auto-Scrubbing
  5. Parquet Exports to R2
  6. Data Classification Tiers
  7. GDPR Compliance Architecture
  8. CCPA/CPRA Compliance
  9. Schrems II and EU-US Data Transfers
  10. Right to Erasure Implementation
  11. Conclusion

1. The Data Sovereignty Threat Model

Most analytics platforms operate on a trust model in which the customer sends visitor data to vendor-controlled infrastructure, hopes the vendor handles it properly, and has no meaningful audit capability. This model has failed repeatedly: data breaches at third-party analytics providers, unauthorized data sharing with advertising networks, and regulatory penalties for customers who delegated data processing to non-compliant vendors.

ClickStream's data sovereignty architecture addresses four threat categories:

2. AES-256-GCM Per-Site Encryption

Every customer site in ClickStream receives a unique 256-bit AES encryption key. All PII data is encrypted at the edge worker before being written to any data store.

2.1 Encryption Envelope

// Encryption at the edge worker
async function encryptPII(plaintext: string, siteKey: CryptoKey): Promise<string> {
  const iv = crypto.getRandomValues(new Uint8Array(12)); // 96-bit IV
  const encoded = new TextEncoder().encode(plaintext);
  const ciphertext = await crypto.subtle.encrypt(
    { name: 'AES-GCM', iv: iv, tagLength: 128 },
    siteKey,
    encoded
  );
  // Concatenate IV + ciphertext + auth tag
  const result = new Uint8Array(iv.length + ciphertext.byteLength);
  result.set(iv);
  result.set(new Uint8Array(ciphertext), iv.length);
  return btoa(String.fromCharCode(...result));
}
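
The inverse operation can be sketched as follows. This is a minimal sketch, assuming the envelope layout produced by encryptPII (a 12-byte IV followed by the ciphertext and tag, base64-encoded). The name decryptPII is hypothetical and is not taken from ClickStream's code.

```typescript
// Hypothetical counterpart to encryptPII; assumes a 12-byte IV prefix.
async function decryptPII(envelope: string, siteKey: CryptoKey): Promise<string> {
  // Decode the base64 envelope back into raw bytes
  const raw = Uint8Array.from(atob(envelope), (c) => c.charCodeAt(0));
  const iv = raw.slice(0, 12);      // first 96 bits are the IV
  const ciphertext = raw.slice(12); // remainder is ciphertext plus 128-bit tag
  // decrypt() rejects if the authentication tag does not verify
  const plaintext = await crypto.subtle.decrypt(
    { name: 'AES-GCM', iv, tagLength: 128 },
    siteKey,
    ciphertext
  );
  return new TextDecoder().decode(plaintext);
}
```

Because AES-GCM is authenticated, a tampered envelope or a wrong site key causes the decrypt call to reject rather than return corrupted plaintext.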

2.2 Key Management

Aspect | Implementation
Key generation | 256-bit random key generated at site onboarding via Web Crypto API
Key storage | Cloudflare Workers KV, encrypted at rest by Cloudflare
Key rotation | Optional quarterly rotation; old keys retained for decryption of historical data
Key access | Only the edge worker serving that customer's CNAME has access to that key
Key deletion | Customer can request key destruction, rendering all stored data permanently undecryptable
Algorithm | AES-256-GCM with 96-bit IV and 128-bit authentication tag
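
Retaining old keys after rotation implies that each envelope must record which key encrypted it. The paper does not describe how this is done; the following is one possible sketch, assuming a hypothetical "<version>:<payload>" prefix format and hypothetical names (KeyRing, tagEnvelope, resolveKey).

```typescript
// Hypothetical key ring: key version -> key material (type left generic).
type KeyRing<K> = Map<number, K>;

// Prefix the base64 payload with the version of the key that encrypted it.
function tagEnvelope(keyVersion: number, payload: string): string {
  return `${keyVersion}:${payload}`;
}

// Split a tagged envelope and look up the matching retained key.
function resolveKey<K>(envelope: string, ring: KeyRing<K>): { key: K; payload: string } {
  const sep = envelope.indexOf(':'); // ':' never occurs in base64 output
  if (sep < 0) throw new Error('envelope is missing a key version prefix');
  const version = Number(envelope.slice(0, sep));
  const key = ring.get(version);
  if (key === undefined) throw new Error(`no key retained for version ${version}`);
  return { key, payload: envelope.slice(sep + 1) };
}
```

Under this scheme, destroying a key amounts to removing its entry from the ring, after which every envelope tagged with that version fails to resolve.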

2.3 What Gets Encrypted

Not all data requires encryption. ClickStream classifies fields into three categories:

Category | Examples | Treatment
PII (encrypted) | Email, phone, IP address, hashed email | AES-256-GCM encrypted before storage
Pseudonymous (hashed) | Visitor ID, session ID, device signature | Already pseudonymous; stored as-is
Behavioral (plaintext) | Scroll depth, click count, page URL, scores | Non-identifying; stored in plaintext for analytics

The encryption boundary is the critical design decision: encrypt everything that could identify a natural person, hash everything that serves as an identity linkage, and leave behavioral data unencrypted for analytics performance.
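
The boundary can be expressed as a lookup from field name to treatment. This is an illustrative sketch only: the field names are hypothetical, and defaulting unknown fields to encryption is an assumption made here, not a behavior the paper states.

```typescript
type Treatment = 'encrypt' | 'store-as-is' | 'plaintext';

// Hypothetical field names; the real schema is not given in this paper.
const FIELD_TREATMENT = new Map<string, Treatment>([
  ['email', 'encrypt'],
  ['phone', 'encrypt'],
  ['ip_address', 'encrypt'],
  ['visitor_id', 'store-as-is'],
  ['session_id', 'store-as-is'],
  ['device_signature', 'store-as-is'],
  ['scroll_depth', 'plaintext'],
  ['click_count', 'plaintext'],
  ['page_url', 'plaintext'],
]);

// Assumption: unclassified fields are encrypted, so a new field
// cannot be stored in plaintext merely because nobody classified it.
function treatmentFor(field: string): Treatment {
  return FIELD_TREATMENT.get(field) ?? 'encrypt';
}

// Group the field names of a record by the treatment they require.
function partition(record: Record<string, unknown>): Record<Treatment, string[]> {
  const out: Record<Treatment, string[]> = { encrypt: [], 'store-as-is': [], plaintext: [] };
  for (const field of Object.keys(record)) {
    out[treatmentFor(field)].push(field);
  }
  return out;
}
```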

3. Dual-Hash PII Strategy

When a visitor provides an email address (through form submission, login, or checkout), ClickStream generates two one-way hashes of the normalized (lowercased, trimmed) address:

3.1 SHA-256 (Internal Identity)

// SHA-256 for internal identity graph
const sha256Hash = await crypto.subtle.digest(
  'SHA-256',
  new TextEncoder().encode(email.toLowerCase().trim())
);
const internalId = Array.from(new Uint8Array(sha256Hash))
  .map(b => b.toString(16).padStart(2, '0'))
  .join('');

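The SHA-256 hash is used exclusively within ClickStream's identity graph. It links visitor sessions, resolves cross-device identities, and powers the behavioral profile. SHA-256 is preimage-resistant, so the hash cannot be inverted directly. Because email addresses are guessable inputs, an unsalted hash can still be matched against a dictionary of known addresses, which is why the hash is treated as pseudonymous rather than anonymous data.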

3.2 MD5 (Ad-Tech Compatibility)

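// MD5 for ad-tech identity sync with demand-side platforms (DSPs)
// md5() is assumed to be imported from a hashing library; MD5 is not part of the standard Web Crypto API
const md5Hash = md5(email.toLowerCase().trim());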

The MD5 hash exists solely for compatibility with ad-tech platforms. Many demand-side platforms accept MD5-hashed email addresses as an identity key. MD5 is not cryptographically secure (practical collision attacks exist), but it remains widely used for this specific purpose and avoids transmitting the raw email address.

3.3 Hash Comparison

Attribute | SHA-256 (Internal) | MD5 (Ad-Tech)
Purpose | Identity graph resolution | Ad platform audience matching
Security level | Cryptographically secure | Industry-standard (not collision-resistant)
Stored where | D1 identity graph | Only sent to ad platforms on demand
Reversible | No | Theoretically via rainbow tables (mitigated by salting in transit)
Shared externally | Never | Only to configured ad-tech partners

The raw email address is encrypted with AES-256-GCM and stored in the PII envelope. The hashes exist independently. Even if the encryption key is lost, the hashes remain functional for identity resolution.
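
Both hashes depend on identical normalization, so it helps to derive them in one place. The sketch below uses Node's built-in crypto module rather than Web Crypto, on the assumption that a Node-compatible runtime is available; the names normalizeEmail and hashEmail are hypothetical.

```typescript
import { createHash } from 'node:crypto';

interface EmailHashes {
  sha256: string; // internal identity graph key
  md5: string;    // ad-tech matching key
}

// Trim and lowercase so that equivalent addresses hash identically.
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}

// Derive both hex digests from the same normalized input.
function hashEmail(email: string): EmailHashes {
  const normalized = normalizeEmail(email);
  return {
    sha256: createHash('sha256').update(normalized, 'utf8').digest('hex'),
    md5: createHash('md5').update(normalized, 'utf8').digest('hex'),
  };
}
```

With shared normalization, "  Jane@Example.COM " and "jane@example.com" resolve to the same pair of hashes, which is what allows sessions from different entry points to link to one identity.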

4. 90-Day TTL Auto-Scrubbing

All behavioral event data has a default time-to-live (TTL) of 90 days. After 90 days, event-level data is automatically and irreversibly deleted from both the Analytics Engine and D1.

4.1 What Gets Scrubbed

Data Type | TTL | After Expiry
Raw event data (page views, clicks, etc.) | 90 days | Permanently deleted from Analytics Engine
Behavioral scores per session | 90 days | Deleted from D1; aggregated scores retained
IP addresses (encrypted) | 90 days | Deleted; IP-based geo already resolved at write time
Identity graph nodes | Configurable (default: 400 days) | Node and all edges deleted
Aggregated analytics | No TTL | Retained indefinitely (non-identifying)
Parquet exports | Customer-controlled | Stored on ClickStream's R2 infrastructure; customers can export and manage their own retention

4.2 Scrubbing Implementation

Auto-scrubbing is implemented as a Cloudflare Cron Trigger that runs daily:

// Scheduled daily scrub (Cron Trigger)
export default {
  async scheduled(event, env, ctx) {
    const cutoff = Date.now() - (90 * 24 * 60 * 60 * 1000);

    // Delete expired events from D1
    await env.DB.prepare(
      `DELETE FROM events WHERE created_at < ?`
    ).bind(cutoff).run();

    // Delete expired sessions
    await env.DB.prepare(
      `DELETE FROM sessions WHERE last_active < ?`
    ).bind(cutoff).run();

    // Delete expired PII envelopes
    await env.DB.prepare(
      `DELETE FROM pii_store WHERE created_at < ?`
    ).bind(cutoff).run();

    // Analytics Engine handles its own TTL natively
  }
};
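
The scrub above applies a single 90-day cutoff, while the table in 4.1 lists different TTLs per data type. A per-type cutoff could be sketched as follows, assuming timestamps are stored as epoch milliseconds (as the Date.now() arithmetic above suggests). The type keys and function names are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// TTLs in days, taken from the table in 4.1; null means no expiry.
const TTL_DAYS = {
  raw_event: 90,
  session_score: 90,
  ip_address: 90,
  identity_node: 400, // configurable default
  aggregate: null,
} as const;

type DataType = keyof typeof TTL_DAYS;

// Oldest timestamp (epoch ms) still retained for a type, or null if it never expires.
function cutoffFor(type: DataType, now: number): number | null {
  const days = TTL_DAYS[type];
  return days === null ? null : now - days * DAY_MS;
}

// True when a record created at createdAt is older than its type's TTL.
function isExpired(type: DataType, createdAt: number, now: number): boolean {
  const cutoff = cutoffFor(type, now);
  return cutoff !== null && createdAt < cutoff;
}
```

Passing the current time in as a parameter, rather than calling Date.now() inside the function, keeps the expiry logic deterministic and easy to test.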

5. Parquet Exports to R2

Customers can configure automatic daily exports of their analytics data to ClickStream-managed Cloudflare R2 infrastructure in Apache Parquet format. This ensures full data portability and export freedom.

5.1 Export Format

Attribute | Value
Format | Apache Parquet (columnar)
Compression | Snappy
Partitioning | By date (YYYY/MM/DD/)
Schema | Mirrors Analytics Engine blob/double mapping
PII handling | PII fields remain encrypted in export
File size | ~50–200 MB per day per site (depends on traffic)
Storage location | ClickStream-managed R2 infrastructure
Retention | Configurable; customers can export data at any time for their own retention needs

5.2 Export Path Structure

s3://customer-bucket/clickstream/
  events/
    2026/03/10/events-20260310-001.parquet
    2026/03/10/events-20260310-002.parquet
  scores/
    2026/03/10/scores-20260310-001.parquet
  identity/
    2026/03/10/identity-graph-20260310.parquet

Customers can query Parquet exports directly using standard tools: DuckDB, Apache Spark, Pandas, Polars, Amazon Athena, or any SQL engine that supports Parquet. This reduces vendor lock-in: a customer who leaves ClickStream keeps the historical data they have exported, in an open format.
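
The date-partitioned layout in 5.2 can be generated mechanically. The sketch below reproduces the events and scores naming shown there; it assumes partitions follow UTC dates, which the paper does not specify, and the function name exportKey is hypothetical. The identity export uses a different file name pattern and is left out.

```typescript
type Dataset = 'events' | 'scores';

// Build the object key for one export part, relative to the clickstream/ prefix.
// Assumption: the partition date is the UTC calendar date.
function exportKey(dataset: Dataset, day: Date, part: number): string {
  const yyyy = String(day.getUTCFullYear());
  const mm = String(day.getUTCMonth() + 1).padStart(2, '0'); // months are 0-based
  const dd = String(day.getUTCDate()).padStart(2, '0');
  const seq = String(part).padStart(3, '0');
  return `${dataset}/${yyyy}/${mm}/${dd}/${dataset}-${yyyy}${mm}${dd}-${seq}.parquet`;
}

// exportKey('events', new Date(Date.UTC(2026, 2, 10)), 1)
// returns 'events/2026/03/10/events-20260310-001.parquet'
```

A predictable key scheme lets downstream tools select a date range by prefix alone, without listing every object.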

6. Data Classification Tiers

ClickStream classifies all data into four tiers with progressively stricter handling requirements:

Tier | Classification | Examples | Encryption | TTL | Export
Tier 1 | Direct PII | Email, phone, name | AES-256-GCM | 90 days (or custom) | Encrypted in export
Tier 2 | Indirect PII | IP address, precise geolocation | AES-256-GCM | 90 days | Encrypted or omitted
Tier 3 | Pseudonymous | Visitor ID, session ID, hashes | None (already pseudonymous) | 400 days | Included
Tier 4 | Behavioral / Aggregate | Scores, page URLs, timestamps | None | 90 days (events) / indefinite (aggregates) | Included

7. GDPR Compliance Architecture

ClickStream's architecture addresses GDPR requirements structurally, not through policy alone:

GDPR Article | Requirement | ClickStream Implementation
Art. 5(1)(e) | Storage limitation | 90-day TTL with automated scrubbing
Art. 5(1)(f) | Integrity and confidentiality | AES-256-GCM encryption at rest
Art. 6 | Lawful basis | CMP integration for consent management
Art. 7 | Conditions for consent | SDK checks consent before setting cookies
Art. 15 | Right of access | API endpoint for data subject access requests (DSAR)
Art. 17 | Right to erasure | Cascade delete from D1 and Workers KV within 72 hours; Analytics Engine events are excluded at query time until their TTL expires
Art. 20 | Data portability | Parquet exports in open format
Art. 25 | Data protection by design | Encryption, hashing, TTL, and minimization are architectural defaults
Art. 28 | Processor obligations | Standard DPA provided; sub-processor list: Cloudflare only
Art. 32 | Security of processing | AES-256-GCM + TLS 1.3 in transit + isolated worker environments
Art. 33 | Breach notification | Automated anomaly detection on data access patterns
Art. 35 | Data protection impact assessment | DPIA template provided to customers
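
The Art. 7 row (checking consent before setting cookies) can be illustrated with a small gate. This is a sketch under assumptions: the cookie name cs_vid, the ConsentState values, and the function name are all hypothetical, and the actual SDK interface is not described in this paper.

```typescript
type ConsentState = 'granted' | 'denied' | 'unknown';

// Returns a Set-Cookie header value, or null when no cookie may be set.
// Assumption: anything other than an explicit grant is treated as no consent.
function visitorCookieHeader(
  consent: ConsentState,
  visitorId: string,
  maxAgeSeconds: number
): string | null {
  if (consent !== 'granted') return null;
  const value = encodeURIComponent(visitorId);
  return `cs_vid=${value}; Max-Age=${maxAgeSeconds}; Path=/; Secure; SameSite=Lax`;
}
```

Treating an unknown state the same as a denial means a missing or slow consent signal results in no cookie, not a default cookie.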

8. CCPA/CPRA Compliance

California's CCPA and its successor CPRA impose specific requirements for opt-out mechanics and data selling prohibitions:

9. Schrems II and EU-US Data Transfers

The Schrems II ruling invalidated the EU-US Privacy Shield, creating uncertainty around transatlantic data transfers. ClickStream's architecture is designed to minimize transfer risk:

9.1 Data Localization

Cloudflare's edge network processes data at the nearest edge location to the visitor. For EU visitors, data is processed at EU edge locations (Frankfurt, Amsterdam, Paris, Dublin, etc.). The data does not need to transit to US data centers for processing.

9.2 Supplementary Measures

Measure | Implementation
Encryption in transit | TLS 1.3 for all connections
Encryption at rest | AES-256-GCM with per-site keys
Pseudonymization | SHA-256 hashing of PII at ingestion
Data minimization | Only necessary fields collected; 90-day TTL
Access control | No human access to encrypted PII without customer authorization
Transparency | Sub-processor list limited to Cloudflare; updated with notice

9.3 EU-Only Processing Option

Enterprise customers can configure ClickStream to restrict data processing to EU-only edge locations. This is implemented via Cloudflare's Data Localization Suite, which pins data storage and processing to specific geographic regions. When enabled, no visitor data from EU visitors leaves EU jurisdiction, even in encrypted form.

10. Right to Erasure Implementation

When a data subject exercises their right to erasure (GDPR Art. 17 or CCPA right to delete), ClickStream executes a cascade deletion across all data stores:

10.1 Deletion Flow

  1. Request received: Via customer API or ClickStream dashboard. Identifier: visitor ID, email hash, or raw email.
  2. Identity resolution: The identity graph resolves all linked identifiers (visitor IDs, session IDs, email hashes, device signatures) for the data subject.
  3. D1 deletion: All rows in all tables referencing any of the resolved identifiers are deleted.
  4. PII envelope destruction: The encrypted PII envelope for the data subject is deleted.
  5. Analytics Engine: Event-level data in Analytics Engine is flagged for exclusion from queries (Analytics Engine does not support row-level deletion, but TTL will naturally expire these events within 90 days).
  6. Parquet exports: A deletion manifest is generated and included in subsequent Parquet exports, listing the identifiers to exclude from any downstream processing.
  7. Confirmation: A deletion receipt with timestamp and scope is returned to the customer.
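
Steps 2 through 4 and step 7 of the flow above can be sketched with in-memory stand-ins for the identity graph and the data stores. All names here (IdentityGraph, resolveLinked, eraseSubject, DeletionReceipt) are hypothetical, and a Map stands in for D1 and Workers KV; this is an illustration of the traversal, not ClickStream's implementation.

```typescript
// Undirected identity graph: identifier -> directly linked identifiers.
type IdentityGraph = Map<string, Set<string>>;

// Breadth-first walk collecting every identifier reachable from start.
function resolveLinked(graph: IdentityGraph, start: string): Set<string> {
  const seen = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift()!;
    const neighbours = graph.get(current);
    if (!neighbours) continue;
    neighbours.forEach((next) => {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    });
  }
  return seen;
}

interface DeletionReceipt {
  requestedId: string;
  deletedIds: string[];
  completedAt: string; // ISO 8601 timestamp
}

// Delete every row and graph node linked to the subject, then return a receipt.
function eraseSubject(
  graph: IdentityGraph,
  rows: Map<string, unknown>, // stand-in for rows keyed by identifier
  id: string,
  now: Date
): DeletionReceipt {
  const ids = resolveLinked(graph, id);
  ids.forEach((linked) => {
    rows.delete(linked);
    graph.delete(linked);
  });
  return {
    requestedId: id,
    deletedIds: Array.from(ids).sort(),
    completedAt: now.toISOString(),
  };
}
```

Resolving the full connected component before deleting anything matters: deleting while traversing could remove the edges needed to reach the remaining identifiers.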

10.2 Deletion Timing

Data Store | Deletion Time | Method
D1 (relational) | < 1 hour | SQL DELETE with cascade
Workers KV (PII envelope) | < 1 hour | KV delete
Analytics Engine (events) | Up to 90 days (TTL) | Query-time exclusion + natural expiry
Parquet exports | N/A (customer-controlled) | Deletion manifest provided
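
Query-time exclusion and the deletion manifest both come down to filtering rows against a set of erased identifiers. The following is a minimal sketch, assuming a hypothetical manifest shape and row shape; the paper does not define either format.

```typescript
interface EventRow {
  visitorId: string;
  name: string;
}

// Hypothetical manifest format: a timestamp plus the identifiers to exclude.
interface DeletionManifest {
  generatedAt: string; // ISO 8601 timestamp
  excludedIds: string[];
}

// Build a manifest with de-duplicated, sorted identifiers.
function buildManifest(erased: Iterable<string>, now: Date): DeletionManifest {
  return {
    generatedAt: now.toISOString(),
    excludedIds: Array.from(new Set(erased)).sort(),
  };
}

// Drop every row belonging to an erased identifier.
function applyManifest(rows: EventRow[], manifest: DeletionManifest): EventRow[] {
  const excluded = new Set(manifest.excludedIds);
  return rows.filter((row) => !excluded.has(row.visitorId));
}
```

A customer holding exported Parquet files could apply the same filter in their own pipeline, since the exports themselves are not rewritten.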

11. Conclusion

Data sovereignty cannot be bolted on after the fact. It must be architectural. ClickStream's approach — per-site AES-256-GCM encryption, dual-hash PII strategy, 90-day automatic scrubbing, Parquet exports from managed infrastructure, and structural GDPR/CCPA compliance — ensures that data sovereignty is not a marketing claim but a cryptographic reality.

The key principles are simple: encrypt everything identifying, hash everything that serves as a linkage, automatically delete everything beyond the retention window, and give customers full export capability in open formats. No vendor lock-in. No data hostage situations. No compliance theater.

In a regulatory landscape that is tightening globally — the EU AI Act, Brazil's LGPD, India's DPDP Act, China's PIPL — organizations need analytics infrastructure that is not just compliant today but architecturally resilient to future regulation. ClickStream's privacy-first architecture provides that foundation.
