06-Operations / 06.05. Scalability Analysis
06.05. Scalability Analysis
1. Executive Summary
This document provides a comprehensive analysis of the Car Pulse Tracker (CPT) system's scalability, aiming for "world-class" horizontal scalability capable of handling millions of global users (e.g., all Tesla owners) during a sudden surge.
The current architecture is a modern, containerized FastAPI application deployed on Google Cloud Run with an asynchronous core and Redis-based state management. While the foundation is solid, several critical bottlenecks and architectural patterns currently block true horizontal scalability.
Current Scalability Score: 6/10
- Strengths: Containerized, Async (FastAPI), Cloud Run (Auto-scaling), Shared Session State (Redis).
- Weaknesses: Ephemeral Local Storage (PDFs), BASIC Tier Redis (No HA), Per-Instance Rate Limiting, Missing Persistent Database, CPU-Intensive Background Tasks.
2. Horizontal Scalability Analysis
2.1 Backend (Cloud Run)
- Current Status: Scales from 1 to 100 instances automatically. CPU/Memory limits are set to 2 vCPU and 2GB.
- Scalability Limit: The current cap of 100 instances is sufficient for medium load (roughly 8,000 concurrent requests: 100 instances × 80 requests per instance), but "world-class" scale may require raising this to 1000+ or moving to Google Kubernetes Engine (GKE) for more granular control.
- Efficiency: startup_cpu_boost is enabled, which is excellent for quick cold starts during traffic spikes.
2.2 Frontend (Cloud Storage + CDN)
- Current Status: Served from a static bucket with a Load Balancer/CDN.
- Scalability Limit: Virtually unlimited. GCS + Cloud CDN is the gold standard for high-traffic static sites.
2.3 State Management (Redis)
- Current Status: Using Memorystore Redis (BASIC Tier, 1GB). All transient state (OAuth, Sessions, Payments) is in Redis.
- Critical Issue: No High Availability (HA). In the BASIC tier, if the Redis instance fails, all active user sessions, OAuth states, and pending payments are lost.
- Critical Issue: 1GB Capacity. A global surge of users will quickly exhaust 1GB of memory if session TTLs are long.
2.4 File Storage (Local Disk)
- Current Status: Receipt and Report PDFs are written to storage/receipts and storage/reports on the container's ephemeral disk.
- Fatal Scalability Blocker: Cloud Run instances do not share their local filesystem.
- Inconsistency: User hits Instance A to generate a report, then hits Instance B to download it → 404 Not Found.
- Data Loss: When Instance A scales down or restarts, all generated PDFs are deleted permanently.
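Section 4 proposes migrating these writes to a shared Cloud Storage bucket. A minimal sketch of that alternative, assuming the google-cloud-storage client library; the `upload_report` helper and the `reports/{id}.pdf` path layout are hypothetical:

```python
# Sketch: persist generated PDFs in a shared GCS bucket instead of local disk,
# so any Cloud Run instance can serve a file another instance created.
# `bucket` is expected to behave like a google.cloud.storage.Bucket, e.g.:
#
#   from google.cloud import storage
#   bucket = storage.Client().bucket("cpt-generated-pdfs")  # name hypothetical

def upload_report(bucket, report_id: str, pdf_bytes: bytes) -> str:
    """Write a generated PDF where every instance can see it."""
    blob = bucket.blob(f"reports/{report_id}.pdf")
    blob.upload_from_string(pdf_bytes, content_type="application/pdf")
    return blob.name  # stable object path, independent of which instance wrote it
```

Because the object path is deterministic, the download endpoint no longer cares which instance handled the generation request.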
3. Identified Bottlenecks
3.1 PDF Generation (CPU-Bound)
PDF generation via WeasyPrint is a CPU-intensive operation. While it's offloaded to threads using anyio.to_thread, it still consumes the instance's 2 vCPUs. Under heavy load, this will lead to:
- Increased latency for all API requests.
- Excessive scaling of instances just to handle the CPU load.
- Potential instance crashes if memory spikes (PDF rendering is memory-heavy).
3.2 Rate Limiting (Instance-Level)
The slowapi limiter is currently configured with in-memory storage.
- Effect: If the limit is 10 requests/min and there are 100 instances, the effective global limit becomes 1,000 requests/min, and enforcement is inconsistent: a client whose requests are spread across 10 different instances can make 100 requests/min without any single instance ever throttling it.
- Result: Inconsistent user experience and potential for abuse/DOS.
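The inconsistency is easy to reproduce with a toy model: give each "instance" its own in-memory counter and spread one client's requests across them. All names here are illustrative, not the actual slowapi internals.

```python
from collections import defaultdict

LIMIT = 10  # requests per minute, enforced per instance

class Instance:
    """Toy Cloud Run instance with its own in-memory rate-limit counter."""
    def __init__(self):
        self.counts = defaultdict(int)

    def allow(self, client_ip: str) -> bool:
        if self.counts[client_ip] >= LIMIT:
            return False  # this instance has seen too many requests
        self.counts[client_ip] += 1
        return True

# One client, 10 instances: every instance grants its own full quota of 10.
instances = [Instance() for _ in range(10)]
granted = sum(
    inst.allow("203.0.113.7")
    for inst in instances
    for _ in range(LIMIT)
)
# granted is 100 -- ten times the intended per-client limit
```

With a shared counter (e.g. in Redis), `granted` would stop at 10 no matter how many instances the requests landed on.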
3.3 External API Dependencies (Tesla/Stripe/PayPal)
The system is heavily dependent on external APIs.
- Tesla Fleet API: Known for strict rate limits and occasional latency. A global surge will hit Tesla's rate limits regardless of CPT's scalability.
- Latency: Each external call adds wait time, consuming one of the 80 concurrent request slots in Cloud Run.
3.4 Missing Persistent Database
The system currently has no persistent database (only Redis).
- Issue: Historical data (payment records, long-term report access, user preferences) has no home. Redis is for transient data only.
4. "World-Class" Scaling Roadmap
To achieve total horizontal scalability, the following architectural changes are required:
Phase 1: Shared Storage & State
- Migrate to GCS: Replace the local storage/ directory with Google Cloud Storage. All PDFs (receipts/reports) must be saved to and served from a shared bucket.
- Redis HA: Upgrade Memorystore Redis to the STANDARD_HA tier with cross-zonal replication.
- Global Rate Limiting: Configure slowapi to use the existing Redis instance as its storage backend. This ensures rate limits are respected globally across all 100+ instances.
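For slowapi this is a small configuration change: its Limiter accepts a storage_uri (backed by the limits library), so pointing it at the existing Memorystore instance makes every replica share one set of counters. A configuration sketch; the Redis host/port are placeholders:

```python
from slowapi import Limiter
from slowapi.util import get_remote_address

# storage_uri switches slowapi from per-instance in-memory counters to a
# shared Redis backend. The address below is a placeholder for the existing
# Memorystore instance's private IP.
limiter = Limiter(
    key_func=get_remote_address,
    storage_uri="redis://10.0.0.3:6379",
    default_limits=["10/minute"],
)
```

With this in place, "10 requests/min" means 10 globally, regardless of which instances the requests hit.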
Phase 2: Decoupling & Background Tasks
- Asynchronous Generation: Move PDF generation to a background task queue (e.g., Cloud Tasks or Pub/Sub).
- API returns 202 Accepted.
- Worker service (another Cloud Run instance) generates the PDF and uploads to GCS.
- Frontend polls or uses WebSockets/SSE to notify the user.
- Dedicated PDF Service: Spin off the PDF generation logic into its own microservice to isolate the CPU-intensive work from the main API.
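The enqueue step above can be sketched as follows. Building the task payload is plain data, so it is shown separately from the google-cloud-tasks client call; the queue path, worker URL, and `build_pdf_task` helper are hypothetical:

```python
import json

def build_pdf_task(report_id: str, worker_url: str) -> dict:
    """Build a Cloud Tasks HTTP task telling the worker which PDF to render."""
    return {
        "http_request": {
            "http_method": "POST",  # tasks_v2.HttpMethod.POST with the real client
            "url": worker_url,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"report_id": report_id}).encode(),
        }
    }

# With the real client, the API handler would do roughly:
#   from google.cloud import tasks_v2
#   client = tasks_v2.CloudTasksClient()
#   parent = client.queue_path("my-project", "us-central1", "pdf-queue")
#   client.create_task(parent=parent, task=build_pdf_task(report_id, worker_url))
# ...then immediately return 202 Accepted with a status URL for polling.
```

The API's latency is now the cost of one enqueue call, not of a full WeasyPrint render.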
Phase 3: Persistent Data
- Cloud SQL: Implement a persistent relational database (PostgreSQL/MySQL) for long-term records.
- Multi-Region Deployment: For "world-class" scale, deploy the API in multiple GCP regions (e.g., US, EU, Asia) using a Global Load Balancer. This reduces latency and adds geographic redundancy.
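A sketch of what the persistent layer could look like with SQLAlchemy. The PaymentRecord schema is hypothetical, and in-memory SQLite stands in for Cloud SQL only so the snippet is self-contained:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class PaymentRecord(Base):
    """Hypothetical long-term record that currently has no home in Redis."""
    __tablename__ = "payment_records"
    id = Column(Integer, primary_key=True)
    user_id = Column(String(64), nullable=False, index=True)
    provider = Column(String(16), nullable=False)   # e.g. "stripe" or "paypal"
    amount_cents = Column(Integer, nullable=False)
    status = Column(String(16), nullable=False)

# In production this URL would point at Cloud SQL (e.g. via the Cloud SQL
# connector); SQLite here only keeps the sketch runnable anywhere.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
```

Unlike Redis, rows here survive restarts and TTLs, which is what payment history and long-term report access require.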
Phase 4: Tesla API Optimization
- Response Caching: Cache common Tesla API responses (that don't change often) in Redis to reduce the number of outgoing calls.
- Token Refreshing Worker: Move token refreshing to a background process to ensure tokens are always fresh before the user requests a report.
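A cache-aside sketch for such calls. The key scheme, TTL, and `fetch_vehicle_data` callable are illustrative; the client is injected, so anything with redis-py's get/setex interface works:

```python
import json

CACHE_TTL_SECONDS = 60  # illustrative; tune per endpoint volatility

def get_vehicle_data(redis_client, fetch_vehicle_data, vehicle_id: str) -> dict:
    """Cache-aside wrapper: only call the upstream Tesla API on a cache miss."""
    key = f"tesla:vehicle:{vehicle_id}"
    cached = redis_client.get(key)
    if cached is not None:
        return json.loads(cached)  # served from Redis, no outgoing call
    data = fetch_vehicle_data(vehicle_id)  # the rate-limited upstream call
    redis_client.setex(key, CACHE_TTL_SECONDS, json.dumps(data))
    return data
```

Every cache hit is one fewer request counted against Tesla's rate limits, which matters far more during a surge than any amount of instance scaling.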
5. Conclusion
The Car Pulse Tracker is well-positioned for scaling, but the ephemeral filesystem dependency is a critical failure point for horizontal scaling. By migrating to Cloud Storage for files and upgrading to HA Redis for state, the system can immediately scale to handle thousands of concurrent users. Moving to a decoupled background task model for PDF generation will finalize its journey to "world-class" performance.