06.05. Scalability Analysis

1. Executive Summary

This document analyzes the scalability of the Car Pulse Tracker (CPT) system against a "world-class" target: horizontal scalability capable of absorbing a sudden surge from millions of global users (e.g., all Tesla owners).

The current architecture is a modern, containerized FastAPI application deployed on Google Cloud Run with an asynchronous core and Redis-based state management. While the foundation is solid, several critical bottlenecks and architectural patterns currently block true horizontal scalability.

Current Scalability Score: 6/10


2. Horizontal Scalability Analysis

2.1 Backend (Cloud Run)

The FastAPI backend is stateless and containerized, so Cloud Run can scale it horizontally by adding instances. Each instance serves up to 80 concurrent requests on 2 vCPUs, which ties scaling behavior directly to the CPU-bound work described in Section 3.

2.2 Frontend (Cloud Storage + CDN)

Static frontend assets are served from Cloud Storage behind a CDN. This tier scales with no practical limit and is not a bottleneck.

2.3 State Management (Redis)

All shared state lives in a single Memorystore Redis instance. Centralizing state in Redis is the right pattern for horizontal scaling, but without the STANDARD_HA tier the instance is a single point of failure (see Phase 1).

2.4 File Storage (Local Disk)

Generated files are written to a local storage/ directory on each instance's ephemeral filesystem. Instances do not share disks, so a file written by one instance is invisible to every other instance; this is the critical blocker for horizontal scaling (see Phase 1).


3. Identified Bottlenecks

3.1 PDF Generation (CPU-Bound)

PDF generation via WeasyPrint is CPU-intensive. Although it is offloaded to worker threads using anyio.to_thread, the work still consumes the instance's 2 vCPUs. Under heavy load, this leads to:

  • Increased latency for all API requests served by the same instance.
  • Excessive instance scaling driven purely by CPU load.
  • Potential instance crashes from memory spikes (PDF rendering is memory-heavy).
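The offloading pattern described above can be sketched as follows. The `render_pdf` function is a placeholder for the WeasyPrint render, and the stdlib `asyncio.to_thread` shown here behaves like the `anyio.to_thread.run_sync` call used in the codebase. The key point: offloading keeps the event loop responsive, but the CPU cost stays on the instance.

```python
import asyncio
import time

def render_pdf(html: str) -> bytes:
    # Placeholder for WeasyPrint rendering: CPU-bound, blocks the
    # calling thread for the duration of the layout/paint work.
    time.sleep(0.1)  # simulate rendering cost
    return b"%PDF-1.7 " + html.encode()

async def generate_report(html: str) -> bytes:
    # Run the blocking render in a worker thread so the event loop
    # stays free to serve other requests. The CPU cost is unchanged:
    # it is moved off the event-loop thread, not off the instance.
    return await asyncio.to_thread(render_pdf, html)

pdf = asyncio.run(generate_report("<h1>Report</h1>"))
print(pdf[:8])  # b'%PDF-1.7'
```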

3.2 Rate Limiting (Instance-Level)

The slowapi limiter is currently configured with in-memory storage, so each Cloud Run instance enforces its own independent counter.

  • Effect: with a limit of 10 requests/min and 100 instances, the effective global limit is 1,000 requests/min, and enforcement is inconsistent: a client routed across 10 different instances can make 100 requests before any single instance throttles it.
  • Result: an inconsistent user experience and an opening for abuse or denial-of-service.
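The inconsistency can be illustrated with a toy simulation: each instance keeps its own in-memory counter, so a client whose requests are spread across instances is throttled far later than intended. The instance count and limit below are the illustrative figures from above.

```python
from collections import defaultdict

PER_INSTANCE_LIMIT = 10  # requests/min the limiter is configured with
INSTANCES = 100          # Cloud Run instances, each with its own counter

counters = defaultdict(int)  # instance_id -> requests seen this minute

def allow(instance_id: int) -> bool:
    # In-memory slowapi storage: every instance enforces the limit
    # independently, with no shared view of the client's total traffic.
    counters[instance_id] += 1
    return counters[instance_id] <= PER_INSTANCE_LIMIT

# A single client fans 1000 requests out across all instances.
allowed = sum(allow(i % INSTANCES) for i in range(1000))
print(allowed)  # 1000: every request passes, 100x the intended limit
```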

3.3 External API Dependencies (Tesla/Stripe/PayPal)

The system depends heavily on external APIs.

  • Tesla Fleet API: known for strict rate limits and occasional latency. A global surge will hit Tesla's rate limits regardless of how well CPT itself scales.
  • Latency: each external call holds a request open, consuming one of the 80 concurrent request slots per Cloud Run instance for the duration of the wait.

3.4 Missing Persistent Database

The system currently has no persistent database; Redis is the only data store.

  • Issue: historical data (payment records, long-term report access, user preferences) has no home. Redis is suitable for transient data only.


4. "World-Class" Scaling Roadmap

To achieve total horizontal scalability, the following architectural changes are required:

Phase 1: Storage & State (Immediate)

  1. Migrate to GCS: Replace local storage/ directory with Google Cloud Storage. All PDFs (receipts/reports) must be saved to and served from a shared bucket.
  2. Redis HA: Upgrade Memorystore Redis to STANDARD_HA tier with cross-zonal replication.
  3. Global Rate Limiting: Configure slowapi to use the existing Redis instance as its storage backend. This ensures rate limits are respected globally across all 100+ instances.
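A minimal sketch of the rate-limiting change, assuming slowapi's `storage_uri` option (backed by the limits library) and a reachable Memorystore endpoint; the hostname and limit below are placeholders, and this is a config fragment rather than a runnable service:

```python
from slowapi import Limiter
from slowapi.util import get_remote_address

# Point the limiter at the shared Redis instance instead of per-process
# memory; all Cloud Run instances then share the same counters.
limiter = Limiter(
    key_func=get_remote_address,
    storage_uri="redis://10.0.0.3:6379",  # placeholder Memorystore host
    default_limits=["10/minute"],
)
```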

Phase 2: Decoupling & Background Tasks

  1. Asynchronous Generation: Move PDF generation to a background task queue (e.g., Cloud Tasks or Pub/Sub).
    • API returns 202 Accepted.
    • Worker service (another Cloud Run instance) generates the PDF and uploads to GCS.
    • Frontend polls or uses WebSockets/SSE to notify the user.
  2. Dedicated PDF Service: Spin off the PDF generation logic into its own microservice to isolate the CPU-intensive work from the main API.
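The request/worker handshake above can be sketched independently of the queueing technology. In this sketch a plain dict stands in for Redis job state, a list stands in for the Cloud Tasks / Pub/Sub queue, and the render and GCS upload are hypothetical inline stand-ins; none of these names come from the CPT codebase.

```python
import uuid

jobs = {}   # stand-in for Redis: job_id -> {"status": ..., "url": ...}
queue = []  # stand-in for a Cloud Tasks / Pub/Sub queue

def submit_report(html: str) -> dict:
    # API side: record the job, enqueue it, and return immediately.
    # In the real service this handler responds with HTTP 202 Accepted.
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "url": None}
    queue.append((job_id, html))
    return {"status_code": 202, "job_id": job_id}

def worker_step() -> None:
    # Worker side: pull a job, render the PDF, upload it, mark it done.
    # The frontend polls the job status (or is notified via SSE/WebSocket).
    job_id, html = queue.pop(0)
    pdf = b"%PDF " + html.encode()          # stands in for render_pdf()
    url = f"gs://cpt-reports/{job_id}.pdf"  # stands in for a GCS upload
    jobs[job_id] = {"status": "done", "url": url}

resp = submit_report("<h1>Report</h1>")
worker_step()
print(jobs[resp["job_id"]]["status"])  # done
```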

Phase 3: Persistent Data

  1. Cloud SQL: Implement a persistent relational database (PostgreSQL/MySQL) for long-term records.
  2. Multi-Region Deployment: For "world-class" scale, deploy the API in multiple GCP regions (e.g., US, EU, Asia) using a Global Load Balancer. This reduces latency and adds geographic redundancy.

Phase 4: Tesla API Optimization

  1. Response Caching: Cache common Tesla API responses (that don't change often) in Redis to reduce the number of outgoing calls.
  2. Token Refreshing Worker: Move token refreshing to a background process to ensure tokens are always fresh before the user requests a report.
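The caching idea in step 1 is a standard cache-aside pattern. In this sketch a dict with expiry timestamps stands in for Redis SETEX, and `fetch_vehicle_data` is a hypothetical wrapper around a Tesla Fleet API call; the TTL and response fields are illustrative.

```python
import time

CACHE_TTL = 300  # seconds; tune per endpoint volatility
_cache = {}      # stand-in for Redis: key -> (expires_at, value)

calls = 0  # counts upstream Tesla API hits, for illustration

def fetch_vehicle_data(vin: str) -> dict:
    # Hypothetical upstream call to the Tesla Fleet API.
    global calls
    calls += 1
    return {"vin": vin, "odometer": 42_000}

def get_vehicle_data(vin: str) -> dict:
    # Cache-aside: serve from the cache if fresh, otherwise call
    # Tesla once and cache the response (Redis SETEX equivalent).
    key = f"tesla:vehicle:{vin}"
    hit = _cache.get(key)
    if hit and hit[0] > time.monotonic():
        return hit[1]
    value = fetch_vehicle_data(vin)
    _cache[key] = (time.monotonic() + CACHE_TTL, value)
    return value

get_vehicle_data("5YJ3E1EA")
get_vehicle_data("5YJ3E1EA")
print(calls)  # 1: the second request is served from cache
```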

5. Conclusion

The Car Pulse Tracker is well-positioned for scaling, but the ephemeral filesystem dependency is a critical failure point for horizontal scaling. By migrating to Cloud Storage for files and upgrading to HA Redis for state, the system can immediately scale to handle thousands of concurrent users. Moving to a decoupled background task model for PDF generation will finalize its journey to "world-class" performance.