05.01.Puppeteer PDF Generation.spec

1. Objective

Introduce an alternative "proactive" control flow for PDF report generation. Instead of generating reports on-demand when the user clicks "Download", the system will automatically trigger a background PDF generation process as soon as Tesla vehicle data is successfully fetched. This minimizes perceived latency by having the PDF ready in a GCS bucket before the user even requests it.

2. Architecture & Components

2.1 New Service: pdf-api (Node.js)

2.2 Infrastructure: Storage Bucket

2.3 State Management: Redis

Redis is the single source of truth for both the Tesla session/report data and the PDF lifecycle metadata. No new datastore is introduced.

3. Control Flow: Proactive Generation

3.1 The "Trigger" Phase

  1. Data Fetch: User initiates a Tesla data fetch in the WUI.
  2. Success: The Python API (cpt-api) successfully retrieves and processes vehicle metrics. The session — including the per-VIN report data — is already in Redis.
  3. Mint print token (the "hook"): cpt-api mints a short-lived JWT (TTL 60s) with claims {session_id, vin, purpose: "print", exp} using the existing JWT_SECRET_KEY (see §5.5). The session_id is the UI-control hook: it is the same identifier the WUI dashboard already operates on, and it is the Redis lookup key for the session/report data. The print token is therefore a signed, time-boxed pointer from the UI control state into Redis.
  4. Proactive Call: Immediately before returning the JSON data to the WUI, cpt-api fires a non-blocking internal request to pdf-api via asyncio.create_task(...) so the user response is not delayed.
    • Endpoint: POST http://con-${ORG}-${APP}-pdf-api:3000/generate
    • Payload: { vin, user_id, locale, print_token }. No data payload; the WUI re-fetches from Redis via the print token. (session_id is not duplicated in the payload because it is already a claim inside print_token.)
    • Auth: Internal shared secret validation header X-Internal-Secret. The secret is provisioned via step 029-create-gcp-secrets and injected into both containers as an env var.
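
The receiving side of this call can be sketched as follows. This is a minimal illustration, not the pdf-api implementation: the helper names secretsMatch and validateGenerateRequest are hypothetical, the four payload fields are assumed to be required non-empty strings, and the expected secret is assumed to be passed in from whatever env var the deployment injects (the spec names only the header).

```javascript
const crypto = require('node:crypto');

// Compare two secrets in constant time. Hashing first gives both buffers the
// same length, which crypto.timingSafeEqual requires.
function secretsMatch(provided, expected) {
  if (typeof provided !== 'string' || typeof expected !== 'string' || expected === '') {
    return false;
  }
  const a = crypto.createHash('sha256').update(provided).digest();
  const b = crypto.createHash('sha256').update(expected).digest();
  return crypto.timingSafeEqual(a, b);
}

// Assumption: every field of the /generate payload is a required string.
const REQUIRED_FIELDS = ['vin', 'user_id', 'locale', 'print_token'];

// Returns { status, error? }. Node lowercases incoming header names, so the
// X-Internal-Secret header is read as 'x-internal-secret'.
function validateGenerateRequest(headers, body, expectedSecret) {
  if (!secretsMatch(headers['x-internal-secret'], expectedSecret)) {
    return { status: 401, error: 'invalid internal secret' };
  }
  const missing = REQUIRED_FIELDS.filter(
    (field) => typeof body?.[field] !== 'string' || body[field] === ''
  );
  if (missing.length > 0) {
    return { status: 400, error: `missing fields: ${missing.join(', ')}` };
  }
  // 202: accepted for asynchronous rendering, nothing rendered yet.
  return { status: 202 };
}
```

Comparing hashes rather than the raw strings avoids leaking the secret's length through an early length check, at the cost of two extra SHA-256 computations per request.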

3.2 The "Rendering" Phase (pdf-api)

  1. Receive: pdf-api accepts the request and immediately returns 202 Accepted. Rendering happens asynchronously on the worker pool.
  2. Configure viewport (visual parity with WUI desktop layout): Before navigation, the pool worker sets the Puppeteer page to a high-resolution desktop viewport so ECharts canvases, photos, and CSS-grid layouts render at the same dimensions a desktop user would see, avoiding "mobile-collapsed" PDFs:
       await page.setViewport({ width: 1920, height: 1080, deviceScaleFactor: 2 });
     deviceScaleFactor: 2 produces 2× pixel density so chart strokes and rasterized images are crisp in the PDF. All three values are configurable via env (PDF_VIEWPORT_WIDTH, PDF_VIEWPORT_HEIGHT, PDF_DEVICE_SCALE_FACTOR); the defaults match the WUI's intended desktop breakpoint.
  3. Navigate with deterministic wait: The worker navigates to the WUI print route and explicitly waits for the page to declare itself ready:
       await page.goto(`http://con-${ORG}-${APP}-wui/report/print/${vin}?token=${print_token}`, { waitUntil: 'networkidle0', timeout: 30_000 });
       await page.waitForFunction('window.isRenderComplete === true', { timeout: 15_000 });
    • waitUntil: 'networkidle0' waits until there are no in-flight network requests for ≥500 ms — ensures all data fetches, image loads, and font swaps have settled.
    • waitForFunction(...) waits for an explicit page-side signal: the WUI print view sets window.isRenderComplete = true only after ReportContent* is mounted, all data is bound, every ECharts instance has fired its finished event, and every photo <img> tag has fired load or error. This guards against blank charts caused by ECharts animations not having completed before the PDF snapshot.
    • If either wait times out → counts as a render failure (see step 5 Resilience).
  4. Live HTML strategy: The print route renders only ReportContent* components — no nav, no menus, no tabs, no buttons (see §5.2).
  5. Resilience:
    • If rendering fails (Puppeteer error, navigation timeout, waitForFunction timeout, non-2xx data fetch), pdf-api retries up to 3 times with exponential backoff.
    • If all 3 retries fail, pdf-api POSTs a failure notice to cpt-api (POST /api/v1/internal/pdf-render-failed with X-Internal-Secret). cpt-api records this in Redis (pdf:report:{vin}:failed = 1) so the download path knows to fall back to WeasyPrint without re-checking GCS.
  6. Snapshot to PDF: await page.pdf({ format: 'A4', printBackground: true, preferCSSPageSize: true }). printBackground: true ensures the WUI's branded background colors/images survive in the PDF; preferCSSPageSize: true lets any @page rules in the WUI's print stylesheet take precedence.
  7. Upload: The resulting PDF is uploaded to gs://${ORG}-${APP}-${ENV}-reports/pdf/report-{vin}-{timestamp}.pdf.
  8. Register: pdf-api POSTs success back to cpt-api (POST /api/v1/internal/pdf-render-complete with X-Internal-Secret and {session_id, vin, timestamp, gcs_object}); cpt-api:
    1. Generates an opaque download handle: random_id = secrets.token_urlsafe(16).
    2. SETs pdf:hook:{random_id} = JSON {session_id, vin, gcs_object, timestamp} with TTL 30m.
    3. SADD pdf:hooks:by-session:{session_id} {random_id} and EXPIRE to 30m (reverse index for cleanup).
    4. SETs pdf:report:{vin} = timestamp (presence marker) with TTL 30m.
    5. Returns the random_id in the response so the next polling call from the WUI can deliver it to the dashboard (see §3.3 step 2).
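
The resilience rule in step 5 can be sketched as a small retry wrapper. This is an illustration under stated assumptions rather than the pdf-api code: renderOnce and notifyFailure are hypothetical callbacks, "retries up to 3 times" is read as one initial attempt plus up to three retries, and the 1-second base delay with doubling is an assumed schedule (the spec does not give delays).

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `renderOnce`; on failure, retries up to `maxRetries` more times,
// doubling the wait before each retry (baseDelayMs, 2x, 4x, ...).
// If every attempt fails, calls `notifyFailure` once and rethrows the
// last error.
async function renderWithRetry(renderOnce, notifyFailure, opts = {}) {
  const { maxRetries = 3, baseDelayMs = 1000 } = opts;
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    if (attempt > 0) {
      // Exponential backoff: no wait before the first attempt.
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
    try {
      return await renderOnce(attempt);
    } catch (err) {
      lastError = err; // Any thrown error counts as a render failure.
    }
  }
  await notifyFailure(lastError);
  throw lastError;
}
```

In pdf-api, renderOnce would presumably wrap the setViewport, goto, waitForFunction, and page.pdf sequence from steps 2 to 6, and notifyFailure would issue the POST to /api/v1/internal/pdf-render-failed.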

3.3 The "UI" Phase (WUI)

  1. Fetch Complete: WUI receives Tesla data and displays the dashboard. The Pinia store gains two reactive fields: pdfHookId: Ref<string | null> (initially null) and pdfFallbackMode: Ref<boolean> (initially false). The "Download PDF" button is rendered disabled while pdfHookId === null && !pdfFallbackMode.
  2. Poll for hook: After dashboard mount, the WUI polls GET /api/v1/tesla/pdf-status/{session_id}/{vin} every ~750ms (max ~10s budget). The endpoint inspects Redis:
    • If pdf:hook:* was registered for the (session_id, vin) pair: returns {ready: true, random_id: "<opaque>"}.
    • Else if pdf:report:{vin}:failed is set: returns {ready: false, fallback: true}.
    • Else: returns {ready: false} — WUI keeps polling until the budget elapses.
  3. Reactive hydration (no DOM mutation): When the WUI receives random_id, it sets store.pdfHookId = random_id. Vue's reactivity system handles the rest: a :id binding on the button (:id="pdfHookId ? `but_${pdfHookId}_print_pdf` : 'but_pending_print_pdf'") re-renders on Vue's next update flush, and a watch(() => store.pdfHookId, …) in the dashboard component enables the click handler atomically with the id update. No manual document.getElementById(...) mutation. This eliminates the race where a click between "hook lands" and "DOM rewrite completes" would otherwise hit the disabled placeholder. If the polling budget elapses with no hook (store.pdfFallbackMode = true), the watch flips the button into fallback mode (see step 5b).
  4. 3-second grace: Independently of polling, the button is held disabled for at least the first 3 seconds after dashboard mount via a mountedAt timestamp + computed disabledUntilGrace flag in the store, so users can't click before either the proactive PDF lands or the fallback path is selected.
  5. Download click: The button's @click handler reads store.pdfHookId (the reactive source of truth, not the DOM) and calls:
    • GET /api/v1/tesla/pdf/{store.pdfHookId} (proactive path).
    • cpt-api parses {random_id}, looks up pdf:hook:{random_id} in Redis:
      • Hit → returns a short-lived signed GCS URL (or streams the PDF), then optionally deletes the hook (single-use) or leaves it for the 30m TTL.
      • Miss / expired → returns 410 Gone with {fallback: true}; the WUI then transparently calls the legacy on-the-fly endpoint (POST /api/v1/tesla/vehicle-report/pdf) which uses WeasyPrint.
    • 5b. Fallback mode (store.pdfFallbackMode === true, no random_id was ever issued): the button calls the legacy on-the-fly endpoint directly. WeasyPrint generates synchronously.
  6. No VIN, no session id, no PDF URL in any user-visible identifier: the only handle in the DOM is the opaque random_id (rendered via Vue's reactive :id binding, never via direct DOM manipulation).
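
The polling and button-state rules in steps 2 to 4 can be sketched without Vue or Pinia. The sketch assumes a hypothetical fetchStatus callback that resolves to the JSON body of the pdf-status endpoint, and returns plain objects standing in for the store fields; pollPdfStatus and isDownloadDisabled are illustrative names, and error handling for a failed status request is left out.

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls `fetchStatus` until it reports a hook, reports a fallback, or the
// budget elapses. Returns the values the store would be updated with.
async function pollPdfStatus(fetchStatus, opts = {}) {
  const { intervalMs = 750, budgetMs = 10_000 } = opts;
  const deadline = Date.now() + budgetMs;
  while (Date.now() < deadline) {
    const status = await fetchStatus();
    if (status.ready && status.random_id) {
      return { pdfHookId: status.random_id, pdfFallbackMode: false };
    }
    if (status.fallback) {
      return { pdfHookId: null, pdfFallbackMode: true };
    }
    await sleep(intervalMs);
  }
  // Budget exhausted without a hook: switch to the WeasyPrint fallback.
  return { pdfHookId: null, pdfFallbackMode: true };
}

// The button is disabled while the hook is pending (and no fallback has been
// selected) or while the 3-second grace window after mount is still open.
function isDownloadDisabled(state, now = Date.now()) {
  const pending = state.pdfHookId === null && !state.pdfFallbackMode;
  const inGrace = now - state.mountedAt < 3000;
  return pending || inGrace;
}
```

In the dashboard component, the result of pollPdfStatus would be assigned to the store, and isDownloadDisabled would correspond to a computed property driving the button's :disabled binding.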

4. Lifecycle & Cleanup

4.1 Automatic Deletion

5. Technical Requirements

5.1 Node.js Implementation (pdf-api)

5.2 WUI Refactoring (Vue 3)

5.3 Terraform (Infra)

5.4 Security

5.5 Print-Route Authentication — Short-Lived Print Token (the "hook")

6. Decisions Log

| # | Question | Decision |
|---|----------|----------|
| 1 | Module placement of pdf-api | Co-located at bnc-cpt-api/src/nodejs/pdf-api/ |
| 2 | New bucket vs reuse existing | Reuse existing ${ORG}-${APP}-${ENV}-reports (step 016-gcp-reports-bucket) |
| 3 | WUI routing for print view | Extend isShadowReportRoute pattern (no vue-router, no separate SPA) |
| 4 | Cleanup trigger | 30 min after Tesla OAuth token fetch success, scheduled task in cpt-api, anchored on tesla:oauth:success:{session_id} |
| 5 | Print-route auth | Dedicated short-lived print JWT (60s TTL, purpose=print, VIN-bound), not session JWT |
| 6 | Bucket name (-reports vs -reports-pdf) | Accept existing -reports, prefix objects under pdf/ |
| 7 | Data passing to pdf-api | No data in payload: pdf-api receives only {vin, user_id, locale, print_token}; WUI re-fetches via print token |
| 8 | Print token claims | Includes session_id; the print token is the UI-control "hook" tying the WUI dashboard state to the Redis-cached session/report data |
| 9 | Reuse existing report endpoint vs new one | Reuse GET /api/v1/tesla/report/{session_id}; broaden auth to accept print token; no new endpoint |
| 10 | Download button identifier | Opaque server-generated random_id (secrets.token_urlsafe(16)) embedded as but_{random_id}_print_pdf; never expose session_id or vin in DOM ids |
| 11 | Download flow | WUI polls GET /api/v1/tesla/pdf-status/{session_id}/{vin} for hook readiness; click → GET /api/v1/tesla/pdf/{random_id} → signed GCS URL or 410 Gone triggering WeasyPrint fallback |
| 12 | Redis cleanup completeness | Cleanup task explicitly deletes pdf:hook:{random_id} (via reverse-index pdf:hooks:by-session:{session_id}), pdf:report:{vin}, pdf:report:{vin}:failed, and the anchor; TTLs are a safety net only |
| 13 | Puppeteer viewport for visual parity | 1920×1080, deviceScaleFactor: 2, emulateMediaType('print'); env-tunable; ensures ECharts/photos are crisp and desktop layout is used (no mobile collapse) |
| 14 | Puppeteer readiness signal | Combined: waitUntil: 'networkidle0' plus waitForFunction('window.isRenderComplete === true'); WUI sets the flag after all ECharts 'finished' events + image loads + nextTick |
| 15 | WUI button binding mechanism | Reactive Pinia state (store.pdfHookId) bound via :id / :disabled; no document.getElementById mutation; closes the click-during-rewrite race |

Operational Specification v1.6.0 — Crisp-render Puppeteer config (1920×1080 @2×, networkidle0 + window.isRenderComplete) · Race-free reactive button binding · Explicit Redis cleanup with reverse-index · Hook-id download mechanism · Reuse /report/{session_id} for print fetch · Co-located pdf-api · Reuse existing reports bucket · Path-based print route · Short-lived print token · OAuth-anchored cleanup