Architecture
Author: Will Soltani
1) System Overview
This project combines a Next.js application with AWS managed services to provide secure file conversion and PDF filling.
Browser
  |
  | HTTPS (Next.js routes)
  v
Next.js App Router (UI + API handlers)
  |  \
  |   \ StartExecution
  |    v
  |  Step Functions
  |    |
  |    v
  |  Convert Worker (Lambda container)
  |
  +--> DynamoDB (project/file metadata)
  +--> S3 Raw Uploads (source objects)
  +--> S3 Outputs (converted + filled artifacts)
2) Frontend Architecture
2.1 App shell and site pages
- Public marketing/site content is rendered from app/, components/, sections/, and content/.
- Project tooling lives under app/app/projects/* and is isolated from the public pages.
2.2 Project workspace
Primary orchestrator: app/app/projects/[projectId]/ProjectDetailClient.tsx
State model (high level):
- Staged local files (pre-upload): useStagedFiles
- Server file list (raw/output artifacts): useServerFiles
- Signed URL cache/refresh: useSignedUrls
- Per-item conversion settings and global defaults
- Selection state per list (ready/converted)
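The signed URL cache/refresh item above could work roughly as in the following sketch. The names SignedUrlEntry and SignedUrlCache and the 30-second refresh margin are assumptions made for illustration; they are not taken from useSignedUrls.

```typescript
// Hypothetical cache entry: a signed URL plus its absolute expiry (ms since epoch).
interface SignedUrlEntry {
  url: string;
  expiresAtMs: number;
}

// Minimal in-memory cache keyed by file ID. An entry is treated as stale a
// little before its real expiry so a download does not race the TTL.
class SignedUrlCache {
  private entries = new Map<string, SignedUrlEntry>();
  private refreshMarginMs: number;

  constructor(refreshMarginMs: number = 30_000) {
    this.refreshMarginMs = refreshMarginMs;
  }

  set(fileId: string, entry: SignedUrlEntry): void {
    this.entries.set(fileId, entry);
  }

  // Returns the cached URL while it is comfortably valid; otherwise drops the
  // entry and returns undefined so the caller can request a fresh URL.
  get(fileId: string, nowMs: number): string | undefined {
    const entry = this.entries.get(fileId);
    if (entry === undefined) return undefined;
    if (entry.expiresAtMs - nowMs <= this.refreshMarginMs) {
      this.entries.delete(fileId);
      return undefined;
    }
    return entry.url;
  }
}
```

Passing the current time in as an argument keeps the expiry logic easy to test without mocking a clock.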
UI modules:
- Ready queue: source files pending conversion.
- Converted files: output artifact listing + reconvert controls.
- Conversion settings panel: full target list with per-source capability disable rules.
- Fill PDF page: dedicated client route at /app/projects/[projectId]/fill/[fileId].
2.3 PDF fill architecture
Core client: app/app/projects/[projectId]/fill/[fileId]/FillPdfClient.tsx
Support modules:
- field-label-resolver.ts
- field-type-rules.ts
- field-validation.ts
Pipeline:
- Load source bytes via signed URL.
- Render pages/widgets with PDF.js (client-only import path).
- Maintain editable field + overlay state in React.
- Build output bytes with pdf-lib.
- Validate bytes (%PDF- header, minimum length), then download and persist.
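The byte validation in the last pipeline step can be sketched as below. The function name looksLikePdf and the 64-byte minimum are assumptions; this document does not state the real threshold or helper name.

```typescript
// "%PDF-" as raw bytes.
const PDF_MAGIC: number[] = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Assumed lower bound for a plausible PDF; the actual minimum is not given here.
const MIN_PDF_BYTES = 64;

// Returns true when the buffer is long enough and starts with the PDF header.
function looksLikePdf(bytes: Uint8Array, minBytes: number = MIN_PDF_BYTES): boolean {
  if (bytes.length < minBytes) return false;
  return PDF_MAGIC.every((expected, i) => bytes[i] === expected);
}
```

A check like this only guards against empty or obviously wrong output; it does not prove the file is a well-formed PDF.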
3) Backend API Architecture (Next.js Route Handlers)
3.1 Auth/session boundary
- Cognito JWT cookie validation in app/app/api/_lib/auth.ts.
- Route handlers call requireUser() and scope all access by USER#{sub} partition key.
3.2 File APIs
- Upload create/complete endpoints create raw file rows and issue presigned upload URLs.
- File list endpoint can reconcile Dynamo rows against S3 object existence.
- Download endpoint returns short-lived signed inline/download URLs.
- Delete endpoint deletes one exact Dynamo row and one exact S3 key (safety rail logging included).
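The reconciliation step in the file list endpoint might be structured like the sketch below. It assumes the set of existing S3 keys has already been gathered (for example with per-key HeadObject calls); FileRowRef, ReconcileResult, and reconcileRows are hypothetical names, and how the real endpoint handles missing objects is not described here.

```typescript
// Hypothetical minimal view of a DynamoDB file row.
interface FileRowRef {
  fileId: string;
  key: string; // exact S3 object key recorded on the row
}

interface ReconcileResult {
  present: FileRowRef[]; // rows whose object exists
  missing: FileRowRef[]; // rows pointing at a key that no longer exists
}

// Splits rows by whether their exact key is in the set of existing keys.
function reconcileRows(rows: FileRowRef[], existingKeys: ReadonlySet<string>): ReconcileResult {
  const present: FileRowRef[] = [];
  const missing: FileRowRef[] = [];
  for (const row of rows) {
    if (existingKeys.has(row.key)) {
      present.push(row);
    } else {
      missing.push(row);
    }
  }
  return { present, missing };
}
```

Matching on the exact key, never on a prefix, mirrors the delete endpoint's one-row, one-key rule.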
3.3 Conversion submission API
app/app/api/projects/[projectId]/convert/route.ts
- Validates conversion jobs against centralized capability matrix (conversion-support.ts).
- Writes output rows in processing state.
- Starts Step Functions executions for worker processing.
3.4 Filled PDF artifact APIs
- Create endpoint reserves artifactType=filled_pdf output row + presigned upload URL.
- Upload endpoint accepts PDF bytes.
- Complete endpoint validates and marks row done.
4) Conversion Pipeline (Worker)
Worker entrypoint: infra/lib/lambdas/convert-worker/index.ts
Helpers/scripts:
- Python scripts for DOCX/PAGES/image-specialized operations.
- lib/formats.ts for format/content-type/extension mapping.
High-level conversion flow:
- Read source metadata from DynamoDB.
- Download source object from raw bucket.
- Detect source kind (content-aware, not filename-only).
- Route to conversion path:
- image -> image/pdf
- document -> canonical PDF -> target outputs
- special input handling (SVG sanitization, AVIF/HEIC/ICO paths, PAGES fallback logic)
- Upload output to output bucket.
- Update output row (status, contentType, size, packaging/page metadata).
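Content-aware detection in the flow above usually means inspecting leading bytes instead of trusting the file extension. The sketch below shows that idea using well-known file signatures; SourceKind and detectSourceKind are hypothetical names, and the worker's actual detection logic and supported kinds may differ.

```typescript
type SourceKind = "png" | "jpeg" | "pdf" | "zip-container" | "unknown";

// True when the buffer begins with the given byte sequence.
function startsWithBytes(bytes: Uint8Array, prefix: number[]): boolean {
  if (bytes.length < prefix.length) return false;
  return prefix.every((expected, i) => bytes[i] === expected);
}

// Classifies a buffer by its file signature, ignoring the filename entirely.
function detectSourceKind(bytes: Uint8Array): SourceKind {
  if (startsWithBytes(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (startsWithBytes(bytes, [0xff, 0xd8, 0xff])) return "jpeg";
  if (startsWithBytes(bytes, [0x25, 0x50, 0x44, 0x46, 0x2d])) return "pdf"; // "%PDF-"
  // DOCX is a ZIP container, so a ZIP signature needs a further look inside
  // the archive before a document route can be chosen.
  if (startsWithBytes(bytes, [0x50, 0x4b, 0x03, 0x04])) return "zip-container";
  return "unknown";
}
```

With this approach a PNG uploaded as report.pdf would still be routed down the image path.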
4.1 SVG handling
- Sanitization removes dangerous constructs while accepting common real-world SVG input.
- Rendering uses deterministic defaults for dimensions and transparency behavior.
4.2 PAGES handling
- Canonical PDF extraction pipeline attempts embedded preview PDF first.
- Falls back to preview-image assembly when PDF preview is missing.
- Canonical artifact feeds downstream format conversion.
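The preview-first, fallback-second ordering can be expressed as an ordered list of strategies. This is a sketch of the control flow only: CanonicalPdfStrategy and buildCanonicalPdf are hypothetical names, and the real extraction steps (which this document attributes to helper scripts) are represented by placeholder functions.

```typescript
// A strategy returns canonical PDF bytes, or null when it cannot produce any.
type CanonicalPdfStrategy = (pagesBytes: Uint8Array) => Uint8Array | null;

// Tries each strategy in order and returns the first non-null result.
function buildCanonicalPdf(pagesBytes: Uint8Array, strategies: CanonicalPdfStrategy[]): Uint8Array {
  for (const strategy of strategies) {
    const pdf = strategy(pagesBytes);
    if (pdf !== null) return pdf;
  }
  throw new Error("No canonical PDF could be produced from the PAGES input");
}
```

A caller would pass the embedded-preview extractor first and the preview-image assembler second, so the cheaper and more faithful path always wins when it is available.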
5) Capability Matrix and Validation
Single source of truth: app/app/_lib/conversion-support.ts
Contains:
- Supported input labels
- Supported output labels
- Conversion matrix by source label/content type
- Recommendation priority and popular target mapping
- Helper functions used by both UI and API
Rule: UI can disable options, but backend remains authoritative and rejects unsupported requests.
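One way to share a single matrix between UI and API is shown below. The matrix entries, isConversionSupported, and assertConversionSupported are illustrative assumptions; the real labels and helpers live in conversion-support.ts and are not reproduced here.

```typescript
// Illustrative entries only; not the project's actual capability matrix.
const CONVERSION_MATRIX: Record<string, readonly string[]> = {
  PNG: ["JPG", "PDF"],
  DOCX: ["PDF"],
};

// Shared predicate: the UI can use it to disable options,
// and the API can use it to decide whether to reject a job.
function isConversionSupported(source: string, target: string): boolean {
  const targets = CONVERSION_MATRIX[source.toUpperCase()];
  return targets !== undefined && targets.includes(target.toUpperCase());
}

// Server-side guard: throws instead of trusting whatever the client sent.
function assertConversionSupported(source: string, target: string): void {
  if (!isConversionSupported(source, target)) {
    throw new Error(`Unsupported conversion: ${source} -> ${target}`);
  }
}
```

Because both layers call the same predicate, a disabled UI option and a rejected API request cannot drift apart.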
6) Data and Storage Model
6.1 DynamoDB entity model
Table: SecureDocApp
Primary patterns:
PK = USER#{sub}
SK = PROJECT#{projectId}
SK = FILE#{projectId}#{fileId}
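The key patterns above translate directly into small builder functions. The function names are hypothetical, and the prefix helper assumes files are listed per project with a begins_with condition on SK, which this document does not confirm.

```typescript
// Partition key: one partition per Cognito user (sub claim).
function userPk(sub: string): string {
  return `USER#${sub}`;
}

// Sort key for a project row.
function projectSk(projectId: string): string {
  return `PROJECT#${projectId}`;
}

// Sort key for a file row within a project.
function fileSk(projectId: string, fileId: string): string {
  return `FILE#${projectId}#${fileId}`;
}

// Assumed helper: SK prefix for querying all file rows of one project.
function projectFilesPrefix(projectId: string): string {
  return `FILE#${projectId}#`;
}
```

Building keys in one place makes it harder for a route handler to read or write outside the caller's USER#{sub} partition by accident.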
File row fields (selected):
- artifactType: conversion | filled_pdf (output rows)
- status: queued | processing | done | failed
- bucket, key, contentType, sizeBytes
- sourceFileId, outputFormat, packaging, pageCount, outputCount
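The selected fields could be typed roughly as follows. Which fields are optional, and the value types of packaging and the count fields, are assumptions; only the field names and the enumerated values come from the list above.

```typescript
type ArtifactType = "conversion" | "filled_pdf";
type FileStatus = "queued" | "processing" | "done" | "failed";

// Sketch of a file row; optionality is assumed, not confirmed.
interface FileRow {
  PK: string; // USER#{sub}
  SK: string; // FILE#{projectId}#{fileId}
  artifactType?: ArtifactType; // present on output rows only
  status: FileStatus;
  bucket: string;
  key: string;
  contentType: string;
  sizeBytes: number;
  sourceFileId?: string;
  outputFormat?: string;
  packaging?: string;
  pageCount?: number;
  outputCount?: number;
}

// Assumed convention: a row is an output row when artifactType is set.
function isOutputRow(row: FileRow): boolean {
  return row.artifactType !== undefined;
}
```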
6.2 S3 key model
- Raw uploads bucket: private source objects.
- Output bucket: converted artifacts and filled PDFs, user/project namespaced.
6.3 Artifact lifecycle
Raw file: queued -> done (upload complete)
Output conversion: processing -> done|failed
Filled PDF: processing -> done|failed
Deleted artifact: explicit user delete only (no implicit cascade)
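The lifecycle rules above amount to a small state machine, sketched below. RowKind, LifecycleStatus, and canTransition are hypothetical names, and treating done and failed as terminal is an assumption (reconversion may create or reset rows in ways not described here).

```typescript
type RowKind = "raw" | "conversion" | "filled_pdf";
type LifecycleStatus = "queued" | "processing" | "done" | "failed";

// Allowed next states per row kind, taken from the lifecycle lines above.
const ALLOWED_TRANSITIONS: Record<RowKind, Partial<Record<LifecycleStatus, readonly LifecycleStatus[]>>> = {
  raw: { queued: ["done"] },
  conversion: { processing: ["done", "failed"] },
  filled_pdf: { processing: ["done", "failed"] },
};

// True when moving a row of the given kind from one status to another is allowed.
function canTransition(kind: RowKind, from: LifecycleStatus, to: LifecycleStatus): boolean {
  const next = ALLOWED_TRANSITIONS[kind][from];
  return next !== undefined && next.includes(to);
}
```

A guard like this can run before each status update so a late or duplicate worker callback cannot move a finished row back to an earlier state.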
7) Orchestration and Infrastructure
CDK stack: infra/lib/storage-stack.ts
Provisions:
- KMS key (encryption)
- DynamoDB table
- Raw/output S3 buckets with CORS and SSL enforcement
- Lambda DockerImageFunction for conversion worker
- Step Functions state machine invoking worker
Entrypoints:
- infra/bin/app.ts (primary, referenced by infra/cdk.json)
- infra/bin/infra.ts (alternate entrypoint present in repo)
8) Operational Boundaries and Contracts
- API routes use the Node.js runtime (runtime = "nodejs") where required.
- Signed URLs have a short TTL and are generated server-side.
- Conversion and fill persistence must write exact object keys; deletion is key-scoped.
- Generated directories (.next, infra/dist, infra/cdk.out) are build artifacts, not the source of truth.