CloudPro

Operations Guide

Author: Will Soltani

1) Production Topology

2) Monitoring and Alerting

2.1 Key signals

3) Logs and Diagnostics

3.1 API logs

3.2 Worker logs

3.3 Fill PDF logs

4) Common Failure Modes and Troubleshooting

4.1 Upload or completion failures

Checklist:

  1. Validate presigned URL expiration and method.
  2. Check required headers (Content-Type) on upload.
  3. Verify DynamoDB write permission and key format.

4.2 Conversion stuck in processing

Checklist:

  1. Confirm Step Functions execution started.
  2. Inspect worker Lambda logs for source detection/conversion errors.
  3. Verify RAW/OUTPUT bucket object permissions and KMS grants.
  4. Validate output row update path did not fail on reserved-name aliasing.

4.3 Download URL failures

Checklist:

  1. Confirm file row has valid bucket/key.
  2. Check object exists in S3 (HEAD).
  3. Confirm signed URL TTL and clock drift assumptions.

4.4 Filled PDF persistence failed after local download

Checklist:

  1. Verify create/upload/complete filled endpoints are reachable.
  2. Confirm output bucket policy allows write for app role.
  3. Confirm frontend sends bytes as binary payload (not stringified).

5) Performance Tuning Knobs

5.1 Lambda worker

5.2 Concurrency

5.3 Caching and fetch behavior

6) Security Checklist

7) Deployment and Rollback Notes

8) Unknown/Not Found