
AWS Quick Reference Guide

Rapid decision-making guide with quick reference checklists, decision trees, and cost estimation formulas for AWS architecture.

25 min read
Updated Dec 15, 2025
Tags: Reference, Quick Guide, Decision Trees, Cost Estimation

AWS Universal Architecture Framework - Quick Reference Guide

For rapid decision-making and architecture reviews


Part A: The 10-Dimension Rapid Assessment Checklist

Use this checklist when you have 30 minutes to understand a problem and make initial architectural decisions.

1. Business Intent (5 min)

  • What's the core value? (revenue driver, cost saving, risk mitigation)
  • Go-to-market timeline? (MVP weeks, scale months, mature years)
  • User base size? (10, 1k, 1M+)
  • Regulatory constraints? (HIPAA, PCI-DSS, GDPR, SOX, none)
  • Risk tolerance? (experimental, acceptable, mission-critical)

Output: 1-sentence business thesis


2. User & System Actors (3 min)

  • Users: How many? Which regions? Which devices/clients?
  • Concurrent users during peak? (10, 100, 1k, 10k+)
  • API integrations? (B2B partners, internal systems, mobile SDKs)
  • Automation integrators? (batch jobs, webhooks, event-driven)

Output: Actor matrix (type, count, geography, concurrency)


3. Data Characteristics (5 min)

  • Volume: GB? TB? PB?
  • Velocity: Batch (daily/hourly), streaming (continuous), real-time (milliseconds)?
  • Variety: SQL, JSON, images, time-series, unstructured?
  • Sensitivity: Public, internal, confidential, PII/regulated?
  • Retention: Transactional (months), historical (years), archive (indefinite)?

Output: Data classification (volume, velocity, variety, sensitivity)


4. Workload Type (3 min)

  • Synchronous? (API calls, user waits)
  • Asynchronous? (queues, events, background jobs)
  • Batch? (scheduled, high volume)
  • Streaming? (continuous, low-latency per event)
  • Long-running? (workflows, multi-step processes)

Output: Primary + secondary workload archetypes


5. Traffic & Scale (3 min)

  • Baseline requests/sec? (1, 10, 100, 1k, 10k+)
  • Peak requests/sec? (2x, 10x, 100x baseline)
  • Data transfer? (MB/s, GB/s)
  • Burst frequency? (never, daily, hourly, continuous)
  • Growth rate? (stable, linear, exponential)

Output: Traffic profile (baseline, peak, growth trajectory)


6. Availability & Durability (3 min)

  • RTO (Recovery Time Objective): Hours, minutes, seconds?
  • RPO (Recovery Point Objective): Days, hours, zero data loss?
  • Criticality: Development, non-critical, critical, mission-critical?
  • Failover: Manual, automatic, multi-region?
  • Data protection: Snapshots, replicas, event sourcing?

Output: Availability matrix (component, RTO, RPO, strategy)


7. Security & Compliance (3 min)

  • Data classification? (public, internal, confidential, PII)
  • Compliance frameworks? (HIPAA, PCI-DSS, GDPR, SOX, FedRAMP, none)
  • Encryption required? (at-rest, in-transit, both, none)
  • Access control model? (role-based, attribute-based, IP-restricted)
  • Audit/logging: None, basic, comprehensive?

Output: Security posture summary


8. Cost Sensitivity (2 min)

  • Budget: Unconstrained, $100/month, $1k/month, $10k+/month?
  • Cost model preference: CapEx, OpEx, or variable?
  • Commitment level: PAYG, 1-year, 3-year?
  • Cost optimization priority: Low, medium, high?

Output: Cost constraints and optimization targets


9. Operational Complexity (2 min)

  • Team maturity: Beginners, intermediate, advanced?
  • DevOps/SRE capability: None, basic, strong?
  • Operational burden tolerance: Very low, low, moderate?
  • Tools and processes: Existing, need to build, greenfield?

Output: Operational profile and staffing implications


10. Extensibility & Evolution (2 min)

  • Change frequency: Rarely, quarterly, monthly, weekly?
  • Integration needs: None, few, many?
  • Architecture path: Monolith OK, need microservices?
  • Technology lock-in tolerance: High, medium, low?

Output: Evolution roadmap outline


Part B: Service Selection Quick Decision Trees

Compute Selection

"How long does work run?"
├─ < 15 min
│  └─ "Variable load?"
│     ├─ Yes → Lambda (cost optimized)
│     └─ No → "Latency critical?"
│        ├─ Yes → ECS Fargate
│        └─ No → EC2 (if baseline cost justified)
├─ 15 min - 1 hour
│  └─ "Batch or service?"
│     ├─ Batch → Batch or EMR
│     └─ Service → ECS or EKS
└─ > 1 hour
   └─ EMR (distributed) or EC2 (standalone)

Cost Rule of Thumb:

  • Lambda: Best for < 100 req/sec, unpredictable
  • ECS: Best for 100-1000 req/sec, moderate peaks
  • EC2 Reserved: Best for > 1000 req/sec, predictable
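The compute tree above can be sketched as a small function. The parameter names and return labels are illustrative, not an official API; thresholds follow the tree (Lambda's 15-minute execution limit, the 1-hour batch/service split).

```python
def pick_compute(task_minutes: float, variable_load: bool,
                 latency_critical: bool, is_batch: bool) -> str:
    """Sketch of the compute decision tree (labels are illustrative)."""
    if task_minutes < 15:                       # fits Lambda's execution limit
        if variable_load:
            return "Lambda"
        return "ECS Fargate" if latency_critical else "EC2"
    if task_minutes <= 60:                      # 15 min - 1 hour
        return "AWS Batch / EMR" if is_batch else "ECS / EKS"
    return "EMR (distributed) or EC2 (standalone)"  # > 1 hour

print(pick_compute(5, True, False, False))   # → Lambda
print(pick_compute(30, False, False, True))  # → AWS Batch / EMR
```

Encoding the tree this way also makes the decision auditable: the inputs you pass are exactly the answers from the rapid assessment checklist.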

Database Selection

"Access pattern?"
├─ Complex queries (joins, aggregations)
│  └─ "Data size?"
│     ├─ < 100 GB → RDS
│     └─ > 100 GB → Aurora or Redshift
├─ Key-value lookups, real-time
│  └─ DynamoDB
├─ Full-text search, logs
│  └─ OpenSearch
├─ Time-series metrics
│  └─ Timestream
└─ Graph relationships
   └─ Neptune

Cost Rule of Thumb:

  • DynamoDB on-demand: Best for variable traffic (0 baseline cost)
  • RDS with Savings Plans: Best for predictable SQL workloads
  • Redshift: Best for data warehouse (> 1TB, complex BI queries)

Storage Selection

"Data type?"
├─ Objects (files, media, archives)
│  └─ S3
├─ Block storage (EC2 volumes)
│  └─ EBS
├─ Shared filesystem
│  └─ "Windows required?"
│     ├─ Yes → FSx for Windows
│     └─ No → EFS
└─ Data lake (structured + unstructured)
   └─ S3 with Lake Formation

Integration Selection

"Communication pattern?"
├─ One-to-many (fanout)
│  └─ "Message history needed?"
│     ├─ Yes → EventBridge with archive
│     └─ No → SNS (simple)
├─ One-to-one (queue)
│  └─ "Ordering critical?"
│     ├─ Yes → SQS FIFO
│     └─ No → SQS Standard
├─ Streaming (continuous, ordered by shard)
│  └─ Kinesis
├─ Complex routing (rules, multiple conditions)
│  └─ EventBridge
└─ Multi-step workflow
   └─ Step Functions
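The integration tree maps cleanly to a lookup. A minimal sketch, assuming the pattern labels below (they are shorthand for this guide, not AWS terminology):

```python
def pick_integration(pattern: str, *, needs_history: bool = False,
                     ordered: bool = False) -> str:
    """Map a communication pattern to a service, per the tree above."""
    if pattern == "fanout":            # one-to-many
        return "EventBridge (with archive)" if needs_history else "SNS"
    if pattern == "queue":             # one-to-one
        return "SQS FIFO" if ordered else "SQS Standard"
    return {"streaming": "Kinesis",
            "routing": "EventBridge",
            "workflow": "Step Functions"}[pattern]

print(pick_integration("queue", ordered=True))  # → SQS FIFO
```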

Part C: Well-Architected Review Checklist (~70 min)

Operational Excellence (12 min)

  • Are teams organized by business outcome (not technology)?
  • Can any team member quickly explain the system?
  • Is every component observable? (metrics, logs, traces)
  • Are deployments automated and safe? (blue-green, canary)
  • Do runbooks exist for common issues?
  • MTTR < 15 min for P1 incidents?
  • Incident postmortems conducted and improvements tracked?

Scoring: Count checks. 6-7 = Excellent; 4-5 = Good; < 4 = Needs improvement


Security (12 min)

  • All principals (users, roles, services) authenticated?
  • IAM policies follow least-privilege (no wildcard actions or resources)?
  • All data encrypted at-rest (KMS) and in-transit (TLS)?
  • All API calls logged to CloudTrail?
  • Data classification defined and enforced?
  • Secrets (DB passwords, API keys) in Secrets Manager, not code?
  • PII/sensitive data protected per regulation?

Scoring: 6-7 = Excellent; 4-5 = Good; < 4 = Needs improvement


Reliability (12 min)

  • Can any single service/component fail without total outage?
  • Are failure recovery steps tested quarterly?
  • RTO/RPO targets defined and met?
  • Backups automated and verified?
  • Multi-AZ deployment for critical components?
  • Circuit breakers and retry logic in place?
  • Graceful degradation if capacity exceeded?

Scoring: 6-7 = Excellent; 4-5 = Good; < 4 = Needs improvement


Performance Efficiency (12 min)

  • API latency p99 < target (e.g., 200ms)?
  • Caching used for expensive operations? (CloudFront, ElastiCache)
  • Database queries optimized? (indexes, query plans reviewed)
  • Async patterns used where synchronous not required?
  • Compute resources right-sized? (not overprovisioned)
  • Network optimized? (local processing, VPC Endpoints for AWS services)
  • No N+1 queries or polling loops?

Scoring: 6-7 = Excellent; 4-5 = Good; < 4 = Needs improvement


Cost Optimization (12 min)

  • Current monthly cost documented and justified?
  • Cost per transaction calculated? (trending down?)
  • 70%+ of compute cost on Reserved/Savings Plans?
  • Right-sizing recommendations from Trusted Advisor implemented?
  • Unused resources (unattached volumes, stopped instances, idle databases) cleaned up?
  • Data in cheaper tiers? (Glacier for archive, Intelligent-Tiering for unknown access)
  • Cost allocation tags applied? (business unit, application, environment)

Scoring: 6-7 = Excellent; 4-5 = Good; < 4 = Needs improvement


Sustainability (8 min)

  • Managed services used where possible? (no idle infrastructure)
  • Auto-scaling configured? (no over-provisioned capacity)
  • Data stored in efficient tiers? (lifecycle policies active)
  • No unnecessary data copies or inter-region transfers?
  • Instance types modern? (Graviton, Trainium considered)

Scoring: 4-5 = Excellent; 2-3 = Good; < 2 = Needs improvement


Overall Score: average of the six pillar percentages, where each pillar percentage = (checks passed ÷ total checks) × 100

  • 90-100: Well-architected; ready for production
  • 75-89: Good; address gaps in lower-scoring pillars
  • 60-74: Needs improvement; prioritize security, reliability, cost
  • < 60: Critical issues; address before production
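One way to compute the overall score: convert each pillar's check count to a percentage and average the six pillars. The equal weighting and the sample scores below are illustrative assumptions.

```python
def pillar_pct(passed: int, total: int) -> float:
    """Percentage of checklist items passed for one pillar."""
    return 100.0 * passed / total

def overall_score(pillars: dict) -> float:
    """Equal-weight average of pillar percentages (an assumption)."""
    return sum(pillar_pct(p, t) for p, t in pillars.values()) / len(pillars)

# Hypothetical review results: (checks passed, checks total) per pillar
scores = {
    "Operational Excellence": (6, 7),
    "Security":               (7, 7),
    "Reliability":            (5, 7),
    "Performance":            (6, 7),
    "Cost":                   (4, 7),
    "Sustainability":         (4, 5),
}
print(round(overall_score(scores), 1))  # → 80.0, i.e. "Good; address gaps"
```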

Part D: Cost Estimation Quick Formulas

Lambda

Cost = (Requests × $0.0000002) + (GB-seconds × $0.0000166667)

Example: 1M requests/month, 1 GB memory, 100 ms average execution
= (1M × $0.0000002) + (1M × 0.1 s × 1 GB × $0.0000166667)
= 0.20 + 1.67
= ~$1.87/month (ignoring the free tier)
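The Lambda formula as a helper function. The default rates are us-east-1 list prices at the time of writing and vary by region; the free tier is ignored.

```python
def lambda_monthly_cost(requests: int, memory_gb: float, avg_duration_s: float,
                        req_rate: float = 0.0000002,        # $ per request
                        gb_s_rate: float = 0.0000166667) -> float:  # $ per GB-second
    """Estimate monthly Lambda cost (free tier ignored; us-east-1 rates)."""
    gb_seconds = requests * avg_duration_s * memory_gb
    return requests * req_rate + gb_seconds * gb_s_rate

# 1M requests/month, 1 GB memory, 100 ms average duration
print(round(lambda_monthly_cost(1_000_000, 1.0, 0.1), 2))  # → 1.87
```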

DynamoDB

On-Demand Cost = (Millions of writes × $1.25) + (Millions of reads × $0.25)

Example: 100k writes, 1M reads/month
= (0.1 × $1.25) + (1 × $0.25)
= 0.125 + 0.25
= ~$0.38/month

Provisioned Cost = (WCU × $0.00065/hour × 730 hours) + (RCU × $0.00013/hour × 730 hours) + (Storage GB × $0.25)

Example: 10 WCU, 100 RCU, 100 GB
= (10 × $0.00065 × 730) + (100 × $0.00013 × 730) + (100 × $0.25)
= 4.75 + 9.49 + 25
= ~$39/month

Breakeven: on-demand is cheaper for spiky or low traffic (average utilization roughly below 15-20% of provisioned capacity); provisioned wins for steady, predictable load
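A side-by-side sketch of the two DynamoDB pricing models. Rates are assumptions based on us-east-1 list prices ($1.25/$0.25 per million on-demand writes/reads; ~$0.00065 per WCU-hour, ~$0.00013 per RCU-hour, $0.25/GB-month storage) and differ by region.

```python
HOURS_PER_MONTH = 730

def on_demand_cost(writes_millions: float, reads_millions: float) -> float:
    """On-demand: pay per request, zero baseline (assumed us-east-1 rates)."""
    return writes_millions * 1.25 + reads_millions * 0.25

def provisioned_cost(wcu: int, rcu: int, storage_gb: float = 0.0) -> float:
    """Provisioned: pay per capacity-hour whether or not it is used."""
    return (wcu * 0.00065 * HOURS_PER_MONTH
            + rcu * 0.00013 * HOURS_PER_MONTH
            + storage_gb * 0.25)

print(round(on_demand_cost(0.1, 1.0), 3))        # → 0.375
print(round(provisioned_cost(10, 100, 100), 2))  # ≈ $39/month
```

Comparing the two for your actual traffic shape is the real breakeven test: sustained load favors provisioned, idle-most-of-the-time tables favor on-demand.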

RDS

Cost = (Instance-hours × instance_rate) + (Storage GB × $0.10/month) + Backups

Example: db.t3.small ($0.066/hour), 100 GB storage
= (730 hours × $0.066) + (100 × $0.10)
= 48.18 + 10
= ~$58/month

Redshift

Cost = (Nodes × node_cost/hour × 730 hours) + (Storage × $0.10/month)

Example: 2 dc2.large nodes (~$0.25/hour each, us-east-1)
= (2 × $0.25 × 730) + storage
= ~$365/month + storage

S3

Cost = (Storage GB × tier_rate) + (Requests per 1,000 × request_rate) + (Transfer GB × transfer_rate)

Example: 1,000 GB Standard, 1M GET requests/month, 100 GB outbound
= (1,000 × $0.023) + (1,000 × $0.0004) + (100 × $0.09)
= 23 + 0.40 + 9
= ~$32.40/month
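The S3 estimate as a function. Rates below assume S3 Standard in us-east-1 ($0.023/GB-month storage, $0.0004 per 1,000 GETs, $0.09/GB internet egress) and will differ by tier and region.

```python
def s3_monthly_cost(storage_gb: float, get_requests: int,
                    egress_gb: float) -> float:
    """Estimate monthly S3 Standard cost (assumed us-east-1 rates)."""
    return (storage_gb * 0.023                    # storage
            + get_requests / 1_000 * 0.0004       # GET requests, priced per 1k
            + egress_gb * 0.09)                   # internet egress

# 1,000 GB stored, 1M GETs, 100 GB outbound
print(round(s3_monthly_cost(1_000, 1_000_000, 100), 2))  # → 32.4
```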

Part E: Decision Documentation Template

Use this template for every major architectural decision.

# Architecture Decision Record (ADR)

## Title: [Service Choice or Pattern Adoption]

### Status: Proposed | Accepted | Deprecated

### Context
- Business intent: [What problem are we solving?]
- Scale requirements: [Traffic, data volume, growth]
- Constraints: [Budget, compliance, team skill, timeline]

### Decision
We will use **[AWS Service(s)]** to [implement pattern/solve problem].

### Rationale
1. Functional requirement alignment: [How does it solve the problem?]
2. Non-functional trade-offs: [Cost, latency, operational burden, scalability]
3. Alternatives considered: [Why not Lambda/ECS/RDS/etc.?]
4. Cost analysis: [$X/month at baseline, $Y/month at 2x peak]
5. Well-Architected alignment:
   - **Operational Excellence**: [Observability, automation, procedures] → ✓ / ✗
   - **Security**: [Encryption, access control, auditing] → ✓ / ✗
   - **Reliability**: [Failover, backups, RTO/RPO] → ✓ / ✗
   - **Performance**: [Latency, throughput, caching] → ✓ / ✗
   - **Cost**: [Cost-effective at scale?] → ✓ / ✗
   - **Sustainability**: [Efficient resource use?] → ✓ / ✗

### Consequences
- Positive: [Faster time-to-market, lower cost, better scalability]
- Negative: [Higher operational burden, vendor lock-in, cold starts]
- Risks: [Single-region dependency, cache invalidation complexity]
- Mitigation: [Add multi-AZ, circuit breaker, monitoring]

### Evolution Path
1. MVP (Months 0-6): Use this service as-is
2. Scale (Months 6-12): Migrate to [alternative] if [specific metrics exceed thresholds]
3. Mature (Year 2+): Consider [next-level optimization]

### Approval
- Reviewed by: [Architect name]
- Approved by: [Tech lead, manager]
- Date: [YYYY-MM-DD]

Part F: Production Readiness Checklist

Before going live, verify:

Code & Deployment (Week before launch)

  • Code reviewed and merged to main
  • Unit tests pass (> 80% coverage)
  • Integration tests pass
  • CI/CD pipeline builds and deploys successfully
  • Secrets (DB passwords, API keys) in AWS Secrets Manager, not code
  • Blue-green or canary deployment tested
  • Rollback procedure documented and tested

Infrastructure & Security (Week before launch)

  • All infrastructure as code (CloudFormation/Terraform/CDK)
  • Security group rules follow least-privilege
  • Data encrypted at-rest (KMS) and in-transit (TLS)
  • IAM roles follow least-privilege (no wildcards)
  • Secrets Manager configured for credential rotation
  • VPC design reviewed (public/private subnets correct)
  • CloudTrail enabled for audit logging
  • WAF configured for web services

Monitoring & Alerting (Week before launch)

  • CloudWatch dashboards created
  • Key metrics identified (latency p99, error rate, throughput)
  • Alarms configured for:
    • Error rate > 1%
    • Latency p99 > acceptable threshold
    • CPU > 80%
    • Database connection pool > 80%
    • Auto-scaling triggered
  • On-call team assigned
  • Escalation procedure documented

Data & Backups (Week before launch)

  • Backup strategy defined and tested (RTO/RPO verified)
  • Multi-AZ replication enabled
  • Point-in-time recovery tested
  • Cross-region disaster recovery plan documented
  • Data retention policies configured

Documentation & Runbooks (Week before launch)

  • Architecture diagram created and shared
  • API documentation complete (OpenAPI/Swagger)
  • Runbooks for common operational procedures:
    • Incident response
    • Scaling up/down
    • Failover procedures
    • Database migration
  • Deployment procedures documented
  • Rollback procedures documented

Load Testing (3 days before launch)

  • Load test at 2x expected peak traffic
  • Latency p99 < target
  • No errors under load
  • Auto-scaling triggers correctly
  • Circuit breakers prevent cascading failures

Security Audit (3 days before launch)

  • Penetration testing completed (if applicable)
  • OWASP Top 10 vulnerabilities checked
  • IAM permissions reviewed
  • No hardcoded secrets in code
  • Compliance requirements (HIPAA, PCI, GDPR) met

Final Sign-off (Day of launch)

  • Business owner approves launch
  • Tech lead approves launch
  • Security team approves launch
  • On-call team briefed
  • Incident response plan activated
  • Gradual rollout plan confirmed (5% → 25% → 50% → 100%)

Part G: Service Cost Comparison Matrix (Monthly Estimates)

| Use Case | Lambda | ECS Fargate | EC2 (On-Demand) | EC2 (Reserved) | RDS | DynamoDB |
|---|---|---|---|---|---|---|
| API Backend (100 req/sec) | $300 | $600 | $800 | $300 | $60 | $50 |
| API Backend (1,000 req/sec) | $2,500 | $1,200 | $2,400 | $900 | $200 | $400 |
| API Backend (10,000 req/sec) | $25,000 | $6,000 | $12,000 | $4,000 | $800 | $2,000 |
| Batch Job (daily) | $20 | $200 | $500 | $100 | N/A | N/A |
| Streaming (1M events/day) | $100 | $400 | Varies | Varies | N/A | $50 |
| Data Warehouse (100 GB) | N/A | N/A | N/A | N/A | $100 | N/A |
| Data Lake (1 TB) | N/A | N/A | N/A | N/A | N/A | N/A |

Note: Estimates are rough; actual costs depend on specific implementation details. Use AWS Pricing Calculator for accuracy.


Key Takeaways

  1. Business intent first: Let business drivers (speed to market, cost, reliability) guide architecture
  2. Measure before optimizing: Don't over-engineer; prove the requirements justify the complexity
  3. Use managed services by default: Reduce operational burden; AWS optimizes for efficiency
  4. Right-size continuously: Monitor utilization; resize monthly to match actual demand
  5. Design for failure: Test recovery scenarios; prepare runbooks before incidents
  6. Align with Well-Architected: Security and operational excellence are non-negotiable
  7. Document decisions: ADRs help teams understand rationale and avoid repeated debates
  8. Plan for evolution: MVP → Scale → Mature path provides clarity for multi-year roadmap
  9. Monitor costs relentlessly: Cost anomalies reveal inefficiencies
  10. Learn from incidents: Postmortems and blameless cultures drive continuous improvement

Last Updated: December 15, 2024
Framework Version: AWS Well-Architected (June 2024)
