36 changes: 23 additions & 13 deletions README.md
@@ -18,7 +18,7 @@ This repository is intended to demonstrate backend/platform engineering depth, n
- PostgreSQL-backed business data and audit persistence
- structured runtime and PDP audit logging
- Docker Compose local development workflow
- AWS deployment with ECR, ECS/Fargate, ALB, RDS PostgreSQL, Secrets Manager, IAM, and CloudWatch
- AWS deployment with ECR, ECS/Fargate, ALB, private ECS networking, VPC endpoints, RDS PostgreSQL, Secrets Manager, IAM, and CloudWatch
- rerunnable SQL migrations
- one-off ECS operational tasks for RDS migrations and dev credential seeding
- local and AWS MCP smoke-test helpers
@@ -278,7 +278,11 @@ Implemented AWS infrastructure:
- HTTP listener
- target group registration for ECS tasks
- VPC, public subnets, private app subnets, and private DB subnets
- security groups for ALB, app tasks, and RDS
- private app route table for private ECS task subnets
- VPC endpoints for private AWS service access:
  - interface endpoints for ECR API, ECR Docker registry, CloudWatch Logs, and Secrets Manager
  - S3 gateway endpoint associated with the private app route table
- security groups for ALB, app tasks, AWS service interface endpoints, and RDS
- private RDS PostgreSQL instance
- RDS-managed database password secret
- manually-created Secrets Manager secret for `AGENT_CREDENTIAL_HASH_SECRET`
@@ -309,16 +313,21 @@ Verified AWS smoke tests:
- deployed `docs_tool` denies `doc2` with `DEFAULT_DENY`
- the deployed tool path resolves a DB-backed registered-agent credential through `X-Agent-Api-Key`

Current AWS development limitation:
Current AWS networking posture:

- ECS app tasks currently run in public subnets with `assignPublicIp=ENABLED`.
- This avoids NAT Gateway or VPC endpoints during the first runnable AWS slice.
- Inbound access remains restricted through security groups:
  - Internet -> ALB on port `80`
  - ALB -> ECS app task on port `8000`
  - ECS app task -> RDS on port `5432`
- ALB nodes remain in public subnets and provide the public HTTP entry point.
- ECS/Fargate app tasks run in private app subnets with `assignPublicIp=DISABLED`.
- Running app tasks have private IPs only; they are registered with the ALB target group by private task IP.
- RDS PostgreSQL remains in private DB subnets.
- Private app task access to required AWS services uses VPC endpoints rather than a NAT Gateway:
  - interface endpoints for ECR API, ECR Docker registry, CloudWatch Logs, and Secrets Manager
  - S3 gateway endpoint associated with the private app route table
- App task security group egress is limited to required paths:
  - RDS PostgreSQL on port `5432`
  - AWS service interface endpoint security group on port `443`
  - S3 endpoint prefix list on port `443`

This is a deliberate development-stage trade-off, not the intended production networking posture.
No NAT Gateway is currently deployed. That is intentional for this slice because the app does not yet need general outbound internet access to third-party APIs or arbitrary external services.
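
The egress posture described above can be read as an explicit allow-list: anything not named is denied. The following is a minimal sketch of that idea in Python, not the project's Terraform or security group configuration. The destination labels (`rds-postgres-sg`, `aws-interface-endpoints-sg`, `s3-prefix-list`) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    """One allowed outbound path from the app task security group."""

    destination: str  # hypothetical label, not a real AWS resource ID
    port: int


# The three egress paths listed in the README; labels are assumptions.
APP_TASK_EGRESS = frozenset(
    {
        EgressRule("rds-postgres-sg", 5432),
        EgressRule("aws-interface-endpoints-sg", 443),
        EgressRule("s3-prefix-list", 443),
    }
)


def egress_allowed(destination: str, port: int) -> bool:
    """Return True only if an explicit rule covers this destination and port."""
    return EgressRule(destination, port) in APP_TASK_EGRESS
```

Under this model a call such as `egress_allowed("0.0.0.0/0", 443)` returns `False`, which mirrors the statement that the app has no general outbound internet path while no NAT Gateway is deployed.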

## Operational helper scripts

@@ -512,7 +521,7 @@ It is not currently trying to be:
- IdP integration
- database-backed policy authoring/storage
- production credential registry UI
- production-grade AWS networking hardening
- full production-grade AWS hardening beyond the current portfolio/dev deployment

The emphasis is on doing a smaller set of backend/platform concerns properly:

@@ -538,8 +547,7 @@ Credible next improvements include:
- extend immutable image tagging consistently across manual and Terraform-driven deployment paths
- HTTPS listener with ACM certificate
- optional HTTP-to-HTTPS redirect
- private ECS task networking without public task IPs
- NAT Gateway or VPC endpoints for outbound AWS service access
- optional NAT Gateway or controlled egress path only if future app behaviour requires general external access
- Terraform remote state backend
- migration version tracking
- production-grade registered-agent credential registration and rotation workflow
@@ -564,6 +572,8 @@ Current status:
- local Docker/PostgreSQL path works
- local and CI tests pass
- AWS ECS/RDS/ALB deployment path works
- ECS app tasks run in private app subnets with no public IP
- VPC endpoints provide private AWS service access for ECR, CloudWatch Logs, Secrets Manager, and S3
- AWS RDS migrations run through one-off ECS tasks
- AWS dev registered-agent credential seeding/rotation works
- deployed MCP allow and deny paths have been smoke-tested
39 changes: 26 additions & 13 deletions docs/TRACKER.md
@@ -25,6 +25,8 @@ The supporting engineering story is backend/platform implementation depth:
- Docker-based local development
- SQL migrations and seed data
- AWS ECS/Fargate deployment
- private ECS/Fargate task networking
- VPC endpoints for private AWS service access
- RDS PostgreSQL runtime configuration
- Secrets Manager runtime secret injection
- CloudWatch log collection
@@ -215,10 +217,18 @@ Implemented AWS infrastructure includes:
- private DB subnets
- internet gateway
- public route table
- private app route table
- DB subnet group
- VPC endpoints for:
  - ECR API
  - ECR Docker registry
  - CloudWatch Logs
  - Secrets Manager
  - S3
- security groups for:
  - ALB
  - ECS app task
  - AWS service interface endpoints
  - RDS PostgreSQL
- private RDS PostgreSQL instance
- RDS-managed database password secret
@@ -257,26 +267,27 @@ Verified AWS checks:
- The deployed ECS task definition uses the Git commit SHA image tag, not `latest`.
- The CD workflow waits for ECS service stability and checks `/health` after deployment.

Current intentional AWS development limitation:
Current AWS networking posture:

- ECS app tasks currently run in public subnets with `assignPublicIp=ENABLED`.
- This avoids adding NAT Gateway or VPC endpoints during the first runnable AWS vertical slice.
- Inbound access is still controlled by security groups:
  - Internet -> ALB on port `80`
  - ALB -> ECS app task on port `8000`
  - ECS app task -> RDS on port `5432`
- ALB nodes run in public subnets and provide the public HTTP entry point.
- ECS/Fargate app tasks run in private app subnets with `assignPublicIp=DISABLED`.
- Running app tasks have no public IP.
- RDS PostgreSQL runs in private DB subnets.
- Private app task access to required AWS services is provided by VPC endpoints:
  - interface endpoints for ECR API, ECR Docker registry, CloudWatch Logs, and Secrets Manager
  - S3 gateway endpoint associated with the private app route table
- App task egress is restricted to RDS, the AWS service interface endpoint security group, and the S3 endpoint prefix list.
- No NAT Gateway is currently deployed; add one later only if the app needs general outbound access to external/non-AWS services.
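
A private Fargate task without a NAT Gateway can only start if every endpoint in the list above exists, so a small completeness check is one way to reason about it. The sketch below builds the standard `com.amazonaws.<region>.<service>` endpoint service names and reports any that are absent. The region `eu-west-2` is an assumption taken from the availability zone names in the deployment diagram, and the helper names are hypothetical.

```python
from typing import Iterable, List

REGION = "eu-west-2"  # assumption: inferred from the AZ names in the diagram

# Service suffix -> endpoint type for the five endpoints listed in the tracker.
REQUIRED_ENDPOINTS = {
    "ecr.api": "Interface",
    "ecr.dkr": "Interface",
    "logs": "Interface",
    "secretsmanager": "Interface",
    "s3": "Gateway",
}


def endpoint_service_name(suffix: str, region: str = REGION) -> str:
    """Build the VPC endpoint service name for one AWS service suffix."""
    return f"com.amazonaws.{region}.{suffix}"


def missing_endpoints(deployed: Iterable[str], region: str = REGION) -> List[str]:
    """Return the required service names that are absent from `deployed`, sorted."""
    required = {endpoint_service_name(s, region) for s in REQUIRED_ENDPOINTS}
    return sorted(required - set(deployed))
```

Passing the service names of the endpoints that actually exist in the VPC should yield an empty list; a non-empty result names the endpoint a private task would fail to reach.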

Deferred AWS hardening:

- HTTPS listener with ACM certificate
- optional HTTP-to-HTTPS redirect
- private ECS task networking without public task IPs
- NAT Gateway or VPC endpoints for outbound AWS service access
- immutable image tags instead of deploying `latest`
- Terraform remote state backend
- migration version tracking
- production-grade credential registration/rotation workflow
- CI/CD deployment workflow
- CI-before-deploy safety clarification and deployment guardrails
- Terraform image tag handling alignment with SHA-based CD

---

@@ -366,7 +377,7 @@ The project is not currently trying to implement:
- SQLAlchemy/Alembic unless direct SQL becomes a real limitation
- broad AI governance platform features
- production credential registry UI
- production-grade AWS networking hardening
- full production-grade AWS hardening beyond the current portfolio/dev deployment

These are deliberate scope boundaries, not forgotten requirements.

@@ -384,7 +395,7 @@ Good next candidates:
- keep README, tracker, and AWS deployment docs aligned with the implemented runtime
- extend immutable image tagging consistently across manual and Terraform-driven deployment paths
- add HTTPS/ACM support for the ALB
- add private ECS task networking using NAT Gateway or VPC endpoints
- add NAT Gateway or another explicit egress path only if future external API access requires it
- add Terraform remote state
- add migration version tracking if migration reruns become harder to reason about
- formalize production-style registered-agent credential registration and rotation
@@ -410,6 +421,8 @@ The current stable implementation demonstrates:
- structured runtime/audit logging
- local Docker/PostgreSQL runtime
- AWS ECS/Fargate/RDS/ALB runtime
- private ECS task networking with no public task IP
- VPC endpoint-based AWS service access without NAT Gateway
- RDS-backed registered-agent identity resolution
- HMAC-hashed API-key identity adapter
- one-off ECS operational tasks
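
For the HMAC-hashed API-key item in the list above, the general technique can be sketched as follows. This is an illustration under stated assumptions, not the repository's adapter: the source says only that keys are HMAC-hashed, so the choice of SHA-256, the hex encoding, and the function names `hash_api_key` and `credential_matches` are all hypothetical.

```python
import hashlib
import hmac


def hash_api_key(api_key: str, secret: str) -> str:
    """Return a keyed hash of the API key (assumed HMAC-SHA256, hex encoded)."""
    return hmac.new(
        secret.encode("utf-8"), api_key.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def credential_matches(presented_key: str, stored_hash: str, secret: str) -> bool:
    """Compare the presented key's hash to the stored hash in constant time."""
    return hmac.compare_digest(hash_api_key(presented_key, secret), stored_hash)
```

Storing only the keyed hash means a leaked credential table does not reveal usable API keys without the hashing secret, and `hmac.compare_digest` avoids timing differences during comparison.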
58 changes: 44 additions & 14 deletions docs/aws_deployment_target.md
@@ -2,17 +2,17 @@

## Purpose

Define the AWS runtime shape for `aws-python-service-platform` before implementing Terraform.
Define the implemented AWS runtime shape for `aws-python-service-platform`.

## Target runtime path

```text
Client / MCP caller
-> Application Load Balancer
-> ECS Fargate service
-> ECS Fargate service in private app subnets
-> FastAPI + FastMCP app
-> RDS PostgreSQL
-> CloudWatch logs
-> RDS PostgreSQL in private DB subnets
-> CloudWatch logs via VPC endpoint
```

## AWS services
@@ -24,6 +24,7 @@ Client / MCP caller
| Database | RDS PostgreSQL |
| Secrets | Secrets Manager or SSM Parameter Store |
| Logs | CloudWatch Logs |
| Private AWS service access | VPC endpoints for ECR, CloudWatch Logs, Secrets Manager, and S3 |
| Runtime permissions | ECS task role |
| Infrastructure | Terraform |

@@ -43,9 +44,9 @@ The AWS deployment should keep the same application configuration contract used

The application code should continue reading configuration through the existing settings module. Terraform and ECS are responsible for supplying the correct runtime values.
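
As an illustration of that contract, a settings loader that reads only environment variables could look like the sketch below. It is not the project's settings module. `AGENT_CREDENTIAL_HASH_SECRET` appears in the README; `DATABASE_URL`, `LOG_LEVEL`, and the `Settings` and `load_settings` names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical required variable names; only the second appears in the README.
REQUIRED_KEYS = ("DATABASE_URL", "AGENT_CREDENTIAL_HASH_SECRET")


@dataclass(frozen=True)
class Settings:
    database_url: str
    agent_credential_hash_secret: str
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, failing fast on missing keys."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        agent_credential_hash_secret=env["AGENT_CREDENTIAL_HASH_SECRET"],
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

With this shape the same code runs locally and on ECS: Docker Compose or the ECS task definition (including Secrets Manager injection) populates the environment, and the application never needs to know which one supplied the values.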

## Initial deployment scope
## Implemented deployment scope

The first AWS deployment will run the existing service using RDS-backed configuration and CloudWatch logging.
The current AWS deployment runs the existing service using RDS-backed configuration, CloudWatch logging, private ECS task networking, and VPC endpoints for required AWS-service access.

## Deferred scope

@@ -74,15 +75,18 @@ flowchart TB
ECSService["ECS Service<br/>desired task count"]
TaskDef["Task Definition<br/>image + env + CPU/memory"]
ECR["ECR Repository<br/>container image"]
Logs["CloudWatch Logs<br/>container logs"]
Secrets["Secrets Manager<br/>runtime secrets"]
S3["S3<br/>ECR image layers"]
end

subgraph VPC["VPC: private network boundary"]

ALB["Logical Application Load Balancer"]
Listener["ALB Listener<br/>HTTP/HTTPS"]
TG["Target Group<br/>registered task IPs + health state"]
TG["Target Group<br/>registered private task IPs + health state"]

subgraph PublicA["Public Subnet A"]
subgraph PublicA["Public subnet A"]
ALBNodeA["ALB node / network interface<br/>AZ: eu-west-2a<br/>public-facing IP"]
end

@@ -91,11 +95,19 @@ flowchart TB
end

subgraph PrivateAppA["Private app subnet A"]
TaskA["Fargate task<br/>FastAPI container<br/>AZ: eu-west-2a<br/>private IP: 10.0.11.x:8000"]
TaskA["Fargate task<br/>FastAPI container<br/>AZ: eu-west-2a<br/>private IP only<br/>no public IP"]
end

subgraph PrivateAppB["Private app subnet B"]
TaskB["Fargate task<br/>FastAPI container<br/>AZ: eu-west-2b<br/>private IP: 10.0.12.x:8000"]
TaskB["Fargate task<br/>FastAPI container<br/>AZ: eu-west-2b<br/>private IP only<br/>no public IP"]
end

subgraph VPCEndpoints["VPC endpoints for AWS service access"]
EcrApiVpce["Interface endpoint<br/>ECR API"]
EcrDkrVpce["Interface endpoint<br/>ECR Docker registry"]
LogsVpce["Interface endpoint<br/>CloudWatch Logs"]
SecretsVpce["Interface endpoint<br/>Secrets Manager"]
S3GatewayVpce["Gateway endpoint<br/>S3 via private app route table"]
end

subgraph PrivateDB["Private DB subnets"]
@@ -111,15 +123,33 @@ flowchart TB
ALBNodeA --> Listener
ALBNodeB --> Listener
Listener --> TG
TG -->|"healthy target"| TaskA
TG -->|"healthy target"| TaskB
TG -->|"healthy private target"| TaskA
TG -->|"healthy private target"| TaskB

ECSCluster --> ECSService
ECSService --> TaskDef
TaskDef --> ECR
ECSService -->|"starts/registers tasks"| TaskA
ECSService -->|"starts/registers tasks"| TaskB
ECSService -->|"starts/registers private tasks"| TaskA
ECSService -->|"starts/registers private tasks"| TaskB

TaskA --> RDS
TaskB --> RDS

TaskA -->|"HTTPS 443"| EcrApiVpce
TaskA -->|"HTTPS 443"| EcrDkrVpce
TaskA -->|"HTTPS 443"| LogsVpce
TaskA -->|"HTTPS 443"| SecretsVpce
TaskA -->|"S3 route"| S3GatewayVpce

TaskB -->|"HTTPS 443"| EcrApiVpce
TaskB -->|"HTTPS 443"| EcrDkrVpce
TaskB -->|"HTTPS 443"| LogsVpce
TaskB -->|"HTTPS 443"| SecretsVpce
TaskB -->|"S3 route"| S3GatewayVpce

EcrApiVpce --> ECR
EcrDkrVpce --> ECR
LogsVpce --> Logs
SecretsVpce --> Secrets
S3GatewayVpce --> S3
```