Infrastructure

The Subscription Service is deployed entirely on AWS using Terraform. All infrastructure is defined in deploy/ and managed per-environment (test, prod). The application identifier is subscription-service. Terraform state is stored in S3 with DynamoDB locking (terraform-locking table, key terraform/subscription-service/terraform.tfstate). AWS and Datadog providers are required; the AWS provider assumes a GitHub Actions role (site-github-actions-services-role) for all operations.

Lambda Functions

Function | Trigger | Timeout | Memory | Key IAM Permissions

site-subscription-service-api

API Gateway (site-subscription-service, AWS IAM)

900s

512 MB

DynamoDB read/write/delete (billing-activity, billing-activity-history, locks); execute-api:Invoke (Payments, User Service, TXN Service, Underwriting); Secrets Manager (Segment, GrowthBook, AppsFlyer)

site-subscription-service-collections-worker

SQS (collections-scheduled, collections-retry, collections-pause); max concurrency 50

54s (90% of queue visibility timeout of 60s)

default

DynamoDB read/write (billing-activity, billing-activity-history, locks); SQS SendMessage/DeleteMessage/Receive (all three collection queues); execute-api:Invoke (TXN Service, Payments, User Service, Insight Service); SageMaker InvokeEndpoint; Secrets Manager (Segment, GrowthBook, Iterable, AppsFlyer)

site-subscription-service-collections-job

EventBridge rules (subscription-collections-scheduled, subscription-collections-retry, subscription-collections-pause); SQS paging queues (scheduled-collections-pages, retry-collections-pages, pause-collections-pages)

810s (90% of paging queue visibility timeout of 900s)

2048 MB

SQS SendMessage (collections-scheduled, collections-retry, collections-pause); SQS full access on paging queues; DynamoDB Query (billing-activity)

site-subscription-service-memberships

Kinesis (site-user-service-users) — filtered to 12 event types

840s

default

Kinesis DescribeStream/GetRecords/GetShardIterator/ListStreams (users stream); DynamoDB read/write/BatchWriteItem (billing-activity, billing-activity-history, locks); execute-api:Invoke (User Service)

site-subscription-service-ach-handler

Kinesis (site-payments) — filtered to 4 subscription event types

840s

default

Kinesis DescribeStream/GetRecords/GetShardIterator/ListStreams (payments stream); DynamoDB read/write/BatchWriteItem (billing-activity, billing-activity-history, locks); execute-api:Invoke (User Service, Admin API, TXN Service); Secrets Manager (Segment, Iterable, AppsFlyer)

site-subscription-service-kinesis-feeder

DynamoDB Stream on billing-activity table

840s

default

Kinesis PutRecord/PutRecords (site-subscriptions); DynamoDB stream read (DescribeStream, GetRecords, GetShardIterator, ListStreams, ListShards). Deployed in the DynamoDB table region via dedicated AWS provider alias.

site-subscription-service-batch-worker

SQS (site-subscription-service-batch-worker); max concurrency 5

54s (90% of queue visibility timeout of 60s)

default

DynamoDB read/write (billing-activity, billing-activity-history, locks); SQS DeleteMessage/GetQueueAttributes/ReceiveMessage/SendMessage; execute-api:Invoke (Payments, User Service); Secrets Manager (Segment)

site-subscription-service-webhook-worker

SQS (site-subscription-service-income-event-tap); max concurrency 5

810s (90% of queue visibility timeout of 900s)

default

DynamoDB read/write (billing-activity, billing-activity-history, locks); SQS DeleteMessage/GetQueueAttributes/ReceiveMessage; execute-api:Invoke (Payments, TXN Service, User Service); Secrets Manager (Segment, GrowthBook)

site-subscription-service-webhook-balance-worker

SQS (site-subscription-service-balance-event-tap); max concurrency 5

810s (90% of queue visibility timeout of 900s)

default

DynamoDB read/write (billing-activity, billing-activity-history, locks); SQS DeleteMessage/GetQueueAttributes/ReceiveMessage; execute-api:Invoke (Payments, User Service); Secrets Manager (Segment, GrowthBook, AppsFlyer)

site-subscription-service-notifier-scheduler

EventBridge rule (pre_subscription_notifier, cron(0 12 ? * * *)); SQS paging queue (site-subscription-service-pre-subscription-notifier-scheduler)

810s (90% of paging queue visibility timeout of 900s)

default

DynamoDB Query (billing-activity); SQS DeleteMessage/GetQueueAttributes/ReceiveMessage/SendMessage (scheduler and worker queues). Enabled prod-only.

site-subscription-service-notifier-worker

SQS (site-subscription-service-pre-subscription-notifier-worker); max concurrency 5

810s (90% of queue visibility timeout of 900s)

default

execute-api:Invoke (TXN Service, Payments, User Service); SQS DeleteMessage/GetQueueAttributes/ReceiveMessage; Secrets Manager (Iterable, GrowthBook)
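
The SQS-triggered timeouts above follow one rule: the Lambda timeout is 90% of the source queue's visibility timeout, presumably so that an invocation finishes or times out before its messages become visible again. A minimal sketch of that arithmetic follows; the function name and the `ratio_percent` parameter are illustrative and do not come from the Terraform code.

```python
def lambda_timeout_seconds(visibility_timeout_seconds: int, ratio_percent: int = 90) -> int:
    """Return the Lambda timeout as a percentage of the queue visibility timeout.

    Hypothetical helper: the 90% ratio is taken from the table above,
    the name and signature are assumptions made for illustration.
    """
    if visibility_timeout_seconds <= 0:
        raise ValueError("visibility timeout must be positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    return visibility_timeout_seconds * ratio_percent // 100


if __name__ == "__main__":
    for visibility in (60, 900):
        print(visibility, "->", lambda_timeout_seconds(visibility))
```

For the two visibility timeouts used by this service (60s and 900s) this yields the 54s and 810s values listed in the table.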

DynamoDB Tables

Table | Region | Streams | Consumed By | Notes

billing-activity

Configured via dynamo_legacy_region variable (us-east-1 in prod)

Yes — stream ARN used by kinesis-feeder

api, collections-worker, collections-job, memberships, ach-handler, batch-worker, webhook-worker, webhook-balance-worker, notifier-scheduler

Primary subscription state table. All subscription records and billing activity stored here. Stream drives the kinesis-feeder Lambda which publishes to site-subscriptions Kinesis.

billing-activity-history

Configured via dynamo_legacy_region variable (us-east-1 in prod)

No

api, collections-worker, memberships, ach-handler, batch-worker, webhook-worker, webhook-balance-worker

Historical billing activity records. Written alongside billing-activity for audit and history queries.

locks

Configured via dynamo_legacy_region variable (us-east-1 in prod)

No

api, collections-worker, memberships, ach-handler, batch-worker, webhook-worker, webhook-balance-worker

Distributed locking table (cirello.io/dynamolock pattern). All collection Lambdas acquire locks before processing to serialise concurrent attempts. Lock operations require GetItem, PutItem, UpdateItem, Query, and DeleteItem (for release).

All three tables are in the legacy DynamoDB region (configured via dynamo_legacy_region). They pre-date per-environment namespacing and do not carry an environment prefix. A dedicated aws.dynamodb provider alias is used to deploy resources and read data sources in that region.
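
The locking behaviour described for the locks table can be illustrated with a simplified, lease-based sketch. This is not the cirello.io/dynamolock API and not the service's code: a Python dict stands in for the DynamoDB table, and the class name, method names, and lease semantics are assumptions chosen to show why a second concurrent attempt on the same key is rejected.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    owner: str
    expires_at: float


class InMemoryLockTable:
    """Illustrative stand-in for the locks table; a dict replaces DynamoDB."""

    def __init__(self) -> None:
        self._items: Dict[str, Lease] = {}

    def acquire(self, key: str, owner: str, now: float, lease_seconds: float) -> bool:
        current: Optional[Lease] = self._items.get(key)
        # A live lease held by another owner blocks this attempt.
        if current is not None and current.expires_at > now and current.owner != owner:
            return False
        # Otherwise take (or renew) the lease.
        self._items[key] = Lease(owner=owner, expires_at=now + lease_seconds)
        return True

    def release(self, key: str, owner: str) -> bool:
        current = self._items.get(key)
        # Only the current owner may delete the lock item.
        if current is None or current.owner != owner:
            return False
        del self._items[key]
        return True
```

In a real table the check-and-write in `acquire` would have to be a single conditional write, which is why the lock operations need item-level write permissions in addition to reads.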

SQS Queues

Queue | Visibility Timeout | Max Receive Count | DLQ | Purpose

site-subscription-service-collections-scheduled

60s

5

site-subscription-service-collections-scheduled-dlq (900s)

Receives scheduled subscription collection jobs from collections-job. Consumed by collections-worker (batch size 10, ReportBatchItemFailures).

site-subscription-service-collections-retry

60s

5

site-subscription-service-collections-retry-dlq (900s)

Receives retry collection jobs from collections-job. Consumed by collections-worker (batch size 10, ReportBatchItemFailures).

site-subscription-service-collections-pause

60s

5

site-subscription-service-pause-retry-dlq (900s)

Receives pause-state collection jobs from collections-job. Consumed by collections-worker (batch size 10, ReportBatchItemFailures).

site-subscription-service-batch-worker

60s

None configured

None

Receives manual batch collection requests. Consumed by batch-worker (batch size 10).

site-subscription-service-income-event-tap

900s

1

site-subscription-service-income-event-tap-dlq (900s)

Receives income detection events routed from EventBridge (income_txn events where amount < -$75.00). Consumed by webhook-worker (batch size 10, batching window 10s in prod, ReportBatchItemFailures).

site-subscription-service-balance-event-tap

900s

1

site-subscription-service-balance-event-tap-dlq (900s)

Receives balance update events routed from EventBridge (new_account events from txn-service.feeder). Consumed by webhook-balance-worker (batch size 10, batching window 10s in prod, ReportBatchItemFailures). Has a 20s delivery delay in test to avoid race conditions with integration tests.

site-subscription-service-pre-subscription-notifier-scheduler

900s

5

site-subscription-service-pre-subscription-notifier-scheduler-dlq (900s)

Paging queue for notifier-scheduler. Used by notifier-scheduler to paginate through subscriptions requiring pre-subscription notifications (batch size 1).

site-subscription-service-pre-subscription-notifier-worker

900s

5

site-subscription-service-pre-subscription-notifier-worker-dlq (900s)

Receives individual user notification jobs from notifier-scheduler. Consumed by notifier-worker (batch size 10, max concurrency 5, ReportBatchItemFailures).

site-subscription-service-scheduled-collections-pages

900s

5

site-subscription-service-scheduled-collections-pages-dlq (900s)

Paging queue for collections-job scheduled run. Enables paginated DynamoDB scans for subscriptions due for scheduled collection (batch size 1).

site-subscription-service-retry-collections-pages

900s

5

site-subscription-service-retry-collections-pages-dlq (900s)

Paging queue for collections-job retry run. Enables paginated DynamoDB scans for subscriptions eligible for retry collection (batch size 1).

site-subscription-service-pause-collections-pages

900s

5

site-subscription-service-pause-collections-pages-dlq (900s)

Paging queue for collections-job pause run. Enables paginated DynamoDB scans for subscriptions in paused state eligible for collection (batch size 1).
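
Several consumers above enable ReportBatchItemFailures, which lets a handler report only the failed messages of a batch so that the successful ones are not redelivered. The sketch below shows the response shape Lambda expects for SQS partial batch responses. The `process` callback and the assumption that message bodies are JSON are hypothetical; the service's actual handlers are not shown in this document.

```python
import json
from typing import Any, Callable, Dict, List


def handle_batch(
    event: Dict[str, Any],
    process: Callable[[Dict[str, Any]], None],
) -> Dict[str, List[Dict[str, str]]]:
    """Process each SQS record and report only the failed message IDs."""
    failures: List[Dict[str, str]] = []
    for record in event.get("Records", []):
        try:
            # Assumption: the message body is a JSON document.
            process(json.loads(record["body"]))
        except Exception:
            # Failed messages are returned to the queue; after
            # maxReceiveCount receives they move to the DLQ.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

An empty `batchItemFailures` list tells Lambda that the whole batch succeeded.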

Kinesis Streams

Stream | Direction | Producers | Consumers | Purpose

site-user-service-users

Inbound (external)

User Service

site-subscription-service-memberships

Membership lifecycle events from the User Service. The memberships Lambda filters for 12 event types: UPGRADE, DOWNGRADE, CANCEL, RETRACT, AUTODOWNGRADED, GONETOCOLLECTIONS, PAYNOW, SUB_PAUSED, UNPAUSE, UNPAUSE_CHARGE, CLOSEACCOUNT, REACTIVATE. Batch size 100, starting position LATEST, BisectBatchOnFunctionError enabled.

site-payments

Inbound (external)

Payments Service

site-subscription-service-ach-handler

ACH settlement events from the Payments Service. The ach-handler filters for 4 event types: SUBSCRIPTION_COMPLETED, SUBSCRIPTION_RETURNED, SUBSCRIPTION_REFUNDED, SUBSCRIPTION_CHARGED_BACK. Batch size 100, parallelization factor 2, starting position LATEST, BisectBatchOnFunctionError enabled, ReportBatchItemFailures enabled.

site-subscriptions

Outbound (internal)

site-subscription-service-kinesis-feeder (from billing-activity DynamoDB stream)

Downstream consumers (e.g., other FloatMe services)

Internal subscription event stream. The kinesis-feeder Lambda reads every change from the billing-activity DynamoDB stream and publishes records to this Kinesis stream. Starting position LATEST. Deployed in the DynamoDB region.
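
To make the event-type filtering on the users stream concrete, here is a small sketch that decodes Kinesis records from a Lambda event and keeps only the 12 membership event types listed above. The payload field name `event_type` and the JSON encoding of the record data are assumptions; the document does not say whether the filter is applied in the event source mapping or in code.

```python
import base64
import json
from typing import Any, Dict, FrozenSet, List

# The 12 event types listed for the memberships Lambda.
MEMBERSHIP_EVENT_TYPES: FrozenSet[str] = frozenset({
    "UPGRADE", "DOWNGRADE", "CANCEL", "RETRACT", "AUTODOWNGRADED",
    "GONETOCOLLECTIONS", "PAYNOW", "SUB_PAUSED", "UNPAUSE",
    "UNPAUSE_CHARGE", "CLOSEACCOUNT", "REACTIVATE",
})


def relevant_payloads(
    event: Dict[str, Any],
    allowed: FrozenSet[str] = MEMBERSHIP_EVENT_TYPES,
) -> List[Dict[str, Any]]:
    """Decode Kinesis records and keep those with an allowed event type."""
    selected: List[Dict[str, Any]] = []
    for record in event.get("Records", []):
        # Kinesis record data arrives base64-encoded in the Lambda event.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Assumption: the event type lives in a top-level "event_type" field.
        if payload.get("event_type") in allowed:
            selected.append(payload)
    return selected
```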

EventBridge

Rule / Pattern | Bus | Purpose

subscription-collections-scheduled
cron(0 8 ? * MON-FRI *)

Default event bus

Fires at 08:00 UTC Mon–Fri. Triggers collections-job with {"detail": {"process": "scheduled"}}. Runs the scheduled subscription collection pass. Enabled in prod only.

subscription-collections-retry
cron(0 7 ? * MON-FRI *)

Default event bus

Fires at 07:00 UTC Mon–Fri. Triggers collections-job with {"detail": {"process": "retry"}}. Runs the daily retry collection pass. Enabled in prod only.

subscription-collections-pause
cron(0 22 ? * MON-FRI *)

Default event bus

Fires at 22:00 UTC Mon–Fri (16:00 CST / 17:00 CDT). Triggers collections-job with {"detail": {"process": "pause"}}. Runs the pause-state collection pass. Always enabled.

site-subscription-service-income-detected
Event pattern: detail-type: ["income_txn"], source: ["insight-service.income"], detail.amount < -7500 (cents, i.e., < -$75.00)

Default event bus

Routes income detection events from the Insight Service where the transaction amount is below -$75.00 (detail.amount < -7500 cents, i.e. deposits of more than $75.00) to the site-subscription-service-income-event-tap SQS queue. Consumed by the webhook-worker.

site-subscription-service-balance-detected
Event pattern: detail-type: ["new_account"], source: ["txn-service.feeder"], detail.is_main: [true], one of balances.available >= 0, balances.current >= 0, or balances.calc_available >= 0

Default event bus

Routes balance update events from the TXN Service feeder for main accounts with a non-negative balance to the site-subscription-service-balance-event-tap SQS queue. Consumed by the webhook-balance-worker.

pre_subscription_notifier
cron(0 12 ? * * *)

Default event bus

Fires daily at 12:00 UTC. Triggers notifier-scheduler with {"detail": {"status": "SCHEDULED"}} to initiate the pre-subscription notification run. Enabled in prod only.
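
The income-detected rule relies on a strict numeric comparison, so the boundary value matters. The sketch below writes the pattern using EventBridge's numeric matching syntax and pairs it with a small local matcher that approximates how the rule would evaluate one event. How the pattern is actually written in webhook_collections.tf is an assumption, and `matches_income_rule` is a hypothetical helper for illustration only.

```python
from typing import Any, Dict

# Assumed shape of the rule's event pattern (amounts are in cents).
INCOME_DETECTED_PATTERN: Dict[str, Any] = {
    "detail-type": ["income_txn"],
    "source": ["insight-service.income"],
    "detail": {"amount": [{"numeric": ["<", -7500]}]},
}


def matches_income_rule(event: Dict[str, Any]) -> bool:
    """Approximate the rule's evaluation for a single event."""
    if event.get("detail-type") not in INCOME_DETECTED_PATTERN["detail-type"]:
        return False
    if event.get("source") not in INCOME_DETECTED_PATTERN["source"]:
        return False
    amount = event.get("detail", {}).get("amount")
    # Only real numbers can satisfy a numeric filter.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return False
    operator, bound = INCOME_DETECTED_PATTERN["detail"]["amount"][0]["numeric"]
    return operator == "<" and amount < bound
```

Because the comparison is strict, an event with amount exactly -7500 does not match, while -7501 does.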

Secrets Manager

All secrets are namespaced by environment (site/…).

Secret Path | Purpose

site/segment

Segment write key for analytics events. Used by api, collections-worker, ach-handler, webhook-worker, webhook-balance-worker, and batch-worker Lambdas.

site/iterable

Iterable API key for transactional email and push notifications. Used by collections-worker, ach-handler, and notifier-worker Lambdas.

site/growthbook

GrowthBook SDK key for feature flag evaluation. Used by api, collections-worker, webhook-worker, webhook-balance-worker, and notifier-worker Lambdas.

site/appsflyer

AppsFlyer API key for mobile attribution events. Used by api, collections-worker, ach-handler, and webhook-balance-worker Lambdas.

site/datadog/terraform

Datadog API and app keys. Used by Terraform only (not injected into Lambda environments) to configure Datadog SLOs and the service catalog entry.

API Gateway

Gateway | Auth | Purpose

site-subscription-service

AWS IAM (SigV4)

Internal API for all subscription management operations. All routes (ANY /{proxy+}) forward to the site-subscription-service-api Lambda with payload format version 1.0 and an integration timeout of 30,000ms. Access logs are written to CloudWatch Logs (/aws/api-gateway/site-subscription-service, 30-day retention). Accessed by other FloatMe backend services using IAM-signed requests.

Scheduled Jobs

Lambda | Schedule | Purpose

site-subscription-service-collections-job

cron(0 8 ? * MON-FRI *) (08:00 UTC) — prod only

Paginates through billing-activity DynamoDB table (up to 50 pages per invoke) to find subscriptions eligible for scheduled collection. Enqueues matching records to collections-scheduled SQS for processing by collections-worker.

site-subscription-service-collections-job

cron(0 7 ? * MON-FRI *) (07:00 UTC) — prod only

Paginates through billing-activity to find subscriptions eligible for retry collection. Enqueues matching records to collections-retry SQS for processing by collections-worker.

site-subscription-service-collections-job

cron(0 22 ? * MON-FRI *) (22:00 UTC) — always enabled

Paginates through billing-activity to find subscriptions in paused state eligible for collection. Enqueues matching records to collections-pause SQS for processing by collections-worker.

site-subscription-service-notifier-scheduler

cron(0 12 ? * * *) (12:00 UTC daily) — prod only

Paginates through billing-activity (up to 50 pages per invoke) to find users in SCHEDULED status for upcoming subscription billing. Enqueues individual notification jobs to pre-subscription-notifier-worker SQS for processing by notifier-worker.
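
The paging behaviour described above (a bounded number of pages per invoke, with the remainder handed off through a paging queue) can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: `fetch_page`, `enqueue_job`, and `enqueue_continuation` are hypothetical callbacks standing in for the DynamoDB query, the worker queue, and the paging queue.

```python
from typing import Callable, List, Optional, Tuple

MAX_PAGES_PER_INVOKE = 50  # page budget taken from the table above

# A page is (items, cursor for the next page or None when exhausted).
Page = Tuple[List[str], Optional[str]]


def run_paged(
    fetch_page: Callable[[Optional[str]], Page],
    enqueue_job: Callable[[str], None],
    enqueue_continuation: Callable[[str], None],
    cursor: Optional[str] = None,
    max_pages: int = MAX_PAGES_PER_INVOKE,
) -> int:
    """Process up to max_pages pages; hand the rest to a paging queue."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    pages = 0
    while pages < max_pages:
        items, cursor = fetch_page(cursor)
        pages += 1
        for item in items:
            enqueue_job(item)
        if cursor is None:
            # Nothing left to read: the run is complete.
            return pages
    # Page budget exhausted: a later invoke resumes from this cursor.
    enqueue_continuation(cursor)
    return pages
```

With a batch size of 1 on the paging queues, each continuation message would start exactly one follow-up invoke.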

Monitoring

Datadog SLOs are defined across three dimensions for the subscription-service Lambdas. All SLOs use service:subscription-service and env:site tags.

Error SLO

[AWS][site-subscription-service] Lambda Errors SLO — 99.9% target / 99.99% warning over 7-day and 30-day windows.

Covers: collections-worker, collections-job, memberships, ach-handler, api, kinesis-feeder, notifier-scheduler, notifier-worker, webhook-worker.

Throughput SLO

[AWS][site-subscription-service] Lambda Throughput SLO — 99.9% target / 99.99% warning over 7-day and 30-day windows.

Covers: collections-job, memberships, ach-handler, api, kinesis-feeder, notifier-scheduler, webhook-worker.

Latency SLO

[AWS][site-subscription-service] Lambda Latency SLO — 99.9% target / 99.99% warning over 7-day and 30-day windows.

Covers: ach-handler, api, kinesis-feeder, notifier-scheduler, webhook-worker.

The Datadog service catalog entry (datadog_service_definition_yaml) registers the service as tier 1, team devops, with links to the GitHub source repo and the Antora documentation site at https://docs.floatme.io/subscription-service.
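
To put the SLO thresholds in perspective, the sketch below converts a percentage target into the number of failing invocations it tolerates. It assumes the SLOs are count-based (good events over total events), which this document does not state; the function name is illustrative.

```python
import math
from fractions import Fraction


def allowed_failures(total_invocations: int, target_percent: str) -> int:
    """Return how many failures fit in the error budget for a target.

    target_percent is passed as a string (for example "99.9") so the
    arithmetic stays exact instead of relying on binary floats.
    """
    budget = (Fraction(100) - Fraction(target_percent)) / 100
    if not 0 <= budget <= 1:
        raise ValueError("target must be between 0 and 100")
    return math.floor(total_invocations * budget)
```

Under that assumption, one million invocations in a window leave room for 1,000 failures at the 99.9% target and 100 at the 99.99% warning threshold.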

Terraform Structure

All infrastructure is defined in deploy/:

File | Contents

terraform.tf

Terraform version constraint (>= 1.0.0), required provider versions (AWS ~> 6.0, Datadog ~> 4.3), S3 backend configuration (state key terraform/subscription-service/terraform.tfstate, DynamoDB locking table terraform-locking).

main.tf

AWS provider config with a dynamodb alias for cross-region DynamoDB access. Locals for all derived names (API Gateway names, Kinesis stream names, SageMaker endpoint name, assume-role ARN). Data sources for all external resources: billing-activity and billing-activity-history DynamoDB tables, locks table, site-subscriptions and site-payments Kinesis streams, and API Gateways for Admin, User Service, Payments, TXN, Insight, and Underwriting services.

variables.tf

All configurable parameters: environment, application (default: subscription-service), company (default: floatme), dynamo_legacy_region, lambda_dist_path, kinesis_stream_batch_size (default 100), user_kinesis_stream_parallelization_factor (default 2), enable_failure_notification (default false), old_subscription_days (default 61), service_version, and service override variables for underwriting API Gateway name and ID.

lambda.tf

collections-worker, collections-job, memberships, ach-handler, api, kinesis-feeder, notifier-scheduler, and notifier-worker Lambda modules with IAM policy statements, event source mappings, allowed triggers, and environment variables. Also includes the api_gateway module (HTTP API v2) and its CloudWatch log group.

sqs.tf

Collections and notifier SQS queues and their DLQs: collections-scheduled, collections-retry, collections-pause (60s visibility, maxReceiveCount 5); pre-subscription-notifier-scheduler, pre-subscription-notifier-worker (900s visibility, maxReceiveCount 5); and three paging queues (scheduled-collections-pages, retry-collections-pages, pause-collections-pages) with DLQs (all 900s visibility, maxReceiveCount 5). The batch-worker and event-tap queues are defined in batch_collections.tf, webhook_collections.tf, and webhook_collections_balance.tf.

batch_collections.tf

site-subscription-service-batch-worker SQS queue (60s visibility, no DLQ) and the batch-worker Lambda module with IAM policies and SQS event source mapping.

webhook_collections.tf

site-subscription-service-income-event-tap SQS queue and DLQ (900s, maxReceiveCount 1). site-subscription-service-income-detected EventBridge rule filtering income_txn events where amount < -7500 cents. SQS queue policy allowing EventBridge to send messages. webhook-worker Lambda module with IAM policies and SQS event source mapping.

webhook_collections_balance.tf

site-subscription-service-balance-event-tap SQS queue and DLQ (900s, maxReceiveCount 1; 20s delivery delay in test). site-subscription-service-balance-detected EventBridge rule filtering new_account events from txn-service.feeder for main accounts with non-negative balance. SQS queue policy allowing EventBridge to send messages. webhook-balance-worker Lambda module with IAM policies and SQS event source mapping.

cloudwatch_rules.tf

EventBridge rules for subscription-collections-scheduled (cron(0 8 ? * MON-FRI *), prod-only), subscription-collections-retry (cron(0 7 ? * MON-FRI *), prod-only), subscription-collections-pause (cron(0 22 ? * MON-FRI *), always enabled), and pre_subscription_notifier (cron(0 12 ? * * *), prod-only). Each rule has a CloudWatch event target pointing at the appropriate Lambda with an input transformer that injects the process type or status into the event detail.

secrets.tf

Secrets Manager data source references for site/segment, site/iterable, site/growthbook, site/appsflyer, and the site/datadog/terraform secret version used by the Datadog provider.

datadog.tf

Datadog provider configuration (reads credentials from the site/datadog/terraform secret). Three datadog_service_level_objective resources (error, throughput, latency SLOs). One datadog_service_definition_yaml resource registering the service in the Datadog service catalog as tier 1.

outputs.tf

(empty — no outputs defined)

  • Architecture — System context diagram and Lambda component overview

  • Event Flows — EventBridge events published and consumed, SQS queue flow details

  • ACH Processing — Kinesis-based ACH settlement callbacks

  • Collections Engine — How scheduled and webhook collection runs use these queues

  • DynamoDB Tables — Full schemas and access patterns for billing-activity and billing-activity-history