Infrastructure

The Float Service is deployed entirely on AWS using Terraform. All infrastructure is defined in deploy/ and managed per-environment (test, prod). The application identifier is floats.

Lambda Functions

| Function | Trigger | Timeout | Memory | Key IAM Permissions |
|---|---|---|---|---|
| prod-floats-api | API Gateway (prod-floats, AWS IAM) | 300s | default | DynamoDB read/write (float-service table, collection-history, locks, requirements-bypass); EventBridge PutEvents (floatme-events bus); execute-api:Invoke (Payments, Underwriting, User Service, TXN Service, Insight Service, Subscription Service, LOC Service); RDS, GrowthBook, Segment, Iterable, AppsFlyer secrets |
| prod-floats-collections-scheduler | EventBridge (float-collections at 9:30 UTC, float-collections-retry at 8:30 UTC Mon–Fri) | 900s | 1024 MB | SQS SendMessage (prod-floats-collections); RDS replica secret |
| prod-floats-collections-worker | SQS (prod-floats-collections) | 540s | 512 MB | DynamoDB read/write (collection-history, locks); execute-api:Invoke (Payments, Underwriting, User Service, TXN Service); RDS main/replica, Segment, Iterable, GrowthBook, AppsFlyer secrets |
| prod-floats-ach-handler | Kinesis (prod-payments), filtered to FLOAT_DEBIT/CREDIT_COMPLETED/RETURNED and FLOAT_DEBIT_CHARGED_BACK | 840s | default | DynamoDB read/write (collection-history); execute-api:Invoke (User Service, Underwriting, Admin API); RDS main/replica, Segment, Iterable, AppsFlyer secrets |
| prod-floats-webhook-worker | SQS (prod-floats-income-event-tap) | 540s | 512 MB | DynamoDB read/write (collection-history, locks); execute-api:Invoke (Payments, User Service, TXN Service, Underwriting); RDS main/replica, Segment, Iterable, GrowthBook, AppsFlyer secrets |
| prod-floats-webhook-worker-balance | SQS (prod-floats-balance-event-tap) | 810s | 512 MB | DynamoDB read/write (collection-history, locks); execute-api:Invoke (Payments, User Service, TXN Service, Underwriting); RDS main/replica, Segment, Iterable, GrowthBook, AppsFlyer secrets |
| prod-floats-prenote-scheduler | EventBridge (float-prenote-scheduler at 12:30 UTC Mon–Fri) | 900s | 512 MB | SQS SendMessage (prod-floats-prenotes); RDS replica and GrowthBook secrets |
| prod-floats-prenote-worker | SQS (prod-floats-prenotes, concurrency cap 3) | 540s | 512 MB | SQS Receive/Delete/GetQueueAttributes (prod-floats-prenotes); execute-api:Invoke (Payments, User Service); RDS replica secret |
| prod-floats-batch-worker | SQS (prod-floats-batch-worker) | 54s | default | DynamoDB read/write (collection-history, locks); execute-api:Invoke (Payments, User Service, Underwriting); RDS main/replica, Segment, Iterable, AppsFlyer secrets |
| prod-floats-reporter | EventBridge schedule (daily 12:00 UTC; disabled in non-prod) | 900s | 128 MB | RDS replica, Slack secrets |

All Lambdas run inside the shared FloatMe VPC (private subnets, PrivateSG security group) to access RDS.

DynamoDB Tables

| Table | Region | Entity Types | Streams | Notes |
|---|---|---|---|---|
| prod-float-service | us-east-2 (primary) | Bypass records, float-service single-table entities | No | Primary single-table for the float service. Region is configurable via float_service_table_region. |
| collection-history (legacy) | us-east-1 in prod; current region in test | Collection attempt log entries | No | Read/written by all collection Lambdas. Keyed by loan_id. |
| locks (legacy) | us-east-1 in prod; current region in test | Distributed lock records (cirello.io/dynamolock) | No | Used to serialise concurrent collection attempts. 60s lease, 1s heartbeat. |
| requirements-bypass (legacy) | us-east-1 in prod; current region in test | Per-user bypass records | No | Written by the API Lambda. Read during float creation to skip underwriting. |

Legacy tables (collection-history, locks, requirements-bypass) do not carry an environment prefix — they were created before per-environment namespacing and are shared by multiple services.
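The lease and heartbeat settings listed for the locks table can be illustrated with a small in-memory model. This is a sketch of general lease-based locking under stated assumptions, not the cirello.io/dynamolock API: the class, method names, and the lock key are hypothetical, and a plain dictionary stands in for DynamoDB.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

LEASE_SECONDS = 60.0     # lease duration stated for the locks table
HEARTBEAT_SECONDS = 1.0  # how often a live holder would call heartbeat()


@dataclass
class LockRecord:
    owner: str
    expires_at: float


class LeaseLockTable:
    """In-memory stand-in for the locks table (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._records: Dict[str, LockRecord] = {}

    def acquire(self, key: str, owner: str) -> bool:
        """Take the lock if it is free, expired, or already held by owner."""
        now = self._clock()
        current = self._records.get(key)
        if current and current.owner != owner and current.expires_at > now:
            return False  # another holder still has a live lease
        self._records[key] = LockRecord(owner, now + LEASE_SECONDS)
        return True

    def heartbeat(self, key: str, owner: str) -> bool:
        """Extend the lease; fails if the lock expired or changed hands."""
        now = self._clock()
        current = self._records.get(key)
        if current is None or current.owner != owner or current.expires_at <= now:
            return False
        current.expires_at = now + LEASE_SECONDS
        return True

    def release(self, key: str, owner: str) -> None:
        """Drop the lock, but only if owner still holds it."""
        current = self._records.get(key)
        if current and current.owner == owner:
            del self._records[key]
```

In this model a worker that stops sending heartbeats (for example, because its Lambda invocation died) blocks other workers for at most one lease period, after which a second worker can take over the same key.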

SQS Queues

| Queue | Visibility Timeout | Max Receive Count | DLQ | Purpose |
|---|---|---|---|---|
| prod-floats-collections | 600s | 1 | prod-floats-collections-dlq | Receives float collection jobs from the scheduler. Consumed by the collections-worker. |
| prod-floats-batch-worker | 60s | (none configured) | None | Receives batch collection requests. Consumed by the batch-worker. |
| prod-floats-income-event-tap | 900s | 1 | prod-floats-income-event-tap-dlq | Receives income detection events from EventBridge (source: insight-service.income). Consumed by the webhook-worker. |
| prod-floats-balance-event-tap | 900s | 1 | prod-floats-balance-event-tap-dlq | Receives balance update events from EventBridge (source: txn-service.feeder, new_account type). Consumed by the webhook-worker-balance. |
| prod-floats-webhook-worker | 600s | 1 | prod-floats-webhook-worker-dlq | Legacy webhook queue. Event source mapping is disabled; the webhook-worker now consumes from prod-floats-income-event-tap instead. |
| prod-floats-prenotes | 600s | 1 | prod-floats-prenotes-dlq | Receives prenote submissions from the prenote-scheduler. Consumed by the prenote-worker with batch size 10 and reserved concurrency of 3. |
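Each queue's visibility timeout should be longer than the timeout of the Lambda that consumes it; otherwise a message can become visible again while it is still being processed. The sketch below cross-checks the figures from the Lambda Functions and SQS Queues tables. The dictionary layout and the function name are illustrative, not part of the Terraform code.

```python
from typing import Dict, Tuple

# queue name -> (consumer Lambda, visibility timeout in s, Lambda timeout in s),
# copied from the Lambda Functions and SQS Queues tables.
QUEUE_CONSUMERS: Dict[str, Tuple[str, int, int]] = {
    "prod-floats-collections": ("prod-floats-collections-worker", 600, 540),
    "prod-floats-batch-worker": ("prod-floats-batch-worker", 60, 54),
    "prod-floats-income-event-tap": ("prod-floats-webhook-worker", 900, 540),
    "prod-floats-balance-event-tap": ("prod-floats-webhook-worker-balance", 900, 810),
    "prod-floats-prenotes": ("prod-floats-prenote-worker", 600, 540),
}


def visibility_headroom(queues: Dict[str, Tuple[str, int, int]]) -> Dict[str, int]:
    """Return, per queue, visibility timeout minus consumer timeout (seconds)."""
    return {
        queue: visibility - lambda_timeout
        for queue, (_consumer, visibility, lambda_timeout) in queues.items()
    }


if __name__ == "__main__":
    for queue, headroom in visibility_headroom(QUEUE_CONSUMERS).items():
        status = "ok" if headroom > 0 else "TOO SHORT"
        print(f"{queue}: {headroom}s headroom ({status})")
```

With the documented values every queue has positive headroom, the smallest being 6 seconds on prod-floats-batch-worker.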

Kinesis Streams

| Stream | Direction | Producers | Consumers | Purpose |
|---|---|---|---|---|
| prod-payments | Inbound (external) | Payments Service | prod-floats-ach-handler | ACH settlement events from the Payments Service. The ach-handler filters for five event types: FLOAT_DEBIT_COMPLETED, FLOAT_DEBIT_RETURNED, FLOAT_CREDIT_COMPLETED, FLOAT_CREDIT_RETURNED, FLOAT_DEBIT_CHARGED_BACK. |

ACH handler Kinesis configuration:

  • Batch size: 100 (configurable via kinesis_stream_batch_size)

  • Parallelization factor: 2 (configurable via user_kinesis_stream_parallelization_factor)

  • Starting position: LATEST

  • Error handling: BisectBatchOnFunctionError enabled
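The five-type filter can be pictured as a check applied to each record of a Kinesis batch. The sketch below decodes the base64 payload that Lambda delivers under Records[].kinesis.data and keeps only the handled types. It assumes JSON payloads with the event type in a field named "type"; that field name is hypothetical, and the real service may apply the filter in the event source mapping rather than in handler code.

```python
import base64
import json
from typing import Any, Dict, Iterator

# The five event types the ach-handler is documented to process.
HANDLED_EVENT_TYPES = frozenset({
    "FLOAT_DEBIT_COMPLETED",
    "FLOAT_DEBIT_RETURNED",
    "FLOAT_CREDIT_COMPLETED",
    "FLOAT_CREDIT_RETURNED",
    "FLOAT_DEBIT_CHARGED_BACK",
})


def iter_handled_events(
    event: Dict[str, Any], type_field: str = "type"
) -> Iterator[Dict[str, Any]]:
    """Yield decoded payloads whose type is one of the handled event types."""
    for record in event.get("Records", []):
        # Lambda delivers Kinesis record data base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload.get(type_field) in HANDLED_EVENT_TYPES:
            yield payload
```

Records of any other type are skipped rather than raised as errors, so they would not trigger the batch bisection described above.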

EventBridge

| Bus / Rule | Purpose |
|---|---|
| floatme-events (default bus) | Internal domain event bus. The API Lambda publishes user_float_created events here after a float record is written to RDS. |
| float-collections rule | cron(30 9 ? * MON-FRI *): fires at 9:30 UTC Mon–Fri with input {"time": 6}. Triggers prod-floats-collections-scheduler for the T-1 Day and Due Date collection runs. |
| float-collections-retry rule | cron(30 8 ? * MON-FRI *): fires at 8:30 UTC Mon–Fri with input {"time": 5}. Triggers prod-floats-collections-scheduler for the Daily Retry run. |
| site-floats-income-detected rule | Filters income_txn events from insight-service.income where amount < -$150. Routes matching events to prod-floats-income-event-tap SQS. |
| site-floats-balance-detected rule | Filters new_account events from txn-service.feeder where the account is a main account and has a non-negative available, current, or calculated balance. Routes to prod-floats-balance-event-tap SQS. |
| reporter rule | cron(0 12 ? * * *): fires daily at 12:00 UTC. Enabled in prod only. Triggers prod-floats-reporter. |
| float-prenote-scheduler rule | cron(30 12 ? * MON-FRI *): fires weekdays at 12:30 UTC (3 hours after the 9:30 UTC collections run). Triggers prod-floats-prenote-scheduler. |
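The income rule's filter can be written as an EventBridge event pattern using numeric matching. The sketch below shows one plausible pattern alongside a plain-Python check with the same meaning. The detail field name "amount", the use of detail-type to carry income_txn, and the assumption that amounts are expressed in dollars are all guesses for illustration; the actual rule lives in webhook_collections.tf.

```python
from typing import Any, Dict

# Hypothetical event pattern; field names are assumptions, the numeric
# matching syntax is standard EventBridge.
INCOME_RULE_PATTERN: Dict[str, Any] = {
    "source": ["insight-service.income"],
    "detail-type": ["income_txn"],
    "detail": {"amount": [{"numeric": ["<", -150]}]},
}


def matches_income_rule(event: Dict[str, Any]) -> bool:
    """Return True when an event would pass the pattern above."""
    amount = event.get("detail", {}).get("amount")
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    return (
        event.get("source") == "insight-service.income"
        and event.get("detail-type") == "income_txn"
        and is_number
        and amount < -150
    )
```

The comparison is strict, so an amount of exactly -150 does not match, while -150.01 does.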

Secrets Manager

All secrets are namespaced by environment (site/…​).

| Secret Path | Purpose |
|---|---|
| site/rds/main | RDS main instance connection credentials (host, port, user, password, database). Used by the API, collections-worker, ach-handler, webhook-worker, webhook-worker-balance, and batch-worker Lambdas. |
| site/rds/replica | RDS read-replica connection credentials. Used by the collections-scheduler, prenote-scheduler, and reporter Lambdas (read-only queries), and by all collection Lambdas as a fallback for float lookups. |
| site/segment | Segment write key for analytics events. Used by the API, collections-worker, ach-handler, webhook-worker, webhook-worker-balance, and batch-worker Lambdas. |
| site/iterable | Iterable API key for transactional email and push notifications. Used by the API, collections-worker, ach-handler, webhook-worker, webhook-worker-balance, and batch-worker Lambdas. |
| site/appsflyer | AppsFlyer API key for mobile attribution events. Used by the API, collections-worker, ach-handler, webhook-worker, webhook-worker-balance, and batch-worker Lambdas. |
| site/growthbook | GrowthBook SDK key for feature flag evaluation. Used by the API, collections-worker, webhook-worker, webhook-worker-balance, and prenote-scheduler Lambdas. |
| site/slack | Slack webhook URL for posting daily origination reports. Used only by the reporter Lambda. |
| site/datadog/terraform | Datadog API and app keys. Used by Terraform to configure Datadog SLOs and the service catalog entry. |

API Gateway

| Gateway | Auth | Purpose |
|---|---|---|
| prod-floats | AWS IAM (SigV4) | Internal API for all float management operations. All routes (ANY /{proxy+}) forward to the prod-floats-api Lambda. Accessed by FloatMe backend services (User Service, Admin, mobile gateway) and the main global FloatMe API Gateway for authenticated mobile requests. |

Scheduled Jobs

| Lambda | Schedule | Purpose |
|---|---|---|
| prod-floats-collections-scheduler | cron(30 9 ? * MON-FRI *) (9:30 UTC Mon–Fri) | Queries RDS for floats in SCHEDULING status due today or tomorrow. Enqueues them to prod-floats-collections for the T-1 Day and Due Date collection runs. |
| prod-floats-collections-scheduler | cron(30 8 ? * MON-FRI *) (8:30 UTC Mon–Fri) | Queries RDS for floats in RETRY, FAILED, ACHFAILED, or UNCOLLECTABLE status whose due date has passed. Enqueues them for the Daily Retry run. |
| prod-floats-prenote-scheduler | cron(30 12 ? * MON-FRI *) (12:30 UTC Mon–Fri) | Queries the RDS read replica for floats in SCHEDULING status whose due date is 4 business days out (Mon: +4 calendar days; Tue/Wed/Thu/Fri: +6 calendar days to skip the weekend). Filters by the floats.prenotes GrowthBook flag per user and enqueues gated-in users to prod-floats-prenotes. |
| prod-floats-reporter | cron(0 12 ? * * *) (12:00 UTC daily), prod only | Queries the RDS read replica for float origination data over the last 10 days, grouped by loan type. Posts a summary to Slack #feed-fm-collections. |
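The prenote scheduler's "4 business days out" rule reduces to a fixed calendar offset per weekday. The sketch below implements that offset and, for comparison, a generic business-day counter that arrives at the same dates. Function names are illustrative, and public holidays are ignored because the description above does not mention them.

```python
from datetime import date, timedelta


def prenote_target_due_date(run_date: date) -> date:
    """Due date targeted by a prenote run: +4 days on Monday, +6 otherwise."""
    weekday = run_date.weekday()  # Monday == 0 ... Sunday == 6
    if weekday > 4:
        raise ValueError("the prenote scheduler only runs Monday to Friday")
    offset_days = 4 if weekday == 0 else 6
    return run_date + timedelta(days=offset_days)


def add_business_days(start: date, business_days: int) -> date:
    """Step forward day by day, counting only Monday to Friday."""
    current = start
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:
            remaining -= 1
    return current
```

For example, a Monday run targets floats due that Friday, while a Tuesday run targets floats due the following Monday, because the weekend adds two calendar days.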

Monitoring

Datadog SLOs are defined for five Lambdas (api, collections-scheduler, collections-worker, ach-handler, webhook-worker):

  • Error SLO: 99.9% target / 99.99% warning over 7-day and 30-day windows

  • Throughput SLO: 99.9% target / 99.99% warning over 7-day and 30-day windows

  • Latency SLO: 99.9% target / 99.99% warning over 7-day and 30-day windows (collections-scheduler and webhook-worker only)
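To make the thresholds concrete, the sketch below computes how many failed events a window can absorb before a threshold is crossed. It assumes the SLOs are event-based (good events divided by total events), which this page does not state; the function name and the invocation count in the example are hypothetical.

```python
from fractions import Fraction


def error_budget(total_events: int, threshold_percent: str) -> int:
    """Largest number of bad events that still meets the given threshold."""
    # Fraction keeps the arithmetic exact; floats would introduce rounding
    # noise such as 1000.0000000000009.
    allowed_fraction = 1 - Fraction(threshold_percent) / 100
    return int(total_events * allowed_fraction)


if __name__ == "__main__":
    invocations = 1_000_000  # hypothetical volume for one window
    print("target  (99.9%): ", error_budget(invocations, "99.9"))
    print("warning (99.99%):", error_budget(invocations, "99.99"))
```

Under that assumption the warning threshold trips after one tenth as many failures as the target, which gives early notice before the SLO itself is breached.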

Terraform Structure

All infrastructure is defined in deploy/:

| File | Contents |
|---|---|
| main.tf | AWS provider config (6 aliases for cross-region resources), locals for all derived names and region overrides, data sources for all external resources (DynamoDB tables, API Gateways, Kinesis stream, EventBridge bus). |
| api.tf | prod-floats-api Lambda, prod-floats API Gateway v2, VPC link definition. |
| lambda.tf | prod-floats-collections-scheduler, prod-floats-collections-worker, and prod-floats-ach-handler Lambda definitions with IAM policies and event source mappings. |
| batch_collections.tf | prod-floats-batch-worker SQS queue and Lambda. |
| webhook_collections.tf | prod-floats-webhook-worker and prod-floats-income-event-tap SQS queues, site-floats-income-detected EventBridge rule and SQS policy, prod-floats-webhook-worker Lambda. |
| webhook_balance_collections.tf | prod-floats-balance-event-tap SQS queue, site-floats-balance-detected EventBridge rule and SQS policy, prod-floats-webhook-worker-balance Lambda. |
| reporter.tf | reporter EventBridge schedule rule and prod-floats-reporter Lambda. |
| prenotes.tf | float-prenote-scheduler EventBridge rule, prod-floats-prenotes SQS queue and DLQ, and prod-floats-prenote-scheduler and prod-floats-prenote-worker Lambdas. |
| sqs.tf | prod-floats-collections queue and its DLQ. |
| cloudwatch_rules.tf | float-collections and float-collections-retry EventBridge rules with Lambda targets and schedule inputs. |
| secrets.tf | Secrets Manager data source references for all secrets used by the service. |
| vpc.tf | Data sources for the shared FloatMe VPC, private subnets, and PrivateSG security group. |
| datadog.tf | Datadog SLOs (error, throughput, latency) and the Datadog service catalog definition. |
| variables.tf | All configurable parameters: environment, application, table/stream/queue names, service region overrides, collection thresholds, app version constraints, and float amount limits. |

Related Pages

  • Architecture: System context diagram and component overview

  • Event Flows: EventBridge events published and consumed, SQS queue flow details

  • ACH Processing: Kinesis-based ACH settlement callbacks

  • Collections Engine: How scheduled and webhook collection runs use these queues

  • DynamoDB Tables: Full schemas for collection-history, locks, and requirements-bypass