Collections Engine
The collections engine attempts to recover outstanding subscription fees through three parallel paths: a scheduled path driven by CloudWatch rules, and two webhook-driven paths that respond to real-time income and balance events from EventBridge.
Overview
Scheduled Path
CloudWatch (Mon-Fri, 07:00 and 08:00 UTC)
│
▼
collections-job ──paging──► SQS: *-collections-scheduled ──► collections-worker ──┐
SQS: *-collections-retry ──────────────────► collections-worker ──┤
SQS: *-collections-pause ──────────────────► collections-worker ──┤
│
Webhook Path (Income) ▼
EventBridge: income_txn (> $75 deposit) Payments Service
│ │
▼ │
SQS: *-income-event-tap ──► webhook-worker ────────────────────────► │
│
Webhook Path (Balance Update) │
EventBridge: new_account (primary account update) ▼
│ Subscription record
▼ updated in DynamoDB
SQS: *-balance-event-tap ──► webhook-balance-worker ──────────► (SCHEDULED → ACHSENT
or COMPLETED)
The collections-job Lambda handles three distinct processes (scheduled, retry, pause) — each triggered by a separate CloudWatch rule. It pages through DynamoDB and enqueues subscriptions in batches onto the corresponding SQS worker queues. The webhook-worker and webhook-balance-worker Lambdas react to external events from the Insight Service and Transactions Service without polling, targeting subscriptions already in ERROR status.
Scheduled Collections
Three CloudWatch rules drive the scheduled path, all targeting the same collections-job Lambda with a process field in the event detail that selects the collection type.
CloudWatch Schedule
| Rule Name | Schedule (UTC) | Process |
|---|---|---|
|
Mon–Fri 07:00 |
|
|
Mon–Fri 08:00 |
|
|
Mon–Fri 22:00 (17:00 CST) |
|
Paging Architecture
The collections-job Lambda uses a DynamoDB paginated scan to handle large subscription volumes within a single Lambda invocation window. Each invocation processes up to pagesPerInvocation pages. If more pages remain, the job sends itself a continuation message via the corresponding paging SQS queue:
-
site-subscription-service-scheduled-collections-pages -
site-subscription-service-retry-collections-pages -
site-subscription-service-pause-collections-pages
Each page of subscriptions is enqueued to the worker queue:
-
site-subscription-service-collections-scheduled(visibility timeout: 60 s, max receive count: 5) -
site-subscription-service-collections-retry(visibility timeout: 60 s, max receive count: 5) -
site-subscription-service-collections-pause(visibility timeout: 60 s, max receive count: 5)
Scheduled Collection Decision Tree
When the collections-worker dequeues a SCHEDULED subscription, checkScheduled runs the following logic:
Subscription is SCHEDULED
│
▼
User is employee? ──Yes──► Waive subscription; schedule next; return
│ No
▼
User is active? ──No──► Set status CANCELLED; return
│ Yes
▼
User has pending cancellation?
│
├── Yes, sub date after cancel date ──► Mark UpdatedEvent = CancelEvent
│
└── No / sub date before cancel date ──► Continue
│
▼
Acquire DynamoDB lock (locks table, key: "subscription-billing:user_id:{id}")
│ Failed (already locked)
└──► Skip silently; return
│ Acquired
▼
User on payments blocklist? ──Yes──► Set status ERROR; return
│ No
▼
Fetch Plaid account balance (via txn-service)
│ Error (no Plaid / Plaid error)
└──► Set status ERROR; return
│ Success
▼
balance >= subscription_amount? (with optional ML override)
│ No ──► Set status ERROR; return
│ Yes
▼
Institution in pinless pilot? ──Yes──► Debit card valid?
│ │ Yes ──► Submit pinless debit
│ │ │
│ │ COMPLETED ──► emit receipt + Appsflyer event
│ │ │
│ │ ERROR ──► set status ERROR
│ └── No ──► fall through to ACH
│ No
▼
Submit ACH
│
ACHSENT ──► set status ACHSENT (awaits settlement callback from ach-handler)
│
ERROR ──► set status ERROR
│
(defer) Schedule next SCHEDULED subscription record
Retry Collection Decision Tree
When the collections-worker dequeues an ERROR subscription (from the retry queue), checkRetry runs:
Subscription is in ERROR status
│
▼
ACH attempts (ACHSENT count in history) >= 2? ──Yes──► Skip (no status change)
│ No
▼
User on payments blocklist? ──Yes──► Skip
│ No
▼
Plaid balance >= subscription_amount? ──No──► Skip
│ Yes
▼
Institution is pinless-only (neobank)? ──Yes──► Skip
│ (low pinless retry success rate)
│ No
▼
Submit ACH
│
ACHSENT ──► set status ACHSENT
│
ERROR ──► set status ERROR
The ACHRetryLimit constant is set to 2 in pkg/collections/worker.go. Pinless-only institutions (configured via the subscriptions.pinless.institutions GrowthBook flag) are skipped in retry because their ACH return rates are high and webhook-triggered collection performs significantly better for them.
|
Pause Collection
When the collections-worker dequeues a PAUSED subscription, handlePaused runs:
Subscription is PAUSED
│
▼
Handle any pending tier downgrade (call user-service FinalizeDowngradeMembershipTier)
│
▼
Decrement PauseDurationMonths by 1
│
├── PauseDurationMonths was 0 (indefinite pause) ──► keep at -1
│
└── PauseDurationMonths reaches 0 ──► set next sub to SCHEDULED
set UpdatedEvent = PausePendingResume
│
▼
Set current sub to PAUSED_SKIPPED (UpdatedEvent = PauseSkipped)
│
▼
Write next subscription record at PauseDurationMonths - 1
When the next subscription record is processed and UpdatedEvent is PausePendingResume, the worker transitions it to active collection via UpdateEventTypePauseResume, which triggers the user-service to remove the pause tag from the user’s account.
Webhook-Triggered Collection (Income)
The webhook-worker Lambda responds to income deposit events published to EventBridge by the Insight Service. Only deposits larger than $75 (encoded as amount < -7500 cents, since amounts are negative) match the EventBridge filter rule and reach the SQS queue site-subscription-service-income-event-tap.
income_txn event emitted by Insight Service (amount > $75 deposit)
│
▼
EventBridge rule: income_txn, source=insight-service.income, amount < -7500
│
▼
SQS: {env}-subscription-service-income-event-tap
│
▼
webhook-worker parses CloudWatch event detail → { user_id, amount }
│
▼
GrowthBook flag "subscriptions.webhook.balance.authoritative" enabled? ──Yes──► Skip (balance worker handles it)
│ No
▼
Acquire DynamoDB lock (subscription-billing:user_id:{id})
│ Already locked ──► return (no error)
│ Acquired
▼
User has ERROR subscriptions created within last 2 months?
│ No ──► release lock; return
│ Yes
▼
User is active? ──No──► Set all ERROR subs to INACTIVE; return
│ Yes
▼
Debit card valid?
│
├── Yes ──► Submit pinless debit
│ │
│ COMPLETED ──► set status COMPLETED; send debit receipt
│ │
│ ERROR ──► set status ERROR
│
└── No ──► Run ACH eligibility checks:
│
income event amount > -$100 threshold? ──Yes──► Skip (income too low)
│ No (high enough deposit)
▼
ACH attempts in past month >= 3? ──Yes──► Skip
│ No
▼
Cached available balance < $200 threshold? ──Yes──► Skip
│ No (balance sufficient)
▼
User on payments blocklist? ──Yes──► Skip
│ No
▼
Submit ACH (same-day: false for regular ACH)
│
ACHSENT ──► set status ACHSENT
│
ERROR ──► set status ERROR
The income-worker ACH threshold values (ACH_INCOME_EVENT_THRESHOLD = -10000 cents, ACH_BALANCE_THRESHOLD = 20000 cents, ACH_ATTEMPTS_PER_MONTH = 3) are environment variables set in deploy/webhook_collections.tf. The income event threshold of -10000 cents means the income deposit must exceed $100 to trigger an ACH attempt.
|
Balance-Update Collection
The webhook-balance-worker Lambda responds to new_account events published by the Transactions Service when a user’s primary bank account balance is updated. The EventBridge filter passes events where at least one of available, current, or calc_available is >= 0.
This path is controlled by the GrowthBook flag subscriptions.webhook.balance.authoritative. When the flag is disabled for a user, the webhook-worker (income path) handles them instead. Both Lambdas receive the same events; the flag determines which one acts.
new_account event from txn-service.feeder (primary account updated)
│
▼
EventBridge rule: new_account, source=txn-service.feeder, is_main=true, balance >= 0
│
▼
SQS: {env}-subscription-service-balance-event-tap (max concurrency: 5)
│
▼
webhook-balance-worker parses CloudWatch event detail → txnModels.Account
│
▼
GrowthBook flag "subscriptions.webhook.balance.authoritative" enabled for user? ──No──► return nil
│ Yes
▼
Acquire DynamoDB lock (subscription-billing:user_id:{id})
│ Locked ──► return error (SQS retry)
│ Acquired
▼
Populate collection state (balance from event, institution ID, debit card, blocklist status)
│
▼
User is active? ──No──► return (no status change)
│ Yes
▼
User has ERROR subscriptions within last 2 months? ──No──► return
│ Yes
▼
For each overdue subscription (in order):
│
▼
Run rule chain: active, pinless-if-not-set, ach-if-no-debit-card, blocklist checks,
balance threshold checks (via GrowthBook flags), ACH limit check
│ Any rule fails ──► skip this subscription (log reason + Datadog metric)
│ All rules pass
▼
Submit payment (pinless or ACH per rule outcome)
│
├── Pinless success ──► COMPLETED; send debit receipt via Segment
│
├── Pinless error code 51 (NSF) and "ach_if_51" flag enabled ──► retry as ACH
│
└── ACH success ──► ACHSENT (awaits ach-handler settlement callback)
│
If multiple subscriptions: sleep 11 seconds between attempts (avoid duplicate USIO transactions)
The balance thresholds used in the rule chain are GrowthBook feature flags:
-
subscriptions.webhook.balance.threshold— pinless balance threshold (standard tier) -
subscriptions.webhook.balance.threshold_premium— pinless balance threshold (premium tier) -
subscriptions.webhook.balance.ach_threshold— ACH balance threshold -
subscriptions.webhook.balance.use_calculated_balance— usecalc_availableinstead ofavailable -
subscriptions.webhook.balance.ach_if_51— fall back to ACH on pinless error code 51
Collection Outcomes
| Status | Set By | Description |
|---|---|---|
|
Subscription creation (activation, next-cycle scheduling) |
Initial status. The subscription is queued for collection on or after its |
|
Scheduled worker, retry worker, webhook-worker, webhook-balance-worker (ACH path) |
ACH debit submitted to the payment processor. Awaiting a settlement callback via the |
|
Pinless success in any worker path |
Subscription collected via pinless debit. Immediate confirmation — no settlement wait required. The |
|
Any collection path on failure (insufficient balance, payment rejected, no payment method, Plaid error, blocklist) |
Collection attempted but unsuccessful. The |
|
Membership PAUSE event via |
User membership is paused. The collection is skipped each cycle; a new |
|
Pause collection worker |
Written for the current billing cycle when a paused subscription is processed. Serves as an audit record that the cycle was intentionally skipped. |
|
Collection worker (user not active or pending cancellation past cancel date) |
Subscription will not be collected. Set when the user-service reports the user is no longer active or the membership cancel event was processed. |
|
Collection worker (employee waiver path) |
Subscription waived because the user is a FloatMe employee. A new |
|
Manual / operational use |
Subscription record is stale and will not be collected. |
|
Webhook-worker (user not active on webhook event) |
User was found inactive at the time a webhook event was processed. |
Distributed Locking
Before processing any subscription, each collection Lambda acquires a per-user distributed lock stored in the locks DynamoDB table. This prevents two concurrent collection paths — for example, a scheduled retry run and an incoming income webhook event — from attempting simultaneous payments for the same user’s subscription.
The lock implementation lives in pkg/dynamo/lock.go and uses the cirello.io/dynamolock/v2 library.
| Property | Value |
|---|---|
DynamoDB table |
|
Lock key format |
|
Lease duration |
60 seconds |
Heartbeat period |
1 second |
Acquire mode |
|
If LockSubscriptionForBilling returns ErrAlreadyLocked, the calling Lambda returns nil (no error, no status change) — the subscription will be reprocessed on the next invocation. Any other lock error is surfaced as a wrapped error and causes the SQS message to be retried.
The lock is always released via a deferred call to ReleaseSubscription. If the lock is lost during processing (the lease expired before release), a warning is logged and the release call is a no-op.
ML Decider Integration
The pkg/mldecider package provides a SageMaker-backed binary classifier (SubscriptionMLDecider) that predicts whether a subscription collection attempt will succeed. It is integrated into the scheduled checkScheduled path (not the retry or webhook paths).
The decider is controlled by two GrowthBook feature flags:
-
subscriptions.initial.ml_evaluation— enables ML evaluation for a user (default: off) -
subscriptions.initial.override_balance_with_ml— when enabled, the ML prediction can override a failing balance check and proceed with collection (default: off) -
subscriptions.initial.ml_threshold— confidence threshold the model result must meet to returntrue(default: 1.0)
The ML evaluation runs unconditionally when enabled so that predictions are always logged, even when the override flag is off. This supports A/B analysis of model accuracy before enabling the override in production.
Feature Engineering
The Decide method assembles a SubCollectionMLRequest from the following features, then calls the SageMaker endpoint via pkg/sagemaker.Client.InvokeEndpoint:
| Feature | Source |
|---|---|
|
Daily available balances for the 7 days prior to the collection attempt, fetched from txn-service. Missing days are encoded as |
|
Linear regression slope over the 7-day balance series (computed via |
|
Aggregate statistics over the 7-day series. |
|
Number of days with valid (non-missing) balance data. |
|
Days since |
|
Numeric institution ID extracted from the Plaid institution string ( |
|
Day of week at submission time (0=Monday … 6=Sunday, matching SageMaker training format). |
|
UTC hour at submission time. |
|
Days since |
|
Days since the most recent |
|
The balance value observed by the scheduled worker at collection time. |
|
Whether the |
If ErrNoDataForML is returned (no 7-day balance history available), the decider logs a warning and returns true — collection proceeds normally without ML influence.
Related Pages
-
Collection Processes — detailed walkthrough of the scheduled, retry, and webhook collection processes
-
Collections Architecture — system context and subsystem architecture diagrams
-
Module Architecture — Go package structure and dependency graph for the collections subsystem
-
Subscription Lifecycle — subscription status state machine and end-to-end lifecycle
-
ACH Processing — Kinesis ACH settlement callbacks and status transitions
-
DynamoDB Tables —
billing-activity,billing-activity-history, andlockstable schemas