Collections Engine

The collections engine attempts to recover outstanding subscription fees through three parallel paths: a scheduled path driven by CloudWatch rules, and two webhook-driven paths that respond to real-time income and balance events from EventBridge.

Overview

Scheduled Path
  CloudWatch (Mon-Fri, 07:00 and 08:00 UTC)
        │
        ▼
  collections-job ──paging──► SQS: *-collections-scheduled ──► collections-worker ──┐
                    SQS: *-collections-retry ──────────────────► collections-worker ──┤
                    SQS: *-collections-pause ──────────────────► collections-worker ──┤
                                                                                      │
Webhook Path (Income)                                                                 ▼
  EventBridge: income_txn (> $75 deposit)                              Payments Service
        │                                                                      │
        ▼                                                                      │
  SQS: *-income-event-tap ──► webhook-worker ────────────────────────►        │
                                                                               │
Webhook Path (Balance Update)                                                  │
  EventBridge: new_account (primary account update)                            ▼
        │                                                           Subscription record
        ▼                                                           updated in DynamoDB
  SQS: *-balance-event-tap ──► webhook-balance-worker ──────────►  (SCHEDULED → ACHSENT
                                                                     or COMPLETED)

The collections-job Lambda handles three distinct processes (scheduled, retry, pause) — each triggered by a separate CloudWatch rule. It pages through DynamoDB and enqueues subscriptions in batches onto the corresponding SQS worker queues. The webhook-worker and webhook-balance-worker Lambdas react to external events from the Insight Service and Transactions Service without polling, targeting subscriptions already in ERROR status.

Scheduled Collections

Three CloudWatch rules drive the scheduled path, all targeting the same collections-job Lambda with a process field in the event detail that selects the collection type.

CloudWatch Schedule

Rule Name Schedule (UTC) Process

subscription-collections-retry

Mon–Fri 07:00

retry — queries subscriptions in ERROR status with a subscription date older than one month

subscription-collections-scheduled

Mon–Fri 08:00

scheduled — queries subscriptions in SCHEDULED status with a subscription date up to today

subscription-collections-pause

Mon–Fri 22:00 (17:00 CST)

pause — queries subscriptions in PAUSED status, advancing the pause counter

Paging Architecture

The collections-job Lambda uses a DynamoDB paginated scan to handle large subscription volumes within a single Lambda invocation window. Each invocation processes up to pagesPerInvocation pages. If more pages remain, the job sends itself a continuation message via the corresponding paging SQS queue:

  • site-subscription-service-scheduled-collections-pages

  • site-subscription-service-retry-collections-pages

  • site-subscription-service-pause-collections-pages

Each page of subscriptions is enqueued to the worker queue:

  • site-subscription-service-collections-scheduled (visibility timeout: 60 s, max receive count: 5)

  • site-subscription-service-collections-retry (visibility timeout: 60 s, max receive count: 5)

  • site-subscription-service-collections-pause (visibility timeout: 60 s, max receive count: 5)

Scheduled Collection Decision Tree

When the collections-worker dequeues a SCHEDULED subscription, checkScheduled runs the following logic:

Subscription is SCHEDULED
        │
        ▼
User is employee? ──Yes──► Waive subscription; schedule next; return
        │ No
        ▼
User is active? ──No──► Set status CANCELLED; return
        │ Yes
        ▼
User has pending cancellation?
        │
        ├── Yes, sub date after cancel date ──► Mark UpdatedEvent = CancelEvent
        │
        └── No / sub date before cancel date ──► Continue
        │
        ▼
Acquire DynamoDB lock (locks table, key: "subscription-billing:user_id:{id}")
        │ Failed (already locked)
        └──► Skip silently; return
        │ Acquired
        ▼
User on payments blocklist? ──Yes──► Set status ERROR; return
        │ No
        ▼
Fetch Plaid account balance (via txn-service)
        │ Error (no Plaid / Plaid error)
        └──► Set status ERROR; return
        │ Success
        ▼
balance >= subscription_amount? (with optional ML override)
        │ No ──► Set status ERROR; return
        │ Yes
        ▼
Institution in pinless pilot? ──Yes──► Debit card valid?
        │                                 │ Yes ──► Submit pinless debit
        │                                 │           │
        │                                 │     COMPLETED ──► emit receipt + Appsflyer event
        │                                 │           │
        │                                 │        ERROR ──► set status ERROR
        │                                 └── No ──► fall through to ACH
        │ No
        ▼
Submit ACH
        │
  ACHSENT ──► set status ACHSENT (awaits settlement callback from ach-handler)
        │
  ERROR  ──► set status ERROR
        │
(defer) Schedule next SCHEDULED subscription record

Retry Collection Decision Tree

When the collections-worker dequeues an ERROR subscription (from the retry queue), checkRetry runs:

Subscription is in ERROR status
        │
        ▼
ACH attempts (ACHSENT count in history) >= 2? ──Yes──► Skip (no status change)
        │ No
        ▼
User on payments blocklist? ──Yes──► Skip
        │ No
        ▼
Plaid balance >= subscription_amount? ──No──► Skip
        │ Yes
        ▼
Institution is pinless-only (neobank)? ──Yes──► Skip
        │                                   (low pinless retry success rate)
        │ No
        ▼
Submit ACH
        │
  ACHSENT ──► set status ACHSENT
        │
  ERROR  ──► set status ERROR
The ACHRetryLimit constant is set to 2 in pkg/collections/worker.go. Pinless-only institutions (configured via the subscriptions.pinless.institutions GrowthBook flag) are skipped in retry because their ACH return rates are high and webhook-triggered collection performs significantly better for them.

Pause Collection

When the collections-worker dequeues a PAUSED subscription, handlePaused runs:

Subscription is PAUSED
        │
        ▼
Handle any pending tier downgrade (call user-service FinalizeDowngradeMembershipTier)
        │
        ▼
Decrement PauseDurationMonths by 1
        │
        ├── PauseDurationMonths was 0 (indefinite pause) ──► keep at -1
        │
        └── PauseDurationMonths reaches 0 ──► set next sub to SCHEDULED
                                              set UpdatedEvent = PausePendingResume
        │
        ▼
Set current sub to PAUSED_SKIPPED (UpdatedEvent = PauseSkipped)
        │
        ▼
Write next subscription record at PauseDurationMonths - 1

When the next subscription record is processed and UpdatedEvent is PausePendingResume, the worker transitions it to active collection via UpdateEventTypePauseResume, which triggers the user-service to remove the pause tag from the user’s account.

Webhook-Triggered Collection (Income)

The webhook-worker Lambda responds to income deposit events published to EventBridge by the Insight Service. Only deposits larger than $75 (encoded as amount < -7500 cents, since amounts are negative) match the EventBridge filter rule and reach the SQS queue site-subscription-service-income-event-tap.

income_txn event emitted by Insight Service (amount > $75 deposit)
        │
        ▼
EventBridge rule: income_txn, source=insight-service.income, amount < -7500
        │
        ▼
SQS: {env}-subscription-service-income-event-tap
        │
        ▼
webhook-worker parses CloudWatch event detail → { user_id, amount }
        │
        ▼
GrowthBook flag "subscriptions.webhook.balance.authoritative" enabled? ──Yes──► Skip (balance worker handles it)
        │ No
        ▼
Acquire DynamoDB lock (subscription-billing:user_id:{id})
        │ Already locked ──► return (no error)
        │ Acquired
        ▼
User has ERROR subscriptions created within last 2 months?
        │ No ──► release lock; return
        │ Yes
        ▼
User is active? ──No──► Set all ERROR subs to INACTIVE; return
        │ Yes
        ▼
Debit card valid?
        │
        ├── Yes ──► Submit pinless debit
        │               │
        │         COMPLETED ──► set status COMPLETED; send debit receipt
        │               │
        │            ERROR ──► set status ERROR
        │
        └── No ──► Run ACH eligibility checks:
                        │
                  income event amount > -$100 threshold? ──Yes──► Skip (income too low)
                        │ No (high enough deposit)
                        ▼
                  ACH attempts in past month >= 3? ──Yes──► Skip
                        │ No
                        ▼
                  Cached available balance < $200 threshold? ──Yes──► Skip
                        │ No (balance sufficient)
                        ▼
                  User on payments blocklist? ──Yes──► Skip
                        │ No
                        ▼
                  Submit ACH (same-day: false for regular ACH)
                        │
                  ACHSENT ──► set status ACHSENT
                        │
                  ERROR  ──► set status ERROR
The income-worker ACH threshold values (ACH_INCOME_EVENT_THRESHOLD = -10000 cents, ACH_BALANCE_THRESHOLD = 20000 cents, ACH_ATTEMPTS_PER_MONTH = 3) are environment variables set in deploy/webhook_collections.tf. The income event threshold of -10000 cents means the income deposit must exceed $100 to trigger an ACH attempt.

Balance-Update Collection

The webhook-balance-worker Lambda responds to new_account events published by the Transactions Service when a user’s primary bank account balance is updated. The EventBridge filter passes events where at least one of available, current, or calc_available is >= 0.

This path is controlled by the GrowthBook flag subscriptions.webhook.balance.authoritative. When the flag is disabled for a user, the webhook-worker (income path) handles them instead. Both Lambdas receive the same events; the flag determines which one acts.

new_account event from txn-service.feeder (primary account updated)
        │
        ▼
EventBridge rule: new_account, source=txn-service.feeder, is_main=true, balance >= 0
        │
        ▼
SQS: {env}-subscription-service-balance-event-tap (max concurrency: 5)
        │
        ▼
webhook-balance-worker parses CloudWatch event detail → txnModels.Account
        │
        ▼
GrowthBook flag "subscriptions.webhook.balance.authoritative" enabled for user? ──No──► return nil
        │ Yes
        ▼
Acquire DynamoDB lock (subscription-billing:user_id:{id})
        │ Locked ──► return error (SQS retry)
        │ Acquired
        ▼
Populate collection state (balance from event, institution ID, debit card, blocklist status)
        │
        ▼
User is active? ──No──► return (no status change)
        │ Yes
        ▼
User has ERROR subscriptions within last 2 months? ──No──► return
        │ Yes
        ▼
For each overdue subscription (in order):
        │
        ▼
Run rule chain: active, pinless-if-not-set, ach-if-no-debit-card, blocklist checks,
                balance threshold checks (via GrowthBook flags), ACH limit check
        │ Any rule fails ──► skip this subscription (log reason + Datadog metric)
        │ All rules pass
        ▼
Submit payment (pinless or ACH per rule outcome)
        │
        ├── Pinless success ──► COMPLETED; send debit receipt via Segment
        │
        ├── Pinless error code 51 (NSF) and "ach_if_51" flag enabled ──► retry as ACH
        │
        └── ACH success ──► ACHSENT (awaits ach-handler settlement callback)
        │
If multiple subscriptions: sleep 11 seconds between attempts (avoid duplicate USIO transactions)

The balance thresholds used in the rule chain are GrowthBook feature flags:

  • subscriptions.webhook.balance.threshold — pinless balance threshold (standard tier)

  • subscriptions.webhook.balance.threshold_premium — pinless balance threshold (premium tier)

  • subscriptions.webhook.balance.ach_threshold — ACH balance threshold

  • subscriptions.webhook.balance.use_calculated_balance — use calc_available instead of available

  • subscriptions.webhook.balance.ach_if_51 — fall back to ACH on pinless error code 51

Collection Outcomes

Status Set By Description

SCHEDULED

Subscription creation (activation, next-cycle scheduling)

Initial status. The subscription is queued for collection on or after its subscription_date.

ACHSENT

Scheduled worker, retry worker, webhook-worker, webhook-balance-worker (ACH path)

ACH debit submitted to the payment processor. Awaiting a settlement callback via the site-payments Kinesis stream, consumed by the ach-handler Lambda.

COMPLETED

Pinless success in any worker path

Subscription collected via pinless debit. Immediate confirmation — no settlement wait required. The transaction_id field holds the confirmation ID from the Payments Service.

ERROR

Any collection path on failure (insufficient balance, payment rejected, no payment method, Plaid error, blocklist)

Collection attempted but unsuccessful. The usio_error field records the failure reason. The subscription will be retried by the retry CloudWatch schedule or on the next qualifying webhook event.

PAUSED

Membership PAUSE event via memberships Lambda

User membership is paused. The collection is skipped each cycle; a new PAUSED record is written with a decremented pause_duration_months. When the duration reaches 0 the next record is written as SCHEDULED.

PAUSED_SKIPPED

Pause collection worker

Written for the current billing cycle when a paused subscription is processed. Serves as an audit record that the cycle was intentionally skipped.

CANCELLED

Collection worker (user not active or pending cancellation past cancel date)

Subscription will not be collected. Set when the user-service reports the user is no longer active or the membership cancel event was processed.

WAIVED

Collection worker (employee waiver path)

Subscription waived because the user is a FloatMe employee. A new SCHEDULED subscription is written for the next billing cycle.

STALE

Manual / operational use

Subscription record is stale and will not be collected.

INACTIVE

Webhook-worker (user not active on webhook event)

User was found inactive at the time a webhook event was processed.

Distributed Locking

Before processing any subscription, each collection Lambda acquires a per-user distributed lock stored in the locks DynamoDB table. This prevents two concurrent collection paths — for example, a scheduled retry run and an incoming income webhook event — from attempting simultaneous payments for the same user’s subscription.

The lock implementation lives in pkg/dynamo/lock.go and uses the cirello.io/dynamolock/v2 library.

Property Value

DynamoDB table

locks (legacy table, hard-coded)

Lock key format

subscription-billing:user_id:{userID}

Lease duration

60 seconds

Heartbeat period

1 second

Acquire mode

FailIfLocked — does not wait; returns ErrAlreadyLocked immediately if held

If LockSubscriptionForBilling returns ErrAlreadyLocked, the calling Lambda returns nil (no error, no status change) — the subscription will be reprocessed on the next invocation. Any other lock error is surfaced as a wrapped error and causes the SQS message to be retried.

The lock is always released via a deferred call to ReleaseSubscription. If the lock is lost during processing (the lease expired before release), a warning is logged and the release call is a no-op.

ML Decider Integration

The pkg/mldecider package provides a SageMaker-backed binary classifier (SubscriptionMLDecider) that predicts whether a subscription collection attempt will succeed. It is integrated into the scheduled checkScheduled path (not the retry or webhook paths).

The decider is controlled by two GrowthBook feature flags:

  • subscriptions.initial.ml_evaluation — enables ML evaluation for a user (default: off)

  • subscriptions.initial.override_balance_with_ml — when enabled, the ML prediction can override a failing balance check and proceed with collection (default: off)

  • subscriptions.initial.ml_threshold — confidence threshold the model result must meet to return true (default: 1.0)

The ML evaluation runs unconditionally when enabled so that predictions are always logged, even when the override flag is off. This supports A/B analysis of model accuracy before enabling the override in production.

Feature Engineering

The Decide method assembles a SubCollectionMLRequest from the following features, then calls the SageMaker endpoint via pkg/sagemaker.Client.InvokeEndpoint:

Feature Source

balance_1d through balance_7d

Daily available balances for the 7 days prior to the collection attempt, fetched from txn-service. Missing days are encoded as -9999.

balance_slope

Linear regression slope over the 7-day balance series (computed via gonum/stat.LinearRegression).

avg_balance_7d, max_balance_7d, min_balance_7d

Aggregate statistics over the 7-day series.

count_balance_days

Number of days with valid (non-missing) balance data.

user_tenure_days

Days since user.date_joined.

institution_id

Numeric institution ID extracted from the Plaid institution string (ins_{id}).

submit_dayofweek

Day of week at submission time (0=Monday … 6=Sunday, matching SageMaker training format).

submit_hour

UTC hour at submission time.

days_delinquent

Days since subscription.subscription_date (positive = past due).

days_since_last_successful_collection

Days since the most recent COMPLETED subscription, derived from the transaction ID date prefix (YYMMDD…​).

worker_observed_balance

The balance value observed by the scheduled worker at collection time.

ml_override

Whether the override_balance_with_ml flag is active for this user.

If ErrNoDataForML is returned (no 7-day balance history available), the decider logs a warning and returns true — collection proceeds normally without ML influence.