Forecasts & Payday

Overview

The prod-insight-api Lambda exposes two related capabilities via its REST API:

Forecasts (GET /{user_id}/forecasts): a combined view of a user’s recurring income, recurring expenses, and ritual expenses, computed on each request from the user’s recent transactions using FloatMe’s own recurring detection.
Payday prediction (called internally during float creation and by GET /{user_id}/insights/employment/payday): predicts the user’s next payday date using a fallback chain of predictors (Pave, FloatMe V2, FloatMe Legacy) behind a same-day cache.

Both capabilities draw on recent transactions from the Transactions Service. Payday prediction additionally uses recurring income sources from Pave and the employment record in RDS.

Forecasts (`GET /{user_id}/forecasts`)

The forecasts endpoint assembles a Forecasts response from income sources and recurring expenses. There is a single data path: the user’s last 93 days of transactions are fetched from the Transactions Service and run through FloatMe’s own recurring detection algorithm. No Pave-cached data is read, and no GrowthBook flag selects between paths.

Repository or build failures return 500. A user with no detectable recurring activity gets a 200 with empty arrays rather than an error.

ritual is always an empty array: ritual (discretionary recurring) expenses were a Pave-only concept and FloatMe detection does not produce them. The field is retained so the response shape is unchanged for clients.

Because the response is computed per request rather than read from a cache, it always reflects the user’s current transaction history.

FloatMe Recurring Detection

The FloatMe forecasts algorithm analyses raw transaction history directly, without Pave:

Fetches up to 93 days of transactions from the Transactions Service.
Groups transactions by account, then by transaction name.
Within each group, separates credits (income) from debits (expenses) before analysis, so that an income transaction and an expense sharing a merchant name are never analysed as one series.
Applies a Jaro-Winkler similarity threshold (≥ 0.90) to group similarly-named transactions together.
For expense groups: confirms recurrence by checking that transactions in the group span multiple calendar dates; builds a RecurringExpense entity if recurring.
For income groups: confirms recurrence similarly and validates against Plaid category data.
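
The steps above can be sketched roughly as follows. This is a minimal illustration, not the service's code: the `Txn` type, the sign convention (positive amount = credit), the greedy clustering against each cluster's first name, and the function names are all assumptions made for the example. The Plaid category validation for income is omitted, and grouping is simplified to account and direction before name similarity is applied.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Txn:
    account_id: str
    name: str
    amount: float  # assumption: positive = credit (income), negative = debit
    posted: date


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Standard Jaro-Winkler similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i, ch in enumerate(s1):
        if used1[i]:
            while not used2[k]:
                k += 1
            out_of_order += ch != s2[k]
            k += 1
    jaro = (matches / len(s1) + matches / len(s2)
            + (matches - out_of_order // 2) / matches) / 3
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)


def recurring_groups(txns, threshold=0.90):
    """Return (account_id, kind, txns) for name clusters spanning more than one date."""
    buckets = defaultdict(list)
    for t in txns:
        kind = "income" if t.amount > 0 else "expense"
        buckets[(t.account_id, kind)].append(t)
    result = []
    for (account_id, kind), items in buckets.items():
        clusters = []  # each cluster is compared against its first member's name
        for t in items:
            for cluster in clusters:
                if jaro_winkler(t.name.upper(), cluster[0].name.upper()) >= threshold:
                    cluster.append(t)
                    break
            else:
                clusters.append([t])
        for cluster in clusters:
            if len({t.posted for t in cluster}) > 1:
                result.append((account_id, kind, cluster))
    return result
```

Keeping credits and debits in separate buckets before clustering is what stops a refund from a merchant being folded into that merchant's expense series.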

Data-Capture Writes

When the FloatMe path runs alongside the Pave path, the result is written to the fmdatacapture DynamoDB table with:

Field	Value
Event name	`CASHFLOW_ANALYTICS`
Sort key	`CASHFLOW_ANALYTICS#{today’s date (UTC)}`
TTL	2 days (172,800 seconds)

This is a shared cross-service data-capture table — the Insight Service is a producer, not the owner.
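
As a rough illustration of the record described above, the sketch below builds the sort key and TTL from a timestamp. The attribute names (`pk`, `sk`, `ttl`, `payload`), the partition key layout, and the ISO 8601 date format are assumptions; the text only specifies the event name, the sort key pattern, and the 2-day TTL.

```python
from datetime import datetime, timedelta, timezone

EVENT_NAME = "CASHFLOW_ANALYTICS"
TTL = timedelta(days=2)  # 172,800 seconds


def build_capture_item(user_id: str, payload: dict, now: datetime) -> dict:
    """Build a data-capture item; attribute names here are hypothetical."""
    now = now.astimezone(timezone.utc)  # the sort key uses today's date in UTC
    return {
        "pk": user_id,  # assumption: the partition key layout is not given in the text
        "sk": f"{EVENT_NAME}#{now.date().isoformat()}",
        "event_name": EVENT_NAME,
        "ttl": int(now.timestamp()) + int(TTL.total_seconds()),
        "payload": payload,
    }
```

Converting to UTC before taking the date matters near midnight: a request at 23:30 in a UTC-5 zone belongs to the next UTC day.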

Payday Prediction

When It Runs

Payday prediction (EvaluateNextPayday) is called in two contexts:

During float creation — called by the Float Service with a non-empty loan_id.
Direct API query — GET /{user_id}/insights/employment/payday calls it with an empty loan_id.

Both paths behave identically. Float creation has no special case: it reads the same cache, runs the same chain, and writes the same records.

GET /{user_id}/insights/employment/payday takes an optional JSON request body (earliest_date, loan_id). To use the current date as the earliest payday, send an empty JSON object {} or a completely empty body — the NormalizeEmptyPaydayBody middleware rewrites an empty body to {} before decoding, so the mobile app’s no-body call and an internal caller’s {} behave identically. A non-empty but malformed body or an invalid earliest_date returns 400; EvaluateNextPayday failures return 500.
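
The body handling described above could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the status is returned as a plain integer rather than through a real HTTP framework, and `earliest_date` is assumed to be an ISO `YYYY-MM-DD` string, which the text does not state.

```python
import json
from datetime import date


def normalize_empty_body(raw: bytes) -> bytes:
    """Rewrite a completely empty body to an empty JSON object."""
    return raw if raw else b"{}"


def decode_payday_request(raw: bytes):
    """Return (status, request). An earliest_date of None means 'use today'."""
    try:
        body = json.loads(normalize_empty_body(raw))
    except ValueError:  # covers malformed JSON and invalid UTF-8
        return 400, None
    if not isinstance(body, dict):
        return 400, None
    earliest = body.get("earliest_date")
    if earliest is not None:
        try:
            earliest = date.fromisoformat(earliest)  # assumption: YYYY-MM-DD
        except (TypeError, ValueError):
            return 400, None
    return 200, {"earliest_date": earliest, "loan_id": body.get("loan_id", "")}
```

Normalizing before decoding means the decoder only ever sees one representation of "no parameters", so the no-body and `{}` callers cannot drift apart.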

Same-Day Cache

A prediction is computed at most once per user per calendar day. The first request of the day runs the chain below and writes the result to the payday DynamoDB entity — one row per user, overwritten in place, keyed by PAYDAY#<user_id> / PREDICTION. Every later request that day is a single GetItem.

Validity comes from the row’s prediction_date matching the current UTC day, never from its TTL. DynamoDB deletes expired items lazily — up to 48 hours late — so an expired row can still come back from GetItem; the 2-day TTL exists only to stop abandoned rows accumulating.

A cache hit still writes a data-capture record, so BI keeps one row per float even when the prediction itself was served from cache. Those records carry from_cache: true.

The cache is bypassed in both directions when the caller supplies an explicit earliest_date: the prediction is a function of that date, so a user-keyed row would be wrong for it, and writing one would poison the default path for every other caller. The app’s normal call sends no body at all, which is the cacheable path.
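
A minimal sketch of the cache rules above, using an in-memory dictionary in place of DynamoDB. The class, the `run_chain` callable, and the row attribute names other than `prediction_date` are hypothetical; the point is only to show validity being decided by the stored date and the bypass on an explicit `earliest_date`.

```python
from datetime import datetime, timezone


class PaydayCache:
    """In-memory stand-in for the one-row-per-user payday entity."""

    def __init__(self):
        self._rows = {}

    def get(self, user_id):
        return self._rows.get((f"PAYDAY#{user_id}", "PREDICTION"))

    def put(self, user_id, row):
        self._rows[(f"PAYDAY#{user_id}", "PREDICTION")] = row  # overwritten in place


def evaluate_next_payday(cache, user_id, now, run_chain, earliest_date=None):
    """Return (prediction, from_cache). run_chain is a hypothetical callable."""
    today = now.astimezone(timezone.utc).date().isoformat()
    if earliest_date is not None:
        # Explicit earliest_date: skip both the cache read and the cache write.
        return run_chain(earliest_date), False
    row = cache.get(user_id)
    # Validity is the stored prediction_date, never the TTL: an expired row
    # can still be returned by a read.
    if row is not None and row["prediction_date"] == today:
        return row["prediction"], True
    prediction = run_chain(None)
    cache.put(user_id, {"prediction_date": today, "prediction": prediction})
    return prediction, False
```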

Any change to a user’s employment record deletes their cached row immediately — see Cache Invalidation.

Prediction Algorithms

The algorithms form a fallback chain rather than running unconditionally. FM Legacy is always computed because it is arithmetic over an employment record already in hand; the others are guarded.

Pave has two selection strategies over one API call: the fetched RecurringIncomeSources list is run through both V1 and V2, and a rollout flag decides which one feeds the chain. Note the naming collision — "Pave V2" is a way of picking an income source, unrelated to the "FloatMe V2" algorithm that sits below Pave in the chain.

Algorithm	When it runs	Description
Pave V1	Skipped, with V2, when the pay-frequency gate excludes the user	Fetches the user’s `RecurringIncomeSources` from Pave via the API. Tries to find a source whose name matches the user’s employment record (Jaro-Winkler ≥ 0.90); falls back to any valid income source if no match is found. Generates next payday and collection dates from the selected source’s `next_date` and `normalized_frequency`.
Pave V2	Skipped with V1; returned only when `insight.payday_prediction.pave_v2.rollout = true`	Selects the best recurring income source without matching on the (possibly stale) employer name. Filters the shared `RecurringIncomeSources` list to sources that are `status = active`, have a payday-plausible `normalized_frequency` (monthly or more frequent: `daily`, `weekly`, `biweekly`, `semimonthly`, `monthly` — `bimonthly`/`quarterly`/`semi_annual`/`annual` are excluded), have a usable `next_date`, and have an average amount ≥ $50 (below that, recurring credits are almost entirely bank interest/dividend postings). It then ranks the survivors by, in order: matches the pay cadence stated on the user’s employment record (`pay_frequency`, with a stated `bi-monthly` treated as semi-monthly — twice a month — for this comparison only); is not a wage-advance stream (tags containing `Earned Wage Access` / `Cash Advance`); `recurrence_strength` (VERY_HIGH → VERY_LOW; INSUFFICIENT_DATA/unknown last); transaction `count`; amount magnitude. The payday is generated from the top source; payday/collection date generation is shared with V1. The ranking was selected by validating candidate strategies against 32k production samples (see the pave-validate proposal doc).
FloatMe Legacy (FM)	(always runs)	Uses the user’s RDS employment record (`pay_frequency`, `start_date`, `employer_name`) to calculate the next payday deterministically. Computes instant (24h buffer), standard (96h buffer), and extended payback dates.
FloatMe V2	Runs only when Pave produced nothing	Fetches up to 93 days of transactions from the Transactions Service, filters them to payroll-like candidates using four passes (non-integer amounts, recurring names, override keywords, blacklist exclusions), then runs a 5-phase DOW/DOM analysis pipeline to predict the cadence and next payday date. Does not use employment records or Pave. See FloatMe V2 Payday Analyzer for the full algorithm walkthrough.
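
The Pave V2 filter-then-rank selection in the table can be sketched as below. The `IncomeSource` type and its field names are modelled loosely on the names in the text; the intermediate `recurrence_strength` values (`HIGH`, `MEDIUM`, `LOW`), the string form of `pay_frequency`, and the reading of "amount magnitude" as "larger first" are assumptions.

```python
from dataclasses import dataclass, field

PLAUSIBLE = {"daily", "weekly", "biweekly", "semimonthly", "monthly"}
# Assumption: only the two endpoints are named in the text.
STRENGTH_ORDER = ["VERY_HIGH", "HIGH", "MEDIUM", "LOW", "VERY_LOW"]
ADVANCE_TAGS = ("Earned Wage Access", "Cash Advance")


@dataclass
class IncomeSource:
    name: str
    status: str
    normalized_frequency: str
    next_date: str  # an empty string stands for "no usable next_date" here
    avg_amount: float
    recurrence_strength: str = "INSUFFICIENT_DATA"
    count: int = 0
    tags: list = field(default_factory=list)


def normalize_cadence(pay_frequency: str) -> str:
    """Map a stated cadence onto Pave's vocabulary for comparison only."""
    f = pay_frequency.lower().replace("-", "").replace("_", "")
    return "semimonthly" if f == "bimonthly" else f


def select_source(sources, pay_frequency):
    """Filter to payday-plausible sources, then return the best-ranked one."""
    eligible = [
        s for s in sources
        if s.status == "active"
        and s.normalized_frequency in PLAUSIBLE
        and s.next_date
        and s.avg_amount >= 50
    ]
    stated = normalize_cadence(pay_frequency)

    def rank(s):
        strength = (STRENGTH_ORDER.index(s.recurrence_strength)
                    if s.recurrence_strength in STRENGTH_ORDER
                    else len(STRENGTH_ORDER))  # INSUFFICIENT_DATA/unknown last
        is_advance = any(a in tag for tag in s.tags for a in ADVANCE_TAGS)
        # Tuples compare left to right, so each criterion maps "better" to "smaller".
        return (s.normalized_frequency != stated, is_advance, strength,
                -s.count, -s.avg_amount)

    return min(eligible, key=rank, default=None)
```

Expressing the ranking as a single tuple keeps the criteria in the stated priority order: a later criterion only breaks ties left by the earlier ones.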

Result Selection

GetItem PAYDAY#<uid> / PREDICTION
  └─ prediction_date == today (UTC) ──▶ return cached prediction

Pave — skipped entirely when the pay-frequency gate excludes the user
  └─ one API call, run through both selectors
       chosen = Pave V2 when insight.payday_prediction.pave_v2.rollout = true
                AND Pave V2 produced a payday; otherwise Pave V1
  └─ non-empty payday, no error ──▶ return chosen Pave prediction

FloatMe V2 — only computed because Pave produced nothing
  └─ non-empty payday ──▶ return V2 prediction

FloatMe Legacy
  └─▶ return FM Legacy prediction

The Pave V2 rollout flag only changes which Pave source is selected. When Pave V2 yields nothing, the chain uses Pave V1; when Pave yields nothing at all, the fall-through to FloatMe V2 and then FM Legacy is unchanged. The chosen Pave prediction is also what gets persisted via PaydayRepo.SavePayday.

All results (and any errors) are logged for offline comparison via the "Payback prediction comparison" log line (which includes the Pave V1, Pave V2, FM Legacy, and FloatMe V2 predictions) and the PAYDAY_PREDICTION data-capture payload, which records used_pave_v2 alongside detection_used and from_cache.

Running FloatMe V2 only on Pave failure keeps its cost proportional to the population that needs it: it is the expensive algorithm, costing a full transaction pull plus its own data capture. As the pay-frequency gate rolls out, more users are excluded from Pave and V2 becomes their primary predictor — which is why the cache matters, since without it V2 would run on every payday call.

A cached row may legitimately have no V2 value, and no Pave value. That is the chain short-circuiting, not a stale row.
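
The selection order described in this section (after the cache lookup) can be sketched as a single function. The callables, the tuple return shape, and the `detection_used` values (`"pave"`, `"v2"`, `"fm"`) are assumptions for illustration; paydays are plain strings here.

```python
from typing import Callable, Optional


def choose_prediction(
    pave_gate_excludes_user: bool,
    pave_v2_rollout: bool,
    fetch_pave: Callable[[], tuple],  # returns (v1_payday, v2_payday); may raise
    run_floatme_v2: Callable[[], Optional[str]],
    fm_legacy_payday: str,
):
    """Return (payday, detection_used, used_pave_v2)."""
    if not pave_gate_excludes_user:
        try:
            v1, v2 = fetch_pave()  # one API call feeds both selectors
        except Exception:
            v1 = v2 = None  # a Pave error falls through to the next step
        if pave_v2_rollout and v2:
            return v2, "pave", True
        if v1:
            return v1, "pave", False
    v2_payday = run_floatme_v2()  # only reached when Pave produced nothing
    if v2_payday:
        return v2_payday, "v2", False
    return fm_legacy_payday, "fm", False
```

Passing FloatMe V2 as a callable rather than a precomputed value mirrors the cost argument above: the transaction pull only happens on the branch that needs it.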

Cache Invalidation

The only user-editable input to a prediction is the Postgres employment record, so the cached row is deleted wherever that record is written or deleted:

CreateEmployment and UpdateEmployment (both insert a new row)
DeleteEmployment
SaveIncomeVerificationInfo (saving a manual verification deletes the employment record)
the institution-change handler, when an account change deletes the record

Failures are logged and never surfaced — a stale prediction is a smaller problem than losing the user’s edit. UpdateIncomeVerificationInfo needs no invalidation: it only updates the DynamoDB verification row, which the predictor never reads.
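
The "log and never surface" rule might be expressed as follows. Both callables and the logger name are hypothetical stand-ins for the real repository methods.

```python
import logging

log = logging.getLogger("payday-cache")


def write_employment(user_id, record, save_record, delete_cached_payday):
    """Persist an employment change, then best-effort delete the cached prediction.

    save_record and delete_cached_payday are hypothetical callables.
    """
    save_record(user_id, record)  # the user's edit succeeds or fails on its own
    try:
        delete_cached_payday(user_id)
    except Exception:
        # Logged and swallowed: a stale prediction is the smaller problem.
        log.exception("payday cache invalidation failed for user %s", user_id)
```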

Persistence

Store	What is written
`prod-pave` DynamoDB (`payday` entity)	The winning prediction — chosen Pave, FloatMe V2, or FM Legacy — named by `detection_used`, alongside whatever each algorithm produced (`pave`, `fm`, `v2`), each carrying its own instant, standard, and extended collection payday dates. `pave` and `v2` are nil-able because the chain short-circuits. Written via `PaydayRepo.SavePayday` on the day’s first evaluation, for float creation and direct calls alike.
`fmdatacapture` DynamoDB	Three entries per evaluation (float creation and direct calls alike): `PAYDAY_PREDICTION` — full structured payload: all predictions where available (Pave V1, Pave V2, FM Legacy, FloatMe V2), the `used_v2`/`used_pave_v2` flags, all errors, `loan_id`, `prediction_date`. TTL: 2 days. `FM_PAYDAY_FILTERED_TXNS#{date}` — the list of payroll-candidate transactions used by V2 (one record per user per day). `FM_PAYDAY_PREDICTED_PAYDAY#{date}` — the V2 predicted date and cadence (one record per user per day).

Collection Date Calculation

The FM Legacy predictor calculates four collection dates beyond the raw payday:

Field	Buffer from request date
`next_instant_payday`	Next payday after `today + 24h`
`next_standard_payday`	Next payday after `today + 96h`
`extended_instant_payday`	Next payday after `today + EXTENDED_PAYBACK_START_DAYS` (configured via env var)
`extended_standard_payday`	Same as `extended_instant_payday` (currently identical)
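
A worked sketch of the four fields in the table, for fixed-interval cadences only. Several things here are assumptions: the `anchor` is taken to be any known payday on the user's schedule (the text does not say how the employment record's `start_date` is used), a payday is treated as midnight UTC on its date, and `extended_start_days` is passed in rather than read from `EXTENDED_PAYBACK_START_DAYS`.

```python
from datetime import date, datetime, timedelta, timezone

STEP_DAYS = {"weekly": 7, "biweekly": 14}  # the sketch covers these cadences only


def next_payday_after(anchor: date, frequency: str, threshold: datetime) -> date:
    """First payday strictly after threshold, stepping forward from a known payday."""
    step = timedelta(days=STEP_DAYS[frequency])
    payday = anchor
    while datetime(payday.year, payday.month, payday.day, tzinfo=timezone.utc) <= threshold:
        payday += step
    return payday


def collection_dates(anchor, frequency, now, extended_start_days):
    """Compute the four collection dates relative to the request time."""
    extended = next_payday_after(anchor, frequency, now + timedelta(days=extended_start_days))
    return {
        "next_instant_payday": next_payday_after(anchor, frequency, now + timedelta(hours=24)),
        "next_standard_payday": next_payday_after(anchor, frequency, now + timedelta(hours=96)),
        "extended_instant_payday": extended,
        "extended_standard_payday": extended,  # currently identical
    }
```

With a biweekly schedule anchored on 2024-01-05, a request at noon UTC on 2024-02-28, and a 20-day extended window, the 24h buffer still reaches the 2024-03-01 payday, the 96h buffer pushes to 2024-03-15, and the extended dates land on 2024-03-29.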

GrowthBook Flags

Flag Key	Default	Effect
`insight.pave.pay_frequency_gate.rollout`	`false`	Per-user flag: when true, the Pave income call is skipped for users whose pay frequency is outside `insight.pave.upload.pay_frequencies.config`, making FM V2 their primary predictor
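
The gate's effect reduces to a single predicate. In this sketch the function name is hypothetical, and the allowed set stands in for whatever `insight.pave.upload.pay_frequencies.config` resolves to; the example values are illustrative, not the real configuration.

```python
def pave_call_skipped(gate_rollout: bool, pay_frequency: str, allowed_frequencies: set) -> bool:
    """True when the Pave income call should be skipped for this user."""
    return gate_rollout and pay_frequency not in allowed_frequencies


# Hypothetical configuration value, for illustration only.
allowed = {"weekly", "biweekly"}
skip_monthly_user = pave_call_skipped(True, "monthly", allowed)
```

With the flag at its default of false, the predicate is false for every user and the Pave call always runs.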

Related Pages

Architecture: System context and Lambda inventory
FloatMe V2 Payday Analyzer: Full walkthrough of the V2 algorithm (filter pipeline, DOW/DOM grids, 5-phase prediction)
Pave Mining: How recurring, ritual, and income entities are written to DynamoDB by the miner
DynamoDB Tables: payday, recurring, ritual, income entity schemas
PostgreSQL Schema: Employment table used by FM Legacy predictors
Feature Flags: Full GrowthBook flag reference