FloatMe V2 Payday Analyzer

The FloatMe V2 Payday Analyzer is the Insight Service’s transaction-history-based payday prediction algorithm. Unlike the Pave predictor (which calls Pave’s RecurringIncomeSources API) and the FM Legacy predictor (which calculates deterministically from an employment record in RDS), the V2 algorithm predicts payday purely from the user’s raw transaction history over the last 93 days.

V2 is the most complex of the three predictors and is gated behind two GrowthBook flags. It was ported from a Python reference implementation and preserves that implementation’s algorithm step-for-step.

See Forecasts & Payday for how V2 fits alongside the Pave and FM Legacy predictors, and how result selection works.

Code Layout

File Role

pkg/payday/predictor.go

Top-level Predictor.EvaluateNextPayday — decides whether to run V2 based on GrowthBook flags and loan_id presence. V2 is invoked via p.FloatMePredictor.EvaluatePayday inside EvaluateNextPayday.

pkg/payday/floatme_predictor.go

FloatMePredictor.EvaluatePayday — the V2 entrypoint. Fetches transactions, filters them to income candidates, runs the analysis, and returns a NextPayday result. Also contains FilterIncomeTxns (the 4-pass filter pipeline).

pkg/payday/floatme_analyzer.go

The analyzer core. BuildDOWAnalysis and BuildDOMAnalysis produce the two grids; PredictPayday runs the 5-phase prediction pipeline over them.


Entry Conditions

V2 only runs when both conditions are true:

insight.payday_prediction.v2.enabled[user_id] == true   (GrowthBook flag, per-user)
  AND
loan_id != ""                                          (i.e. during float creation)

Per Forecasts & Payday, direct GET /{user_id}/insights/employment/payday calls with no loan_id do not trigger V2. V2 is expensive (it fetches 93 days of transaction history per call), and that endpoint is called during every payday evaluation, so V2 is limited to float creation.

Even when V2 runs, its result is only returned to the caller if insight.payday_prediction.v2.rollout[user_id] == true. Otherwise the V2 result is computed, captured for offline comparison via the insight.payday.v2.comparison Datadog metric, and then discarded — the Pave or FM Legacy result is returned instead.
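
The two gates above can be sketched as a pair of predicates. This is a minimal illustration, not the code in predictor.go: the function names are hypothetical, and the boolean parameters stand in for the per-user GrowthBook lookups.

```go
package main

// shouldRunV2 reports whether the V2 analyzer runs at all.
// v2Enabled stands in for the per-user lookup of
// insight.payday_prediction.v2.enabled (hypothetical parameter).
func shouldRunV2(v2Enabled bool, loanID string) bool {
	return v2Enabled && loanID != ""
}

// shouldReturnV2 reports whether a computed V2 result is handed back to
// the caller. v2Rollout stands in for insight.payday_prediction.v2.rollout.
// When it is false, the V2 result is only recorded for comparison.
func shouldReturnV2(ranV2, v2Rollout bool) bool {
	return ranV2 && v2Rollout
}
```

A user with the enabled flag on but the rollout flag off therefore pays the cost of V2 during float creation while still receiving the Pave or FM Legacy result.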


Pipeline Overview

EvaluatePayday(ctx, userID)
  │
  ├─▶ 1. Fetch 93 days of transactions from Transactions Service
  │     └─ Filter to primary account + credits only, flip sign to positive
  │
  ├─▶ 2. FilterIncomeTxns (4-pass filter)
  │     ├─ Non-integer amount pass ($300+ with cents)
  │     ├─ Recurring name pass ($800+ round, ≥ 2 distinct dates)
  │     ├─ Override keyword pass ("payroll")
  │     └─ Blacklist keyword pass (gambling, advances, reversals, brokerages)
  │
  ├─▶ 3. Gate: total volume ≥ $3,000 AND ≥ 3 income transactions
  │
  ├─▶ 4. BuildDOWAnalysis  ← weeks × 7 grid + Z-scores
  │     └─ Global DOW stats (all weeks) + Recent DOW stats (last 6 weeks)
  │
  ├─▶ 5. BuildDOMAnalysis  ← months × 31 grid
  │     └─ Per-day count, vol%, 3-day rolling vol%, EOM/BOM hit counts
  │
  └─▶ 6. PredictPayday (5 phases)
        ├─ Phase 0: Gatekeeper (total volume)
        ├─ Phase 1: Signal harvesting (anchors)
        ├─ Phase 2: Conflict detection (Whale vs Habit)
        ├─ Phase 3: Semi-monthly veto
        ├─ Phase 4: DOW waterfall  ← returns Weekly or Bi-Weekly
        └─ Phase 5: DOM peak fallback ← returns Monthly or Semi-Monthly

The pipeline is implemented as a series of functions that pass their shared state through the analysisState struct, which keeps the analyzer deterministic and straightforward to test.


Stage 1: Transaction Fetch

getTxnsForFMForecasts (in floatme_predictor.go):

  1. Calls TxnsService.GetTransactions for the range [today-93d, today].

  2. Fetches the user’s primary account (the FloatMe-designated main bank account).

  3. Filters transactions to:

    • Amount < 0 (credits only, since Plaid/Transactions Service represents credits as negative)

    • AccountId == primaryAccount.AccountId (exclude secondary accounts)

  4. Flips the sign — amounts are stored as absolute positive cents from this point forward.

All downstream analysis operates on positive cents (int64). This matches the Python reference implementation’s payroll_proxy column.
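
Steps 3 and 4 above amount to a single filtering loop. The sketch below assumes a trimmed-down Txn type and a hypothetical helper name; the real Transactions Service type and getTxnsForFMForecasts body are not shown in this document.

```go
package main

// Txn is a hypothetical, trimmed-down transaction record used only for
// this sketch.
type Txn struct {
	AccountID string
	Amount    int64 // cents; credits arrive as negative values
}

// creditsForPrimary keeps credits on the primary account and flips their
// sign, so everything downstream sees positive cents.
func creditsForPrimary(txns []Txn, primaryAccountID string) []Txn {
	out := make([]Txn, 0, len(txns))
	for _, t := range txns {
		if t.Amount >= 0 || t.AccountID != primaryAccountID {
			continue // debit, zero amount, or secondary account
		}
		t.Amount = -t.Amount // t is a copy, so the input slice is untouched
		out = append(out, t)
	}
	return out
}
```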


Stage 2: FilterIncomeTxns — The 4-Pass Filter

Raw credit transactions contain far more than just payroll: refunds, transfers between accounts, gambling winnings, investment dividends, early-wage-access advances, etc. FilterIncomeTxns whittles them down to a set of payroll-like candidates using four sequential passes, then dedupes to one transaction per calendar day.

Base Masks (applied to every pass)

Every pass applies two base masks before its own logic:

Mask Condition

hist_mask

Transaction date is strictly before today at midnight. Excludes today’s transactions (they may not be settled).

cap_mask

Transaction amount is less than maxPayrollCap = $5,000. Excludes large transfers, refunds, and one-off loans.

The $5,000 cap is a hard ceiling. Real paychecks above $5,000 net/paycheck exist (high earners paid weekly or biweekly), but the false-positive risk from transfers/loans is judged more damaging than the false-negative on high earners.

Pass 1 — Non-Integer Amount (the "cents" heuristic)

Constant Value

nonIntMinVal

$300 (30000 cents)

Condition

amount >= $300 AND amount % 100 != 0

Real net-pay amounts almost always have cents (tax withholding, 401k contributions, and benefit deductions rarely produce round dollar amounts). A $1,204.32 deposit is strong evidence of a paycheck; $1,000.00 is more likely a transfer.

Pass 2 — Recurring Name (round-number rescue)

Round-number deposits are only trusted if the same cleaned transaction name appears across multiple distinct calendar days.

Constant Value

nameMinValue

$800 (80000 cents)

nameMinRecurrences

2 distinct calendar days

Condition

amount >= $800 AND cleaned_name appears on ≥ 2 distinct dates (all ≥ $800)

This rescues round-number payrolls (salaried employees paid in round thousands) without opening the door to one-off round transfers.
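
One way to express the recurrence check is to collect the distinct days on which each cleaned name appears at or above the threshold. The nameTxn type and recurringNames helper are assumptions made for this sketch, not names from floatme_predictor.go.

```go
package main

const (
	nameMinValue       = 80000 // $800 in cents
	nameMinRecurrences = 2     // distinct calendar days
)

// nameTxn is a hypothetical record: a cleaned name, a calendar date such
// as "2024-01-12", and a positive amount in cents.
type nameTxn struct {
	CleanName string
	Day       string
	Amount    int64
}

// recurringNames returns the cleaned names seen on at least
// nameMinRecurrences distinct days, counting only txns >= nameMinValue.
func recurringNames(txns []nameTxn) map[string]bool {
	days := map[string]map[string]struct{}{}
	for _, t := range txns {
		if t.Amount < nameMinValue {
			continue
		}
		if days[t.CleanName] == nil {
			days[t.CleanName] = map[string]struct{}{}
		}
		days[t.CleanName][t.Day] = struct{}{}
	}
	out := map[string]bool{}
	for name, d := range days {
		if len(d) >= nameMinRecurrences {
			out[name] = true
		}
	}
	return out
}
```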

Pass 3 — Override Keywords (always include)

Transactions whose cleaned name contains "payroll" are always included (subject to base masks), regardless of amount or recurrence.

overrideKeywords = ["payroll"]

This catches low-dollar payroll edge cases (first paychecks, partial periods, bonus deposits) that would otherwise fall below the amount thresholds.

Pass 4 — Blacklist Keywords (always exclude)

Transactions whose cleaned name matches any blacklist keyword are always excluded, even if they passed one of the previous passes.

Category Keywords

Gambling

betfair, draftkings, sptsbk, casino, casears, fanatics, sportsbook, betmg, bet365, hard rock, jackpot, legendz, funzpoints, bingo

Early Wage Access / Cash Advance

dailypay, daily pay, dave.com, dave inc, moneylion, advance, moneytree

Reversals / Failures

return, refund, overdraft, reversal, reverse, rejected, nsf

Investment / Brokerage

fidelity, charles schwab, coinbase, robinhood, vanguard, betterment, etrade

Each keyword is a lowercase plain substring match against the cleaned transaction name (cleanTxnName strips non-alpha characters and normalises whitespace).
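
The override and blacklist passes are both plain substring checks, so they can share one helper. The sketch below uses a four-keyword excerpt of the blacklist and hypothetical function names; it assumes the name has already been through cleanTxnName.

```go
package main

import "strings"

var overrideKeywords = []string{"payroll"}

// blacklistSample is a short excerpt of the blacklist table above.
var blacklistSample = []string{"draftkings", "dailypay", "refund", "robinhood"}

// containsAny reports whether cleanName contains any keyword as a
// plain substring.
func containsAny(cleanName string, keywords []string) bool {
	for _, kw := range keywords {
		if strings.Contains(cleanName, kw) {
			return true
		}
	}
	return false
}

// keepAfterKeywords combines the passes: a txn is included if an amount
// pass accepted it or an override keyword matches, and the blacklist
// then removes it regardless of how it was included.
func keepAfterKeywords(cleanName string, passedAmountPass bool) bool {
	included := passedAmountPass || containsAny(cleanName, overrideKeywords)
	return included && !containsAny(cleanName, blacklistSample)
}
```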

Combine + Dedup

The three "include" passes are unioned (keyed by TransactionID), then the blacklist is applied as a final subtractive filter. Finally, results are sorted by (date ASC, amount DESC) and deduped by calendar date — only the highest-amount transaction per day survives.

This mirrors the Python reference’s drop_duplicates(subset=['event_date']) behavior and ensures that a single day contributes exactly one data point to the downstream analysis.

Example input (same day, user has 2 accounts → we already filtered to primary):
  2024-01-12  $2,403.52  "ACME CORP PAYROLL"      → Pass 1 ✓
  2024-01-12  $  500.00  "VENMO TRANSFER"         → Pass 1 ✗ (round), Pass 2 ✗ (< $800)

After filtering and dedup: only ACME $2,403.52 remains for 2024-01-12 (the Venmo transfer fails every include pass, so it is gone before dedup runs).
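
The sort-then-dedup step can be sketched as follows. The dayTxn type and dedupByDay name are illustrative; dates are kept as ISO strings here so that lexical order matches chronological order.

```go
package main

import "sort"

// dayTxn is a hypothetical record: an ISO calendar date and a positive
// amount in cents.
type dayTxn struct {
	Day    string
	Amount int64
}

// dedupByDay sorts by (date ASC, amount DESC) and keeps the first, and
// therefore largest, transaction for each calendar date.
func dedupByDay(txns []dayTxn) []dayTxn {
	sorted := append([]dayTxn(nil), txns...) // leave the input untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].Day != sorted[j].Day {
			return sorted[i].Day < sorted[j].Day
		}
		return sorted[i].Amount > sorted[j].Amount
	})
	out := make([]dayTxn, 0, len(sorted))
	for _, t := range sorted {
		if len(out) > 0 && out[len(out)-1].Day == t.Day {
			continue // a larger txn for this day is already kept
		}
		out = append(out, t)
	}
	return out
}
```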

Data Capture

The filtered transaction list is written to the fmdatacapture DynamoDB table under:

SK: FM_PAYDAY_FILTERED_TXNS#{today}

This produces one record per user per day (same-day re-runs overwrite the SK) for offline analysis and debugging.


Stage 3: Volume & Count Gates

Before any analysis runs, two gates must pass:

Gate Condition

Total volume

sum(amount) >= minTotalVolReq = $3,000 (300,000 cents)

Minimum count

len(filtered_txns) >= minIncomeTransactionsRequired = 3

If either gate fails, V2 returns ErrNotEnoughIncomeData without running the analysis. The caller (Predictor.EvaluateNextPayday) logs the failure and falls back to the Pave or FM Legacy result.

The earliest-due date is also set here: p.EarliestDue = today + 2 days. This 2-day buffer is used later to ensure the predicted payday is not too soon (a 24h or same-day prediction would collapse the loan repayment window).


Stage 4: Day-of-Week (DOW) Analysis

BuildDOWAnalysis constructs a weeks × 7 matrix (DowRawGrid) where each cell holds the transaction amount for that (week, day-of-week) pair.

Grid Construction

The grid’s row 0 is aligned to the Sunday on or before the earliest filtered transaction. Row count extends to the current week. Columns are [Sun, Mon, Tue, Wed, Thu, Fri, Sat].

StartDate = Sunday on or before earliest txn
NumWeeks  = floor((today - StartDate) / 7 days) + 1
DowRawGrid[weekIdx][dayIdx] = amount (cents) for that slot, 0 if none

Example filtered txns:
  2024-01-10 Wed  $500
  2024-01-15 Mon  $500
  2024-01-24 Wed  $500

StartDate = 2024-01-07 (Sun)

           Sun   Mon   Tue   Wed   Thu   Fri   Sat
         ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┐
  wk 0   │ $0  │ $0  │ $0  │$500 │ $0  │ $0  │ $0  │
  wk 1   │ $0  │$500 │ $0  │ $0  │ $0  │ $0  │ $0  │
  wk 2   │ $0  │ $0  │ $0  │$500 │ $0  │ $0  │ $0  │
         └─────┴─────┴─────┴─────┴─────┴─────┴─────┘

Bypass Filtering

Before populating the grid, transactions whose names match weeklyBypassKeywords = ["social security", "vacp", "ssa treas"] are removed from the pattern-matching input. These recurring government deposits follow distinctive calendar patterns (1st/3rd of month) that would otherwise distort the DOW analysis. They still count toward GrandTotalFull (the global denominator), so their volume isn’t lost; they just don’t influence the grid.

Z-Score Calculation

For each DOW column, DowZScores[week][day] = (amount - mean) / stddev is computed over the non-zero cells in that column. This surfaces anomalies: a week where a Friday paycheck was $10,000 instead of the usual $2,500 scores z ≈ +3 once the column holds enough ordinary weeks, which puts it outside the acceptance band and excludes it from downstream "valid week" counts.

Columns with ≤ 1 non-zero value are skipped (insufficient data for z-scores).

The z-score acceptance band is [dowZLow, dowZHigh] = [-1.5, +2.5]. The asymmetry is deliberate: a deposit may sit up to 2.5 standard deviations above the column mean, but only 1.5 below it, before its week is treated as invalid.

Global DOW vs Recent DOW

The grid is split into two overlapping views:

View Rows included Purpose

GlobalDOW

All rows

Long-term pay rhythm. Vol% denominator is GrandTotalFull (bypass-inclusive).

RecentDOW

Last recentWindowWeeks = 6 rows

For each DOW column, both views compute:

  • Count — number of valid weeks (non-zero amount AND z-score in [-1.5, +2.5])

  • VolPct — fraction of total volume that fell on this DOW

  • Density = Count / NumWeeks (global) or Count / recentWindowWeeks (recent)

The dual view is the heart of V2’s "identity vs recent-change" reasoning in Phase 4.


Stage 5: Day-of-Month (DOM) Analysis

BuildDOMAnalysis constructs a months × 31 matrix (DomGrid) where each cell holds the transaction amount for that (month, day-of-month) pair.

mIdx     = (txn.Year - startMonth.Year)*12 + (txn.Month - startMonth.Month)
dayIdx   = txn.Day - 1                                     (0-based)
DomGrid[mIdx][dayIdx] = amount

           day:  1    2    3    ···   15    ···   29   30   31
        dayIdx:  0    1    2          14          28   29   30
               ┌─────┬────┬────┬ ··· ┬─────┬ ··· ┬────┬────┬────┐
  mIdx=0  Jan  │$2500│    │    │     │$2500│     │    │    │$100│
  mIdx=1  Feb  │$2450│    │    │     │$2450│     │    │$100│    │
  mIdx=2  Mar  │$2600│    │    │     │$2600│     │    │    │$100│
               └─────┴────┴────┴ ··· ┴─────┴ ··· ┴────┴────┴────┘

Per-Day Stats

For each day column (1–31), the analysis computes:

Stat Meaning

Count

Number of distinct months that had a transaction on this day-of-month.

VolPct

Fraction of total DOM volume that fell on this specific day.

Roll3dPct

Fraction of total DOM volume in a 3-day rolling window ending on this day. Handles "I got paid on the 14th this month because the 15th was a Sunday" — calendar anchors naturally smear ±1 day due to weekend shifts, so a rolling window is more robust than a point count.
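
Roll3dPct can be derived from the per-day VolPct values as a trailing sum. The sketch assumes the window is clipped at day 1 rather than wrapping into the previous month; this document does not say which the analyzer does.

```go
package main

// roll3dPct returns, for each day-of-month column, the sum of VolPct
// over the 3-day window ending on that day. Index 0 is day 1. The
// window is clipped at day 1 (assumption: no wrap-around).
func roll3dPct(volPct [31]float64) [31]float64 {
	var out [31]float64
	for d := 0; d < 31; d++ {
		for k := d - 2; k <= d; k++ {
			if k >= 0 {
				out[d] += volPct[k]
			}
		}
	}
	return out
}
```

If 20% of volume falls on the 14th and 30% on the 15th, the 15th gets a rolling value of 50%, which is what lets a weekend-shifted paycheck still reinforce its usual calendar day.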

EOM and BOM Hit Counts

Two additional scalars are tracked:

  • EomScore: number of transactions with day >= LastDayOfMonth - 2 (i.e. the last 3 calendar days of the month)

  • BomScore: number of transactions with day <= 3 (i.e. the first 3 calendar days)

These are used by Phase 5 to classify the pattern as end-of-month vs beginning-of-month anchoring.


Stage 6: PredictPayday — The 5-Phase Pipeline

PredictPayday is the prediction state machine. It uses the two analyses to classify the user’s pay pattern and project the next payday date.

Phase 0: The Gatekeeper

if GrandTotal < $3,000  →  return nil (no prediction)

A redundant guard (EvaluatePayday already checks this before calling PredictPayday), but kept here so PredictPayday is safe to call directly in tests.

Phase 1: Signal Harvesting (Anchors)

Scan the 31 DOM columns. A day is an anchor if:

  • Count >= domAnchorCount = 2 (occurred in at least 2 distinct months)

  • VolPct >= supplementalAnchorThreshold = 0.10 (contributed ≥ 10% of total DOM volume)

Anchors are the calendar days that look like "scheduled pay dates" — typically the 1st, 15th, and/or last day of month.
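
A sketch of the anchor scan follows. It assumes both conditions must hold for a day to qualify; the domStat type and findAnchors name are illustrative.

```go
package main

const (
	domAnchorCount              = 2
	supplementalAnchorThreshold = 0.10
)

// domStat is a hypothetical subset of the per-day stats.
type domStat struct {
	Count  int     // distinct months with a txn on this day
	VolPct float64 // fraction of total DOM volume on this day
}

// findAnchors returns the 1-based days of month that meet both the
// count and the volume condition (assumption: the two are ANDed).
func findAnchors(stats [31]domStat) []int {
	var anchors []int
	for i, s := range stats {
		if s.Count >= domAnchorCount && s.VolPct >= supplementalAnchorThreshold {
			anchors = append(anchors, i+1)
		}
	}
	return anchors
}
```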

Phase 2: Conflict Detection — "The Whale" vs "The Habit"

The analyzer must resolve a classic tension:

  • The Whale (Calendar Spike) — one or more calendar days concentrate ≥ 25% of total volume. Signals a salaried employee paid on fixed dates (1st, 15th, EOM).

  • The Habit (DOW Density) — one day-of-week shows up in most weeks. Signals a wage worker paid every Friday regardless of the calendar.

When both signals exist, the analyzer applies three tests to decide which to trust:

Signal Threshold Meaning

hasCalendarSpike

any DOM day with VolPct >= 0.25

A single calendar day concentrates a quarter of all income. Strong calendar signal.

isHabitUnbreakable

any DOW with GlobalDOW.VolPct >= 0.60 OR any DOW with RecentDOW.Density >= 0.50

A single weekday concentrates at least 60% of all volume, OR that weekday fires in at least half of recent weeks. Either way, the weekly habit is too strong to overrule.

hasSomeHabit

any DOW with GlobalDOW.Density >= 0.35 OR RecentDOW.Density >= 0.35

A weaker weekly signal: a weekday shows up in at least 35% of weeks (roughly a third). Used as a tiebreaker.

Resolution:

trustCalendar = !isHabitUnbreakable && hasCalendarSpike

If the habit is unbreakable, the calendar signal is ignored regardless of how strong it looks.
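
The three tests and the resolution rule can be sketched in one function. The dowView and signals types are assumptions for illustration; the thresholds are the ones listed in the table above.

```go
package main

// dowView is a hypothetical subset of one weekday's stats in a view.
type dowView struct {
	VolPct  float64
	Density float64
}

type signals struct {
	hasCalendarSpike   bool
	isHabitUnbreakable bool
	hasSomeHabit       bool
	trustCalendar      bool
}

// detectConflict evaluates the three Phase 2 tests over the DOM VolPct
// values and the global and recent DOW views.
func detectConflict(domVolPct []float64, global, recent [7]dowView) signals {
	var s signals
	for _, v := range domVolPct {
		if v >= 0.25 {
			s.hasCalendarSpike = true
		}
	}
	for i := 0; i < 7; i++ {
		if global[i].VolPct >= 0.60 || recent[i].Density >= 0.50 {
			s.isHabitUnbreakable = true
		}
		if global[i].Density >= 0.35 || recent[i].Density >= 0.35 {
			s.hasSomeHabit = true
		}
	}
	s.trustCalendar = !s.isHabitUnbreakable && s.hasCalendarSpike
	return s
}
```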

Phase 3: The Semi-Monthly Veto

A user paid on both the 1st and the 15th produces two DOM anchors AND a weekly rhythm (since the 1st-to-15th gap is 14 days, it often aligns with a single DOW). Without the veto, Phase 4 would see a weekly pattern and miss the true semi-monthly cadence.

isSemiMonthlyPattern = len(anchors) >= 2
                       OR (len(anchors) == 1 AND EomScore >= 2)

switch {
case isHabitUnbreakable:
    forceCalendarMode = false          // habit wins regardless
case isSemiMonthlyPattern:
    forceCalendarMode = !hasSomeHabit  // force calendar unless a weekly
                                       // habit overrides it
default:
    forceCalendarMode = !hasSomeHabit && trustCalendar
}

forceCalendarMode = true means skip Phase 4 entirely and go straight to Phase 5.

Phase 4: DOW Habit Waterfall (Weekly / Bi-Weekly)

Runs only if forceCalendarMode == false. Implemented in runDOWWaterfall + ResolveDOW.

Both Recent and Global DOW stats are ranked by VolPct descending. Then:

bestRecentDOW = argmax(RecentDOW.VolPct)
bestGlobalDOW = argmax(GlobalDOW.VolPct)
isIdentityMatch = (bestRecentDOW == bestGlobalDOW)

Path A — Recent history is strong:
  if RecentDOW[bestRecent].VolPct >= 0.35:
    candidates = isIdentityMatch ? [bestRecent] : all 7 recent-ranked DOWs
    for each candidate DOW (Count >= 2):
      return ResolveDOW(dowIdx, isIdentityMatch)

Path B — Global history is strong (fallback):
  if GlobalDOW[bestGlobal].VolPct >= 0.35:
    for each global-ranked DOW (Count >= 2):
      return ResolveDOW(dowIdx, forceGlobalCadence=true)

The isIdentityMatch branch handles the "new job" case: if bestRecent != bestGlobal, the user has likely changed employers. The waterfall then iterates through all recent DOWs in rank order rather than just the top one — this tolerates a noisy "first few weeks of new job" signal.

ResolveDOW: Cadence + Projection

Given a target DOW index, determine whether the pattern is Weekly or Bi-Weekly:

densityScore = forceGlobalCadence ? max(Recent.Density, Global.Density)
                                  : Recent.Density

cadence = (densityScore >= 0.60) ? WEEKLY    (7 days)
                                 : BIWEEKLY  (14 days)

Then find the last valid transaction on that weekday (z-score in [-1.5, +2.5]) and project forward:

nextPayday = lastValidDate + cadenceDays
while nextPayday < EarliestDue:   // today + 2 days
    nextPayday += cadenceDays

The "roll forward" loop handles the case where the last valid txn was recent enough that adding 7 or 14 days still lands before EarliestDue — skip forward one cadence at a time until we’re past the buffer.

Phase 5: DOM Peak Fallback (Monthly / Semi-Monthly)

Runs only if Phase 4 returned nil OR forceCalendarMode == true. Implemented in RunDOMPeakFallback + ResolveDOM.

Gated by: EomScore >= 2 OR len(anchors) > 0. If neither is true, V2 returns nil (no prediction).

Every DOM day is scored:

peak.Score    = DomStats.VolPct + DomStats.Roll3dPct
peak.IsAnchor = (day in anchors)
peak.Vol      = DomStats.VolPct

Peaks are sorted with anchors prioritised, then by score descending. The top peak is the primary peak. A secondary peak is searched for:

for each peak (ranked, starting from #2):
    if |peak.Day - primary.Day| >= peakSeparationDays = 10  AND
       peak.Vol >= supplementalAnchorThreshold = 0.10:
        secondary = peak.Day; break

The 10-day separation avoids a "phantom secondary" from a day adjacent to the primary peak (e.g. 14th and 15th should be treated as a single anchor, not two).
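
The ranking and secondary-peak search can be sketched as follows. The peak type mirrors the three fields listed above plus the day; pickPeaks is a hypothetical name, and returning 0 to mean "no secondary peak" is a convention chosen for this sketch.

```go
package main

import "sort"

const (
	peakSeparationDays          = 10
	supplementalAnchorThreshold = 0.10
)

type peak struct {
	Day      int // 1-based day of month
	Score    float64
	Vol      float64
	IsAnchor bool
}

// pickPeaks ranks peaks (anchors first, then score descending) and
// returns the primary day plus the first sufficiently distant,
// sufficiently large secondary day, or 0 if there is none.
func pickPeaks(peaks []peak) (primary, secondary int) {
	if len(peaks) == 0 {
		return 0, 0
	}
	ranked := append([]peak(nil), peaks...)
	sort.SliceStable(ranked, func(i, j int) bool {
		if ranked[i].IsAnchor != ranked[j].IsAnchor {
			return ranked[i].IsAnchor
		}
		return ranked[i].Score > ranked[j].Score
	})
	primary = ranked[0].Day
	for _, p := range ranked[1:] {
		gap := p.Day - primary
		if gap < 0 {
			gap = -gap
		}
		if gap >= peakSeparationDays && p.Vol >= supplementalAnchorThreshold {
			return primary, p.Day
		}
	}
	return primary, 0
}
```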

Case Result

Secondary peak found

Cadence = SEMIMONTHLY. ResolveDOM is called with both days; the earlier resulting date is returned.

No secondary peak

Cadence = MONTHLY. ResolveDOM is called with just the primary day.

ResolveDOM: Weekend & Holiday Adjustment

For each target day:

  1. Construct the target date in the current month (safeDate clamps day 31 to Feb 28/29, etc.).

  2. Apply weekend adjustment based on shift direction (see below).

  3. If the adjusted date is before EarliestDue, advance to next month and re-adjust.

The shift direction is determined per-day by isForwardShiftDay:

isForwardShiftDay(day):
    govt_vol = sum(txn.amount for txn on this DOM matching bypass keywords)
    total_vol = sum(txn.amount for txn on this DOM)
    return (govt_vol / total_vol) > 0.50

The rationale: government deposits (Social Security, VA benefits) shift Sat→Mon and Sun→Mon (forward), while corporate payrolls shift Sat→Fri and Sun→Fri (backward). The shift direction is decided per-calendar-day because a user could have a mix of both (e.g. SSA on the 3rd + private payroll on the 15th).

AdjustWeekendBackward (corporate):     AdjustWeekendForward (government):
  Sat → Fri (-1 day)                     Sat → Mon (+2 days)
  Sun → Fri (-2 days)                    Sun → Mon (+1 day)

V2 does not adjust for federal bank holidays, although the FM Legacy predictor does. This is a known gap: a payday that lands on a Monday holiday is currently predicted as that Monday, not Tuesday.

Output

On success, EvaluatePayday returns:

{
  "payday":         "2024-01-26",
  "frequency":      "Fri BIWEEKLY",
  "payday_cadence": "BIWEEKLY",
  "predictor":      "FLOATME_V2_PAYDAY_CALCULATION"
}

frequency is human-readable; payday_cadence is the enum used by downstream code. The prediction is also captured:

fmdatacapture DynamoDB:
  SK: FM_PAYDAY_PREDICTED_PAYDAY#{today}
  Body: { "next_payday": NextPayday }

One record per user per day (same-day re-runs overwrite).


Cadence Summary

The four cadences V2 can emit and where they originate (UNKNOWN exists only as an enum value):

Cadence Phase Typical User

WEEKLY

Phase 4

Hourly worker paid every Friday. DOW density ≥ 0.60 on one weekday.

BIWEEKLY

Phase 4

Salaried employee paid every other Friday. DOW vol% ≥ 0.35 on one weekday, density < 0.60.

SEMIMONTHLY

Phase 5

Paid on 1st and 15th (or 15th and EOM). Two DOM peaks separated by ≥ 10 days.

MONTHLY

Phase 5

Paid on a single fixed calendar day (e.g. the 15th, or EOM).

UNKNOWN

(enum only)

Never emitted by the current pipeline — V2 returns nil (which becomes ErrNotEnoughIncomeData) rather than UNKNOWN.


Known Limitations

Limitation Detail

Federal bank holidays

Not adjusted for. A predicted Monday that is a federal holiday will not shift to Tuesday.

$5,000 amount cap

High earners with single paychecks above $5K are excluded entirely from the analysis.

93-day window

Users with fewer than 93 days of transaction history at their primary account will have thinner grids and may fail the volume/count gates.

Primary-account only

Users whose payroll goes to a non-primary account receive no prediction.

3-txn minimum

Monthly-paid users with < 93 days of history may have fewer than 3 transactions and fail the count gate.

EOM vs calendar-day-31

The EOM classification uses day >= LastDayOfMonth - 2, which for a 31-day month is days 29, 30, 31. For a 28-day February, it’s days 26, 27, 28 — which can catch non-EOM anchors in short months.


Debugging a Prediction

When a V2 prediction needs investigation, the two fmdatacapture DynamoDB records for the user on the prediction date provide a complete trail:

SK prefix Contents

FM_PAYDAY_FILTERED_TXNS#{date}

The full list of income candidates that made it through FilterIncomeTxns. Compare against raw Transactions Service data to spot filter gaps.

FM_PAYDAY_PREDICTED_PAYDAY#{date}

The final prediction (payday, cadence, frequency). Doesn’t capture the intermediate grids; for those, re-run locally with the filtered txns from the first record.

The insight.payday.v2.comparison Datadog metric tags every prediction with is_equal:{bool}, fm_off_by:{days}, and pave_empty:{bool} for offline analysis of V2 vs Pave disagreement.


  • Forecasts & Payday: how V2 fits alongside Pave and FM Legacy, result-selection priority

  • Feature Flags: insight.payday_prediction.v2.enabled and insight.payday_prediction.v2.rollout

  • DynamoDB Tables: fmdatacapture record schemas used for data capture

  • Architecture: Transactions Service dependency and the api Lambda that hosts the V2 predictor