FloatMe V2 Payday Analyzer
The FloatMe V2 Payday Analyzer is the Insight Service’s transaction-history-based payday prediction algorithm. Unlike the Pave predictor (which calls Pave’s RecurringIncomeSources API) and the FM Legacy predictor (which calculates deterministically from an employment record in RDS), the V2 algorithm predicts payday purely from the user’s raw transaction history over the last 93 days.
V2 is the most complex of the three predictors and is gated behind two GrowthBook flags. It was ported from a Python reference implementation and preserves that implementation’s algorithm step-for-step.
| See Forecasts & Payday for how V2 fits alongside the Pave and FM Legacy predictors, and how result selection works. |
Code Layout
| File | Role |
|---|---|
| pkg/payday/predictor.go | Top-level |
| pkg/payday/floatme_predictor.go | |
| pkg/payday/floatme_analyzer.go | The analyzer core. |
Entry Conditions
V2 only runs when both conditions are true:
insight.payday_prediction.v2.enabled[user_id] == true (GrowthBook flag, per-user)
AND
loan_id != "" (i.e. during float creation)
Per Forecasts & Payday, direct GET /{user_id}/insights/employment/payday calls with no loan_id do not trigger V2. This is because V2 is expensive (it fetches 93 days of transaction history per call) and this code path runs during every payday evaluation.
Even when V2 runs, its result is only returned to the caller if insight.payday_prediction.v2.rollout[user_id] == true. Otherwise the V2 result is computed, captured for offline comparison via the insight.payday.v2.comparison Datadog metric, and then discarded — the Pave or FM Legacy result is returned instead.
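The two-flag gating can be sketched as below. This is a minimal illustration rather than the service's code: the `Flags` struct, `shouldRunV2`, and `selectResult` are hypothetical names, and plain strings stand in for the real prediction types.

```go
package main

import "fmt"

// Flags is a hypothetical stand-in for the two per-user GrowthBook lookups.
type Flags struct {
	V2Enabled bool // insight.payday_prediction.v2.enabled[user_id]
	V2Rollout bool // insight.payday_prediction.v2.rollout[user_id]
}

// shouldRunV2 mirrors the entry conditions: flag on AND a non-empty loan_id.
func shouldRunV2(f Flags, loanID string) bool {
	return f.V2Enabled && loanID != ""
}

// selectResult returns the V2 result only when rollout is on and V2 produced
// a prediction (empty string = no prediction, an assumption of this sketch).
// Otherwise the Pave or FM Legacy fallback is returned.
func selectResult(f Flags, v2, fallback string) string {
	if f.V2Rollout && v2 != "" {
		return v2
	}
	return fallback
}

func main() {
	f := Flags{V2Enabled: true, V2Rollout: false}
	fmt.Println(shouldRunV2(f, ""))                          // false
	fmt.Println(shouldRunV2(f, "loan-123"))                  // true
	fmt.Println(selectResult(f, "2024-01-26", "2024-01-25")) // 2024-01-25
}
```

With rollout off, the V2 value is still computed by the caller for the comparison metric; the sketch only shows which value is handed back.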
Pipeline Overview
EvaluatePayday(ctx, userID)
│
├─▶ 1. Fetch 93 days of transactions from Transactions Service
│ └─ Filter to primary account + credits only, flip sign to positive
│
├─▶ 2. FilterIncomeTxns (4-pass filter)
│ ├─ Non-integer amount pass ($300+ with cents)
│ ├─ Recurring name pass ($800+ round, ≥ 2 distinct dates)
│ ├─ Override keyword pass ("payroll")
│ └─ Blacklist keyword pass (gambling, advances, reversals, brokerages)
│
├─▶ 3. Gate: total volume ≥ $3,000 AND ≥ 3 income transactions
│
├─▶ 4. BuildDOWAnalysis ← weeks × 7 grid + Z-scores
│ └─ Global DOW stats (all weeks) + Recent DOW stats (last 6 weeks)
│
├─▶ 5. BuildDOMAnalysis ← months × 31 grid
│ └─ Per-day count, vol%, 3-day rolling vol%, EOM/BOM hit counts
│
└─▶ 6. PredictPayday (5 phases)
├─ Phase 0: Gatekeeper (total volume)
├─ Phase 1: Signal harvesting (anchors)
├─ Phase 2: Conflict detection (Whale vs Habit)
├─ Phase 3: Semi-monthly veto
├─ Phase 4: DOW waterfall ← returns Weekly or Bi-Weekly
└─ Phase 5: DOM peak fallback ← returns Monthly or Semi-Monthly
The pipeline is implemented as a series of pure functions that pass intermediate results through the analysisState struct, which keeps the analyzer deterministic and easy to test.
Stage 1: Transaction Fetch
getTxnsForFMForecasts (in floatme_predictor.go):
- Calls TxnsService.GetTransactions for the range [today-93d, today].
- Fetches the user’s primary account (the FloatMe-designated main bank account).
- Filters transactions to:
  - Amount < 0 (credits only, since Plaid/Transactions Service represents credits as negative)
  - AccountId == primaryAccount.AccountId (exclude secondary accounts)
- Flips the sign: amounts are stored as absolute positive cents from this point forward.
All downstream analysis operates on positive cents (int64). This matches the Python reference implementation’s payroll_proxy column.
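A minimal sketch of the credit filter and sign flip, assuming a trimmed-down `Txn` type; the struct and the function name `creditsForPrimary` are hypothetical, not the Transactions Service schema.

```go
package main

import "fmt"

// Txn is a hypothetical, trimmed-down transaction record.
type Txn struct {
	AccountID string
	Name      string
	Amount    int64 // cents; credits arrive as negative values
}

// creditsForPrimary keeps credits on the primary account and flips them
// to positive cents, dropping debits and secondary accounts.
func creditsForPrimary(txns []Txn, primaryID string) []Txn {
	out := make([]Txn, 0, len(txns))
	for _, t := range txns {
		if t.Amount >= 0 || t.AccountID != primaryID {
			continue
		}
		t.Amount = -t.Amount // t is a copy, so the input is not mutated
		out = append(out, t)
	}
	return out
}

func main() {
	txns := []Txn{
		{"acc-1", "ACME CORP PAYROLL", -240352},
		{"acc-1", "GROCERY STORE", 5421},
		{"acc-2", "SIDE GIG", -90000},
	}
	for _, t := range creditsForPrimary(txns, "acc-1") {
		fmt.Println(t.Name, t.Amount) // ACME CORP PAYROLL 240352
	}
}
```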
Stage 2: FilterIncomeTxns — The 4-Pass Filter
Raw credit transactions contain far more than just payroll: refunds, transfers between accounts, gambling winnings, investment dividends, early-wage-access advances, etc. FilterIncomeTxns whittles them down to a set of payroll-like candidates using four sequential passes, then dedupes to one transaction per calendar day.
Base Masks (applied to every pass)
Every pass applies two base masks before its own logic:
| Mask | Condition |
|---|---|
| | Transaction date is strictly before |
| | Transaction amount is less than |

| The $5,000 cap is a hard ceiling. Real paychecks above $5,000 net/paycheck exist (high earners paid weekly or biweekly), but the false-positive risk from transfers/loans is judged more damaging than the false-negative on high earners. |
Pass 1 — Non-Integer Amount (the "cents" heuristic)
| Constant | Value |
|---|---|
| | $300 ( |
| Condition | |
Real net-pay amounts almost always have cents (tax withholding, 401k contributions, and benefit deductions rarely produce round dollar amounts). A $1,204.32 deposit is strong evidence of a paycheck; $1,000.00 is more likely a transfer.
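The cents heuristic reduces to a modulo check on the amount in cents. The sketch below assumes the $300 floor is inclusive and folds in the $5,000 base-mask cap; both constant names are hypothetical, and the date mask is omitted.

```go
package main

import "fmt"

const (
	minNonIntegerCents = 300_00  // hypothetical name for the $300 floor
	maxAmountCents     = 5000_00 // hypothetical name for the $5,000 cap
)

// passNonInteger reports whether a positive amount in cents looks like net
// pay: at least $300, under the $5,000 cap, and not a whole-dollar value.
func passNonInteger(amountCents int64) bool {
	if amountCents >= maxAmountCents {
		return false
	}
	return amountCents >= minNonIntegerCents && amountCents%100 != 0
}

func main() {
	fmt.Println(passNonInteger(120432)) // true:  $1,204.32 has cents
	fmt.Println(passNonInteger(100000)) // false: $1,000.00 is round
	fmt.Println(passNonInteger(25050))  // false: below the $300 floor
}
```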
Pass 2 — Recurring Name (round-number rescue)
Round-number deposits are only trusted if the same cleaned transaction name appears across multiple distinct calendar days.
| Constant | Value |
|---|---|
| | $800 ( |
| | 2 distinct calendar days |
| Condition | |
This rescues round-number payrolls (salaried employees paid in round thousands) without opening the door to one-off round transfers.
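One way to express the recurrence check is to count distinct dates per cleaned name. This sketch assumes only round amounts of $800 or more are counted toward recurrence, which the text does not spell out. `Txn`, the constants, and `passRecurringName` are hypothetical names, and the base masks are omitted.

```go
package main

import "fmt"

type Txn struct {
	Date   string // YYYY-MM-DD
	Name   string // already cleaned
	Amount int64  // positive cents
}

const (
	minRoundCents   = 800_00 // hypothetical name for the $800 floor
	minDistinctDays = 2      // hypothetical name
)

// passRecurringName keeps round-dollar deposits of at least $800 whose
// cleaned name appears on at least two distinct calendar days.
func passRecurringName(txns []Txn) []Txn {
	eligible := func(t Txn) bool {
		return t.Amount >= minRoundCents && t.Amount%100 == 0
	}
	days := map[string]map[string]bool{} // name -> set of dates
	for _, t := range txns {
		if !eligible(t) {
			continue
		}
		if days[t.Name] == nil {
			days[t.Name] = map[string]bool{}
		}
		days[t.Name][t.Date] = true
	}
	var out []Txn
	for _, t := range txns {
		if eligible(t) && len(days[t.Name]) >= minDistinctDays {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	txns := []Txn{
		{"2024-01-05", "employer inc", 100000},
		{"2024-01-19", "employer inc", 100000},
		{"2024-01-12", "zelle transfer", 90000},
	}
	for _, t := range passRecurringName(txns) {
		fmt.Println(t.Date, t.Name) // the two "employer inc" rows
	}
}
```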
Pass 3 — Override Keywords (always include)
Transactions whose cleaned name contains "payroll" are always included (subject to base masks), regardless of amount or recurrence.
overrideKeywords = ["payroll"]
This catches low-dollar payroll edge cases (first paychecks, partial periods, bonus deposits) that would otherwise fall below the amount thresholds.
Pass 4 — Blacklist Keywords (always exclude)
Transactions whose cleaned name matches any blacklist keyword are always excluded, even if they passed one of the previous passes.
| Category | Keywords |
|---|---|
| Gambling | |
| Early Wage Access / Cash Advance | |
| Reversals / Failures | |
| Investment / Brokerage | |
Each keyword is a lowercase plain substring match against the cleaned transaction name (cleanTxnName strips non-alpha characters and normalises whitespace).
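A sketch of the name cleaning and substring match follows. The real cleanTxnName may differ in detail: this version assumes it lowercases the name and replaces non-letters with spaces. `isBlacklisted` is a hypothetical helper, and the keywords in `main` are illustrative placeholders, not the actual blacklist.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// cleanTxnName lowercases the name, replaces every non-letter with a space
// (an assumption of this sketch), and collapses runs of whitespace.
func cleanTxnName(name string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(name) {
		if unicode.IsLetter(r) {
			b.WriteRune(r)
		} else {
			b.WriteRune(' ')
		}
	}
	return strings.Join(strings.Fields(b.String()), " ")
}

// isBlacklisted does a plain substring match of each lowercase keyword
// against the cleaned name.
func isBlacklisted(name string, keywords []string) bool {
	cleaned := cleanTxnName(name)
	for _, kw := range keywords {
		if strings.Contains(cleaned, kw) {
			return true
		}
	}
	return false
}

func main() {
	keywords := []string{"casino", "reversal"} // illustrative only
	fmt.Println(cleanTxnName("ACME  Corp. PAYROLL #4411"))            // acme corp payroll
	fmt.Println(isBlacklisted("Lucky-Star CASINO payout", keywords))  // true
	fmt.Println(isBlacklisted("ACME  Corp. PAYROLL #4411", keywords)) // false
}
```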
Combine + Dedup
The three "include" passes are unioned (keyed by TransactionID), then the blacklist is applied as a final subtractive filter. Finally, results are sorted by (date ASC, amount DESC) and deduped by calendar date — only the highest-amount transaction per day survives.
This mirrors the Python reference’s drop_duplicates(subset=['event_date']) behavior and ensures that a single day contributes exactly one data point to the downstream analysis.
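The union and per-day dedup can be sketched as follows, leaving out the subtractive blacklist step. `Txn`, `unionByID`, and `dedupByDay` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

type Txn struct {
	ID     string
	Date   string // YYYY-MM-DD, so string order is chronological
	Amount int64  // positive cents
}

// unionByID merges the include passes, keeping one copy per TransactionID.
func unionByID(passes ...[]Txn) []Txn {
	seen := map[string]bool{}
	var out []Txn
	for _, pass := range passes {
		for _, t := range pass {
			if !seen[t.ID] {
				seen[t.ID] = true
				out = append(out, t)
			}
		}
	}
	return out
}

// dedupByDay sorts by (date ASC, amount DESC) and keeps the first row per
// date, so only the largest deposit of each day survives.
func dedupByDay(txns []Txn) []Txn {
	sorted := append([]Txn(nil), txns...)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].Date != sorted[j].Date {
			return sorted[i].Date < sorted[j].Date
		}
		return sorted[i].Amount > sorted[j].Amount
	})
	var out []Txn
	for _, t := range sorted {
		if n := len(out); n > 0 && out[n-1].Date == t.Date {
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	pass1 := []Txn{{"t1", "2024-01-12", 240352}, {"t2", "2024-01-12", 31050}}
	pass3 := []Txn{{"t1", "2024-01-12", 240352}, {"t3", "2024-01-26", 238911}}
	for _, t := range dedupByDay(unionByID(pass1, pass3)) {
		fmt.Println(t.Date, t.Amount)
	}
}
```

Sorting before the scan is what makes "keep the first row per date" equivalent to "keep the highest amount per date".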
Example input (same day, user has 2 accounts → we already filtered to primary):
2024-01-12 $2,403.52 "ACME CORP PAYROLL" → Pass 1 ✓
2024-01-12 $ 500.00 "VENMO TRANSFER" → Pass 1 ✗ (round), Pass 2 ✗ (< $800)
After filtering and dedup: only ACME $2,403.52 remains for 2024-01-12 (the Venmo transfer fails both amount passes before dedup runs).
Stage 3: Volume & Count Gates
Before any analysis runs, two gates must pass:
| Gate | Condition |
|---|---|
| Total volume | |
| Minimum count | |
If either gate fails, V2 returns ErrNotEnoughIncomeData without running the analysis. The caller (Predictor.EvaluateNextPayday) logs the failure and falls back to the Pave or FM Legacy result.
The earliest-due date is also set here: p.EarliestDue = today + 2 days. This 2-day buffer is used later to ensure the predicted payday is not too soon (a 24h or same-day prediction would collapse the loan repayment window).
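A sketch of the two gates and the 2-day buffer, using the thresholds from the pipeline overview ($3,000 total volume, 3 income transactions). The constant names, `checkGates`, `earliestDue`, and `ymd` are hypothetical. `ErrNotEnoughIncomeData` is the name used in the text, but its definition here is a stand-in.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrNotEnoughIncomeData is a stand-in definition for the error named above.
var ErrNotEnoughIncomeData = errors.New("not enough income data")

const (
	minTotalVolumeCents = 3000_00 // hypothetical constant name
	minIncomeTxnCount   = 3       // hypothetical constant name
)

// checkGates fails when either total volume or transaction count is too low.
func checkGates(amountsCents []int64) error {
	var total int64
	for _, a := range amountsCents {
		total += a
	}
	if total < minTotalVolumeCents || len(amountsCents) < minIncomeTxnCount {
		return ErrNotEnoughIncomeData
	}
	return nil
}

// earliestDue applies the 2-day buffer to today's date.
func earliestDue(today time.Time) time.Time {
	return today.AddDate(0, 0, 2)
}

// ymd builds a midnight-UTC date.
func ymd(y, m, d int) time.Time {
	return time.Date(y, time.Month(m), d, 0, 0, 0, 0, time.UTC)
}

func main() {
	fmt.Println(checkGates([]int64{240352, 238911}))                     // not enough income data
	fmt.Println(checkGates([]int64{120000, 120000, 120000}))             // <nil>
	fmt.Println(earliestDue(ymd(2024, 1, 20)).Format("2006-01-02")) // 2024-01-22
}
```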
Stage 4: Day-of-Week (DOW) Analysis
BuildDOWAnalysis constructs a weeks × 7 matrix (DowRawGrid) where each cell holds the transaction amount for that (week, day-of-week) pair.
Grid Construction
The grid’s row 0 is aligned to the Sunday on or before the earliest filtered transaction. Row count extends to the current week. Columns are [Sun, Mon, Tue, Wed, Thu, Fri, Sat].
StartDate = Sunday on or before earliest txn
NumWeeks = floor((today - StartDate) / 7 days) + 1
DowRawGrid[weekIdx][dayIdx] = amount (cents) for that slot, 0 if none
Example filtered txns:
2024-01-10 Wed $500
2024-01-15 Mon $500
2024-01-24 Wed $500
StartDate = 2024-01-07 (Sun)
Sun Mon Tue Wed Thu Fri Sat
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┐
wk 0 │ $0 │ $0 │ $0 │$500 │ $0 │ $0 │ $0 │
wk 1 │ $0 │$500 │ $0 │ $0 │ $0 │ $0 │ $0 │
wk 2 │ $0 │ $0 │ $0 │$500 │ $0 │ $0 │ $0 │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┘
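The grid construction above can be reproduced with a short sketch. It assumes dates are midnight UTC and skips the bypass-keyword filtering; `Txn`, `ymd`, `daysBetween`, and `buildDOWGrid` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

type Txn struct {
	Date   time.Time // midnight UTC
	Amount int64     // positive cents
}

func ymd(y, m, d int) time.Time {
	return time.Date(y, time.Month(m), d, 0, 0, 0, 0, time.UTC)
}

// daysBetween is exact here because both inputs are midnight UTC.
func daysBetween(a, b time.Time) int { return int(b.Sub(a).Hours() / 24) }

// buildDOWGrid returns StartDate and a weeks x 7 grid with columns Sun..Sat.
func buildDOWGrid(txns []Txn, today time.Time) (time.Time, [][7]int64) {
	if len(txns) == 0 {
		return time.Time{}, nil
	}
	earliest := txns[0].Date
	for _, t := range txns {
		if t.Date.Before(earliest) {
			earliest = t.Date
		}
	}
	// time.Weekday has Sunday == 0, so this steps back to the Sunday.
	start := earliest.AddDate(0, 0, -int(earliest.Weekday()))
	grid := make([][7]int64, daysBetween(start, today)/7+1)
	for _, t := range txns {
		n := daysBetween(start, t.Date)
		if n/7 < len(grid) { // ignore anything dated after today
			grid[n/7][n%7] = t.Amount
		}
	}
	return start, grid
}

func main() {
	txns := []Txn{{ymd(2024, 1, 10), 50000}, {ymd(2024, 1, 15), 50000}, {ymd(2024, 1, 24), 50000}}
	start, grid := buildDOWGrid(txns, ymd(2024, 1, 24))
	fmt.Println(start.Format("2006-01-02"), len(grid)) // 2024-01-07 3
	for _, week := range grid {
		fmt.Println(week)
	}
}
```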
Bypass Filtering
Before the grid is populated, transactions whose cleaned name matches weeklyBypassKeywords = ["social security", "vacp", "ssa treas"] are stripped from the pattern-matching input. These recurring government deposits follow distinctive calendar patterns (1st/3rd of month) that would otherwise distort the DOW analysis. They are still counted in GrandTotalFull (the global denominator), so their volume isn’t lost; they just don’t influence the grid.
Z-Score Calculation
For each DOW column, DowZScores[week][day] = (amount - mean) / stddev is computed over the non-zero cells in that column. This surfaces anomalies: a week where a Friday deposit was $4,500 against the usual $1,500 will have z ≈ +3 (above the +2.5 ceiling of the acceptance band) and will be excluded from downstream "valid week" counts.
Columns with ≤ 1 non-zero value are skipped (insufficient data for z-scores).
The z-score acceptance band is [dowZLow, dowZHigh] = [-1.5, +2.5]. The band is asymmetric on purpose: it accepts amounts up to 2.5 standard deviations above the column mean (bonuses, tax refunds) but only 1.5 below it (partial weeks, delayed deposits).
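A sketch of the per-column z-score and the band check. Whether the reference implementation uses population or sample standard deviation is not stated, so population is assumed here; `columnZScores` and `inBand` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

const (
	dowZLow  = -1.5
	dowZHigh = 2.5
)

// columnZScores computes z-scores over the non-zero cells of one DOW column.
// ok is false when the column has <= 1 non-zero value. Zero cells keep z = 0.
func columnZScores(col []int64) (z []float64, ok bool) {
	var n int
	var sum float64
	for _, v := range col {
		if v != 0 {
			n++
			sum += float64(v)
		}
	}
	if n <= 1 {
		return nil, false
	}
	mean := sum / float64(n)
	var ss float64
	for _, v := range col {
		if v != 0 {
			d := float64(v) - mean
			ss += d * d
		}
	}
	std := math.Sqrt(ss / float64(n)) // population stddev: an assumption
	z = make([]float64, len(col))
	for i, v := range col {
		if v != 0 && std > 0 {
			z[i] = (float64(v) - mean) / std
		}
	}
	return z, true
}

// inBand applies the acceptance band [-1.5, +2.5].
func inBand(z float64) bool { return z >= dowZLow && z <= dowZHigh }

func main() {
	// Eight ordinary Fridays, one empty week, one outsized deposit.
	col := []int64{250000, 250000, 0, 250000, 250000, 250000, 250000, 250000, 250000, 480000}
	z, ok := columnZScores(col)
	fmt.Println(ok, inBand(z[0]), inBand(z[9])) // true true false
}
```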
Global DOW vs Recent DOW
The grid is split into two overlapping views:
| View | Rows included | Purpose |
|---|---|---|
| | All rows | Long-term pay rhythm. Vol% denominator is |
| | Last | |
For each DOW column, both views compute:
- Count: number of valid weeks (non-zero amount AND z-score in [-1.5, +2.5])
- VolPct: fraction of total volume that fell on this DOW
- Density: Count / NumWeeks (global) or Count / recentWindowWeeks (recent)
The dual view is the heart of V2’s "identity vs recent-change" reasoning in Phase 4.
Stage 5: Day-of-Month (DOM) Analysis
BuildDOMAnalysis constructs a months × 31 matrix (DomGrid) where each cell holds the transaction amount for that (month, day-of-month) pair.
mIdx = (txn.Year - startMonth.Year)*12 + (txn.Month - startMonth.Month)
dayIdx = txn.Day - 1 (0-based)
DomGrid[mIdx][dayIdx] = amount
day: 1 2 3 ··· 15 ··· 29 30 31
dayIdx: 0 1 2 14 28 29 30
┌─────┬────┬────┬ ··· ┬─────┬ ··· ┬────┬────┬────┐
mIdx=0 Jan │$2500│ │ │ │$2500│ │ │ │$100│
mIdx=1 Feb │$2450│ │ │ │$2450│ │$100│ │ │
mIdx=2 Mar │$2600│ │ │ │$2600│ │ │ │$100│
└─────┴────┴────┴ ··· ┴─────┴ ··· ┴────┴────┴────┘
Per-Day Stats
For each day column (1–31), the analysis computes:
| Stat | Meaning |
|---|---|
| | Number of distinct months that had a transaction on this day-of-month. |
| | Fraction of total DOM volume that fell on this specific day. |
| | Fraction of total DOM volume in a 3-day rolling window ending on this day. Handles "I got paid on the 14th this month because the 15th was a Sunday": calendar anchors naturally smear ±1 day due to weekend shifts, so a rolling window is more robust than a point count. |
EOM and BOM Hit Counts
Two additional scalars are tracked:
- EomScore: number of transactions with day >= LastDayOfMonth - 2 (i.e. the last 3 calendar days of the month)
- BomScore: number of transactions with day <= 3 (i.e. the first 3 calendar days)
These are used by Phase 5 to classify the pattern as end-of-month vs beginning-of-month anchoring.
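The per-day stats and the two hit counts can be sketched together. This version assumes the rolling window does not wrap across month boundaries and that the denominator is the total of the transactions passed in. `DOMStat`, `buildDOMStats`, `lastDayOfMonth`, and `ymd` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

type Txn struct {
	Date   time.Time
	Amount int64 // positive cents
}

type DOMStat struct {
	Count     int     // distinct months with a deposit on this day
	VolPct    float64 // share of total volume on this day
	Roll3dPct float64 // share of total volume in the 3 days ending here
}

func ymd(y, m, d int) time.Time {
	return time.Date(y, time.Month(m), d, 0, 0, 0, 0, time.UTC)
}

// lastDayOfMonth relies on time.Date normalising day 0 of the next month.
func lastDayOfMonth(t time.Time) int {
	return time.Date(t.Year(), t.Month()+1, 0, 0, 0, 0, 0, time.UTC).Day()
}

func buildDOMStats(txns []Txn) (stats [31]DOMStat, eom, bom int) {
	var vol [31]int64
	var months [31]map[string]bool
	var total int64
	for _, t := range txns {
		i := t.Date.Day() - 1
		vol[i] += t.Amount
		total += t.Amount
		if months[i] == nil {
			months[i] = map[string]bool{}
		}
		months[i][t.Date.Format("2006-01")] = true
		if t.Date.Day() >= lastDayOfMonth(t.Date)-2 {
			eom++
		}
		if t.Date.Day() <= 3 {
			bom++
		}
	}
	if total == 0 {
		return stats, eom, bom
	}
	for i := range stats {
		var window int64
		for j := i - 2; j <= i; j++ {
			if j >= 0 {
				window += vol[j]
			}
		}
		stats[i] = DOMStat{len(months[i]), float64(vol[i]) / float64(total), float64(window) / float64(total)}
	}
	return stats, eom, bom
}

func main() {
	txns := []Txn{
		{ymd(2024, 1, 1), 250000}, {ymd(2024, 1, 15), 250000}, {ymd(2024, 1, 31), 10000},
		{ymd(2024, 2, 1), 245000}, {ymd(2024, 2, 15), 245000}, {ymd(2024, 2, 29), 10000},
		{ymd(2024, 3, 1), 260000}, {ymd(2024, 3, 15), 260000}, {ymd(2024, 3, 31), 10000},
	}
	stats, eom, bom := buildDOMStats(txns)
	fmt.Println(stats[14].Count, eom, bom) // 3 3 3
	fmt.Printf("%.3f\n", stats[14].VolPct) // 0.490
}
```

Note that the last-day deposits land on day 31 in January and March but day 29 in February, which is why EomScore is counted relative to each month's length rather than by column.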
Stage 6: PredictPayday — The 5-Phase Pipeline
PredictPayday is the prediction state machine. It uses the two analyses to classify the user’s pay pattern and project the next payday date.
Phase 0: The Gatekeeper
if GrandTotal < $3,000 → return nil (no prediction)
A redundant guard (EvaluatePayday already checks this before calling PredictPayday), but kept here so PredictPayday is safe to call directly in tests.
Phase 1: Signal Harvesting (Anchors)
Scan the 31 DOM columns. A day is an anchor if:
- Count >= domAnchorCount = 2 (occurred in at least 2 distinct months)
- VolPct >= supplementalAnchorThreshold = 0.10 (contributed ≥ 10% of total DOM volume)
Anchors are the calendar days that look like "scheduled pay dates" — typically the 1st, 15th, and/or last day of month.
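A sketch of the anchor scan, assuming both conditions must hold for a day to qualify (the text lists them without an explicit AND). `DOMStat` and `findAnchors` are hypothetical names; the two constants are the ones named above.

```go
package main

import "fmt"

const (
	domAnchorCount              = 2
	supplementalAnchorThreshold = 0.10
)

type DOMStat struct {
	Count  int     // distinct months with a deposit on this day
	VolPct float64 // share of total DOM volume on this day
}

// findAnchors returns the 1-based days of month that meet both thresholds.
func findAnchors(stats [31]DOMStat) []int {
	var anchors []int
	for i, s := range stats {
		if s.Count >= domAnchorCount && s.VolPct >= supplementalAnchorThreshold {
			anchors = append(anchors, i+1)
		}
	}
	return anchors
}

func main() {
	var stats [31]DOMStat
	stats[0] = DOMStat{Count: 3, VolPct: 0.49}   // the 1st
	stats[14] = DOMStat{Count: 3, VolPct: 0.49}  // the 15th
	stats[30] = DOMStat{Count: 2, VolPct: 0.013} // the 31st: too little volume
	fmt.Println(findAnchors(stats))              // [1 15]
}
```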
Phase 2: Conflict Detection — "The Whale" vs "The Habit"
The analyzer must resolve a classic tension:
- The Whale (Calendar Spike): one or more calendar days concentrate ≥ 25% of total volume. Signals a salaried employee paid on fixed dates (1st, 15th, EOM).
- The Habit (DOW Density): one day-of-week shows up in most weeks. Signals a wage worker paid every Friday regardless of the calendar.
When both signals exist, the analyzer applies three tests to decide which to trust:
| Signal | Threshold | Meaning |
|---|---|---|
| | any DOM day with | A single calendar day concentrates a quarter of all income. Strong calendar signal. |
| | any DOW with | A single weekday concentrates > 60% of all volume, OR that weekday fires in > half of recent weeks. Either way, the weekly habit is too strong to overrule. |
| | any DOW with | A weaker weekly signal: a weekday shows up in at least a third of weeks. Used as a tiebreaker. |
Resolution:
trustCalendar = !isHabitUnbreakable && hasCalendarSpike
If the habit is unbreakable, the calendar signal is ignored regardless of how strong it looks.
Phase 3: The Semi-Monthly Veto
A user paid on both the 1st and the 15th produces two DOM anchors AND a weekly rhythm (since the 1st-to-15th gap is 14 days, it often aligns with a single DOW). Without the veto, Phase 4 would see a weekly pattern and miss the true semi-monthly cadence.
isSemiMonthlyPattern = len(anchors) >= 2
OR (len(anchors) == 1 AND EomScore >= 2)
forceCalendarMode = switch {
isHabitUnbreakable: false // habit wins regardless
isSemiMonthlyPattern: !hasSomeHabit // force calendar unless
// there's a weekly habit
// overriding it
default:
hasSomeHabit ? false : trustCalendar
}
forceCalendarMode = true means skip Phase 4 entirely and go straight to Phase 5.
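The Phase 2 and Phase 3 decision logic translates directly into a Go switch. The `Signals` struct is a hypothetical container for the booleans and counts named above; the branch order follows the pseudocode.

```go
package main

import "fmt"

// Signals is a hypothetical bundle of the Phase 1 and Phase 2 outputs.
type Signals struct {
	Anchors            int // len(anchors)
	EomScore           int
	HasCalendarSpike   bool
	IsHabitUnbreakable bool
	HasSomeHabit       bool
}

// forceCalendarMode reports whether Phase 4 should be skipped.
func forceCalendarMode(s Signals) bool {
	trustCalendar := !s.IsHabitUnbreakable && s.HasCalendarSpike
	isSemiMonthly := s.Anchors >= 2 || (s.Anchors == 1 && s.EomScore >= 2)
	switch {
	case s.IsHabitUnbreakable:
		return false // habit wins regardless
	case isSemiMonthly:
		return !s.HasSomeHabit
	case s.HasSomeHabit:
		return false
	default:
		return trustCalendar
	}
}

func main() {
	// Paid on the 1st and 15th with no weekly habit: go straight to Phase 5.
	fmt.Println(forceCalendarMode(Signals{Anchors: 2})) // true
	// Every-Friday worker whose habit cannot be overruled: run Phase 4.
	fmt.Println(forceCalendarMode(Signals{Anchors: 2, IsHabitUnbreakable: true, HasSomeHabit: true})) // false
}
```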
Phase 4: DOW Habit Waterfall (Weekly / Bi-Weekly)
Runs only if forceCalendarMode == false. Implemented in runDOWWaterfall + ResolveDOW.
Both Recent and Global DOW stats are ranked by VolPct descending. Then:
bestRecentDOW = argmax(RecentDOW.VolPct)
bestGlobalDOW = argmax(GlobalDOW.VolPct)
isIdentityMatch = (bestRecentDOW == bestGlobalDOW)
Path A — Recent history is strong:
if RecentDOW[bestRecent].VolPct >= 0.35:
candidates = isIdentityMatch ? [bestRecent] : all 7 recent-ranked DOWs
for each candidate DOW (Count >= 2):
return ResolveDOW(dowIdx, isIdentityMatch)
Path B — Global history is strong (fallback):
if GlobalDOW[bestGlobal].VolPct >= 0.35:
for each global-ranked DOW (Count >= 2):
return ResolveDOW(dowIdx, forceGlobalCadence=true)
The isIdentityMatch flag handles the "new job" case: when it is false (bestRecent != bestGlobal), the user has likely changed employers. The waterfall then iterates through all recent DOWs in rank order rather than just the top one, which tolerates a noisy "first few weeks of new job" signal.
ResolveDOW: Cadence + Projection
Given a target DOW index, determine whether the pattern is Weekly or Bi-Weekly:
densityScore = forceGlobalCadence ? max(Recent.Density, Global.Density)
: Recent.Density
cadence = (densityScore >= 0.60) ? WEEKLY (7 days)
: BIWEEKLY (14 days)
Then find the last valid transaction on that weekday (z-score in [-1.5, +2.5]) and project forward:
nextPayday = lastValidDate + cadenceDays
while nextPayday < EarliestDue: // today + 2 days
nextPayday += cadenceDays
The "roll forward" loop handles the case where the last valid txn is old enough that adding 7 or 14 days still lands before EarliestDue: it skips forward one cadence at a time until the date is on or after the buffer.
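The cadence choice and the roll-forward loop can be sketched as two small functions. `cadenceDays`, `projectNext`, and `ymd` are hypothetical names; the 0.60 density threshold and the 7/14-day cadences come from the pseudocode above.

```go
package main

import (
	"fmt"
	"time"
)

func ymd(y, m, d int) time.Time {
	return time.Date(y, time.Month(m), d, 0, 0, 0, 0, time.UTC)
}

// cadenceDays returns 7 (weekly) when the density score reaches 0.60,
// otherwise 14 (bi-weekly).
func cadenceDays(recentDensity, globalDensity float64, forceGlobalCadence bool) int {
	score := recentDensity
	if forceGlobalCadence && globalDensity > score {
		score = globalDensity // max(Recent.Density, Global.Density)
	}
	if score >= 0.60 {
		return 7
	}
	return 14
}

// projectNext adds one cadence to the last valid date, then keeps adding
// until the result is on or after earliestDue.
func projectNext(lastValid, earliestDue time.Time, cadence int) time.Time {
	next := lastValid.AddDate(0, 0, cadence)
	for next.Before(earliestDue) {
		next = next.AddDate(0, 0, cadence)
	}
	return next
}

func main() {
	due := ymd(2024, 1, 22) // today (2024-01-20) + 2 days
	cadence := cadenceDays(0.5, 0.5, false)
	fmt.Println(cadence) // 14
	// Recent last deposit: one step is enough.
	fmt.Println(projectNext(ymd(2024, 1, 12), due, cadence).Format("2006-01-02")) // 2024-01-26
	// Stale last deposit: rolls forward three cadences.
	fmt.Println(projectNext(ymd(2023, 12, 15), due, cadence).Format("2006-01-02")) // 2024-01-26
}
```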
Phase 5: DOM Peak Fallback (Monthly / Semi-Monthly)
Runs only if Phase 4 returned nil OR forceCalendarMode == true. Implemented in RunDOMPeakFallback + ResolveDOM.
Gated by: EomScore >= 2 OR len(anchors) > 0. If neither is true, V2 returns nil (no prediction).
Every DOM day is scored:
peak.Score = DomStats.VolPct + DomStats.Roll3dPct
peak.IsAnchor = (day in anchors)
peak.Vol = DomStats.VolPct
Peaks are sorted with anchors prioritised, then by score descending. The top peak is the primary peak. A secondary peak is searched for:
for each peak (ranked, starting from #2):
if |peak.Day - primary.Day| >= peakSeparationDays = 10 AND
peak.Vol >= supplementalAnchorThreshold = 0.10:
secondary = peak.Day; break
The 10-day separation avoids a "phantom secondary" from a day adjacent to the primary peak (e.g. 14th and 15th should be treated as a single anchor, not two).
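A sketch of the peak ranking and secondary-peak search. `Peak`, `rankPeaks`, and `findSecondary` are hypothetical names; the two constants and the anchors-first ordering follow the description above.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	peakSeparationDays          = 10
	supplementalAnchorThreshold = 0.10
)

type Peak struct {
	Day      int
	Score    float64 // VolPct + Roll3dPct
	Vol      float64 // VolPct
	IsAnchor bool
}

// rankPeaks orders anchors first, then by score descending.
func rankPeaks(peaks []Peak) []Peak {
	ranked := append([]Peak(nil), peaks...)
	sort.SliceStable(ranked, func(i, j int) bool {
		if ranked[i].IsAnchor != ranked[j].IsAnchor {
			return ranked[i].IsAnchor
		}
		return ranked[i].Score > ranked[j].Score
	})
	return ranked
}

// findSecondary returns the first lower-ranked peak that is at least 10 days
// from the primary and carries at least 10% of volume.
func findSecondary(ranked []Peak) (day int, found bool) {
	if len(ranked) == 0 {
		return 0, false
	}
	primary := ranked[0]
	for _, p := range ranked[1:] {
		gap := p.Day - primary.Day
		if gap < 0 {
			gap = -gap
		}
		if gap >= peakSeparationDays && p.Vol >= supplementalAnchorThreshold {
			return p.Day, true
		}
	}
	return 0, false
}

func main() {
	ranked := rankPeaks([]Peak{
		{Day: 14, Score: 0.55, Vol: 0.12},
		{Day: 1, Score: 0.80, Vol: 0.40, IsAnchor: true},
		{Day: 15, Score: 0.95, Vol: 0.45, IsAnchor: true},
	})
	day, found := findSecondary(ranked)
	fmt.Println(ranked[0].Day, day, found) // 15 1 true
}
```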
| Case | Result |
|---|---|
| Secondary peak found | Cadence = |
| No secondary peak | Cadence = |
ResolveDOM: Weekend & Holiday Adjustment
For each target day:
- Construct the target date in the current month (safeDate clamps day 31 to Feb 28/29, etc.).
- Apply weekend adjustment based on shift direction (see below).
- If the adjusted date is before EarliestDue, advance to next month and re-adjust.
The shift direction is determined per-day by isForwardShiftDay:
isForwardShiftDay(day):
govt_vol = sum(txn.amount for txn on this DOM matching bypass keywords)
total_vol = sum(txn.amount for txn on this DOM)
return (govt_vol / total_vol) > 0.50
The rationale: government deposits (Social Security, VA benefits) shift Sat→Mon and Sun→Mon (forward), while corporate payrolls shift Sat→Fri and Sun→Fri (backward). The shift direction is decided per-calendar-day because a user could have a mix of both (e.g. SSA on the 3rd + private payroll on the 15th).
AdjustWeekendBackward (corporate): AdjustWeekendForward (government):
Sat → Fri (-1 day) Sat → Mon (+2 days)
Sun → Fri (-2 days) Sun → Mon (+1 day)
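The clamping and the two weekend shifts can be sketched as below. The function names follow the text, but the signatures are assumptions: here `safeDate` takes integer year, month, and day, and `isForwardShiftDay` takes pre-summed volumes rather than scanning transactions.

```go
package main

import (
	"fmt"
	"time"
)

// safeDate clamps the day to the month's length (e.g. day 31 in February).
func safeDate(year, month, day int) time.Time {
	last := time.Date(year, time.Month(month)+1, 0, 0, 0, 0, 0, time.UTC).Day()
	if day > last {
		day = last
	}
	return time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
}

// isForwardShiftDay is true when government deposits exceed half the volume
// seen on this day of month.
func isForwardShiftDay(govtVol, totalVol int64) bool {
	if totalVol == 0 {
		return false
	}
	return float64(govtVol)/float64(totalVol) > 0.50
}

// AdjustWeekendBackward moves Sat and Sun to the preceding Friday.
func AdjustWeekendBackward(d time.Time) time.Time {
	switch d.Weekday() {
	case time.Saturday:
		return d.AddDate(0, 0, -1)
	case time.Sunday:
		return d.AddDate(0, 0, -2)
	}
	return d
}

// AdjustWeekendForward moves Sat and Sun to the following Monday.
func AdjustWeekendForward(d time.Time) time.Time {
	switch d.Weekday() {
	case time.Saturday:
		return d.AddDate(0, 0, 2)
	case time.Sunday:
		return d.AddDate(0, 0, 1)
	}
	return d
}

func main() {
	sat := safeDate(2024, 6, 15) // a Saturday
	fmt.Println(AdjustWeekendBackward(sat).Format("Mon 2006-01-02")) // Fri 2024-06-14
	fmt.Println(AdjustWeekendForward(sat).Format("Mon 2006-01-02"))  // Mon 2024-06-17
	fmt.Println(isForwardShiftDay(90000, 100000))                    // true
	fmt.Println(safeDate(2024, 2, 31).Format("2006-01-02"))          // 2024-02-29
}
```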
| V2 does not adjust for federal bank holidays, although the FM Legacy predictor does. This is a known gap: a payday that lands on a Monday holiday will currently be predicted as that Monday, not Tuesday. |
Output
On success, EvaluatePayday returns:
{
"payday": "2024-01-26",
"frequency": "Fri BIWEEKLY",
"payday_cadence": "BIWEEKLY",
"predictor": "FLOATME_V2_PAYDAY_CALCULATION"
}
frequency is human-readable; payday_cadence is the enum used by downstream code. The prediction is also captured:
fmdatacapture DynamoDB:
SK: FM_PAYDAY_PREDICTED_PAYDAY#{today}
Body: { "next_payday": NextPayday }
One record per user per day (same-day re-runs overwrite).
Cadence Summary
The four possible cadence outcomes and where they originate:
| Cadence | Phase | Typical User |
|---|---|---|
| | Phase 4 | Hourly worker paid every Friday. DOW density ≥ 0.60 on one weekday. |
| | Phase 4 | Salaried employee paid every other Friday. DOW vol% ≥ 0.35 on one weekday, density < 0.60. |
| | Phase 5 | Paid on 1st and 15th (or 15th and EOM). Two DOM anchors separated by ≥ 10 days. |
| | Phase 5 | Paid on a single fixed calendar day (e.g. the 15th, or EOM). |
| | (enum only) | Never emitted by the current pipeline; V2 returns |
Known Limitations
| Limitation | Detail |
|---|---|
| Federal bank holidays | Not adjusted for. A predicted Monday that is a federal holiday will not shift to Tuesday. |
| $5,000 amount cap | High earners with single paychecks above $5K are excluded entirely from the analysis. |
| 93-day window | Users with fewer than 93 days of transaction history at their primary account will have thinner grids and may fail the volume/count gates. |
| Primary-account only | Users whose payroll goes to a non-primary account receive no prediction. |
| 3-txn minimum | Monthly-paid users with < 93 days of history may have fewer than 3 transactions and fail the count gate. |
| EOM vs calendar-day-31 | The EOM classification uses |
Debugging a Prediction
When a V2 prediction needs investigation, the two fmdatacapture DynamoDB records for the user on the prediction date provide a complete trail:
| SK prefix | Contents |
|---|---|
| | The full list of income candidates that made it through |
| | The final prediction (payday, cadence, frequency). Doesn’t capture the intermediate grids; for those, re-run locally with the filtered txns from the first record. |
The insight.payday.v2.comparison Datadog metric tags every prediction with is_equal:{bool}, fm_off_by:{days}, and pave_empty:{bool} for offline analysis of V2 vs Pave disagreement.
Related Pages
- Forecasts & Payday: how V2 fits alongside Pave and FM Legacy, result-selection priority
- Feature Flags: insight.payday_prediction.v2.enabled and insight.payday_prediction.v2.rollout
- DynamoDB Tables: fmdatacapture record schemas used for data capture
- Architecture: Transactions Service dependency and the api Lambda that hosts the V2 predictor