Plaid Mining Pipeline
Overview
Plaid mining is the process of fetching, transforming, and persisting financial data (transactions, accounts, balances, and liabilities) from Plaid into FloatMe’s DynamoDB tables. It runs automatically in two ways:
-
Webhook-driven — Plaid sends a webhook event when new transaction data is available for a connected item.
-
Login-driven — When a user logs in, the listener Lambda checks whether their data is stale and synthetically triggers a mine if so.
The pipeline is composed of five Lambda functions working in sequence: webhook → miner → (optionally) throttle-miner → refiner → feeder.
Full Pipeline Flow
Plaid or Auth0 event
│
▼
┌───────────────────────┐
│ prod-txn-webhook │ Verify request signature
│ (API Gateway) │ Enqueue to SQS
└───────────┬───────────┘
│
▼
┌───────────────────────┐
│ prod-txn-plaid- │ SQS queue
│ webhooks (SQS) │
└───────────┬───────────┘
│
▼
┌───────────────────────┐
│ prod-txn-miner │ Route by webhook type/code
│ (Lambda) │ Fetch from Plaid
│ │ Publish raw data to Kinesis
└───────────┬───────────┘
│
├── if more pages ──► prod-txn-throttle (SQS, 30s delay)
│ │
│ ▼
│ prod-txn-throttle-miner
│ (fetch next page → Kinesis)
│
▼
prod-txn-plaid-transactions (Kinesis)
│
▼
┌───────────────────────┐
│ prod-txn-refiner │ Convert raw Plaid data → FloatMe format
│ (Lambda) │ Write accounts + transactions to DynamoDB
└───────────┬───────────┘
│
▼
DynamoDB (prod-txn-transactions)
│
▼ (DynamoDB Streams)
┌───────────────────────┐
│ prod-txn-feeder │ Publish ACCOUNT + TRANSACTION changes
│ (Lambda) │ to Kinesis for downstream services
└───────────┬───────────┘
│
▼
prod-txn-floatme-transactions (Kinesis)
Webhook Types Handled
TRANSACTIONS Webhooks
| Code | Mine Window | Description |
|---|---|---|
|
Last 14 days |
New transactions are available. Mines the most recent 14 days. |
|
Last 365 days |
Plaid has backfilled up to a year of transaction history. Pages through all available batches. |
|
Last 30 days |
Initial webhook after a new item is connected. Mines the first 30 days. |
|
N/A |
Plaid has removed transactions (corrections, duplicates). Marks them as removed in DynamoDB. |
ITEM Webhooks
| Code | Action |
|---|---|
|
Sends an item-repaired notification event via Segment/Iterable. |
|
Sends an item-errored notification event with the error code. |
|
Removes the item from Plaid (revokes access token), removes from DynamoDB, records an admin note. |
|
Logged only — no action taken. |
AUTH Webhooks
| Code | Action |
|---|---|
|
If the main account had its auth data updated, calls the Payments Service to delete the stored bank account, forcing re-encryption on next fetch. |
LIABILITIES Webhooks
| Code | Action |
|---|---|
|
Fetches updated credit, mortgage, and student loan liabilities from Plaid and stores them. Only processed when the |
Miner Webhook Routing
Webhook arrives from SQS
│
├─ Type = IDENTITY_VERIFICATION
│ └─ Code = STATUS_UPDATED → fetch KYC → create/update IDV record
│ └─ status = failed → create rejected loan application in LOC
│
├─ Type = SCREENING
│ └─ Code = STATUS_UPDATED → fetch watchlist → update IDV watchlist status
│
├─ (all other types)
│ │
│ ▼
│ Look up item in DynamoDB
│ │
│ ├─ Item not found → skip (log warning)
│ ├─ Item is REMOVED + source = FROM_LISTENER → skip (prevent synthetic webhook on removed item)
│ └─ Item found → refresh item status from Plaid
│ │
│ ├─ Type = TRANSACTIONS
│ │ ├─ DEFAULT_UPDATE → mine(now-14d, now)
│ │ ├─ HISTORICAL_UPDATE → mine(now-365d, now)
│ │ ├─ INITIAL_UPDATE → mine(now-30d, now)
│ │ └─ TRANSACTIONS_REMOVED → send removed IDs to Kinesis
│ │
│ ├─ Type = ITEM
│ │ ├─ LOGIN_REPAIRED → send repaired notification
│ │ ├─ USER_PERMISSION_REVOKED → remove from Plaid + DynamoDB
│ │ ├─ ERROR → send error notification
│ │ └─ WEBHOOK_UPDATE_ACKNOWLEDGED → log only
│ │
│ ├─ Type = AUTH
│ │ └─ DEFAULT_UPDATE → if main account changed, delete from Payments Service
│ │
│ └─ Type = LIABILITIES (if enabled)
│ └─ DEFAULT_UPDATE → fetch + store liabilities
Throttling and Pagination
Plaid returns transactions in batches of up to 500. When more pages exist, the miner enqueues a MineRequest onto the prod-txn-throttle SQS queue (which has a 30-second delivery delay).
mine(startDate, endDate)
│
▼
GET /transactions from Plaid (offset=0, count=500)
│
├─ total_transactions <= 500 → done, publish to Kinesis
│
└─ total_transactions > 500
│
▼
Publish current batch to Kinesis
│
▼
Enqueue next MineRequest
(offset += 500, same date range)
onto prod-txn-throttle (30s delay)
│
▼
prod-txn-throttle-miner Lambda
(fetches next page → Kinesis → repeat if more pages remain)
The 30-second delay prevents hitting Plaid’s rate limits when a large historical backfill is in progress.
Login-Driven Mining (Listener)
When a user logs in, the listener Lambda determines whether their transaction data is fresh enough. If not, it generates a synthetic webhook and sends it directly to the prod-txn-plaid-webhooks SQS queue to trigger the miner.
Auth0 Log Stream event arrives (via EventBridge → SQS)
│
▼
Event type = gd_auth_succeed, sepft, or sertft?
│ No → skip
│ Yes
▼
Look up most recent item for user
│
├─ No item found → skip (user not yet linked)
│
└─ Item found
│
▼
Query webhook history for item
│
├─ No webhooks found
│ └─► Send HISTORICAL_UPDATE to miner SQS (first sync)
│
├─ HISTORICAL_UPDATE found + DEFAULT_UPDATE within last 24h
│ └─► Skip (data is fresh)
│
├─ HISTORICAL_UPDATE found + no recent DEFAULT_UPDATE
│ └─► Send DEFAULT_UPDATE to miner SQS (refresh last 14 days)
│
└─ No HISTORICAL_UPDATE found
└─► Send HISTORICAL_UPDATE to miner SQS (backfill needed)
The synthetic webhook includes source: "FROM_LISTENER" so the miner can skip processing if the item has since been removed.
Refiner: Raw → FloatMe Format
The refiner Lambda consumes the prod-txn-plaid-transactions Kinesis stream and converts raw Plaid data into FloatMe’s internal format.
Kinesis record received (prod-txn-plaid-transactions)
│
▼
Parse Kinesis wrapper (user_id, item_id, accounts, transactions)
│
├─ wrapper.Removed is set (TRANSACTIONS_REMOVED)
│ │
│ └─ For each removed txn_id:
│ Create Remove record in DynamoDB
│ Mark existing Transaction record as removed
│
└─ Regular update (accounts + transactions)
│
▼
Process accounts:
For each account from Plaid:
Write Account record to DynamoDB
Write AccountBalance snapshot to DynamoDB (with TTL)
│
▼
Process transactions:
For each transaction from Plaid:
├─ Already marked removed (out-of-order) → write Remove record
└─ New/valid → batch write to Transaction table
│
▼
If last page (wrapper.LastPage = true) AND transactions > 0:
Emit user_new_txns_batch_completed event to EventBridge
See Event-Driven Flows for full details on the feeder and Kinesis output.
Error Handling
| Component | Behavior |
|---|---|
|
Visibility timeout: 900s. Max receive count: 5. Messages that exhaust retries move to a DLQ. |
|
Visibility timeout: 900s. Delivery delay: 30s. Max receive count: 5. DLQ on exhaustion. |
Miner Lambda |
On batch processing error, individual SQS message failures are returned as batch item failures, causing only the failed record to be retried. |
Throttle-miner Lambda |
Same batch item failure behavior as miner. |
Refiner Lambda |
Processes Kinesis records with parallelization factor 2. On error, uses bisect-batch-on-function-error to isolate the failing record. |
Related Pages
-
Architecture — System context and component overview
-
Item Lifecycle — How item add/remove interacts with mining
-
Event-Driven Flows — Listener, refiner, and feeder details
-
KYC (Identity Verification) — Identity verification webhook handling
-
Plaid: Items — Item data model
-
Plaid: Transactions — Transaction data model
-
Plaid: Webhooks — Webhook storage schema