Plaid Mining Pipeline

Overview

Plaid mining is the process of fetching, transforming, and persisting financial data (transactions, accounts, balances, and liabilities) from Plaid into FloatMe’s DynamoDB tables. It runs automatically in two ways:

  • Webhook-driven — Plaid sends a webhook event when new transaction data is available for a connected item.

  • Login-driven — When a user logs in, the listener Lambda checks whether their data is stale and synthetically triggers a mine if so.

The pipeline is composed of five Lambda functions working in sequence: webhookminer → (optionally) throttle-minerrefinerfeeder.

Architecture

system context plaid mining

Full Pipeline Flow

Plaid or Auth0 event
        │
        ▼
┌───────────────────────┐
│  prod-txn-webhook     │  Verify request signature
│  (API Gateway)        │  Enqueue to SQS
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│  prod-txn-plaid-      │  SQS queue
│  webhooks (SQS)       │
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│  prod-txn-miner       │  Route by webhook type/code
│  (Lambda)             │  Fetch from Plaid
│                       │  Publish raw data to Kinesis
└───────────┬───────────┘
            │
            ├── if more pages ──► prod-txn-throttle (SQS, 30s delay)
            │                             │
            │                             ▼
            │                    prod-txn-throttle-miner
            │                    (fetch next page → Kinesis)
            │
            ▼
prod-txn-plaid-transactions (Kinesis)
            │
            ▼
┌───────────────────────┐
│  prod-txn-refiner     │  Convert raw Plaid data → FloatMe format
│  (Lambda)             │  Write accounts + transactions to DynamoDB
└───────────┬───────────┘
            │
            ▼
DynamoDB (prod-txn-transactions)
            │
            ▼ (DynamoDB Streams)
┌───────────────────────┐
│  prod-txn-feeder      │  Publish ACCOUNT + TRANSACTION changes
│  (Lambda)             │  to Kinesis for downstream services
└───────────┬───────────┘
            │
            ▼
prod-txn-floatme-transactions (Kinesis)

Webhook Types Handled

TRANSACTIONS Webhooks

Code Mine Window Description

DEFAULT_UPDATE

Last 14 days

New transactions are available. Mines the most recent 14 days.

HISTORICAL_UPDATE

Last 365 days

Plaid has backfilled up to a year of transaction history. Pages through all available batches.

INITIAL_UPDATE

Last 30 days

Initial webhook after a new item is connected. Mines the first 30 days.

TRANSACTIONS_REMOVED

N/A

Plaid has removed transactions (corrections, duplicates). Marks them as removed in DynamoDB.

ITEM Webhooks

Code Action

LOGIN_REPAIRED

Sends an item-repaired notification event via Segment/Iterable.

ERROR

Sends an item-errored notification event with the error code.

USER_PERMISSION_REVOKED

Removes the item from Plaid (revokes access token), removes from DynamoDB, records an admin note.

WEBHOOK_UPDATE_ACKNOWLEDGED

Logged only — no action taken.

AUTH Webhooks

Code Action

DEFAULT_UPDATE

If the main account had its auth data updated, calls the Payments Service to delete the stored bank account, forcing re-encryption on next fetch.

LIABILITIES Webhooks

Code Action

DEFAULT_UPDATE

Fetches updated credit, mortgage, and student loan liabilities from Plaid and stores them. Only processed when the ENABLE_LIABILITY_MINING feature flag is enabled.

IDENTITY_VERIFICATION Webhooks

Code Action

STATUS_UPDATED

Fetches the KYC session from Plaid. Creates or updates the identity verification record in DynamoDB. If status is failed, creates a rejected loan application in the LOC Service.

SCREENING Webhooks

Code Action

STATUS_UPDATED

Fetches the individual watchlist screening from Plaid. Updates the watchlist status on the existing identity verification record.

Miner Webhook Routing

Webhook arrives from SQS
        │
        ├─ Type = IDENTITY_VERIFICATION
        │    └─ Code = STATUS_UPDATED → fetch KYC → create/update IDV record
        │         └─ status = failed → create rejected loan application in LOC
        │
        ├─ Type = SCREENING
        │    └─ Code = STATUS_UPDATED → fetch watchlist → update IDV watchlist status
        │
        ├─ (all other types)
        │    │
        │    ▼
        │  Look up item in DynamoDB
        │    │
        │    ├─ Item not found → skip (log warning)
        │    ├─ Item is REMOVED + source = FROM_LISTENER → skip (prevent synthetic webhook on removed item)
        │    └─ Item found → refresh item status from Plaid
        │         │
        │         ├─ Type = TRANSACTIONS
        │         │    ├─ DEFAULT_UPDATE → mine(now-14d, now)
        │         │    ├─ HISTORICAL_UPDATE → mine(now-365d, now)
        │         │    ├─ INITIAL_UPDATE → mine(now-30d, now)
        │         │    └─ TRANSACTIONS_REMOVED → send removed IDs to Kinesis
        │         │
        │         ├─ Type = ITEM
        │         │    ├─ LOGIN_REPAIRED → send repaired notification
        │         │    ├─ USER_PERMISSION_REVOKED → remove from Plaid + DynamoDB
        │         │    ├─ ERROR → send error notification
        │         │    └─ WEBHOOK_UPDATE_ACKNOWLEDGED → log only
        │         │
        │         ├─ Type = AUTH
        │         │    └─ DEFAULT_UPDATE → if main account changed, delete from Payments Service
        │         │
        │         └─ Type = LIABILITIES (if enabled)
        │              └─ DEFAULT_UPDATE → fetch + store liabilities

Throttling and Pagination

Plaid returns transactions in batches of up to 500. When more pages exist, the miner enqueues a MineRequest onto the prod-txn-throttle SQS queue (which has a 30-second delivery delay).

mine(startDate, endDate)
        │
        ▼
GET /transactions from Plaid (offset=0, count=500)
        │
        ├─ total_transactions <= 500 → done, publish to Kinesis
        │
        └─ total_transactions > 500
             │
             ▼
        Publish current batch to Kinesis
             │
             ▼
        Enqueue next MineRequest
        (offset += 500, same date range)
        onto prod-txn-throttle (30s delay)
             │
             ▼
        prod-txn-throttle-miner Lambda
        (fetches next page → Kinesis → repeat if more pages remain)

The 30-second delay prevents hitting Plaid’s rate limits when a large historical backfill is in progress.

Login-Driven Mining (Listener)

When a user logs in, the listener Lambda determines whether their transaction data is fresh enough. If not, it generates a synthetic webhook and sends it directly to the prod-txn-plaid-webhooks SQS queue to trigger the miner.

Auth0 Log Stream event arrives (via EventBridge → SQS)
        │
        ▼
Event type = gd_auth_succeed, sepft, or sertft?
        │ No → skip
        │ Yes
        ▼
Look up most recent item for user
        │
        ├─ No item found → skip (user not yet linked)
        │
        └─ Item found
             │
             ▼
        Query webhook history for item
             │
             ├─ No webhooks found
             │    └─► Send HISTORICAL_UPDATE to miner SQS (first sync)
             │
             ├─ HISTORICAL_UPDATE found + DEFAULT_UPDATE within last 24h
             │    └─► Skip (data is fresh)
             │
             ├─ HISTORICAL_UPDATE found + no recent DEFAULT_UPDATE
             │    └─► Send DEFAULT_UPDATE to miner SQS (refresh last 14 days)
             │
             └─ No HISTORICAL_UPDATE found
                  └─► Send HISTORICAL_UPDATE to miner SQS (backfill needed)

The synthetic webhook includes source: "FROM_LISTENER" so the miner can skip processing if the item has since been removed.

Refiner: Raw → FloatMe Format

The refiner Lambda consumes the prod-txn-plaid-transactions Kinesis stream and converts raw Plaid data into FloatMe’s internal format.

Kinesis record received (prod-txn-plaid-transactions)
        │
        ▼
Parse Kinesis wrapper (user_id, item_id, accounts, transactions)
        │
        ├─ wrapper.Removed is set (TRANSACTIONS_REMOVED)
        │    │
        │    └─ For each removed txn_id:
        │         Create Remove record in DynamoDB
        │         Mark existing Transaction record as removed
        │
        └─ Regular update (accounts + transactions)
             │
             ▼
        Process accounts:
          For each account from Plaid:
            Write Account record to DynamoDB
            Write AccountBalance snapshot to DynamoDB (with TTL)
             │
             ▼
        Process transactions:
          For each transaction from Plaid:
            ├─ Already marked removed (out-of-order) → write Remove record
            └─ New/valid → batch write to Transaction table
             │
             ▼
        If last page (wrapper.LastPage = true) AND transactions > 0:
          Emit user_new_txns_batch_completed event to EventBridge

See Event-Driven Flows for full details on the feeder and Kinesis output.

Error Handling

Component Behavior

prod-txn-plaid-webhooks SQS

Visibility timeout: 900s. Max receive count: 5. Messages that exhaust retries move to a DLQ.

prod-txn-throttle SQS

Visibility timeout: 900s. Delivery delay: 30s. Max receive count: 5. DLQ on exhaustion.

Miner Lambda

On batch processing error, individual SQS message failures are returned as batch item failures, causing only the failed record to be retried.

Throttle-miner Lambda

Same batch item failure behavior as miner.

Refiner Lambda

Processes Kinesis records with parallelization factor 2. On error, uses bisect-batch-on-function-error to isolate the failing record.