# Pave Mining

## Overview
The Pave mining pipeline uploads a user’s Plaid transactions into the Pave third-party analytics platform and retrieves the financial insights Pave computes from them. The pipeline is split across four Lambdas:
| Lambda | Trigger | Role |
|---|---|---|
| `prod-insight-feeder` | Kinesis (`prod-txn-plaid-transactions`) | Uploads transactions and balances to Pave; tracks per-user pagination state |
| `prod-insight-webhook` | API Gateway (Pave webhook, HMAC-verified) | Receives Pave callbacks; enqueues mining jobs on `prod-insight-miner` |
| `prod-insight-miner` | SQS (`prod-insight-miner`) | Fetches processed insights from Pave; writes to DynamoDB; emits `user_new_insights_available` |
| `prod-insight-replay-feeder` | SQS (`prod-insight-replay-feeder`) | Manual replay: refetches Pave data with custom parameters; no downstream events |
The feeder and miner are intentionally decoupled. The feeder pushes data to Pave asynchronously; Pave does its own processing and signals completion via a webhook callback. The webhook Lambda bridges these two phases by enqueuing a mining job.
## Feeder

Lambda: `prod-insight-feeder`
Trigger: Kinesis stream `prod-txn-plaid-transactions` (`LATEST` position, batched)
The feeder runs on every Plaid transaction event from the Transactions Service. Its job is to upload the user’s current transactions and balances to Pave and record that the upload happened.
### Pagination State

Pave returns insights in pages. The feeder tracks two DynamoDB entities per user:

| Entity | Purpose |
|---|---|
| `pave_label` | Marks whether the user's Plaid data has ever been labeled in Pave. Written once on first upload; used to avoid re-labeling. |
| `last_page` | Records the latest pagination request ID for a user. The miner checks this entity to decide whether all pages for a given insight run have been processed before emitting the `user_new_insights_available` event. |
### Upload Logic

For each Kinesis record, the feeder:

1. Extracts the user ID and transaction batch from the event.
2. Checks and sets the `pave_label` entity if this is the user's first upload.
3. Uploads the transaction batch and account balances to the Pave API in paginated requests.
4. Saves the `last_page` entity with the latest request ID and page metadata.
GrowthBook feature flags control the transaction batch size and which Pave upload method is used (configurable per-environment).
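As a minimal sketch of the per-record flow above — the event shape, function names, and a plain dict standing in for the DynamoDB entities are all illustrative, not the real implementation:

```python
import json

# In-memory stand-in for the DynamoDB entities, keyed by (entity, user_id).
db = {}

def upload_batch_to_pave(user_id, transactions):
    """Placeholder for the paginated Pave upload; returns a request ID."""
    return f"req-{user_id}-{len(transactions)}"

def handle_kinesis_record(record):
    # 1. Extract the user ID and transaction batch from the event.
    event = json.loads(record["data"])
    user_id = event["user_id"]
    transactions = event["transactions"]

    # 2. Check-and-set the pave_label entity so the user's data is only
    #    labeled in Pave once, on the first upload.
    if ("pave_label", user_id) not in db:
        db[("pave_label", user_id)] = {"labeled": True}

    # 3. Upload the batch and balances to Pave.
    request_id = upload_batch_to_pave(user_id, transactions)

    # 4. Record the latest pagination request ID; the miner later correlates
    #    against this entity before emitting the completion event.
    db[("last_page", user_id)] = {"request_id": request_id}
    return request_id
```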
## Webhook

Lambda: `prod-insight-webhook`
Trigger: API Gateway (HMAC signature-verified endpoint)
The webhook Lambda receives all callbacks Pave sends to the FloatMe webhook URL. Each request is validated via HMAC signature before any processing occurs; requests with invalid signatures are rejected with a 401.
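The verification step can be sketched with the standard library; the HMAC-SHA256/hex-digest scheme and header contents are assumptions about Pave's webhook contract, not confirmed details:

```python
import hashlib
import hmac

def verify_pave_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Compare the received signature against a locally computed
    HMAC-SHA256 of the raw request body.

    hmac.compare_digest is used for constant-time comparison, which
    avoids leaking signature prefixes through timing differences.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

A handler would call this before any parsing and return a 401 when it is false, matching the behaviour described above.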
### Supported Event Types

| Pave Event Type | Behaviour |
|---|---|
| `USER_DATA_INSIGHTS_READY` | Insights are ready for this user. Enqueues a mining job (containing `user_id` and `request_id`) on the miner SQS queue. |
| … | Acknowledged and logged. No downstream action taken. |
| … | Acknowledged and logged. No downstream action taken. |
| Any other type | Logged as a warning. No downstream action taken. |
Non-200 status fields in the Pave webhook payload (indicating Pave-side processing errors) are also logged as warnings without enqueuing a mining job.
### Miner SQS Message Shape

The SQS message body enqueued for each `USER_DATA_INSIGHTS_READY` event:

```json
{
  "user_id": "<floatme_user_id>",
  "request_id": "<pave_request_id_from_webhook_metadata>"
}
```
The `request_id` is carried through from the Pave webhook metadata and used by the miner to correlate with the `last_page` entity in DynamoDB when deciding whether to emit the `user_new_insights_available` event.
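The two sides of this queue contract can be sketched as a serializer/parser pair (helper names are hypothetical):

```python
import json

def build_mining_job(user_id: str, request_id: str) -> str:
    """Serialize the SQS message body the webhook enqueues for a
    USER_DATA_INSIGHTS_READY callback."""
    return json.dumps({"user_id": user_id, "request_id": request_id})

def parse_mining_job(body: str) -> tuple:
    """Parse the body on the miner side. A malformed body raises here,
    which the miner reports as a batch item failure."""
    msg = json.loads(body)
    return msg["user_id"], msg["request_id"]
```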
## Miner

Lambda: `prod-insight-miner`
Trigger: SQS (`prod-insight-miner`, batch processing)
The miner dequeues mining jobs and fetches the full set of processed insights from Pave for the given user. It writes all entity types to DynamoDB, then conditionally emits an EventBridge event to notify downstream services that fresh insights are available.
### What Is Fetched and Saved

For each SQS message the miner:

1. Gets unified insights from Pave, which returns recurring expenses, ritual expenses, and income sources in a single call.
2. Writes to DynamoDB:
   - `recurring` — recurring expense records (merchant, due date, amount, bill type fixed/variable, transactions)
   - `ritual` — ritual expense records (normalized merchant, frequency, average amount, transactions)
   - `income` — income source records (recurring income sources, total recurring income)
3. Gets the cash-advance score from Pave: repayment probability scores across 15d, 30d, and 45d intervals.
4. Writes `scores` to DynamoDB.
5. Saves a bank account balance snapshot (`balance` entity) by querying the Transactions Service for the user's current checking and savings accounts.
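A sketch of that per-message sequence, with `pave`, `store`, and `txn_service` as hypothetical client interfaces — the point is the split between required fetches (exceptions propagate to the batch failure handling) and the best-effort balance snapshot:

```python
import logging

logger = logging.getLogger("insight-miner")

def process_mining_job(user_id, request_id, pave, store, txn_service):
    """Fetch insights and scores for one user and persist them."""
    # Required: a failure here propagates and marks the message failed.
    insights = pave.get_unified_insights(user_id)
    store.write(user_id, "recurring", insights["recurring"])
    store.write(user_id, "ritual", insights["ritual"])
    store.write(user_id, "income", insights["income"])

    # Required: cash-advance repayment probability scores.
    scores = pave.get_cash_advance_score(user_id)
    store.write(user_id, "scores", scores)

    # Best-effort: a failed balance snapshot is logged, not retried.
    try:
        balance = txn_service.get_balances(user_id)
        store.write(user_id, "balance", balance)
    except Exception:
        logger.exception("balance snapshot failed for %s", user_id)
```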
### `user_new_insights_available` Emission

The miner only emits the `user_new_insights_available` EventBridge event when it determines it has processed the last page of a user's insight run. This avoids notifying downstream services (e.g., the Float Service) multiple times per insight cycle when Pave returns results across multiple pages.

The check is done by reading the `last_page` DynamoDB entity for the `{user_id, request_id}` pair. If `is_last_page = true`, the event is emitted.
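The emission guard reduces to a small predicate; `last_page_item` stands in for the `last_page` DynamoDB item, and any field names beyond `request_id` and `is_last_page` are assumptions:

```python
def should_emit_insights_event(last_page_item, request_id) -> bool:
    """Emit user_new_insights_available only when the stored last_page
    entity matches this run's request_id and marks the final page."""
    if last_page_item is None:
        return False
    return (last_page_item.get("request_id") == request_id
            and last_page_item.get("is_last_page") is True)
```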
| EventBridge Field | Value |
|---|---|
| Bus | … |
| Source | … |
| Detail-Type | … |
| Payload | … |
Failures to emit the event are logged as errors but do not cause the SQS record to be retried — this prevents insight data from being rewritten unnecessarily when only the event emission failed.
### Error Handling

The miner processes SQS messages in batch mode and returns per-message failure responses (`SQSBatchItemFailure`). A message is added to the failure list if any of the following occur: the SQS message body cannot be unmarshalled/parsed, the unified insight fetch fails, or the cash-advance score fetch fails. The balance snapshot and event emission are best-effort; failures are logged but do not add to the batch failure list.

Failed messages are retried up to the queue's configured maximum receive count before being routed to the dead-letter queue (`prod-insight-miner-dlq`).
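The response the Lambda returns follows the standard SQS partial-batch-failure shape (`ReportBatchItemFailures`): every `messageId` listed under `batchItemFailures` goes back to the queue for retry, and all others are deleted.

```python
def batch_response(failed_message_ids):
    """Build the partial-batch response Lambda returns to SQS.
    Each failed messageId becomes an itemIdentifier entry."""
    return {
        "batchItemFailures": [
            {"itemIdentifier": mid} for mid in failed_message_ids
        ]
    }
```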
## Replay Feeder

Lambda: `prod-insight-replay-feeder`
Trigger: SQS (`prod-insight-replay-feeder`)
The replay feeder is a manual operations tool. It accepts SQS messages that specify a user ID plus optional date or parameter overrides, then calls the Pave API to refetch data using those parameters and updates the DynamoDB `last_page` state accordingly.

The replay feeder does not enqueue to the miner SQS and does not emit any EventBridge events. It is used to re-synchronise pagination state or trigger a fresh Pave data pull without waiting for new Plaid transaction events.
```
Replay SQS message
  └─▶ prod-insight-replay-feeder
        ├─▶ Pave API (refetch with custom params)
        └─▶ DynamoDB (update last_page state)
```
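A replay message could be built as below; only "user ID plus optional overrides" is stated by this page, so the `start_date`/`end_date` field names and the pass-through of extra overrides are hypothetical:

```python
import json
from datetime import date

def build_replay_message(user_id, start_date=None, end_date=None, **overrides):
    """Serialize an SQS body for the replay feeder.

    start_date/end_date and any extra keyword overrides are illustrative
    field names, not a confirmed schema.
    """
    body = {"user_id": user_id}
    if start_date:
        body["start_date"] = start_date.isoformat()
    if end_date:
        body["end_date"] = end_date.isoformat()
    body.update(overrides)
    return json.dumps(body)
```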
## Related Pages

- Architecture — System context and full Lambda inventory
- Event Flows — `user_new_insights_available` event routing and downstream consumers
- DynamoDB Tables — `recurring`, `ritual`, `income`, `scores`, `pave_label`, `last_page` entity schemas